Understanding Python Regular Expressions
Regular expressions (regex) are powerful tools for pattern matching and manipulation of text in Python. They allow you to search, extract, and manipulate strings based on specific patterns or rules. Python's standard library includes the re module for working with regular expressions.
Basic Syntax
The syntax of regular expressions in Python consists of a combination of characters and metacharacters that define a pattern. Here are some commonly used metacharacters:
- . – Matches any character except a newline.
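- ^ – Matches the start of a string (and, with the re.MULTILINE flag, the start of each line).
- $ – Matches the end of a string or just before a trailing newline (and, with the re.MULTILINE flag, the end of each line).
- * – Matches zero or more occurrences of the preceding element (a character, character class, or group).
- + – Matches one or more occurrences of the preceding element.
- ? – Matches zero or one occurrence of the preceding element.
- {} – Matches a specific number or range of occurrences of the preceding element, as in {3} or {2,}.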
- [] – Matches any one of the characters inside the square brackets.
- \ – Escapes special characters or indicates special sequences.
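As a rough illustration of the list above, the following sketch applies each metacharacter to a small sample string. The variable names and sample strings are made up for this example and are not part of the re module.

```python
import re

# . matches any character except a newline, so "c\nt" is skipped.
any_char = re.findall(r"c.t", "cat cot c\nt")

# ^ and $ anchor the pattern to the start and end of the string.
at_start = re.search(r"^Hello", "Hello world")
at_end = re.search(r"world$", "Hello world")

# *, + and ? control how often the preceding element may repeat.
zero_or_more = re.findall(r"ab*", "a ab abbb")
one_or_more = re.findall(r"ab+", "a ab abbb")
optional = re.findall(r"colou?r", "color colour")

# {3} requires exactly three repetitions of the preceding element.
exact_count = re.findall(r"\d{3}", "12 345 6789")

# [] matches any one character from the set.
char_class = re.findall(r"[aeiou]", "regex")

# A backslash turns the metacharacter . into a literal dot.
escaped_dot = re.findall(r"\.", "a.b.c")

print(any_char)      # ['cat', 'cot']
print(zero_or_more)  # ['a', 'ab', 'abbb']
print(one_or_more)   # ['ab', 'abbb']
print(optional)      # ['color', 'colour']
print(exact_count)   # ['345', '678']
print(char_class)    # ['e', 'e']
print(escaped_dot)   # ['.', '.']
```

Note that re.search() returns a match object on success and None otherwise, which is why at_start and at_end are not printed as lists.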
Example 1: Matching Phone Numbers
Let’s say we want to extract phone numbers from a text. We can use regular expressions to define the pattern of a phone number and search for matches.
    import re

    text = "Please contact us at 123-456-7890 or 987-654-3210 for any inquiries."
    # Three digits, a hyphen, three digits, a hyphen, four digits
    pattern = r"\d{3}-\d{3}-\d{4}"

    matches = re.findall(pattern, text)
    for match in matches:
        print(match)
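In this example, we define the pattern \d{3}-\d{3}-\d{4} to match phone numbers in the format of three digits, followed by a hyphen, three more digits, another hyphen, and finally four digits. The \d sequence matches any digit, and the r prefix makes the pattern a raw string so that Python passes the backslashes to the re module unchanged.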
The re.findall() function returns a list of all matches found in the text. In this case, it will print:
    123-456-7890
    987-654-3210
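The shape of the list returned by re.findall() depends on whether the pattern contains capture groups. The sketch below reuses the phone-number text from this example; the variable names whole, parts, and none_found are illustrative only.

```python
import re

text = "Please contact us at 123-456-7890 or 987-654-3210 for any inquiries."

# Without capture groups, findall returns the full matched strings.
whole = re.findall(r"\d{3}-\d{3}-\d{4}", text)

# With capture groups, findall returns one tuple of group values per match.
parts = re.findall(r"(\d{3})-(\d{3})-(\d{4})", text)

# When nothing matches, findall returns an empty list rather than None.
none_found = re.findall(r"\d{3}-\d{3}-\d{4}", "no numbers here")

print(whole)       # ['123-456-7890', '987-654-3210']
print(parts)       # [('123', '456', '7890'), ('987', '654', '3210')]
print(none_found)  # []
```

Because an empty list is falsy, a check such as "if matches:" is usually enough to detect that nothing was found.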
Example 2: Extracting Email Addresses
Regular expressions can also be used to extract email addresses from a text. Here’s an example:
    import re

    text = "Please email us at info@example.com or support@example.com for any assistance."
    # \b marks a word boundary; \. matches a literal dot
    pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"

    matches = re.findall(pattern, text)
    for match in matches:
        print(match)
In this example, we define the pattern \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b to match email addresses. The \b at each end marks a word boundary. Between the boundaries, the pattern consists of three parts:
- [A-Za-z0-9._%+-]+ – Matches one or more alphanumeric characters, dots, underscores, percent signs, plus signs, or hyphens.
- @ – Matches the at symbol.
- [A-Za-z0-9.-]+\.[A-Za-z]{2,} – Matches one or more alphanumeric characters, dots, or hyphens, followed by a literal dot (escaped as \.) and two or more alphabetic characters.
The output of this code will be:
    info@example.com
    support@example.com
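To see why the dot before the top-level domain should be escaped, the following sketch checks a few sample strings against the pattern with \. and against a looser variant with a bare dot. The names EMAIL, LOOSE, and samples are made up for this illustration, and the pattern is a simplification for tutorials, not a complete email validator.

```python
import re

# Pattern from this example, with the dot before the domain suffix escaped.
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# Same pattern with an unescaped dot, which matches any character.
LOOSE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+.[A-Za-z]{2,}\b")

samples = [
    "info@example.com",
    "first.last+tag@mail.example.org",
    "user@localhost",   # no dot in the domain part
    "not an email",
]

# fullmatch succeeds only if the whole string fits the pattern.
strict_results = {s: bool(EMAIL.fullmatch(s)) for s in samples}
loose_results = {s: bool(LOOSE.fullmatch(s)) for s in samples}

for s in samples:
    print(f"{s!r}: strict={strict_results[s]}, loose={loose_results[s]}")
```

With the escaped dot, "user@localhost" is rejected because the domain has no literal dot; with the bare dot, it is accepted because the dot can stand in for any letter.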
Example 3: Replacing Text
Regular expressions can also be used to replace specific patterns in a text. Let’s say we want to replace all occurrences of the word “apple” with “orange” in a sentence:
    import re

    text = "I have an apple, but I prefer oranges."
    pattern = r"apple"

    replaced_text = re.sub(pattern, "orange", text)
    print(replaced_text)
In this example, we use the re.sub() function to replace all occurrences of the pattern “apple” with the word “orange”. The output will be:
    I have an orange, but I prefer oranges.
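A plain pattern such as apple also matches inside longer words and is case-sensitive. The sketch below, using a made-up sentence, shows how word boundaries, the re.IGNORECASE flag, and the count argument of re.sub() change the result.

```python
import re

text = "Apple pie, pineapple juice, and an apple."

# Plain pattern: case-sensitive, and also matches inside "pineapple".
plain = re.sub(r"apple", "orange", text)

# \b on both sides restricts the match to the whole word "apple".
whole_word = re.sub(r"\bapple\b", "orange", text)

# re.IGNORECASE also matches the capitalized "Apple".
any_case = re.sub(r"\bapple\b", "orange", text, flags=re.IGNORECASE)

# count=1 stops after the first replacement.
first_only = re.sub(r"apple", "orange", text, count=1)

print(plain)       # Apple pie, pineorange juice, and an orange.
print(whole_word)  # Apple pie, pineapple juice, and an orange.
print(any_case)    # orange pie, pineapple juice, and an orange.
print(first_only)  # Apple pie, pineorange juice, and an apple.
```

Note that the replacement text is inserted as written, so the any_case result starts with a lowercase "orange" even though it replaced "Apple".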
Conclusion
Python regular expressions are a powerful tool for pattern matching and manipulation of text. They allow you to search, extract, and replace strings based on specific patterns or rules. By understanding the basic syntax and using the appropriate metacharacters, you can perform complex text operations with ease.