Python provides the re module for working with regular expressions, which makes tasks like validation, parsing, and data cleaning much easier.
Introduction to the re Module
To use regular expressions, you must import Python's built-in re module:
import re
The re module provides several key functions for searching and manipulating strings.
Basic Functions in re Module
| Function | Description |
|---|---|
| re.search() | Scans the whole string and returns the first match (or None) |
| re.match() | Checks for a match only at the beginning of the string |
| re.findall() | Returns all non-overlapping matches as a list |
| re.finditer() | Returns an iterator of match objects |
| re.sub() | Replaces matches with a new string |
| re.split() | Splits a string at each occurrence of a pattern |
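To see how the functions in the table differ, here is a minimal sketch that applies each of them to the same made-up string with the pattern \d+ (one or more digits). The sample text and variable names are illustrative only.

```python
import re

text = "cat 3, bat 12, rat 7"  # hypothetical sample text

first = re.search(r"\d+", text)          # first run of digits anywhere in the string
at_start = re.match(r"\d+", text)        # digits only at position 0
numbers = re.findall(r"\d+", text)       # every run of digits, as strings
spans = [m.span() for m in re.finditer(r"\d+", text)]  # match objects give positions
masked = re.sub(r"\d+", "#", text)       # replace every run of digits with "#"
parts = re.split(r",\s*", text)          # split on a comma plus optional spaces

print(first.group())  # 3
print(at_start)       # None (the text starts with "cat", not a digit)
print(numbers)        # ['3', '12', '7']
print(spans)          # [(4, 5), (11, 13), (19, 20)]
print(masked)         # cat #, bat #, rat #
print(parts)          # ['cat 3', 'bat 12', 'rat 7']
```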
Searching for Patterns
import re
text = "Python is amazing!"
pattern = "amazing"
# search() returns a match object for the first occurrence, or None
match = re.search(pattern, text)
if match:
    print("Pattern found at position:", match.start())  # 10
else:
    print("Pattern not found.")
re.search() scans the entire string and stops at the first match. The returned match object holds details such as the start and end positions. If nothing matches, re.search() returns None.
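The match object exposes more than start(). This short sketch reuses the example's text and shows group(), end(), and span(). It also shows the None result for a pattern that does not occur. The word "boring" is just an arbitrary non-matching pattern.

```python
import re

text = "Python is amazing!"
match = re.search("amazing", text)

if match:
    print(match.group())  # amazing  (the matched substring)
    print(match.start())  # 10       (index of the first matched character)
    print(match.end())    # 17       (index just past the last matched character)
    print(match.span())   # (10, 17) (start and end together)

missing = re.search("boring", text)
print(missing)            # None, so always check before calling .group()
```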
Matching at the Beginning
import re
text = "Python is fun"
if re.match("Python", text):
    print("Starts with Python")
re.match() only checks the start of the string. If the pattern appears later, it won't match.
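The difference between re.match() and re.search() is easiest to see side by side. The sketch below also uses re.fullmatch(), a related function in the re module that the table above does not list. It requires the whole string to match, whereas re.match() only needs a matching prefix.

```python
import re

text = "Python is fun"

later = re.match("fun", text)     # None: "fun" is not at index 0
found = re.search("fun", text)    # search() scans the whole string
prefix = re.match("Py", text)     # match() is satisfied by a matching prefix
whole = re.fullmatch("Py", text)  # fullmatch() needs the entire string to match

print(later)           # None
print(found.span())    # (10, 13)
print(prefix.group())  # Py
print(whole)           # None
```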
Finding All Occurrences
import re
text = "cat, bat, mat, rat"
result = re.findall(r"bat", text)
print(result) # Output: ['bat']
re.findall() returns a list of every non-overlapping match. Here the literal pattern "bat" appears only once, so the list has a single element.
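A literal pattern only ever finds the same word. A character class makes findall() more useful. This sketch uses [a-z]at (any lowercase letter followed by "at") on the same text. It also uses re.finditer() to get positions as well as the matched text.

```python
import re

text = "cat, bat, mat, rat"

# [a-z]at: any single lowercase letter followed by "at"
words = re.findall(r"[a-z]at", text)
print(words)  # ['cat', 'bat', 'mat', 'rat']

# finditer() yields match objects, so the start index is available too
positions = [(m.group(), m.start()) for m in re.finditer(r"[bm]at", text)]
print(positions)  # [('bat', 5), ('mat', 10)]
```

Note that if a pattern contains capturing groups, re.findall() returns the captured groups instead of the whole match.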
Using re.sub() to Replace Text
import re
text = "I love Python!"
new_text = re.sub("Python", "Java", text)
print(new_text) # I love Java!
re.sub() replaces all occurrences of the pattern with the provided string.
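re.sub() has a few options that go beyond simple word swaps. This sketch shows three of them on made-up strings:

- the count argument limits the number of replacements;
- \1 and \2 in the replacement refer back to capturing groups;
- a function can compute each replacement from its match object.

```python
import re

text = "red apple, red cherry, red plum"  # hypothetical sample text

# count=1 replaces only the first occurrence (the default 0 means "all")
once = re.sub("red", "green", text, count=1)
print(once)  # green apple, red cherry, red plum

# \2 \1 swaps the two captured words
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)  # world hello

# a callable receives each match object and returns the replacement string
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 12 pears")
print(doubled)  # 6 apples and 24 pears
```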
Using re.split()
import re
text = "apple,banana;cherry orange"
result = re.split(r"[,; ]", text)
print(result) # ['apple', 'banana', 'cherry', 'orange']
The character set [,; ] matches a comma, a semicolon, or a space, so re.split() can break the string on several different delimiters at once.
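A single-character set has a pitfall: when delimiters appear next to each other, empty strings show up in the result. This sketch uses a made-up input with extra spaces. It compares [,; ] with [,;\s]+, which treats any run of separators as one split point. It also shows the maxsplit argument.

```python
import re

text = "apple, banana;  cherry orange"  # hypothetical input with extra spaces

# one delimiter character at a time: adjacent delimiters produce ''
single = re.split(r"[,; ]", text)
print(single)  # ['apple', '', 'banana', '', '', 'cherry', 'orange']

# + makes a whole run of commas, semicolons, or whitespace act as one separator
merged = re.split(r"[,;\s]+", text)
print(merged)  # ['apple', 'banana', 'cherry', 'orange']

# maxsplit limits how many splits are performed
first_only = re.split(r"[,;\s]+", text, maxsplit=1)
print(first_only)  # ['apple', 'banana;  cherry orange']
```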
Special Characters and Meta Characters
| Symbol | Meaning | Example |
|---|---|---|
| . | Any character except newline | a.b → matches acb, a1b |
| ^ | Start of string | ^Hello → matches if text starts with "Hello" |
| $ | End of string | world$ → matches if text ends with "world" |
| * | 0 or more occurrences | a*b → matches b, ab, aab |
| + | 1 or more occurrences | a+b → matches ab, aab |
| ? | 0 or 1 occurrence | colou?r → matches color, colour |
| {n} | Exactly n occurrences | a{3} → matches aaa |
| {n,m} | Between n and m occurrences (no space after the comma) | a{2,4} → matches aa, aaa, aaaa |
| [] | Character set | [aeiou] → matches any lowercase vowel |
| \d | Digit | Matches 0-9 (and other Unicode digits in str patterns) |
| \w | Word character | Matches letters, digits, and underscores |
| \s | Whitespace | Matches space, tab, newline |
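The table's claims can be checked directly. This sketch pairs several patterns from the table with test strings. It uses re.fullmatch() so that each candidate must match the pattern completely. The candidate strings are chosen for illustration, including a few that should fail.

```python
import re

# (pattern, candidate) pairs based on the table above
cases = [
    (r"a.b", "acb"), (r"a.b", "a\nb"),        # . does not match a newline
    (r"a*b", "b"), (r"a+b", "b"),             # * allows zero a's, + needs one
    (r"colou?r", "color"), (r"colou?r", "colour"),
    (r"a{2,4}", "aaa"), (r"a{2,4}", "aaaaa"),  # five a's exceeds the upper bound
    (r"\d+", "2024"), (r"\w+", "snake_case1"), (r"\s", "\t"),
]

# fullmatch() succeeds only if the whole candidate matches the pattern
results = {(p, s): re.fullmatch(p, s) is not None for p, s in cases}

for (p, s), ok in results.items():
    print(f"{p!r:12} {s!r:15} {ok}")
```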
Example: Email Validation
import re
email = "example123@gmail.com"
pattern = r"^[\w\.-]+@[\w\.-]+\.\w+$"
if re.match(pattern, email):
    print("Valid email")
else:
    print("Invalid email")
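This checks that the string has a basic email shape. ^ and $ anchor the pattern to the whole string, [\w\.-]+ allows word characters, dots, and hyphens, and \. requires a literal dot before the final \w+. It is a simplified format check, not a full RFC 5322 validator.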
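Running the same pattern over several made-up addresses shows both what it catches and how permissive it is. In the sketch below, the last candidate has consecutive dots. The pattern accepts it, although a real mail system would not.

```python
import re

pattern = r"^[\w\.-]+@[\w\.-]+\.\w+$"  # same pattern as the example above

candidates = [
    "example123@gmail.com",        # ordinary address
    "first.last@sub.example.org",  # dots in both parts
    "no-at-sign.com",              # missing @
    "user@localhost",              # no dot after the @
    "a..b@x..com",                 # accepted by this simple pattern anyway
]

results = {email: re.match(pattern, email) is not None for email in candidates}

for email, ok in results.items():
    print(f"{email}: {'valid' if ok else 'invalid'}")
```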
Using Raw Strings (r"")
Regular expressions often contain backslashes (\), which Python also interprets as escape characters in ordinary string literals. To avoid confusion, use raw strings with the r prefix. Example:
pattern = r"\d{3}-\d{2}-\d{4}"
Without r, you would need to write "\\d{3}-\\d{2}-\\d{4}".
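A forgotten r prefix can fail silently. This sketch uses \b, which the regex engine reads as a word boundary but Python reads as the backspace character in an ordinary string. The sample sentence is made up. The last line confirms that the raw and escaped forms of the pattern above produce identical strings.

```python
import re

text = "cat concatenate cat"  # hypothetical sample text

# In a normal literal, "\b" becomes the backspace character (\x08),
# so this pattern searches for backspaces and finds nothing.
plain = re.findall("\bcat\b", text)

# In a raw string, \b reaches the regex engine as a word boundary,
# so "cat" inside "concatenate" is skipped.
raw = re.findall(r"\bcat\b", text)

print(plain)                  # []
print(raw)                    # ['cat', 'cat']
print(len("\b"), len(r"\b"))  # 1 2

# the raw form and the doubled-backslash form are the same string
same = r"\d{3}-\d{2}-\d{4}" == "\\d{3}-\\d{2}-\\d{4}"
print(same)                   # True
```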
Summary
Regular expressions are indispensable for pattern-based text processing, from validating input forms to cleaning and transforming datasets. Python's re module provides the core tools for searching, extracting, replacing, and splitting text by pattern.
| Function | Purpose |
|---|---|
| re.search() | Search for pattern anywhere |
| re.match() | Match only at the start |
| re.findall() | Return all matches as list |
| re.sub() | Replace text |
| re.split() | Split text using regex |
| Meta Characters | Define flexible pattern rules |
| Raw Strings (r"") | Prevent escape conflicts |