Regular Expressions in Python

Regular Expressions (RegEx) are powerful tools used to search, match, and manipulate text based on specific patterns.

Python provides the re module to work with regular expressions efficiently, making tasks like validation, parsing, and data cleaning much easier.

Introduction to the re Module

To use regular expressions, you must import Python's built-in re module:

import re
The re module provides several key functions for searching and manipulating strings.

Basic Functions in re Module

Function        Description
re.search()     Searches for the first occurrence of a pattern
re.match()      Checks for a match only at the beginning of the string
re.findall()    Returns all matches as a list
re.finditer()   Returns an iterator of match objects
re.sub()        Replaces matches with a new string
re.split()      Splits a string at each occurrence of a pattern
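
Of the functions listed, re.finditer() is the only one not demonstrated in the sections that follow. A minimal sketch of how it might be used is shown here; the sample sentence and the variable names are made up for illustration.

```python
import re

text = "Order 12 shipped, order 345 pending"

# re.finditer() yields one match object per non-overlapping match,
# so each hit carries both its text and its position in the string.
for m in re.finditer(r"\d+", text):
    print(m.group(), m.span())

# Collect (matched text, start index) pairs in a list.
found = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
print(found)  # [('12', 6), ('345', 24)]
```

Compared with re.findall(), this is useful when you need to know where each match occurred, not just what was matched.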

Searching for Patterns

import re

text = "Python is amazing!"
pattern = "amazing"

match = re.search(pattern, text)
if match:
    print("Pattern found at position:", match.start())
else:
    print("Pattern not found.")
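re.search() scans through the string and returns the first place where the pattern matches, or None if there is no match.
The returned match object (stored in 'match' here) contains details like the start and end positions.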
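
As a short sketch of what the match object exposes, the snippet below reuses the same text and pattern and reads the most common attributes. The comments show the values for this particular input.

```python
import re

text = "Python is amazing!"
match = re.search(r"amazing", text)

if match:
    print(match.group())  # the matched text: 'amazing'
    print(match.start())  # index where the match begins: 10
    print(match.end())    # index just past the end of the match: 17
    print(match.span())   # (start, end) as a tuple: (10, 17)
    # Slicing the original string with the span returns the matched text.
    print(text[match.start():match.end()])
```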

Matching at the Beginning

import re

text = "Python is fun"
if re.match("Python", text):
    print("Starts with Python")
re.match() only checks the start of the string.
If the pattern appears later, it won't match.
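
To make the difference concrete, here is a small comparison of re.match() and re.search() on a string where "Python" is not the first word. re.fullmatch(), which requires the whole string to match, is included for contrast; the sample sentence is invented for this sketch.

```python
import re

text = "I think Python is fun"

# re.match() only tries at index 0, so this returns None.
at_start = re.match("Python", text)

# re.search() scans forward and finds the word at index 8.
anywhere = re.search("Python", text)

# re.fullmatch() succeeds only if the entire string matches the pattern.
whole = re.fullmatch("Python", "Python")
partial = re.fullmatch("Python", "Python is fun")

print(at_start)          # None
print(anywhere.start())  # 8
print(whole is not None, partial is not None)  # True False
```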

Finding All Occurrences

import re

text = "cat, bat, mat, rat"
result = re.findall(r"bat", text)
print(result) # Output: ['bat']
re.findall() returns a list of every non-overlapping match of the pattern. Here the literal text "bat" occurs once, so the list has one item.
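
A literal pattern only ever finds copies of itself. The sketch below, using the same text, shows what re.findall() returns for a more general pattern, and how the result changes when the pattern contains a capturing group.

```python
import re

text = "cat, bat, mat, rat"

# Any lowercase letter followed by "at": every word in the text matches.
rhymes = re.findall(r"[a-z]at", text)
print(rhymes)  # ['cat', 'bat', 'mat', 'rat']

# With one capturing group, findall() returns the group's text
# instead of the whole match.
first_letters = re.findall(r"([a-z])at", text)
print(first_letters)  # ['c', 'b', 'm', 'r']
```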

Using re.sub() to Replace Text

import re

text = "I love Python!"
new_text = re.sub("Python", "Java", text)
print(new_text)  # I love Java!
re.sub() replaces all occurrences of the pattern with the provided string.
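
Two common variations are worth a quick sketch: limiting the number of replacements with the count argument, and reusing captured groups in the replacement string. The date strings below are sample data chosen for the illustration.

```python
import re

text = "2024-01-15 and 2023-12-31"

# count=1 replaces only the first match instead of all of them.
first_only = re.sub(r"\d{4}", "YYYY", text, count=1)
print(first_only)  # YYYY-01-15 and 2023-12-31

# \1, \2, \3 in the replacement refer to the captured groups,
# here used to reorder year-month-day into day/month/year.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)
print(swapped)  # 15/01/2024 and 31/12/2023
```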

Using re.split()

import re

text = "apple,banana;cherry orange"
result = re.split(r"[,; ]", text)
print(result)  # ['apple', 'banana', 'cherry', 'orange']
Splits a string using multiple delimiters.
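
One thing to watch for: when delimiters appear back to back (such as a comma followed by a space), a single-character class produces empty strings in the result. The sketch below uses a slightly messier input than the one above to show this, along with the + quantifier and the maxsplit argument.

```python
import re

text = "apple, banana;  cherry orange"

# Each delimiter character splits on its own, leaving empty strings
# wherever two delimiters are adjacent.
naive = re.split(r"[,; ]", text)
print(naive)  # ['apple', '', 'banana', '', '', 'cherry', 'orange']

# Adding + treats a run of delimiters as a single separator.
grouped = re.split(r"[,; ]+", text)
print(grouped)  # ['apple', 'banana', 'cherry', 'orange']

# maxsplit limits how many splits are performed.
limited = re.split(r"[,; ]+", text, maxsplit=1)
print(limited)  # ['apple', 'banana;  cherry orange']
```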

Special Characters and Meta Characters

Symbol   Meaning                        Example
.        Any character except newline   a.b → matches acb, a1b
^        Start of string                ^Hello → matches if text starts with "Hello"
$        End of string                  world$ → matches if text ends with "world"
*        0 or more occurrences          a*b → matches b, ab, aab
+        1 or more occurrences          a+b → matches ab, aab
?        0 or 1 occurrence              colou?r → matches color, colour
{n}      Exactly n occurrences          a{3} → matches aaa
{n,m}    Between n and m occurrences    a{2,4} → matches aa, aaa, aaaa
[]       Character set                  [aeiou] → matches any vowel
\d       Digit                          Matches 0-9
\w       Word character                 Matches letters, digits, and underscores
\s       Whitespace                     Matches space, tab, newline
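
The entries in the table can be checked directly. The following sketch runs a handful of them through re.fullmatch(), which requires the whole candidate string to match; the list of cases is chosen for illustration and is not exhaustive.

```python
import re

# (pattern, candidate string, whether the whole string should match)
cases = [
    (r"a.b", "a1b", True),
    (r"a.b", "ab", False),        # "." needs exactly one character
    (r"a*b", "b", True),          # "*" allows zero "a"s
    (r"a+b", "b", False),         # "+" needs at least one "a"
    (r"colou?r", "color", True),
    (r"a{2,4}", "aaaaa", False),  # five "a"s is more than {2,4} allows
    (r"[aeiou]", "e", True),
    (r"\d\w\s", "7_ ", True),     # digit, word character, whitespace
]

results = [bool(re.fullmatch(p, s)) for p, s, _ in cases]
for (pattern, candidate, expected), got in zip(cases, results):
    print(f"{pattern!r:12} {candidate!r:10} expected={expected} got={got}")
```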

Example: Email Validation

import re

email = "example123@gmail.com"
pattern = r"^[\w\.-]+@[\w\.-]+\.\w+$"

if re.match(pattern, email):
    print("Valid email")
else:
    print("Invalid email")
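Checks whether the string has the basic shape of an email address (name@domain.tld). This is a simple format check, not a guarantee that the address exists.
Uses meta characters like ^ (start), $ (end), and \w (word characters).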
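
If the same check runs many times, one possible variation is to compile the pattern once and wrap it in a helper. The sketch below assumes a hypothetical function name, looks_like_email, and uses re.fullmatch() in place of the ^ and $ anchors. One practical difference is that $ also matches just before a newline at the very end of the string, so the anchored pattern accepts an address with a trailing newline while fullmatch() does not.

```python
import re

# Same character classes as the pattern above, compiled once for reuse.
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+\.\w+")

def looks_like_email(address):
    """Return True if the whole string has the rough shape name@domain.tld."""
    return EMAIL_RE.fullmatch(address) is not None

print(looks_like_email("example123@gmail.com"))    # True
print(looks_like_email("example123@gmail"))        # False: no ".tld" part
print(looks_like_email("example123@gmail.com\n"))  # False: trailing newline

# The anchored version from the example accepts the trailing newline,
# because "$" can match just before a final "\n".
anchored = re.match(r"^[\w\.-]+@[\w\.-]+\.\w+$", "example123@gmail.com\n")
print(anchored is not None)  # True
```

The pattern stays deliberately permissive; it is a shape check for illustration rather than a complete validator.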

Using Raw Strings (r"")

Regular expressions often contain backslashes (\), which Python interprets as escape characters. To avoid confusion, use raw strings with prefix r.

Example:

pattern = r"\d{3}-\d{2}-\d{4}"
Without r, you would need to write "\\d{3}-\\d{2}-\\d{4}".
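
The effect is easy to see in the interpreter. In this brief sketch the \b sequence is a useful test case, because in an ordinary string it means a backspace character, while in a regular expression it means a word boundary.

```python
import re

# A raw string keeps the backslash; the two spellings are the same string.
print(r"\d" == "\\d")          # True
print(len("\n"), len(r"\n"))   # 1 2

plain = "\bcat\b"   # Python turns each \b into a backspace character
raw = r"\bcat\b"    # the regex engine sees \b as a word boundary

plain_hit = re.search(plain, "a cat sat")
raw_hit = re.search(raw, "a cat sat")

print(plain_hit)        # None: the text contains no backspace characters
print(raw_hit.group())  # cat
```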

Summary

Regular expressions are indispensable for pattern-based text processing, from validating input forms to cleaning and transforming datasets. Python's re module gives you immense control and efficiency when working with strings.

Feature             Purpose
re.search()         Search for pattern anywhere
re.match()          Match only at the start
re.findall()        Return all matches as list
re.sub()            Replace text
re.split()          Split text using regex
Meta Characters     Define flexible pattern rules
Raw Strings (r"")   Prevent escape conflicts
In the next article, we'll explore Object-Oriented Programming (OOP) in Python, the foundation of structured and reusable code.