Regular expressions (regex or regexp) are powerful tools for pattern matching within strings. Python’s `re` module offers robust functionality for regex operations, and wildcards play a crucial role in them. This article explores how to use wildcards effectively with the `re.sub()` function for various string manipulation tasks.
Table of Contents
- Basic Regex Substitutions with Wildcards
- Advanced Wildcard Usage and Quantifiers
- Combining Wildcards for Complex Patterns
- Real-World Examples: Email and Phone Number Extraction
- Conclusion
Basic Regex Substitutions with Wildcards
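The `re.sub()` function is the core tool for regex substitutions. Its signature is `re.sub(pattern, repl, string, count=0, flags=0)`. Here, `pattern` is the regular expression to search for, `repl` is the replacement (a string or a function), `string` is the input text, `count` caps the number of substitutions (the default, `0`, replaces every match), and `flags` modify matching behavior (for example, `re.IGNORECASE`). The function returns a new string and leaves the input unchanged. Wildcards are what give `pattern` its flexibility.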
Let’s replace all vowels in a string with “X”:
```python
import re

text = "Hello, World!"
# Replace every vowel, upper- or lowercase, with "X".
replaced_text = re.sub(r"[aeiou]", "X", text, flags=re.IGNORECASE)
print(f"Original: {text}")
print(f"Replaced: {replaced_text}")  # Replaced: HXllX, WXrld!
```
`[aeiou]` is a character set that matches any single vowel; the `re.IGNORECASE` flag makes it match uppercase vowels as well.
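The `count` parameter limits how many matches are replaced, counting from the left. The short sketch below reuses the vowel pattern with `count=2`, and also calls `re.subn()`, a sibling of `re.sub()` that returns a tuple of the new string and the number of substitutions made.

```python
import re

text = "Hello, World!"

# Replace only the first two vowels; later matches are left alone.
first_two = re.sub(r"[aeiou]", "X", text, count=2, flags=re.IGNORECASE)
print(first_two)  # HXllX, World!

# re.subn() performs the same substitution but also reports how many were made.
new_text, n = re.subn(r"[aeiou]", "X", text, flags=re.IGNORECASE)
print(new_text, n)  # HXllX, WXrld! 3
```

`re.subn()` is handy when you need to log or check how much a cleanup step actually changed.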
Advanced Wildcard Usage and Quantifiers
`re.sub()` also works with quantifiers, which control how many times an element may repeat. Let’s replace each sequence of one or more digits with “NUMBER”:
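```python
import re

text = "My phone number is 123-456-7890 and my zip code is 90210."
# \d+ matches each run of consecutive digits.
replaced_text = re.sub(r"\d+", "NUMBER", text)
print(f"Original: {text}")
print(f"Replaced: {replaced_text}")
# Replaced: My phone number is NUMBER-NUMBER-NUMBER and my zip code is NUMBER.
```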
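`\d+` matches one or more digits: `\d` matches a single digit, and `+` means “one or more repetitions of the preceding element.” Because `+` takes as many digits as it can, `123` becomes a single `NUMBER` rather than three.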
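Quantifiers such as `*` and `+` are greedy: they match as much text as possible. Appending `?` (for example `*?`) makes them lazy, so they match as little as possible. The minimal sketch below uses a made-up string with tag-like markup to show the difference; it illustrates quantifier behavior only and is not a recommended way to process real HTML.

```python
import re

html = "<b>bold</b> text"

# Greedy: .* runs from the first "<" to the LAST ">", swallowing "bold" too.
greedy = re.sub(r"<.*>", "", html)
print(repr(greedy))  # ' text'

# Lazy: .*? stops at the first ">" after each "<", so only the tags are removed.
lazy = re.sub(r"<.*?>", "", html)
print(repr(lazy))  # 'bold text'
```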
Here’s a table summarizing key wildcards:
| Wildcard | Description |
|---|---|
| `.` | Matches any character except a newline. |
| `*` | Matches zero or more occurrences of the preceding element. |
| `+` | Matches one or more occurrences of the preceding element. |
| `?` | Matches zero or one occurrence of the preceding element. |
| `[]` | Defines a character set (e.g., `[abc]`). |
| `[^]` | Defines a negated character set (e.g., `[^abc]`). |
| `()` | Creates a capturing group. |
| `\` | Escapes special characters (e.g., `\.` matches a literal dot). |
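A few one-line substitutions make the table concrete. The sample strings below are made up for illustration, and each call isolates one row of the table.

```python
import re

# "." matches any single character; "\." matches only a literal dot.
any_char = re.sub(r"a.c", "X", "abc a.c")
literal_dot = re.sub(r"a\.c", "X", "abc a.c")
print(any_char, "|", literal_dot)  # X X | abc X

# "?" makes the preceding "u" optional, so both spellings match.
optional = re.sub(r"colou?r", "hue", "color colour")
print(optional)  # hue hue

# "[^a-z]" matches any character that is NOT a lowercase letter.
negated = re.sub(r"[^a-z]", "", "a1b2-c")
print(negated)  # abc

# "(...)" captures groups that the replacement can reuse as \1, \2, ...
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@host")
print(swapped)  # host at user
```

Raw strings (`r"..."`) keep Python from interpreting the backslashes before the `re` module sees them.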
Combining Wildcards for Complex Patterns
Combining wildcards produces more precise patterns. Let’s replace every word that starts with “a”:
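```python
import re

text = "An apple a day keeps the doctor away."
# \b anchors the match at the start of a word, so "day" is not touched.
replaced_text = re.sub(r"\ba\w*", "WORD", text, flags=re.IGNORECASE)
print(f"Original: {text}")
print(f"Replaced: {replaced_text}")  # Replaced: WORD WORD WORD day keeps the doctor WORD.
```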
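`\ba\w*` matches a word boundary (`\b`), the letter “a”, then zero or more word characters (`\w`, which covers letters, digits, and underscores). The `\b` matters: without it, the pattern would also match the “ay” inside “day”.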
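Instead of a fixed string, `repl` can be a function: `re.sub()` calls it once per match with the match object and uses its return value as the replacement. The sketch below reuses the word-boundary pattern; the helper name `shout` is just an illustrative choice.

```python
import re

text = "An apple a day keeps the doctor away."

def shout(match: re.Match) -> str:
    # match.group(0) is the full text matched by the pattern.
    return match.group(0).upper()

shouted = re.sub(r"\ba\w*", shout, text, flags=re.IGNORECASE)
print(shouted)  # AN APPLE A day keeps the doctor AWAY.
```

A function is the natural choice whenever the replacement depends on what was matched rather than being a constant.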
Real-World Examples: Email and Phone Number Extraction
`re.sub()` handles longer, real-world patterns just as well. Let’s replace email addresses with “EMAIL”:
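```python
import re

# Placeholder addresses on the reserved example.com/example.org domains.
text = "Contact us at info@example.com or support@example.org."
# \. matches the literal dot before the top-level domain.
replaced_text = re.sub(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", "EMAIL", text)
print(f"Original: {text}")
print(f"Replaced: {replaced_text}")  # Replaced: Contact us at EMAIL or EMAIL.
```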
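This pattern matches the common `name@domain.tld` shape: one or more allowed characters, an `@`, a domain name, a literal dot (`\.`), and a top-level domain of at least two letters (`{2,}`). It is a pragmatic approximation rather than a complete validator: it accepts some invalid addresses and rejects some valid ones.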
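Phone numbers work the same way. A minimal sketch, assuming US-style numbers written as `123-456-7890` (other formats would need a different pattern): `re.finditer()` extracts each full match, and `re.sub()` with a capturing group masks all but the last four digits.

```python
import re

text = "My phone number is 123-456-7890 and my zip code is 90210."

# Assumed format: ddd-ddd-dddd. The last four digits are captured as group 1,
# and \b prevents matches inside a longer run of digits.
phone_pattern = r"\b\d{3}-\d{3}-(\d{4})\b"

# Extraction: group(0) is the whole number. (re.findall would return only the
# captured group, "7890", because the pattern contains one group.)
phones = [m.group(0) for m in re.finditer(phone_pattern, text)]
print(phones)  # ['123-456-7890']

# Masking: \1 in the replacement re-inserts the captured last four digits.
masked = re.sub(phone_pattern, r"XXX-XXX-\1", text)
print(masked)  # My phone number is XXX-XXX-7890 and my zip code is 90210.
```

Note that the five-digit zip code is left alone because it lacks the hyphenated shape the pattern requires.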
Conclusion
The `re.sub()` function, combined with regex wildcards, offers a flexible and efficient way to manipulate strings in Python. These techniques are valuable for text processing and data cleaning. Build patterns carefully to avoid unintended replacements: an unescaped `.` matches any character, and a pattern without `\b` can match inside longer words. Experimenting on small sample inputs is the quickest way to learn each wildcard’s nuances.