Regular expressions (regex) are powerful tools for pattern matching and text manipulation in Python. They enable developers to search, match, and replace text efficiently. In Python, the re module provides the functions needed to work with regex, including the re.sub() function for substitution operations.
Using re.sub() for String Replacement
The re.sub() function replaces occurrences of a pattern within a string. Its syntax is:
re.sub(pattern, replacement, string, count=0, flags=0)
- pattern: The regex pattern to search for.
- replacement: The string that replaces each match. It can also be a function that receives the match object and returns the replacement string.
- string: The input string where the search and replace will occur.
- count: The maximum number of replacements. The default value is 0, which means replace all occurrences.
- flags: Optional regex flags that modify the behavior of the pattern matching.
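As a minimal sketch of how the count parameter changes the result (the sample string below is made up for illustration):

```python
import re

text = "one two  three four"

# count=0 (the default) replaces every match.
all_replaced = re.sub(r"\s+", "-", text)

# count=2 stops after the first two matches, scanning left to right.
first_two = re.sub(r"\s+", "-", text, count=2)

print(all_replaced)  # one-two-three-four
print(first_two)     # one-two-three four
```

Passing count and flags by keyword, as above, keeps the call unambiguous.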
Example: Replacing Email Usernames
Suppose you have an email address and want to replace the username part with “abc”:
import re

email = "username@gmail.com"  # placeholder username for illustration
new_email = re.sub(r"[a-z]*@", "abc@", email)
print(new_email)
Output:
abc@gmail.com
In this example, the pattern [a-z]*@ matches zero or more lowercase letters followed by the “@” symbol. The re.sub() function replaces this match with “abc@”.
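Note that [a-z]*@ only covers the run of lowercase letters directly before the “@”, so usernames containing digits or dots are only partly replaced. The sketch below contrasts it with a broader pattern; the address and the ^[^@]+@ pattern are illustrative assumptions, not part of the original example:

```python
import re

# Hypothetical address used only for illustration.
email = "john.doe42@example.com"

# [a-z]*@ can only reach back over lowercase letters, and "2" sits
# directly before "@", so only the "@" itself is matched.
narrow = re.sub(r"[a-z]*@", "abc@", email)

# ^[^@]+@ covers everything from the start of the string up to "@".
broad = re.sub(r"^[^@]+@", "abc@", email)

print(narrow)  # john.doe42abc@example.com
print(broad)   # abc@example.com
```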
Replacing Multiple Patterns
To replace multiple patterns in a single pass, you can use the | operator (regex alternation, which acts as a logical OR) within the regex pattern.
Example: Replacing Hyphens and Spaces with Commas
import re

text = "Joe-Kim Ema Max Aby Liza"
new_text = re.sub(r"(\s)|(-)", ", ", text)
print(new_text)
Output:
Joe, Kim, Ema, Max, Aby, Liza
Here, the pattern (\s)|(-) matches either a whitespace character (\s) or a hyphen (-). The re.sub() function replaces each match with “, ” (a comma followed by a space).
Replacing Multiple Patterns with Different Replacements
If you need to replace different patterns with distinct replacements, you can pass a function as the replacement. re.sub() calls it with the match object for each match and uses the string it returns.
Example: Converting Case Based on Pattern
import re

def convert_case(match_obj):
    # Group 1 captured a run of uppercase letters: lowercase it.
    if match_obj.group(1) is not None:
        return match_obj.group(1).lower()
    # Group 2 captured a run of lowercase letters: uppercase it.
    if match_obj.group(2) is not None:
        return match_obj.group(2).upper()

text = "jOE kIM mAx ABY lIzA"
new_text = re.sub(r"([A-Z]+)|([a-z]+)", convert_case, text)
print(new_text)
Output:
Joe Kim MaX aby LiZa
In this example, the convert_case function checks which group matched:
- If group(1) (a run of uppercase letters) matched, it converts the text to lowercase.
- If group(2) (a run of lowercase letters) matched, it converts the text to uppercase.
The re.sub() function applies this conversion to each match in the input string. Because every run of same-case letters is a separate match, the net effect is that the case of each letter is swapped.
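Another way to map different matches to different replacements is a dictionary lookup inside the replacement function. This is only a sketch: the replacements mapping, the lookup helper, and the sample sentence are hypothetical names and data chosen for illustration.

```python
import re

# Hypothetical mapping from matched text to its replacement.
replacements = {"cat": "dog", "red": "blue"}

# Build one alternation from the keys, escaping them so they match literally.
pattern = re.compile("|".join(re.escape(word) for word in replacements))

def lookup(match_obj):
    # group(0) is the full text of the current match.
    return replacements[match_obj.group(0)]

text = "the red cat sat"
new_text = pattern.sub(lookup, text)
print(new_text)  # the blue dog sat
```

As written, the pattern also matches inside longer words (for example “cat” in “scatter”); wrapping the alternation in \b word boundaries would restrict it to whole words.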
Important Considerations
- Regex vs. String Methods: The str.replace() method does not support regex patterns. For regex-based replacements, use re.sub().
- Regex Flags: Use flags like re.IGNORECASE to modify the behavior of the pattern matching.
- Escaping Special Characters: When your pattern includes special characters, ensure they are properly escaped to avoid unintended behavior.
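The last two points can be sketched as follows, using the standard library's re.escape() to escape a literal string. The sample text and prices are invented for illustration:

```python
import re

text = "Pay $5.00 usd or $5.00 Usd"

# re.IGNORECASE lets one pattern match "usd", "Usd", "USD", and so on.
normalized = re.sub(r"usd", "USD", text, flags=re.IGNORECASE)

literal = "$5.00"

# Unescaped, "$" is an end-of-string anchor, so this pattern never matches.
unescaped = re.sub(literal, "$4.50", text)

# re.escape() turns the literal into a pattern that matches it exactly.
escaped = re.sub(re.escape(literal), "$4.50", text)

print(normalized)  # Pay $5.00 USD or $5.00 USD
print(unescaped)   # Pay $5.00 usd or $5.00 Usd
print(escaped)     # Pay $4.50 usd or $4.50 Usd
```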
By leveraging the re module and its re.sub() function, you can perform complex search and replace operations in Python, enhancing your text processing capabilities.