Python Regex: Replace a Pattern in a String Using re.sub()

Regular expressions (regex) are powerful tools for pattern matching and text manipulation in Python. They enable developers to search, match, and replace text efficiently. In Python, the re module provides the necessary functions to work with regex, including the re.sub() function for substitution operations.

Using re.sub() for String Replacement

The re.sub() function is used to replace occurrences of a pattern within a string. Its syntax is:

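re.sub(pattern, repl, string, count=0, flags=0)
  • pattern: The regex pattern to search for, given as a string or a compiled pattern object.
  • repl: The replacement, given either as a string or as a function that receives each match object and returns the replacement text.
  • string: The input string in which the search and replace occur.
  • count: The maximum number of replacements. The default value is 0, which means replace all occurrences.
  • flags: Optional regex flags, such as re.IGNORECASE, that modify how the pattern is matched.

re.sub() returns a new string and leaves the input string untouched. If the pattern does not match anywhere, the result is equal to the input.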
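The effect of count and flags is easiest to see side by side. The following is a minimal sketch; the sample text and variable names are made up for illustration.

```python
import re

text = "Cat cat CAT dog"

# count=0 (the default) replaces every match.
all_replaced = re.sub(r"cat", "bird", text, flags=re.IGNORECASE)
print(all_replaced)    # bird bird bird dog

# count=2 stops after the first two matches.
first_two = re.sub(r"cat", "bird", text, count=2, flags=re.IGNORECASE)
print(first_two)       # bird bird CAT dog

# Without re.IGNORECASE, only the exact lowercase "cat" matches.
case_sensitive = re.sub(r"cat", "bird", text)
print(case_sensitive)  # Cat bird CAT dog
```

Passing count and flags by keyword, as above, avoids accidentally supplying a flag in the count position.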

Example: Replacing Email Usernames

Suppose you have an email address and want to replace the username part with “abc”:

import re

email = "[email protected]"
new_email = re.sub(r"[a-z]*@", "abc@", email)
print(new_email)

Output:

abc@gmail.com

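In this example, the pattern [a-z]*@ matches zero or more lowercase letters immediately followed by the “@” symbol, and re.sub() replaces that match with “abc@”. Because the character class covers only the letters a to z, a username containing digits, dots, or uppercase letters would be only partly replaced.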
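To illustrate how the choice of character class affects the result, the sketch below compares [a-z]*@ with a broader pattern on a username that contains a dot and digits. The address and the broader pattern [^@\s]+@ are assumptions chosen for this illustration, not part of the original example.

```python
import re

# Hypothetical address with a dot and digits in the username.
address = "john.doe99@example.com"

# [a-z]* cannot cross the digits, so only the "@" itself is matched
# and the original username is left in place.
narrow = re.sub(r"[a-z]*@", "abc@", address)
print(narrow)  # john.doe99abc@example.com

# [^@\s]+ matches every character that is neither "@" nor whitespace,
# so the whole username is replaced.
broad = re.sub(r"[^@\s]+@", "abc@", address)
print(broad)   # abc@example.com
```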

Replacing Multiple Patterns

To replace several patterns with the same text in one call, you can combine them with the | operator, which denotes alternation (a logical OR between sub-patterns).

Example: Replacing Hyphens and Spaces with Commas

import re

text = "Joe-Kim Ema Max Aby Liza"
new_text = re.sub(r"(\s)|(-)", ", ", text)
print(new_text)

Output:

Joe, Kim, Ema, Max, Aby, Liza

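Here, the pattern (\s)|(-) matches either a whitespace character (\s) or a hyphen (-). The re.sub() function replaces each match with “, ”. The capturing parentheses are not required for this replacement, since the replacement text does not refer to the groups.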
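When every alternative is a single character, a character class is a common equivalent of the | form. The sketch below assumes a variant of the input with a doubled space, to show how adding + collapses a run of separators into one replacement.

```python
import re

# Same idea as above, but with two spaces between "Ema" and "Max".
text = "Joe-Kim Ema  Max"

# [\s-] matches one whitespace character or one hyphen per match,
# so the doubled space produces two separators.
one_per_char = re.sub(r"[\s-]", ", ", text)
print(one_per_char)  # Joe, Kim, Ema, , Max

# [\s-]+ treats a run of separators as a single match.
collapsed = re.sub(r"[\s-]+", ", ", text)
print(collapsed)     # Joe, Kim, Ema, Max
```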

Replacing Multiple Patterns with Different Replacements

If you need to replace different patterns with distinct replacements, you can define a function to determine the replacement based on the matched pattern.

Example: Converting Case Based on Pattern

import re

def convert_case(match_obj):
    # Group 1 captured a run of uppercase letters: return it lowercased.
    if match_obj.group(1) is not None:
        return match_obj.group(1).lower()
    # Group 2 captured a run of lowercase letters: return it uppercased.
    if match_obj.group(2) is not None:
        return match_obj.group(2).upper()

text = "jOE kIM mAx ABY lIzA"
new_text = re.sub(r"([A-Z]+)|([a-z]+)", convert_case, text)
print(new_text)

Output:

Joe Kim MaX aby LiZa

In this example, the convert_case function checks which group matched:

  • If group(1) (uppercase letters) matched, it converts the text to lowercase.
  • If group(2) (lowercase letters) matched, it converts the text to uppercase.

The re.sub() function calls convert_case once per match and substitutes the returned text. Each match is a run of same-case letters, not a whole word, so “jOE” is handled as two matches, “j” and “OE”, and becomes “Joe”.
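Another way to give each pattern its own replacement is to look the matched text up in a dictionary. This is a sketch of that alternative, not the approach used in the example above; the mapping, the sample sentence, and the helper name lookup are hypothetical.

```python
import re

# Hypothetical mapping from literal text to its replacement.
replacements = {"&": "and", "%": " percent", "@": "at"}

# Build one alternation from the keys; re.escape() makes sure each key
# is treated as literal text instead of regex syntax.
pattern = re.compile("|".join(re.escape(key) for key in replacements))

def lookup(match_obj):
    # group(0) is the full matched text, which is always a dictionary key.
    return replacements[match_obj.group(0)]

text = "Tom & Ann scored 90% @ home"
result = pattern.sub(lookup, text)
print(result)  # Tom and Ann scored 90 percent at home
```

This keeps the replacement rules in data, so adding a rule means adding a dictionary entry instead of another capture group and another branch.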

Important Considerations

  • Regex vs. String Methods: The str.replace() method replaces literal substrings only and does not support regex patterns. Use it for fixed text, and use re.sub() when the replacement depends on a pattern.
  • Regex Flags: Flags such as re.IGNORECASE (case-insensitive matching) and re.MULTILINE (^ and $ match at each line) modify how the pattern is matched.
  • Escaping Special Characters: Characters such as ., $, (, ), and * have special meaning in a pattern. Escape them with a backslash, or pass literal text through re.escape(), to match them as written.
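The escaping point can be seen with a search string that happens to contain regex metacharacters. The following is a minimal sketch with made-up sample text.

```python
import re

text = "Price: $5.00 (was $6.00)"
literal = "$5.00"

# Unescaped, "$" is an end-of-string anchor, so the pattern can never
# match and the text comes back unchanged.
unescaped = re.sub(literal, "$4.50", text)
print(unescaped)  # Price: $5.00 (was $6.00)

# re.escape() turns the literal into a pattern that matches it exactly.
escaped = re.sub(re.escape(literal), "$4.50", text)
print(escaped)    # Price: $4.50 (was $6.00)
```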

By leveraging the re module and its re.sub() function, you can perform complex search and replace operations in Python, enhancing your text processing capabilities.
