Regular expressions (RegEx) are patterns that describe sets of strings. They are used to search for, match, split, and replace text. Python's standard-library re module implements them.
To use regular expressions in Python, you need to import the re module:
import re
Metacharacters are characters with special meanings in regular expressions:
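• .: Matches any character except a newline (with the re.DOTALL flag, it matches a newline too).
• ^: Matches the start of the string (and, with the re.MULTILINE flag, the start of each line).
• $: Matches the end of the string or just before a newline at the end of the string (and, with re.MULTILINE, the end of each line).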
• *: Matches zero or more occurrences of the preceding pattern.
• +: Matches one or more occurrences of the preceding pattern.
• ?: Matches zero or one occurrence of the preceding pattern.
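• {}: Matches a specific number of occurrences of the preceding pattern: {m} means exactly m, and {m,n} means from m to n.
• []: Matches any one of the characters inside the brackets. A ^ right after the opening bracket negates the set, so [^0-9] matches any non-digit.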
• |: Matches either the pattern before or the pattern after the pipe.
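A quick way to see the metacharacters in action is to run each one through re.findall() or re.search(). The sample strings below are invented for illustration; the commented results are what these calls return in Python 3.

```python
import re

# '.' matches any single character except a newline
dots = re.findall(r"c.t", "cat cut c\nt")                    # ['cat', 'cut']

# '^' and '$' anchor the pattern to the start and end of the string
at_start = re.search(r"^Hello", "Hello world") is not None   # True
not_start = re.search(r"^world", "Hello world") is None      # True
at_end = re.search(r"world$", "Hello world") is not None     # True

# '*' allows zero or more repetitions, '+' requires at least one
star = re.findall(r"ab*", "a ab abbb")                       # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")                       # ['ab', 'abbb']

# '?' makes the preceding character optional
spellings = re.findall(r"colou?r", "color colour")           # ['color', 'colour']

# '{3}' requires exactly three repetitions
three = re.findall(r"[0-9]{3}", "12 345 6789")               # ['345', '678']

# '[]' matches one character from the set
vowels = re.findall(r"[aeiou]", "regex")                     # ['e', 'e']

# '|' matches either alternative
pets = re.findall(r"cat|dog", "cat, dog, bird")              # ['cat', 'dog']
```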
Special sequences are shorthand representations for common patterns:
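• \d: Matches any decimal digit. For str patterns this includes Unicode digits; it is equivalent to [0-9] only with the re.ASCII flag.
• \D: Matches any character that is not a decimal digit (the opposite of \d; equivalent to [^0-9] with re.ASCII).
• \s: Matches any whitespace character (equivalent to [ \t\n\r\f\v] with re.ASCII; other Unicode whitespace also matches by default).
• \S: Matches any non-whitespace character (the opposite of \s; equivalent to [^ \t\n\r\f\v] with re.ASCII).
• \w: Matches any word character: letters, digits, and the underscore (equivalent to [a-zA-Z0-9_] with re.ASCII; Unicode letters and digits also match by default).
• \W: Matches any non-word character (the opposite of \w; equivalent to [^a-zA-Z0-9_] with re.ASCII).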
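• \A: Matches only at the start of the string, even when re.MULTILINE is set.
• \b: Matches the empty string at a word boundary: between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string.
• \B: Matches the empty string wherever \b does not match, for example between two letters inside a word.
• \Z: Matches only at the very end of the string; unlike $, it does not match before a trailing newline.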
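The following sketch tries each special sequence on a made-up string. It also shows the re.MULTILINE and re.ASCII flags, which change how ^, \A, $, \Z, and \d behave; the flag names are real re module constants, while the sample data is hypothetical.

```python
import re

s = "Order 66 shipped\tto Room_12 on 2024-05-01"

digits = re.findall(r"\d+", s)   # ['66', '12', '2024', '05', '01']
words = re.findall(r"\w+", s)    # '_' is a word character, so 'Room_12' stays whole
spaces = re.findall(r"\s", s)    # five spaces and one tab

# \b requires a word boundary; \B requires the absence of one
whole = re.findall(r"\bcat\b", "cat concat cats cat.")   # ['cat', 'cat']
inner = re.findall(r"\Bcat", "cat concat cats")          # ['cat'] (from 'concat')

# With re.MULTILINE, ^ matches after every newline, but \A still means "start of string"
line_starts = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)    # ['one', 'two']
string_start = re.findall(r"\A\w+", "one\ntwo", re.MULTILINE)  # ['one']

# $ also matches just before a trailing newline; \Z does not
dollar = re.search(r"end$", "the end\n") is not None   # True
big_z = re.search(r"end\Z", "the end\n") is None        # True

# For str patterns, \d matches Unicode digits unless re.ASCII is given
# (U+0663 is ARABIC-INDIC DIGIT THREE)
unicode_digits = re.findall(r"\d", "3\u0663")             # ['3', '\u0663']
ascii_digits = re.findall(r"\d", "3\u0663", re.ASCII)     # ['3']
```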
Here are some key functions provided by the re module:
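re.compile(pattern, flags=0)
• Compiles a regular expression pattern into a Pattern object whose methods (match(), search(), findall(), and so on) can be called directly, which is more efficient when the same pattern is used several times.
pattern = re.compile(r'\d+')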
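A minimal sketch of reusing a compiled pattern; the variable names are illustrative. Flags such as re.IGNORECASE are passed once to re.compile() and then apply to every call on the resulting Pattern object.

```python
import re

# Compile once, then reuse the Pattern object's methods
number = re.compile(r"\d+")
first = number.search("abc123def456").group()   # '123'
every = number.findall("abc123def456")          # ['123', '456']

# Flags are fixed when the pattern is compiled
word = re.compile(r"python", re.IGNORECASE)
hits = word.findall("Python, PYTHON, python")   # ['Python', 'PYTHON', 'python']
```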
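re.match(pattern, string, flags=0)
• Tries to match the pattern only at the beginning of the string.
• Returns a Match object, or None if the string does not start with a match.
result = re.match(r'\d+', '123abc')  # result.group() == '123'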
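The Match object returned on success carries more than the matched text. This sketch, using an invented e-mail-like string, shows group(), groups(), and span(); the parentheses in the pattern create numbered capturing groups.

```python
import re

m = re.match(r"(\w+)@(\w+)\.com", "alice@example.com is the contact")
if m:
    print(m.group())    # alice@example.com  (the whole match)
    print(m.group(1))   # alice              (the first group)
    print(m.groups())   # ('alice', 'example')
    print(m.span())     # (0, 17)

# The pattern does not match at position 0, so match() returns None
missing = re.match(r"\d+", "abc123")
```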
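re.search(pattern, string, flags=0)
• Scans the whole string and stops at the first location where the pattern matches.
• Returns a Match object, or None if the pattern matches nowhere.
result = re.search(r'\d+', 'abc123def')  # result.group() == '123'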
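The difference between search() and match() is easiest to see on the same input. A short sketch with invented sample strings:

```python
import re

text = "abc123def"
found = re.search(r"\d+", text)     # scans forward to the first match
anchored = re.match(r"\d+", text)   # only tries position 0

print(found.group(), found.start())  # 123 3
print(anchored)                      # None

# Only the first match is returned, even when later ones exist
first_only = re.search(r"\d+", "a1b22c333").group()   # '1'
```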
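re.fullmatch(pattern, string, flags=0)
• Succeeds only if the entire string matches the pattern.
• Returns a Match object, or None.
result = re.fullmatch(r'\d+', '123')  # matches; re.fullmatch(r'\d+', '123abc') returns None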
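fullmatch() is handy for validating input. The helper below, is_all_digits, is a hypothetical name used only for illustration; it also contrasts fullmatch() with match(), which is satisfied by a matching prefix.

```python
import re

def is_all_digits(s: str) -> bool:
    # Hypothetical helper: True only if the whole string is one or more digits
    return re.fullmatch(r"\d+", s) is not None

print(is_all_digits("123"))      # True
print(is_all_digits("123abc"))   # False
print(is_all_digits(""))         # False: \d+ needs at least one digit

# match() only needs a matching prefix, so it accepts '123abc'
print(re.match(r"\d+", "123abc") is not None)   # True
```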
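re.split(pattern, string, maxsplit=0, flags=0)
• Splits the string at each occurrence of the pattern and returns a list of the pieces.
• If maxsplit is nonzero, at most maxsplit splits occur and the remainder of the string is returned as the final element.
result = re.split(r'\W+', 'Hello, world! Python is awesome.')  # ['Hello', 'world', 'Python', 'is', 'awesome', '']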
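This sketch shows the maxsplit parameter in use, plus a related behavior: when the pattern contains a capturing group, the separators are kept in the returned list. The sample strings are invented.

```python
import re

text = "Hello, world! Python is awesome."

# maxsplit=2 stops after two splits; the remainder stays in one piece
limited = re.split(r"\W+", text, maxsplit=2)
# ['Hello', 'world', 'Python is awesome.']

# A capturing group in the pattern keeps the separators in the result
with_seps = re.split(r"(,\s*)", "a, b,c")
# ['a', ', ', 'b', ',', 'c']
```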
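re.findall(pattern, string, flags=0)
• Finds all non-overlapping occurrences of the pattern, scanning the string left to right.
• Returns a list of strings; if the pattern contains capturing groups, the list holds the group contents instead (tuples when there is more than one group).
result = re.findall(r'\d+', 'abc123def456')  # ['123', '456']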
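What findall() returns depends on how many capturing groups the pattern has. A small sketch on an invented key=value string:

```python
import re

text = "a=1, b=22"

whole = re.findall(r"\w+=\d+", text)       # no groups: ['a=1', 'b=22']
keys = re.findall(r"(\w+)=\d+", text)      # one group: ['a', 'b']
pairs = re.findall(r"(\w+)=(\d+)", text)   # two groups: [('a', '1'), ('b', '22')]
as_dict = dict(pairs)                      # {'a': '1', 'b': '22'}
```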
re.finditer(pattern, string, flags=0)
• Returns an iterator yielding Match objects for all non-overlapping matches.
for match in re.finditer(r'\d+', 'abc123def456'):
    print(match.group())  # prints 123, then 456
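Because finditer() yields Match objects rather than plain strings, it also exposes where each match was found. A minimal sketch:

```python
import re

text = "abc123def456"
for m in re.finditer(r"\d+", text):
    # Each item is a Match object, so positions are available too
    print(m.group(), m.start(), m.end())
# 123 3 6
# 456 9 12

spans = [m.span() for m in re.finditer(r"\d+", text)]   # [(3, 6), (9, 12)]
```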
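re.sub(pattern, repl, string, count=0, flags=0)
• Replaces each non-overlapping occurrence of the pattern with repl, which can be a string or a function.
• If count is nonzero, at most count replacements are made.
result = re.sub(r'\d+', 'NUMBER', 'abc123def456')  # 'abcNUMBERdefNUMBER'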
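Beyond a fixed replacement string, sub() accepts group references such as \1 in repl, or a function that is called with each Match object. A sketch with invented inputs:

```python
import re

text = "abc123def456"

# count=1 replaces only the first occurrence
once = re.sub(r"\d+", "NUMBER", text, count=1)                      # 'abcNUMBERdef456'

# \1 and \2 in the replacement refer to captured groups
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")             # 'host@user'

# repl can be a function that receives each Match object
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)   # 'abc246def912'
```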
re.subn(pattern, repl, string, count=0, flags=0)
• Works like sub(), but returns a tuple (new_string, number_of_substitutions_made).
result = re.subn(r'\d+', 'NUMBER', 'abc123def456')  # ('abcNUMBERdefNUMBER', 2)
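The count returned by subn() is a convenient way to check whether any replacement happened. A minimal sketch:

```python
import re

new_text, n = re.subn(r"\d+", "NUMBER", "abc123def456")
print(new_text, n)   # abcNUMBERdefNUMBER 2

# A count of zero means the string came back unchanged
unchanged, changes = re.subn(r"\d+", "NUMBER", "no digits here")
if changes == 0:
    print("nothing to replace")
```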
re.escape(pattern)
• Escapes special characters in a string so that it can be used as a literal inside a pattern.
escaped_pattern = re.escape('Hello. How are you?')  # '.' and '?' become backslash-escaped
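A common use of escape() is searching for user-supplied text that may contain metacharacters. In this sketch, user_input is a hypothetical stand-in for such text:

```python
import re

user_input = "1+1=2?"   # hypothetical user-supplied search text
text = "Is 1+1=2? Yes."

# Unescaped, '+' and '?' act as quantifiers, so the literal text is not found
raw = re.search(user_input, text)              # None

# Escaped, every character is matched literally
safe = re.search(re.escape(user_input), text)
print(safe.group())                            # 1+1=2?
```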
re.purge()
• Clears the regular expression cache that the re module keeps for recently compiled patterns.
re.purge()