Python Regular Expressions - cheat sheet

Many code examples + useful tips

Regular expressions are an extremely useful tool, and like many developers, I use them a lot when working with text. Since I always forget the syntax of regular expressions, I thought creating a simple cheat sheet might help me and maybe others as well :)

The most useful methods of Python’s re module and the differences between them

Reminder: the re module implements regular expression functionality in Python.

  • re.match: matches a regex pattern at the beginning of a string (returns: re.Match, or None if there is no match)

  • re.fullmatch: matches a regex pattern against the whole string (returns: re.Match, or None if there is no match)

  • re.search: scans a string for the first location where a regex pattern matches (returns: re.Match, or None if there is no match)

  • re.sub: replaces all (non-overlapping) occurrences of a regex pattern in a string, finding them the same way re.search does (returns: a new string)

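  • re.findall: finds all (non-overlapping) occurrences of a regex pattern in a string, scanning the same way re.search does (returns: a list of strings; if the pattern contains groups, a list of the captured groups instead, as tuples when there is more than one group)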

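  • re.split: splits a string at each occurrence of a regex pattern (returns: a list of strings). ❗️ Note the empty strings in the result: you get one whenever the pattern matches at the very start or end of the string, or when two matches are adjacent ❗️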
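
The six functions above can be compared side by side on one sample string. The sample text and patterns are made up for illustration. The printed match objects in the comments assume Python 3.7 or later, where they display as <re.Match object; ...>.

```python
import re

text = "cat bat rat"

# re.match: the pattern must match at the beginning of the string
print(re.match(r"cat", text))  # <re.Match object; span=(0, 3), match='cat'>
print(re.match(r"bat", text))  # None, because "bat" is not at the start

# re.fullmatch: the pattern must match the entire string
print(re.fullmatch(r"cat", text))      # None
print(re.fullmatch(r"[a-z ]+", text))  # <re.Match object; span=(0, 11), match='cat bat rat'>

# re.search: returns the first match found anywhere in the string
print(re.search(r"bat", text))  # <re.Match object; span=(4, 7), match='bat'>

# re.sub: replaces every match and returns a new string
print(re.sub(r"[cbr]at", "dog", text))  # dog dog dog

# re.findall: all non-overlapping matches as a list of strings...
print(re.findall(r"\w+at", text))  # ['cat', 'bat', 'rat']
# ...but with a group in the pattern, only the group's text is returned
print(re.findall(r"(\w)at", text))  # ['c', 'b', 'r']

# re.split: a match at the start or end, or two back-to-back matches, yields empty strings
print(re.split(r"\s*,\s*", ",a, b,,c,"))  # ['', 'a', 'b', '', 'c', '']
```

Because match, fullmatch and search return None on failure, check the result before calling methods on it.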

Special meta-characters
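
The selection below is an assumption: a few of the most common special meta-characters (., ^, $, | and \), each with a small made-up example.

```python
import re

# "." matches any single character except a newline
print(re.findall(r"h.t", "hat hot h t h\nt"))  # ['hat', 'hot', 'h t']

# "^" anchors the match at the start of the string, "$" at the end
print(bool(re.search(r"^abc", "abcdef")))  # True
print(bool(re.search(r"^def", "abcdef")))  # False
print(bool(re.search(r"def$", "abcdef")))  # True

# "|" matches either the expression on its left or the one on its right
print(re.findall(r"cat|dog", "cat, dog, bird"))  # ['cat', 'dog']

# "\" escapes a meta-character so that it is matched literally
print(re.findall(r"\.", "a.b.c"))  # ['.', '.']
print(re.findall(r".", "a.b"))     # ['a', '.', 'b']
```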

Repetition meta-characters

Repetition meta-characters control how many repetitions of the preceding regular expression the re module matches. By default they are greedy, meaning that the re module matches as many repetitions as possible.
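
The main repetition meta-characters can be tried on a made-up string. The last two lines go slightly beyond the text above. Appending ? to a repetition makes it non-greedy (lazy), and the contrast shows what "greedy" means.

```python
import re

text = "aaab"

# "*": zero or more repetitions
print(re.match(r"a*", text).group())        # aaa
print(re.match(r"x*", text).group() == "")  # True: zero repetitions is still a match

# "+": one or more repetitions
print(re.match(r"a+", text).group())  # aaa
print(re.match(r"x+", text))          # None

# "?": zero or one repetition
print(re.match(r"a?", text).group())  # a

# "{m}" and "{m,n}": exactly m, or between m and n repetitions
print(re.match(r"a{2}", text).group())    # aa
print(re.match(r"a{1,2}", text).group())  # aa (greedy: takes the maximum, 2)

# Greedy ".*" grabs as much as possible; the "?" suffix makes it take as little as possible
html = "<b>bold</b>"
print(re.match(r"<.*>", html).group())   # <b>bold</b>
print(re.match(r"<.*?>", html).group())  # <b>
```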

Useful character classes
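
The classes shown below are an assumed selection: the shorthand classes \d, \w and \s, their uppercase negations, and custom sets in square brackets. For str patterns Python matches Unicode by default, so \d and \w also accept non-ASCII digits and letters.

```python
import re

# \d: a digit, \D: anything but a digit
print(re.findall(r"\d+", "room 42, floor 7"))  # ['42', '7']
print(re.findall(r"\D+", "ab12cd"))            # ['ab', 'cd']

# \w: a "word" character (letters, digits and underscore), \W: anything else
print(re.findall(r"\w+", "Bob_Smith, hi!"))  # ['Bob_Smith', 'hi']
print(re.findall(r"\W+", "Bob_Smith, hi!"))  # [', ', '!']

# \s: whitespace (space, tab, newline, ...), \S: anything else
print(re.split(r"\s+", "a  b\tc\nd"))   # ['a', 'b', 'c', 'd']
print(re.findall(r"\S+", "hi  there"))  # ['hi', 'there']

# [...]: any one character from the set; [^...]: any character NOT in the set
print(re.findall(r"[aeiou]", "regex"))   # ['e', 'e']
print(re.findall(r"[^0-9]+", "ab12cd"))  # ['ab', 'cd']
```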

Extremely useful tip: grouping and group extraction

Grouping allows you not only to match text sequences inside strings but also to extract the sub-sequences corresponding to the groups you define in a pattern.

You can define groups in a pattern with round brackets, (), and extract the groups from a match by calling the group() method on the re.Match object.

For example, let’s say we want to match an email address in a text, but we also want to easily extract the username, the domain, and the extension from it. So if we get the text “abc@gmail.com”, we want (1) to detect that this is an email address (according to a pattern) and (2) to detect that the username in this email is “abc”, the domain name is “gmail”, and the extension is “com”. First, let’s define a simple pattern that will detect this email. Note that I will use a simplified pattern that assumes each email component contains only alphanumeric characters. That is not true in real life, but it will work for our grouping example.
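
Here is a sketch of such a pattern. It assumes that "alphanumeric" means ASCII letters and digits ([a-zA-Z0-9]). The dot before the extension has to be escaped as \., otherwise it would match any character.

```python
import re

# Simplified email pattern: each part is one or more ASCII letters/digits (an assumption)
pattern = r"[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z0-9]+"

print(re.fullmatch(pattern, "abc@gmail.com") is not None)  # True
print(re.fullmatch(pattern, "abc.gmail.com") is not None)  # False: no "@"
print(re.fullmatch(pattern, "abc@gmail") is not None)      # False: no extension
```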

Now, I will add groups to the pattern. Technically, I just add round brackets around the different parts of my pattern (username, domain, and extension):
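
With the brackets added, the same simplified pattern (still assuming ASCII letters and digits only) could look like this. The compiled pattern's groups attribute confirms that it now defines three groups.

```python
import re

# The same simplified pattern, with one group per component:
# (username) @ (domain) \. (extension)
pattern = r"([a-zA-Z0-9]+)@([a-zA-Z0-9]+)\.([a-zA-Z0-9]+)"

# Brackets do not change what the pattern matches; they only mark groups
print(re.compile(pattern).groups)                          # 3
print(re.fullmatch(pattern, "abc@gmail.com") is not None)  # True
```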

We are now ready to match our pattern:
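
Matching could look like this. Using re.search is one choice; re.match or re.fullmatch would behave the same for this particular text, which is assumed to be just the email address.

```python
import re

pattern = r"([a-zA-Z0-9]+)@([a-zA-Z0-9]+)\.([a-zA-Z0-9]+)"
text = "abc@gmail.com"

match = re.search(pattern, text)
# re.search returns None when nothing matches, so check before using the result
if match:
    print(match)  # <re.Match object; span=(0, 13), match='abc@gmail.com'>
```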

And here comes the magic of groups! We can not only get the matched text but also extract each group by its index, in the order the groups are defined in the pattern, just like this:
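
Continuing with the same assumed pattern, each group can be pulled out by index. Groups are numbered from 1, in the order of their opening brackets.

```python
import re

pattern = r"([a-zA-Z0-9]+)@([a-zA-Z0-9]+)\.([a-zA-Z0-9]+)"
match = re.search(pattern, "abc@gmail.com")

# Group numbers follow the order of the opening brackets in the pattern
username = match.group(1)
domain = match.group(2)
extension = match.group(3)
print(username, domain, extension)  # abc gmail com
```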

Nice, huh?

By the way, group(0) returns a string representing the whole match. In our case: “abc@gmail.com”.
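
This can be checked directly. The sketch below also shows two related re.Match calls that the text does not mention: groups(), and group() with several indices.

```python
import re

pattern = r"([a-zA-Z0-9]+)@([a-zA-Z0-9]+)\.([a-zA-Z0-9]+)"
match = re.search(pattern, "abc@gmail.com")

print(match.group(0))     # abc@gmail.com
print(match.group())      # abc@gmail.com (group() without arguments is group(0))
print(match.groups())     # ('abc', 'gmail', 'com'): all groups from 1 upwards
print(match.group(1, 3))  # ('abc', 'com'): several indices return a tuple
```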

That’s all for now, folks. Happy programming!