
Regex for email validation

Let us explore how to check whether an email address is correctly formatted. Humans can usually tell at a glance, but checking it programmatically is a bit more involved.

  • ❌ this@is.not@a.valid@email? is an invalid email address.
  • βœ… name@domain.es is a valid email address.

To solve this problem, we will use regular expressions, which let us search for patterns in text. For example, the following code finds every run of digits (\d+) in a string. As the output shows, 2 and 3 are detected.

import re
r1 = re.findall(r"\d+", "I have 2 cats and 3 dogs")
print(r1) # ['2', '3']

We can also replace text. In this case, we replace 2 and 3 with two and three, respectively.

r2 = re.sub(r"2", "two", "I have 2 cats and 3 dogs")
r2 = re.sub(r"3", "three", r2)
print(r2)
# I have two cats and three dogs
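
The two substitutions above can also be done in a single pass, because re.sub accepts a function as the replacement. The following is a minimal sketch; the WORDS table and the spell_out helper are hypothetical names introduced only for this illustration.

```python
import re

# Hypothetical lookup table from digit strings to words (illustrative only).
WORDS = {"2": "two", "3": "three"}

def spell_out(match):
    # match.group(0) is the matched run of digits.
    # Numbers missing from the table are left unchanged.
    return WORDS.get(match.group(0), match.group(0))

text = "I have 2 cats and 3 dogs"
result = re.sub(r"\d+", spell_out, text)
print(result)  # I have two cats and three dogs
```

Numbers that are not in the table, such as 7, pass through untouched, so the table can be extended as needed.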

We can also validate that a text complies with certain rules. For example, we know that an email like username@gmail.com is composed of:

  • 👀 A username.
  • 🌐 Followed by @.
  • 🏒 A domain like gmail.
  • 🔗 A TLD such as .com.
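
To make these four parts concrete, here is a small sketch that uses named groups to pull them out of an address. The PARTS pattern and the split_email helper are assumptions made for this illustration; the pattern is deliberately simplified and is not meant as a complete validator.

```python
import re

# Named groups label each part of the address.
# Simplified on purpose: anything except "@" counts as a username character.
PARTS = re.compile(
    r"(?P<user>[^@]+)@(?P<domain>[^@.]+)(?P<tld>(?:\.[^@.]+)+)"
)

def split_email(email):
    # Return a dict with the parts, or None if the shape does not fit.
    m = PARTS.fullmatch(email)
    return m.groupdict() if m else None

print(split_email("username@gmail.com"))
# {'user': 'username', 'domain': 'gmail', 'tld': '.com'}
```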

We can write an expression that tells us whether an email address is valid as follows:

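def validate_email(email):
    # Username: letters, digits, and the listed special characters
    user = r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    at = r"@"
    # Domain label: letters, digits, and hyphens
    domain = r"[a-zA-Z0-9-]+"
    # Zero or more dot-prefixed labels, e.g. .com or .com.ar
    tld = r"(?:\.[a-zA-Z0-9-]+)*"

    # Anchor the pattern at the start and end of the string
    email_regex = rf"^{user}{at}{domain}{tld}$"

    return re.match(email_regex, email) is not None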

print(validate_email("juan@gmail.com"))    # True
print(validate_email("juan@gmail.com.ar")) # True
print(validate_email("653fsaasd"))         # False
print(validate_email("juan@gma@.dom.com")) # False
print(validate_email("@@@@"))              # False
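
Two edge cases are worth trying as well. The sketch below repeats the same pattern and compares re.match with re.fullmatch; validate_email_strict is a hypothetical variant added only for this comparison. In Python, $ also matches just before a trailing newline, so the re.match version accepts an address followed by "\n", while re.fullmatch does not.

```python
import re

USER = r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
DOMAIN = r"[a-zA-Z0-9-]+"
TLD = r"(?:\.[a-zA-Z0-9-]+)*"
EMAIL_REGEX = rf"^{USER}@{DOMAIN}{TLD}$"

def validate_email(email):
    # Same check as in the text, using re.match.
    return re.match(EMAIL_REGEX, email) is not None

def validate_email_strict(email):
    # Hypothetical variant: re.fullmatch must consume the entire string.
    return re.fullmatch(EMAIL_REGEX, email) is not None

print(validate_email("juan@gmail"))               # True: the TLD part is optional
print(validate_email("juan@gmail.com\n"))         # True: $ matches before a final newline
print(validate_email_strict("juan@gmail.com\n"))  # False
print(validate_email_strict("juan@gmail.com"))    # True
```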

Let us break it down:

  • 👀 user: Letters (lowercase and uppercase), digits, and a set of special characters are allowed in the username. The a-zA-Z0-9 part of the character class covers letters and digits, the remaining characters such as ! or $ are allowed literally, and the trailing + means one or more such characters.
  • 🌐 at: A literal @ must separate the username from the domain.
  • 🏒 domain: One or more letters, digits, or hyphens.
  • 🔗 tld: Each part must start with a . and may repeat, so both .com and .co.uk are valid. Because the group ends in *, it can also match zero times, so an address with no dot after the @ (such as juan@localhost) is accepted; replace * with + to require at least one part.
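
One way to get a feel for each piece is to test it on its own. The sketch below copies the three sub-patterns from validate_email and checks them individually; the matches helper is a hypothetical convenience function, not part of the original code.

```python
import re

user = r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
domain = r"[a-zA-Z0-9-]+"
tld = r"(?:\.[a-zA-Z0-9-]+)*"

def matches(pattern, text):
    # True only if the pattern matches the whole text.
    return re.fullmatch(pattern, text) is not None

print(matches(user, "juan.perez+news"))  # True
print(matches(user, "juan perez"))       # False: spaces are not allowed
print(matches(domain, "gmail"))          # True
print(matches(domain, "gma.il"))         # False: dots belong to the tld part
print(matches(tld, ".co.uk"))            # True
print(matches(tld, "co.uk"))             # False: each part must start with a dot
print(matches(tld, ""))                  # True: * allows zero parts
```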

✏️ Exercises:

  • Modify the function to allow only .com domains.
  • Modify the function to disallow @google domains.