Regex for email validation
Let us explore how to validate if an email address is correctly formatted. Although humans can often identify this at a glance, validating it programmatically is a bit more complex.
- this@is.not@a.valid@email? is an invalid email address.
- name@domain.es is a valid email address.
To solve this problem, we will use regular expressions. These allow us to search for patterns in text. For example, the following code detects numbers in a string. As you can see, 2 and 3 are detected.
import re
r1 = re.findall(r"\d+", "I have 2 cats and 3 dogs")
print(r1) # ['2', '3']
We can also replace text. In this case, we replace 2 and 3 with two and three, respectively.
r2 = re.sub(r"2", "two", "I have 2 cats and 3 dogs")
r2 = re.sub(r"3", "three", r2)
print(r2)
# I have two cats and three dogs
We can also validate that a text complies with certain rules. For example, we know that an email like username@gmail.com is composed of:
- A username.
- Followed by @.
- A domain like gmail.
- A TLD such as .com.
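To make these four parts concrete, here is a minimal sketch that splits an address using named groups. The pattern and the split_email helper are assumptions made up for this illustration, and they are deliberately looser than a real validator.

```python
import re

# Hypothetical pattern for illustration only: one named group per part.
# It checks the overall shape (user @ domain .tld), not the allowed characters.
PARTS = re.compile(r"(?P<user>[^@]+)@(?P<domain>[^@.]+)(?P<tld>(?:\.[^@.]+)+)")

def split_email(email):
    """Return the parts of an email as a dict, or None if the shape does not fit."""
    match = PARTS.fullmatch(email)
    return match.groupdict() if match else None

print(split_email("username@gmail.com"))
# {'user': 'username', 'domain': 'gmail', 'tld': '.com'}
print(split_email("653fsaasd"))
# None
```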
We can write an expression that tells us whether an email address is valid as follows:
def validate_email(email):
    user = r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    at = r"@"
    domain = r"[a-zA-Z0-9-]+"
    tld = r"(?:\.[a-zA-Z0-9-]+)*"
    email_regex = rf"^{user}{at}{domain}{tld}$"
    return re.match(email_regex, email) is not None
print(validate_email("juan@gmail.com")) # True
print(validate_email("juan@gmail.com.ar")) # True
print(validate_email("653fsaasd")) # False
print(validate_email("juan@gma@.dom.com")) # False
print(validate_email("@@@@")) # False
Let us break it down:
- user: Digits, lowercase and uppercase letters, and some special characters are allowed in the username. For example, [a-zA-Z0-9]+ matches digits and lowercase and uppercase letters, and the + means we are looking for one or more such characters. The remaining characters in the class, such as ! or $, are allowed as well.
- at: We ensure the @ is present in the correct place.
- domain: We require a domain made of digits, lowercase and uppercase letters, and hyphens.
- tld: Each part must start with a ., and several parts are allowed. For example, .com and .co.uk are valid. Because the group ends in *, zero parts also match, so juan@gmail is accepted too.
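To see what each piece accepts on its own, we can test the fragments separately. This is only a sketch: fits is a hypothetical helper that is not part of the function above, and it uses re.fullmatch so that the whole text has to match the fragment.

```python
import re

# The same fragments used inside validate_email.
user = r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
domain = r"[a-zA-Z0-9-]+"
tld = r"(?:\.[a-zA-Z0-9-]+)*"

def fits(pattern, text):
    """Hypothetical helper: True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(fits(user, "juan.perez+news"))  # True
print(fits(user, "juan perez"))       # False, a space is not in the class
print(fits(domain, "gmail"))          # True
print(fits(domain, "g_mail"))         # False, underscores are not allowed here
print(fits(tld, ".co.uk"))            # True
print(fits(tld, "com"))               # False, the leading . is missing
print(fits(tld, ""))                  # True, * also accepts zero parts
```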
Exercises:
- Modify the function to allow only .com domains.
- Modify the function to disallow @google domains.
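If you get stuck, here is one possible approach to both exercises, offered as a sketch and not as the intended solution. The function names are hypothetical, and two assumptions are made: "only .com" means the address ends in exactly .com after a single domain label, and "@google" means the label right after the @ is exactly google.

```python
import re

USER = r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
DOMAIN = r"[a-zA-Z0-9-]+"
TLD = r"(?:\.[a-zA-Z0-9-]+)*"

def validate_com_email(email):
    """Exercise 1 sketch: accept only addresses of the form user@domain.com."""
    return re.fullmatch(rf"{USER}@{DOMAIN}\.com", email) is not None

def validate_non_google_email(email):
    """Exercise 2 sketch: reject addresses whose first domain label is google."""
    # (?!...) is a negative lookahead: it fails the match if "google" followed
    # by a dot or the end of the string comes right after the @.
    pattern = rf"{USER}@(?!google(?:\.|$)){DOMAIN}{TLD}"
    return re.fullmatch(pattern, email) is not None

print(validate_com_email("juan@gmail.com"))            # True
print(validate_com_email("juan@gmail.com.ar"))         # False
print(validate_non_google_email("juan@gmail.com"))     # True
print(validate_non_google_email("juan@google.com"))    # False
```

Domains that merely start with google, such as googlemail, still pass under this reading of the exercise.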