Regex for email validation
Let us explore how to validate if an email address is correctly formatted. Although humans can often identify this at a glance, validating it programmatically is a bit more complex.
- `this@is.not@a.valid@email?` is an invalid email address.
- `name@domain.es` is a valid email address.
To solve this problem, we will use regular expressions. These allow us to search for patterns in text. For example, the following code detects numbers in a string. As you can see, `2` and `3` are detected.
import re
r1 = re.findall(r"\d+", "I have 2 cats and 3 dogs")
print(r1) # ['2', '3']
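If the position of each number matters as well, one option is `re.finditer`, which yields a match object per occurrence instead of a plain string. The following is a minimal sketch; the helper name `find_numbers_with_positions` is hypothetical and only used for illustration.

```python
import re

def find_numbers_with_positions(text):
    # Each match object exposes the matched text and its start/end offsets
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

print(find_numbers_with_positions("I have 2 cats and 3 dogs"))
# [('2', 7, 8), ('3', 18, 19)]
```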
We can also replace text. In this case, we replace `2` and `3` with `two` and `three`, respectively.
r2 = re.sub(r"2", "two", "I have 2 cats and 3 dogs")
r2 = re.sub(r"3", "three", r2)
print(r2)
# I have two cats and three dogs
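The two replacements can also be done in a single pass, since `re.sub` accepts a function as the replacement. The sketch below assumes a small lookup table; the names `WORDS` and `spell_out_numbers` are hypothetical.

```python
import re

# Hypothetical lookup table: the only numbers we know how to spell out
WORDS = {"2": "two", "3": "three"}

def spell_out_numbers(text):
    # The function receives each match; numbers missing from WORDS are left unchanged
    return re.sub(r"\d+", lambda m: WORDS.get(m.group(), m.group()), text)

print(spell_out_numbers("I have 2 cats and 3 dogs"))
# I have two cats and three dogs
```

Because the pattern matches whole runs of digits, a number such as `23` is looked up as a unit and stays untouched, rather than having its digits replaced one by one.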
We can also validate that a text complies with certain rules. For example, we know that an email like `username@gmail.com` is composed of:
- A username.
- Followed by `@`.
- A domain like `gmail`.
- A TLD such as `.com`.
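To make these four parts concrete, here is a rough sketch that splits an address using named groups. The pattern is deliberately loose (anything that is not `@` or `.` counts as a valid character), and the names `EMAIL_PARTS` and `split_email` are hypothetical.

```python
import re

# Loose pattern for illustration only: username, "@", domain, one or more ".part" pieces
EMAIL_PARTS = re.compile(
    r"(?P<username>[^@]+)@(?P<domain>[^@.]+)(?P<tld>(?:\.[^@.]+)+)"
)

def split_email(email):
    # fullmatch requires the whole string to fit the pattern
    m = EMAIL_PARTS.fullmatch(email)
    return m.groupdict() if m else None

print(split_email("username@gmail.com"))
# {'username': 'username', 'domain': 'gmail', 'tld': '.com'}
print(split_email("this@is.not@a.valid@email?"))
# None
```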
We can write an expression that tells us whether an email address is valid as follows:
def validate_email(email):
    user = r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    at = r"@"
    domain = r"[a-zA-Z0-9-]+"
    tld = r"(?:\.[a-zA-Z0-9-]+)*"
    email_regex = rf"^{user}{at}{domain}{tld}$"
    return re.match(email_regex, email) is not None

print(validate_email("juan@gmail.com")) # True
print(validate_email("juan@gmail.com.ar")) # True
print(validate_email("653fsaasd")) # False
print(validate_email("juan@gma@.dom.com")) # False
print(validate_email("@@@@")) # False
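One detail worth knowing: in Python, `$` also matches just before a newline at the very end of the string, so `validate_email("juan@gmail.com\n")` returns `True`. If that is not desired, a possible variant uses `re.fullmatch`, which must consume the entire string. This is a sketch under that assumption; `EMAIL_PATTERN` and `validate_email_full` are hypothetical names.

```python
import re

# Same pieces as validate_email above, joined into one expression without ^ and $
EMAIL_PATTERN = r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*"

def validate_email_full(email):
    # fullmatch only succeeds if the pattern covers the whole string,
    # so a trailing newline is rejected
    return re.fullmatch(EMAIL_PATTERN, email) is not None

print(validate_email_full("juan@gmail.com"))    # True
print(validate_email_full("juan@gmail.com\n"))  # False
```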
Let us break it down:
- `user`: Digits, lowercase letters, uppercase letters, and some special characters are allowed in the username. For example, `[a-zA-Z0-9]+` indicates that digits, lowercase letters, and uppercase letters are allowed. The `+` means that we are looking for one or more such characters. The rest, such as `!` or `$`, are additional characters that are also allowed.
- `at`: We ensure the `@` is present in the correct place.
- `domain`: We require a domain made up of digits, lowercase letters, uppercase letters, and hyphens.
- `tld`: Each part must start with a `.`, and several parts may follow one another. For example, `.com` and `.co.uk` are valid. Because the group ends in `*`, it may also be absent, so an address without a TLD such as `juan@gmail` is accepted as well.
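The same breakdown can be written directly into the pattern with `re.VERBOSE`, which ignores whitespace and allows `#` comments outside character classes. The following sketch is meant to behave like `validate_email`; the names `EMAIL_RE` and `validate_email_verbose` are hypothetical.

```python
import re

EMAIL_RE = re.compile(
    r"""
    ^
    [a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+   # user: one or more allowed characters
    @                                   # at: a literal at sign
    [a-zA-Z0-9-]+                       # domain: letters, digits, hyphens
    (?:\.[a-zA-Z0-9-]+)*                # tld: zero or more dot-prefixed parts
    $
    """,
    re.VERBOSE,
)

def validate_email_verbose(email):
    return EMAIL_RE.match(email) is not None

print(validate_email_verbose("juan@gmail.com.ar"))  # True
print(validate_email_verbose("juan@gma@.dom.com"))  # False
```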
Exercises:
- Modify the function to allow only `.com` domains.
- Modify the function to disallow `@google` domains.