100 Common Python Mistakes
We present a compilation of the 100 most common mistakes in Python and how to avoid them. We say it is a mistake in any of the following cases:
- 🐝 Bug: A logic flaw or vulnerability.
- 🐍 Pythonic: The code is not idiomatic to Python.
- ⚡ Speed: The code is not efficient.
- 🎯 Simplicity: The code can be simplified.
However, there is no absolute truth. Establish your own criteria based on your specific case. The most important thing is that the code works and solves your problem. In cases where the best approach is debatable, it is better to focus on what works rather than engaging in endless debates that do not solve the real problem.
Get last list item
❌ Although you can access the last item like this:
my_list = [1, 2, 3, 3, 4]
print(my_list[len(my_list)-1]) # 4
✅ It is more Pythonic to do it this way:
print(my_list[-1]) # 4
List append returns None
❌ The append method returns nothing, None. It is common to mistakenly think it returns the list with the new element, but it does not.
my_list = [1, 2, 3]
my_list = my_list.append(4)
print(my_list) # None
✅ Access the list to view it with the new item added.
my_list = [1, 2, 3]
my_list.append(4)
print(my_list) # [1, 2, 3, 4]
List reference instead of copy
❌ Be cautious with the following. When you modify one value in the list, you actually modify several. This is because the inner lists are not really copies, but references to the same object.
matrix = [[0, 0]] * 2
print(matrix) # [[0, 0], [0, 0]]
matrix[0][0] = 99
print(matrix) # [[99, 0], [99, 0]]
✅ Create your list as follows to ensure the rows are independent objects and not references to the same one.
matrix = [[0, 0], [0, 0]]
matrix[0][0] = 99
print(matrix) # [[99, 0], [0, 0]]
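When the size is not known in advance, a list comprehension is a common way to get independent rows. This is only a minimal sketch; the names rows and cols are illustrative and not part of the original example.

```python
rows, cols = 2, 2  # hypothetical dimensions chosen for illustration

# The inner list is rebuilt on every iteration, so each row is a distinct object.
matrix = [[0] * cols for _ in range(rows)]
matrix[0][0] = 99
print(matrix)  # [[99, 0], [0, 0]]
```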
Confusing append and extend
❌ Do not confuse append with extend. The following code fails because extend expects an iterable, such as a list, and 4 is not one.
x = [1, 2, 3]
x.extend(4) # TypeError: 'int' object is not iterable
print(x)
✅ If you want to add a single element, use append.
x = [1, 2, 3]
x.append(4)
print(x) # [1, 2, 3, 4]
Single value in tuples
❌ If you want to define a single-valued tuple, this does not work. The parentheses are ignored, and the value is interpreted as a string.
a = ("black")
print(type(a)) # <class 'str'>
✅ You must include a comma. This way, you will have a single-valued tuple.
a = ("black",)
print(type(a)) # <class 'tuple'>
Use dict instead of if
❌ In some cases, an if and else structure can be replaced by a dictionary.
def greeting(language):
    if language == 'en':
        return 'Hello'
    if language == 'es':
        return 'Hola'
    if language == 'fr':
        return 'Bonjour'
    else:
        return 'Invalid language'
print(greeting('es')) # Hola
✅ Use a dictionary for cleaner code.
def greeting(language):
    return {
        'en': 'Hello',
        'es': 'Hola',
        'fr': 'Bonjour'
    }.get(language, 'Invalid language')
print(greeting('es')) # Hola
Join two dictionaries
❌ If you want to join two dictionaries, the + operator is not defined.
dict1 = {'a': 1}
dict2 = {'b': 2}
print(dict1 + dict2) # TypeError: unsupported operand type(s)
✅ You can join them with |, available since Python 3.9.
dict1 = {'a': 1}
dict2 = {'b': 2}
print(dict1 | dict2) # {'a': 1, 'b': 2}
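On interpreters older than Python 3.9, where | is not defined for dictionaries, unpacking gives an equivalent result. A small sketch; the repeated key 'a' in dict2 is added here only to show which value wins.

```python
dict1 = {'a': 1}
dict2 = {'b': 2, 'a': 10}  # 'a' is repeated on purpose for illustration

# On duplicate keys, the right-most dictionary wins.
merged = {**dict1, **dict2}
print(merged)  # {'a': 10, 'b': 2}
```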
Conditions order matters
❌ Be careful with how the operators are evaluated. This condition is read as (x == 5) or 6, and since 6 is always truthy, it will always be met, which is probably not what you are looking for.
if x == 5 or 6:
    print("It's 5 or 6")
✅ What you are looking for is this.
if x == 5 or x == 6:
    print("It's 5 or 6")
Unnecessary parentheses
❌ There is no need to define two sets of parentheses to check if the age is in the range (12, 19).
age = 15
if (age > 12) and (age < 19):
    print("He is a teenager")
✅ We can rewrite it as follows, which is easier to read.
age = 15
if 12 < age < 19:
    print("He is a teenager")
Condition with boolean
❌ If we have a value that we want to use as a condition, it is common to do the following, checking if it is True to act accordingly.
value = True
if value == True:
    print("value is True")
✅ But we can simplify it to the following. The same but simpler.
value = True
if value:
    print("value is True")
Condition with list
❌ If we want to evaluate if the list is empty, we can use len() and see if it is 0.
my_list = []
if len(my_list) == 0:
    print("List is empty")
✅ However, it is not necessary. We can do the following. It is more Pythonic.
my_list = []
if not my_list:
    print("List is empty")
The ternary operator
❌ If we want to write a function that returns whether a number is even or not, we can do it like this.
def is_even(a):
    if a % 2 == 0:
        return "Even"
    else:
        return "Odd"
✅ But we can make use of the ternary operator to summarize everything in one line. Not always better, but worth considering.
def is_even(a):
    return "Even" if a % 2 == 0 else "Odd"
Order if conditions
❌ Imagine you have two conditions. One is faster to verify, and the other is slower. We will enter the if only when both conditions are met.
def check_cheap():
    print("Run check_cheap")
    return False
def check_expensive():
    print("Run check_expensive")
    return True
if check_expensive() and check_cheap():
    print("Enter")
# Run check_expensive
# Run check_cheap
✅ It is better to run check_cheap first. If it returns False, Python short-circuits and no longer runs the other check, saving us from running check_expensive.
if check_cheap() and check_expensive():
    print("Enter")
# Run check_cheap
The walrus operator
❌ If we have a result and we want to use it as a condition, we can do it as follows.
result = my_function()
if result:
    print(result)
✅ But thanks to the := operator, known as the walrus operator and available since Python 3.8, we can save a line. The name comes from its resemblance to the eyes and tusks of a walrus. It is not always better this way, but it is an interesting resource.
if result := my_function():
    print(result)
Use match instead of if
❌ If you have multiple conditions in an if where several have the same result, do not do this.
def permissions(role):
    if role == 1:
        return "all"
    if role == 2:
        return "all"
    if role == 3:
        return "some"
    elif role == 4:
        return "none"
    else:
        return "none"
print(permissions(1)) # all
print(permissions(2)) # all
✅ It is better to exploit the potential of match, available since Python 3.10.
def permissions(role):
    match role:
        case 1 | 2:
            return "all"
        case 3:
            return "some"
        case _:
            return "none"
Operator priority
❌ As in mathematics, the * operator is evaluated first.
print(6 + 6 * 2) # 18
✅ If you want the sum to go first, use parentheses.
print((6 + 6) * 2) # 24
Confusing bitwise and logical
❌ Do not confuse the & operator with the and operator. If you are looking for the logical operator, do not use &, as this is the bitwise operator. Unlike and, it does not short-circuit, and it binds more tightly than comparisons such as ==.
if True & False:
    print("Enter")
✅ For the logical operator, use and.
if True and False:
    print("Enter")
Use in operator
❌ If we want to know if a letter is a vowel, we can see if it is any of the vowels.
def is_vowel(letter):
    if letter == 'a' or letter == 'e' or letter == 'i' or letter == 'o' or letter == 'u':
        return True
    return False
print(is_vowel("a")) # True
print(is_vowel("z")) # False
✅ But we can simplify it as follows. Since a string like aeiou is iterable, we can use in.
def is_vowel(letter):
    return letter in 'aeiou'
Use iterators instead of indexes
❌ If you want to iterate a list to access all of its elements, do not do it as is done in other programming languages.
names = ['Anna', 'John', 'Peter']
for i in range(len(names)):
    print(names[i])
# Anna, John, Peter
✅ It is better to do it with iterators. It is more Pythonic.
names = ['Anna', 'John', 'Peter']
for name in names:
    print(name)
# Anna, John, Peter
Iterate elements with indexes
❌ If you want to access each element and also its index, do not do it like in other programming languages.
names = ['Anna', 'John', 'Peter']
index = 0
for name in names:
    print(f"index={index} name={name}")
    index += 1
✅ Use enumerate so you do not have to manage the index variable yourself. Simpler, less error-prone, and of course, more Pythonic.
names = ['Anna', 'John', 'Peter']
for index, name in enumerate(names):
    print(f"index={index} name={name}")
# index=0 name=Anna
# index=1 name=John
# index=2 name=Peter
Iterate backwards
❌ If you want to iterate a list backwards, do not do this.
colors = ["red", "green", "blue"]
for i in range(len(colors) - 1, -1, -1):
    print(colors[i])
✅ Use reversed.
colors = ["red", "green", "blue"]
for color in reversed(colors):
    print(color)
Iterate two lists
❌ If you want to iterate two lists at the same time, do not do it this way.
names = ['Anna', 'Beatriz', 'Carlos']
ages = [25, 30, 35]
for i in range(len(names)):
    print(f"{names[i]} is {ages[i]} years old")
✅ Use zip(). It is more Pythonic. Do not forget that if the lists are not the same size, zip stops at the end of the shorter one.
names = ['Anna', 'Beatriz', 'Carlos']
ages = [25, 30, 35]
for name, age in zip(names, ages):
    print(f"{name} is {age} years old")
# Anna is 25 years old
# Beatriz is 30 years old
# Carlos is 35 years old
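If losing the extra elements is not acceptable, itertools.zip_longest pads the shorter list instead of stopping. A brief sketch; the shortened ages list and the fill value None are assumptions made for illustration.

```python
from itertools import zip_longest

names = ['Anna', 'Beatriz', 'Carlos']
ages = [25, 30]  # one age is missing on purpose

# zip stops at the shortest input, silently dropping 'Carlos'.
print(list(zip(names, ages)))

# zip_longest pads the shorter input with fillvalue instead.
pairs = list(zip_longest(names, ages, fillvalue=None))
print(pairs)
```

Since Python 3.10, zip(names, ages, strict=True) is another option: it raises a ValueError when the lengths differ.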
Modify list while iterating
❌ Do not modify a list while iterating it. You will notice an erratic behavior. If you remove an element, the indexes change, so the index of the next element will be different. These problems are hard to detect and debug.
my_list = [1, 2, 3, 4, 4, 4, 4, 4, 4, 5]
for x in my_list:
    if x == 4:
        my_list.remove(x)
print(my_list)
# [1, 2, 3, 4, 4, 4, 5]
✅ If you want to filter elements, do it like this. In this case, every 4 is filtered out correctly.
my_list = [1, 2, 3, 4, 4, 4, 4, 4, 4, 5]
print([i for i in my_list if i != 4])
# [1, 2, 3, 5]
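The comprehension above builds a new list. If other parts of the program hold a reference to the original list and should see the change, slice assignment is one option. A minimal sketch; the alias variable is hypothetical and only demonstrates that the same object is kept.

```python
my_list = [1, 2, 3, 4, 4, 4, 4, 4, 4, 5]
alias = my_list  # a second reference to the same list object

# Slice assignment replaces the contents without creating a new list object.
my_list[:] = [x for x in my_list if x != 4]
print(my_list)  # [1, 2, 3, 5]
print(alias is my_list)  # True
```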
Iterate dictionaries with items
❌ If you want to iterate the key-value pairs of a dictionary, do not do this.
my_dictionary = {'a': 1, 'b': 2, 'c': 3}
for key in my_dictionary.keys():
    print(key, my_dictionary[key])
✅ Use items to access the key and value.
my_dictionary = {'a': 1, 'b': 2, 'c': 3}
for key, value in my_dictionary.items():
    print(key, value)
Using else in for
❌ If you are looking for a number in a list and you want to know if it has been found or not, you could do it in the following way.
numbers = [1, 2, 3, 3, 4, 5]
found = False
for num in numbers:
    if num == 7:
        print("Found!")
        found = True
        break
if not found:
    print("Not found!")
✅ But you can save some lines by using for-else. The else section is executed if the loop ends without hitting break, that is, without finding the number. As a note, in this case, it is better to just use in.
numbers = [1, 2, 3, 3, 4, 5]
for num in numbers:
    if num == 7:
        print("Found!")
        break
else:
    print("Not found!")
Generate lists with comprehensions
❌ If you want to create a list with consecutive numbers, you can do it like this.
numbers = []
for i in range(1, 4):
    numbers.append(i)
print(numbers)
# [1, 2, 3]
✅ But you can narrow it down to a line with list comprehension. It is more Pythonic.
numbers = [i for i in range(1, 4)]
print(numbers)
# [1, 2, 3]
Modify lists with comprehensions
❌ To create a list with the squares of numbers from 1 to 3, you can do it like this.
squares = []
for i in range(1, 4):
    squares.append(i**2)
print(squares)
# [1, 4, 9]
✅ But it is more Pythonic using the list comprehension.
squares = [n**2 for n in range(1, 4)]
print(squares)
# [1, 4, 9]
Filter lists with comprehensions
❌ If you want to create a list of a set of values where each is transformed and filtered according to a criterion, you can simplify this.
even_squares = []
for i in range(1, 10):
    if not i % 2:
        even_squares.append(i**2)
print(even_squares)
# [4, 16, 36, 64]
✅ Into this. In both cases, we get the squares of the even numbers between 1 and 9. Do not forget that sometimes the previous form is more readable and can be the better choice.
even_squares = [i**2 for i in range(1, 10) if not i % 2]
print(even_squares)
# [4, 16, 36, 64]
Use lambda functions
❌ If you have a very simple function to use with sorted, it is not necessary to define a function.
l = ['b', 'a', 'Z']
def to_lower(x):
    return x.lower()
print(sorted(l, key=to_lower))
# ['a', 'b', 'Z']
✅ You can save some lines of code by using lambda functions.
print(sorted(l, key=lambda x: x.lower()))
# ['a', 'b', 'Z']
Use of early return
❌ We define a function that searches for the first positive in the list. However, it is not very efficient, since even if it has found it, it keeps iterating all the numbers.
def search(numbers):
    found = None
    for number in numbers:
        if found is None and number > 0:
            found = number
    return found
positive = search([-3, -2, 0, 5, 10])
print(positive)
# 5
✅ Make use of the early return. When you have found what you are looking for, stop iterating and return it. In practice, for this you can also use next with a generator expression.
def find(numbers):
    for number in numbers:
        if number > 0:
            return number
    return None
positive = find([-3, -2, 0, 5, 10])
print(positive)
# 5
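The same early exit can be written with the built-in next and a generator expression. A short sketch; the variable name first_positive and the default None are choices made here for illustration.

```python
numbers = [-3, -2, 0, 5, 10]

# next() pulls the first matching item and stops iterating;
# the second argument is returned when nothing matches.
first_positive = next((n for n in numbers if n > 0), None)
print(first_positive)  # 5
```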
Dangerous unimplemented function
❌ If you have a function to implement and you define it using pass, it can be dangerous. Someone could use it without knowing that it is unimplemented.
def function_not_implemented():
    pass
✅ It is better to raise an exception, so that if someone uses it, they will realize that it is not implemented.
def function_not_implemented():
    raise NotImplementedError("Not implemented")
Avoid global variables
❌ Avoid using global variables, especially if you modify them. This produces what is known as side effects and causes a lot of problems.
counter = 0
def increment():
    global counter
    counter += 1
✅ It is better to pass the value as an argument and return the new value.
def increment(counter):
    return counter + 1
counter = 0
counter = increment(counter)
Hidden None return
❌ Do not forget that a function without an explicit return returns None by default. In the following case, if we use a role not taken into account, we will get None, and this could cause problems.
def can_enter(role):
    if role == "cop":
        return True
    if role == "visitor":
        return False
print(can_enter("role_not_defined"))
# None
✅ Better to be explicit and handle all cases.
def can_enter(role):
    if role == "cop":
        return True
    if role == "visitor":
        return False
    return False
print(can_enter("role_not_defined"))
# False
Use the main function
❌ Although for quick scripts it is not usually respected, the following code is not quite correct.
def main():
    # your code here
    print("Hello World")
main()
✅ Run your main only if the module is executed as __main__. Otherwise your code will be executed when importing the module.
def main():
    # your code here
    print("Hello World")
if __name__ == "__main__":
    main()
Function annotations
❌ For simple examples, the following code is correct.
def add(a, b):
    return a + b
✅ But if you want to make your code more professional, use type annotations to indicate the type of each argument and of the return value. This acts as documentation, and a linter or type checker can use it to protect you.
def add(a: int, b: int) -> int:
    return a + b
Use sum with generators
❌ If you want to sum all these elements, you can save yourself from converting the sequence into a list.
sum_pairs = sum([x for x in range(1000000) if not x % 2])
print(sum_pairs)
# 249999500000
✅ You can use sum directly without the brackets.
sum_pairs = sum(x for x in range(1000000) if not x % 2)
print(sum_pairs)
# 249999500000
Confusing generators with lists
❌ Do not confuse a range with a list. A range hardly takes up any memory, because it lazily produces the numbers on demand. Strictly speaking, it is a lazy sequence rather than a generator, since it can be iterated more than once.
my_range = range(5)
print(my_range)
# range(0, 5)
✅ A list, in contrast, does occupy memory. All items are stored in memory.
my_range = range(5)
print(list(my_range))
# [0, 1, 2, 3, 4]
Use a finished generator
❌ If we create a generator, we can only iterate it once. The second time, since the end has already been reached, there will be nothing left to iterate.
def count_four():
    for i in range(5):
        yield i
gen = count_four()
for i in gen:
    print(i)
for i in gen:
    # Does not enter here.
    # Nothing to iterate.
    print(i)
✅ You will need to create a new generator.
for i in count_four():
    print(i) # 0 1 2 3 4
for i in count_four():
    print(i) # 0 1 2 3 4
Use static methods
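❌ If you have generic methods that do not act on the object, do not define them like this. Calling the method on an instance fails, because Python passes the instance as the first argument.
class Utilities:
    def generate_id():
        # logic to generate ID...
        return "XYZ123"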
✅ It may be better to define it as static.
class Utilities:
    @staticmethod
    def generate_id():
        # logic to generate ID...
        return "XYZ123"
Define eq for comparison
❌ Be careful if you compare two objects without having defined the criteria to use.
class MyClass:
    def __init__(self, value):
        self.value = value
a, b = MyClass(1), MyClass(1)
print(a == b)
# False
✅ If you want to be able to compare two objects, you have to define the __eq__ method. This sets the criterion of what is considered equal.
class MyClass:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        if isinstance(other, MyClass):
            return self.value == other.value
        return False
a, b = MyClass(1), MyClass(1)
print(a == b)
# True
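For simple data-holding classes, the standard library dataclasses module can generate __eq__ for you. The following is only a sketch of that alternative, reusing the MyClass name from the example above with an assumed int field.

```python
from dataclasses import dataclass

@dataclass
class MyClass:
    value: int  # the type is an assumption made for illustration

# The generated __eq__ compares the fields of two instances of the same class.
a, b = MyClass(1), MyClass(1)
print(a == b)  # True
print(a == MyClass(2))  # False
```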
Define str for printing
❌ When you define a class, if you do not define the __str__ method, when you print it, you will get the following. Not very useful.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
c = Point(1, 2)
print(c)
# <__main__.Point object at 0x100f3ff40>
✅ Define the __str__ method for more meaningful information.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __str__(self):
        return f"Point: x={self.x} y={self.y}"
c = Point(1, 2)
print(c)
# Point: x=1 y=2
Handle exceptions to prevent crashes
❌ Be careful when using input, which lets the user enter a value. You can never rely on what comes from the outside. If the user enters text instead of a number, you will get an exception that breaks the program.
age = int(input("State your age: "))
if age >= 18:
    print("Of legal age")
else:
    print("Underage")
# ValueError: invalid literal for int() with base 10: 'text'
✅ Whenever you convert between types, use a try block and handle possible exceptions. This way, the program will not terminate abruptly.
try:
    age = int(input("State your age: "))
    if age >= 18:
        print("Of legal age")
    else:
        print("Underage")
except ValueError:
    print("The entry is not a correct age.")
Do not handle generic exceptions
❌ It is possible to use Exception to handle exceptions, but it is too generic. That will handle all exceptions, and you may want to handle them differently.
a, b = 10, 0
try:
    print(a/b)
except Exception as e:
    print("Error:", e)
✅ It is usually advisable to handle each exception separately, as you may want to handle them differently.
a, b = 10, "0"
try:
    print(a/b)
except ZeroDivisionError as e:
    print("Error ZeroDivisionError:", e)
except TypeError as e:
    print("Error TypeError:", e)
Never ignore exceptions
❌ It is very dangerous to ignore exceptions with pass. The exception is silenced, and nobody will know if it happened or not. The same happens if you use continue inside a loop.
try:
    print(10/0)
except Exception:
    pass
✅ At the very least, show a log of the exception so that it is not forgotten.
try:
    print(10/0)
except Exception as e:
    print(f"Error: {e}")
# Error: division by zero
Use context managers for safety
❌ If you want to open a file and display its contents, you can do it like this. However, it is not recommended, because it is easy to forget the close call, and close is never reached if an exception occurs first. It is very important to close the file after finishing.
file = open('file.txt', 'r')
content = file.read()
print(content)
file.close()
✅ Use context managers. Thanks to them, the file is automatically closed without having to indicate it explicitly. It guarantees us that outside of the block, the file will be closed. No matter how the block ends, if an exception occurs or not, the file will be closed.
with open('file.txt', 'r') as file:
    content = file.read()
    print(content)
Do not name files as packages
❌ Do not give your files the same name as packages you use. For example, do not use numpy.py or math.py. Your file can shadow the real module when it is imported, which causes conflicts.
✅ Use different names for your packages and modules.
Include your license
❌ If you publish your code or package on the Internet without specifying any license, it will be copyrighted by default. This limits the use that others can make of it.
✅ It is best to include a license that allows others to use your code freely. Just as you benefit from other people’s code, allow others to benefit from yours. Examples of free licenses are MIT, Apache, or GPL.
Provide a requirements.txt file
❌ If you distribute your code for other people to use, it is important that you include the external packages it needs. You can do this with pip freeze > requirements.txt. But the versions should not be missing.
numpy
pandas
requests
✅ Always include the exact dependency version. This way, you guarantee that the code will work.
numpy==2.1.2
pandas==2.2.3
requests==2.32.3
Do not use redundant comments
❌ Avoid using redundant comments that contribute nothing.
# We set the variable x with the value 5.
x = 5
✅ If it does not contribute anything, do not write a comment. And if you do write a comment, make sure it adds value.
x = 5
Do not confuse docstrings with comments
❌ If you want to add a comment that spans multiple lines, do not use triple quotes unless it is a docstring.
"""
This is a multi-line
comment. But it consumes memory.
"""
✅ If it is a normal comment, use #.
# Do this better if you want to have
# a multi-line comment.
Document functions
❌ It is better not to document functions in this way.
def calculate_area(width, length):
    # calculate the area of a rectangle.
    return width * length
✅ Use docstrings with triple quotes and indicate the inputs and outputs.
def calculate_area(width, length):
    """
    Calculates the area of a rectangle.
    Parameters:
        width (int or float): The width of the rectangle.
        length (int or float): The length of the rectangle.
    Returns:
        int or float: The area of the rectangle.
    """
    return width * length
Avoid unnecessary calculations
❌ Imagine a function that converts from degrees to radians. Although it works perfectly, we are calculating pi/180 every time. And this is actually a constant.
import math
def degrees_to_radians(degrees):
    radians = degrees * (math.pi / 180)
    return radians
print(degrees_to_radians(45))
# 0.7853981633974483
✅ It is better to precalculate the pi/180 value and use it directly. It will save us some calculations.
FACTOR_DEGREES_RAD = 0.017453292519943295
def degrees_to_radians(degrees):
    radians = degrees * FACTOR_DEGREES_RAD
    return radians
print(degrees_to_radians(45))
# 0.7853981633974483
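A variation on the same idea is to compute the constant once at import time instead of typing the literal by hand, which avoids introducing a magic number. This is only a sketch; note also that the standard library already offers math.radians for this conversion.

```python
import math

# Computed once when the module is loaded, not on every call.
FACTOR_DEGREES_RAD = math.pi / 180

def degrees_to_radians(degrees):
    return degrees * FACTOR_DEGREES_RAD

print(degrees_to_radians(45))
print(math.radians(45))  # built-in equivalent from the standard library
```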
Benchmark multiple times
❌ If you want to measure the time it takes to execute a piece of code, do not do it only once. It is better to measure it many times and take an average.
import time
start_time = time.time()
sum(x * x for x in range(10**6))
print(f"time={time.time() - start_time}")
✅ You can use timeit for this.
from timeit import timeit
time = timeit('sum(x * x for x in range(10**6))', number=50)/50
print(f"time={time}")
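As a complement to the average, timeit.repeat runs the whole measurement several times so you can look at the fastest run, which tends to be less affected by other processes on the machine. A minimal sketch; the repeat and number values and the smaller range are arbitrary choices for illustration.

```python
from timeit import repeat

# Five independent runs of 10 executions each.
runs = repeat('sum(x * x for x in range(10**4))', repeat=5, number=10)

# Per-execution time of the fastest run.
best = min(runs) / 10
print(f"best={best}")
```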
Escape quotation marks
❌ If you want to include quotes inside a string, you cannot do it in the following way because you will get a SyntaxError.
s = "The expression "ad hoc" comes from Latin"
print(s)
# SyntaxError: invalid syntax
✅ You can do this in several ways. Using single quotes for the string or using escape characters.
s1 = 'The expression "ad hoc" comes from Latin'
s2 = "The expression \"ad hoc\" comes from Latin"
Define sorting logic
❌ Be careful when ordering a list with sorted. Strings are compared by character code, so the uppercase Z is considered to come before the lowercase a.
l = ['a', 'b', 'c', 'Z']
print(sorted(l))
# ['Z', 'a', 'b', 'c']
✅ Pass a key to sorted to define your sorting logic.
l = ['a', 'b', 'c', 'Z']
print(sorted(l, key=lambda x: x.lower()))
# ['a', 'b', 'c', 'Z']
Confusing identity with equality
❌ Identity is not the same as equality. The is operator tells us if both variables are the same object, that is, if both refer to the same memory location.
a = [1, 2, 3]
b = [1, 2, 3]
print(a is b)
# False
✅ If we want to see if they have the same value, we must use the == operator to compare.
a = [1, 2, 3]
b = [1, 2, 3]
print(a == b)
# True
Naming variables wrong
❌ PEP8 establishes a convention for how to name variables in Python. The following examples are incorrect.
def WrongFunctionName(a, b):
    return a + b
class some_class:
    pass
✅ The correct way to name a function is with snake_case and a class with CamelCase.
def correct_function_name(a, b):
    return a + b
class SomeClass:
    pass
Do not use asserts in production
❌ Be careful with the use of assert. Imagine that we only want authorized users to run a task. This may seem to work fine, but if we run the code with python -O example.py, the -O flag removes the assert statements, and the unauthorized user3 can run the task.
# example.py
AUTHORIZED = ("user1", "user2")
def restricted_task(user):
    assert user in AUTHORIZED
    print("Execute restricted task")
restricted_task("user3")
✅ It is better to perform the checks as follows.
def restricted_task(user):
    if user in AUTHORIZED:
        print("Execute restricted task")
    else:
        raise Exception(f'{user} not authorized')
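In line with the earlier advice about generic exceptions, a more specific built-in such as PermissionError may be a better fit here. A sketch under that assumption; returning the message instead of printing it is a choice made only to keep the example easy to check.

```python
AUTHORIZED = ("user1", "user2")

def restricted_task(user):
    # An explicit check is not stripped by python -O, unlike assert.
    if user not in AUTHORIZED:
        raise PermissionError(f"{user} not authorized")
    return "Execute restricted task"

print(restricted_task("user1"))
try:
    restricted_task("user3")
except PermissionError as error:
    print(f"Denied: {error}")
```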
OS specific code
❌ Do not write code that only works on one platform. If you build a path to a file by hand like this, it may not work on Windows. Some systems use / as the separator, but others like Windows use \.
path = "folder" + "/" + "Documents" + "/" + "file.txt"
✅ Use the os.path.join function, which will adapt depending on the operating system.
import os
path = os.path.join("folder", "Documents", "file.txt")
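The standard library pathlib module offers an object-oriented alternative to os.path.join. A brief sketch using the same hypothetical folder names as the example above:

```python
from pathlib import Path

# The / operator joins components using the separator of the current OS.
path = Path("folder") / "Documents" / "file.txt"
print(path)
print(path.parts)  # ('folder', 'Documents', 'file.txt')
```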
Do not use magic numbers
❌ Avoid using magic numbers. These are numbers that appear in the code without knowing where they come from. It can be confusing for a reader to see 0.3048 without knowing where it comes from.
height_feet = 3006
height_meters = height_feet * 0.3048
print(height_meters)
# approximately 916.23
✅ It is better to assign that number to a named constant. Also, use a comment.
# conversion factor from feet to meters
FACTOR_FEET_METERS = 0.3048
height_meters = height_feet * FACTOR_FEET_METERS
Fill in your gitignore
❌ If you work with git, do not have an empty .gitignore.
✅ Tell it to ignore unneeded files like __pycache__. This will keep your public code repository free of unnecessary files. It will also protect you from accidentally committing private files like .env.
# .gitignore content
__pycache__/
*.py[cod]
*$py.class
env/
.env
*.pyc
Careful with float accuracy
❌ The float type does not have infinite precision. Although mathematics tells us that 0.1 added 10 times is 1, this is not true in Python.
total = sum(0.1 for i in range(10))
print(total == 1.0)
# False
✅ In some cases, when comparing float values, it may be better to use isclose.
import math
total = sum(0.1 for i in range(10))
print(math.isclose(total, 1.0))
# True
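When exact decimal arithmetic matters, for example with money, the standard library decimal module is another option. A minimal sketch, assuming the values can be written as decimal strings:

```python
from decimal import Decimal

# Decimal('0.1') is built from a string, so it is exactly one tenth.
total = sum(Decimal('0.1') for _ in range(10))
print(total == Decimal('1.0'))  # True
```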
Use aliases with imports
❌ If you want to import pandas, numpy, or matplotlib, you can do it like this.
import pandas
import numpy
import matplotlib.pyplot
✅ But it is common to assign a shorter alias. You will find that everyone does it.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Use f-strings for string formatting
❌ If you want to insert a variable into a text, you can do it like this.
name = "World"
greeting = "Hello " + name
print(greeting)
# Hello World
✅ But it is more Pythonic to do it this way, using the f-strings.
name = "World"
greeting = f"Hello {name}"
print(greeting)
# Hello World
Use logging instead of print
❌ Although print is widely used, it is not recommended in production environments, as it does not offer any granularity.
print("Log message")
✅ It is preferable to use logging with info or debug depending on the importance of the content being displayed. If the level is set to INFO, logging.debug messages are hidden, and only messages of level INFO or higher will be displayed.
import logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s - %(message)s')
logging.info("Message level info")
logging.debug("Message debug level")
Exchange variables in one line
❌ If you want to swap the value of two variables, putting a in b and vice versa, you do not need to use a temporary variable.
a, b = 0, 10
temp = a
a = b
b = temp
print(a, b)
# 10 0
✅ You can do it in one line of code.
a, b = b, a
Extract tuples to variables
❌ If you have a tuple with several values, you do not need to assign them to a variable one by one.
data = (50, "Python")
age = data[0]
language = data[1]
✅ You can do this as follows.
data = (50, "Python")
age, language = data
Use of all function
❌ If you have a list of bool and you want to see if they are all True, do not do it this way.
v = [True, True, False]
if v[0] and v[1] and v[2]:
    print("All are true")
✅ Use the all function. If you want to see if at least one is True, you can use the any function.
v = [True, True, False]
if all(v):
    print("All are true")
Do not repeat code
❌ If you want to execute a function whether or not a condition is met, you can avoid duplicate code.
if x:
    do_a()
    do_b()
else:
    do_b()
✅ Since do_b is executed in both cases, you can move it out of the if and simplify it.
if x:
    do_a()
do_b()
Remove duplicated code with loops
❌ Imagine you have a text from which you want to remove exclamation and question marks. You can use replace repeatedly.
def process_text(text):
    text = text.replace('?', '')
    text = text.replace('¿', '')
    text = text.replace('!', '')
    text = text.replace('¡', '')
    return text
✅ But it might look better to do it in the following way.
def process_text(text):
    for char in ['?', '¿', '!', '¡']:
        text = text.replace(char, '')
    return text
Use tuples for constants
❌ If you want to store a constant set of values in your code, it is better not to use lists. A list can be accidentally modified.
http_status = [200, 404, 403, 500]
✅ Use tuples, since they are immutable elements. Once declared, they cannot be changed.
http_status = (200, 404, 403, 500)
Environment variables are strings
❌ Be careful if you use environment variables. Their content is of type str, so the following condition will never be met. 6 is not the same as "6".
import os
if os.getenv("MYVAR") == 6:
    print("Is 6")
✅ If you want to compare with 6, which is an int, do not forget to convert it first.
import os
if int(os.getenv("MYVAR")) == 6:
    print("It is 6")
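Note that os.getenv returns None when the variable is not set, and int(None) raises a TypeError. A more defensive sketch follows; the fallback value "0" is an arbitrary assumption for illustration, and MYVAR is the name from the example above.

```python
import os

# "0" is a hypothetical default used when MYVAR is not defined.
raw_value = os.getenv("MYVAR", "0")
try:
    value = int(raw_value)
except ValueError:
    value = 0  # the variable is set but does not contain an integer

if value == 6:
    print("It is 6")
```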
Do not confuse exponentiation with XOR
❌ Do not confuse the ^ operator and the ** operator. If you want to square a number, do not do this. The ^ operator is not what you are looking for; it is the XOR operator. Since 5 is 101 in binary and 2 is 010, the result is 111, or in decimal, 7.
print(5^2)
# 7
✅ What you are looking for to square is **.
print(5**2)
# 25
Follow PEP8 format
❌ Format your code according to the PEP8 recommendation. For example, add spaces between numbers and operators.
result = 2+3*4
✅ This code follows PEP8.
result = 2 + 3 * 4
Saturate a value to a maximum
❌ When you want to saturate a value, i.e., use the value itself if it is less than a limit or saturate to the limit if greater, do not do the following.
def saturate(value, max_value):
    saturated = max_value
    if value < max_value:
        saturated = value
    return saturated
print(saturate(30, 15)) # 15
print(saturate(13, 15)) # 13
✅ It can be simplified as follows.
def saturate(value, max_value):
    return min(value, max_value)
print(saturate(30, 15)) # 15
print(saturate(13, 15)) # 13
Fix circular imports
❌ If you have a mod_a module that imports mod_b and, in turn, mod_b imports mod_a, you will have a circular import problem that raises an ImportError.
# mod_a.py
from mod_b import function_b
def function_a():
    pass
On the other hand, mod_b.
# mod_b.py
from mod_a import function_a
def function_b():
    pass
✅ To avoid having a circular import, it is recommended that you define a third module that contains those common functions to be imported by both modules.
Import only what you need
❌ Do not use * to import the entire contents of the module.
from your_module import *
✅ It is best to be explicit and state exactly what you want to import.
from your_module import function_a, function_b
Do not mix types in numpy
❌ If you define a numpy array with different types, numpy will change the types to accommodate everyone. In this case, it converts 1 and 2 to str.
import numpy as np
arr = np.array([1, 2, '3'])
print(arr)
# ['1' '2' '3']
✅ If you want to force a particular type to be used, use dtype. Be careful when mixing types.
arr = np.array([1, 2, '3'], dtype=int)
print(arr)
# [1 2 3]
Use numpy vectorized operations
❌ If you have an array and you want to add 10 to each element, you do not need to iterate it all.
import numpy as np
arr = np.array([1, 2, 3])
for i in range(len(arr)):
    arr[i] = arr[i] + 10
print(arr)
# [11 12 13]
✅ You can benefit from the vectorized operations of numpy, which are easier to write and faster.
import numpy as np
arr = np.array([1, 2, 3])
print(arr + 10)
# [11 12 13]
Use numpy boolean indexing
❌ If you have an array and you want to keep the elements >0, do not do it this way.
import numpy as np
arr = np.array([-3, 3, -5, 10, 20, 1])
filtered = []
for i in range(len(arr)):
    if arr[i] > 0:
        filtered.append(arr[i])
print(np.array(filtered))
# [ 3 10 20 1]
✅ Take advantage of the boolean indexing or masking of numpy and do it like this. The result is the same and benefits from vectorization, which makes it faster.
import numpy as np
arr = np.array([-3, 3, -5, 10, 20, 1])
print(arr[arr > 0])
# [ 3 10 20 1]
Do not confuse matrix multiplication
❌ If you want to multiply two matrices as you would in algebra, do not use the * operator, since it multiplies them element by element.
import numpy as np
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
print(arr1 * arr2)
# [[ 5 12]
# [21 32]]
✅ To multiply two matrices, use @.
print(arr1 @ arr2)
# [[19 22]
# [43 50]]
If size is known do not use append
❌ If you know the size of your numpy array beforehand, do not do this. It is very inefficient, since each np.append allocates a new array and copies all the previous elements into it.
import numpy as np
arr = np.array([])
for i in range(1000):
    arr = np.append(arr, i)
✅ It is better to define the size up front and fill the array in place. The difference in efficiency is very large.
size = 1000
arr = np.empty(size)
for i in range(size):
    arr[i] = i
Careful with numpy overflow
❌ In the following example, we have an overflow. numpy assigns a maximum of 5 characters to the name field, so if you use a longer name, it will be cut off. The 5 comes from Alice, the longest name in the initial data. This is dangerous. We lose information.
import numpy as np
a = np.rec.fromrecords(
[('Alice', 25), ('Bob', 30)],
names=['name', 'age'])
a["name"][0] = "NewNameLong"
print(a["name"])
# ['NewNa' 'Bob']
✅ It may be better to be explicit with the dtype we want. In this case, we allow up to 20 characters.
a = np.rec.fromrecords(
[('Alice', 25), ('Bob', 30)],
dtype=[('name', 'U20'), ('age', 'i4')])
a["name"][0] = "NewLongName"
print(a["name"])
# ['NewLongName' 'Bob']
Optimize memory in numpy
❌ In numpy, when we store a number such as age, int64 is used by default. In cases where we store small numbers, we can optimize this.
import numpy as np
arr1 = np.rec.fromrecords(
    [('Alice', 25), ('Bob', 30)],
    names=['name', 'age'])
print(f"{arr1.nbytes} bytes")
# 56 bytes
✅ We can use uint8, which allows us to reduce the memory it occupies. 14 bytes may not seem like a lot, but if we store millions of values, the savings add up.
arr1 = np.rec.fromrecords(
    [('Alice', 25), ('Bob', 30)],
    dtype=[('name', 'U5'), ('age', np.uint8)])
print(f"{arr1.nbytes} bytes")
# 42 bytes
Use chunksize in pandas
❌ If you work with huge files of several GB, you may not be able to load the entire file into a DataFrame for processing.
import pandas as pd
df = pd.read_csv('huge_file.csv')
# Process
✅ If your file is too large, consider processing it in chunks. This way, you will not load everything into memory, only one chunk at a time.
for chunk in pd.read_csv('huge_file.csv', chunksize=1000):
    # Process each chunk of up to 1000 rows here
    pass
Rename columns in pandas
❌ If you want to rename several columns, such as removing spaces and using lowercase letters, it is not necessary to rename them one by one.
import pandas as pd
data = {' Name ': [1, 2],
' AGE ': [4, 5],
' city ': [6, 7]}
df = pd.DataFrame(data)
df = df.rename(columns={' Name ': 'name',
' AGE ': 'age',
' city ': 'city'})
✅ You can pass a function to rename that removes spaces (strip()) and converts to lowercase (lower()).
df = df.rename(columns=lambda x: x.strip().lower())
print(df)
Optimize memory in pandas with category
❌ If you want to store a DataFrame with a column whose values come from a small set of known values, storing them as plain strings is not very efficient.
import pandas as pd
df = pd.DataFrame({
'products': ['A', 'B', 'C'] * 100000
})
print("Memory", df.memory_usage(deep=True).sum())
# Memory 17400128
✅ Consider using the category type, which reduces memory consumption significantly.
df['products'] = df['products'].astype('category')
print("Memory", df.memory_usage(deep=True).sum())
# Memory 300410
Use set to remove duplicates
❌ If you want to remove duplicates from a list, it is not necessary to write it from scratch.
l = [1, 2, 2, 3, 4, 4, 5]
no_duplicates = []
for element in l:
    if element not in no_duplicates:
        no_duplicates.append(element)
# [1, 2, 3, 4, 5]
✅ You can use a set to remove duplicates and convert it to a list again. Keep in mind that a set does not guarantee that the original order is preserved.
l = [1, 2, 2, 3, 4, 4, 5]
no_duplicates = list(set(l))
# [1, 2, 3, 4, 5]
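If the original order matters, one possible option is dict.fromkeys, since dictionary keys are unique and keep insertion order in Python 3.7 and later. This is a small sketch with made-up data, not the only way to do it.

```python
items = [3, 1, 3, 2, 1]

# dict keys are unique and remember the order in which they were inserted
no_duplicates = list(dict.fromkeys(items))
print(no_duplicates)  # [3, 1, 2]
```

Like set, this only works when the elements are hashable.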
Use counter to count elements
❌ If you want to count the elements of a list, you do not need to write it from scratch.
elements = ['a', 'b', 'c', 'a', 'b', 'b']
count = {}
for e in elements:
    if e in count:
        count[e] += 1
    else:
        count[e] = 1
print(count)
# {'a': 2, 'b': 3, 'c': 1}
✅ You can use Counter.
from collections import Counter
elements = ['a', 'b', 'c', 'a', 'b', 'b']
count = Counter(elements)
print(count)
# Counter({'b': 3, 'a': 2, 'c': 1})
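Counter also brings a few conveniences that the manual dictionary lacks. The sketch below reuses the same list and shows two of them: most_common to get the top elements, and a default count of 0 for missing keys instead of a KeyError.

```python
from collections import Counter

elements = ['a', 'b', 'c', 'a', 'b', 'b']
count = Counter(elements)

# The two most frequent elements, ordered from most to least common
print(count.most_common(2))  # [('b', 3), ('a', 2)]

# A missing key returns 0 instead of raising KeyError
print(count['z'])  # 0
```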
Use startswith instead of slicing
❌ If you want to know if a string starts with something specific, you can slice it, but you have to get the slice length exactly right.
x = "The Python Book"
print(x[0:3] == "The")
# True
✅ It is better to do it with startswith.
x = "The Python Book"
print(x.startswith("The"))
# True
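A short sketch of why startswith tends to be safer than slicing: a slice of the wrong length silently evaluates to False, while startswith needs no length at all. It also accepts a tuple of prefixes, and endswith covers the other end of the string.

```python
x = "The Python Book"

# Slicing pitfall: the slice is one character too short, so this is False
print(x[0:2] == "The")  # False

# startswith accepts a tuple and returns True if any prefix matches
print(x.startswith(("The", "A ")))  # True

# endswith works the same way for suffixes
print(x.endswith("Book"))  # True
```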
Use in to check if contained
❌ If you want to see if an item is contained in a list, you can do it like this.
def contains(names, name):
    found = False
    for n in names:
        if n == name:
            found = True
            break
    return found
names = ['Anna', 'John', 'Peter']
print(contains(names, 'John'))
# True
✅ But it is easier to use in.
print('John' in names)
# True
Join elements in a list
❌ If you want to join the elements of a list with a comma as separator, you can do it like this.
names = ['Anna', 'John', 'Peter']
result = ''
for index, name in enumerate(names):
    if index < len(names) - 1:
        result += name + ', '
    else:
        result += name
print(result)
# Anna, John, Peter
✅ But it is easier to use join.
result = ', '.join(names)
print(result)
# Anna, John, Peter
Read csv with numpy
❌ If you want to read a CSV, you can create your own function.
def read_csv(path):
    with open(path, 'r') as file:
        lines = file.readlines()
    data = [line.strip().split(',') for line in lines]
    return data
✅ But it is better to use a library like numpy, which already handles the parsing for you.
import numpy as np
data = np.genfromtxt('data.csv', delimiter=',', skip_header=1)
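One reason a hand-written split(',') is fragile is quoted fields that contain commas. As a sketch, the example below compares the naive approach with the csv module from the standard library, which is an alternative when you do not need a numeric array. The CSV content is made up for illustration.

```python
import csv
import io

# Hypothetical CSV content. A real file opened with open(path, newline='')
# can be passed to csv.reader in the same way.
raw = 'name,city\n"Smith, Anna","New York"\nJohn,Boston\n'

# Naive split: the comma inside the quoted name breaks the row apart
naive = [line.strip().split(',') for line in raw.splitlines()]
print(naive[1])  # ['"Smith', ' Anna"', '"New York"']

# csv.reader understands quoting and returns the two expected fields
rows = list(csv.reader(io.StringIO(raw)))
print(rows[1])  # ['Smith, Anna', 'New York']
```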
Iterate a list in chunks
❌ If you want to iterate a very large list in chunks, you do not need to define the code yourself.
def chunks(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
for chunk in chunks(huge_list, 100):
    print(chunk)
✅ You can make use of batched, available in itertools since Python 3.12. Note that it yields tuples rather than lists.
import itertools as it
for chunk in it.batched(huge_list, 100):
    print(chunk)
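itertools.batched only exists in Python 3.12 and later. On older interpreters, a small fallback built on islice behaves in a similar way. This is a sketch, not the standard library implementation, and the short huge_list here is just a stand-in.

```python
import itertools as it

def batched(iterable, n):
    """Yield tuples of up to n items, similar in spirit to itertools.batched."""
    iterator = iter(iterable)
    # islice takes the next n items; an empty tuple means we are done
    while batch := tuple(it.islice(iterator, n)):
        yield batch

huge_list = list(range(10))  # stand-in for a much larger list
result = list(batched(huge_list, 4))
print(result)  # [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9)]
```

Unlike the slicing version, this works with any iterable, including generators that do not support len.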
Do not use pickle
❌ Do not use the pickle library to deserialize data you do not trust. Loading a pickle can execute arbitrary code. If we get a user to open our malicious_data, we can execute code on their machine. In this example, we just list their files with ls, but you can do much more.
import pickle
import os
class Malicious:
    def __reduce__(self):
        return (os.system, ('ls',))
malicious_data = pickle.dumps(Malicious())
pickle.loads(malicious_data)
✅ Use other tools to serialize, such as json.
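As a minimal sketch with made-up data, json round-trips plain structures such as dictionaries, lists, strings, and numbers. Loading a JSON document only builds data. Input that is not valid JSON is rejected with an exception instead of being executed.

```python
import json

data = {"name": "Anna", "scores": [1, 2, 3]}

serialized = json.dumps(data)      # a plain text string
restored = json.loads(serialized)  # back to Python objects
print(restored == data)  # True

# Something that looks like code is simply not valid JSON
try:
    json.loads("__import__('os').system('ls')")
    outcome = "loaded"
except json.JSONDecodeError:
    outcome = "rejected"
print(outcome)  # rejected
```

The trade-off is that json only supports basic types, so custom objects need to be converted to dictionaries first.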
Careful with SQL injection
❌ If you have a database with a function that allows you to get the users, do not do it this way. This code is vulnerable to SQL injection. If the input is "admin' OR '1'='1", an attacker will be able to get all your data.
import sqlite3
def get_user(user):
    conn = sqlite3.connect('example.db')
    cursor = conn.cursor()
    query = f"SELECT * FROM users WHERE user = '{user}'"
    cursor.execute(query)
    result = cursor.fetchall()
    conn.close()
    return result
✅ Better do it this way, with a parameterized query. The input is passed separately from the SQL text, so it is always treated as a value and never as part of the query.
import sqlite3
def get_user(user):
    conn = sqlite3.connect('example.db')
    cursor = conn.cursor()
    query = "SELECT * FROM users WHERE user = ?"
    cursor.execute(query, (user,))
    result = cursor.fetchall()
    conn.close()
    return result
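The difference can be seen with a small self-contained sketch. It uses an in-memory database with a hypothetical users table and two made-up rows, and runs the same malicious input through both styles of query.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("admin", "admin@example.com"), ("anna", "anna@example.com")],
)

payload = "nobody' OR '1'='1"

# String formatting: the payload rewrites the WHERE clause and matches every row
unsafe = conn.execute(
    f"SELECT * FROM users WHERE user = '{payload}'"
).fetchall()

# Parameterized query: the payload is compared as a literal user name
safe = conn.execute(
    "SELECT * FROM users WHERE user = ?", (payload,)
).fetchall()

conn.close()
print(len(unsafe))  # 2
print(len(safe))    # 0
```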
Avoid eval and exec
❌ Be very careful with eval and exec. They allow executing arbitrary code, which is the perfect gateway for any attacker. If you pass __import__('os').system("ls") as input, you can see the contents of the folder. You could even access files, delete them, etc.
user_input = input("command: ")
eval(user_input)
✅ Unless you know what you are doing, do not use eval or exec. And if you do use them, at least limit what can be executed.
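When the goal is only to parse a literal value typed by the user, such as a number, a list, or a dictionary, one possible alternative is ast.literal_eval from the standard library. The sketch below assumes that use case. It accepts Python literals and rejects anything else, including function calls.

```python
import ast

# Plain literals are parsed into Python objects
parsed = ast.literal_eval("[1, 2, 3]")
print(parsed)  # [1, 2, 3]

# A function call is not a literal, so it is rejected instead of executed
try:
    ast.literal_eval("__import__('os').system('ls')")
    outcome = "executed"
except (ValueError, SyntaxError):
    outcome = "rejected"
print(outcome)  # rejected
```

This does not replace eval for evaluating expressions. It only covers the common case of reading data.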
Use pip packages you trust
❌ Be very careful with your dependencies. Just because a package can be installed with pip does not mean it is safe. It can have bugs. It may have malicious code.
✅ Make sure your dependencies are legit. Look to see if they are known packages, if there are other people using them, who is behind them.
Use timeout in requests
❌ You can make a request to an external server as follows. Now imagine that server does not respond to you, and leaves you waiting forever.
import requests
response = requests.get('https://example.com')
print(response.status_code)
✅ To prevent this situation, it is advisable to use a timeout. This is the time in seconds we are willing to wait. After that time, if we have not received a response, an exception is raised and we can move on.
import requests
response = requests.get('https://example.com', timeout=5)
print(response.status_code)
Use secrets instead of random
❌ If you want to generate a private key, do not use random. This package is fine when we want a random number for testing, but it is not safe for generating private keys. It is not as random as it should be.
import random
private_key = random.getrandbits(256)
✅ Better use the secrets module, which provides better randomness or entropy.
import secrets
private_key = secrets.randbits(256)
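A short sketch of the difference. The random module is deterministic once seeded, so anyone who learns or guesses the seed can reproduce the "key". The secrets module has no seed to guess, and it also offers helpers such as token_hex for generating tokens. The seed value 42 is arbitrary.

```python
import random
import secrets

# With the same seed, random produces exactly the same "key" twice
random.seed(42)
first = random.getrandbits(256)
random.seed(42)
second = random.getrandbits(256)
print(first == second)  # True

# secrets draws from the operating system's secure random source
private_key = secrets.randbits(256)
token = secrets.token_hex(32)  # 32 random bytes as 64 hex characters
print(len(token))  # 64
```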