Test your code
Tests are code that verifies that other code works correctly. They allow you to:
- Ensure that your code functions as expected today.
- Safeguard your code against future changes, so that it continues to work tomorrow.
Writing tests involves:
- Demonstrating that, given known inputs, the outputs are as expected.
- Ensuring that unexpected inputs are handled in a defined way. For example, ordering %&? beers in a bar should not cause the bartender to crash, and neither should your program.
Let us explore how to use pytest to protect your code from bugs.
Introduction to testing
The purpose of testing is to detect bugs, which are errors or defects where the code does not behave as expected. The term is often traced to an actual moth found in a computer in 1947, which caused it to malfunction, although engineers were already using the word before then.
Consider a sum function. Any code, no matter how simple, should be accompanied by tests to verify its correct behavior. It is as simple as this.
# sum.py
# Note: this name shadows Python's built-in sum, which is fine for a small demo.
def sum(a, b):
    return a + b

def test_sum():
    assert sum(1, 2) == 3
    assert sum(5, 5) == 10
In test_sum, we verify that:
- The output is 3 when the inputs are 1 and 2.
- The output is 10 when the inputs are 5 and 5.
This is our first unit test. Now, let us run it. First, install pytest.
pip install pytest
With pytest, you can run the tests. The PASSED message indicates that all asserts were successful.
pytest -v sum.py
# sum.py::test_sum PASSED
It is crucial to push tests to their limits by testing unexpected inputs. For instance, adding 1 and "2" should raise an exception, as these values should not be added together.
import pytest

def test_sum_exception():
    with pytest.raises(TypeError):
        sum(1, "2")
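Testing the addition of "%" and "?" might also be necessary. While it may seem nonsensical, our function currently returns "%?" instead of raising an error. It is up to you to decide if this is acceptable. If it is not, the following test states the desired behavior. It fails against the current implementation until sum rejects strings.
import pytest

def test_sum_symbols_exception():
    with pytest.raises(TypeError):
        sum("%", "?")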
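If you decide that concatenating strings is not acceptable, one option is to validate the argument types explicitly. The sketch below is only an illustration: strict_sum and check_rejects are hypothetical names, and accepting only real numbers is an assumed policy, not something the original sum function does.

```python
from numbers import Real


def strict_sum(a, b):
    """Add two real numbers and reject anything else (assumed policy)."""
    for value in (a, b):
        # bool is a subclass of int, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"unsupported operand: {value!r}")
    return a + b


def check_rejects(a, b):
    """Return True if strict_sum raises TypeError for the given inputs."""
    try:
        strict_sum(a, b)
    except TypeError:
        return True
    return False


print(strict_sum(1, 2))         # 3
print(check_rejects("%", "?"))  # True
print(check_rejects(1, "2"))    # True
```

With a function like this, both exception tests above would pass, at the cost of no longer supporting string concatenation.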
Generally, more tests are better, but they do not guarantee anything, especially if they are incomplete or do not test enough combinations. Remember:
- A test can prove the presence of a bug, but not its absence.
If you can write a test that fails, you have proven something does not work. However, if all tests pass, it does not guarantee the code is 100% bug-free. The more comprehensive the tests, the more confidence you can have in the code.
Tests are also code and are usually separated from the main code. Here are some tips for organizing your tests:
- Create a tests folder to store all your tests.
- If a module is named module.py, place its tests in test_module.py.
Before diving into testing, let us clarify some jargon:
- Bug: A defect in the code where something does not work as expected. Tests aim to find them.
- Test Case: A test that verifies a specific use case, like test_sum.
- Test Suite: A collection of test cases, grouped for better organization.
- Flaky Test: A test that sometimes passes and sometimes fails without any change to the code. Ideally, there should be none, but they occasionally occur.
- Coverage: A metric measuring the percentage of code executed by tests. The goal is often 100% coverage, but even that does not guarantee anything.
- Fuzzing: A testing technique that provides random and unexpected inputs to detect bugs, similar to asking a waiter for %&? beers.
- Black Box: A type of testing that treats the system as a black box: inputs are given and outputs are observed, but the internal workings are unknown.
- Continuous Integration: A software development practice where every code change is verified by running all tests. If tests fail, the change is not included. This provides immediate feedback and improves code quality.
- Mock: An object that replaces a real one during testing. For example, if testing communications with an aircraft, a mock can simulate the aircraft. That allows you to test the code without the need for an actual aircraft.
- Test Vector: A set of inputs together with the expected output. For example, for the sum function, a test vector could be 5, 3, 8, meaning that if the inputs are 5 and 3, the expected output is 8.
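To make the idea of a test vector concrete, here is a minimal sketch in plain Python. The names add, TEST_VECTORS and run_vectors are illustrative choices, not part of pytest.

```python
def add(a, b):
    return a + b


# Each vector is (input a, input b, expected output).
TEST_VECTORS = [
    (5, 3, 8),
    (1, 2, 3),
    (-4, 4, 0),
    (0, 0, 0),
]


def run_vectors(func, vectors):
    """Return the vectors whose actual output differs from the expected one."""
    failures = []
    for a, b, expected in vectors:
        actual = func(a, b)
        if actual != expected:
            failures.append((a, b, expected, actual))
    return failures


print(run_vectors(add, TEST_VECTORS))  # []
```

An empty list means every vector produced its expected output. A non-empty list pinpoints exactly which inputs went wrong.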
The concept of test is broad, with many types:
- Unit: Focus on testing small units of code like functions or classes.
- Integration: Focus on testing how the system components work together.
- Performance: Focus on testing speed and efficiency.
Depending on your program's complexity, you may not need all these types, but any program should have at least unit tests. Tests are not just for advanced developers; even simple programs benefit from them.
Writing tests is an art. It is often better if the person writing the code does not write the tests, as they may be biased. An outsider with an unbiased perspective is ideal.
A person who thinks outside the box is usually good at writing tests. It is not enough to test the happy path; you must explore rare cases where the code might fail.
One advantage of writing tests as code is that they run automatically, making it easy to verify nothing is broken.
Ideally, code changes should not be merged until all tests pass. Tools like Jenkins or GitHub Actions can enforce this by blocking new changes while any test is failing.
A common misconception is that tests guarantee the absence of bugs. They can detect the presence of bugs but cannot prove their absence. If tests are incomplete or written incorrectly, bugs can still slip through.
With this, we are ready to start writing our first tests with pytest.
Testing with pytest
The pytest package allows you to write and run tests for your code easily. It is a de facto industry standard. If you ever thought you did not have time to write tests, pytest removes that excuse. Write tests. Always. It is very easy.
First, we need something to test. Everything starts with code that has requirements and expected logic. Once we have it, tests ensure these requirements are always met.
For our example, we will write code to detect if the International Space Station (ISS) is flying overhead. Then, we will write tests to ensure it works correctly.
Our project requirements are:
- The ISS position is obtained from the API http://api.open-notify.org/iss-now.json.
- Our position on Earth, as latitude and longitude, is known and passed to the constructor.
- Knowing our coordinates and those of the ISS, we determine if it is passing overhead.
The API returns the following JSON, including the ISS position, time, and request success status.
{
    "message": "success",
    "iss_position": {
        "longitude": "31.9896",
        "latitude": "-26.6497"
    },
    "timestamp": 1732180867
}
Using this API requires Internet access. We will later see how to make tests independent of external services.
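Note that the coordinates arrive as strings, not numbers. As a rough sketch of what handling this payload involves, the snippet below parses a hard-coded copy of the response, so it needs no Internet access. The helper parse_iss_position is a hypothetical name, and checking the message field is an assumption about how one might validate the response.

```python
import json

SAMPLE = """
{
    "message": "success",
    "iss_position": {"longitude": "31.9896", "latitude": "-26.6497"},
    "timestamp": 1732180867
}
"""


def parse_iss_position(payload):
    """Extract (latitude, longitude) as floats from a decoded API response."""
    if payload.get("message") != "success":
        raise ValueError(f"unexpected status: {payload.get('message')!r}")
    position = payload["iss_position"]
    return float(position["latitude"]), float(position["longitude"])


print(parse_iss_position(json.loads(SAMPLE)))  # (-26.6497, 31.9896)
```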
The ISS orbits at a low altitude, about 418 km, circling the Earth every 90 minutes.
With this information, we are ready to write our code. We define a class DistanceISS.
# iss.py
import requests

class DistanceISS:
    URL = "http://api.open-notify.org/iss-now.json"
    T = 0.01  # tolerance, in degrees of latitude and longitude

    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon

    def coordinates_iss(self):
        try:
            response = requests.get(self.URL)
            response.raise_for_status()
            iss_data = response.json()
            # The API returns the coordinates as strings.
            iss_lat = float(iss_data['iss_position']['latitude'])
            iss_lon = float(iss_data['iss_position']['longitude'])
            return iss_lat, iss_lon
        except (requests.RequestException, ValueError, KeyError) as e:
            raise ValueError(f"Error getting ISS position: {e}") from e

    def above(self):
        iss_lat, iss_lon = self.coordinates_iss()
        lat_diff = abs(iss_lat - self.lat)
        lon_diff = abs(iss_lon - self.lon)
        return lat_diff <= self.T and lon_diff <= self.T
The methods are as follows:
- __init__: Constructor that takes our geographical coordinates, latitude and longitude.
- coordinates_iss: Uses the API to return the current position of the ISS as latitude and longitude.
- above: Returns True if the ISS is above us, within a tolerance of T.
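The tolerance T is expressed in degrees, which can be hard to picture. The following back-of-the-envelope sketch converts it to kilometres. It assumes a spherical Earth with a mean radius of 6371 km, and the helper names are illustrative, so treat the results as rough estimates.

```python
from math import cos, pi, radians

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def lat_degrees_to_km(delta_deg):
    """North-south distance covered by delta_deg degrees of latitude."""
    return delta_deg * (2 * pi * EARTH_RADIUS_KM / 360)


def lon_degrees_to_km(delta_deg, latitude_deg):
    """East-west distance of delta_deg degrees of longitude at a latitude."""
    return lat_degrees_to_km(delta_deg) * cos(radians(latitude_deg))


T = 0.01
print(lat_degrees_to_km(T))            # roughly 1.1 km
print(lon_degrees_to_km(T, -34.6037))  # roughly 0.9 km at the latitude of Buenos Aires
```

Under these assumptions, a tolerance of 0.01 degrees means the ISS ground position must be within about one kilometre of us, which is a very strict definition of "overhead".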
We can use it by passing our coordinates, here those of Buenos Aires. If it returns True, the ISS is passing over Buenos Aires.
iss = DistanceISS(-34.6037, -58.3816)
print(iss.above())
# False
Now that we have the code, let us write tests to ensure it works correctly and protect it from future modifications. We start with a simple test for __init__, verifying that the latitude lat and longitude lon are stored correctly.
# iss_test.py
import pytest
from iss import DistanceISS

def test_init():
    distance_iss = DistanceISS(10, 20)
    assert distance_iss.lat == 10
    assert distance_iss.lon == 20
To run the test, use the following command in the terminal.
pytest -v
You will get a report indicating that the test has passed, meaning all asserts meet the expected condition.
iss_test.py::test_init PASSED [100%]
The above command automatically searches for all tests in the folder. To run tests in a specific file, use:
pytest -v iss_test.py
To run a specific test:
pytest -v iss_test.py::test_init
Congratulations, you have your first test and can run it. Now, every time you make a change, you can run it to ensure nothing is broken. But there is more to test.
Next, we write a test for coordinates_iss. We want to verify that we correctly extract the latitude and longitude from the API response. However, there is an external dependency: the API requires Internet access.
A good practice is to use a mock to simulate the API, allowing the test to run without Internet access.
It is important to define the boundaries of what we want to test. In unit tests, we focus on our code. If the API stops working, our test should still work, as the code is fine.
We can mock the API as follows, instructing Python to return a specific value when requests.get is called in the iss module. We make raise_for_status return None to signify no issues.
# iss_test.py
import pytest
from unittest.mock import patch
from iss import DistanceISS

def test_coordinates_iss():
    with patch('iss.requests.get') as mock_get:
        mock_get.return_value.json.return_value = {
            'iss_position': {
                'latitude': '-50.0',
                'longitude': '-30.1',
            },
            'message': 'success',
            'timestamp': 1596563200
        }
        mock_get.return_value.raise_for_status = lambda: None
        distance_iss = DistanceISS(0, 0)
        assert distance_iss.coordinates_iss() == (-50.0, -30.1)
The test verifies that for the mocked response, the longitude and latitude are correctly extracted. But we cannot trust anyone; we must see what happens when the result is unexpected.
Imagine the API is modified and mistakenly returns invalid latitude and longitude values, such as the string text instead of numbers. A test should ensure an exception is raised. It may seem obvious, but imagine that the code processed text and returned a valid coordinate pair. This would be dangerous.
def test_coordinates_iss_non_numerics():
    with patch('iss.requests.get') as mock_get:
        mock_get.return_value.json.return_value = {
            'iss_position': {
                'latitude': 'text',
                'longitude': 'text',
            },
            'message': 'success',
            'timestamp': 1596563200
        }
        mock_get.return_value.raise_for_status = lambda: None
        distance_iss = DistanceISS(0, 0)
        with pytest.raises(Exception, match="Error"):
            distance_iss.coordinates_iss()
Now imagine the API is offline. We can simulate this with our mock by making raise_for_status raise an exception, verifying that coordinates_iss propagates the exception and does not return coordinates.
def test_coordinates_iss_error():
    with patch('iss.requests.get') as mock_get:
        mock_get.return_value.json.return_value = {}
        mock_get.return_value.raise_for_status.side_effect = Exception("Error")
        distance_iss = DistanceISS(0, 0)
        with pytest.raises(Exception, match="Error"):
            distance_iss.coordinates_iss()
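Mocks do more than return canned values: they also record how they were called. The sketch below shows this with only the standard library. To keep it self-contained, it uses a hypothetical get_position function that receives the HTTP function as a parameter instead of importing requests. That is a design assumption made for this example, not how iss.py is written.

```python
from unittest.mock import Mock

URL = "http://api.open-notify.org/iss-now.json"


def get_position(http_get, url=URL):
    """Fetch and parse the ISS position using the injected http_get callable."""
    response = http_get(url)
    response.raise_for_status()
    position = response.json()["iss_position"]
    return float(position["latitude"]), float(position["longitude"])


# Happy path: the mock returns a canned payload and records its calls.
fake_get = Mock()
fake_get.return_value.json.return_value = {
    "iss_position": {"latitude": "-50.0", "longitude": "-30.1"}
}
assert get_position(fake_get) == (-50.0, -30.1)
fake_get.assert_called_once_with(URL)
fake_get.return_value.raise_for_status.assert_called_once_with()

# Failure path: side_effect makes the mocked method raise.
failing_get = Mock()
failing_get.return_value.raise_for_status.side_effect = RuntimeError("offline")
try:
    get_position(failing_get)
except RuntimeError as error:
    outcome = f"raised: {error}"
else:
    outcome = "no error"

# json() must never be reached when raise_for_status fails.
failing_get.return_value.json.assert_not_called()
print(outcome)  # raised: offline
```

Assertions such as assert_called_once_with verify not only the result but also that the code talked to its dependency in the expected way.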
Let us continue writing tests for above. This method calls coordinates_iss, which uses the API requiring Internet, so we will use a mock.
With this mock, we simulate coordinates_iss returning the coordinates of Madrid.
def test_above():
    with patch.object(DistanceISS, 'coordinates_iss', return_value=(40.4167, -3.7033)):
        iss = DistanceISS(40.4167, -3.7033)
        assert iss.above() == True
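The negative case and the boundary of the tolerance deserve a look as well. The sketch below uses is_above, a hypothetical stand-in that applies the same comparison as DistanceISS.above but receives the ISS position directly, so it runs without the iss module or any mock.

```python
T = 0.01  # same tolerance value as DistanceISS.T


def is_above(lat, lon, iss_lat, iss_lon, tolerance=T):
    """Stand-in for DistanceISS.above that takes the ISS position directly."""
    return abs(iss_lat - lat) <= tolerance and abs(iss_lon - lon) <= tolerance


print(is_above(40.4167, -3.7033, 40.4167, -3.7033))  # True: same point
print(is_above(40.4167, -3.7033, 41.4167, -3.7033))  # False: one degree away

# Exactly on the boundary in decimal notation, but not in binary floating point:
print(1.01 - 1.0)                     # 0.010000000000000009
print(is_above(1.0, 1.0, 1.01, 1.0))  # False
```

The last case is the kind of surprise a boundary test reveals: a difference that looks equal to the tolerance is slightly larger once represented as a float. Whether that matters is a requirements decision, but a test makes the behavior explicit.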
At this point, we have several tests to verify our code works correctly. We have covered expected paths and potential issues, ensuring behavior is as desired in all cases.
Multiple tests with parametrize
Although we have written tests for all functions, they are not exhaustive. More value combinations need testing.
We can write a test with these coordinates.
def test_above_1():
    with patch.object(DistanceISS, 'coordinates_iss', return_value=(1, 1)):
        distance_iss = DistanceISS(1.001, 1.001)
        assert distance_iss.above() == True
And another test with different coordinates.
def test_above_2():
    with patch.object(DistanceISS, 'coordinates_iss', return_value=(1, 1)):
        distance_iss = DistanceISS(0.999, 0.999)
        assert distance_iss.above() == True
Notice the duplicate code: only one line of the body changes, while everything else is repeated.
pytest offers parametrize to test different combinations without repeating code. This introduces the concept of test vectors, which have two parts:
- Input: The function arguments, in our case, our position and the ISS coordinates.
- Expected Output: The expected result, in our case whether the ISS is above us (True or False).
Test vectors continuously verify that given inputs yield expected outputs. With parametrize, we can list many test vectors for the same test.
@pytest.mark.parametrize("lat, lon, iss_lat, iss_lon, above", [
    (40.4, -3.7, 40.4, -3.702, True),
    (34.0, -118.2, 34.001, -118.2, True),
    (51.5, -0.1, 0.4, -3.3, False)
])
def test_distance_iss(lat, lon, iss_lat, iss_lon, above):
    with patch.object(DistanceISS, 'coordinates_iss', return_value=(iss_lat, iss_lon)):
        distance_iss = DistanceISS(lat, lon)
        assert distance_iss.above() == above
Running it generates multiple tests, one for each input.
iss_test.py::test_distance_iss[40.4--3.7-40.4--3.702-True]
iss_test.py::test_distance_iss[34.0--118.2-34.001--118.2-True]
iss_test.py::test_distance_iss[51.5--0.1-0.4--3.3-False]
Using fixtures in pytest
Imagine multiple tests use the same data. In this case, both tests use the same list.
def test_sum():
    assert sum([1, 2, 3, 4, 5]) == 15

def test_length():
    assert len([1, 2, 3, 4, 5]) == 5
This results in duplicate code. If you have 10 functions, even more. If you want to change the data, you must modify it everywhere.
You can create a data variable and use it in all tests. This solution is valid and may suffice in simple cases.
data = [1, 2, 3, 4, 5]

def test_sum():
    assert sum(data) == 15

def test_length():
    assert len(data) == 5
However, pytest offers fixtures for similar functionality with more control. The following code behaves the same, but each test explicitly requests the fixture by naming it as a parameter.
import pytest

@pytest.fixture
def data():
    return [1, 2, 3, 4, 5]

def test_sum(data):
    assert sum(data) == 15

def test_length(data):
    assert len(data) == 5
An interesting property is that, by default, pytest calls the fixture function again for every test. If a test changes or deletes data, nothing happens to the other tests. Each test receives a fresh list, unlike the previous method.
@pytest.fixture
def data():
    return [1, 2, 3, 4, 5]

def test_sum(data):
    assert sum(data) == 15
    # We modify data here ...
    data.append(6)

def test_length(data):
    # ... but data does not change here
    assert len(data) == 5
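The difference between a shared variable and a function-scoped fixture can be reproduced in plain Python. In this sketch, make_data plays the role of the fixture function and mutate plays the role of a test that modifies its data. Both names are illustrative.

```python
shared = [1, 2, 3, 4, 5]  # module-level data shared by every test


def make_data():
    """Plays the role of a function-scoped fixture: a new list per call."""
    return [1, 2, 3, 4, 5]


def mutate(data):
    """Plays the role of a test that modifies the data it receives."""
    data.append(6)
    return len(data)


# With shared data, the next "test" sees the previous one's change.
mutate(shared)
shared_len_seen_by_next_test = len(shared)

# With a factory, the next "test" starts from a fresh list.
mutate(make_data())
fresh_len_seen_by_next_test = len(make_data())

print(shared_len_seen_by_next_test)  # 6
print(fresh_len_seen_by_next_test)   # 5
```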
Returning to our ISS example, we can use fixtures for the response and for the DistanceISS instance as follows.
@pytest.fixture
def response():
    return {
        'iss_position': {'latitude': 0, 'longitude': 0},
        'message': 'success',
        'timestamp': 1596563200
    }

@pytest.fixture
def iss():
    return DistanceISS(0, 0)

def test_above_with_fixtures(response, iss):
    with patch('iss.requests.get') as mock_get:
        mock_get.return_value.json.return_value = response
        mock_get.return_value.raise_for_status = lambda: None
        assert iss.above() == True
Finally, you can configure the scope of a fixture as follows.
@pytest.fixture(scope="session")
def data():
    return [1, 2, 3, 4, 5]

def test_sum(data):
    assert sum(data) == 15
    # We modify data here ...
    data.append(6)

def test_length(data):
    # ... and since it is a session fixture,
    # the change made in test_sum is visible here as well
    assert len(data) == 6
The scope determines when the fixture is recreated. By default, function is used, meaning each test function receives a new fixture value.
Other options include class, module, package, and session. With session, a single fixture value is created for the entire test session. If a test modifies it, the change affects the remaining tests, which also makes them depend on execution order.
Other pytest features
pytest has a wide variety of plugins to extend its functionality. Some features are built in, others require installation.
- skip (built in)
- timeout (pytest-timeout)
- flaky (pytest-rerunfailures)
- freeze time (pytest-freezegun)
- benchmark (pytest-benchmark)
To install them:
pip install pytest-timeout
pip install pytest-rerunfailures
pip install pytest-freezegun
pip install pytest-benchmark
skip: Skip a test so it does not run. Remember, if a test fails, the solution is not to skip it but to understand and fix the problem.
@pytest.mark.skip(reason="Does not run")
def test_that_does_not_run():
    pass
Running the tests shows which are skipped and why.
iss_test.py::test_that_does_not_run SKIPPED (Does not run)
timeout: Set a maximum time for a test. If the test has not finished within that time, it fails.
In this example, the test would otherwise pass, but it takes 2 seconds and the timeout is 1 second, so it fails.
import time

@pytest.mark.timeout(1)
def test_with_timeout():
    time.sleep(2)
    assert 1 == 1
Running it results in:
iss_test.py::test_with_timeout FAILED
flaky: Mark a test as flaky or unstable if it fails sometimes, and specify the maximum number of reruns. The test is run again after each failure. If it has not passed after all attempts, it fails.
Tests should be deterministic, but sometimes flaky tests are unavoidable.
import random

@pytest.mark.flaky(reruns=10)
def test_flaky():
    die = random.choice([1, 2, 3, 4, 5, 6])
    assert die == 1
Running it shows multiple attempts, up to 10 reruns. If at least one passes, it is considered passed.
iss_test.py::test_flaky RERUN
iss_test.py::test_flaky RERUN
iss_test.py::test_flaky PASSED
1 passed, 2 rerun in 0.09s
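When the flakiness comes from randomness in your own code, an alternative to rerunning is to make the randomness reproducible. The sketch below passes a seeded random.Random generator into the code under test. The names roll_die and rolls, and the seed value, are arbitrary choices for illustration.

```python
import random


def roll_die(rng):
    """Roll a six-sided die using the given random number generator."""
    return rng.randint(1, 6)


def rolls(seed, count=20):
    """Roll the die count times with a generator seeded with seed."""
    rng = random.Random(seed)
    return [roll_die(rng) for _ in range(count)]


def test_rolls_are_reproducible():
    # Two generators with the same seed produce the same sequence.
    assert rolls(seed=1234) == rolls(seed=1234)


def test_rolls_stay_in_range():
    assert all(1 <= value <= 6 for value in rolls(seed=1234, count=1000))


test_rolls_are_reproducible()
test_rolls_stay_in_range()
```

A failure found with a given seed can then be replayed exactly, which is much easier to debug than a test that fails once in a while.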
freeze_time: Freeze the current date and time, similar to a mock.
from datetime import date

@pytest.mark.freeze_time('2023-01-01')
def test_today_date():
    assert date.today() == date(2023, 1, 1)
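Freezing the clock is one way to test date-dependent code. Another, which needs no plugin, is to let the function accept the date as an optional parameter. The function days_until_new_year below is a hypothetical example of that approach, not part of pytest-freezegun.

```python
from datetime import date


def days_until_new_year(today=None):
    """Days from today until the next 1 January (0 on 1 January itself)."""
    if today is None:
        today = date.today()  # real clock, used only when no date is injected
    if (today.month, today.day) == (1, 1):
        return 0
    return (date(today.year + 1, 1, 1) - today).days


print(days_until_new_year(date(2023, 1, 1)))    # 0
print(days_until_new_year(date(2023, 12, 31)))  # 1
print(days_until_new_year(date(2024, 1, 2)))    # 365
```

Tests pass a fixed date, while production code calls the function without arguments and gets the real current date.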
benchmark: Measure test execution time.
def heavy_task():
    return sum(i * i for i in range(100000000))

@pytest.mark.benchmark()
def test_heavy_task_benchmark(benchmark):
    result = benchmark(heavy_task)
    assert result == 333333328333333350000000
The time taken is displayed when the test runs, averaged over multiple runs. In this case, it takes about 4.5 seconds.
Name (time in s)              Min     Max    Mean
-------------------------------------------------
test_heavy_task_benchmark    4.48    4.61    4.54
You can set a condition with an assert so that the test only passes if it takes less than a certain time. In this example, it fails if the mean time exceeds 5 seconds.
@pytest.mark.benchmark()
def test_heavy_task_benchmark(benchmark):
    result = benchmark(heavy_task)
    assert result == 333333328333333350000000
    assert benchmark.stats['mean'] < 5
This protects against future code modifications that may reduce efficiency.
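The large constant asserted in the benchmark can be cross-checked without waiting several seconds. The sum of i * i for i from 0 to n - 1 has the closed form (n - 1) * n * (2n - 1) / 6. The sketch below, with the illustrative helper sum_of_squares, verifies the formula by brute force for small n and then evaluates it for n = 100000000.

```python
def sum_of_squares(n):
    """Closed form for sum(i * i for i in range(n))."""
    return (n - 1) * n * (2 * n - 1) // 6


# Cross-check the formula against brute force for small n.
for n in range(200):
    assert sum_of_squares(n) == sum(i * i for i in range(n))

print(sum_of_squares(100_000_000))  # 333333328333333350000000
```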
Coverage with pytest-cov
A crucial metric in testing is code coverage, the percentage of code executed by at least one test. The goal is often 100% code coverage.
Define an operate function that takes two parameters a and b and performs an addition or subtraction based on operation.
# example.py
def operate(a, b, operation):
    if operation == "sum":
        return a + b
    if operation == "subtraction":
        return a - b
    else:
        raise ValueError(f"Invalid operation: {operation}")
Create a file with a test.
# example_test.py
import pytest
from example import operate

def test_operate():
    assert operate(2, 3, "sum") == 5
Install the pytest-cov package.
pip install pytest-cov
Running tests with --cov provides a coverage report. Here, it is 50%.
pytest --cov=example
# Name         Stmts   Miss  Cover
# --------------------------------
# example.py       6      3    50%
It is 50% because only the sum operation is tested, not the other paths. It is an incomplete test. To achieve 100%, test all paths:
- Addition.
- Subtraction.
- An unrecognized operation.
The following test covers all three paths.
def test_operate():
    assert operate(2, 3, "sum") == 5
    assert operate(2, 3, "subtraction") == -1
    with pytest.raises(ValueError):
        operate(2, 3, "multiplication")
Now, coverage is 100%.
pytest --cov=example
# Name Stmts Miss Cover
# --------------------------------
# example.py 6 0 100%
# --------------------------------
# TOTAL 6 0 100%
Although 100% coverage is ideal, it does not guarantee anything. Code with 100% coverage may still have bugs. A test shows the presence of a bug but not its absence.
Let us prove it. Imagine a mistake where the subtraction operation uses + instead of -. This code is incorrect.
def operate(a, b, operation):
    if operation == "sum":
        return a + b
    if operation == "subtraction":
        return a + b  # <- Mistake. WRONG!
    else:
        raise ValueError(f"Invalid operation: {operation}")
In the test, if 0 is subtracted from 0, the result is 0. The correct result is obtained with incorrect code.
def test_operate():
    assert operate(2, 3, "sum") == 5
    # This assert passes, but there is a bug.
    assert operate(0, 0, "subtraction") == 0
    with pytest.raises(ValueError):
        operate(2, 3, "multiplication")
This test passes, and coverage is 100%. This could give false assurance that the code is correct. But it is not.
The bug can be detected with the following code, a counterexample showing something is wrong.
def test_operate():
    assert operate(5, 1, "subtraction") == 4
It is important to test as many combinations as possible, including positive numbers, negative numbers, zero, decimals, and unexpected inputs.
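As a sketch of what such a spread of combinations could look like, the snippet below runs a small table of vectors against both a correct operate and a deliberately buggy copy. The names VECTORS, failing_vectors and buggy_operate are illustrative, and the chosen values are just one possible selection.

```python
def operate(a, b, operation):
    if operation == "sum":
        return a + b
    if operation == "subtraction":
        return a - b
    raise ValueError(f"Invalid operation: {operation}")


def buggy_operate(a, b, operation):
    """Same as operate, but subtraction mistakenly adds."""
    if operation == "subtraction":
        return a + b
    return operate(a, b, operation)


# (a, b, operation, expected): positives, negatives, zero and decimals.
VECTORS = [
    (2, 3, "sum", 5),
    (-2, -3, "sum", -5),
    (0, 0, "sum", 0),
    (0.5, 0.25, "sum", 0.75),
    (5, 1, "subtraction", 4),
    (1, 5, "subtraction", -4),
    (-1, -1, "subtraction", 0),
    (0.5, 0.25, "subtraction", 0.25),
]


def failing_vectors(func):
    """Return the vectors for which func does not give the expected result."""
    return [v for v in VECTORS if func(*v[:3]) != v[3]]


print(failing_vectors(operate))             # []
print(len(failing_vectors(buggy_operate)))  # 4
```

Every subtraction vector here exposes the bug, whereas the single 0 - 0 case did not.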
To exclude a function from the coverage metric, use the comment # pragma: no cover. The coverage tool will not consider it in the percentage calculation.
def operate(a, b, operation):  # pragma: no cover
    # ...
pytest-cov can also generate HTML coverage reports. When you have hundreds of functions, a browsable interface is helpful.
pytest --cov=example --cov-report=html
Fuzzing with hypothesis
Fuzzing generates numerous random inputs with various values to find cases where behavior is unexpected. Unlike previous tests, fuzzing is automatic.
Automatic generation is advantageous, allowing thousands of combinations. Writing this manually would be tedious.
Fuzzing detects:
- Possible edge cases not considered.
- Performance issues. Some inputs may take too long.
- Violations of properties that should always hold, such as invariants and idempotency. For example, sorting a list and re-sorting it should yield the same result. Reversing a list twice should return the initial list.
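Before turning to a dedicated library, the idea can be sketched by hand with the standard random module. This is a simplified illustration of checking properties on random inputs, not how hypothesis works internally. The names random_list and check_properties are hypothetical.

```python
import random


def random_list(rng, max_len=20):
    """Build a list of random integers with a random length."""
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, max_len))]


def check_properties(trials=1000, seed=0):
    """Return the first list that violates a property, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = random_list(rng)
        # Idempotency: sorting an already sorted list changes nothing.
        if sorted(sorted(xs)) != sorted(xs):
            return xs
        # Round trip: reversing twice gives back the original list.
        if list(reversed(list(reversed(xs)))) != xs:
            return xs
        # Invariant: sorting never changes the length.
        if len(sorted(xs)) != len(xs):
            return xs
    return None


print(check_properties())  # None
```

A result of None means no counterexample was found in the sampled inputs, which, as with any test, is not a proof of correctness.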
Here is how to use the hypothesis package. First, install it.
pip install hypothesis
Generate an example of an int, a random int. Note that example() is meant for interactive exploration, not for use inside tests.
import hypothesis.strategies as st
print(st.integers().example())
# -14251
Add restrictions, such as between 0 and 100.
print(st.integers(min_value=0, max_value=100).example())
# 95
Generate text as str, including all kinds of characters.
import hypothesis.strategies as st
print(st.text().example())
# (a random Unicode string; the output varies between runs)
For multiple examples:
for _ in range(5):
    print(st.integers().example())
hypothesis can generate almost anything:
- integers: Integers or int.
- text: Text or str.
- floats: Float numbers or float.
- booleans: bool types.
- lists: Lists or list.
- tuples: Tuples or tuple.
- dictionaries: Dictionaries or dict.
- sets: Sets or set.
Now let us focus on integrating hypothesis with pytest.
Define a function that reverses a str, changing abc to cba.
def reverse(s: str) -> str:
    return s[::-1]
A good fuzzing example is verifying that using reverse twice returns the initial value.
The following test generates 10000 random str examples, verifying the property.
from hypothesis import given, settings
from hypothesis import strategies

@settings(max_examples=10000)
@given(strategies.text())
def test_reverse(s):
    assert reverse(reverse(s)) == s
Run it as follows, and it will show statistics of the tested examples. In this case, everything worked correctly.
pytest -v --hypothesis-show-statistics
# 10000 passing examples
# 0 failing examples
# 0 invalid examples
In summary:
- Always write tests, even for simple code. They help verify functionality and protect against future changes.
- Include tests with unexpected inputs to ensure code resilience.
- Use coverage metrics as a guide, but remember that 100% coverage does not guarantee bug-free code.