Link Search Menu Expand Document

Test your code

Testing Image

Tests are code that verify other code is working correctly. They allow you to:

  • βœ… Ensure that your code functions as expected today.
  • πŸ›‘οΈ Safeguard your code against future changes, ensuring it continues to work tomorrow.

Writing tests involves:

  • πŸ“© Demonstrating that given known inputs, the outputs are as expected.
  • ❓ Ensuring that unexpected inputs yield expected outputs. For example, ordering %&? beers in a bar should not cause the bartender to crash, and neither should your program.

Let us explore how to use pytest to protect your code from bugs.

Introduction to testing

The purpose of testing is to detect bugs, which are errors or defects where the code does not behave as expected. The term bug is said to originate from an actual moth found in a computer in 1947, causing it to malfunction.

Consider a sum function. Any code, no matter how simple, should be accompanied by tests to verify its correct behavior. It is as simple as this.

# sum.py
def sum(a, b):
    return a + b

def test_sum():
    assert sum(1, 2) == 3
    assert sum(5, 5) == 10

In test_sum, we verify that:

  • The output is 3 when the input is 1 and 2.
  • The output is 10 when the input is 5 and 5.

This is our first unit test. Now, let us run them. First, install pytest.

pip install pytest

With pytest, you can run the tests. The PASSED message indicates that all asserts were successful.

pytest -v sum.py
# sum.py::test_sum PASSED

It is crucial to push tests to their limits by testing unexpected inputs. For instance, adding 1 and "2" should raise an exception, as these values should not be added together.

import pytest

def test_sum_exception():
    with pytest.raises(TypeError):
        sum(1, "2")

Testing the addition of % and ? might also be necessary. While it may seem nonsensical, our function currently returns %? instead of an error. It is up to you to decide if this is acceptable.

import pytest

def test_sum_exception():
    with pytest.raises(TypeError):
        sum("%", "?")

Generally, more tests are better, but they do not guarantee anything, especially if they are incomplete or do not test enough combinations. Remember:

  • ⚠️ A test can prove the presence of a bug, but not its absence.

If you can write a test that fails, you have proven something does not work. However, if all tests pass, it does not guarantee the code is 100% bug-free. The more comprehensive the tests, the more secure the code.

Tests are also code and are usually separated from the main code. Here are some tips for organizing your tests:

  • πŸ“ Create a tests folder to store all your tests.
  • πŸ“ If a module is named module.py, place its tests in test_module.py.

Before diving into testing, let us clarify some jargon:

  • πŸ› Bug: A defect in the code where something does not work as expected. Tests aim to eliminate them.
  • πŸ“ Test Case: A test that verifies a specific use case, like test_sum.
  • πŸ“š Test Suite: A collection of test cases for better organization.
  • 🎲 Flaky Test: A test that sometimes passes and sometimes fails, exhibiting random behavior. Ideally, there should be none, but they occasionally occur.
  • πŸ“Š Coverage: A metric measuring the percentage of code covered by tests. The goal is 100% coverage, but it does not guarantee anything.
  • πŸŒͺ️ Fuzzing: A testing technique providing random and unexpected inputs to detect bugs, similar to asking a waiter for %&? beers.
  • 🎁 Black Box: A type of testing viewing the system as a black box, where inputs are given, outputs are received, but the internal workings are unknown.
  • πŸ”„ Continuous Integration: A software development practice where every code change is verified by running all tests. If tests fail, the change is not included. This provides immediate feedback and improves code quality.
  • πŸƒ Mock: Used in testing to replace an object with a mock. For example, if testing communications with an aircraft, a mock can simulate the aircraft. That allows you to test the code without the need for an actual aircraft.
  • 🧭 Test Vector: A set of expected inputs and outputs. For example, for the sum function, a test vector could be 5, 3, 8, meaning if the inputs are 5 and 3, the expected output is 8.

The concept of test is broad, with many types:

  • 🧩 Unit: Focus on testing small units of code like functions or classes.
  • πŸ”— Integration: Focus on testing the integration of all system components.
  • ⚑️ Performance: Focus on testing speed and efficiency.

Depending on your program’s complexity, you may not need all these types, but any program should have at least unit tests. Tests are not just for advanced developers; even simple programs benefit from them.

Writing tests is an art. It is often better if the person writing the code does not write the tests, as they may be biased. An outsider with an unbiased perspective is ideal.

A person who thinks outside the box is usually good at writing tests. It is not enough to test the happy path; you must explore rare cases where the code might fail.

One advantage of writing tests as code is that they run automatically, making it easy to verify nothing is broken.

Ideally, code changes should not be supported until all tests pass. Tools like Jenkins or GitHub Actions facilitate this. Until all tests are passing, new changes cannot be added.

A common misconception is that tests guarantee the absence of bugs. They can detect presence but not absence. If not written correctly, bugs can still occur.

With this, we are ready to start writing our first tests with pytest.

Testing with pytest

The pytest package allows you to write and run tests for your code easily. It is a de facto industry standard. If you ever thought you did not have time to write tests, pytest removes that excuse. Write tests. Always. It is very easy.

First, we need something to test. Everything starts with code that has requirements and expected logic. Once we have it, tests ensure these requirements are always met.

For our example, we will write code to detect if the International Space Station (ISS) is flying overhead. Then, we will write tests to ensure it works correctly.

Our project requirements are:

  • πŸ›°οΈ The ISS position is obtained from the API http://api.open-notify.org/iss-now.json.
  • πŸ“ Our position on Earth, with longitude and latitude, is known and passed in the constructor.
  • πŸ“ Knowing our coordinates and those of the ISS, we determine if it is passing overhead.

The API returns the following JSON, including the ISS position, time, and request success status.

{
    "message": "success",
    "iss_position": {
        "longitude": "31.9896",
        "latitude": "-26.6497"
    },
    "timestamp": 1732180867
}

Using this API requires Internet access. We will later see how to make tests independent of external services.

The ISS orbits at a low altitude, about 418 km, circling the Earth every 90 minutes.

With this information, we are ready to write our code. We define a class DistanceISS.

# iss.py
from geopy.distance import geodesic
from math import sqrt
import requests

class DistanceISS:
    URL = "http://api.open-notify.org/iss-now.json"
    T = 0.01

    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon
    
    def coordinates_iss(self):
        try:
            response = requests.get(self.URL)
            response.raise_for_status()
            iss_data = response.json()
            iss_lat = float(iss_data['iss_position']['latitude'])
            iss_lon = float(iss_data['iss_position']['longitude'])
            return iss_lat, iss_lon
        except (requests.RequestException, ValueError, KeyError) as e:
            raise ValueError(f"Error getting ISS position: {str(e)}")
        
    def above(self):
        iss_lat, iss_lon = self.coordinates_iss()
        lat_diff = abs(iss_lat - self.lat)
        lon_diff = abs(iss_lon - self.lon)
        return lat_diff <= self.T and lon_diff <= self.T

The methods are as follows:

  • __init__: Constructor that takes our geographical coordinates of latitude and longitude.
  • coordinates_iss: Uses the API to return the current position of the ISS in latitude and longitude.
  • above: Returns True if the ISS is above us, with a tolerance of T.

We can use it by indicating our coordinates, corresponding to Buenos Aires. If it returns True, the ISS is passing over Buenos Aires.

iss = DistanceISS(-34.6037, -58.3816)
print(iss.above())
# False

Now that we have the code, let us write tests to ensure it works correctly and protect it from future modifications. We start with a simple test for __init__, verifying that the latitude lat and longitude lon are stored correctly.

# iss_test.py
import pytest
from iss import DistanceISS

def test_init():
    distance_iss = DistanceISS(10, 20)
    assert distance_iss.lat == 10
    assert distance_iss.lon == 20

To run the test, use the following command in the terminal.

pytest -v

You will get a report indicating that the test has passed, meaning all asserts meet the expected condition.

iss_test.py::test_init PASSED [100%]

The above command automatically searches for all tests in the folder. To run tests in a specific file, use:

pytest -v iss_test.py

To run a specific test:

pytest -v iss_test.py::test_init

Congratulations, you have your first test and can run it. Now, every time you make a change, you can run it to ensure nothing is broken. But there is more to test.

Next, we write a test for coordinates_iss. We want to verify that we correctly extract the latitude and longitude from the API response. However, there is an external dependency: the API requires Internet access.

A good practice is to use a mock to simulate the API, allowing the test to run without Internet access.

It is important to define the boundaries of what we want to test. In unit tests, we focus on our code. If the API stops working, our test should still work, as the code is fine.

We can mock the API as follows, instructing Python to return a specific value when requests.get is called in the iss module. We indicate None in raise_for_status to signify no issues.

# iss_test.py
import pytest
from unittest.mock import patch
from iss import DistanceISS

def test_coordinates_iss():
    with patch('iss.requests.get') as mock_get:
        mock_get.return_value.json.return_value = {
            'iss_position': {
                'latitude': '-50.0',
                'longitude': '-30.1',
            },
            'message': 'success',
            'timestamp': 1596563200
        }
        mock_get.return_value.raise_for_status = lambda: None
        
        distance_iss = DistanceISS(0, 0)
        assert distance_iss.coordinates_iss() == (-50.0, -30.1)

The test verifies that for the mocked response, the longitude and latitude are correctly extracted. But we cannot trust anyone; we must see what happens when the result is unexpected.

Imagine the API is modified and mistakenly returns incorrect longitude and latitude parameters, such as text instead of numbers. A test should ensure an exception is raised. It may seem obvious, but imagine that the code processes text and returns a valid coordinate pair. This would be dangerous.

def test_coordinates_iss_non_numerics():
    with patch('iss.requests.get') as mock_get:
        mock_get.return_value.json.return_value = {
            'iss_position': {
                'latitude': 'text',
                'longitude': 'text',
            },
            'message': 'success',
            'timestamp': 1596563200
        }
        mock_get.return_value.raise_for_status = lambda: None
        
        distance_iss = DistanceISS(0, 0)
        with pytest.raises(Exception, match="Error"):
            distance_iss.coordinates_iss()

Now imagine the API is offline. We can simulate this with our mock by using raise_for_status with an exception, verifying that coordinates_iss propagates the exception and does not return coordinates.

def test_coordinates_iss_error():
    with patch('iss.requests.get') as mock_get:
        mock_get.return_value.json.return_value = {}
        mock_get.return_value.raise_for_status.side_effect = Exception("Error")
        
        distance_iss = DistanceISS(0, 0)
        with pytest.raises(Exception, match="Error"):
            distance_iss.coordinates_iss()

Let us continue writing tests for above. This function calls coordinates_iss, which uses the API requiring Internet, so we will use a mock.

With this mock, we simulate coordinates_iss returning the coordinates of Madrid.

def test_above():
    with patch.object(DistanceISS, 'coordinates_iss', return_value=(40.4167, -3.7033)):
        iss = DistanceISS(40.4167, -3.7033)
        assert iss.above() == True

At this point, we have several tests to verify our code works correctly. We have covered expected paths and potential issues, ensuring behavior is as desired in all cases.

Multiple tests with parametrize

Although we have written tests for all functions, they are not exhaustive. More value combinations need testing.

We can write a test with these coordinates.

def test_above_1():
    with patch.object(DistanceISS, 'coordinates_iss', return_value=(1, 1)):
        distance_iss = DistanceISS(1.001, 1.001)
        assert distance_iss.above() == True

And another test with different coordinates.

def test_above_2():
    with patch.object(DistanceISS, 'coordinates_iss', return_value=(1, 1)):
        distance_iss = DistanceISS(0.999, 0.999)
        assert distance_iss.above() == True

Notice the duplicate code; only one line changes, but four are repeated.

pytest offers parametrize to test different combinations without repeating code. This introduces the concept of test vectors, which have two parts:

  • πŸ“ Input: The function arguments, in our case, our position and the ISS coordinates.
  • πŸ“ Expected Output: The expected result, the expected distance between our position and the ISS.

Test vectors continuously verify that given inputs yield expected outputs. In parametrize, our test vector includes many combinations.

@pytest.mark.parametrize("lat, lon, iss_lat, iss_lon, above", [
    (40.4, -3.7, 40.4, -3.702, True),
    (34.0, -118.2, 34.001, -118.2, True),
    (51.5, -0.1, 0.4, -3.3, False)
])
def test_distance_iss(lat, lon, iss_lat, iss_lon, above):
    with patch.object(DistanceISS, 'coordinates_iss', return_value=(iss_lat, iss_lon)):
        distance_iss = DistanceISS(lat, lon)
        assert distance_iss.above() == above

Running it generates multiple tests, one for each input.

iss_test.py::test_distance_iss[40.4--3.7-40.4--3.702-True]
iss_test.py::test_distance_iss[34.0--118.2-34.001--118.2-True]
iss_test.py::test_distance_iss[51.5--0.1-0.4--3.3-False]

Using fixtures in pytest

Imagine multiple tests use the same data. In this case, both tests use the same list.

def test_sum():
    assert sum([1, 2, 3, 3, 4, 5]) == 15

def test_length():
    assert len([1, 2, 3, 3, 4, 5]) == 5

This results in duplicate code. If you have 10 functions, even more. If you want to change the data, you must modify it everywhere.

You can create a data variable and use it in all tests. This solution is valid and may suffice in simple cases.

data = [1, 2, 3, 3, 4, 5]

def test_sum():
    assert sum(data) == 15

def test_length():
    assert len(data) == 5

However, pytest offers fixture for similar functionality with more control. The following code behaves the same, but you can pass the fixture to each test individually.

import pytest

@pytest.fixture
def data():
    return [1, 2, 3, 4, 5]

def test_sum(data):
    assert sum(data) == 15

def test_length(data):
    assert len(data) == 5

An interesting property is that pytest maintains the fixture value. If a test changes or deletes data, nothing happens. Each test accesses a separate copy, unlike the previous method.

@pytest.fixture
def data():
    return [1, 2, 3, 4, 5]

def test_sum(data):
    assert sum(data) == 15
    # We modify data here ...
    data.append(6)

def test_length(data):
    # ... but data does not change here
    assert len(data) == 5

Returning to our ISS example, we can use the fixture for the response as follows.

@pytest.fixture
def response():
    return {
        'iss_position': {'latitude': 0, 'longitude': 0},
        'message': 'success',
        'timestamp': 1596563200
    }

@pytest.fixture
def iss():
    return DistanceISS(0, 0)

def test_coordinates_iss_non_numerics(response, iss):
    with patch('iss.requests.get') as mock_get:
        mock_get.return_value.json.return_value = response
        mock_get.return_value.raise_for_status = lambda: None
        
        assert iss.above() == True

Finally, you can configure the scope of a fixture as follows.

@pytest.fixture(scope="session")
def data():
    return [1, 2, 3, 4, 5]

def test_sum(data):
    assert sum(data) == 15
    # We modify data here ...
    data.append(6)

def test_length(data):
    # ... and since it is a session fixture,
    # it is modified here as well
    assert len(data) == 5

The scope determines when it is recreated. By default, function is used, meaning each function receives a new fixture.

Other options include session, creating a single fixture for the entire session. If modified in a test, it affects the rest.

Other pytest features

pytest has a wide variety of plugins to extend its functionality. Some are default, others require installation.

  • ⏭️ skip
  • ⏳ timeout
  • ⚠️ flaky
  • ⏱️ freeze time
  • πŸ“ benchmark

To install them:

pip install pytest-timeout
pip install pytest-rerunfailures
pip install pytest-freezegun
pip install pytest-benchmark

⏭️ skip: Skip a test so it does not run. Remember, if a test fails, the solution is not to skip it but to understand and fix the problem.

@pytest.mark.skip(reason="Does not run")
def test_that_does_not_run():
    pass

Running the tests shows which are skipped and why.

iss_test.py::test_that_does_not_run SKIPPED (Does_not_run)

⏳ timeout: Set a maximum time for tests. If not finished within that time, it fails.

In this example, the test should pass, but it takes 2 seconds, and the timeout is 1 second, so it fails.

import time
@pytest.mark.timeout(1)
def test_with_timeout():
    time.sleep(2)
    assert 1 == 1

Running it results in:

iss_test.py::test_with_timeout FAILED

⚠️ flaky: Mark a test as flaky or unstable, if it fails sometimes. Specify the maximum retry attempts. This will try to run the test multiple times. If not passed after all attempts, it fails.

Tests should be deterministic, but sometimes flaky tests are unavoidable.

import random

@pytest.mark.flaky(reruns=10)
def test_flaky():
    die = random.choice([1, 2, 3, 4, 5, 5, 6])
    assert die == 1

Running it shows multiple attempts, up to 10. If at least one passes, it is considered passed.

iss_test.py::test_flaky RERUN
iss_test.py::test_flaky RERUN
iss_test.py::test_flaky PASSED
1 passed, 2 rerun in 0.09s

⏱️ freeze_time: Freeze the current date, similar to a mock.

@pytest.mark.freeze_time('2023-01-01')
def test_today_date():
    assert date.today() == date(2023, 1, 1)

πŸ“ benchmark: Measure test execution time.

def heavy_task():
    return sum(i * i for i in range(100000000))

@pytest.mark.benchmark()
def test_heavy_task_benchmark(benchmark):
    result = benchmark(heavy_task)
    assert result == 333333328333333350000000

The time taken is displayed when the test runs, averaged over multiple runs. In this case, it takes about 4.4 seconds.

Name (time in s) Min Max Mean
----------------------------------------------
test_heavy_task_benchmark 4.48 4.61 4.54

You can set a condition with an assert where the test only passes if it takes less than a certain time. In this example, it fails if it takes more than 5 seconds.

@pytest.mark.benchmark()
def test_heavy_task_benchmark(benchmark):
    result = benchmark(heavy_task)
    assert result == 333333328333333350000000
    assert benchmark.stats['mean'] < 5

This protects against future code modifications that may reduce efficiency.

Coverage with pytest-cov

A crucial metric in testing is code coverage, indicating the percentage of code covered by at least one test. The goal is 100% code coverage.

Define an operate function that takes two parameters a and b and performs an addition or subtraction based on the operation.

# example.py
def operate(a, b, operation):
    if operation == "sum":
        return a + b
    if operation == "subtraction":
        return a - b
    else:
        raise ValueError(f"Invalid operation: {operation}")

Create a file with a test.

# example_test.py
import pytest
from example import operate

def test_operate():
    assert operate(2, 3, "sum") == 5

Install the pytest-cov package.

pip install pytest-cov

Running tests with --cov provides a coverage report. Here, it is 50%.

pytest --cov=example
# Name Stmts Miss Cover
# --------------------------------
# example.py 6 3 50%

It is 50% because only the sum operation is tested, not other paths. It is an incomplete test. To achieve 100%, test all paths:

  • βž• Addition.
  • βž– Subtraction.
  • ⚠️ An unrecognized operator.

The following test covers all three paths.

def test_operate():
    assert operate(2, 3, "sum") == 5
    assert operate(2, 3, "subtraction") == -1
    with pytest.raises(ValueError):
        operate(2, 3, "multiplication")

Now, coverage is 100%.

pytest --cov=example
# Name Stmts Miss Cover
# --------------------------------
# example.py 6 0 100%
# --------------------------------
# TOTAL 6 0 100%

Although 100% coverage is ideal, it does not guarantee anything. Code with 100% coverage may still have bugs. A test shows the presence of a bug but not its absence.

Let us prove it. Imagine a mistake where the operation subtract uses + instead of -. This code is incorrect.

def operate(a, b, operation):
    if operation == "sum":
        return a + b
    if operation == "subtraction":
        return a + b # <- Mistake. WRONG!
    else:
        raise ValueError(f"Invalid operation: {operation}")

In the test, if 0 and 0 are subtracted, the result is 0. The correct result is achieved with incorrect code.

def test_operate():
    assert operate(2, 3, "sum") == 5
    
    # This assert passes, but there is a bug.
    assert operate(0, 0, "subtraction") == 0

This test passes, and coverage is 100%. This could falsely assure that the code is correct. But it is not.

The bug can be detected with the following code, a counterexample showing something is wrong.

def test_operate():
    assert operate(5, 1, "subtraction") == 4

It is important to test as many combinations as possible, including positive numbers, negative numbers, zero, decimals, and unexpected inputs.

To exclude a function from the coverage metric, use the comment # pragma: no cover. Python will not consider it in the percentage calculation.

def operate(a, b, operation): # pragma: no cover
    # ...

pytest generates interesting coverage reports. When you have hundreds of functions, a nice user interface to navigate is helpful.

pytest --cov=example --cov-report=html

Fuzzing with hypothesis

Fuzzing generates numerous random inputs with various values to find cases where behavior is unexpected. Unlike previous tests, fuzzing is automatic.

Automatic generation is advantageous, allowing thousands of combinations. Writing this manually would be tedious.

Fuzzing detects:

  • 🎯 Possible edge cases not considered.
  • ⚑ Performance issues. Some inputs may take too long.
  • πŸ”’ Inconsistencies, known as invariants and idempotency. For example, sorting a list and re-sorting it should yield the same result. Reversing a list twice should return the initial list.

Here is how to use the hypothesis package. First, install it.

pip install hypothesis

Generate an example of an int, a random int.

import hypothesis.strategies as st
print(st.integers().example())
# -14251

Add restrictions, such as between 0 and 100.

print(st.integers(min_value=0, max_value=100).example())
# 95

Generate text in str, including all kinds of letters.

import hypothesis.strategies as st
print(st.text().example())
# ÈΒͺΒͺ󫦬aûæ4c

For multiple examples:

for _ in range(5):
    print(st.integers().example())

hypothesis can generate almost anything:

  • integers: Integers or int.
  • text: Text or str.
  • floats: Float numbers.
  • booleans: bool types.
  • lists: Lists or list.
  • tuples: Tuples or tuple.
  • dictionaries: Dictionaries or dict.
  • sets: Sets or set.

Now let us focus on integrating hypothesis with pytest.

Define a function that reverses an str, changing abc to cba.

def reverse(s: str) -> str:
    return s[::-1]

A good fuzzing example is verifying that using reverse twice returns the initial value.

The following test generates 10000 random str examples, verifying the property.

from hypothesis import given, settings
from hypothesis import strategies

@settings(max_examples=10000)
@given(strategies.text())
def test_reverse(s):
    assert reverse(reverse(s)) == s

Run it as follows, and it will show statistics of the tested examples. In this case, everything worked correctly.

pytest -v --hypothesis-show-statistics
# 10000 passing examples
# 0 failing examples
# 0 invalid examples

In summary:

  • Always write tests, even for simple code. They help verify functionality and protect against future changes.
  • Include tests with unexpected inputs to ensure code resilience.
  • Use coverage metrics as a guide, but remember that 100% coverage does not guarantee bug-free code.