Asynchronous programming with aiohttp
In this example, we will demonstrate how to make asynchronous requests to an external API. Specifically, we will use the GitHub API to retrieve the number of followers for different accounts. These requests can be made in two ways:
- Synchronous: We send a request and do not send the next one until the previous one has completed. This is usually the slowest method because we sit idle while waiting for the server's response.
- Asynchronous: We send all the requests at once and wait for the responses. This is generally faster because the requests are in flight concurrently, so the waiting overlaps.
The simplest way is to perform the requests synchronously using `requests`. We make an HTTP request to the API and extract the `followers` field. Error handling is simplified here, so you may want to handle exceptions based on your specific needs.
```python
import requests

url_base = "https://api.github.com/users/{}"

def get_followers(user):
    # Build the URL for this user and fetch their profile
    url = url_base.format(user)
    r = requests.get(url)
    # The JSON response includes a "followers" field with the count
    return user, r.json()["followers"]
```
Now, we can retrieve the followers for three different users. You can access the same content by entering the following URLs in your browser:
- api.github.com/users/python
- api.github.com/users/google
- api.github.com/users/firebase
```python
for user in ["python", "google", "firebase"]:
    user, followers = get_followers(user)
    print(f"Followers of {user}: {followers}")

# Followers of python: 22840
# Followers of google: 46406
# Followers of firebase: 4113
```
However, this solution processes all requests sequentially, which is inefficient. It performs a request, waits, then performs the next request, and so on. This approach would be very slow with hundreds of users.
To improve efficiency, we can perform the requests asynchronously using `asyncio` and `aiohttp`. This allows us to launch multiple requests at once and then wait for the responses. This method is more efficient because the requests are in flight concurrently instead of one after another.
```python
import asyncio
import aiohttp

url_base = "https://api.github.com/users/{}"
users = ["python", "google", "firebase"]

async def get_followers(session, user):
    url = url_base.format(user)
    # Awaiting the request lets other tasks run while we wait for the server
    async with session.get(url) as r:
        return user, int((await r.json())["followers"])

async def get_all(users):
    # A single session is shared so connections can be reused across requests
    async with aiohttp.ClientSession() as s:
        tasks = [get_followers(s, u) for u in users]
        # gather runs all tasks concurrently and preserves input order
        return await asyncio.gather(*tasks)
```
```python
results = asyncio.run(get_all(users))
for user, followers in results:
    print(f"Followers of {user}: {followers}")
```
If you try both ways, you will see that the asynchronous approach is faster. The larger the `users` list, the more noticeable the difference.
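If you want to measure the gap yourself, here is a minimal timing sketch using `time.perf_counter`. It assumes both versions live in the same script and that the `requests`-based function has been renamed to the hypothetical `get_followers_sync`, so it does not clash with the coroutine of the same name:

```python
import time

# Time the synchronous version (get_followers_sync is a hypothetical
# name for the requests-based function defined earlier)
start = time.perf_counter()
for user in users:
    get_followers_sync(user)
print(f"Synchronous:  {time.perf_counter() - start:.2f} s")

# Time the asynchronous version
start = time.perf_counter()
asyncio.run(get_all(users))
print(f"Asynchronous: {time.perf_counter() - start:.2f} s")
```

With only three users the gap is small; the synchronous time grows roughly linearly with the list, while the asynchronous time stays close to that of the slowest single request.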
Exercises:
- Modify `get_all` to process a maximum of `MAX_TASKS` requests at a time. This may be necessary because launching millions of asynchronous functions at once is not advisable if you have millions of users (one possible approach is sketched below).
- Handle all exceptions that may occur in the asynchronous case. For example, if a request fails, manage it in a way that does not disrupt the entire process.
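As a starting point for both exercises, here is a minimal sketch, not the only solution. It assumes a `MAX_TASKS` constant and a hypothetical `get_followers_safe` wrapper: `asyncio.Semaphore` caps the number of requests in flight, and a failed request yields `None` instead of breaking the whole batch:

```python
import asyncio
import aiohttp

url_base = "https://api.github.com/users/{}"
MAX_TASKS = 10  # assumed cap on concurrent requests; tune to your needs

async def get_followers_safe(session, semaphore, user):
    # Hypothetical name, to distinguish it from get_followers above
    url = url_base.format(user)
    # At most MAX_TASKS coroutines can hold the semaphore at once
    async with semaphore:
        try:
            async with session.get(url) as r:
                r.raise_for_status()
                return user, int((await r.json())["followers"])
        except (aiohttp.ClientError, asyncio.TimeoutError, KeyError, ValueError):
            # A failed or malformed response yields None instead of
            # crashing the whole batch
            return user, None

async def get_all(users):
    semaphore = asyncio.Semaphore(MAX_TASKS)
    async with aiohttp.ClientSession() as s:
        tasks = [get_followers_safe(s, semaphore, u) for u in users]
        return await asyncio.gather(*tasks)
```

An alternative to catching exceptions inside each task is `asyncio.gather(*tasks, return_exceptions=True)`, which places exceptions in the result list instead of raising the first one.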