Asynchronous programming with aiohttp
In this example, we will demonstrate how to make asynchronous requests to an external API. Specifically, we will use the GitHub API to retrieve the number of followers for different accounts. These requests can be made in two ways:
- Synchronous: We send a request and do not send the next one until the previous one has completed. This is usually the slowest method because we sit idle while waiting for the server's response.
- Asynchronous: We send all the requests at once and wait for the responses. This is generally faster because the requests are in flight concurrently, so the waiting overlaps.
The simplest way is to perform the requests synchronously using `requests`. We make an HTTP request to the API and extract the `followers` field. Error handling is simplified here, so you may want to handle exceptions based on your specific needs.
```python
import requests

url_base = "https://api.github.com/users/{}"

def get_followers(user):
    # Build the URL for this user and fetch their profile
    url = url_base.format(user)
    r = requests.get(url)
    # The JSON response includes a "followers" field with the count
    return user, r.json()["followers"]
```
Now, we can retrieve the followers for three different users. You can access the same content by entering the following URLs in your browser:
- api.github.com/users/python
- api.github.com/users/google
- api.github.com/users/firebase
```python
for user in ["python", "google", "firebase"]:
    user, followers = get_followers(user)
    print(f"Followers of {user}: {followers}")

# Followers of python: 22840
# Followers of google: 46406
# Followers of firebase: 4113
```
However, this solution processes all requests sequentially, which is inefficient. It performs a request, waits, then performs the next request, and so on. This approach would be very slow with hundreds of users.
To improve efficiency, we can perform the requests asynchronously using `asyncio` and `aiohttp`. This allows us to launch multiple requests at once and then wait for the responses. This method is more efficient because the requests are in flight concurrently instead of one after another.
```python
import asyncio
import aiohttp

url_base = "https://api.github.com/users/{}"
users = ["python", "google", "firebase"]

async def get_followers(session, user):
    url = url_base.format(user)
    # Awaiting the request lets other tasks run while we wait for the server
    async with session.get(url) as r:
        return user, int((await r.json())["followers"])

async def get_all(users):
    # A single session is shared so connections can be reused across requests
    async with aiohttp.ClientSession() as s:
        tasks = [get_followers(s, u) for u in users]
        # gather runs all tasks concurrently and preserves input order
        return await asyncio.gather(*tasks)
```
```python
results = asyncio.run(get_all(users))
for user, followers in results:
    print(f"Followers of {user}: {followers}")
```
If you try both ways, you will see that the asynchronous approach is faster. The larger the `users` list, the more noticeable the difference.
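If you want to measure the gap yourself, here is a minimal timing sketch using `time.perf_counter`. It assumes both versions live in the same script and that the `requests`-based function has been renamed to the hypothetical `get_followers_sync`, so it does not clash with the coroutine of the same name:

```python
import time

# Time the synchronous version (get_followers_sync is a hypothetical
# name for the requests-based function defined earlier)
start = time.perf_counter()
for user in users:
    get_followers_sync(user)
print(f"Synchronous:  {time.perf_counter() - start:.2f} s")

# Time the asynchronous version
start = time.perf_counter()
asyncio.run(get_all(users))
print(f"Asynchronous: {time.perf_counter() - start:.2f} s")
```

With only three users the gap is small; the synchronous time grows roughly linearly with the list, while the asynchronous time stays close to that of the slowest single request.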
Exercises:
- Modify `get_all` to process a maximum of `MAX_TASKS` requests at a time. This may be necessary because launching millions of asynchronous functions at once is not advisable if you have millions of users (one possible approach is sketched below).
- Handle all exceptions that may occur in the asynchronous case. For example, if a request fails, manage it in a way that does not disrupt the entire process.
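As a starting point for both exercises, here is a minimal sketch, not the only solution. It assumes a `MAX_TASKS` constant and a hypothetical `get_followers_safe` wrapper: `asyncio.Semaphore` caps the number of requests in flight, and a failed request yields `None` instead of breaking the whole batch:

```python
import asyncio
import aiohttp

url_base = "https://api.github.com/users/{}"
MAX_TASKS = 10  # assumed cap on concurrent requests; tune to your needs

async def get_followers_safe(session, semaphore, user):
    # Hypothetical name, to distinguish it from get_followers above
    url = url_base.format(user)
    # At most MAX_TASKS coroutines can hold the semaphore at once
    async with semaphore:
        try:
            async with session.get(url) as r:
                r.raise_for_status()
                return user, int((await r.json())["followers"])
        except (aiohttp.ClientError, asyncio.TimeoutError, KeyError, ValueError):
            # A failed or malformed response yields None instead of
            # crashing the whole batch
            return user, None

async def get_all(users):
    semaphore = asyncio.Semaphore(MAX_TASKS)
    async with aiohttp.ClientSession() as s:
        tasks = [get_followers_safe(s, semaphore, u) for u in users]
        return await asyncio.gather(*tasks)
```

An alternative to catching exceptions inside each task is `asyncio.gather(*tasks, return_exceptions=True)`, which places exceptions in the result list instead of raising the first one.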