
Yield and generators

Generators and yield were introduced in PEP 255. Calling a generator function returns a lazy iterator. Next, we will see what lazy means.

A lazy iterator does not store its values in memory; it produces them “on the fly”, hence the name. Let’s see two examples:

  • [...]: Creates a list of 1000 elements. This is NOT lazy.
  • (...): Creates a generator of 1000 elements. This IS lazy.
not_lazy = [i for i in range(1000)] # NOT lazy
is_lazy = (i for i in range(1000))  # IS lazy

At first glance, it might seem that they are the same since the result of the following code is identical.

for i in not_lazy: print(i)
for i in is_lazy: print(i)

However, they are two different things. The type of the first is list, and that of the second is generator.

print(type(not_lazy)) # <class 'list'>
print(type(is_lazy))  # <class 'generator'>

We can also see that they occupy different amounts of memory. Here we have the first advantage of lazy iterators: they hardly occupy any memory.

import sys
print(f"a: {sys.getsizeof(not_lazy)} bytes")
# a: 8856 bytes

print(f"b: {sys.getsizeof(is_lazy)} bytes")
# b: 112 bytes
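
Note that the generator’s size does not depend on how many elements it will produce. As a quick sketch (exact byte counts vary across Python versions), a generator over a billion elements is just as small:

big = (i for i in range(10**9))
print(sys.getsizeof(big))  # Roughly the same small, fixed size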

But there are disadvantages. The generator cannot be indexed.

print(not_lazy[50]) # 50
print(is_lazy[50])  # TypeError
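
If you do need the element at a given position, itertools.islice can skip ahead to it; note that it still consumes every element along the way. A minimal sketch:

from itertools import islice

is_lazy = (i for i in range(1000))      # Fresh generator
print(next(islice(is_lazy, 50, None)))  # 50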

The generator cannot be iterated twice. Once exhausted, it yields nothing more.

is_lazy = (i for i in range(1000))  # Fresh generator
for i in is_lazy: pass              # First pass consumes everything
for i in is_lazy: print(i)          # Prints nothing: already exhausted
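
If you need to traverse the same values more than once, you can either recreate the generator or duplicate it with itertools.tee, which buffers the values for several independent iterators. A minimal sketch:

from itertools import tee

g1, g2 = tee(i for i in range(3))
print(list(g1))  # [0, 1, 2]
print(list(g2))  # [0, 1, 2]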

Having seen the differences, let’s get down to practicalities. Generators produce values on demand, which saves resources: as you have seen, a generator consumes a small, fixed amount of memory. The trade-off is that the values are not stored anywhere; each one is computed only when it is needed.
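
You can make this on-demand behavior visible with a small sketch; the print() inside exists only to show when each value is computed:

def noisy():
    for i in range(3):
        print(f"computing {i}")
        yield i

g = noisy()     # Nothing is computed yet
print(next(g))  # Prints "computing 0", then 0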

This is very useful when you work with very large data and files. When you open a file with the open() function, you are not loading the whole file into memory; the file object behaves like a lazy iterator over its lines. This allows you to iterate the file while each line is loaded only as it is needed.
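
For example, assuming a hypothetical large file data.txt, you can process it line by line without ever holding the whole file in memory:

# data.txt is a hypothetical file used for illustration
with open("data.txt") as f:
    for line in f:  # Each line is read lazily, one at a time
        print(line.strip())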

We have seen how to create a generator with parentheses, using a generator expression (the lazy sibling of a list comprehension), but you can also create one using yield.

def my_generator():
    yield 0
    yield 1
    yield 2

As you can see, calling it returns a generator.

print(type(my_generator()))
# <class 'generator'>

And you can use it as follows.

for i in my_generator():
    print(i)
# 0, 1, 2

Since a generator is an iterator, we can manually access the next element with next().

g = my_generator()
print(next(g)) # 0
print(next(g)) # 1
print(next(g)) # 2

When you get to the end, you will get a StopIteration exception. That is, on the fourth call to next().

print(next(g)) # StopIteration
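
If you prefer not to handle the exception, next() accepts a default value to return once the generator is exhausted:

print(next(g, None))  # None, instead of raising StopIteration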

For example, you can create a prime number generator. First, we define is_prime to find out if a number is prime.

def is_prime(n: int) -> bool:
    return n > 1 and all(n % i != 0 for i in range(2, int(n**0.5) + 1))

And we create our prime number generator. Notice that we use yield. Whenever you think of generators, think of yield.

def primes(n):
    num, counter = 2, 0
    while counter < n:
        if is_prime(num):
            yield num  # Execution pauses here until the next value is requested
            counter += 1
        num += 1

In this example, we generate the first 5 prime numbers. The advantage of this is that they are generated as needed.

for p in primes(5):
    print(p)
# 2, 3, 5, 7, 11
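
Since values are produced on demand, nothing forces you to fix a limit in advance. A sketch of an unbounded variant, trimmed with itertools.islice:

from itertools import islice

def primes_forever():
    num = 2
    while True:
        if is_prime(num):
            yield num
        num += 1

print(list(islice(primes_forever(), 5)))  # [2, 3, 5, 7, 11]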

In addition, many functions are generator-friendly, such as sum(), any(), all(), min(), max(), map(), filter(), and enumerate(), among others. This is very useful since they can work with a generator’s values without loading all of its elements into memory.
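
Some of these functions even short-circuit, stopping the generator as soon as the answer is known. For example, any() below stops as soon as it reaches 101 (the 26th prime) instead of producing all 30:

print(any(p > 100 for p in primes(30)))  # True; stops at 101
print(max(primes(10)))                   # 29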

For example, we can add the first 100000 prime numbers with sum() using our generator.

print(sum(primes(100000)))
# 62260698721

Finally, we can use a generator expression as we have seen above, but feeding it from another generator. An example:

  • We generate the first 100 prime numbers.
  • We square each one with x**2.
  • We add them all with sum().
x = sum(x**2 for x in primes(100))
print(x)
# 8384727