Itertools is a Python module with a set of tools for building complex, customized iterators that suit your application exactly. But first, we need to know what iterators are. Let’s have a look.
By the end of this tutorial, you will know the following:
- Difference between iterators and iterables
- Advantages of the itertools module
- Various functions in the itertools module
- Code-along examples for using the functions
What are iterators and iterables?
Iterators and iterables mean clearly different things in Python. In very simple terms, iterables are objects you can iterate over. For example, lists, tuples, and dictionaries are Python’s built-in iterable container data types.
So, when we want to iterate over these iterables, we write a “for loop” or a list/dictionary comprehension. Behind the scenes, Python creates what we call iterators, which are the objects actually responsible for the “iteration”.
Strictly speaking, iterables are objects that implement the __iter__ method, which returns an iterator. The iterator, in turn, implements the __next__ method. This method fetches the next item from the iterable and keeps track of the current position. When no items are left, __next__ raises StopIteration.
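To see these two methods at work, here is a minimal sketch that drives the iteration by hand. It uses the built-in iter() and next() functions, which call __iter__ and __next__ under the hood. This is roughly what a for loop does for you. The list and variable names are just for illustration.

```python
numbers = [10, 20, 30]   # a list is an iterable

it = iter(numbers)       # calls numbers.__iter__() and returns an iterator
print(next(it))          # calls it.__next__() -> 10
print(next(it))          # -> 20
print(next(it))          # -> 30

try:
    next(it)             # the iterator is exhausted now
except StopIteration:
    print("StopIteration: no more items")

# An iterator is itself iterable: calling iter() on it returns the same object
print(iter(it) is it)    # True
```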
What is the Itertools module?
The “iter” in itertools refers to iterators. The module offers a set of functions, each designed for a specific task, that can also be combined to build more complex iterators.
You can think of these as efficient building blocks that make our lives easier, because we don’t have to write functions for common, repetitive tasks by hand. You can also combine several of them into an “iterator algebra” for your own use cases.
With these functions, you can write clean, efficient, and smart code instead of messy, less efficient code.
Now, let’s look at the functions it offers. Once you see what they can do, you can start using itertools in your day-to-day coding and in competitive coding as well!
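As a small taste of that “iterator algebra”, the sketch below chains three itertools functions covered later in this tutorial:

- count() produces 1, 2, 3, …
- accumulate() turns that stream into running sums.
- islice() takes just the first five values, so the infinite stream is never built in full.

```python
from itertools import accumulate, count, islice

# count(1)         -> 1, 2, 3, ...           (infinite)
# accumulate(...)  -> 1, 3, 6, 10, ...       (still infinite, computed lazily)
# islice(..., 5)   -> stop after the first five values
triangular = list(islice(accumulate(count(1)), 5))
print(triangular)  # [1, 3, 6, 10, 15]
```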
Before getting to the first itertools function, let’s look at Python’s built-in zip() function. It is not part of the itertools module, but it is a very smart and handy function that returns an iterator.
The zip function takes multiple iterables, such as lists, tuples, or dictionaries (for a dictionary, its keys), and iterates over all of them in parallel. That is, you can perform the same operation on the elements at the same index of each iterable.
```python
a = [1, 2, 3]
b = [4, 5, 6]
c = []

# zip pairs up elements at the same index: (1, 4), (2, 5), (3, 6)
for x, y in zip(a, b):
    c.append(x + y)

print(c)
# Output: [5, 7, 9]
```
As you saw, the zip function literally zipped the two lists and operated on the elements at the same indices. But what if the number of elements was not the same in the lists?
zip_longest() is an itertools function that lets you zip multiple iterables of different lengths. Plain zip stops as soon as the shortest iterable is exhausted, so the remaining elements of the longer iterables are ignored. Let’s see that problem first, and then how zip_longest solves it.
```python
a = [1, 2, 3]
b = [4, 5, 6, 7, 8]
c = []

# zip stops when the shorter list, a, runs out
for x, y in zip(a, b):
    c.append(x + y)

print(c)
# Output: [5, 7, 9]
```
Do you see it? It ignored values 7 and 8.
Now let’s see how zip_longest works.
```python
from itertools import zip_longest

a = ['A', 'B', 'C']
b = ['4', '5', '6', '7', '8', '9']
c = []

# Missing values from the shorter list a are replaced by 'X'
for x, y in zip_longest(a, b, fillvalue='X'):
    c.append(x + y)

print(c)
# Output: ['A4', 'B5', 'C6', 'X7', 'X8', 'X9']
```
The fillvalue argument specifies the value Python uses in place of the missing elements of the shorter iterables. Here, 'X' stands in for the elements that a does not have.
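When fillvalue is omitted, zip_longest pads with None. That is fine when you only collect tuples, but it would break arithmetic or string concatenation like the loop above: None + '7' raises a TypeError. Here is a minimal sketch with arbitrary sample lists.

```python
from itertools import zip_longest

a = [1, 2]
b = [10, 20, 30]

# Without fillvalue, missing positions are filled with None
pairs = list(zip_longest(a, b))
print(pairs)  # [(1, 10), (2, 20), (None, 30)]

# A numeric fillvalue such as 0 keeps the addition working
sums = [x + y for x, y in zip_longest(a, b, fillvalue=0)]
print(sums)  # [11, 22, 30]
```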
The accumulate() function takes an iterable and, optionally, a function that combines two values. It applies that function to the running result and the next element, and yields every intermediate result. Sounds confusing? Let’s have a look at the code.
```python
from itertools import accumulate
from operator import mul

A = [1, 2, 3, 4, 5]

# Running product: each output is mul(previous output, next element)
print(list(accumulate(A, mul)))
# Output: [1, 2, 6, 24, 120]
```
So accumulate took the list A and a function from the operator module which multiplies two numbers. Then it iterates and produces output in the following manner:
- 1 is the first element, so it is returned as is: 1
- 1 multiplied by 2 is 2
- 2 multiplied by 3 is 6
- 6 multiplied by 4 is 24
- 24 multiplied by 5 is 120
See how the outputs accumulate? If you don’t pass a function, accumulate adds the elements, so by default you get running sums. You can also write your own function that takes two inputs and combines them, or pass a lambda function directly.
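Here is a short sketch of all three options with arbitrary sample data: the default behaviour, a lambda, and a custom function. The name subtract is just for illustration.

```python
from itertools import accumulate

A = [1, 2, 3, 4, 5]

# No function passed: accumulate adds, giving running sums
running_sums = list(accumulate(A))
print(running_sums)  # [1, 3, 6, 10, 15]

# A lambda works too; here each output is the largest value seen so far
readings = [3, 1, 4, 1, 5, 9, 2]
running_max = list(accumulate(readings, lambda x, y: max(x, y)))
print(running_max)  # [3, 3, 4, 4, 5, 9, 9]

# A custom two-argument function (a hypothetical running difference)
def subtract(x, y):
    return x - y

running_diff = list(accumulate(A, subtract))
print(running_diff)  # [1, -1, -4, -8, -13]
```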
The filterfalse function does exactly what its name suggests: it filters out the elements for which a condition is true. Only the elements for which the condition is false end up in the output.
```python
from itertools import filterfalse

A = range(10)

# Keep only the numbers for which "x is even" is False
print(list(filterfalse(lambda x: x % 2 == 0, A)))
# Output: [1, 3, 5, 7, 9]
```
As you see, our condition checks for even numbers. filterfalse returns only the elements for which that condition is false, that is, the odd numbers.
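filterfalse is the mirror image of the built-in filter(). Given the same predicate, the two split an iterable into complementary parts. Here is a short comparison with arbitrary sample data.

```python
from itertools import filterfalse

A = range(10)

def is_even(x):
    return x % 2 == 0

evens = list(filter(is_even, A))      # keeps items where the predicate is True
odds = list(filterfalse(is_even, A))  # keeps items where the predicate is False
print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]

# With None as the predicate, filterfalse keeps the falsy items themselves
falsy = list(filterfalse(None, [0, 1, '', 'a', None, []]))
print(falsy)  # [0, '', None, []]
```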
The starmap function takes a function and an iterable whose elements are themselves iterables (for example, a list of tuples). Like map, it applies the function to each element, but it first unpacks the element into separate arguments.
The difference between map and starmap is, literally, a “star”. What does this star do?
map passes each element of the iterable to the function as a single argument. That becomes a problem when each element is a container, such as a tuple, that holds several arguments. Let’s understand this with an example.
```python
lis = [(2, 2), (3, 2), (4, 2)]

# map calls pow((2, 2)) with a single tuple argument, which fails
print(list(map(pow, lis)))
```
The above code tries to map the built-in function pow, which takes two arguments x and y and returns x**y. However, map passes each tuple, such as (2, 2), to pow as one single argument. pow never receives its second argument, so it raises a TypeError like the one below (the exact wording varies between Python versions):
>> TypeError: pow expected at least 2 arguments, got 1
Here’s where starmap comes in handy. It takes tuples within the list and unpacks them.
```python
from itertools import starmap

lis = [(2, 2), (3, 2), (4, 2)]

# starmap unpacks each tuple, so it calls pow(2, 2), pow(3, 2), pow(4, 2)
print(list(starmap(pow, lis)))
# Output: [4, 9, 16]
```
It worked just as we wanted. In other words, map calls function(c) for each element c, whereas starmap calls function(*c), where the * unpacks each inner tuple into separate arguments.
Apart from these iterator functions, itertools also offers functions such as chain(), chain.from_iterable(), groupby(), islice(), compress(), dropwhile(), takewhile(), and tee().
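map can also call a two-argument function if you hand it two separate iterables, one per argument. The sketch below shows three equivalent ways to get the same squares:

- starmap on a list of tuples
- map on two parallel iterables
- map with a lambda that unpacks each tuple by hand

The names bases and exps are illustrative.

```python
from itertools import starmap

lis = [(2, 2), (3, 2), (4, 2)]

# starmap unpacks each tuple: pow(*(2, 2)) -> pow(2, 2)
via_starmap = list(starmap(pow, lis))

# map with two iterables calls pow(base, exp) pairwise
bases = [2, 3, 4]
exps = [2, 2, 2]
via_map = list(map(pow, bases, exps))

# map with a lambda that does the unpacking itself
via_lambda = list(map(lambda t: pow(*t), lis))

print(via_starmap, via_map, via_lambda)  # [4, 9, 16] [4, 9, 16] [4, 9, 16]
```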
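These functions are not covered in detail in this tutorial. A quick sketch of what each one returns may still help you decide which to read up on. The inputs are arbitrary sample data.

```python
from itertools import chain, compress, dropwhile, groupby, islice, takewhile, tee

# chain: iterate over several iterables one after another
chained = list(chain([1, 2], (3, 4), 'ab'))
print(chained)  # [1, 2, 3, 4, 'a', 'b']

# chain.from_iterable: same idea, but takes a single iterable of iterables
flat = list(chain.from_iterable([[1, 2], [3], []]))
print(flat)  # [1, 2, 3]

# islice: slice any iterator with (start, stop, step), like list slicing
sliced = list(islice(range(100), 2, 10, 3))
print(sliced)  # [2, 5, 8]

# compress: keep items whose matching selector is truthy
picked = list(compress('ABCDE', [1, 0, 1, 0, 1]))
print(picked)  # ['A', 'C', 'E']

# takewhile / dropwhile: split at the first item that fails the predicate
data = [1, 3, 5, 6, 7, 9]
taken = list(takewhile(lambda x: x % 2 == 1, data))
dropped = list(dropwhile(lambda x: x % 2 == 1, data))
print(taken, dropped)  # [1, 3, 5] [6, 7, 9]

# groupby: groups *consecutive* equal items (sort first for global groups)
groups = [(key, len(list(grp))) for key, grp in groupby('aaabbc')]
print(groups)  # [('a', 3), ('b', 2), ('c', 1)]

# tee: split one iterator into n independent iterators
first, second = tee(iter([1, 2, 3]), 2)
copies = (list(first), list(second))
print(copies)  # ([1, 2, 3], [1, 2, 3])
```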
The itertools module also has a set of iterators that can keep generating values until you stop them. They won’t exhaust your RAM, because they generate one value at a time instead of computing everything at once. This means they can produce an infinite stream of data. Let’s have a quick look at them:
count(start, step) generates numbers indefinitely, starting from start and increasing by step (the defaults are 0 and 1).
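```python
import itertools

# count(1, 2) yields 1, 3, 5, 7, ... forever, so stop the loop explicitly
for i in itertools.count(1, 2):
    if i > 9:
        break
    print(i)
# Output: 1 3 5 7 9 (one number per line)
```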
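cycle(iterable) returns the elements of the iterable over and over again, forever.
```python
import itertools

# cycle('ABC') yields A, B, C, A, B, C, ... forever, so stop after 7 items
for n, letter in enumerate(itertools.cycle('ABC')):
    if n == 7:
        break
    print(letter, end=' ')
# Output: A B C A B C A
```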
repeat(elem, times) returns elem the specified number of times. If you leave out times, it repeats elem forever.
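```python
import itertools

# repeat(10, 4) yields 10 exactly four times, then stops
for i in itertools.repeat(10, 4):
    print(i)
# Output: 10 10 10 10 (one number per line)
```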
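Because these iterators are lazy, a common pattern is to pair them with a finite iterable that decides when to stop. This minimal sketch uses arbitrary sample data:

- count() numbers the items, much like enumerate(..., start=1).
- repeat() supplies a constant second argument to map().

```python
import itertools

fruits = ['apple', 'banana', 'cherry']

# zip stops at the shortest input, so the infinite count() is safe here
numbered = list(zip(itertools.count(1), fruits))
print(numbered)  # [(1, 'apple'), (2, 'banana'), (3, 'cherry')]

# repeat(2) without a count is infinite; map stops when range(5) runs out
squares = list(map(pow, range(5), itertools.repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```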
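Apart from the above iterators, itertools also offers functions that generate the Cartesian product, permutations, and combinations of the inputs provided. These come in very handy during competitive programming too. In product, the repeat argument sets how many times the input is repeated, so product('AB', repeat=2) is the same as product('AB', 'AB'). In the other three functions, the second argument sets the length of each output tuple.
```python
from itertools import combinations, combinations_with_replacement, permutations, product

# Each function yields tuples; ''.join turns ('A', 'B') into 'AB' for readability
print([''.join(t) for t in product('AB', repeat=2)])                  # ['AA', 'AB', 'BA', 'BB']
print([''.join(t) for t in permutations('ABC', 2)])                   # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']
print([''.join(t) for t in combinations('ABC', 2)])                   # ['AB', 'AC', 'BC']
print([''.join(t) for t in combinations_with_replacement('ABC', 2)])  # ['AA', 'AB', 'AC', 'BB', 'BC', 'CC']
```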
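For n input items and tuples of length r, the number of results each function yields follows the usual counting formulas:

- product: n^r
- permutations: n!/(n-r)!
- combinations: C(n, r)
- combinations_with_replacement: C(n+r-1, r)

Here is a quick check with arbitrary sample input. It assumes Python 3.8+ for math.perm and math.comb.

```python
import math
from itertools import combinations, combinations_with_replacement, permutations, product

items = 'ABCDE'  # n = 5 sample items
n, r = len(items), 3

# name -> (number of tuples actually generated, number predicted by the formula)
counts = {
    'product': (len(list(product(items, repeat=r))), n ** r),
    'permutations': (len(list(permutations(items, r))), math.perm(n, r)),
    'combinations': (len(list(combinations(items, r))), math.comb(n, r)),
    'combinations_with_replacement': (
        len(list(combinations_with_replacement(items, r))),
        math.comb(n + r - 1, r),
    ),
}

for name, (generated, formula) in counts.items():
    print(f'{name}: {generated} generated, {formula} expected')
```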
Before you go
That’s it for this tutorial. Practice these examples, see how the functions work, and then write more complex examples of your own to get a better handle on them. Itertools is one of the handiest tools you can add to your Python skill set.
Chayan is a creative Data Scientist with an eye for detail. An everyday learner and blogger, he is eager to share knowledge and support the Data Science community. Connect with him on LinkedIn to get in touch, and don’t forget to check out his Medium blogs.
Data Science | Machine Learning | Tech Blogger – upGrad