Comprehensions in Python: The hard way

November 23, 2019

<rant>

The internet has exactly a gazillion articles on what python comprehensions are, what their use cases are, and the fact that they are faster¹. Most of them go as far as saying they’re Advanced Python features, along with map, lambda etc. How can you freaking say that about built-in, well documented feature of one of the most popular languages of the current time? I mean just how??

But… I digress. Coming back to the topic at hand, I’ve unfortunately, I’ve found exactly zero articles on why these are faster than the standard Python loops.

</rant>

So, adding my hopefully better thoughts to the sea of those gazillion articles, let’s deep dive into whatever the hell they are. I’m just going to cover the basics of a list comprehension with a couple of use-cases and then deep-dive into dissecting the code. If you understand everything towards the end of this article, I’m sure you’ll be able to write complex comprehensions. If not, please feel free to comment and I’ll be happy to help you out!

Introduction: Whatever the hell are comprehensions

Let’s start with a question here first:

If I have a list in Python with first integer element, are all the elements integer? If you know this, you’ll almost know why simple for loops are slow.

Okay, so you have a simple boring old list of integers, say:

le_ints = [1, 4, 8, 10, 1231421]  # I know, that escalated quickly

And we want another simple boring list that has 1 added to these integers, what’d we do? Easy peasy!

la_ints = []
for i in le_ints:
    la_ints.append(i + 1)

Well, not really, no!

We’re working with one of the most popular languages of the century, which begs the conclusion that

And there is Joey! And here it is, the lovely list comprehension:

la_ints = [i + 1 for i in le_ints]

I know, so beautiful, so easy, so blissful, so… zen, right?

Now, this is the very basic syntax, but it supports conditionals and can obviously be used with any python expression as long as you wrap it up within that single line.

Some more use-cases

Let’s just go over the other interesting things you can do with comprehensions - no deep dive, just the very basics. Refer to the other gazillion articles to see everything else you can do.

Conditional comprehensions

Create a list such that even numbers are divided by 2 & odd numbers are kept as is

what_are_the_odds = [i/2 if i%2 == 0 else i for i in source_ints]

Flattening a 2D array

flat_earth_society = [i for j in source_2d_array for i in j]

Transposing a 2D matrix

transponster = [[row[i] for row in source_2d_ints] for i in range(0, len(source_2d_ints))]

That’s all cool, but why are they faster?

Let’s get the very obvious bits out of the way - comprehensions are NOT simple syntactic sugars. This fact should be crystal clear since you now know they’re faster, which implies they work differently behind the scenes. If you don’t know this already, python is built on top of C - the language most of computer science graduates learn in their sophomore year. C itself is a lot faster than Python due to a myriad of reasons (which is out of scope of this blog). Coming back, the short answer to why comprehensions are faster is simple - using list comprehensions pushes the loop execution from the Python interpreter into compiled C code. Let’s see how exactly that happens - time to actually talk about the actual advanced stuff. We’ll be using the Python bytecode disassembler dis.

Note: The things we talk about in this section are for Python 3.8 and might look different for other versions.

The test functions

Let’s look at the ~~hyphothetical lab rats~~ functions we’ll use to show what’s happening in the lower layers

The basic implementation

This is the basic implementation using a boring old for loop - nothing too crazy so far.

1
2
3
4
5
6
7
8
9


def le_basic_fun():
    """
    A basic function that creates a list of integers using another list.
    Adds 1 to all the integers using a basic for loop
    """
    la_ints = []
    for i in range(10**7):
        la_ints.append(i + 1)
    return la_ints

Now, let’s looks at the disassembled bits to understand what exactly happens behind the scenes. A quick guide to understand what each column below means: 1. the line number, for the first instruction of each line 2. the current instruction, indicated as –>, 3. a labelled instruction, indicated with >>, 4. the address of the instruction, 5. the operation code name, 6. operation parameters, and 7. interpretation of the parameters in parentheses.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24


In [6]: dis.dis(le_basic_fun)
  6           0 BUILD_LIST               0
              2 STORE_FAST               0 (la_ints)

  7           4 SETUP_LOOP              30 (to 36)
              6 LOAD_GLOBAL              0 (range)
              8 LOAD_CONST               1 (10000000)
             10 CALL_FUNCTION            1
             12 GET_ITER
        >>   14 FOR_ITER                18 (to 34)
             16 STORE_FAST               1 (i)

  8          18 LOAD_FAST                0 (la_ints)
             20 LOAD_METHOD              1 (append)
             22 LOAD_FAST                1 (i)
             24 LOAD_CONST               2 (1)
             26 BINARY_ADD
             28 CALL_METHOD              1
             30 POP_TOP
             32 JUMP_ABSOLUTE           14
        >>   34 POP_BLOCK

  9     >>   36 LOAD_FAST                0 (la_ints)
             38 RETURN_VALUE

I’ll cover the opcodes relevant to the problem at hand briefly, but if you want to know about all the opcodes, check out the documentation of dis (unintended pun, I promise :D).

All of this might not make sense right away, so bear with me here. Let’s remember the fact that function calls add significant overhead². The first call function is the call to get the evaluate range, so let’s ignore that as it’ll stay the same across different test subjects. There are a couple of things you need to look at here:

On Line 3 (instruction address 2), there’s the variable la_ints is initialized
On Line 10 (instruction address 14), is where the real fun begins: it’s where the for loop executes (between addresses 18 to 34)
On Line 13 (instruction address 18), the variable la_ints is loaded followed by it’s method append on Line 14 / address 20

Basically, while executing line 8, through the range chosen, the function has to load the variable followed by it’s append method - this is the effort our loop is doing during all the iterations to fully qualify our append method for the list. We can obviously cache the append method, but unfortunately, it doesn’t scale well for large for loops. I do talk about this approach here.

Let’s move to the comprehensive way of doing this

The list comprehension implementation

Now, let’s look at how list comprehension will work behind the scenes.

1
2
3
4
5
6


def le_comprehensive_fun():
    """
    A function that creates a list of integers using another list.
    Adds 1 to all the integers using a list comprehension
    """
    return [i + 1 for i in range(10**7)]

And the disassembly of the comprehension

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22


In [8]: dis.dis(le_comprehensive_fun)
  6           0 LOAD_CONST               1 (<code object <listcomp> ...)
              2 LOAD_CONST               2 ('le_comprehensive_fun.<locals>.<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (range)
              8 LOAD_CONST               3 (10000000)
             10 CALL_FUNCTION            1
             12 GET_ITER
             14 CALL_FUNCTION            1
             16 RETURN_VALUE

Disassembly of <code object <listcomp> at ...>:
  6           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                12 (to 18)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LOAD_CONST               0 (1)
             12 BINARY_ADD
             14 LIST_APPEND              2
             16 JUMP_ABSOLUTE            4
        >>   18 RETURN_VALUE

As you see we still have a for loop, just like the basic approach - but there’s no temporary variable, no method resolution for append method. Add to that the fact that there’s lesser stack management (no POP_TOP, POP_BLOCK etc.) since there’s a single scope. We save up on little memory too, since there’s no temporary variable involved now - but it does help when you have a more complicated list of elements. The real fun happens at instruction 14 here - our CPython interpreter uses in the built-in low-level optimized function call to C.

Speed comparisons

With all the disassembly we’ve done here - it begs the question. How much of a difference does it really make? Here’s a simple timeit comparison of the two implementations side-by-side.

Cranking it up a notch: Memory

Yes, the advanced ain’t over yet! We’ve just seen why list comprehensions are fun and faster! Unfortunately, the approaches above will definitely lead to memory leaks when you’re working with complex objects and bigger datasets. If you’re working on a data science application, this is definitely gonna drive your infrastructure bills since you’ll need to have enough memory for processing them objects. For the experienced folks out there, you know where we’re going - it’s time to generate (:D) better solutions.

I present to you, generator expessions:

1
2
3
4
5


def le_generated_fun():
    """
    A function that returns a generator for a list of integers based on another list
    """
    return (x+1 for x in range(10**6))

What this returns to you is a generator, which is a function that returns an iterator - basically something you can loop over. If you’re not familiar with generators, I urge you to read up on them - it’s an important tool to have in your arsenal.

Just another small tidbit before I wrap this up, you might have the thought that list comprehensions are syntactic sugars for generator expressions wrapped in a list. This is not correct. Basically,

la_ints = [i + 1 for i in le_ints]

is not equivalent to

la_ints = list(i + 1 for i in le_ints)

Since the generator expression will use extend method internally to add iterator elements. Check out the disassembly to see how they’re different.

References

Appendix

As a sidenote, 81.52% of facts on the internet are also false
Function call overhead comparision

Caching the append method

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


    def le_mediocre_fun():
        """
        A function that creates a list of integers using another list.
        Appends 1 to all the integers using a basic for loop
        with a cached append function call
        """
        la_ints = []
        la_appender = la_ints.append
        for i in range(10**6):
            la_appender(i + 1)
        return la_ints
    

Here’s what the disassembly looks like for this implementation

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29


    In [7]: dis.dis(le_mediocre_fun)
      7           0 BUILD_LIST               0
                  2 STORE_FAST               0 (la_ints)

      8           4 LOAD_FAST                0 (la_ints)
                  6 LOAD_ATTR                0 (append)
                  8 STORE_FAST               1 (la_appender)

      9          10 SETUP_LOOP              28 (to 40)
                 12 LOAD_GLOBAL              1 (range)
                 14 LOAD_CONST               1 (1000000)
                 16 CALL_FUNCTION            1
                 18 GET_ITER
            >>   20 FOR_ITER                16 (to 38)
                 22 STORE_FAST               2 (i)

     10          24 LOAD_FAST                1 (la_appender)
                 26 LOAD_FAST                2 (i)
                 28 LOAD_CONST               2 (1)
                 30 BINARY_ADD
                 32 CALL_FUNCTION            1
                 34 POP_TOP
                 36 JUMP_ABSOLUTE           20
            >>   38 POP_BLOCK

     11     >>   40 LOAD_FAST                0 (la_ints)
                 42 RETURN_VALUE
    ```