Introduction¶
Contents
Simple example¶
Note
You can execute any code below from the code folder using the regular python shell or from inside an IPython session or Jupyter notebook. In such a case, you might want to use the magic command %timeit instead of the custom one I wrote.
NumPy is all about vectorization. If you are familiar with Python, this is the main difficulty you’ll face because you’ll need to change your way of thinking and your new friends (among others) are named “vectors”, “arrays”, “views” or “ufuncs”.
Let’s take a very simple example, random walk. One possible object oriented approach would be to define a RandomWalker class and write a walk method that would return the current position after each (random) step. It’s nice, it’s readable, but it is slow:
Object oriented approach
class RandomWalker:
def __init__(self):
self.position = 0
def walk(self, n):
self.position = 0
for i in range(n):
yield self.position
self.position += 2*random.randint(0, 1) - 1
walker = RandomWalker()
walk = [position for position in walker.walk(1000)]
Benchmarking gives us:
>>> from tools import timeit
>>> walker = RandomWalker()
>>> timeit("[position for position in walker.walk(n=10000)]", globals())
10 loops, best of 3: 15.7 msec per loop
Procedural approach
For such a simple problem, we can probably save the class definition and concentrate only on the walk method that computes successive positions after each random step.
def random_walk(n):
position = 0
walk = [position]
for i in range(n):
position += 2*random.randint(0, 1)-1
walk.append(position)
return walk
walk = random_walk(1000)
This new method saves some CPU cycles but not that much because this function is pretty much the same as in the object-oriented approach and the few cycles we saved probably come from the inner Python object-oriented machinery.
>>> from tools import timeit
>>> timeit("random_walk(n=10000)", globals())
10 loops, best of 3: 15.6 msec per loop
Vectorized approach
But we can do better using the itertools Python module that offers a set of functions creating iterators for efficient looping. If we observe that a random walk is an accumulation of steps, we can rewrite the function by first generating all the steps and accumulate them without any loop:
def random_walk_faster(n=1000):
from itertools import accumulate
# Only available from Python 3.6
steps = random.choices([-1,+1], k=n)
return [0]+list(accumulate(steps))
walk = random_walk_faster(1000)
In fact, we’ve just vectorized our function. Instead of looping for picking sequential steps and add them to the current position, we first generated all the steps at once and used the accumulate function to compute all the positions. We got rid of the loop and this makes things faster:
>>> from tools import timeit
>>> timeit("random_walk_faster(n=10000)", globals())
10 loops, best of 3: 2.21 msec per loop
We gained 85% of computation-time compared to the previous version, not so bad. But the advantage of this new version is that it makes NumPy vectorization super simple. We just have to translate itertools call into NumPy ones.
def random_walk_fastest(n=1000):
# No 's' in NumPy choice (Python offers choice & choices)
steps = np.random.choice([-1,+1], n)
return np.cumsum(steps)
walk = random_walk_fastest(1000)
Not too difficult, but we gained a factor 500x using NumPy:
>>> from tools import timeit
>>> timeit("random_walk_fastest(n=10000)", globals())
1000 loops, best of 3: 14 usec per loop
This book is about vectorization, be it at the code or problem level. We’ll see this difference is important before looking at custom vectorization.
Readability vs speed¶
Before heading to the next chapter, I would like to warn you about a potential problem you may encounter once you’ll have become familiar with NumPy. It is a very powerful library and you can make wonders with it but, most of the time, this comes at the price of readability. If you don’t comment your code at the time of writing, you won’t be able to tell what a function is doing after a few weeks (or possibly days). For example, can you tell what the two functions below are doing? Probably you can tell for the first one, but unlikely for the second (or your name is Jaime Fernández del Río and you don’t need to read this book).
def function_1(seq, sub):
return [i for i in range(len(seq) - len(sub) +1) if seq[i:i+len(sub)] == sub]
def function_2(seq, sub):
target = np.dot(sub, sub)
candidates = np.where(np.correlate(seq, sub, mode='valid') == target)[0]
check = candidates[:, np.newaxis] + np.arange(len(sub))
mask = np.all((np.take(seq, check) == sub), axis=-1)
return candidates[mask]
As you may have guessed, the second function is the vectorized-optimized-faster-NumPy version of the first function. It is 10 times faster than the pure Python version, but it is hardly readable.