Common performance optimizations
We’ll quickly cover a number of common performance optimizations in Python. Most of these deal with heavy-duty number crunching, and not all of them come with worked examples.
Use optimized libraries like Numpy wherever possible
Libraries like Numpy and Pandas are built on hyper-optimized C/C++/Fortran code. Using these libraries will almost always be faster than using pure-Python equivalents:
import numpy as np

%%timeit
array = np.arange(1000000)
array = array ** 2
100 loops, best of 3: 4.62 ms per loop
%%timeit
slow = range(1000000)
slow = list(map(lambda x: x ** 2, slow))
1 loop, best of 3: 589 ms per loop
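Pandas benefits in the same way. As a hedged sketch (the DataFrame and column name here are made up for illustration), a vectorized column operation is usually far faster than applying a Python lambda element by element:

import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(1000000)})

# Vectorized: the whole column is squared inside optimized native code.
fast = df["x"] ** 2

# Element by element: a Python-level function call for every value.
slow = df["x"].apply(lambda x: x ** 2)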
Hard-disk no-nos
Writing to the hard disk will always be slower than keeping objects in memory. There are no exceptions.
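As a rough, hedged sketch (the file name and data size are arbitrary), you can see this by timing a write to an in-memory buffer against the same write to a file on disk:

import io
import time

data = "x" * 10000000  # roughly 10 MB of text

# Write to an in-memory buffer.
start = time.perf_counter()
buffer = io.StringIO()
buffer.write(data)
print("memory:", time.perf_counter() - start)

# Write the same data to a file on disk.
start = time.perf_counter()
with open("scratch.txt", "w") as f:
    f.write(data)
print("disk:", time.perf_counter() - start)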
Also, wherever possible, prefer one big write to disk over lots of little writes; batching the output will almost always be faster.
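A hedged sketch of that second point (the file names and line count are made up): building the output in memory and writing it once is usually much faster than calling write() once per line.

lines = ["row %d\n" % i for i in range(100000)]

# Lots of little writes: one write() call per line.
with open("many_writes.txt", "w") as f:
    for line in lines:
        f.write(line)

# One big write: join everything in memory, then write it once.
with open("one_write.txt", "w") as f:
    f.write("".join(lines))

Timing each block with %%timeit, as above, should show the batched version winning.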
numba and JIT-compiled code
numba is a JIT compiler for Python.
A JIT compiler is a special type of compiler that compiles normally slow code from an interpreted language like Python into ultra-fast machine code for your CPU right before it runs.
Using numba is rather easy. First, we import it:
from numba import jit
Then we place the @jit decorator above any functions we want to be compiled (I modified this bubble sort implementation from the numba documentation):
def bubblesort(Y):
X = Y.copy()
N = len(X)
for end in range(N, 1, -1):
for i in range(end - 1):
cur = X[i]
if cur > X[i + 1]:
tmp = X[i]
X[i] = X[i + 1]
X[i + 1] = tmp
return X
@jit
def bubblesort_jit(Y):
X = Y.copy()
N = len(X)
for end in range(N, 1, -1):
for i in range(end - 1):
cur = X[i]
if cur > X[i + 1]:
tmp = X[i]
X[i] = X[i + 1]
X[i + 1] = tmp
return X
randomized = np.arange(1000)
np.random.shuffle(randomized)  # shuffles in place; np.random.shuffle returns None
Now let's see the performance difference between the two versions:
%timeit bubblesort(randomized)
1 loop, best of 3: 313 ms per loop
%timeit bubblesort_jit(randomized)
1000 loops, best of 3: 1.52 ms per loop
The @jit version was considerably faster: roughly 200x in this run.
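One hedged caveat: numba compiles a @jit function the first time it is called, so that first call pays the compilation cost. A common pattern is to call the function once on a small input before benchmarking it:

# Warm-up call: triggers compilation so the one-time cost stays out of the timings.
bubblesort_jit(np.arange(10))

# Subsequent calls run the already-compiled machine code.
%timeit bubblesort_jit(randomized)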