(One)-minute geek news

Group talks, Laboratoire de Chimie, ENS de Lyon

Python profiling


Sometimes, you want your Python (Python 3, obviously) scripts to be efficient (and yes, it is possible!). To improve performance, you first need a tool to measure it.

IPython console

Before we go any further, let me introduce you to the IPython console. You can evaluate your code's performance without it, but it will be far less convenient. Let me convince you to use the IPython console (ipython3 command) instead of the default console (python3 command) for your Python experiments. (Note: I am talking about using an interactive Python console. Your scripts should still be executed with a python3 my_script.py command.)

The IPython (Interactive Python) console offers features such as:

  • Simpler syntax:
    run my_script.py arg1
    exit
    instead of
    import sys
    sys.argv = ['my_script.py', 'arg1'] # trick to use arguments
    with open('my_script.py') as file:
        exec(file.read())
    exit()
  • Smarter auto-completion (on module names, and paths).
  • Convenient help and function definition display (e.g. time.time? or np.dot? instead of help(time.time) or np.info(np.dot))
  • Better history manipulation (especially with multiline commands), and editing shortcuts (Ctrl-r, Ctrl-Up, automatic indentation, ...).
  • Basic bash command support (ls, cd, rm, ... within the Python console!)
  • Last but not least: magic commands!
Try the ipython3 command in your terminal (install it first if needed: sudo apt-get install ipython3), and thank me later!
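If you are stuck with the plain python3 console, the sys.argv/exec trick from the list above can also be written with the standard library's runpy module. Here is a minimal sketch: the script content and the temporary directory are made up for the demo, and only the my_script.py and arg1 names come from the example above.

```python
import runpy
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical script content, written to a temporary file for the demo
    script = Path(tmp) / "my_script.py"
    script.write_text("import sys\nargs = sys.argv[1:]\n")

    # Same trick as above: fake the command-line arguments
    saved_argv = sys.argv
    sys.argv = [str(script), "arg1"]
    try:
        # Run the file as if it were started with "python3 my_script.py arg1";
        # run_path returns the globals of the executed script
        namespace = runpy.run_path(str(script), run_name="__main__")
    finally:
        sys.argv = saved_argv  # restore the real arguments

print(namespace["args"])
```

Compared to exec, the variables of the script end up in the returned dictionary instead of leaking into your session.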

Advice for intermediate users: use the Ctrl-o shortcut to enable multiline editing, or use Ctrl-q Ctrl-j to add the first newline. It comes in pretty handy in some cases, especially for timing measurements...

Basic timing

It is time to try one of those magic commands: %timeit. With this, timing is as easy as:

import numpy as np
%timeit A = np.random.rand(1000, 1000)
And it is even better than you expected: it computes statistics over multiple executions. How many executions? By default, the number of loops is chosen automatically so that the whole experiment lasts at least about a second!
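Outside IPython (in a script run with python3, for instance), the standard library's timeit module gives a similar measurement. The following is only a sketch in the spirit of %timeit, not its actual implementation; build_squares is a made-up stand-in workload, so that the example does not depend on NumPy.

```python
import timeit

def build_squares(n=1000):
    # Stand-in workload (an assumption for this sketch)
    return [i * i for i in range(n)]

timer = timeit.Timer(build_squares)

# autorange() picks a loop count so that one run lasts at least 0.2 s
loops, _ = timer.autorange()

# Repeat the whole measurement a few times
runs = timer.repeat(repeat=5, number=loops)

# Each entry is the total time for `loops` calls: convert to time per call
per_call = [t / loops for t in runs]
best = min(per_call)
print(f"{loops} loops, best of {len(runs)}: {best * 1e6:.1f} microseconds per loop")
```

Taking the minimum over the repeats is the usual advice with timeit: slower runs mostly reflect interference from the rest of the system rather than your code.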

Line by line profiling

Basic timings are fine for simple experiments, but when dealing with scripts or non-trivial functions, one needs a more precise tool. This is where profiling comes in. Line-by-line profiling basically means timing each line of your program. And with the line_profiler module, it is quite convenient!

First, let us install the module with pip3 install line_profiler (if you do not have root access on your machine, simply add the --user option to your pip3 commands: pip3 install --user line_profiler).

Now, profiling is as easy as:

%load_ext line_profiler
%lprun -f sub_function_to_profile [-f sub_function_2 ...] your_command

Let us see it in action with a complete test:

import numpy as np


# Function definitions

def matrix_operations(A, B):
    A.sort(axis=-1)  # Sort each row separately, in place: O(n² log(n))
    for _ in range(3):
        B = np.dot(A, B)  # Matrix-vector product: O(n²)

    return B

def some_algebra(n):
    A = np.random.rand(n, n)  # Create an n×n random matrix (uniform)
    B = np.random.rand(n)  # Create a random vector (size n)

    C = matrix_operations(A, B)

    return C


# Profiling (to be run in an IPython console)

%load_ext line_profiler
%lprun -f matrix_operations some_algebra(500)
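If you do not know yet which function deserves a -f, a function-level profile can point you to it. The sketch below uses the standard library's cProfile module, which is a different tool from line_profiler: it times whole functions, not lines. To keep it self-contained, the two functions are pure-Python stand-ins for the NumPy versions above (lists instead of arrays), so the numbers will not match the real test.

```python
import cProfile
import io
import pstats
import random

def matrix_operations(A, B):
    for row in A:
        row.sort()  # Sort each row separately, in place
    for _ in range(3):
        # Matrix-vector product written with plain lists
        B = [sum(a * b for a, b in zip(row, B)) for row in A]
    return B

def some_algebra(n):
    A = [[random.random() for _ in range(n)] for _ in range(n)]
    B = [random.random() for _ in range(n)]
    return matrix_operations(A, B)

# Profile one call; runcall returns whatever the function returns
profiler = cProfile.Profile()
result = profiler.runcall(some_algebra, 200)

# Write the report to a string, sorted by cumulative time, top 8 entries
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(8)
report = buffer.getvalue()
print(report)
```

The functions with the largest cumulative time are the natural candidates to pass to %lprun -f afterwards.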

Memory usage profiling

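If you want to monitor/profile your memory usage, just use the memory_profiler module (pip3 install memory_profiler), and the associated %mprun magic command. Note that %mprun only works on functions defined in a file and imported, not on functions typed directly into the console: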

%load_ext memory_profiler
%mprun -f sub_function_to_profile your_command
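For a quick check without installing anything, the standard library's tracemalloc module can report the memory allocated by Python objects. This is a different tool from memory_profiler (it does not produce the same line-by-line table), and build_table is a made-up stand-in workload, so take it as a rough sketch only.

```python
import tracemalloc

def build_table(n):
    # Stand-in workload (an assumption): allocate an n x n table of floats
    return [[float(i * j) for j in range(n)] for i in range(n)]

tracemalloc.start()
table = build_table(300)
# Memory currently traced, and the peak since start(), both in bytes
current, peak = tracemalloc.get_traced_memory()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"current: {current / 1e6:.2f} MB, peak: {peak / 1e6:.2f} MB")

# Source lines responsible for the largest allocations still alive
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```

Keep in mind that tracemalloc only sees allocations made through Python's memory allocator, so the figures can differ from what memory_profiler reports for the whole process.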

References

help(line_profiler)
help(memory_profiler)