Introduction to Cython
In this article I will give you a brief introduction to Cython, a Python library that enables interaction with C code. It works in both directions: you can use C code from Python or call Python code from C.
Sounds magical? Well, it is not as complicated as it seems but let's dig a bit deeper and see what we can achieve with it.
Cython is a library and must not be confused with CPython, the reference implementation of the Python programming language. Cython converts Python code into C code, which can then be compiled and linked against the CPython runtime.
To install this library use the following command:
pip install cython
The idea behind Cython is to make Python faster by converting it to C/C++ code, and to integrate existing C/C++ code into your Python code. Optimising Python code can be worthwhile because Python is an interpreted language: interpretation adds overhead, and function calls are comparatively expensive. Another reason can be the GIL, the Global Interpreter Lock, which allows only one thread at a time to execute Python bytecode in CPython. This means that CPU-intensive tasks do not benefit from multithreading (for more information on this topic take a look at [this article](http://www.discoversdk.com/blog/parallel-processing-in-python)).
The drawback of using Cython is that you have to know how to code in C, although the language itself is a superset of Python.
As a side note, what I really like about Cython is that when you compile some code and it fails, you end up with a .c file with content similar to this:
#error Do not use this file, it is the result of a failed Cython compilation.
A simple example
After this introduction it is time to take a look at a simple example. I have to tell you right at the beginning that I will provide only a basic example to give you a brief glimpse of how Cython code is written and how it works.
The code we will convert to Cython code is the following snippet:
__author__ = 'GHajba'

def factors(n, result):
    if n <= 1:
        return result
    for i in range(2, n + 1):
        if n % i == 0:
            result.append(i)
            return factors(n // i, result)

def main():
    numbers = range(1, 50001)
    for n in numbers:
        factors(n, [])

if __name__ == '__main__':
    main()

As you can see, this code is really trivial and could be optimised on its own: the function factors calculates the prime factors of a given number, and the whole program calculates the factors of the numbers from 1 to 50000 inclusive.
I saved this code to the file factorizer.pyx. This is valid Python 3 code, so we can run it and measure its runtime with Python 3:

GHajba$ time python3.5 factorizer.pyx
real 0m26.515s
user 0m26.369s
sys 0m0.073s
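To make the behaviour of factors concrete, here is a small worked example in plain Python. It reuses the function from the snippet above; the sample inputs (1, 7, 12 and 360) are my own choice for illustration and are not part of the original benchmark.

```python
def factors(n, result):
    """Append the prime factors of n to result and return it."""
    if n <= 1:
        return result
    for i in range(2, n + 1):
        if n % i == 0:
            # i is the smallest divisor of n, so it is prime.
            result.append(i)
            return factors(n // i, result)


# Sample inputs chosen for illustration only.
examples = {n: factors(n, []) for n in (1, 7, 12, 360)}
for n, found in examples.items():
    print(n, found)
```

For 12 the function first finds 2, recurses on 6, finds 2 again, recurses on 3 and finally finds 3, so the result list grows by one factor per recursive call.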
As you can see, it takes around 26 seconds to run this piece of code.
Now it is time to compile it with Cython:
cythonize -b -i factorizer.pyx
The -b switch builds extension modules which you can load into the Python interpreter at runtime. The -i flag builds this module in-place. This implies that the -b flag is set. If we omit the -i flag we get a directory structure like you would when using distutils.
After this we can import the generated code to Python and see how it works:
GHajba$ time python3.5 -c "import factorizer;factorizer.main()"
real 0m17.192s
user 0m17.154s
sys 0m0.019s

As you can see, the code needs about a third less time (17 seconds instead of 26) merely by compiling it with Cython. So we have eliminated some of the overhead mentioned previously. The drawback is the size of the code:

GHajba$ wc -l factorizer.pyx
18 factorizer.pyx
GHajba$ wc -l factorizer.c
3384 factorizer.c

The generated C code has quite a few more lines, and if you take a look at it, it is a bit confusing unless you are a seasoned C developer. It contains a lot of #defines for portability and helpful comments with Python code snippets that show which parts of your code generated the C code. In sum, it is a lot of code you do not want to write yourself.
Well, the example above was really a basic one: we did not use any features of Cython there. Let's re-write the application to see if we can gain more speed. For this we have to modify our code a bit by using C type definitions to let the compiler make the best out of our code.
The first code snippet shows the code in pure Python; we will then re-write it as Cython code:

__author__ = 'GHajba'

def factors(n, counter, result):
    if n <= 1:
        return result
    for i in range(2, n + 1):
        if n % i == 0:
            result[counter] = i
            return factors(n // i, counter + 1, result)

def main():
    max_value = 50000
    for n in range(1, max_value+1):
        f = [0] * max_value
        factors(n, 0, f)

if __name__ == '__main__':
    main()

This code block uses arrays as you would in C or Java: through their index. Naturally, this is not memory-efficient in Python because on every iteration of the for loop we re-allocate the whole f array. If I run this code with Python 3.5 I get a runtime of around 37 seconds, roughly 10 seconds slower than the list-based version.
After converting the code block to Cython and executing it, the execution time is around 27 seconds. The change costs time in the compiled version too (27 seconds instead of 17), but the compiled code is still faster than the pure Python equivalent.
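The index-based variant can be sketched on a single number to show what ends up in the pre-allocated list. The buffer size of 12 and the name buffer are assumptions made for this illustration; the benchmark itself allocates max_value slots per number.

```python
def factors(n, counter, result):
    """Write the prime factors of n into result, starting at index counter."""
    if n <= 1:
        return result
    for i in range(2, n + 1):
        if n % i == 0:
            result[counter] = i
            return factors(n // i, counter + 1, result)


# Hypothetical small buffer, for illustration only.
buffer = [0] * 12
factors(12, 0, buffer)

# Every factor is at least 2, so slots that are still 0 were never written.
found = [value for value in buffer if value != 0]
print(buffer)
print(found)
```

Most of the buffer stays unused, which is why allocating a fresh list of max_value elements for every number costs so much time.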
But as I mentioned previously, we will convert this code block to be more efficient. Here is the first approach:
import array

__author__ = 'GHajba'

def factors(n, counter, result):
    if n <= 1:
        return result
    for i in range(2, n + 1):
        if n % i == 0:
            result[counter] = i
            return factors(n // i, counter + 1, result)

def main():
    cdef int max_value = 50000
    cdef object a
    cdef int[:] f
    cdef int n
    for n in range(1, max_value+1):
        a = array.array('i', [0])*max_value
        f = a
        factors(n, 0, a)

if __name__ == '__main__':
    main()

The code above looks almost identical to the Python version, but we have defined concrete types for max_value and n and created a memory view of the array, which enables Cython to generate code that accesses the data in the array directly.
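If the idea of a memory view is new to you, the following pure Python sketch may help. It uses Python's built-in memoryview, which is only an analogy and not the same thing as Cython's typed int[:] memory view, but it shows the same principle: the view does not copy the array, it reads and writes the array's own buffer. The names a and view and the array length of 5 are assumptions for this illustration.

```python
import array

# A small array of C ints, built the same way as in the Cython example.
a = array.array('i', [0]) * 5

# The view shares the array's buffer instead of copying it.
view = memoryview(a)
view[0] = 42
view[4] = 7

view_format = view.format  # element type code of the underlying buffer
contents = a.tolist()      # the writes through the view are visible in a
view.release()

print(view_format, contents)
```

Because no data is copied, writing through the view changes the array itself, which is what allows compiled code to work on the raw values directly.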
And as you can see, I have left the factors function unchanged. Let's "cythonize" this code located in factorizer_cython.pyx and execute it:
GHajba$ time python3.5 -c "import factorizer_cython;factorizer_cython.main()"
real 0m17.504s
user 0m17.439s
sys 0m0.036s

Well, we have reached the time we had previously with the more Pythonic code. I see you are disappointed because you were expecting a bigger speed gain. Well, remember: we still have a function to change. Let's add some type definitions and see what the result will be:

def factors(int n, counter, result):
    if n <= 1:
        return result
    cdef int i
    for i in range(2, n + 1):
        if n % i == 0:
            result[counter] = i
            return factors(n // i, counter + 1, result)

As the first step I defined i as an int. This did not bring a big change, so I added type definitions to the arguments of the function too, first for counter. But as you can guess, this did not do the trick either. The final result shown in the code block above has types defined only for i and n. If I convert and run this example I get the following results:

GHajba$ time python3.5 -c "import factorizer_cython;factorizer_cython.main()"
real 0m1.268s
user 0m1.241s
sys 0m0.019s

Nice. This is the performance gain we wanted! Factorizing 50000 numbers is done in less than 2 seconds.
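To put the measurements into perspective, here is a small sketch that turns the "real" times reported above into speed-up factors. The helper to_seconds and the labels are my own naming; the baseline is the first pure Python run, so the factor for the typed version is only a rough comparison because that version uses the array-based code.

```python
def to_seconds(stamp):
    """Convert a stamp printed by `time`, such as '0m26.515s', to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# "real" times copied from the runs shown in this article.
timings = {
    "pure Python": "0m26.515s",
    "compiled, untyped": "0m17.192s",
    "compiled, typed n and i": "0m1.268s",
}

baseline = to_seconds(timings["pure Python"])
speedups = {label: baseline / to_seconds(stamp) for label, stamp in timings.items()}

for label, speedup in speedups.items():
    print(f"{label}: {speedup:.1f}x")
```

Compiling alone gives a factor of roughly 1.5, while the two type declarations bring it to roughly 21.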
As we have seen, Cython is mainly used to optimise Python code. However, do not rush off to learn C and re-write all your Python code! First measure whether your application is really that slow, then create an alternative version with Cython, and adopt it if the result is fast enough. Alternatively, identify the slow parts of your application and optimise or convert only those chunks.
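One way to find the slow parts is the cProfile module from the standard library. The sketch below profiles the pure Python factorizer on a reduced range; the limit of 2000 and the variable names are assumptions chosen to keep the run short, not values from the benchmark.

```python
import cProfile
import io
import pstats


def factors(n, result):
    if n <= 1:
        return result
    for i in range(2, n + 1):
        if n % i == 0:
            result.append(i)
            return factors(n // i, result)


def main(limit):
    for n in range(1, limit + 1):
        factors(n, [])


# Profile a reduced workload (hypothetical limit, for illustration only).
profiler = cProfile.Profile()
profiler.runcall(main, 2000)

# Collect the five most expensive entries, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists the call count and the time spent per function, so you can see which functions are worth converting before you touch any code.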
And as we have seen, sometimes it is enough to "cythonize" the code and use the C-compiled extension module to gain performance. Sometimes we have to re-write the code as Cython code that gives the C compiler more room for optimisation, for example by adding type definitions for variables.