Say you’ve finally hit a point in your Python code where performance is becoming an issue, and you’ve decided to optimize it a little.
At this point, you have a bunch of options in front of you.
With that out of the way, let’s get started!
Let’s find Prime Numbers!
Here’s a rather straightforward problem: we want to calculate the kth prime number. Here’s one way to do it in Python.
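A minimal sketch of one such approach, assuming plain trial division with no optimizations. The names is_prime and kth_prime are illustrative, and timings on your machine will differ from the ones quoted here.

```python
def is_prime(n):
    """Return True if n is prime, using plain trial division."""
    if n < 2:
        return False
    # Try every possible divisor below n; deliberately unoptimized.
    for divisor in range(2, n):
        if n % divisor == 0:
            return False
    return True


def kth_prime(k):
    """Return the kth prime, counting from 1 (so kth_prime(1) is 2)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    count = 0
    candidate = 1
    # Walk the integers upward, counting primes until we reach the kth one.
    while count < k:
        candidate += 1
        if is_prime(candidate):
            count += 1
    return candidate


if __name__ == "__main__":
    print(kth_prime(10))
```

Calling kth_prime(10000) with this sketch does a lot of redundant division, which is exactly the kind of tight numeric loop where an interpreted language struggles.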
This code takes around 42 seconds to calculate the 10000th prime number. That’s bad… Let’s see how C performs on the same task with the same algorithm.
The C version, on the other hand, took 1.01 seconds to finish.
So, as you’ve already guessed, if we could get Python to call this C code to compute the kth prime number, we’d save a lot of time. Thankfully, we can: CPython provides a C API that lets you extend Python with modules written in C.
Alright! Let’s build a dead simple Python library in C to compute the kth prime number. This is how we’re going to proceed.
Integration
We’re building this functionality for CPython, the reference implementation of Python. In CPython, every Python object is represented by a C struct that begins with PyObject. Functions, classes, data types… you name it, it’s a PyObject. So one of the things we need to do in this step is convert between C data types and functions and things that Python can use; in our case, that means converting everything to PyObjects.
The above C code can look a little daunting if you are unfamiliar with a statically typed language such as C++ or Java, but it’s quite simple really.
As mentioned before, everything is a PyObject. At line 5, we define a static function that returns a pointer to a PyObject. This is the function CPython calls when we import our library and call the kthprime function. It takes two arguments: self and args. For a module-level function like ours, self refers to the module object; for a method, it would be the instance the method was called on.
Python also provides plenty of C functions for parsing arguments, and PyArg_ParseTuple is one of them.
Since I’ve passed k as a positional argument, Python internally passes it as a tuple with one element. The “i” format tells PyArg_ParseTuple to convert that element to a C int.
If at any point the static function returns a NULL pointer, the Python interpreter takes that as a sign that something went wrong and raises an exception; PyArg_ParseTuple sets a suitable exception itself when parsing fails. If your function has nothing to return, return the Py_None object instead, for example with the Py_RETURN_NONE macro.
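As a rough Python-level analogy (not how CPython implements it), the C wrapper behaves like a function that receives its positional arguments packed in a tuple, checks that the tuple holds exactly one integer, and hands the result back as a Python object. The name compute_kth_prime is a hypothetical stand-in for the C routine, and the error messages are made up for illustration.

```python
def compute_kth_prime(k):
    """Hypothetical stand-in for the C routine that does the real work."""
    found, candidate = 0, 1
    while found < k:
        candidate += 1
        # A candidate is prime if no number from 2 up to candidate - 1 divides it.
        if all(candidate % divisor for divisor in range(2, candidate)):
            found += 1
    return candidate


def kthprime(*args):
    # Positional arguments arrive packed in a tuple, like `args` in the C wrapper.
    if len(args) != 1:
        raise TypeError("kthprime expects exactly one argument")
    (k,) = args
    # Loose counterpart of the "i" format: only integers are accepted.
    if not isinstance(k, int):
        raise TypeError("kthprime expects an integer")
    # Raising above plays the role of returning NULL with an exception set;
    # returning a value plays the role of handing a PyObject back.
    return compute_kth_prime(k)
```

In the C version the same three steps (unpack, validate, convert the result) are what the wrapper does around the call into the prime-finding code.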
Build
Now it’s time to put everything together.
First, let’s compile our c_prime.c code to a relocatable format. GCC can compile C code to an object file, which contains the code from c_prime.c in machine code form.
Next, we want to compile fastprime.c to object code as we did previously and, finally, link all the object files together with the standard library to create our final usable module. This is a pretty tedious task, and we’d like to avoid repeating it by hand. Python therefore provides the distutils library to make all of this a little easier. (distutils was deprecated in Python 3.10 and removed in 3.12; setuptools offers the same setup() and Extension interface.)
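A build script for this step might look like the sketch below. It uses setuptools rather than distutils, which is my substitution because distutils is no longer shipped with Python 3.12 and later. The source file names come from the article; the module name fastprime and the version string are assumptions.

```python
# setup.py: a hedged sketch, not the article's original build script.
from setuptools import Extension, setup

# One extension module built from both C files: the wrapper (fastprime.c)
# and the prime-finding code (c_prime.c). The name "fastprime" is assumed.
fastprime_extension = Extension(
    name="fastprime",
    sources=["fastprime.c", "c_prime.c"],
)

if __name__ == "__main__":
    setup(
        name="fastprime",
        version="0.1.0",  # placeholder version
        ext_modules=[fastprime_extension],
    )
```

Running `python setup.py build_ext --inplace` next to the two C files compiles and links them in one go and leaves the shared object in the current directory.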
With that, we finally have our Python module with a .so extension (shared object; similar to .dll files on Windows). This shared object can be imported into Python and used to compute the kth prime.
This piece of code took 1.01 seconds to run as well!
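To reproduce a measurement like this, a small timing script can help. The sketch below assumes the built module is importable as fastprime and exposes kthprime; best_time and load_kthprime are hypothetical helpers of my own.

```python
import importlib
import timeit


def best_time(func, *args, repeat=3):
    """Return the fastest of `repeat` single runs of func(*args), in seconds."""
    return min(timeit.repeat(lambda: func(*args), number=1, repeat=repeat))


def load_kthprime():
    """Return fastprime.kthprime if the extension is built, otherwise None."""
    try:
        module = importlib.import_module("fastprime")  # assumed module name
    except ImportError:
        return None
    return module.kthprime


if __name__ == "__main__":
    kthprime = load_kthprime()
    if kthprime is None:
        print("fastprime is not built yet; run the build step first")
    else:
        print("10000th prime:", kthprime(10000))
        print("best of 3 runs: %.2f s" % best_time(kthprime, 10000))
```

Taking the minimum of a few runs reduces noise from whatever else the machine is doing at the time.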
So we’ve effectively made our code around 40x faster. That raises the question: why don’t people do this more often? Here’s the catch.
First, it’s time consuming to keep doing this for every little function you might want to use in Python. If performance is that crucial to your system, then you probably shouldn’t be using Python in the first place. The idea behind Python is that you spend more time reading code than writing it, so making code more readable lets it stay relevant for longer. It was never intended to beat C or C++ in terms of speed, and it never will.
Second, this speedup comes at a cost. The original slow Python code we wrote was machine independent. As you might know, Python compiles down to bytecode, which is then interpreted on the Python Virtual Machine. So I can just go ahead and distribute my code and let Python handle everything else (as long as a compatible version of Python is used to run it). This isn’t the case with C or C++: the object files they produce depend on the target machine. So by introducing C object files, I have effectively made my Python code machine dependent.
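One way to see this machine dependence from inside Python is to ask the interpreter which filename suffixes it accepts for compiled extension modules. This is a small sketch using the standard library; extension_suffixes is a hypothetical helper name.

```python
import importlib.machinery
import platform


def extension_suffixes():
    """Return the filename suffixes this interpreter accepts for compiled extensions."""
    return list(importlib.machinery.EXTENSION_SUFFIXES)


if __name__ == "__main__":
    print("machine:", platform.machine())
    print("accepted extension suffixes:", extension_suffixes())
```

On a typical Linux build of CPython, the first suffix names both the interpreter version and the platform, which hints at why a shared object built on one machine will generally not load on a different one.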
Conclusion
That brings us to the end of this article. I hope it provided some additional insight into how CPython works. Any suggestions, feedback, or corrections would be highly appreciated!