I was watching a lecture series on 'Bit Hacking' and came across the following optimization for finding the minimum of two integers:
return x ^ ((y ^ x) & -(x > y)) Which said to be faster than:
if x < y:     return x else:     return y Since the min function can handle more than just two integers (floats, strings, lists, and even custom objects) I assumed that calling min(x, y) would take longer than the optimized bit hack above.  To my surprise, they were nearly identical:
>>> python -m timeit "min(4, 5)" 1000000 loops, best of 3: 0.203 usec per loop  >>> python -m timeit "4 ^ ((5 ^ 4) & -(4 > 5))" 10000000 loops, best of 3: 0.19 usec per loop This is true even for numbers greater than 255 (pre allocated python integer objects)
>>> python -m timeit "min(15456, 54657)" 10000000 loops, best of 3: 0.191 usec per loop  python -m timeit "15456 ^ ((54657 ^ 15456) & -(54657 > 15456))" 10000000 loops, best of 3: 0.18 usec per loop How is it that a function so versatile like min can still be so fast and optimized?
Note: I ran the above code using Python 3.5. I'm assuming that this is the same for Python 2.7+ but haven't tested
I've created the following c module:
#include <Python.h>  static PyObject * my_min(PyObject *self, PyObject *args){     const long x;     const long y;      if (!PyArg_ParseTuple(args, "ll", &x, &y))         return NULL;      return PyLong_FromLong(x ^ ((y ^ x) & -(x > y))); }  static PyMethodDef MyMinMethods[] =  {     { "my_min", my_min, METH_VARARGS, "bit hack min"     },     {NULL, NULL, 0, NULL} };  PyMODINIT_FUNC initmymin(void) {     PyObject *m;      m = Py_InitModule("mymin", MyMinMethods);     if (m == NULL)         return;  } Compiled it, and installed it onto my system (an ubuntu VM machine). I then ran the following:
>>> python -m timeit 'min(4, 5)' 10000000 loops, best of 3: 0.11 usec per loop  >>> python -m timeit -s 'import mymin' 'mymin.my_min(4,5)' 10000000 loops, best of 3: 0.129 usec per loop While I understand that this is a VM machine, shouldn't there still be a greater gap in execution time with the 'bit hacking' being offloaded into native c?
This is likely due to how the min function is implemented in python.
Many python builtins are actually implemented in low level languages such as C or assembly and use the python apis in order to be callable in python.
Your bit fiddling technique is likely very fast in C but in python the interpretation overhead of the statement will far exceed the overhead of calling even a complex function implemented in a low level language.
If you really want a fair test compare a C program, or C python extension implementing that technique to your python call of min and see how it compares, I expect that will explain the result you see.
EDIT:
Thanks to @Two-BitAlchemist I can now give some more details onto additional reasons this bit twiddling will not work well in python. It appears that integers are not stored in the obvious way but are actually a fairly complex expanding object designed to store potentially very large numbers.
Some details on this are findable here (Thanks to Two-BitAlchemist) though it appears this is changed somewhat in newer python versions. Still the point remains that we are most certainly not manipulation a simple set of bits when we touch an integer in python, but a complex object where the bit manipulations are in fact virtual method calls with enormous overhead (compared to what they do).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With