.. highlight:: cython

.. _numpy_tutorial:

**************************
Cython for NumPy users
**************************

This tutorial is aimed at NumPy users who have no experience with Cython at
all. If you have some knowledge of Cython you may want to skip to the
*Efficient indexing* section.

The main scenario considered is NumPy end-use rather than NumPy/SciPy
development. The reason is that Cython is not (yet) able to support functions
that are generic with respect to the number of dimensions in a
high-level fashion. This restriction is much more severe for SciPy development
than for more specific "end-user" functions. See the last section for more
information on this.

The style of this tutorial will not fit everybody, so you can also consider:

* Kurt Smith's `video tutorial of Cython at SciPy 2015
  <https://www.youtube.com/watch?v=gMvkiQ-gOW8&t=4730s&ab_channel=Enthought>`_.
  The slides and notebooks of this talk are `on github
  <https://github.com/kwmsmith/scipy-2015-cython-tutorial>`_.
* Basic Cython documentation (see `Cython front page
  <https://cython.readthedocs.io/en/latest/index.html>`_).

Cython at a glance
==================

Cython is a compiler which compiles Python-like code files to C code. Still,
*Cython is not a Python to C translator*. That is, it doesn't take your full
program and "turn it into C" -- rather, the result makes full use of the
Python runtime environment. A way of looking at it may be that your code is
still Python in that it runs within the Python runtime environment, but rather
than compiling to interpreted Python bytecode it compiles to native machine
code (with the addition of extra syntax for easy embedding of faster
C-like code).

This has two important consequences:

* Speed. How much depends very much on the program involved, though. Typical
  Python numerical programs would tend to gain very little as most time is
  spent in lower-level C that is used in a high-level fashion. However,
  for-loop-style programs can gain many orders of magnitude when typing
  information is added (which is what makes Cython a realistic alternative
  for them).
* Easy calling into C code. One of Cython's purposes is to allow easy wrapping
  of C libraries. When writing code in Cython you can call into C code as
  easily as into Python code.

Very few Python constructs are not yet supported, and making Cython compile all
Python code is a stated goal. You can see the differences from Python in
:ref:`limitations <cython-limitations>`.

Your Cython environment
=======================

Using Cython consists of these steps:

1. Write a :file:`.pyx` source file
2. Run the Cython compiler to generate a C file
3. Run a C compiler to generate a compiled library
4. Run the Python interpreter and ask it to import the module

However there are several options to automate these steps:

1. The `SAGE <http://sagemath.org>`_ mathematics software system provides
   excellent support for using Cython and NumPy from an interactive command
   line or through a notebook interface (like
   Maple/Mathematica). See `this documentation
   <http://doc.sagemath.org/html/en/developer/coding_in_cython.html>`_.
2. Cython can be used as an extension within a Jupyter notebook,
   making it easy to compile and use Cython code with just a ``%%cython``
   at the top of a cell. For more information see
   :ref:`Using the Jupyter Notebook <jupyter-notebook>`.
3. A version of pyximport is shipped with Cython,
   so that you can import pyx-files dynamically into Python and
   have them compiled automatically (see :ref:`pyximport`).
4. Cython supports distutils so that you can very easily create build scripts
   which automate the process; this is the preferred method for
   Cython-implemented libraries and packages (a minimal sketch follows this
   list). See :ref:`Basic setup.py <basic_setup.py>`.
5. Manual compilation (see below)
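
For example, here is a minimal sketch of such a build script, assuming
setuptools is installed and a module file named :file:`compute_cy.pyx`::

    # setup.py -- a minimal sketch of a Cython build script
    from setuptools import setup
    from Cython.Build import cythonize
    import numpy

    setup(
        ext_modules=cythonize("compute_cy.pyx"),
        # The NumPy headers are only needed if the module cimports numpy:
        include_dirs=[numpy.get_include()],
    )

Running ``python setup.py build_ext --inplace`` then builds the extension
module next to its source file.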

.. Note::
    If using an interactive command line environment other than SAGE, such as
    IPython or Python itself, it is important that you restart the process
    when you recompile the module. It is not enough to issue an "import"
    statement again.

Installation
=============

If you already have a C compiler, just do::

   pip install Cython

otherwise, see :ref:`the installation page <install>`.


As of this writing SAGE comes with an older release of Cython than required
for this tutorial. So if using SAGE you should download the newest Cython and
then execute ::

    $ cd path/to/cython-distro
    $ path-to-sage/sage -python setup.py install

This will install the newest Cython into SAGE.

Manual compilation
====================

As it is always important to know what is going on, I'll describe the manual
method here. First Cython is run::

    $ cython yourmod.pyx

This creates :file:`yourmod.c` which is the C source for a Python extension
module. A useful additional switch is ``-a`` which will generate a document
(:file:`yourmod.html`) that shows which Cython code translates to which C code
line by line.

Then we compile the C file. This may vary according to your system, but the C
file should be built like Python was built. Python documentation for writing
extensions should have some details. On Linux this often means something
like::

    $ gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.7 -o yourmod.so yourmod.c

``gcc`` should have access to the NumPy C header files, so if they are not
installed at :file:`/usr/include/numpy` or similar you may need to pass another
option for those. You only need to provide the NumPy headers if you write::

    cimport numpy

in your Cython code.
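
NumPy can report its own include directory, so a sketch of such an invocation
might look like this (the exact Python include path is system-dependent)::

    $ NUMPY_INCLUDE=$(python -c "import numpy; print(numpy.get_include())")
    $ gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing \
          -I/usr/include/python2.7 -I$NUMPY_INCLUDE -o yourmod.so yourmod.c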

This creates :file:`yourmod.so` in the same directory, which is importable by
Python by using a normal ``import yourmod`` statement.

The first Cython program
==========================

You can easily execute the code of this tutorial by
downloading `the Jupyter notebook <https://github.com/cython/cython/blob/master/docs/examples/userguide/numpy_tutorial/numpy_and_cython.ipynb>`_.

The code below does the equivalent of this function in NumPy::

    def compute_np(array_1, array_2, a, b, c):
        return np.clip(array_1, 2, 10) * a + array_2 * b + c

We'll say that ``array_1`` and ``array_2`` are 2D NumPy arrays of integer type and
``a``, ``b`` and ``c`` are three Python integers.

This function uses NumPy and is already really fast, so it might be a bit overkill
to do it again with Cython. This is for demonstration purposes. Nonetheless, we
will show that we achieve better speed and memory efficiency than NumPy at
the cost of more verbosity.

This code computes the function with the loops over the two dimensions written
out explicitly.
It is both valid Python and valid Cython code. I'll refer to it as both
:file:`compute_py.py` for the Python version and :file:`compute_cy.pyx` for the
Cython version -- Cython uses ``.pyx`` as its file suffix (but it can also compile
``.py`` files).

.. literalinclude:: ../../examples/userguide/numpy_tutorial/compute_py.py

This should be compiled to produce :file:`compute_cy.so` (on Linux systems;
on Windows systems, this will be a ``.pyd`` file). We
run a Python session to test both the Python version (imported from the
``.py`` file) and the compiled Cython module.

.. sourcecode:: ipython

    In [1]: import numpy as np
    In [2]: array_1 = np.random.uniform(0, 1000, size=(3000, 2000)).astype(np.intc)
    In [3]: array_2 = np.random.uniform(0, 1000, size=(3000, 2000)).astype(np.intc)
    In [4]: a = 4
    In [5]: b = 3
    In [6]: c = 9
    In [7]: def compute_np(array_1, array_2, a, b, c):
       ...:     return np.clip(array_1, 2, 10) * a + array_2 * b + c
    In [8]: %timeit compute_np(array_1, array_2, a, b, c)
    103 ms ± 4.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [9]: import compute_py
    In [10]: %timeit compute_py.compute(array_1, array_2, a, b, c)
    1min 10s ± 844 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [11]: import compute_cy
    In [12]: %timeit compute_cy.compute(array_1, array_2, a, b, c)
    56.5 s ± 587 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

There's not such a huge difference yet, because the C code still does exactly
what the Python interpreter does (meaning, for instance, that a new object is
allocated for each number used).

You can look at the Python interaction and the generated C
code by using ``-a`` when calling Cython from the command
line, ``%%cython -a`` when using a Jupyter Notebook, or by using
``cythonize('compute_cy.pyx', annotate=True)`` when using a ``setup.py``.
Look at the generated html file and see what
is needed for even the simplest statements. You get the point quickly. We need
to give Cython more information; we need to add types.

Adding types
=============

To add types we use custom Cython syntax, so we are now breaking Python source
compatibility. Here's :file:`compute_typed.pyx`. *Read the comments!*

.. literalinclude:: ../../examples/userguide/numpy_tutorial/compute_typed.pyx

.. figure:: compute_typed_html.jpg

At this point, have a look at the generated C code for :file:`compute_cy.pyx` and
:file:`compute_typed.pyx`. Click on the lines to expand them and see the corresponding C code.

Especially have a look at the for loops: in :file:`compute_cy.c`, these take ~20 lines
of C code to set up, while in :file:`compute_typed.c` a normal C for loop is used.

After building this and continuing my (very informal) benchmarks, I get:

.. sourcecode:: ipython

    In [13]: %timeit compute_typed.compute(array_1, array_2, a, b, c)
    26.5 s ± 422 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So adding types does make the code faster, but it is still nowhere
near the speed of NumPy. Why is that?

What happened is that most of the time in this code is spent in the following lines,
and those lines are slower to execute than in pure Python::

    tmp = clip(array_1[x, y], 2, 10)
    tmp = tmp * a + array_2[x, y] * b
    result[x, y] = tmp + c

So what made those lines so much slower than in the pure Python version?

``array_1`` and ``array_2`` are still NumPy arrays, so Python objects, and expect
Python integers as indexes. Here we pass C int values. So every time
Cython reaches this line, it has to convert all the C integers to Python
int objects. Since this line is called very often, it outweighs the speed
benefits of the pure C loops that were created from the ``range()`` earlier.

Furthermore, ``tmp * a + array_2[x, y] * b`` returns a Python integer
and ``tmp`` is a C integer, so Cython has to do type conversions again.
In the end those type conversions add up and make our computation really
slow. But this problem can be solved easily by using memoryviews.

Efficient indexing with memoryviews
===================================

There are still two bottlenecks that degrade the performance: the array lookups
and assignments, and the C/Python type conversions.
The ``[]``-operator still uses full Python operations --
what we would like to do instead is to access the data buffer directly at C
speed.

What we need to do then is to type the contents of the :obj:`ndarray` objects.
We do this with a memoryview. There is :ref:`a page in the Cython documentation
<memoryviews>` dedicated to it.

In short, memoryviews are C structures that can hold a pointer to the data
of a NumPy array and all the necessary buffer metadata to provide efficient
and safe access: dimensions, strides, item size, item type information, etc.
They also support slices, so they work even if
the NumPy array isn't contiguous in memory.
They can be indexed by C integers, thus allowing fast access to the
NumPy array data.

Here is how to declare a memoryview of integers::

    cdef int [:] foo         # 1D memoryview
    cdef int [:, :] foo      # 2D memoryview
    cdef int [:, :, :] foo   # 3D memoryview
    ...                      # You get the idea.

No data is copied from the NumPy array to the memoryview in our example.
As the name implies, it is only a "view" of the memory. So we can use the
view ``result_view`` for efficient indexing and at the end return the real NumPy
array ``result`` that holds the data that we operated on.

Here is how to use them in our code:

:file:`compute_memview.pyx`

.. literalinclude:: ../../examples/userguide/numpy_tutorial/compute_memview.pyx

Let's see how much faster accessing is now.

.. sourcecode:: ipython

    In [22]: %timeit compute_memview.compute(array_1, array_2, a, b, c)
    22.9 ms ± 197 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Note the importance of this change.
We're now 3081 times faster than an interpreted version of Python and 4.5 times
faster than NumPy.

Memoryviews can be used with slices too, or even
with Python arrays. Check out the :ref:`memoryview page <memoryviews>` to
see what they can do for you.

Tuning indexing further
========================

The array lookups are still slowed down by two factors:

1. Bounds checking is performed.
2. Negative indices are checked for and handled correctly.  The code above is
   explicitly coded so that it doesn't use negative indices, and it
   (hopefully) always accesses within bounds.

With decorators, we can deactivate those checks::

    ...
    cimport cython
    @cython.boundscheck(False)  # Deactivate bounds checking
    @cython.wraparound(False)   # Deactivate negative indexing.
    def compute(int[:, :] array_1, int[:, :] array_2, int a, int b, int c):
    ...

Now bounds checking is not performed (and, as a side-effect, if you *do*
happen to access out of bounds you will in the best case crash your program
and in the worst case corrupt data). It is possible to switch bounds-checking
mode in many ways, see :ref:`compiler-directives` for more
information.
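
For example, the same directives can be set for a whole file with a special
comment at its top (a minimal sketch)::

    # cython: boundscheck=False, wraparound=False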


.. sourcecode:: ipython

    In [23]: %timeit compute_index.compute(array_1, array_2, a, b, c)
    16.8 ms ± 25.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

We're faster than the NumPy version (6.2x). NumPy is really well written,
but does not perform operations lazily, resulting in a lot
of intermediate copy operations in memory. Our version is
very memory efficient and cache friendly because we
can execute the operations in a single run over the data.

.. Warning::

    Speed comes with some cost. In particular, it can be dangerous to set typed
    objects (like ``array_1``, ``array_2`` and ``result_view`` in our sample code) to ``None``.
    Setting such objects to ``None`` is entirely legal, but all you can do with them
    is check whether they are None. All other use (attribute lookup or indexing)
    can potentially segfault or corrupt data (rather than raising exceptions as
    they would in Python).

    The actual rules are a bit more complicated but the main message is clear: Do
    not use typed objects without knowing that they are not set to ``None``.
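
    A minimal sketch of the only safe pattern (the variable name is made up)::

        cdef int[:, :] view = None   # legal
        if view is not None:         # the only safe operation on a None view
            view[0, 0] = 1           # indexing requires a real buffer behind it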

Declaring the NumPy arrays as contiguous
========================================

For extra speed gains, if you know that the NumPy arrays you are
providing are contiguous in memory, you can declare the
memoryview as contiguous.

We give an example using an array that has 3 dimensions.
If you want to give Cython the information that the data is C-contiguous
you have to declare the memoryview like this::

    cdef int [:,:,::1] a

If you want to give Cython the information that the data is Fortran-contiguous
you have to declare the memoryview like this::

    cdef int [::1, :, :] a

If all this makes no sense to you, you can skip this part; just know that
declaring arrays as contiguous constrains the usage of your functions, as it
rejects array slices as input.
If you still want to understand what contiguous arrays are
all about, you can see `this answer on StackOverflow
<https://stackoverflow.com/questions/26998223/what-is-the-difference-between-contiguous-and-non-contiguous-arrays>`_.
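
To illustrate the constraint, here is a hedged sketch continuing the earlier
session (``np`` is NumPy; ``compute_contiguous`` is the contiguous variant of
our function)::

    arr = np.ones((100, 100), dtype=np.intc)              # C-contiguous
    compute_contiguous.compute(arr, arr, 4, 3, 9)         # accepted
    sliced = arr[:, ::2]                                  # not contiguous
    # compute_contiguous.compute(sliced, sliced, 4, 3, 9) # raises ValueError
    # A contiguous copy makes the slice acceptable again:
    compute_contiguous.compute(np.ascontiguousarray(sliced),
                               np.ascontiguousarray(sliced), 4, 3, 9)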

For the sake of giving numbers, here are the speed gains that you should
get by declaring the memoryviews as contiguous:

.. sourcecode:: ipython

    In [23]: %timeit compute_contiguous.compute(array_1, array_2, a, b, c)
    11.1 ms ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

We're now around nine times faster than the NumPy version, and 6300 times
faster than the pure Python version!

Making the function cleaner
===========================

Declaring types can make your code quite verbose. If you don't mind
Cython inferring the C types of your variables, you can use
the ``infer_types=True`` compiler directive at the top of the file.
It will save you quite a bit of typing.
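
It looks like this (a minimal sketch)::

    # cython: infer_types=True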

Note that since type declarations must happen at the top indentation level,
Cython won't infer the type of variables declared for the first time
in other indentation levels; doing so would change the meaning of our
code too much. This is why we must still manually declare the types of the
``tmp``, ``x`` and ``y`` variables.

And actually, manually giving the type of the ``tmp`` variable will
be useful when using fused types.

.. literalinclude:: ../../examples/userguide/numpy_tutorial/compute_infer_types.pyx

We now do a speed test:

.. sourcecode:: ipython

    In [24]: %timeit compute_infer_types.compute(array_1, array_2, a, b, c)
    11.5 ms ± 261 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Lo and behold, the speed has not changed.

More generic code
==================

All those speed gains are nice, but adding types constrains our code.
At the moment, it means that our function can only work with
NumPy arrays with the ``np.intc`` type. Is it possible to make our
code work for multiple NumPy data types?

Yes, with the help of a new feature called fused types.
You can learn more about it at :ref:`this section of the documentation
<fusedtypes>`.
It is similar to C++ templates. It generates multiple function declarations
at compile time, and then chooses the right one at run-time based on the
types of the arguments provided. By comparing types in if-conditions, it
is also possible to execute entirely different code paths depending
on the specific data type.
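
A minimal sketch of the mechanism (the fused type and function names are our
own choice; the full example below uses the same idea)::

    import numpy as np

    ctypedef fused my_type:
        int
        double
        float

    def get_output_dtype(my_type[:, :] arr):
        # Pick the NumPy dtype matching the C type Cython dispatched on.
        if my_type is int:
            return np.intc
        elif my_type is double:
            return np.double
        else:
            return np.float32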

In our example, since we no longer have access to the NumPy dtype
of our input arrays, we use those ``if-else`` statements to
know what NumPy data type we should use for our output array.

In this case, our function now works for ints, doubles and floats.

.. literalinclude:: ../../examples/userguide/numpy_tutorial/compute_fused_types.pyx

We can check that the output type is the right one::

    >>> compute(array_1, array_2, a, b, c).dtype
    dtype('int32')
    >>> compute(array_1.astype(np.double), array_2.astype(np.double), a, b, c).dtype
    dtype('float64')

We now do a speed test:

.. sourcecode:: ipython

    In [25]: %timeit compute_fused_types.compute(array_1, array_2, a, b, c)
    11.5 ms ± 258 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

More versions of the function are created at compile time, so it makes
sense that the speed doesn't change when executing this function with
integers, as before.

Using multiple threads
======================

Cython has support for OpenMP. It also has some nice wrappers around it,
like the function :func:`prange`. You can see more information about Cython and
parallelism in :ref:`parallel`. Since we do elementwise operations, we can easily
distribute the work among multiple threads. It's important not to forget to pass the
correct arguments to the compiler to enable OpenMP. When using the Jupyter notebook,
you should use the cell magic like this::

    %%cython --force
    # distutils: extra_compile_args=-fopenmp
    # distutils: extra_link_args=-fopenmp
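
Outside a notebook, the equivalent is to pass the flags in the build script
(a minimal sketch, assuming a gcc-compatible compiler)::

    # setup.py -- sketch of enabling OpenMP for a Cython extension
    from setuptools import setup, Extension
    from Cython.Build import cythonize

    ext = Extension(
        "compute_prange",
        ["compute_prange.pyx"],
        extra_compile_args=["-fopenmp"],
        extra_link_args=["-fopenmp"],
    )
    setup(ext_modules=cythonize([ext]))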

The GIL must be released (see :ref:`Releasing the GIL <nogil>`), which is why we
declare our :func:`clip` function ``nogil``.
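
A minimal sketch of the :func:`prange` pattern (the function here is a made-up
example; the real code follows)::

    from cython.parallel import prange

    def sum_parallel(int[:] data):
        cdef double total = 0
        cdef Py_ssize_t i
        # prange releases the GIL and distributes iterations over threads;
        # the in-place += makes total a reduction variable.
        for i in prange(data.shape[0], nogil=True):
            total += data[i]
        return total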

.. literalinclude:: ../../examples/userguide/numpy_tutorial/compute_prange.pyx

We can have substantial speed gains for minimal effort:

.. sourcecode:: ipython

    In [25]: %timeit compute_prange.compute(array_1, array_2, a, b, c)
    9.33 ms ± 412 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

We're now 7558 times faster than the pure Python version and 11.1 times faster
than NumPy!

Where to go from here?
======================

* If you want to learn how to make use of `BLAS <http://www.netlib.org/blas/>`_
  or `LAPACK <http://www.netlib.org/lapack/>`_ with Cython, you can watch
  `the presentation of Ian Henriksen at SciPy 2015
  <https://www.youtube.com/watch?v=R4yB-8tB0J0&t=693s&ab_channel=Enthought>`_.
* If you want to learn how to use Pythran as backend in Cython, you
  can see how in :ref:`Pythran as a NumPy backend <numpy-pythran>`.
  Note that using Pythran only works with the
  :ref:`old buffer syntax <working-numpy>` and not yet with memoryviews.