Commit 2b5b3776 authored by Stefan Behnel

clarification on Py_UNICODE behaviour in 0.13

parent db4876c8
@@ -235,7 +235,7 @@ coerce to a Python unicode object. The following will therefore print
 the character ``A``::
 
     cdef Py_UNICODE uchar_val = u'A'
-    assert uchar_val == ord(u'A') # 65
+    assert uchar_val == 65 # character point value of u'A'
     print( uchar_val )
 
 Again, explicit casting will allow users to override this behaviour.
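
Review note on the hunk above: the last context line mentions explicit casting without showing it. A minimal sketch, assuming a cast to a C integer type such as ``long`` is the intended override (the snippet is illustrative and not part of the commit):

```cython
cdef Py_UNICODE uchar_val = u'A'

# Default coercion: the value becomes a one-character unicode object,
# so this prints the character A.
print( uchar_val )

# Explicit cast to a C integer type: the value is treated as a number,
# so this prints the code point value 65.
print( <long>uchar_val )
```

The cast only changes how the value is coerced to a Python object; the C value stored in ``uchar_val`` is the same in both calls.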
@@ -271,16 +271,26 @@ The same applies to bytes objects::
     for c in bytes_string:
         if c == 'A': ...
 
-and unicode objects::
+For unicode objects, Cython will automatically infer the type of the
+loop variable as ``Py_UNICODE``::
 
     cdef unicode ustring = ...
 
-    cdef Py_UNICODE uchar
+    # NOTE: no typing required for 'uchar' !
     for uchar in ustring:
         if uchar == u'A': ...
 
+The automatic type inference usually leads to much more efficient code
+here. However, note that some unicode operations still require the
+value to be a Python object, so Cython may end up generating redundant
+conversion code for the loop variable value inside of the loop. If
+this leads to a performance degradation for a specific piece of code,
+you can either type the loop variable as a Python object explicitly,
+or assign it to a Python typed temporary variable to enforce one-time
+coercion before running Python operations on it.
+
 There is also an optimisation for ``in`` tests, so that the following
-code will run in plain C code::
+code will run in plain C code, (actually using a switch statement)::
 
     cdef Py_UNICODE uchar_val = get_a_unicode_character()
     if uchar_val in u'abcABCxY':
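
Review note on the added paragraph: the two workarounds it describes (typing the loop variable as a Python object, or assigning it to a Python typed temporary) could look roughly like the sketch below. This is an illustration under assumptions, not code from the commit: the function names ``count_upper_object`` and ``count_upper_temp`` are hypothetical, and it assumes that ``isupper()`` and ``isdigit()`` are among the operations that need a Python object. Whether either variant is faster than the inferred default depends on the loop body and should be measured.

```cython
# Variant 1: type the loop variable as a Python object explicitly.
# This opts out of the Py_UNICODE inference for this loop.
def count_upper_object(unicode ustring not None):
    cdef object uchar
    cdef Py_ssize_t count = 0
    for uchar in ustring:
        # 'uchar' is a one-character unicode object here, so the
        # method call needs no extra conversion.
        if uchar.isupper():
            count += 1
    return count


# Variant 2: keep the inferred Py_UNICODE loop variable, but coerce it
# once per iteration into a Python typed temporary.
def count_upper_temp(unicode ustring not None):
    cdef object py_char
    cdef Py_ssize_t count = 0
    for uchar in ustring:
        if uchar == u' ':       # comparison on the C value
            continue
        py_char = uchar         # single coercion to a Python object
        # Both method calls reuse the same Python object.
        if py_char.isupper() or py_char.isdigit():
            count += 1
    return count
```

Variant 2 keeps the cheap C-level comparison for the common early exit and pays for the conversion only when Python operations follow, which is the case the paragraph calls one-time coercion.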