Xavier Thompson / cython

Commit a3230e4a authored Apr 26, 2012 by Stefan Behnel

streamline string handling tutorial

parent a40112b0

Showing 1 changed file with 26 additions and 12 deletions

docs/src/tutorial/strings.rst

@@ -26,6 +26,26 @@ as terminator character, as generally known from C. The above will
 therefore only work correctly for C strings that do not contain null
 bytes.
 
+Besides not working for null bytes, the above is also very inefficient
+for long strings, since Cython has to call ``strlen()`` on the C string
+first to find out the length by counting the bytes up to the terminating
+null byte. In many cases, the user code will know the length already,
+e.g. because a C function returned it. In this case, it is much more
+efficient to tell Cython the exact number of bytes by slicing the C
+string::
+
+    cdef char* c_string = NULL
+    cdef Py_ssize_t length = 0
+
+    # get pointer and length from a C function
+    get_a_c_string(&c_string, &length)
+
+    py_bytes_string = c_string[:length]
+
+Here, no additional byte counting is required and ``length`` bytes from
+the ``c_string`` will be copied into the Python bytes object, including
+any null bytes.
+
 Note that the creation of the Python bytes string can fail with an
 exception, e.g. due to insufficient memory. If you need to ``free()``
 the string after the conversion, you should wrap the assignment in a
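
The paragraph added in this hunk contrasts a null-terminated conversion with slicing by an explicit length. The same distinction can be observed from plain Python with the standard ``ctypes`` module; the sketch below is an analogue, not Cython code. ``ctypes.string_at(address)`` counts bytes up to the terminating null byte the way ``strlen()`` does, while ``ctypes.string_at(address, length)`` copies exactly ``length`` bytes, much like ``c_string[:length]``. The buffer contents and ``length`` value are made up, standing in for what a C function such as the tutorial's ``get_a_c_string()`` would report.

```python
import ctypes

# Seven payload bytes with an embedded null byte; create_string_buffer()
# appends one more null byte at the end, as a C string would have.
payload = b"abc\x00def"
buf = ctypes.create_string_buffer(payload)
address = ctypes.addressof(buf)

# The length a C function would report alongside the pointer.
length = len(payload)

# Like converting a bare char*: bytes are counted up to the first null byte.
truncated = ctypes.string_at(address)

# Like c_string[:length]: exactly `length` bytes are copied, null bytes included.
exact = ctypes.string_at(address, length)

print(truncated)  # b'abc'
print(exact)      # b'abc\x00def'
```

The explicit length both avoids the extra scan and preserves the embedded null byte that the null-terminated conversion silently drops.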
@@ -33,7 +53,7 @@ try-finally construct::
 
     cimport stdlib
     cdef bytes py_string
-    cdef char* c_string = c_call_returning_a_c_string()
+    cdef char* c_string = c_call_creating_a_new_c_string()
     try:
         py_string = c_string
     finally:
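
This hunk renames the C call to ``c_call_creating_a_new_c_string()`` to stress that the callee allocates memory which the caller must release. A rough Python analogue of the try-finally pattern, using ``ctypes`` with the C library's ``malloc()`` and ``free()``, might look as follows. It assumes a POSIX system where ``ctypes.util.find_library("c")`` locates libc; ``creating_a_new_c_string()`` and ``to_bytes_and_free()`` are hypothetical helpers, not part of Cython or any real API.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX platform where the C library can be located this way.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def creating_a_new_c_string(data: bytes) -> int:
    """Hypothetical C-side helper: returns a malloc()ed, null-terminated copy of data."""
    size = len(data) + 1
    address = libc.malloc(size)
    if not address:
        raise MemoryError("malloc() failed")
    ctypes.memmove(address, data + b"\x00", size)
    return address


def to_bytes_and_free(address: int) -> bytes:
    try:
        # Creating the Python bytes object may raise, e.g. MemoryError ...
        return ctypes.string_at(address)
    finally:
        # ... but the C buffer is freed in every case.
        libc.free(address)


py_string = to_bytes_and_free(creating_a_new_c_string(b"hello"))
print(py_string)  # b'hello'
```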
@@ -52,7 +72,7 @@ keep a reference to the Python string as long as the ``char*`` is in
 use. Often enough, this only spans the call to a C function that
 receives the pointer as parameter. Special care must be taken,
 however, when the C function stores the pointer for later use. Apart
-from keeping a Python reference to the string, no manual memory
+from keeping a Python reference to the string object, no manual memory
 management is required.
 
 Decoding bytes to text
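
The rule in this hunk, keeping the Python bytes object alive for as long as a ``char*`` taken from it is in use, can be sketched with ``ctypes``. In CPython, a ``ctypes.c_char_p`` built from a bytes object points into that object's own buffer rather than a copy. The ``BorrowedCString`` class is a hypothetical illustration that stores the pointer together with an explicit reference to its owner, which is what code must arrange when a C function keeps the pointer for later use. (``c_char_p`` also keeps an internal reference of its own; the explicit attribute only makes the ownership visible.)

```python
import ctypes


class BorrowedCString:
    """Hypothetical holder pairing a char* with the bytes object that owns its memory."""

    def __init__(self, data: bytes) -> None:
        # The Python reference that keeps the memory behind the pointer valid.
        self._owner = data
        # char* into the bytes object's buffer; CPython does not copy the data here.
        self.pointer = ctypes.c_char_p(data)

    def read(self) -> bytes:
        # Reads through the pointer up to the terminating null byte.
        return self.pointer.value


stored = BorrowedCString(b"kept alive")
# As long as `stored` lives, the pointer it could hand to C code stays valid.
print(stored.read())  # b'kept alive'
```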
@@ -75,13 +95,7 @@ contains no null bytes::
     cdef char* some_c_string = c_call_returning_a_c_string()
     ustring = some_c_string.decode('UTF-8')
 
-However, this will not work for strings that contain null bytes, and
-it is very inefficient for long strings, since Cython has to call
-``strlen()`` on the C string first to find out the length by counting
-the bytes up to the terminating null byte. In many cases, the user
-code will know the length already, e.g. because a C function returned
-it. In this case, it is much more efficient to tell Cython the exact
-number of bytes by slicing the C string::
+And for strings where the length is known::
 
     cdef char* c_string = NULL
     cdef Py_ssize_t length = 0
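
The replacement line introduces the length-aware variant of decoding. A ``ctypes`` analogue (again not Cython) copies a known number of bytes from a buffer that has no terminating null byte and then decodes them. The sample text and the buffer are assumptions for illustration, standing in for memory filled by a C call that also reports its length.

```python
import ctypes

text = "Grüße"
encoded = text.encode("UTF-8")  # 7 bytes: 'ü' and 'ß' take two bytes each
length = len(encoded)

# A char array of exactly `length` bytes with no trailing null byte.
buf = (ctypes.c_char * length).from_buffer_copy(encoded)
address = ctypes.addressof(buf)

# Analogue of c_string[:length].decode('UTF-8'): copy the known length, then decode.
ustring = ctypes.string_at(address, length).decode("UTF-8")
print(ustring)  # Grüße
```

Since the length is passed explicitly, the missing terminator is never needed, and no byte beyond the buffer is read.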
@@ -91,9 +105,9 @@ number of bytes by slicing the C string::
     ustring = c_string[:length].decode('UTF-8')
 
-The same can be used when the string contains null bytes, e.g. when it
-uses an encoding like UCS-4, where each character is encoded in four
-bytes.
+The same should be used when the string contains null bytes, e.g. when
+it uses an encoding like UCS-4, where each character is encoded in four
+bytes most of which tend to be 0.
 
 It is common practice to wrap string conversions (and non-trivial type
 conversions in general) in dedicated functions, as this needs to be
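
To see why UCS-4 data needs an explicit length, count the zero bytes such an encoding produces even for plain ASCII text. The sketch uses Python's ``UTF-32-LE`` codec as a stand-in for UCS-4 (both store each of the characters used here as one four-byte code unit) and ``ctypes`` to mimic the C buffer; it is an illustration, not Cython code.

```python
import ctypes

text = "Cython"
encoded = text.encode("UTF-32-LE")   # four bytes per character, no byte order mark
length = len(encoded)                # 24
zero_bytes = encoded.count(b"\x00")  # 18: three of every four bytes are zero

buf = ctypes.create_string_buffer(encoded)
address = ctypes.addressof(buf)

# A null-terminated scan stops right after b"C", whose next byte is zero.
truncated = ctypes.string_at(address)

# Copying an explicit length keeps all 24 bytes, so decoding recovers the text.
ustring = ctypes.string_at(address, length).decode("UTF-32-LE")

print(truncated, zero_bytes)  # b'C' 18
print(ustring)                # Cython
```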