-
Kirill Smelkov authored
We recently moved our custom UTF-8 encoding/decoding routines to Cython. Now we can start taking speedup advantage on C level to make our own UTF-8 decoder a bit less horribly slow on py2: name old time/op new time/op delta stddecode 752ns ± 0% 743ns ± 0% -1.19% (p=0.000 n=9+10) udecode 216µs ± 0% 75µs ± 0% -65.19% (p=0.000 n=9+10) stdencode 328ns ± 2% 327ns ± 1% ~ (p=0.252 n=10+9) bencode 34.1µs ± 1% 32.1µs ± 1% -5.92% (p=0.000 n=10+10) So it is ~ 3x speedup for u(), but still significantly slower compared to std unicode.decode('utf-8'). Only low-hanging fruit here to make _utf_decode_rune a bit more prompt, since it sits in the most inner loop. In the future _utf8_decode_surrogateescape might be reworked as well to avoid constructing resulting unicode via py-level list of py-unicode character objects. And similarly for _utf8_encode_surrogateescape. On py3 the performance of std and u/b decode/encode is approximately the same. /trusted-by @jerome /reviewed-on nexedi/pygolang!19
9cb7b210