golang_str: Speed up UTF-8 decoding a bit on py2

We recently moved our custom UTF-8 encoding/decoding routines to Cython.
Now we can start taking advantage of C-level speedups to make our own
UTF-8 decoder a bit less horribly slow on py2:

    name       old time/op  new time/op  delta
    stddecode   752ns ± 0%   743ns ± 0%   -1.19%  (p=0.000 n=9+10)
    udecode     216µs ± 0%    75µs ± 0%  -65.19%  (p=0.000 n=9+10)
    stdencode   328ns ± 2%   327ns ± 1%     ~     (p=0.252 n=10+9)
    bencode    34.1µs ± 1%  32.1µs ± 1%   -5.92%  (p=0.000 n=10+10)

So u() gets a ~3x speedup, but it is still significantly slower than
std unicode.decode('utf-8').

Only low-hanging fruit is picked here: _utf_decode_rune is made a bit
faster, since it sits in the innermost loop. In the future
_utf8_decode_surrogateescape might be reworked as well, to avoid
constructing the resulting unicode via a py-level list of py-unicode
character objects. The same applies to _utf8_encode_surrogateescape.
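A rough pure-Python 3 sketch of the two pieces discussed above, for illustration only. `decode_rune` is a hypothetical stand-in for `_utf_decode_rune`: it decodes one rune and reports any invalid or truncated input as a one-byte error. `decode_surrogateescape` is a hypothetical stand-in for `_utf8_decode_surrogateescape`: it escapes each undecodable byte x as U+DC00+x. Assumptions: this is not pygolang's Cython code, and building the result from a flat `array('I')` of code points, decoded once via UTF-32 with `surrogatepass`, is just one possible way to avoid a list of one-character unicode objects. In pure Python this saves little; the gain would come from doing the same with a C-level buffer in Cython.

```python
import sys
from array import array

RUNE_SELF = 0x80        # bytes below this are single-byte ASCII runes
MAX_RUNE = 0x10FFFF     # largest valid Unicode code point


def decode_rune(b, i):
    """Decode one UTF-8 rune from bytes b starting at index i (hypothetical helper).

    Returns (rune, size). On invalid or truncated input returns (-1, 1),
    so the caller escapes exactly one byte and continues with the next one.
    """
    c0 = b[i]
    if c0 < RUNE_SELF:                  # fast path: ASCII
        return c0, 1
    if 0xC2 <= c0 <= 0xDF:
        n, r, lo = 2, c0 & 0x1F, 0x80
    elif 0xE0 <= c0 <= 0xEF:
        n, r, lo = 3, c0 & 0x0F, 0x800
    elif 0xF0 <= c0 <= 0xF4:
        n, r, lo = 4, c0 & 0x07, 0x10000
    else:                               # stray continuation byte or invalid lead byte
        return -1, 1
    if i + n > len(b):                  # truncated sequence at end of input
        return -1, 1
    for k in range(1, n):
        c = b[i + k]
        if c & 0xC0 != 0x80:            # expected a continuation byte 10xxxxxx
            return -1, 1
        r = (r << 6) | (c & 0x3F)
    # reject overlong forms, UTF-16 surrogates and out-of-range code points
    if r < lo or 0xD800 <= r <= 0xDFFF or r > MAX_RUNE:
        return -1, 1
    return r, n


def decode_surrogateescape(b):
    """Decode bytes as UTF-8, mapping each undecodable byte x to U+DC00+x.

    Code points are collected in a flat array of 32-bit unsigned ints and the
    str is created once at the end, instead of joining one-character strs.
    """
    cps = array('I')
    assert cps.itemsize == 4, "sketch assumes a 4-byte unsigned int"
    i, n = 0, len(b)
    while i < n:
        r, size = decode_rune(b, i)
        if r < 0:
            r = 0xDC00 + b[i]           # escape the single bad byte (always >= 0x80)
        cps.append(r)
        i += size
    codec = 'utf-32-le' if sys.byteorder == 'little' else 'utf-32-be'
    # 'surrogatepass' lets the escaped lone surrogates through the UTF-32 decoder
    return cps.tobytes().decode(codec, 'surrogatepass')
```

For any input the sketch should agree with py3's `b.decode('utf-8', 'surrogateescape')`, e.g. `decode_surrogateescape(b'\xe2\x82A')` gives `'\udce2\udc82A'`.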

On py3 the performance of std and u/b decode/encode is approximately the same.

/trusted-by @jerome
/reviewed-on !19