Commit 50b3808c authored by Kirill Smelkov

Uniform UTF8-based approach to strings

Context: together with Jérome we've been struggling with porting Zodbtools to
Python 3 for several years. Despite several incremental attempts[1,2,3]
we are not there yet, with the main difficulty being the backward-compatibility
breakage that Python 3 did for bytes and unicode. This spring, after trying
once again to finish this porting and failing to reach a satisfactory
result, I finally decided to do something about the root of the problem:
the level of strings - where backward compatibility was broken - with
the idea to fix everything once and for all.

In 2018 in "Python 3 Losses: Nexedi Perspective"[4] and associated "cost
overview"[5] Jean-Paul highlighted the problem of strings backward
compatibility breakage, that Python 3 did, as the major one.

In 2019 we had some conversations with Jérome about this topic as well[6,7].

In 2020 I started to approach it with `b` and `u`, which provide
always-working conversion between bytes and unicode[8], and via limited
usage of custom bytes- and unicode-like types that are interoperable with
both bytes and unicode simultaneously[9].

Today, with this work, I'm finally exposing those types for general usage, so
that the bytes/unicode problem can be handled automatically. An overview of
the functionality is provided below:

---- 8< ----

Pygolang, similarly to Go, provides a uniform UTF8-based approach to strings
with the idea to make working with byte- and unicode-strings easy and
transparently interoperable:

- `bstr` is byte-string: it is based on `bytes` and can automatically convert to/from `unicode` (*).
- `ustr` is unicode-string: it is based on `unicode` and can automatically convert to/from `bytes`.

The conversion, in both encoding and decoding, never fails and never loses
information: `bstr→ustr→bstr` and `ustr→bstr→ustr` are always identity
even if the bytes data is not valid UTF-8.
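
For example, this round-trip property can be checked directly with `b` and `u`
(a quick illustrative sketch):

```py
   from golang import b, u

   x = b'\xe2\x28\xa1'           # not valid UTF-8
   assert b(u(x)) == x           # decode to ustr and back is identity
   assert u(b(u'мир')) == u'мир' # encode to bstr and back is identity
```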

Both `bstr` and `ustr` represent strings. They are two different *representations* of the same entity.

Semantically `bstr` is an array of bytes, while `ustr` is an array of
unicode characters. Accessing their elements by `[index]` and iterating them
yield a byte and a unicode character correspondingly (+). However it is
possible to yield unicode characters when iterating `bstr` via `uiter`, and
to yield byte characters when iterating `ustr` via `biter`. In practice
`bstr` + `uiter` is enough 99% of the time, and `ustr` only needs to be used
for random access to string characters.
See [Strings, bytes, runes and characters in Go](https://blog.golang.org/strings) for an overview of this approach.
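
For instance, with these semantics, length and indexing work on bytes for
`bstr` and on characters for `ustr` (a quick illustrative sketch):

```py
   s = b('мир')     # 3 Cyrillic characters, 6 bytes in UTF-8
   len(s)           # 6 - bstr is an array of bytes
   len(u(s))        # 3 - ustr is an array of unicode characters
   s[0]             # 1-byte bstr;  ord(s[0]) gives the byte value 0xd0
   u(s)[0]          # 1-character ustr u('м')
```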

Operations in between `bstr` and `ustr`/`unicode` / `bytes`/`bytearray` coerce to `bstr`, while
operations in between `ustr` and `bstr`/`bytes`/`bytearray` / `unicode` coerce
to `ustr`.  When the coercion happens, `bytes` and `bytearray`, similarly to
`bstr`, are also treated as UTF8-encoded strings.
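
The coercion rules can be illustrated as follows (a sketch; results are shown
in the b()/u() notation used above):

```py
   b('мир') + u'!'      # -> b('мир!')  bstr op unicode coerces to bstr
   u('мир') + b'!'      # -> u('мир!')  ustr op bytes   coerces to ustr
   b('мир') == u'мир'   # True - comparison coerces as well
```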

`bstr` and `ustr` are meant to be drop-in replacements for the standard
`str`/`unicode` classes. They support all methods of `str`/`unicode` and in
particular their constructors accept arbitrary objects and either convert or
stringify them. For cases when no stringification is desired, and one only
wants to convert `bstr`/`ustr` / `unicode`/`bytes`/`bytearray`, or an object
with `buffer` interface (%), to a Pygolang string, `b` and `u` provide a way
to make sure an object is either `bstr` or `ustr` correspondingly.

Usage example:

```py
   s  = b('привет')     # s is bstr corresponding to UTF-8 encoding of 'привет'.
   s += ' мир'          # s is b('привет мир')
   for c in uiter(s):   # c will iterate through
        ...             #     [u(_) for _ in ('п','р','и','в','е','т',' ','м','и','р')]

   # the following gives b('привет мир труд май')
   b('привет %s %s %s') % (u'мир',                  # raw unicode
                           u'труд'.encode('utf-8'), # raw bytes
                           u('май'))                # ustr

   def f(s):
      s = u(s)          # make sure s is ustr, decoding as UTF-8(^) if it was bstr, bytes, bytearray or buffer.
   ...               # (^) the decoding never fails nor loses information.
```
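
To illustrate the difference between the constructors (which stringify) and
`b`/`u` (which only convert), here is a small sketch:

```py
   bstr(123)      # b('123')  - the constructor stringifies arbitrary objects
   b(123)         # TypeError - b only converts string-like objects and buffers
```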

(*) `unicode` on Python2, `str` on Python3.
(+) the ordinal of such a byte or unicode character can be obtained via regular `ord`.
    For completeness `bbyte` and `uchr` are also provided for constructing 1-byte `bstr` and 1-character `ustr` from an ordinal.
(%) data in a buffer, similarly to `bytes` and `bytearray`, is treated as a UTF8-encoded string.
    Notice that only explicit conversion through `b` and `u` accepts objects with buffer interface. Automatic coercion does not.
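
For completeness, the ordinal helpers mentioned in (+) behave as follows (a
quick illustrative sketch):

```py
   from golang import b, bbyte, uchr

   bbyte(0x41)        # b('A') - 1-byte bstr with ordinal 0x41
   uchr(0x43f)        # u('п') - 1-character ustr with unicode ordinal 0x43f
   ord(b('A')[0])     # 65
```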

---- 8< ----

With this, zodbtools, for example, is finally ported to Python 3 easily[10].

One note is that we change `b` and `u` to return `bstr`/`ustr` instead of
`bytes`/`unicode`. This is a change in behaviour, but I hope it won't break
anything. The reason for this is that the now-returned `bstr` and `ustr` are
meant to be drop-in replacements for standard string types, and that there
are not many existing `b` and `u` users. We just need to make sure that the
places that already use `b` and `u` - Zodbtools, Nxdtest[11] and lonet[12] -
continue to work, and they should.
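
One quick way to see why those existing users are expected to keep working is
that `bstr`/`ustr` subclass the corresponding builtin types (a sketch; py3
shown for the unicode case):

```py
   from golang import b, u, bstr

   s = b('мир')
   isinstance(s, bytes)    # True - bstr is based on bytes
   isinstance(u(s), str)   # True on py3 - ustr is based on unicode/str
   type(s) is bstr         # True
```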

@klaus, you once said that you use `b` and `u` somewhere as well. Please do not
hesitate to let me know if this change causes any issues for you, and we will,
hopefully, try to find a solution.

Kirill

/cc @jerome, @klaus, @kazuhiko, @vpelletier, @yusei, @tatuya
/reviewed-and-discussed-on nexedi/pygolang!21

[1] nexedi/zodbtools!12
[2] nexedi/zodbtools!13
[3] nexedi/zodbtools!16
[4] https://www.nexedi.com/NXD-Presentation.Multicore.PyconFR.2018?portal_skin=CI_slideshow#/20/1
[5] https://www.nexedi.com/NXD-Presentation.Multicore.PyconFR.2018?portal_skin=CI_slideshow#/20
[6] nexedi/zodbtools!8 (comment 73726)
[7] nexedi/zodbtools!13 (comment 81646)
[8] nexedi/pygolang@bcb95cd5
[9] nexedi/pygolang@edc7aaab
[10] nexedi/zodbtools@9861c136
[11] https://lab.nexedi.com/nexedi/nxdtest
[12] https://lab.nexedi.com/kirr/go123/blob/master/xnet/lonet/__init__.py
parents f59a785d 5bf08f8b
...@@ -2,6 +2,9 @@ include COPYING README.rst CHANGELOG.rst tox.ini pyproject.toml trun .lsan-ignor ...@@ -2,6 +2,9 @@ include COPYING README.rst CHANGELOG.rst tox.ini pyproject.toml trun .lsan-ignor
include golang/libgolang.h include golang/libgolang.h
include golang/runtime/libgolang.cpp include golang/runtime/libgolang.cpp
include golang/runtime/libpyxruntime.cpp include golang/runtime/libpyxruntime.cpp
include golang/runtime/platform.h
include golang/runtime.h
include golang/runtime.cpp
include golang/pyx/runtime.h include golang/pyx/runtime.h
include golang/pyx/testprog/golang_dso_user/dsouser/dso.h include golang/pyx/testprog/golang_dso_user/dsouser/dso.h
include golang/pyx/testprog/golang_dso_user/dsouser/dso.cpp include golang/pyx/testprog/golang_dso_user/dsouser/dso.cpp
......
...@@ -10,7 +10,7 @@ Package `golang` provides Go-like features for Python: ...@@ -10,7 +10,7 @@ Package `golang` provides Go-like features for Python:
- `func` allows to define methods separate from class. - `func` allows to define methods separate from class.
- `defer` allows to schedule a cleanup from the main control flow. - `defer` allows to schedule a cleanup from the main control flow.
- `error` and package `errors` provide error chaining. - `error` and package `errors` provide error chaining.
- `b` and `u` provide way to make sure an object is either bytes or unicode. - `b`, `u` and `bstr`/`ustr` provide uniform UTF8-based approach to strings.
- `gimport` allows to import python modules by full path in a Go workspace. - `gimport` allows to import python modules by full path in a Go workspace.
Package `golang.pyx` provides__ similar features for Cython/nogil. Package `golang.pyx` provides__ similar features for Cython/nogil.
...@@ -229,19 +229,64 @@ __ https://www.python.org/dev/peps/pep-3134/ ...@@ -229,19 +229,64 @@ __ https://www.python.org/dev/peps/pep-3134/
Strings Strings
------- -------
`b` and `u` provide way to make sure an object is either bytes or unicode. Pygolang, similarly to Go, provides uniform UTF8-based approach to strings with
`b(obj)` converts str/unicode/bytes obj to UTF-8 encoded bytestring, while the idea to make working with byte- and unicode- strings easy and transparently
`u(obj)` converts str/unicode/bytes obj to unicode string. For example:: interoperable:
b("привет мир") # -> gives bytes corresponding to UTF-8 encoding of "привет мир". - `bstr` is byte-string: it is based on `bytes` and can automatically convert to/from `unicode` [*]_.
- `ustr` is unicode-string: it is based on `unicode` and can automatically convert to/from `bytes`.
def f(s): The conversion, in both encoding and decoding, never fails and never loses
s = u(s) # make sure s is unicode, decoding as UTF-8(*) if it was bytes. information: `bstr→ustr→bstr` and `ustr→bstr→ustr` are always identity
... # (*) but see below about lack of decode errors. even if bytes data is not valid UTF-8.
Both `bstr` and `ustr` represent strings. They are two different *representations* of the same entity.
Semantically `bstr` is array of bytes, while `ustr` is array of
unicode-characters. Accessing their elements by `[index]` and iterating them yield byte and
unicode character correspondingly [*]_. However it is possible to yield unicode
character when iterating `bstr` via `uiter`, and to yield byte character when
iterating `ustr` via `biter`. In practice `bstr` + `uiter` is enough 99% of
the time, and `ustr` only needs to be used for random access to string
characters. See `Strings, bytes, runes and characters in Go`__ for overview of
this approach.
__ https://blog.golang.org/strings
Operations in between `bstr` and `ustr`/`unicode` / `bytes`/`bytearray` coerce to `bstr`, while
operations in between `ustr` and `bstr`/`bytes`/`bytearray` / `unicode` coerce
to `ustr`. When the coercion happens, `bytes` and `bytearray`, similarly to
`bstr`, are also treated as UTF8-encoded strings.
The conversion in both encoding and decoding never fails and never loses `bstr` and `ustr` are meant to be drop-in replacements for standard
information: `b(u(·))` and `u(b(·))` are always identity for bytes and unicode `str`/`unicode` classes. They support all methods of `str`/`unicode` and in
correspondingly, even if bytes input is not valid UTF-8. particular their constructors accept arbitrary objects and either convert or stringify them. For
cases when no stringification is desired, and one only wants to convert
`bstr`/`ustr` / `unicode`/`bytes`/`bytearray`, or an object with `buffer`
interface [*]_, to Pygolang string, `b` and `u` provide way to make sure an
object is either `bstr` or `ustr` correspondingly.
Usage example::
s = b('привет') # s is bstr corresponding to UTF-8 encoding of 'привет'.
s += ' мир' # s is b('привет мир')
for c in uiter(s): # c will iterate through
... # [u(_) for _ in ('п','р','и','в','е','т',' ','м','и','р')]
# the following gives b('привет мир труд май')
b('привет %s %s %s') % (u'мир', # raw unicode
u'труд'.encode('utf-8'), # raw bytes
u('май')) # ustr
def f(s):
s = u(s) # make sure s is ustr, decoding as UTF-8(*) if it was bstr, bytes, bytearray or buffer.
... # (*) the decoding never fails nor loses information.
.. [*] `unicode` on Python2, `str` on Python3.
.. [*] | ordinal of such byte and unicode character can be obtained via regular `ord`.
| For completeness `bbyte` and `uchr` are also provided for constructing 1-byte `bstr` and 1-character `ustr` from ordinal.
.. [*] | data in buffer, similarly to `bytes` and `bytearray`, is treated as UTF8-encoded string.
| Notice that only explicit conversion through `b` and `u` accept objects with buffer interface. Automatic coercion does not.
Import Import
......
...@@ -9,6 +9,7 @@ ...@@ -9,6 +9,7 @@
/_io.cpp /_io.cpp
/_os.cpp /_os.cpp
/_os_test.cpp /_os_test.cpp
/_strconv.cpp
/_strings_test.cpp /_strings_test.cpp
/_sync.cpp /_sync.cpp
/_sync_test.cpp /_sync_test.cpp
......
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
# Copyright (C) 2018-2024 Nexedi SA and Contributors. # Copyright (C) 2018-2025 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com> # Kirill Smelkov <kirr@nexedi.com>
# #
# This program is free software: you can Use, Study, Modify and Redistribute # This program is free software: you can Use, Study, Modify and Redistribute
...@@ -24,7 +24,7 @@ ...@@ -24,7 +24,7 @@
- `func` allows to define methods separate from class. - `func` allows to define methods separate from class.
- `defer` allows to schedule a cleanup from the main control flow. - `defer` allows to schedule a cleanup from the main control flow.
- `error` and package `errors` provide error chaining. - `error` and package `errors` provide error chaining.
- `b` and `u` provide way to make sure an object is either bytes or unicode. - `b`, `u`, `bstr`/`ustr` and `biter`/`uiter` provide uniform UTF8-based approach to strings.
- `gimport` allows to import python modules by full path in a Go workspace. - `gimport` allows to import python modules by full path in a Go workspace.
See README for thorough overview. See README for thorough overview.
...@@ -36,7 +36,8 @@ from __future__ import print_function, absolute_import ...@@ -36,7 +36,8 @@ from __future__ import print_function, absolute_import
__version__ = "0.1" __version__ = "0.1"
__all__ = ['go', 'chan', 'select', 'default', 'nilchan', 'defer', 'panic', __all__ = ['go', 'chan', 'select', 'default', 'nilchan', 'defer', 'panic',
'recover', 'func', 'error', 'b', 'u', 'gimport'] 'recover', 'func', 'error', 'b', 'u', 'bstr', 'ustr', 'biter', 'uiter', 'bbyte', 'uchr',
'gimport']
import setuptools_dso import setuptools_dso
setuptools_dso.dylink_prepare_dso('golang.runtime.libgolang') setuptools_dso.dylink_prepare_dso('golang.runtime.libgolang')
...@@ -369,12 +370,11 @@ from ._golang import \ ...@@ -369,12 +370,11 @@ from ._golang import \
pypanic as panic, \ pypanic as panic, \
pyerror as error, \ pyerror as error, \
pyb as b, \ pyb as b, \
pyu as u pybstr as bstr, \
pybbyte as bbyte, \
# import golang.strconv into _golang from here to workaround cyclic golang ↔ strconv dependency pyu as u, \
def _(): pyustr as ustr, \
from . import _golang pyuchr as uchr, \
from . import strconv pybiter as biter, \
_golang.pystrconv = strconv pyuiter as uiter, \
_() _butf8b
del _
# cython: language_level=2 # cython: language_level=2
# Copyright (C) 2019-2022 Nexedi SA and Contributors. # Copyright (C) 2019-2023 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com> # Kirill Smelkov <kirr@nexedi.com>
# #
# This program is free software: you can Use, Study, Modify and Redistribute # This program is free software: you can Use, Study, Modify and Redistribute
...@@ -43,6 +43,7 @@ In addition to Cython/nogil API, golang.pyx provides runtime for golang.py: ...@@ -43,6 +43,7 @@ In addition to Cython/nogil API, golang.pyx provides runtime for golang.py:
- Python-level channels are represented by pychan + pyselect. - Python-level channels are represented by pychan + pyselect.
- Python-level error is represented by pyerror. - Python-level error is represented by pyerror.
- Python-level panic is represented by pypanic. - Python-level panic is represented by pypanic.
- Python-level strings are represented by pybstr/pyustr and pyb/pyu.
""" """
...@@ -64,6 +65,9 @@ cdef extern from *: ...@@ -64,6 +65,9 @@ cdef extern from *:
# on the edge of Python/nogil world. # on the edge of Python/nogil world.
from libcpp.string cimport string # golang::string = std::string from libcpp.string cimport string # golang::string = std::string
cdef extern from "golang/libgolang.h" namespace "golang" nogil: cdef extern from "golang/libgolang.h" namespace "golang" nogil:
ctypedef unsigned char byte
ctypedef signed int rune # = int32
void panic(const char *) void panic(const char *)
const char *recover() const char *recover()
...@@ -265,4 +269,11 @@ cdef class pyerror(Exception): ...@@ -265,4 +269,11 @@ cdef class pyerror(Exception):
cdef object from_error (error err) # -> pyerror | None cdef object from_error (error err) # -> pyerror | None
# strings
cpdef pyb(s) # -> bstr
cpdef pyu(s) # -> ustr
cdef __pystr(object obj) cdef __pystr(object obj)
cdef (rune, int) _utf8_decode_rune(const byte[::1] s)
cdef unicode _xunichr(rune i)
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
# cython: binding=False # cython: binding=False
# cython: c_string_type=str, c_string_encoding=utf8 # cython: c_string_type=str, c_string_encoding=utf8
# distutils: language = c++ # distutils: language = c++
# distutils: depends = libgolang.h os/signal.h _golang_str.pyx # distutils: depends = libgolang.h os/signal.h unicode/utf8.h _golang_str.pyx _golang_str_pickle.pyx
# #
# Copyright (C) 2018-2024 Nexedi SA and Contributors. # Copyright (C) 2018-2024 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com> # Kirill Smelkov <kirr@nexedi.com>
......
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
# Copyright (C) 2018-2023 Nexedi SA and Contributors. # Copyright (C) 2018-2025 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com> # Kirill Smelkov <kirr@nexedi.com>
# #
# This program is free software: you can Use, Study, Modify and Redistribute # This program is free software: you can Use, Study, Modify and Redistribute
...@@ -22,143 +22,1189 @@ ...@@ -22,143 +22,1189 @@
It is included from _golang.pyx . It is included from _golang.pyx .
""" """
from golang.unicode cimport utf8
from cpython cimport PyUnicode_AsUnicode, PyUnicode_GetSize, PyUnicode_FromUnicode
from cpython cimport PyUnicode_DecodeUTF8 from cpython cimport PyUnicode_DecodeUTF8
from cpython cimport PyTypeObject, Py_TYPE, reprfunc, richcmpfunc, binaryfunc
from cpython cimport Py_EQ, Py_NE, Py_LT, Py_GT, Py_LE, Py_GE
from cpython.iterobject cimport PySeqIter_New
from cpython cimport PyThreadState_GetDict, PyDict_SetItem
from cpython cimport PyObject_CheckBuffer
cdef extern from "Python.h":
PyTypeObject PyBytes_Type
ctypedef struct PyBytesObject:
pass
from libc.stdint cimport uint8_t cdef extern from "Python.h":
PyTypeObject PyUnicode_Type
ctypedef struct PyUnicodeObject:
pass
pystrconv = None # = golang.strconv imported at runtime (see __init__.py) cdef extern from "Python.h":
"""
#if PY_MAJOR_VERSION < 3
// on py2, PyDict_GetItemWithError is called _PyDict_GetItemWithError
// NOTE Cython3 provides PyDict_GetItemWithError out of the box
# define PyDict_GetItemWithError _PyDict_GetItemWithError
#endif
"""
PyObject* PyDict_GetItemWithError(object, object) except? NULL # borrowed ref
def pyb(s): # -> bytes Py_ssize_t PY_SSIZE_T_MAX
"""b converts str/unicode/bytes s to UTF-8 encoded bytestring. void PyType_Modified(PyTypeObject *)
Bytes input is preserved as-is: cdef extern from "Python.h":
ctypedef int (*initproc)(object, PyObject *, PyObject *) except -1
ctypedef struct _XPyTypeObject "PyTypeObject":
PyObject* tp_new(PyTypeObject*, PyObject*, PyObject*) except NULL
initproc tp_init
PySequenceMethods *tp_as_sequence
b(bytes_input) == bytes_input ctypedef struct PySequenceMethods:
binaryfunc sq_concat
binaryfunc sq_inplace_concat
object (*sq_slice) (object, Py_ssize_t, Py_ssize_t) # present only on py2
Unicode input is UTF-8 encoded. The encoding always succeeds.
b is reverse operation to u - the following invariant is always true:
b(u(bytes_input)) == bytes_input from cython cimport no_gc
TypeError is raised if type(s) is not one of the above. from libc.stdio cimport FILE
See also: u. from golang cimport strconv
""" import codecs as pycodecs
if isinstance(s, bytes): # py2: str py3: bytes import string as pystring
pass import types as pytypes
elif isinstance(s, unicode): # py2: unicode py3: str import functools as pyfunctools
s = _utf8_encode_surrogateescape(s) import re as pyre
else:
raise TypeError("b: invalid type %s" % type(s))
return s
def pyu(s): # -> unicode # zbytes/zunicode point to original std bytes/unicode types even if they will be patched.
"""u converts str/unicode/bytes s to unicode string. # we use them to invoke original bytes/unicode methods.
cdef object zbytes = <object>(&PyBytes_Type)
cdef object zunicode = <object>(&PyUnicode_Type)
# pybstr/pyustr point to version of bstr/ustr types that is actually in use:
# - when bytes/unicode are not patched -> to _pybstr/_pyustr
# - when bytes/unicode will be patched -> to bytes/unicode to where original
# _pybstr/_pyustr were copied during bytes/unicode patching.
# at runtime the code should use pybstr/pyustr instead of _pybstr/_pyustr.
pybstr = _pybstr # initially point to -> _pybstr/_pyustr
pyustr = _pyustr # TODO -> cdef for speed
cpdef pyb(s): # -> bstr
"""b converts object to bstr.
- For bstr the same object is returned.
- For bytes, bytearray, or object with buffer interface, the data is
preserved as-is and only result type is changed to bstr.
- For ustr/unicode the data is UTF-8 encoded. The encoding always succeeds.
TypeError is raised if type(s) is not one of the above.
b is reverse operation to u - the following invariant is always true:
Unicode input is preserved as-is: b(u(bytes_input)) is bstr with the same data as bytes_input.
u(unicode_input) == unicode_input See also: u, bstr/ustr, biter/uiter.
"""
bs = _pyb(pybstr, s)
if bs is None:
raise TypeError("b: invalid type %s" % type(s))
return bs
Bytes input is UTF-8 decoded. The decoding always succeeds and input cpdef pyu(s): # -> ustr
"""u converts object to ustr.
- For ustr the same object is returned.
- For unicode the data is preserved as-is and only result type is changed to ustr.
- For bstr, bytes, bytearray, or object with buffer interface, the data is UTF-8 decoded.
The decoding always succeeds and input
information is not lost: non-valid UTF-8 bytes are decoded into information is not lost: non-valid UTF-8 bytes are decoded into
surrogate codes ranging from U+DC80 to U+DCFF. surrogate codes ranging from U+DC80 to U+DCFF.
u is reverse operation to b - the following invariant is always true:
u(b(unicode_input)) == unicode_input
TypeError is raised if type(s) is not one of the above. TypeError is raised if type(s) is not one of the above.
See also: b. u is reverse operation to b - the following invariant is always true:
u(b(unicode_input)) is ustr with the same data as unicode_input.
See also: b, bstr/ustr, biter/uiter.
""" """
if isinstance(s, unicode): # py2: unicode py3: str us = _pyu(pyustr, s)
pass if us is None:
elif isinstance(s, bytes): # py2: str py3: bytes
s = _utf8_decode_surrogateescape(s)
else:
raise TypeError("u: invalid type %s" % type(s)) raise TypeError("u: invalid type %s" % type(s))
return us
cdef _pyb(bcls, s): # -> ~bstr | None
if type(s) is bcls:
return s return s
if isinstance(s, bytes):
if type(s) is not bytes:
s = _bdata(s)
elif isinstance(s, unicode):
s = _utf8_encode_surrogateescape(s)
else:
s = _ifbuffer_data(s) # bytearray and buffer
if s is None:
return None
assert type(s) is bytes
# like zbytes.__new__(bcls, s) but call zbytes.tp_new directly
# else tp_new_wrapper complains because pybstr.tp_new != zbytes.tp_new
argv = (s,)
obj = <object>(<_XPyTypeObject*>zbytes).tp_new(<PyTypeObject*>bcls, <PyObject*>argv, NULL)
Py_DECREF(obj)
return obj
cdef _pyu(ucls, s): # -> ~ustr | None
if type(s) is ucls:
return s
# __pystr converts obj to str of current python: if isinstance(s, unicode):
if type(s) is not unicode:
s = _udata(s)
else:
_ = _ifbuffer_data(s) # bytearray and buffer
if _ is not None:
s = _
if isinstance(s, bytes):
s = _utf8_decode_surrogateescape(s)
else:
return None
assert type(s) is unicode
# like zunicode .__new__(bcls, s) but call zunicode.tp_new directly
# else tp_new_wrapper complains because pyustr.tp_new != zunicode.tp_new
argv = (s,)
obj = <object>(<_XPyTypeObject*>zunicode).tp_new(<PyTypeObject*>ucls, <PyObject*>argv, NULL)
Py_DECREF(obj)
return obj
# _ifbuffer_data returns contained data if obj provides buffer interface.
cdef _ifbuffer_data(obj): # -> bytes|None
if PyObject_CheckBuffer(obj):
if PY_MAJOR_VERSION >= 3:
return bytes(obj)
else:
# py2: bytes(memoryview) returns '<memory at ...>'
return bytes(bytearray(obj))
elif _XPyObject_CheckOldBuffer(obj): # old-style buffer, py2-only
return bytes(_buffer_py2(obj))
else:
return None
# _pyb_coerce coerces x from `b op x` to be used in operation with pyb.
cdef _pyb_coerce(x): # -> bstr|bytes
if isinstance(x, bytes):
return x
elif isinstance(x, (unicode, bytearray)):
return pyb(x)
else:
raise TypeError("b: coerce: invalid type %s" % type(x))
# _pyu_coerce coerces x from `u op x` to be used in operation with pyu.
cdef _pyu_coerce(x): # -> ustr|unicode
if isinstance(x, unicode):
return x
elif isinstance(x, (bytes, bytearray)):
return pyu(x)
else:
raise TypeError("u: coerce: invalid type %s" % type(x))
# _pybu_rcoerce coerces x from `x op b|u` to either bstr or ustr.
# NOTE bytearray is handled outside of this function.
cdef _pybu_rcoerce(x): # -> bstr|ustr
if isinstance(x, bytes):
return pyb(x)
elif isinstance(x, unicode):
return pyu(x)
else:
raise TypeError('b/u: coerce: invalid type %s' % type(x))
# __pystr converts obj to ~str of current python:
# #
# - to bytes, via b, if running on py2, or # - to ~bytes, via b, if running on py2, or
# - to unicode, via u, if running on py3. # - to ~unicode, via u, if running on py3.
# #
# It is handy to use __pystr when implementing __str__ methods. # It is handy to use __pystr when implementing __str__ methods.
# #
# NOTE __pystr is currently considered to be internal function and should not # NOTE __pystr is currently considered to be internal function and should not
# be used by code outside of pygolang. # be used by code outside of pygolang.
# #
# XXX we should be able to use _pystr, but py3's str verify that it must have # XXX we should be able to use pybstr, but py3's str verify that it must have
# Py_TPFLAGS_UNICODE_SUBCLASS in its type flags. # Py_TPFLAGS_UNICODE_SUBCLASS in its type flags.
cdef __pystr(object obj): cdef __pystr(object obj): # -> ~str
if PY_MAJOR_VERSION >= 3: if PY_MAJOR_VERSION >= 3:
return pyu(obj) return pyu(obj)
else: else:
return pyb(obj) return pyb(obj)
# XXX cannot `cdef class`: github.com/cython/cython/issues/711 def pybbyte(int i): # -> 1-byte bstr
class _pystr(bytes): """bbyte(i) returns 1-byte bstr with ordinal i."""
"""_str is like bytes but can be automatically converted to Python unicode return pyb(bytearray([i]))
string via UTF-8 decoding.
The decoding never fails nor loses information - see u for details. """
""" """uchr(i) returns 1-character ustr with unicode ordinal i."""
return pyu(unichr(i))
# don't allow to set arbitrary attributes.
# won't be needed after switch to -> `cdef class`
__slots__ = ()
@no_gc # note setup.py assist this to compile despite
cdef class _pybstr(bytes): # https://github.com/cython/cython/issues/711
"""bstr is byte-string.
# __bytes__ - no need It is based on bytes and can automatically convert to/from unicode.
def __unicode__(self): return pyu(self) The conversion never fails and never loses information:
bstr → ustr → bstr
is always identity even if bytes data is not valid UTF-8.
Semantically bstr is array of bytes. Accessing its elements by [index] and
iterating it yield byte character. However it is possible to yield unicode
character when iterating bstr via uiter. In practice bstr + uiter is enough
99% of the time, and ustr only needs to be used for random access to string
characters. See https://blog.golang.org/strings for overview of this approach.
Operations in between bstr and ustr/unicode / bytes/bytearray coerce to bstr.
When the coercion happens, bytes and bytearray, similarly to bstr, are also
treated as UTF8-encoded strings.
bstr constructor accepts arbitrary objects and stringify them:
- if encoding and/or errors is specified, the object must provide buffer
interface. The data in the buffer is decoded according to provided
encoding/errors and further encoded via UTF-8 into bstr.
- if the object is bstr/ustr / unicode/bytes/bytearray - it is converted
to bstr. See b for details.
- otherwise bstr will have string representation of the object.
See also: b, ustr/u, biter/uiter.
"""
# XXX due to "cannot `cdef class` with __new__" (https://github.com/cython/cython/issues/799)
# _pybstr.__new__ is hand-made in _pybstr_tp_new which invokes ↓ .____new__() .
@staticmethod
def ____new__(cls, object='', encoding=None, errors=None):
# encoding or errors -> object must expose buffer interface
if not (encoding is None and errors is None):
object = _buffer_decode(object, encoding, errors)
# _bstringify. Note: it handles bstr/ustr / unicode/bytes/bytearray as documented
object = _bstringify(object)
assert isinstance(object, (unicode, bytes)), object
bobj = _pyb(cls, object)
assert bobj is not None
return bobj
# __bytes__ converts string to bytes leaving string domain.
# NOTE __bytes__ and encode are the only operations that leave string domain.
# NOTE __bytes__ is used only by py3 and only for `bytes(obj)` and `b'%s/%b' % obj`.
def __bytes__(self): return _bdata(self) # -> bytes
def __unicode__(self): return pyu(self)
def __str__(self): def __str__(self):
if PY_MAJOR_VERSION >= 3: if PY_MAJOR_VERSION >= 3:
return pyu(self) return pyu(self)
else: else:
return self return pyb(self) # self or pybstr if it was subclass
def __repr__(self):
qself, nonascii_escape = _bpysmartquote_u3b2(self)
bs = _inbstringify_get()
if bs.inbstringify == 0 or bs.inrepr:
if nonascii_escape: # so that e.g. b(u'\x80') is represented as
qself = 'b' + qself # b(b'\xc2\x80'), not as b('\xc2\x80')
return "b(" + qself + ")"
else:
# [b('β')] goes as ['β'] when under _bstringify for %s
return qself
def __reduce_ex__(self, protocol):
return _bstr__reduce_ex__(self, protocol)
def __hash__(self):
# hash of the same unicode and UTF-8 encoded bytes is generally different
# -> we can't make hash(bstr) == both hash(bytes) and hash(unicode) at the same time.
# -> make hash(bstr) == hash(str type of current python) so that bstr
# could be used as keys in dictionary interchangeably with native str type.
if PY_MAJOR_VERSION >= 3:
return hash(pyu(self))
else:
return zbytes.__hash__(self)
# == != < > <= >=
# NOTE all operations must succeed against any type so that bstr could be
# used as dict key and arbitrary three-way comparisons, done by python,
# work correctly. This means that on py2 e.g. `bstr > int` will behave
# exactly as builtin str and won't raise TypeError. On py3 TypeError is
# raised for such operations by python itself when it receives
# NotImplemented from all tried methods.
def __eq__(a, b):
try:
b = _pyb_coerce(b)
except TypeError:
return NotImplemented
return zbytes.__eq__(a, b)
def __ne__(a, b):
try:
b = _pyb_coerce(b)
except TypeError:
return NotImplemented
return zbytes.__ne__(a, b)
def __lt__(a, b):
try:
b = _pyb_coerce(b)
except TypeError:
return NotImplemented
return zbytes.__lt__(a, _pyb_coerce(b))
def __gt__(a, b):
try:
b = _pyb_coerce(b)
except TypeError:
return NotImplemented
return zbytes.__gt__(a, _pyb_coerce(b))
def __le__(a, b):
try:
b = _pyb_coerce(b)
except TypeError:
return NotImplemented
return zbytes.__le__(a, _pyb_coerce(b))
def __ge__(a, b):
try:
b = _pyb_coerce(b)
except TypeError:
return NotImplemented
return zbytes.__ge__(a, _pyb_coerce(b))
# len - no need to override
# [], [:]
def __getitem__(self, idx):
x = zbytes.__getitem__(self, idx)
if type(idx) is slice:
return pyb(x)
else:
# bytes[i] returns 1-character bytestring(py2) or int(py3)
# we always return 1-character bytestring
if PY_MAJOR_VERSION >= 3:
return pybbyte(x)
else:
return pyb(x)
# __iter__
def __iter__(self):
if PY_MAJOR_VERSION >= 3:
return _pybstrIter(zbytes.__iter__(self))
else:
# on python 2 str does not have .__iter__
return PySeqIter_New(self)
# __contains__
def __contains__(self, key):
# NOTE on py3 bytes.__contains__ accepts numbers and buffers. We don't want to
# automatically coerce any of them to bytestrings
return zbytes.__contains__(self, _pyb_coerce(key))
# __add__, __radd__ (no need to override __iadd__)
def __add__(a, b):
# NOTE Cython < 3 does not automatically support __radd__ for cdef class
# https://cython.readthedocs.io/en/latest/src/userguide/migrating_to_cy30.html#arithmetic-special-methods
# see also https://github.com/cython/cython/issues/4750
if type(a) is not pybstr:
assert type(b) is pybstr
return b.__radd__(a)
try:
b = _pyb_coerce(b)
except TypeError:
if not hasattr(b, '__radd__'):
raise # don't let python to handle e.g. bstr + memoryview automatically
return NotImplemented
return pyb(zbytes.__add__(a, b))
def __radd__(b, a):
# a.__add__(b) returned NotImplementedError, e.g. for unicode.__add__(bstr)
# u'' + b() -> u() ; same as u() + b() -> u()
# b'' + b() -> b() ; same as b() + b() -> b()
# barr + b() -> barr
if isinstance(a, bytearray):
# force `bytearray +=` to go via bytearray.sq_inplace_concat - see PyNumber_InPlaceAdd
return NotImplemented
a = _pybu_rcoerce(a)
return a.__add__(b)
# __mul__, __rmul__ (no need to override __imul__)
def __mul__(a, b):
if type(a) is not pybstr:
assert type(b) is pybstr
return b.__rmul__(a)
try:
_ = zbytes.__mul__(a, b)
except TypeError: # TypeError: `b` cannot be interpreted as an integer
return NotImplemented
return pyb(_)
def __rmul__(b, a):
return b.__mul__(a)
# %-formatting
def __mod__(a, b):
return _bprintf(a, b)
def __rmod__(b, a):
# ("..." % x) calls "x.__rmod__()" for string subtypes
# determine output type as in __radd__
if isinstance(a, bytearray):
# on py2 bytearray does not implement %
return NotImplemented # no need to check for py3 - there our __rmod__ is not invoked
a = _pybu_rcoerce(a)
return a.__mod__(b)
# format
def format(self, *args, **kwargs): return pyb(pyu(self).format(*args, **kwargs))
def format_map(self, mapping): return pyb(pyu(self).format_map(mapping))
def __format__(self, format_spec):
# NOTE don't convert to b due to "TypeError: __format__ must return a str, not pybstr"
# we are ok to return ustr even for format(bstr, ...) because in
# practice format builtin is never used and it is only s.format()
# that is used in programs. This way __format__ will be invoked
# only internally.
#
# NOTE we are ok to use ustr.__format__ because the only format code
# supported by bstr/ustr/unicode __format__ is 's', not e.g. 'r'.
return pyu(self).__format__(format_spec)
# encode/decode
#
# Encode encodes unicode representation of the string into bytes, leaving string domain.
# Decode decodes bytes representation of the string into ustr, staying inside string domain.
#
# Both bstr and ustr are accepted by encode and decode treating them as two
# different representations of the same entity.
#
# On encoding, for bstr, the string representation is first converted to
# unicode and encoded to bytes from there. For ustr unicode representation
# of the string is directly encoded.
#
# On decoding, for ustr, the string representation is first converted to
# bytes and decoded to unicode from there. For bstr bytes representation of
# the string is directly decoded.
#
# NOTE __bytes__ and encode are the only operations that leave string domain.
def encode(self, encoding=None, errors=None): # -> bytes
encoding, errors = _encoding_with_defaults(encoding, errors)
if encoding == 'utf-8' and errors == 'surrogateescape':
return _bdata(self)
# on py2 e.g. bytes.encode('string-escape') works on bytes directly
if PY_MAJOR_VERSION < 3:
codec = _pycodecs_lookup_binary(encoding)
if codec is not None:
return codec.encode(self, errors)[0]
return pyu(self).encode(encoding, errors)
def decode(self, encoding=None, errors=None): # -> ustr | bstr on py2 for encodings like string-escape
encoding, errors = _encoding_with_defaults(encoding, errors)
if encoding == 'utf-8' and errors == 'surrogateescape':
x = _utf8_decode_surrogateescape(self)
else:
x = zbytes.decode(self, encoding, errors)
# on py2 e.g. bytes.decode('string-escape') returns bytes
if PY_MAJOR_VERSION < 3 and isinstance(x, bytes):
return pyb(x)
return pyu(x)
# all other string methods
def capitalize(self): return pyb(pyu(self).capitalize())
def casefold(self): return pyb(pyu(self).casefold())
def center(self, width, fillchar=' '): return pyb(pyu(self).center(width, fillchar))
def count(self, sub, start=None, end=None): return zbytes.count(self, _pyb_coerce(sub), start, end)
def endswith(self, suffix, start=None, end=None):
if isinstance(suffix, tuple):
for _ in suffix:
if self.endswith(_pyb_coerce(_), start, end):
return True
return False
if start is None: start = 0
if end is None: end = PY_SSIZE_T_MAX
return zbytes.endswith(self, _pyb_coerce(suffix), start, end)
def expandtabs(self, tabsize=8): return pyb(pyu(self).expandtabs(tabsize))
# NOTE find/index & friends should return byte-position, not unicode-position
def find(self, sub, start=None, end=None): return zbytes.find(self, _pyb_coerce(sub), start, end)
def index(self, sub, start=None, end=None): return zbytes.index(self, _pyb_coerce(sub), start, end)
def isalnum(self): return pyu(self).isalnum()
def isalpha(self): return pyu(self).isalpha()
# isascii(self) no need to override
def isdecimal(self): return pyu(self).isdecimal()
def isdigit(self): return pyu(self).isdigit()
def isidentifier(self): return pyu(self).isidentifier()
def islower(self): return pyu(self).islower()
def isnumeric(self): return pyu(self).isnumeric()
def isprintable(self): return pyu(self).isprintable()
def isspace(self): return pyu(self).isspace()
def istitle(self): return pyu(self).istitle()
def join(self, iterable): return pyb(zbytes.join(self, (_pyb_coerce(_) for _ in iterable)))
def ljust(self, width, fillchar=' '): return pyb(pyu(self).ljust(width, fillchar))
def lower(self): return pyb(pyu(self).lower())
def lstrip(self, chars=None): return pyb(pyu(self).lstrip(chars))
def partition(self, sep): return tuple(pyb(_) for _ in zbytes.partition(self, _pyb_coerce(sep)))
def removeprefix(self, prefix): return pyb(pyu(self).removeprefix(prefix))
def removesuffix(self, suffix): return pyb(pyu(self).removesuffix(suffix))
def replace(self, old, new, count=-1): return pyb(zbytes.replace(self, _pyb_coerce(old), _pyb_coerce(new), count))
# NOTE rfind/rindex & friends should return byte-position, not unicode-position
def rfind(self, sub, start=None, end=None): return zbytes.rfind(self, _pyb_coerce(sub), start, end)
def rindex(self, sub, start=None, end=None): return zbytes.rindex(self, _pyb_coerce(sub), start, end)
def rjust(self, width, fillchar=' '): return pyb(pyu(self).rjust(width, fillchar))
def rpartition(self, sep): return tuple(pyb(_) for _ in zbytes.rpartition(self, _pyb_coerce(sep)))
def rsplit(self, sep=None, maxsplit=-1):
v = pyu(self).rsplit(sep, maxsplit)
return list([pyb(_) for _ in v])
def rstrip(self, chars=None): return pyb(pyu(self).rstrip(chars))
def split(self, sep=None, maxsplit=-1):
v = pyu(self).split(sep, maxsplit)
return list([pyb(_) for _ in v])
def splitlines(self, keepends=False): return list(pyb(_) for _ in pyu(self).splitlines(keepends))
def startswith(self, prefix, start=None, end=None):
if isinstance(prefix, tuple):
for _ in prefix:
if self.startswith(_pyb_coerce(_), start, end):
return True
return False
if start is None: start = 0
if end is None: end = PY_SSIZE_T_MAX
return zbytes.startswith(self, _pyb_coerce(prefix), start, end)
def strip(self, chars=None): return pyb(pyu(self).strip(chars))
def swapcase(self): return pyb(pyu(self).swapcase())
def title(self): return pyb(pyu(self).title())
def translate(self, table, delete=None):
# bytes mode (compatibility with str/py2)
if table is None or isinstance(table, zbytes) or delete is not None:
if delete is None: delete = b''
return pyb(zbytes.translate(self, table, delete))
# unicode mode
else:
return pyb(pyu(self).translate(table))
def upper(self): return pyb(pyu(self).upper())
def zfill(self, width): return pyb(pyu(self).zfill(width))
@staticmethod
def maketrans(x=None, y=None, z=None):
return pyustr.maketrans(x, y, z)
# hand-made _pybstr.__new__ (workaround for https://github.com/cython/cython/issues/799)
cdef PyObject* _pybstr_tp_new(PyTypeObject* _cls, PyObject* _argv, PyObject* _kw) except NULL:
argv = ()
if _argv != NULL:
argv = <object>_argv
kw = {}
if _kw != NULL:
kw = <object>_kw
cdef object x = _pybstr.____new__(<object>_cls, *argv, **kw)
Py_INCREF(x)
return <PyObject*>x
(<_XPyTypeObject*>_pybstr).tp_new = &_pybstr_tp_new
# bytes uses "optimized" and custom .tp_basicsize and .tp_itemsize:
# https://github.com/python/cpython/blob/v2.7.18-0-g8d21aa21f2c/Objects/stringobject.c#L26-L32
# https://github.com/python/cpython/blob/v2.7.18-0-g8d21aa21f2c/Objects/stringobject.c#L3816-L3820
(<PyTypeObject*>_pybstr) .tp_basicsize = (<PyTypeObject*>zbytes).tp_basicsize
(<PyTypeObject*>_pybstr) .tp_itemsize = (<PyTypeObject*>zbytes).tp_itemsize
# make sure _pybstr C layout corresponds to bytes C layout exactly
# we patched cython to allow from-bytes cdef class inheritance and we also set
# .tp_basicsize directly above. All this works ok only if C layouts for _pybstr
# and bytes are completely the same.
assert sizeof(_pybstr) == sizeof(PyBytesObject)
cdef class _pyunicode(unicode):
"""_unicode is like unicode(py2)|str(py3) but can be automatically converted
to bytes via UTF-8 encoding.
The encoding always succeeds - see b for details. @no_gc
cdef class _pyustr(unicode):
"""ustr is unicode-string.
It is based on unicode and can automatically convert to/from bytes.
The conversion never fails and never loses information:
ustr → bstr → ustr
is always identity even if bytes data is not valid UTF-8.
ustr is similar to standard unicode type - iterating and accessing its
elements by [index] yields unicode characters.
ustr complements bstr and is meant to be used only in situations when
random access to string characters is needed. Otherwise bstr + uiter is
more preferable and should be enough 99% of the time.
Operations in between ustr and bstr/bytes/bytearray / unicode coerce to ustr.
When the coercion happens, bytes and bytearray, similarly to bstr, are also
treated as UTF8-encoded strings.
ustr constructor, similarly to the one in bstr, accepts arbitrary objects
and stringify them. Please refer to bstr and u documentation for details.
See also: u, bstr/b, biter/uiter.
""" """
def __bytes__(self): return pyb(self) # XXX due to "cannot `cdef class` with __new__" (https://github.com/cython/cython/issues/799)
# __unicode__ - no need # _pyustr.__new__ is hand-made in _pyustr_tp_new which invokes ↓ .____new__() .
@staticmethod
def ____new__(cls, object='', encoding=None, errors=None):
# encoding or errors -> object must expose buffer interface
if not (encoding is None and errors is None):
object = _buffer_decode(object, encoding, errors)
# _bstringify. Note: it handles bstr/ustr / unicode/bytes/bytearray as documented
object = _bstringify(object)
assert isinstance(object, (unicode, bytes)), object
uobj = _pyu(cls, object)
assert uobj is not None
return uobj
# __bytes__ converts string to bytes leaving string domain.
# see bstr.__bytes__ for more details.
def __bytes__(self): return _bdata(pyb(self)) # -> bytes
def __unicode__(self): return pyu(self) # see __str__
def __str__(self): def __str__(self):
if PY_MAJOR_VERSION >= 3: if PY_MAJOR_VERSION >= 3:
return self return pyu(self) # self or pyustr if it was subclass
else: else:
return pyb(self) return pyb(self)
# initialize .tp_print for _pystr so that this type could be printed. def __repr__(self):
qself, nonascii_escape = _upysmartquote_u3b2(self)
bs = _inbstringify_get()
if bs.inbstringify == 0 or bs.inrepr:
if nonascii_escape:
qself = 'b'+qself # see bstr.__repr__
return "u(" + qself + ")"
else:
# [u('β')] goes as ['β'] when under _bstringify for %s
return qself
def __reduce_ex__(self, protocol):
return _ustr__reduce_ex__(self, protocol)
def __hash__(self):
# see _pybstr.__hash__ for why we stick to hash of current str
if PY_MAJOR_VERSION >= 3:
return zunicode.__hash__(self)
else:
return hash(pyb(self))
# == != < > <= >=
# NOTE all operations must succeed against any type.
# See bstr for details.
def __eq__(a, b):
try:
b = _pyu_coerce(b)
except TypeError:
return NotImplemented
return zunicode.__eq__(a, b)
def __ne__(a, b):
try:
b = _pyu_coerce(b)
except TypeError:
return NotImplemented
return zunicode.__ne__(a, b)
def __lt__(a, b):
try:
b = _pyu_coerce(b)
except TypeError:
return NotImplemented
return zunicode.__lt__(a, _pyu_coerce(b))
def __gt__(a, b):
try:
b = _pyu_coerce(b)
except TypeError:
return NotImplemented
return zunicode.__gt__(a, _pyu_coerce(b))
def __le__(a, b):
try:
b = _pyu_coerce(b)
except TypeError:
return NotImplemented
return zunicode.__le__(a, _pyu_coerce(b))
def __ge__(a, b):
try:
b = _pyu_coerce(b)
except TypeError:
return NotImplemented
return zunicode.__ge__(a, _pyu_coerce(b))
# len - no need to override
# [], [:]
def __getitem__(self, idx):
return pyu(zunicode.__getitem__(self, idx))
# __iter__
def __iter__(self):
if PY_MAJOR_VERSION >= 3:
return _pyustrIter(zunicode.__iter__(self))
else:
# on python 2 unicode does not have .__iter__
return PySeqIter_New(self)
# __contains__
def __contains__(self, key):
return zunicode.__contains__(self, _pyu_coerce(key))
# __add__, __radd__ (no need to override __iadd__)
def __add__(a, b):
# NOTE Cython < 3 does not automatically support __radd__ for cdef class
# https://cython.readthedocs.io/en/latest/src/userguide/migrating_to_cy30.html#arithmetic-special-methods
# see also https://github.com/cython/cython/issues/4750
if type(a) is not pyustr:
assert type(b) is pyustr
return b.__radd__(a)
try:
b = _pyu_coerce(b)
except TypeError:
if not hasattr(b, '__radd__'):
raise # don't let py2 to handle e.g. unicode + buffer automatically
return NotImplemented
return pyu(zunicode.__add__(a, b))
def __radd__(b, a):
# a.__add__(b) returned NotImplementedError, e.g. for unicode.__add__(bstr)
# u'' + u() -> u() ; same as u() + u() -> u()
# b'' + u() -> b() ; same as b() + u() -> b()
# barr + u() -> barr
if isinstance(a, bytearray):
# force `bytearray +=` to go via bytearray.sq_inplace_concat - see PyNumber_InPlaceAdd
# for pyustr this relies on patch to bytearray.sq_inplace_concat to accept ustr as bstr
return NotImplemented
a = _pybu_rcoerce(a)
return a.__add__(b)
# __mul__, __rmul__ (no need to override __imul__)
def __mul__(a, b):
if type(a) is not pyustr:
assert type(b) is pyustr
return b.__rmul__(a)
try:
_ = zunicode.__mul__(a, b)
except TypeError: # TypeError: `b` cannot be interpreted as an integer
return NotImplemented
return pyu(_)
def __rmul__(b, a):
return b.__mul__(a)
# %-formatting
def __mod__(a, b):
return pyu(pyb(a).__mod__(b))
def __rmod__(b, a):
# ("..." % x) calls "x.__rmod__()" for string subtypes
# determine output type as in __radd__
if isinstance(a, bytearray):
return NotImplemented # see bstr.__rmod__
a = _pybu_rcoerce(a)
return a.__mod__(b)
# format
def format(self, *args, **kwargs):
return pyu(_bvformat(self, args, kwargs))
def format_map(self, mapping):
return pyu(_bvformat(self, (), mapping))
def __format__(self, format_spec):
# NOTE not e.g. `_bvformat(_pyu_coerce(format_spec), (self,))` because
# the only format code that string.__format__ should support is
# 's', not e.g. 'r'.
return pyu(zunicode.__format__(self, format_spec))
# encode/decode (see bstr for details)
def encode(self, encoding=None, errors=None): # -> bytes
encoding, errors = _encoding_with_defaults(encoding, errors)
if encoding == 'utf-8' and errors == 'surrogateescape':
return _utf8_encode_surrogateescape(self)
# on py2 e.g. 'string-escape' works on bytes
if PY_MAJOR_VERSION < 3:
codec = _pycodecs_lookup_binary(encoding)
if codec is not None:
return codec.encode(pyb(self), errors)[0]
return zunicode.encode(self, encoding, errors)
def decode(self, encoding=None, errors=None): # -> ustr | bstr for encodings like string-escape
encoding, errors = _encoding_with_defaults(encoding, errors)
if encoding == 'utf-8' and errors == 'surrogateescape':
return pyu(self)
return pyb(self).decode(encoding, errors)
# all other string methods
def capitalize(self): return pyu(zunicode.capitalize(self))
def casefold(self): return pyu(zunicode.casefold(self))
def center(self, width, fillchar=' '): return pyu(zunicode.center(self, width, _pyu_coerce(fillchar)))
def count(self, sub, start=None, end=None):
# cython optimizes unicode.count to directly call PyUnicode_Count -
# - cannot use None for start/stop https://github.com/cython/cython/issues/4737
if start is None: start = 0
if end is None: end = PY_SSIZE_T_MAX
return zunicode.count(self, _pyu_coerce(sub), start, end)
def endswith(self, suffix, start=None, end=None):
if isinstance(suffix, tuple):
for _ in suffix:
if self.endswith(_pyu_coerce(_), start, end):
return True
return False
if start is None: start = 0
if end is None: end = PY_SSIZE_T_MAX
return zunicode.endswith(self, _pyu_coerce(suffix), start, end)
def expandtabs(self, tabsize=8): return pyu(zunicode.expandtabs(self, tabsize))
def find(self, sub, start=None, end=None):
if start is None: start = 0
if end is None: end = PY_SSIZE_T_MAX
return zunicode.find(self, _pyu_coerce(sub), start, end)
def index(self, sub, start=None, end=None):
if start is None: start = 0
if end is None: end = PY_SSIZE_T_MAX
return zunicode.index(self, _pyu_coerce(sub), start, end)
# isalnum(self) no need to override
# isalpha(self) no need to override
# isascii(self) no need to override
# isdecimal(self) no need to override
# isdigit(self) no need to override
# isidentifier(self) no need to override
# islower(self) no need to override
# isnumeric(self) no need to override
# isprintable(self) no need to override
# isspace(self) no need to override
# istitle(self) no need to override
def join(self, iterable): return pyu(zunicode.join(self, (_pyu_coerce(_) for _ in iterable)))
def ljust(self, width, fillchar=' '): return pyu(zunicode.ljust(self, width, _pyu_coerce(fillchar)))
def lower(self): return pyu(zunicode.lower(self))
def lstrip(self, chars=None): return pyu(zunicode.lstrip(self, _xpyu_coerce(chars)))
def partition(self, sep): return tuple(pyu(_) for _ in zunicode.partition(self, _pyu_coerce(sep)))
def removeprefix(self, prefix): return pyu(zunicode.removeprefix(self, _pyu_coerce(prefix)))
def removesuffix(self, suffix): return pyu(zunicode.removesuffix(self, _pyu_coerce(suffix)))
def replace(self, old, new, count=-1): return pyu(zunicode.replace(self, _pyu_coerce(old), _pyu_coerce(new), count))
def rfind(self, sub, start=None, end=None):
if start is None: start = 0
if end is None: end = PY_SSIZE_T_MAX
return zunicode.rfind(self, _pyu_coerce(sub), start, end)
def rindex(self, sub, start=None, end=None):
if start is None: start = 0
if end is None: end = PY_SSIZE_T_MAX
return zunicode.rindex(self, _pyu_coerce(sub), start, end)
def rjust(self, width, fillchar=' '): return pyu(zunicode.rjust(self, width, _pyu_coerce(fillchar)))
def rpartition(self, sep): return tuple(pyu(_) for _ in zunicode.rpartition(self, _pyu_coerce(sep)))
def rsplit(self, sep=None, maxsplit=-1):
v = zunicode.rsplit(self, _xpyu_coerce(sep), maxsplit)
return list([pyu(_) for _ in v])
def rstrip(self, chars=None): return pyu(zunicode.rstrip(self, _xpyu_coerce(chars)))
def split(self, sep=None, maxsplit=-1):
# cython optimizes unicode.split to directly call PyUnicode_Split - cannot use None for sep
# and cannot also use object=NULL https://github.com/cython/cython/issues/4737
if sep is None:
if PY_MAJOR_VERSION >= 3:
v = zunicode.split(self, maxsplit=maxsplit)
else:
# on py2 unicode.split does not accept keyword arguments
v = zunicode.split(self, None, maxsplit)
else:
v = zunicode.split(self, _pyu_coerce(sep), maxsplit)
return list([pyu(_) for _ in v])
def splitlines(self, keepends=False): return list(pyu(_) for _ in zunicode.splitlines(self, keepends))
def startswith(self, prefix, start=None, end=None):
if isinstance(prefix, tuple):
for _ in prefix:
if self.startswith(_pyu_coerce(_), start, end):
return True
return False
if start is None: start = 0
if end is None: end = PY_SSIZE_T_MAX
return zunicode.startswith(self, _pyu_coerce(prefix), start, end)
def strip(self, chars=None): return pyu(zunicode.strip(self, _xpyu_coerce(chars)))
def swapcase(self): return pyu(zunicode.swapcase(self))
def title(self): return pyu(zunicode.title(self))
def translate(self, table):
# unicode.translate does not accept bstr values
return pyu(zunicode.translate(self, _pyustrTranslateTab(table)))
def upper(self): return pyu(zunicode.upper(self))
def zfill(self, width): return pyu(zunicode.zfill(self, width))
@staticmethod
def maketrans(x=None, y=None, z=None):
if PY_MAJOR_VERSION >= 3:
if y is None:
# std maketrans(x) accepts only int|unicode keys
_ = {}
for k,v in x.items():
if not isinstance(k, int):
k = pyu(k)
_[k] = v
return zunicode.maketrans(_)
elif z is None:
return zunicode.maketrans(pyu(x), pyu(y)) # std maketrans does not accept b
else:
return zunicode.maketrans(pyu(x), pyu(y), pyu(z)) # ----//----
# hand-made on py2
t = {}
if y is not None:
x = pyu(x)
y = pyu(y)
if len(x) != len(y):
raise ValueError("len(x) must be == len(y))")
for (xi,yi) in zip(x,y):
t[ord(xi)] = ord(yi)
if z is not None:
z = pyu(z)
for _ in z:
t[ord(_)] = None
else:
if type(x) is not dict:
raise TypeError("sole x must be dict")
for k,v in x.iteritems():
if not isinstance(k, (int,long)):
k = ord(pyu(k))
t[k] = pyu(v)
return t
# hand-made _pyustr.__new__ (workaround for https://github.com/cython/cython/issues/799)
cdef PyObject* _pyustr_tp_new(PyTypeObject* _cls, PyObject* _argv, PyObject* _kw) except NULL:
argv = ()
if _argv != NULL:
argv = <object>_argv
kw = {}
if _kw != NULL:
kw = <object>_kw
cdef object x = _pyustr.____new__(<object>_cls, *argv, **kw)
Py_INCREF(x)
return <PyObject*>x
(<_XPyTypeObject*>_pyustr).tp_new = &_pyustr_tp_new
# similarly to bytes - want same C layout for _pyustr vs unicode
assert sizeof(_pyustr) == sizeof(PyUnicodeObject)
# _pybstrIter wraps bytes iterator to return pybstr for each yielded byte.
cdef class _pybstrIter:
cdef object zbiter
def __init__(self, zbiter):
self.zbiter = zbiter
def __iter__(self):
return self
def __next__(self):
x = next(self.zbiter)
if PY_MAJOR_VERSION >= 3:
return pybbyte(x)
else:
return pyb(x)
# _pyustrIter wraps zunicode iterator to return pyustr for each yielded character.
cdef class _pyustrIter:
cdef object zuiter
def __init__(self, zuiter):
self.zuiter = zuiter
def __iter__(self):
return self
def __next__(self):
x = next(self.zuiter)
return pyu(x)
def pybiter(obj):
"""biter(obj) is like iter(b(obj)) but TODO: iterates object incrementally
without doing full conversion to bstr."""
return iter(pyb(obj)) # TODO iterate obj directly
def pyuiter(obj):
"""uiter(obj) is like iter(u(obj)) but TODO: iterates object incrementally
without doing full conversion to ustr."""
return iter(pyu(obj)) # TODO iterate obj directly
# _pyustrTranslateTab wraps table for .translate to return bstr as unicode
# because unicode.translate does not accept bstr values.
cdef class _pyustrTranslateTab:
cdef object tab
def __init__(self, tab):
self.tab = tab
def __getitem__(self, k):
v = self.tab[k]
if not isinstance(v, int): # either unicode ordinal,
v = _xpyu_coerce(v) # character or None
return v
# _bdata/_udata retrieve raw data from bytes/unicode.
def _bdata(obj): # -> bytes
assert isinstance(obj, bytes)
_ = obj.__getnewargs__()[0] # (`bytes-data`,)
assert type(_) is bytes
return _
"""
bcopy = bytes(memoryview(obj))
assert type(bcopy) is bytes
return bcopy
"""
def _udata(obj): # -> unicode
assert isinstance(obj, unicode)
_ = obj.__getnewargs__()[0] # (`unicode-data`,)
assert type(_) is unicode
return _
"""
cdef Py_UNICODE* u = PyUnicode_AsUnicode(obj)
cdef Py_ssize_t size = PyUnicode_GetSize(obj)
cdef unicode ucopy = PyUnicode_FromUnicode(u, size)
assert type(ucopy) is unicode
return ucopy
"""
# initialize .tp_print for pybstr so that this type could be printed.
# If we don't - printing it will result in `RuntimeError: print recursion`
# because str of this type never reaches real bytes or unicode.
# Do it only on python2, because python3 does not use tp_print at all.
# NOTE pyustr does not need this because on py2 str(pyustr) returns pybstr.
IF PY2:
# Cython does not define tp_print for PyTypeObject - do it ourselves
from libc.stdio cimport FILE
cdef extern from "Python.h":
ctypedef int (*printfunc)(PyObject *, FILE *, int) except -1
ctypedef struct _PyTypeObject_Print "PyTypeObject":
printfunc tp_print
int Py_PRINT_RAW
cdef int _pybstr_tp_print(PyObject *obj, FILE *f, int flags) except -1:
o = <object>obj
if flags & Py_PRINT_RAW:
# emit str of the object instead of repr
# https://docs.python.org/2.7/c-api/object.html#c.PyObject_Print
pass
else:
# emit repr
o = repr(o)
assert isinstance(o, bytes)
o = <bytes>o
o = bytes(buffer(o)) # change tp_type to bytes instead of pybstr
return (<_PyTypeObject_Print*>zbytes) .tp_print(<PyObject*>o, f, Py_PRINT_RAW)
(<_PyTypeObject_Print*>Py_TYPE(_pybstr())) .tp_print = _pybstr_tp_print
# whiteout .sq_slice for pybstr/pyustr inherited from str/unicode.
# This way slice access always goes through our __getitem__ implementation.
# If we don't do this e.g. bstr[:] will be handled by str.__getslice__ instead
# of bstr.__getitem__, and will return str instead of bstr.
if PY2:
(<_XPyTypeObject*>_pybstr) .tp_as_sequence.sq_slice = NULL
(<_XPyTypeObject*>_pyustr) .tp_as_sequence.sq_slice = NULL
# ---- adjust bstr/ustr classes after what cython generated ----
# change names of bstr/ustr to be e.g. "golang.bstr" instead of "golang._golang._bstr"
# this makes sure that unpickling saved bstr does not load via unpatched origin
# class, and is also generally good for saving pickle size and for reducing _golang exposure.
(<PyTypeObject*>pybstr).tp_name = "golang.bstr"
(<PyTypeObject*>pyustr).tp_name = "golang.ustr"
assert pybstr.__module__ == "golang"; assert pybstr.__name__ == "bstr"
assert pyustr.__module__ == "golang"; assert pyustr.__name__ == "ustr"
# for pybstr/pyustr cython generates .tp_dealloc that refer to bytes/unicode types directly.
# override that to refer to zbytes/zunicode to avoid infinite recursion on free
# when builtin bytes and unicode are replaced with bstr/ustr.
(<PyTypeObject*>pybstr).tp_dealloc = (<PyTypeObject*>zbytes) .tp_dealloc
(<PyTypeObject*>pyustr).tp_dealloc = (<PyTypeObject*>zunicode) .tp_dealloc
# remove unsupported bstr/ustr methods. do it outside of `cdef class` to
# workaround https://github.com/cython/cython/issues/4556 (`if ...` during
# `cdef class` is silently handled wrongly)
cdef _bstrustr_remove_unsupported_slots():
vslot = (
'casefold', # py3.3 TODO provide py2 implementation
'isidentifier', # py3 TODO provide fallback implementation
'isprintable', # py3 TODO provide fallback implementation
'removeprefix', # py3.9 TODO provide fallback implementation
'removesuffix', # py3.9 TODO provide fallback implementation
)
for slot in vslot:
if not hasattr(unicode, slot):
_patch_slot(<PyTypeObject*>pybstr, slot, DEL)
try:
_patch_slot(<PyTypeObject*>pyustr, slot, DEL)
except KeyError: # e.g. we do not define ustr.isprintable ourselves
pass
_bstrustr_remove_unsupported_slots()
# ---- quoting ----
# _bpysmartquote_u3b2 quotes bytes/bytearray s the same way python would do for string.
#
# nonascii_escape indicates whether \xNN with NN >= 0x80 is present in the output.
#
# NOTE the return type is str type of current python, so that quoted result
# could be directly used in __repr__ or __str__ implementation.
cdef _bpysmartquote_u3b2(const byte[::1] s): # -> (unicode(py3)|bytes(py2), nonascii_escape)
# smartquotes: choose ' or " as quoting character exactly the same way python does
# https://github.com/python/cpython/blob/v2.7.18-0-g8d21aa21f2c/Objects/stringobject.c#L905-L909
cdef byte quote = ord("'")
if (quote in s) and (ord('"') not in s):
quote = ord('"')
cdef bint nonascii_escape
x = strconv._quote(s, quote, &nonascii_escape) # raw bytes
if PY_MAJOR_VERSION < 3:
return x, nonascii_escape
else:
return _utf8_decode_surrogateescape(x), nonascii_escape # raw unicode
# _upysmartquote_u3b2 is similar to _bpysmartquote_u3b2 but accepts unicode argument.
#
# NOTE the return type is str type of current python - see _bpysmartquote_u3b2 for details.
cdef _upysmartquote_u3b2(s): # -> (unicode(py3)|bytes(py2), nonascii_escape)
assert isinstance(s, unicode), s
return _bpysmartquote_u3b2(_utf8_encode_surrogateescape(s))
# qq is substitute for %q, which is missing in python.
@@ -171,40 +1217,824 @@ def pyqq(obj):
# py2: unicode | str
# py3: str | bytes
if not isinstance(obj, (unicode, bytes)):
obj = _bstringify(obj)
return strconv.pyquote(obj)
# ---- _bstringify ----
# _bstringify returns string representation of obj.
# it is similar to unicode(obj), but handles bytes as UTF-8 encoded strings.
cdef _bstringify(object obj): # -> unicode|bytes
if type(obj) in (pybstr, pyustr):
return obj
# indicate to e.g. patched bytes.__repr__ that it is being called from under _bstringify
_bstringify_enter()
try:
if PY_MAJOR_VERSION >= 3:
# NOTE this depends on patches to bytes.{__repr__,__str__} below
return unicode(obj)
else:
# on py2 mimic manually what unicode(·) does on py3
# the reason we do it manually is because if we try just
# unicode(obj), and obj's __str__ returns UTF-8 bytestring, it will
# fail with UnicodeDecodeError. Similarly if we unconditionally do
# str(obj), it will fail if obj's __str__ returns unicode.
#
# NOTE this depends on patches to bytes.{__repr__,__str__} and
# unicode.{__repr__,__str__} below.
if hasattr(obj, '__unicode__'):
return obj.__unicode__()
elif hasattr(obj, '__str__'):
return obj.__str__()
else:
return repr(obj)
finally:
_bstringify_leave()
# _bstringify_repr returns repr of obj.
# it is similar to repr(obj), but handles bytes as UTF-8 encoded strings.
cdef _bstringify_repr(object obj): # -> unicode|bytes
_bstringify_enter_repr()
try:
return repr(obj)
finally:
_bstringify_leave_repr()
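# Example (illustrative): with the patches explained below in effect this gives e.g.
#
#   _bstringify(b'\xce\xb2')        # -> 'β' text (unicode on py3, UTF-8 bytes on py2)
#   _bstringify_repr(b'\xce\xb2')   # -> "b'β'" on py3
#
# which is what bstr.__mod__ and bstr.format rely on for %s/%r and {}/{!r}.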
# patch bytes.{__repr__,__str__} and (py2) unicode.{__repr__,__str__}, so that both
# bytes and unicode are treated as normal strings when under _bstringify.
#
# Why:
#
# py2: str([ 'β']) -> ['\\xce\\xb2'] (1) x
# py2: str([u'β']) -> [u'\\u03b2'] (2) x
# py3: str([ 'β']) -> ['β'] (3)
# py3: str(['β'.encode()]) -> [b'\\xce\\xb2'] (4) x
#
# for us 3 is ok, while 1,2 and 4 are not. For all 1,2,3,4 we want e.g.
# `bstr(·)` or `b('%s') % ·` to give ['β']. This is fixed by patching __repr__.
#
# regarding patching __str__ - 6 and 8 in the following examples illustrate the
# need to do it:
#
# py2: str( 'β') -> 'β' (5)
# py2: str(u'β') -> UnicodeEncodeError (6) x
# py3: str( 'β') -> 'β' (7)
# py3: str('β'.encode()) -> b'\\xce\\xb2' (8) x
#
# See also overview of %-formatting.
cdef reprfunc _bytes_tp_repr = Py_TYPE(b'').tp_repr
cdef reprfunc _bytes_tp_str = Py_TYPE(b'').tp_str
cdef reprfunc _unicode_tp_repr = Py_TYPE(u'').tp_repr
cdef reprfunc _unicode_tp_str = Py_TYPE(u'').tp_str
cdef object _bytes_tp_xrepr(object s):
bs = _inbstringify_get()
if bs.inbstringify == 0:
return _bytes_tp_repr(s)
s, _ = _bpysmartquote_u3b2(s)
if PY_MAJOR_VERSION >= 3 and bs.inrepr != 0:
s = 'b'+s
return s
cdef object _bytes_tp_xstr(object s):
bs = _inbstringify_get()
if bs.inbstringify == 0:
return _bytes_tp_str(s)
else:
if PY_MAJOR_VERSION >= 3:
return _utf8_decode_surrogateescape(s)
else:
return s
cdef object _unicode2_tp_xrepr(object s):
bs = _inbstringify_get()
if bs.inbstringify == 0:
return _unicode_tp_repr(s)
s, _ = _upysmartquote_u3b2(s)
if PY_MAJOR_VERSION < 3 and bs.inrepr != 0:
s = 'u'+s
return s
cdef object _unicode2_tp_xstr(object s):
bs = _inbstringify_get()
if bs.inbstringify == 0:
return _unicode_tp_str(s)
else:
return s
def _bytes_x__repr__(s): return _bytes_tp_xrepr(s)
def _bytes_x__str__(s): return _bytes_tp_xstr(s)
def _unicode2_x__repr__(s): return _unicode2_tp_xrepr(s)
def _unicode2_x__str__(s): return _unicode2_tp_xstr(s)
def _():
cdef PyTypeObject* t
# NOTE patching bytes and its already-created subclasses that did not override .tp_repr/.tp_str
# NOTE if we don't also patch __dict__ - e.g. x.__repr__() won't go through patched .tp_repr
for pyt in [bytes] + bytes.__subclasses__():
assert isinstance(pyt, type)
t = <PyTypeObject*>pyt
if t.tp_repr == _bytes_tp_repr:
t.tp_repr = _bytes_tp_xrepr
_patch_slot(t, '__repr__', _bytes_x__repr__)
if t.tp_str == _bytes_tp_str:
t.tp_str = _bytes_tp_xstr
_patch_slot(t, '__str__', _bytes_x__str__)
_()
if PY_MAJOR_VERSION < 3:
def _():
cdef PyTypeObject* t
for pyt in [unicode] + unicode.__subclasses__():
assert isinstance(pyt, type)
t = <PyTypeObject*>pyt
if t.tp_repr == _unicode_tp_repr:
t.tp_repr = _unicode2_tp_xrepr
_patch_slot(t, '__repr__', _unicode2_x__repr__)
if t.tp_str == _unicode_tp_str:
t.tp_str = _unicode2_tp_xstr
_patch_slot(t, '__str__', _unicode2_x__str__)
_()
# py2: adjust unicode.tp_richcompare(a,b) to return NotImplemented if b is bstr.
# This way we avoid `UnicodeWarning: Unicode equal comparison failed to convert
# both arguments to Unicode - interpreting them as being unequal`, and that
# further `a == b` returns False even if `b == a` gives True.
#
# NOTE there is no need to do the same for ustr, because ustr inherits from
# unicode and can be always natively converted to unicode by python itself.
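# Example on py2 (illustrative): without this adjustment
#
#   u'β' == b('β')     # -> False + UnicodeWarning
#
# while with it unicode defers to bstr.__eq__ and the comparison gives True,
# matching b('β') == u'β'.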
cdef richcmpfunc _unicode_tp_richcompare = Py_TYPE(u'').tp_richcompare
cdef object _unicode_tp_xrichcompare(object a, object b, int op):
if isinstance(b, pybstr):
return NotImplemented
return _unicode_tp_richcompare(a, b, op)
cdef object _unicode_x__eq__(object a, object b): return _unicode_tp_richcompare(a, b, Py_EQ)
cdef object _unicode_x__ne__(object a, object b): return _unicode_tp_richcompare(a, b, Py_NE)
cdef object _unicode_x__lt__(object a, object b): return _unicode_tp_richcompare(a, b, Py_LT)
cdef object _unicode_x__gt__(object a, object b): return _unicode_tp_richcompare(a, b, Py_GT)
cdef object _unicode_x__le__(object a, object b): return _unicode_tp_richcompare(a, b, Py_LE)
cdef object _unicode_x__ge__(object a, object b): return _unicode_tp_richcompare(a, b, Py_GE)
if PY_MAJOR_VERSION < 3:
def _():
cdef PyTypeObject* t
for pyt in [unicode] + unicode.__subclasses__():
assert isinstance(pyt, type)
t = <PyTypeObject*>pyt
if t.tp_richcompare == _unicode_tp_richcompare:
t.tp_richcompare = _unicode_tp_xrichcompare
_patch_slot(t, "__eq__", _unicode_x__eq__)
_patch_slot(t, "__ne__", _unicode_x__ne__)
_patch_slot(t, "__lt__", _unicode_x__lt__)
_patch_slot(t, "__gt__", _unicode_x__gt__)
_patch_slot(t, "__le__", _unicode_x__le__)
_patch_slot(t, "__ge__", _unicode_x__ge__)
_()
# patch bytearray.{__repr__,__str__} similarly to bytes, so that e.g.
# '%s' % bytearray('β') turns into β instead of bytearray(b'\xce\xb2'), and
# '%s' % [bytearray('β')] turns into ['β'] instead of [bytearray(b'\xce\xb2')].
#
# also patch:
#
# - bytearray.__init__ to accept ustr instead of raising 'TypeError:
# string argument without an encoding' (pybug: bytearray() should respect
# __bytes__ similarly to bytes)
#
# - bytearray.{sq_concat,sq_inplace_concat} to accept ustr instead of raising
# TypeError. (pybug: bytearray + and += should respect __bytes__)
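# Example of the effect (illustrative sketch):
#
#   bytearray(u('мир'))        # -> bytearray(b'\xd0\xbc\xd0\xb8\xd1\x80')
#   bytearray(b'x') + u('y')   # -> bytearray(b'xy')
#
# instead of TypeError raised by stock bytearray.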
cdef reprfunc _bytearray_tp_repr = (<PyTypeObject*>bytearray) .tp_repr
cdef reprfunc _bytearray_tp_str = (<PyTypeObject*>bytearray) .tp_str
cdef initproc _bytearray_tp_init = (<_XPyTypeObject*>bytearray) .tp_init
cdef binaryfunc _bytearray_sq_concat = (<_XPyTypeObject*>bytearray) .tp_as_sequence.sq_concat
cdef binaryfunc _bytearray_sq_iconcat = (<_XPyTypeObject*>bytearray) .tp_as_sequence.sq_inplace_concat
cdef object _bytearray_tp_xrepr(object a):
bs = _inbstringify_get()
if bs.inbstringify == 0:
return _bytearray_tp_repr(a)
s, _ = _bpysmartquote_u3b2(a)
if bs.inrepr != 0:
s = 'bytearray(b' + s + ')'
return s
cdef object _bytearray_tp_xstr(object a):
bs = _inbstringify_get()
if bs.inbstringify == 0:
return _bytearray_tp_str(a)
else:
if PY_MAJOR_VERSION >= 3:
return _utf8_decode_surrogateescape(a)
else:
return _bytearray_data(a)
cdef int _bytearray_tp_xinit(object self, PyObject* args, PyObject* kw) except -1:
if args != NULL and (kw == NULL or (not <object>kw)):
argv = <object>args
if isinstance(argv, tuple) and len(argv) == 1:
arg = argv[0]
if isinstance(arg, pyustr):
argv = (pyb(arg),) # NOTE argv is kept alive till end of function
args = <PyObject*>argv # no need to incref it
return _bytearray_tp_init(self, args, kw)
cdef object _bytearray_sq_xconcat(object a, object b):
if isinstance(b, pyustr):
b = pyb(b)
return _bytearray_sq_concat(a, b)
cdef object _bytearray_sq_xiconcat(object a, object b):
if isinstance(b, pyustr):
b = pyb(b)
return _bytearray_sq_iconcat(a, b)
def _bytearray_x__repr__(a): return _bytearray_tp_xrepr(a)
def _bytearray_x__str__ (a): return _bytearray_tp_xstr(a)
def _bytearray_x__init__(self, *argv, **kw):
# NOTE don't return - just call: __init__ should return None
_bytearray_tp_xinit(self, <PyObject*>argv, <PyObject*>kw)
def _bytearray_x__add__ (a, b): return _bytearray_sq_xconcat(a, b)
def _bytearray_x__iadd__(a, b): return _bytearray_sq_xiconcat(a, b)
def _():
cdef PyTypeObject* t
for pyt in [bytearray] + bytearray.__subclasses__():
assert isinstance(pyt, type)
t = <PyTypeObject*>pyt
if t.tp_repr == _bytearray_tp_repr:
t.tp_repr = _bytearray_tp_xrepr
_patch_slot(t, '__repr__', _bytearray_x__repr__)
if t.tp_str == _bytearray_tp_str:
t.tp_str = _bytearray_tp_xstr
_patch_slot(t, '__str__', _bytearray_x__str__)
t_ = <_XPyTypeObject*>t
if t_.tp_init == _bytearray_tp_init:
t_.tp_init = _bytearray_tp_xinit
_patch_slot(t, '__init__', _bytearray_x__init__)
t_sq = t_.tp_as_sequence
if t_sq.sq_concat == _bytearray_sq_concat:
t_sq.sq_concat = _bytearray_sq_xconcat
_patch_slot(t, '__add__', _bytearray_x__add__)
if t_sq.sq_inplace_concat == _bytearray_sq_iconcat:
t_sq.sq_inplace_concat = _bytearray_sq_xiconcat
_patch_slot(t, '__iadd__', _bytearray_x__iadd__)
_()
# _bytearray_data return raw data in bytearray as bytes.
# XXX `bytearray s` leads to `TypeError: Expected bytearray, got hbytearray`
cdef bytes _bytearray_data(object s):
if PY_MAJOR_VERSION >= 3:
return bytes(s)
else:
# on py2 bytes(s) is str(s) which invokes patched bytearray.__str__
# we want to get raw bytearray data, which is provided by unpatched bytearray.__str__
return _bytearray_tp_str(s)
# _bstringify_enter*/_bstringify_leave*/_inbstringify_get allow _bstringify* to
# indicate to further invoked code whether it has been invoked from under
# _bstringify* or not.
cdef object _inbstringify_key = "golang._inbstringify"
@final
cdef class _InBStringify:
cdef int inbstringify # >0 if we are running under _bstringify/_bstringify_repr
cdef int inrepr # >0 if we are running under _bstringify_repr
def __cinit__(self):
self.inbstringify = 0
self.inrepr = 0
cdef void _bstringify_enter() except*:
bs = _inbstringify_get()
bs.inbstringify += 1
cdef void _bstringify_leave() except*:
bs = _inbstringify_get()
bs.inbstringify -= 1
cdef void _bstringify_enter_repr() except*:
bs = _inbstringify_get()
bs.inbstringify += 1
bs.inrepr += 1
cdef void _bstringify_leave_repr() except*:
bs = _inbstringify_get()
bs.inbstringify -= 1
bs.inrepr -= 1
cdef _InBStringify _inbstringify_get():
cdef PyObject* _ts_dict = PyThreadState_GetDict() # borrowed
if _ts_dict == NULL:
raise RuntimeError("no thread state")
cdef _InBStringify ts_inbstringify
cdef PyObject* _ts_inbstrinfigy = PyDict_GetItemWithError(<object>_ts_dict, _inbstringify_key) # raises on error
if _ts_inbstrinfigy == NULL:
# key not present
ts_inbstringify = _InBStringify()
PyDict_SetItem(<object>_ts_dict, _inbstringify_key, ts_inbstringify)
else:
ts_inbstringify = <_InBStringify>_ts_inbstrinfigy
return ts_inbstringify
# _patch_slot installs func_or_descr into typ's __dict__ as name.
#
# if func_or_descr is descriptor (has __get__), it is installed as is.
# otherwise it is wrapped with "unbound method" descriptor.
#
# if func_or_descr is DEL the slot is removed from typ's __dict__.
cdef DEL = object()
cdef _patch_slot(PyTypeObject* typ, str name, object func_or_descr):
typdict = <dict>(typ.tp_dict)
#print("\npatching %s.%s with %r" % (typ.tp_name, name, func_or_descr))
#print("old: %r" % typdict.get(name))
if hasattr(func_or_descr, '__get__') or func_or_descr is DEL:
descr = func_or_descr
else:
func = func_or_descr
if PY_MAJOR_VERSION < 3:
descr = pytypes.MethodType(func, None, <object>typ)
else:
descr = _UnboundMethod(func)
if descr is DEL:
del typdict[name]
else:
typdict[name] = descr
#print("new: %r" % typdict.get(name))
PyType_Modified(typ)
cdef class _UnboundMethod(object): # they removed unbound methods on py3
cdef object func
def __init__(self, func):
self.func = func
def __get__(self, obj, objtype):
return pyfunctools.partial(self.func, obj)
# ---- % formatting ----
# When formatting string is bstr/ustr we treat bytes in all arguments as
# UTF8-encoded bytestrings. The following approach is used to implement this:
#
# 1. both bstr and ustr format via bytes-based _bprintf.
# 2. we parse the format string and handle every formatting specifier separately:
# 3. for formats besides %s/%r we use bytes.__mod__ directly.
#
# 4. for %s we stringify corresponding argument specially with all, potentially
# internal, bytes instances treated as UTF8-encoded strings:
#
# '%s' % b'\xce\xb2' -> "β"
# '%s' % [b'\xce\xb2'] -> "['β']"
#
# 5. for %r, similarly to %s, we prepare repr of corresponding argument
# specially with all, potentially internal, bytes instances also treated as
# UTF8-encoded strings:
#
# '%r' % b'\xce\xb2' -> "b'β'"
# '%r' % [b'\xce\xb2'] -> "[b'β']"
#
#
# For "2" we implement %-format parsing ourselves. test_strings_mod_and_format
# has good coverage for this phase to make sure we get it right and behaving
# exactly the same way as standard Python does.
#
# For "4" we monkey-patch bytes.__repr__ to repr bytes as strings when called
# from under bstr.__mod__(). See _bstringify for details.
#
# For "5", similarly to "4", we rely on adjustments to bytes.__repr__ .
# See _bstringify_repr for details.
#
# See also overview of patching bytes.{__repr__,__str__} near _bstringify.
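# Example of the resulting behaviour (illustrative sketch; assumes the golang
# module is importable as usual):
#
#   from golang import b
#   b('%s %r') % (b'\xce\xb2', b'\xce\xb2')   # -> b("β b'β'")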
cdef object _missing = object()
cdef object _atidx_re = pyre.compile('.* at index ([0-9]+)$')
cdef _bprintf(const byte[::1] fmt, xarg): # -> pybstr
cdef bytearray out = bytearray()
cdef object argv = None # if xarg is tuple or subclass
cdef object argm = None # if xarg is mapping
# https://github.com/python/cpython/blob/2.7-0-g8d21aa21f2c/Objects/stringobject.c#L4298-L4300
# https://github.com/python/cpython/blob/v3.11.0b1-171-g70aa1b9b912/Objects/unicodeobject.c#L14319-L14320
if _XPyMapping_Check(xarg) and \
(not isinstance(xarg, tuple)) and \
(not isinstance(xarg, (bytes,unicode))):
argm = xarg
if isinstance(xarg, tuple):
argv = xarg
xarg = _missing
#print()
#print('argv:', argv)
#print('argm:', argm)
#print('xarg:', xarg)
cdef int argv_idx = 0
def nextarg():
nonlocal argv_idx, xarg
# NOTE for `'%s %(x)s' % {'x':1}` python gives "{'x': 1} 1"
# -> so we avoid argm check completely here
#if argm is not None:
if 0:
raise ValueError('mixing dict/tuple')
elif argv is not None:
# tuple xarg
if argv_idx < len(argv):
arg = argv[argv_idx]
argv_idx += 1
return arg
elif xarg is not _missing:
# sole xarg
arg = xarg
xarg = _missing
return arg
raise TypeError('not enough arguments for format string')
def badf():
raise ValueError('incomplete format')
# parse format string locating formatting specifiers
# if we see %s/%r - use _bstringify
# else use builtin %-formatting
#
# %[(name)][flags][width|*][.[prec|*]][len](type)
#
# https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting
# https://github.com/python/cpython/blob/2.7-0-g8d21aa21f2c/Objects/stringobject.c#L4266-L4765
#
# Rejected alternative: try to format; if we get "TypeError: %b requires a
# bytes-like object ..." retry with that argument converted to bstr.
#
# Rejected because e.g. for `'%(x)s %(x)r' % {'x': obj}` we need to use
# the access number instead of key 'x' to determine which accesses to
# bstringify. We could do that, but unfortunately on Python2 the access
# number is not easily predictable because the string could be upgraded to
# unicode in the midst of being formatted, and so some keys would be
# accessed more than once.
#
# Another reason for rejection: b'%r' and u'%r' handle arguments
# differently - on b %r is aliased to %a.
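# For example the specifier '%(key)-10.3f' is parsed per the grammar above as
# (illustrative):
#
#   name='key'  flags='-'  width=10  prec=3  len=''  type='f'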
cdef int i = 0
cdef int l = len(fmt)
cdef byte c
while i < l:
c = fmt[i]
i += 1
if c != ord('%'):
out.append(c)
continue
fmt_istart = i-1
nameb = _missing
width = _missing
prec = _missing
value = _missing
# `c = fmt_nextchar()` avoiding https://github.com/cython/cython/issues/4798
if i >= l: badf()
c = fmt[i]; i += 1
# (name)
if c == ord('('):
#print('(name)')
if argm is None:
raise TypeError('format requires a mapping')
nparen = 1
nameb = b''
while 1:
if i >= l:
raise ValueError('incomplete format key')
c = fmt[i]; i += 1
if c == ord('('):
nparen += 1
elif c == ord(')'):
nparen -= 1
if i >= l: badf()
c = fmt[i]; i += 1
break
else:
nameb += bchr(c)
# flags
while chr(c) in '#0- +':
#print('flags')
if i >= l: badf()
c = fmt[i]; i += 1
# [width|*]
if c == ord('*'):
#print('*width')
width = nextarg()
if i >= l: badf()
c = fmt[i]; i += 1
else:
while chr(c).isdigit():
#print('width')
if i >= l: badf()
c = fmt[i]; i += 1
# [.prec|*]
if c == ord('.'):
#print('dot')
if i >= l: badf()
c = fmt[i]; i += 1
if c == ord('*'):
#print('.*')
prec = nextarg()
if i >= l: badf()
c = fmt[i]; i += 1
else:
while chr(c).isdigit():
#print('.prec')
if i >= l: badf()
c = fmt[i]; i += 1
# [len]
while chr(c) in 'hlL':
#print('len')
if i >= l: badf()
c = fmt[i]; i += 1
fmt_type = c
#print('fmt_type:', repr(chr(fmt_type)))
if fmt_type == ord('%'):
if i-2 == fmt_istart: # %%
out.append(b'%')
continue
if nameb is not _missing:
xarg = _missing # `'%(x)s %s' % {'x':1}` raises "not enough arguments"
nameu = _utf8_decode_surrogateescape(nameb)
try:
value = argm[nameb]
except KeyError:
# retry with changing key via bytes <-> unicode
# e.g. for `b('%(x)s') % {'x': ...}` builtin bytes.__mod__ will
# extract b'x' as key and raise KeyError: b'x'. We avoid that via
# retrying with second string type for key.
value = argm[nameu]
else:
# NOTE for `'%4%' % ()` python raises "not enough arguments ..."
#if fmt_type != ord('%'):
if 1:
value = nextarg()
if fmt_type == ord('%'):
raise ValueError("unsupported format character '%s' (0x%x) at index %i" % (chr(c), c, i-1))
fmt1 = memoryview(fmt[fmt_istart:i]).tobytes()
#print('fmt_istart:', fmt_istart)
#print('i: ', i)
#print(' ~> __mod__ ', repr(fmt1))
# bytes %r is an alias for %a (ASCII), but we want unicode-like %r
# -> handle it ourselves
if fmt_type == ord('r'):
value = pyb(_bstringify_repr(value))
fmt_type = ord('s')
fmt1 = fmt1[:-1] + b's'
elif fmt_type == ord('s'):
# %s -> feed value through _bstringify
# this also converts e.g. int to bstr, else e.g. on `b'%s' % 123` python
# complains '%b requires a bytes-like object ...'
value = pyb(_bstringify(value))
if nameb is not _missing:
arg = {nameb: value, nameu: value}
else:
t = []
if width is not _missing: t.append(width)
if prec is not _missing: t.append(prec)
if value is not _missing: t.append(value)
t = tuple(t)
arg = t
#print('--> __mod__ ', repr(fmt1), ' % ', repr(arg))
try:
s = zbytes.__mod__(fmt1, arg)
except ValueError as e:
# adjust position in '... at index <idx>' from fmt1 to fmt
if len(e.args) == 1:
a = e.args[0]
m = _atidx_re.match(a)
if m is not None:
a = a[:m.start(1)] + str(i-1)
e.args = (a,)
raise
out.extend(s)
if argm is None:
#print('END')
#print('argv:', argv, 'argv_idx:', argv_idx, 'xarg:', xarg)
if (argv is not None and argv_idx != len(argv)) or (xarg is not _missing):
raise TypeError("not all arguments converted during string formatting")
return pybstr(out)
# ---- .format formatting ----
# Handling .format is easier and similar to %-formatting: we detect fields to
# format as strings via a custom string.Formatter (see _BFormatter), and
# further treat objects to stringify similarly to how %-formatting does for %s and %r.
#
# We do not need to implement format parsing ourselves, because
# string.Formatter provides it.
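# Example (illustrative sketch): with this in place bstr.format treats bytes
# arguments - also inside containers - as UTF-8 strings, e.g.
#
#   b('{} {!r}').format(b'\xce\xb2', [b'\xce\xb2'])   # -> b("β [b'β']")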
# _bvformat implements .format for pybstr/pyustr.
cdef _bvformat(fmt, args, kw):
return _BFormatter().vformat(fmt, args, kw)
class _BFormatter(pystring.Formatter):
def format_field(self, v, fmtspec):
#print('format_field', repr(v), repr(fmtspec))
# {} on bytes/bytearray -> treat it as bytestring
if type(v) in (bytes, bytearray):
v = pyb(v)
#print(' ~ ', repr(v))
# if the object contains bytes inside, e.g. as in [b'β'] - treat those
# internal bytes also as bytestrings
_bstringify_enter()
try:
#return super(_BFormatter, self).format_field(v, fmtspec)
x = super(_BFormatter, self).format_field(v, fmtspec)
finally:
_bstringify_leave()
#print(' ->', repr(x))
if PY_MAJOR_VERSION < 3: # py2 Formatter._vformat does ''.join(result)
x = pyu(x) # -> we want everything in result to be unicode to avoid
# UnicodeDecodeError
return x
def convert_field(self, v, conv):
#print('convert_field', repr(v), repr(conv))
if conv == 's':
# string.Formatter does str(v) for 's'. we don't want that:
# py3: stringify, and especially treat bytes as bytestring
# py2: stringify, avoiding e.g. UnicodeEncodeError for str(unicode)
x = pyb(_bstringify(v))
elif conv == 'r':
# for bytes {!r} produces ASCII-only, but we want unicode-like !r for e.g. b'β'
# -> handle it ourselves
x = pyb(_bstringify_repr(v))
else:
x = super(_BFormatter, self).convert_field(v, conv)
#print(' ->', repr(x))
return x
# on py2 string.Formatter does not handle field autonumbering
# -> do it ourselves
if PY_MAJOR_VERSION < 3:
_autoidx = 0
_had_digit = False
def get_field(self, field_name, args, kwargs):
if field_name == '':
if self._had_digit:
raise ValueError("mixing explicit and auto numbered fields is forbidden")
field_name = str(self._autoidx)
self._autoidx += 1
elif field_name.isdigit():
self._had_digit = True
if self._autoidx != 0:
raise ValueError("mixing explicit and auto numbered fields is forbidden")
return super(_BFormatter, self).get_field(field_name, args, kwargs)
# ---- misc ----
cdef object _xpyu_coerce(obj):
return _pyu_coerce(obj) if obj is not None else None
# _buffer_py2 returns buffer(obj) on py2 / fails on py3
cdef object _buffer_py2(object obj):
IF PY2: # cannot `if PY_MAJOR_VERSION < 3` because then cython errors
return buffer(obj) # "undeclared name not builtin: buffer"
ELSE:
raise AssertionError("must be called only on py2")
# _buffer_decode decodes buf to unicode according to encoding and errors.
#
# buf must expose buffer interface.
# encoding/errors can be None meaning to use default utf-8/strict.
cdef unicode _buffer_decode(buf, encoding, errors):
if encoding is None: encoding = 'utf-8' # NOTE always UTF-8, not sys.getdefaultencoding
if errors is None: errors = 'strict'
if _XPyObject_CheckOldBuffer(buf):
buf = _buffer_py2(buf)
else:
buf = memoryview(buf)
return bytearray(buf).decode(encoding, errors)
cdef extern from "Python.h":
"""
static int _XPyObject_CheckOldBuffer(PyObject *o) {
#if PY_MAJOR_VERSION >= 3
// no old-style buffers on py3
return 0;
#else
return PyObject_CheckReadBuffer(o);
#endif
}
"""
bint _XPyObject_CheckOldBuffer(object o)
cdef extern from "Python.h":
"""
static int _XPyMapping_Check(PyObject *o) {
#if PY_MAJOR_VERSION >= 3
return PyMapping_Check(o);
#else
// on py2 PyMapping_Check besides checking tp_as_mapping->mp_subscript
// also verifies !tp_as_sequence->sq_slice. We want to avoid that
// because PyString_Format checks only tp_as_mapping->mp_subscript.
return Py_TYPE(o)->tp_as_mapping && Py_TYPE(o)->tp_as_mapping->mp_subscript;
#endif
}
"""
bint _XPyMapping_Check(object o)
# _pycodecs_lookup_binary returns codec corresponding to encoding if the codec works on binary input.
# example of such codecs are string-escape and hex encodings.
cdef _pycodecs_lookup_binary(encoding): # -> codec | None (text) | LookupError (no such encoding)
codec = pycodecs.lookup(encoding)
if not codec._is_text_encoding or \
encoding in ('string-escape',): # string-escape also works on bytes
return codec
return None
# ---- UTF-8 encode/decode ----
# _encoding_with_defaults returns encoding and errors substituted with defaults
# as needed for functions like ustr.encode and bstr.decode .
cdef _encoding_with_defaults(encoding, errors): # -> (encoding, errors)
if encoding is None and errors is None:
encoding = 'utf-8' # NOTE always UTF-8, not sys.getdefaultencoding
errors = 'surrogateescape'
else:
if encoding is None: encoding = 'utf-8'
if errors is None: errors = 'strict'
return (encoding, errors)
# TODO(kirr) adjust UTF-8 encode/decode surrogateescape(*) a bit so that not
# only bytes -> unicode -> bytes is always identity for any bytes (this is
# already true), but also that unicode -> bytes -> unicode is also always true
# for all unicode codepoints.
#
# The latter currently fails for all surrogate codepoints outside of U+DC80..U+DCFF range:
#
# In [1]: x = u'\udc00'
#
# In [2]: x.encode('utf-8')
# UnicodeEncodeError: 'utf-8' codec can't encode character '\udc00' in position 0: surrogates not allowed
#
# In [3]: x.encode('utf-8', 'surrogateescape')
# UnicodeEncodeError: 'utf-8' codec can't encode character '\udc00' in position 0: surrogates not allowed
#
# (*) aka UTF-8b (see http://hyperreal.org/~est/utf-8b/releases/utf-8b-20060413043934/kuhn-utf-8b.html)
#
# Call resulting encoding as UTF-8bk.
#
# TODO(kirr) adjust bstr pickling for protocol < 3 after switching bstr/ustr
# to decode/encode via UTF-8bk instead of UTF-8b.
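# Example of the identity that already holds (illustrative, py3 syntax):
#
#   >>> b'\xd0\xbc\xd0\xb8\xd1\x80\xff'.decode('utf-8', 'surrogateescape')
#   'мир\udcff'
#   >>> 'мир\udcff'.encode('utf-8', 'surrogateescape')
#   b'\xd0\xbc\xd0\xb8\xd1\x80\xff'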
from six import unichr            # py2: unichr   py3: chr
from six import int2byte as bchr  # py2: chr      py3: lambda x: bytes((x,))
cdef bint _ucs2_build = (sys.maxunicode == 0xffff)     #  ucs2
assert _ucs2_build or sys.maxunicode >= 0x0010ffff     #  or ucs4
# _utf8_decode_rune decodes next UTF8-character from byte string s.
#
# _utf8_decode_rune(s) -> (r, size)
cdef (rune, int) _utf8_decode_rune(const byte[::1] s):
if len(s) == 0:
return utf8.RuneError, 0
cdef int l = min(len(s), 4) # max size of an UTF-8 encoded character
while l > 0:
@@ -231,11 +2061,11 @@ cdef (int, int) _utf8_decode_rune(const uint8_t[::1] s):
continue
# invalid UTF-8
return utf8.RuneError, 1
# _utf8_decode_surrogateescape mimics s.decode('utf-8', 'surrogateescape') from py3.
cdef _utf8_decode_surrogateescape(const byte[::1] s): # -> unicode
if PY_MAJOR_VERSION >= 3:
if len(s) == 0:
return u'' # avoid out-of-bounds slice access on &s[0]
@@ -250,7 +2080,7 @@ def _utf8_decode_surrogateescape(const uint8_t[::1] s): # -> unicode
while len(s) > 0:
r, width = _utf8_decode_rune(s)
if r == utf8.RuneError and width == 1:
b = s[0]
assert 0x80 <= b <= 0xff, b
emit(unichr(0xdc00 + b))
@@ -275,10 +2105,10 @@ def _utf8_decode_surrogateescape(const uint8_t[::1] s): # -> unicode
# _utf8_encode_surrogateescape mimics s.encode('utf-8', 'surrogateescape') from py3.
cdef _utf8_encode_surrogateescape(s): # -> bytes
assert isinstance(s, unicode)
if PY_MAJOR_VERSION >= 3:
return zunicode.encode(s, 'UTF-8', 'surrogateescape')
# py2 does not have surrogateescape error handler, and even if we
# provide one, builtin unicode.encode() does not treat
@@ -345,11 +2175,11 @@ else:
# _xunichr returns unicode character for an ordinal i.
#
# it works correctly even on ucs2 python builds, where ordinals >= 0x10000 are
# represented as 2 unicode points.
cdef unicode _xunichr(rune i):
if not _ucs2_build:
return unichr(i)
else:
if i < 0x10000:
return unichr(i)
@@ -357,3 +2187,8 @@ else:
uh = i - 0x10000
return unichr(0xd800 + (uh >> 10)) + \
unichr(0xdc00 + (uh & 0x3ff))
# ---- pickle ----
include '_golang_str_pickle.pyx'
# -*- coding: utf-8 -*-
# Copyright (C) 2023-2025 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
# option) any later version, as published by the Free Software Foundation.
#
# You can also Link and Combine this program with other software covered by
# the terms of any of the Free Software licenses or any of the Open Source
# Initiative approved licenses and Convey the resulting work. Corresponding
# source of such a combination shall include the source code for all other
# software used.
#
# This program is distributed WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See COPYING file for full licensing terms.
# See https://www.nexedi.com/licensing for rationale and options.
"""_golang_str_pickle.pyx complements _golang_str.pyx and keeps everything
related to pickling strings.
It is included from _golang_str.pyx .
"""
if PY_MAJOR_VERSION >= 3:
import copyreg as pycopyreg
else:
import copy_reg as pycopyreg
cdef object zbinary # = zodbpickle.binary | None
try:
import zodbpickle
except ImportError:
zbinary = None
else:
zbinary = zodbpickle.binary
# support for pickling bstr/ustr as standalone types.
#
# pickling is organized in such a way that
# - what is saved by py2 can be loaded correctly on both py2/py3, and similarly
# - what is saved by py3 can be loaded correctly on both py2/py3 as well.
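# Example (illustrative sketch; assumes pickle, b and bstr are imported as
# usual): for every supported protocol the following round-trips on both
# py2 and py3:
#
#   s  = b(b'\xd0\xbc\xd0\xb8\xd1\x80\xff')        # bstr with invalid UTF-8 tail
#   s2 = pickle.loads(pickle.dumps(s, protocol))
#   assert type(s2) is bstr  and  s2 == s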
cdef _bstr__reduce_ex__(self, protocol):
# Ideally we want to emit bstr(BYTES), but BYTES is not available for
# protocol < 3. And for protocol < 3 emitting bstr(STRING) is not an
# option because plain py3 raises UnicodeDecodeError on loading arbitrary
# STRING data. However emitting bstr(UNICODE) works universally because
# pickle supports arbitrary unicode - including invalid unicode - out of
# the box and in exactly the same way on both py2 and py3. For the
# reference upstream py3 uses surrogatepass on encode/decode UNICODE data
# to achieve that.
if protocol < 3:
# use UNICODE for data
#
# explicitly mark to unpickle via _butf8b because with the introduction
# of UTF-8bk the way bstr decodes unicode will change, and so if we
# used `bstr UNICODE` for pickling it would result in corrupt data
# being loaded after the switch to UTF-8bk.
#
# TODO pickle via bstr UNICODE REDUCE/NEWOBJ after switch from UTF-8b to UTF-8bk.
udata = _utf8_decode_surrogateescape(self)
if self.__class__ is pybstr:
return (_butf8b, # _butf8b UNICODE REDUCE
(udata,))
else:
return (_butf8b, # _butf8b bstr UNICODE REDUCE
(self.__class__, udata))
else:
# use BYTES for data
bdata = _bdata(self)
if PY_MAJOR_VERSION < 3:
# the only way we can get here on py2 and protocol >= 3 is zodbpickle
# -> similarly to py3 save bdata as BYTES
assert zbinary is not None
bdata = zbinary(bdata)
return (
pycopyreg.__newobj__, # bstr BYTES NEWOBJ
(self.__class__, bdata))
cdef _ustr__reduce_ex__(self, protocol):
# emit ustr(UNICODE).
# TODO after UTF-8bk we might want to switch to emitting ustr(BYTES)
# even if we do this, it should be backward compatible
if protocol < 2:
return (self.__class__, (_udata(self),))# ustr UNICODE REDUCE
else:
return (pycopyreg.__newobj__, # ustr UNICODE NEWOBJ
(self.__class__, _udata(self)))
# `_butf8b [bcls] udata` serves unpickling of bstr pickled with data
# represented via UTF-8b decoded unicode.
def _butf8b(*argv):
cdef object bcls = pybstr
cdef object udata
cdef int l = len(argv)
if l == 1:
udata = argv[0]
elif l == 2:
bcls, udata = argv
else:
raise TypeError("_butf8b() takes 1 or 2 arguments; %d given" % l)
return _pyb(bcls, _utf8_encode_surrogateescape(udata))
_butf8b.__module__ = "golang"
# -*- coding: utf-8 -*-
# cython: language_level=2
# Copyright (C) 2018-2023 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
# option) any later version, as published by the Free Software Foundation.
#
# You can also Link and Combine this program with other software covered by
# the terms of any of the Free Software licenses or any of the Open Source
# Initiative approved licenses and Convey the resulting work. Corresponding
# source of such a combination shall include the source code for all other
# software used.
#
# This program is distributed WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See COPYING file for full licensing terms.
# See https://www.nexedi.com/licensing for rationale and options.
"""Package strconv provides Go-compatible string conversions."""
from golang cimport byte
cpdef pyquote(s)
cdef bytes _quote(const byte[::1] s, char quote, bint* out_nonascii_escape) # -> (quoted, nonascii_escape)
# -*- coding: utf-8 -*-
# cython: language_level=2
# Copyright (C) 2018-2024 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
# option) any later version, as published by the Free Software Foundation.
#
# You can also Link and Combine this program with other software covered by
# the terms of any of the Free Software licenses or any of the Open Source
# Initiative approved licenses and Convey the resulting work. Corresponding
# source of such a combination shall include the source code for all other
# software used.
#
# This program is distributed WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See COPYING file for full licensing terms.
# See https://www.nexedi.com/licensing for rationale and options.
"""_strconv.pyx implements strconv.pyx - see _strconv.pxd for package overview."""
from __future__ import print_function, absolute_import
import unicodedata, codecs
from golang cimport pyb, byte, rune
from golang cimport _utf8_decode_rune, _xunichr
from golang.unicode cimport utf8
from cpython cimport PyObject, _PyBytes_Resize
cdef extern from "Python.h":
PyObject* PyBytes_FromStringAndSize(char*, Py_ssize_t) except NULL
char* PyBytes_AS_STRING(PyObject*)
void Py_DECREF(PyObject*)
# quote quotes unicode|bytes string into valid "..." bytestring always quoted with ".
cpdef pyquote(s): # -> bstr
cdef bint _
q = _quote(pyb(s), '"', &_)
return pyb(q)
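# Example (illustrative):
#
#   pyquote(b'hello\n"world"')   # -> b('"hello\\n\\"world\\""')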
cdef char[16] hexdigit # = '0123456789abcdef'
for i, c in enumerate('0123456789abcdef'):
hexdigit[i] = ord(c)
# XXX not possible to use `except (NULL, False)`
# (https://stackoverflow.com/a/66335433/9456786)
cdef bytes _quote(const byte[::1] s, char quote, bint* out_nonascii_escape): # -> (quoted, nonascii_escape)
# 2*" + max(4)*each byte (+ 1 for tail \0 implicitly by PyBytesObject)
cdef Py_ssize_t qmaxsize = 1 + 4*len(s) + 1
cdef PyObject* qout = PyBytes_FromStringAndSize(NULL, qmaxsize)
cdef byte* q = <byte*>PyBytes_AS_STRING(qout)
cdef bint nonascii_escape = False
cdef Py_ssize_t i = 0, j
cdef Py_ssize_t isize
cdef int size
cdef rune r
cdef byte c
q[0] = quote; q += 1
while i < len(s):
c = s[i]
# fast path - ASCII only
if c < 0x80:
if c in (ord('\\'), quote):
q[0] = ord('\\')
q[1] = c
q += 2
# printable ASCII
elif 0x20 <= c <= 0x7e:
q[0] = c
q += 1
# non-printable ASCII
elif c == ord('\t'):
q[0] = ord('\\')
q[1] = ord('t')
q += 2
elif c == ord('\n'):
q[0] = ord('\\')
q[1] = ord('n')
q += 2
elif c == ord('\r'):
q[0] = ord('\\')
q[1] = ord('r')
q += 2
# everything else is non-printable
else:
q[0] = ord('\\')
q[1] = ord('x')
q[2] = hexdigit[c >> 4]
q[3] = hexdigit[c & 0xf]
q += 4
i += 1
# slow path - full UTF-8 decoding + unicodedata
else:
r, size = _utf8_decode_rune(s[i:])
isize = i + size
# decode error - just emit raw byte as escaped
if r == utf8.RuneError and size == 1:
nonascii_escape = True
q[0] = ord('\\')
q[1] = ord('x')
q[2] = hexdigit[c >> 4]
q[3] = hexdigit[c & 0xf]
q += 4
# printable utf-8 characters go as is
elif _unicodedata_category(_xunichr(r))[0] in 'LNPS': # letters, numbers, punctuation, symbols
for j in range(i, isize):
q[0] = s[j]
q += 1
# everything else goes in numeric byte escapes
else:
nonascii_escape = True
for j in range(i, isize):
c = s[j]
q[0] = ord('\\')
q[1] = ord('x')
q[2] = hexdigit[c >> 4]
q[3] = hexdigit[c & 0xf]
q += 4
i = isize
q[0] = quote; q += 1
q[0] = 0; # don't q++ at last because size does not include tail \0
cdef Py_ssize_t qsize = (q - <byte*>PyBytes_AS_STRING(qout))
assert qsize <= qmaxsize
_PyBytes_Resize(&qout, qsize)
bqout = <bytes>qout
Py_DECREF(qout)
out_nonascii_escape[0] = nonascii_escape
return bqout
# unquote decodes "-quoted unicode|byte string.
#
# ValueError is raised if there are quoting syntax errors.
def pyunquote(s): # -> bstr
us, tail = pyunquote_next(s)
if len(tail) != 0:
raise ValueError('non-empty tail after closing "')
return us
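# Example (illustrative, inverse of the quote example above):
#
#   pyunquote(b'"hello\\n\\"world\\""')   # -> b('hello\n"world"')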
# unquote_next decodes next "-quoted unicode|byte string.
#
# it returns -> (unquoted(s), tail-after-")
#
# ValueError is raised if there are quoting syntax errors.
def pyunquote_next(s): # -> (bstr, bstr)
us, tail = _unquote_next(pyb(s))
return pyb(us), pyb(tail)
cdef _unquote_next(s):
assert isinstance(s, bytes)
if len(s) == 0 or s[0:0+1] != b'"':
raise ValueError('no starting "')
outv = []
emit= outv.append
s = s[1:]
while 1:
r, width = _utf8_decode_rune(s)
if width == 0:
raise ValueError('no closing "')
if r == ord('"'):
s = s[1:]
break
# regular UTF-8 character
if r != ord('\\'):
emit(s[:width])
s = s[width:]
continue
if len(s) < 2:
raise ValueError('unexpected EOL after \\')
c = s[1:1+1]
# \<c> -> <c> ; c = \ "
if c in b'\\"':
emit(c)
s = s[2:]
continue
# \t \n \r
uc = None
if c == b't': uc = b'\t'
elif c == b'n': uc = b'\n'
elif c == b'r': uc = b'\r'
# accept also \a \b \v \f that Go might produce
# Python also decodes those escapes even though it does not produce them:
# https://github.com/python/cpython/blob/2.7.18-0-g8d21aa21f2c/Objects/stringobject.c#L677-L688
elif c == b'a': uc = b'\x07'
elif c == b'b': uc = b'\x08'
elif c == b'v': uc = b'\x0b'
elif c == b'f': uc = b'\x0c'
if uc is not None:
emit(uc)
s = s[2:]
continue
# \x?? hex
if c == b'x': # XXX also handle octals?
if len(s) < 2+2:
raise ValueError('unexpected EOL after \\x')
b = codecs.decode(s[2:2+2], 'hex')
emit(b)
s = s[2+2:]
continue
raise ValueError('invalid escape \\%s' % chr(ord(c[0:0+1])))
return b''.join(outv), s
cdef _unicodedata_category = unicodedata.category
#ifndef _NXD_LIBGOLANG_FMT_H
#define _NXD_LIBGOLANG_FMT_H
// Copyright (C) 2019-2024 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
@@ -111,7 +111,7 @@ inline error errorf(const string& format, Argv... argv) {
// `const char *` overloads just to catch format mistakes as
// __attribute__(format) does not work with std::string.
LIBGOLANG_API string sprintf(const char *format, ...)
#ifndef LIBGOLANG_CC_msc
__attribute__ ((format (printf, 1, 2)))
#endif
;
...
# -*- coding: utf-8 -*-
# Copyright (C) 2022-2025 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
# option) any later version, as published by the Free Software Foundation.
#
# You can also Link and Combine this program with other software covered by
# the terms of any of the Free Software licenses or any of the Open Source
# Initiative approved licenses and Convey the resulting work. Corresponding
# source of such a combination shall include the source code for all other
# software used.
#
# This program is distributed WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See COPYING file for full licensing terms.
# See https://www.nexedi.com/licensing for rationale and options.
from __future__ import print_function, absolute_import
from golang import b, u, bstr, ustr
from golang.golang_str_test import xbytes, unicode
from pytest import raises, fixture
import io, struct
import six
# run all tests on all py/c pickle modules we aim to support
import pickle as stdPickle
if six.PY2:
import cPickle
else:
import _pickle as cPickle
from zodbpickle import slowpickle as zslowPickle
from zodbpickle import fastpickle as zfastPickle
from zodbpickle import pickle as zpickle
from zodbpickle import _pickle as _zpickle
import pickletools as stdpickletools
if six.PY2:
from zodbpickle import pickletools_2 as zpickletools
else:
from zodbpickle import pickletools_3 as zpickletools
# pickle is pytest fixture that yields all variants of pickle module.
@fixture(scope="function", params=[stdPickle, cPickle,
zslowPickle, zfastPickle, zpickle, _zpickle])
def pickle(request):
yield request.param
# pickletools is pytest fixture that yields all variants of pickletools module.
@fixture(scope="function", params=[stdpickletools, zpickletools])
def pickletools(request):
yield request.param
# pickle2tools returns pickletools module that corresponds to module pickle.
def pickle2tools(pickle):
if pickle in (stdPickle, cPickle):
return stdpickletools
else:
return zpickletools
# verify that loading *UNICODE opcodes loads them as unicode/ustr.
# this is standard behaviour but we verify it since we will patch pickle's strings processing.
# also verify save lightly for symmetry.
def test_strings_pickle_loadsave_UNICODE(pickle):
# NOTE builtin pickle behaviour is to save unicode via 'surrogatepass' error handler
# this means that b'мир\xff' -> ustr/unicode -> save will emit *UNICODE with
# b'мир\xed\xb3\xbf' instead of b'мир\xff' as data.
p_uni = b'V\\u043c\\u0438\\u0440\\udcff\n.' # UNICODE 'мир\uDCFF'
p_binu = b'X\x09\x00\x00\x00\xd0\xbc\xd0\xb8\xd1\x80\xed\xb3\xbf.' # BINUNICODE NOTE ...edb3bf not ...ff
p_sbinu = b'\x8c\x09\xd0\xbc\xd0\xb8\xd1\x80\xed\xb3\xbf.' # SHORT_BINUNICODE
p_binu8 = b'\x8d\x09\x00\x00\x00\x00\x00\x00\x00\xd0\xbc\xd0\xb8\xd1\x80\xed\xb3\xbf.' # BINUNICODE8
u_obj = u'мир\uDCFF'; assert type(u_obj) is unicode
# load: check invokes f on all test pickles that pickle should support
def check(f):
f(p_uni)
f(p_binu)
if HIGHEST_PROTOCOL(pickle) >= 4:
f(p_sbinu)
f(p_binu8)
def _(p):
obj = xloads(pickle, p)
assert type(obj) is unicode
assert obj == u_obj
check(_)
# save
def dumps(proto):
return xdumps(pickle, u_obj, proto)
assert dumps(0) == p_uni
assert dumps(1) == p_binu
assert dumps(2) == p_binu
if HIGHEST_PROTOCOL(pickle) >= 3:
assert dumps(3) == p_binu
if HIGHEST_PROTOCOL(pickle) >= 4:
assert dumps(4) == p_sbinu
# verify that bstr/ustr can be pickled/unpickled correctly.
def test_strings_pickle_bstr_ustr(pickle):
bs = b(xbytes('мир')+b'\xff')
us = u(xbytes('май')+b'\xff')
def diss(p): return xdiss(pickle2tools(pickle), p)
def dis(p): print(diss(p))
# assert_pickle verifies that pickling obj results in dumps_ok
# and that unpickling results back in obj.
assert HIGHEST_PROTOCOL(pickle) <= 5
def assert_pickle(obj, proto, dumps_ok):
if proto > HIGHEST_PROTOCOL(pickle):
with raises(ValueError):
xdumps(pickle, obj, proto)
return
p = xdumps(pickle, obj, proto)
assert p == dumps_ok, diss(p)
#dis(p)
obj2 = xloads(pickle, p)
assert type(obj2) is type(obj)
assert obj2 == obj
_ = assert_pickle
_(bs, 0,
b"cgolang\n_butf8b\n(V\\u043c\\u0438\\u0440\\udcff\ntR.") # _butf8b(UNICODE)
_(us, 0,
b'cgolang\nustr\n(V\\u043c\\u0430\\u0439\\udcff\ntR.') # ustr(UNICODE)
_(bs, 1,
b'cgolang\n_butf8b\n(X\x09\x00\x00\x00' # _butf8b(BINUNICODE)
b'\xd0\xbc\xd0\xb8\xd1\x80\xed\xb3\xbftR.')
# NOTE BINUNICODE ...edb3bf not ...ff (see test_strings_pickle_loadsave_UNICODE for details)
_(us, 1,
b'cgolang\nustr\n(X\x09\x00\x00\x00' # ustr(BINUNICODE)
b'\xd0\xbc\xd0\xb0\xd0\xb9\xed\xb3\xbftR.')
_(bs, 2,
b'cgolang\n_butf8b\nX\x09\x00\x00\x00' # _butf8b(BINUNICODE)
b'\xd0\xbc\xd0\xb8\xd1\x80\xed\xb3\xbf\x85R.')
_(us, 2,
b'cgolang\nustr\nX\x09\x00\x00\x00' # ustr(BINUNICODE)
b'\xd0\xbc\xd0\xb0\xd0\xb9\xed\xb3\xbf\x85\x81.')
_(bs, 3,
b'cgolang\nbstr\nC\x07\xd0\xbc\xd0\xb8\xd1\x80\xff\x85\x81.') # bstr(SHORT_BINBYTES)
_(us, 3,
b'cgolang\nustr\nX\x09\x00\x00\x00' # ustr(BINUNICODE)
b'\xd0\xbc\xd0\xb0\xd0\xb9\xed\xb3\xbf\x85\x81.')
for p in (4,5):
_(bs, p,
b'\x8c\x06golang\x8c\x04bstr\x93C\x07' # bstr(SHORT_BINBYTES)
b'\xd0\xbc\xd0\xb8\xd1\x80\xff\x85\x81.')
_(us, p,
b'\x8c\x06golang\x8c\x04ustr\x93\x8c\x09' # ustr(SHORT_BINUNICODE)
b'\xd0\xbc\xd0\xb0\xd0\xb9\xed\xb3\xbf\x85\x81.')
# ---- disassembly ----
# xdiss returns disassembly of a pickle as string.
def xdiss(pickletools, p): # -> str
out = six.StringIO()
pickletools.dis(p, out)
return out.getvalue()
# ---- loads and normalized dumps ----
# xloads loads pickle p via pickle.loads
# it also verifies that .load and Unpickler.load give the same result.
def xloads(pickle, p, **kw):
obj1 = _xpickle_attr(pickle, 'loads')(p, **kw)
obj2 = _xpickle_attr(pickle, 'load') (io.BytesIO(p), **kw)
obj3 = _xpickle_attr(pickle, 'Unpickler')(io.BytesIO(p), **kw).load()
assert type(obj2) is type(obj1)
assert type(obj3) is type(obj1)
assert obj1 == obj2 == obj3
return obj1
# xdumps dumps obj via pickle.dumps
# it also verifies that .dump and Pickler.dump give the same.
def xdumps(pickle, obj, proto, **kw):
p1 = _xpickle_attr(pickle, 'dumps')(obj, proto, **kw)
f2 = io.BytesIO(); _xpickle_attr(pickle, 'dump')(obj, f2, proto, **kw)
p2 = f2.getvalue()
f3 = io.BytesIO(); _xpickle_attr(pickle, 'Pickler')(f3, proto, **kw).dump(obj)
p3 = f3.getvalue()
assert type(p1) is bytes
assert type(p2) is bytes
assert type(p3) is bytes
assert p1 == p2 == p3
# remove uninteresting parts: PROTO / FRAME header and unused PUTs
if proto >= 2:
assert p1.startswith(PROTO(proto))
return pickle_normalize(pickle2tools(pickle), p1)
def _xpickle_attr(pickle, name):
# on py3 pickle.py tries to import from C _pickle to optimize by default
# -> verify py version if we are asked to test pickle.py
if six.PY3 and (pickle is stdPickle):
assert getattr(pickle, name) is getattr(cPickle, name)
name = '_'+name
return getattr(pickle, name)
# pickle_normalize returns normalized version of pickle p.
#
# - PROTO and FRAME opcodes are removed from header,
# - unused PUT, BINPUT and MEMOIZE opcodes - those without corresponding GET are removed,
# - *PUT indices start from 0 (this unifies cPickle with pickle).
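# For example (cf. test_pickle_normalize below):
#
#   pickle_normalize(pickletools, PROTO(2) + b'I1\n.')  ==  b'I1\n.'
#   pickle_normalize(pickletools, b'I1\n'+MEMOIZE+b'I2\n'+MEMOIZE+GET(0)+b'.')  ==  \
#                                 b'I1\n'+MEMOIZE+b'I2\n'+GET(0)+b'.'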
def pickle_normalize(pickletools, p):
def iter_pickle(p): # -> i(op, arg, pdata)
op_prev = None
arg_prev = None
pos_prev = None
for op, arg, pos in pickletools.genops(p):
if op_prev is not None:
pdata_prev = p[pos_prev:pos]
yield (op_prev, arg_prev, pdata_prev)
op_prev = op
arg_prev = arg
pos_prev = pos
if op_prev is not None:
yield (op_prev, arg_prev, p[pos_prev:])
memo_oldnew = {} # idx used in original put/get -> new index | None if not get
idx = 0
for op, arg, pdata in iter_pickle(p):
if 'PUT' in op.name:
memo_oldnew.setdefault(arg, None)
elif 'MEMOIZE' in op.name:
memo_oldnew.setdefault(len(memo_oldnew), None)
elif 'GET' in op.name:
if memo_oldnew.get(arg) is None:
memo_oldnew[arg] = idx
idx += 1
pout = b''
memo_old = set() # idx used in original put
for op, arg, pdata in iter_pickle(p):
if op.name in ('PROTO', 'FRAME'):
continue
if 'PUT' in op.name:
memo_old.add(arg)
newidx = memo_oldnew.get(arg)
if newidx is None:
continue
pdata = globals()[op.name](newidx)
if 'MEMOIZE' in op.name:
idx = len(memo_old)
memo_old.add(idx)
newidx = memo_oldnew.get(idx)
if newidx is None:
continue
if 'GET' in op.name:
newidx = memo_oldnew[arg]
assert newidx is not None
pdata = globals()[op.name](newidx)
pout += pdata
return pout
P = struct.pack
def PROTO(version): return b'\x80' + P('<B', version)
def FRAME(size): return b'\x95' + P('<Q', size)
def GET(idx): return b'g%d\n' % (idx,)
def PUT(idx): return b'p%d\n' % (idx,)
def BINPUT(idx): return b'q' + P('<B', idx)
def BINGET(idx): return b'h' + P('<B', idx)
def LONG_BINPUT(idx): return b'r' + P('<I', idx)
def LONG_BINGET(idx): return b'j' + P('<I', idx)
MEMOIZE = b'\x94'
def test_pickle_normalize(pickletools):
def diss(p):
return xdiss(pickletools, p)
proto = 0
for op in pickletools.opcodes:
proto = max(proto, op.proto)
assert proto >= 2
def _(p, p_normok):
p_norm = pickle_normalize(pickletools, p)
assert p_norm == p_normok, diss(p_norm)
_(b'.', b'.')
_(b'I1\n.', b'I1\n.')
_(PROTO(2)+b'I1\n.', b'I1\n.')
putgetv = [(PUT,GET), (BINPUT, BINGET)]
if proto >= 4:
putgetv.append((LONG_BINPUT, LONG_BINGET))
for (put,get) in putgetv:
_(b'(I1\n'+put(1) + b'I2\n'+put(2) +b't'+put(3)+b'0'+get(3)+put(4)+b'.',
b'(I1\nI2\nt'+put(0)+b'0'+get(0)+b'.')
if proto >= 4:
_(FRAME(4)+b'I1\n.', b'I1\n.')
_(b'I1\n'+MEMOIZE+b'I2\n'+MEMOIZE+GET(0)+b'.',
b'I1\n'+MEMOIZE+b'I2\n'+GET(0)+b'.')
# ---- misc ----
# HIGHEST_PROTOCOL returns highest protocol supported by pickle.
def HIGHEST_PROTOCOL(pickle):
if six.PY3 and pickle is cPickle:
pmax = stdPickle.HIGHEST_PROTOCOL # py3: _pickle has no .HIGHEST_PROTOCOL
elif six.PY3 and pickle is _zpickle:
pmax = zpickle.HIGHEST_PROTOCOL # ----//---- for _zpickle
else:
pmax = pickle.HIGHEST_PROTOCOL
assert pmax >= 2
return pmax
@@ -169,6 +169,8 @@
// [1] Libtask: a Coroutine Library for C and Unix. https://swtch.com/libtask.
// [2] http://9p.io/magic/man2html/2/thread.
+#include "golang/runtime/platform.h"
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
@@ -177,21 +179,18 @@
#include <sys/stat.h>
#include <fcntl.h>
-#ifdef _MSC_VER // no mode_t on msvc
+#ifdef LIBGOLANG_CC_msc // no mode_t on msvc
typedef int mode_t;
#endif
// DSO symbols visibility (based on https://gcc.gnu.org/wiki/Visibility)
-#if defined _WIN32 || defined __CYGWIN__
+#ifdef LIBGOLANG_OS_windows
#define LIBGOLANG_DSO_EXPORT __declspec(dllexport)
#define LIBGOLANG_DSO_IMPORT __declspec(dllimport)
-#elif __GNUC__ >= 4
+#else
#define LIBGOLANG_DSO_EXPORT __attribute__ ((visibility ("default")))
#define LIBGOLANG_DSO_IMPORT __attribute__ ((visibility ("default")))
-#else
-#define LIBGOLANG_DSO_EXPORT
-#define LIBGOLANG_DSO_IMPORT
#endif
#if BUILDING_LIBGOLANG
@@ -438,6 +437,10 @@ constexpr Nil nil = nullptr;
// string is alias for std::string.
using string = std::string;
+// byte/rune types related to string.
+using byte = uint8_t;
+using rune = int32_t;
// func is alias for std::function.
template<typename F>
using func = std::function<F>;
...
-// Copyright (C) 2019-2023 Nexedi SA and Contributors.
+// Copyright (C) 2019-2024 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
@@ -38,7 +38,7 @@
// cut this short
// (on darwing sys_siglist declaration is normally provided)
// (on windows sys_siglist is not available at all)
-#if !(defined(__APPLE__) || defined(_WIN32))
+#if !(defined(LIBGOLANG_OS_darwin) || defined(LIBGOLANG_OS_windows))
extern "C" {
extern const char * const sys_siglist[];
}
@@ -287,7 +287,7 @@ string Signal::String() const {
const Signal& sig = *this;
const char *sigstr = nil;
-#ifdef _WIN32
+#ifdef LIBGOLANG_OS_windows
switch (sig.signo) {
case SIGABRT: return "Aborted";
case SIGBREAK: return "Break";
...
#ifndef _NXD_LIBGOLANG_OS_H
#define _NXD_LIBGOLANG_OS_H
//
-// Copyright (C) 2019-2023 Nexedi SA and Contributors.
+// Copyright (C) 2019-2024 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
@@ -96,7 +96,7 @@ private:
// Open opens file @path.
LIBGOLANG_API std::tuple<File, error> Open(const string &path, int flags = O_RDONLY,
mode_t mode =
-#if !defined(_MSC_VER)
+#if !defined(LIBGOLANG_CC_msc)
S_IRUSR | S_IWUSR | S_IXUSR |
S_IRGRP | S_IWGRP | S_IXGRP |
S_IROTH | S_IWOTH | S_IXOTH
...
-// Copyright (C) 2021-2023 Nexedi SA and Contributors.
+// Copyright (C) 2021-2024 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
@@ -89,7 +89,7 @@
#include <atomic>
#include <tuple>
-#if defined(_WIN32)
+#if defined(LIBGOLANG_OS_windows)
# include <windows.h>
#endif
@@ -101,7 +101,7 @@
# define debugf(format, ...) do {} while (0)
#endif
-#if defined(_MSC_VER)
+#ifdef LIBGOLANG_CC_msc
# define HAVE_SIGACTION 0
#else
# define HAVE_SIGACTION 1
@@ -194,7 +194,7 @@ void _init() {
if (err != nil)
panic("os::newFile(_wakerx");
_waketx = vfd[1];
-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
if (sys::Fcntl(_waketx, F_SETFL, O_NONBLOCK) < 0)
panic("fcntl(_waketx, O_NONBLOCK)"); // TODO +syserr
#else
...
-# Copyright (C) 2019-2023 Nexedi SA and Contributors.
+# Copyright (C) 2019-2024 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
@@ -212,9 +212,11 @@ def _with_build_defaults(name, kw): # -> (pygo, kw')
dependv = kw.get('depends', [])[:]
dependv.extend(['%s/golang/%s' % (pygo, _) for _ in [
'libgolang.h',
+'runtime.h',
'runtime/internal.h',
'runtime/internal/atomic.h',
'runtime/internal/syscall.h',
+'runtime/platform.h',
'context.h',
'cxx.h',
'errors.h',
@@ -226,6 +228,7 @@
'os.h',
'os/signal.h',
'pyx/runtime.h',
+'unicode/utf8.h',
'_testing.h',
'_compat/windows/strings.h',
'_compat/windows/unistd.h',
@@ -264,6 +267,8 @@ def Extension(name, sources, **kw):
'_fmt.pxd',
'io.pxd',
'_io.pxd',
+'strconv.pxd',
+'_strconv.pxd',
'strings.pxd',
'sync.pxd',
'_sync.pxd',
@@ -274,6 +279,8 @@
'os/signal.pxd',
'os/_signal.pxd',
'pyx/runtime.pxd',
+'unicode/utf8.pxd',
+'unicode/_utf8.pxd',
]])
kw['depends'] = dependv
...
// Copyright (C) 2023-2024 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
// it under the terms of the GNU General Public License version 3, or (at your
// option) any later version, as published by the Free Software Foundation.
//
// You can also Link and Combine this program with other software covered by
// the terms of any of the Free Software licenses or any of the Open Source
// Initiative approved licenses and Convey the resulting work. Corresponding
// source of such a combination shall include the source code for all other
// software used.
//
// This program is distributed WITHOUT ANY WARRANTY; without even the implied
// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
//
// See COPYING file for full licensing terms.
// See https://www.nexedi.com/licensing for rationale and options.
// Package runtime mirrors Go package runtime.
// See runtime.h for package overview.
#include "golang/runtime.h"
// golang::runtime::
namespace golang {
namespace runtime {
const string OS =
#ifdef LIBGOLANG_OS_linux
"linux"
#elif defined(LIBGOLANG_OS_darwin)
"darwin"
#elif defined(LIBGOLANG_OS_windows)
"windows"
#else
# error
#endif
;
const string CC =
#ifdef LIBGOLANG_CC_gcc
"gcc"
#elif defined(LIBGOLANG_CC_clang)
"clang"
#elif defined(LIBGOLANG_CC_msc)
"msc"
#else
# error
#endif
;
}} // golang::runtime::
#ifndef _NXD_LIBGOLANG_RUNTIME_H
#define _NXD_LIBGOLANG_RUNTIME_H
// Copyright (C) 2023-2024 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
// it under the terms of the GNU General Public License version 3, or (at your
// option) any later version, as published by the Free Software Foundation.
//
// You can also Link and Combine this program with other software covered by
// the terms of any of the Free Software licenses or any of the Open Source
// Initiative approved licenses and Convey the resulting work. Corresponding
// source of such a combination shall include the source code for all other
// software used.
//
// This program is distributed WITHOUT ANY WARRANTY; without even the implied
// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
//
// See COPYING file for full licensing terms.
// See https://www.nexedi.com/licensing for rationale and options.
// Package runtime mirrors Go package runtime.
#include "golang/libgolang.h"
// golang::runtime::
namespace golang {
namespace runtime {
// OS indicates operating system, that is running the program.
//
// e.g. "linux", "darwin", "windows", ...
extern LIBGOLANG_API const string OS;
// CC indicates C/C++ compiler, that compiled the program.
//
// e.g. "gcc", "clang", "msc", ...
extern LIBGOLANG_API const string CC;
}} // golang::runtime::
#endif // _NXD_LIBGOLANG_RUNTIME_H
@@ -40,7 +40,7 @@ ELSE:
from gevent import sleep as pygsleep
-from libc.stdint cimport uint8_t, uint64_t, UINT64_MAX
+from libc.stdint cimport uint64_t, UINT64_MAX
cdef extern from *:
ctypedef bint cbool "bool"
@@ -52,7 +52,7 @@ from golang.runtime._libgolang cimport _libgolang_runtime_ops, _libgolang_sema,
from golang.runtime.internal cimport syscall
from golang.runtime cimport _runtime_thread
from golang.runtime._runtime_pymisc cimport PyExc, pyexc_fetch, pyexc_restore
-from golang cimport topyexc
+from golang cimport byte, topyexc
from libc.stdlib cimport calloc, free
from libc.errno cimport EBADF
@@ -351,7 +351,7 @@ cdef nogil:
cdef:
bint _io_read(IOH* ioh, int* out_n, void *buf, size_t count):
pygfobj = <object>ioh.pygfobj
-cdef uint8_t[::1] mem = <uint8_t[:count]>buf
+cdef byte[::1] mem = <byte[:count]>buf
xmem = memoryview(mem) # to avoid https://github.com/cython/cython/issues/3900 on mem[:0]=b''
try:
# NOTE buf might be on stack, so it must not be accessed, e.g. from
@@ -388,7 +388,7 @@ cdef nogil:
cdef:
bint _io_write(IOH* ioh, int* out_n, const void *buf, size_t count):
pygfobj = <object>ioh.pygfobj
-cdef const uint8_t[::1] mem = <const uint8_t[:count]>buf
+cdef const byte[::1] mem = <const byte[:count]>buf
# NOTE buf might be on stack, so it must not be accessed, e.g. from
# FileObjectThread, while our greenlet is parked (see STACK_DEAD_WHILE_PARKED
...
-// Copyright (C) 2022-2023 Nexedi SA and Contributors.
+// Copyright (C) 2022-2024 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
@@ -20,7 +20,7 @@
#include "golang/runtime/internal/atomic.h"
#include "golang/libgolang.h"
-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
#include <pthread.h>
#endif
@@ -44,7 +44,7 @@ static void _forkNewEpoch() {
void _init() {
// there is no fork on windows
-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
int e = pthread_atfork(/*prepare*/nil, /*inparent*/nil, /*inchild*/_forkNewEpoch);
if (e != 0)
panic("pthread_atfork failed");
...
-// Copyright (C) 2021-2023 Nexedi SA and Contributors.
+// Copyright (C) 2021-2024 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
@@ -58,9 +58,9 @@ string _Errno::Error() {
char ebuf[128];
bool ok;
-#if __APPLE__
+#ifdef LIBGOLANG_OS_darwin
ok = (::strerror_r(-e.syserr, ebuf, sizeof(ebuf)) == 0);
-#elif defined(_WIN32)
+#elif defined(LIBGOLANG_OS_windows)
ok = (::strerror_s(ebuf, sizeof(ebuf), -e.syserr) == 0);
#else
char *estr = ::strerror_r(-e.syserr, ebuf, sizeof(ebuf));
@@ -102,7 +102,7 @@ __Errno Close(int fd) {
return err;
}
-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
__Errno Fcntl(int fd, int cmd, int arg) {
int save_errno = errno;
int err = ::fcntl(fd, cmd, arg);
@@ -124,7 +124,7 @@ __Errno Fstat(int fd, struct ::stat *out_st) {
int Open(const char *path, int flags, mode_t mode) {
int save_errno = errno;
-#ifdef _WIN32 // default to open files in binary mode
+#ifdef LIBGOLANG_OS_windows // default to open files in binary mode
if ((flags & (_O_TEXT | _O_BINARY)) == 0)
flags |= _O_BINARY;
#endif
@@ -141,9 +141,9 @@ __Errno Pipe(int vfd[2], int flags) {
return -EINVAL;
int save_errno = errno;
int err;
-#ifdef __linux__
+#ifdef LIBGOLANG_OS_linux
err = ::pipe2(vfd, flags);
-#elif defined(_WIN32)
+#elif defined(LIBGOLANG_OS_windows)
err = ::_pipe(vfd, 4096, flags | _O_BINARY);
#else
err = ::pipe(vfd);
@@ -167,7 +167,7 @@ out:
return err;
}
-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
__Errno Sigaction(int signo, const struct ::sigaction *act, struct ::sigaction *oldact) {
int save_errno = errno;
int err = ::sigaction(signo, act, oldact);
...
#ifndef _NXD_LIBGOLANG_RUNTIME_INTERNAL_SYSCALL_H
#define _NXD_LIBGOLANG_RUNTIME_INTERNAL_SYSCALL_H
-// Copyright (C) 2021-2023 Nexedi SA and Contributors.
+// Copyright (C) 2021-2024 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
@@ -63,13 +63,13 @@ LIBGOLANG_API int/*n|err*/ Read(int fd, void *buf, size_t count);
LIBGOLANG_API int/*n|err*/ Write(int fd, const void *buf, size_t count);
LIBGOLANG_API __Errno Close(int fd);
-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
LIBGOLANG_API __Errno Fcntl(int fd, int cmd, int arg);
#endif
LIBGOLANG_API __Errno Fstat(int fd, struct ::stat *out_st);
LIBGOLANG_API int/*fd|err*/ Open(const char *path, int flags, mode_t mode);
LIBGOLANG_API __Errno Pipe(int vfd[2], int flags);
-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
LIBGOLANG_API __Errno Sigaction(int signo, const struct ::sigaction *act, struct ::sigaction *oldact);
#endif
typedef void (*sighandler_t)(int);
...
@@ -52,7 +52,7 @@
#include <linux/list.h>
// MSVC does not support statement expressions and typeof
// -> redo list_entry via C++ lambda.
-#ifdef _MSC_VER
+#ifdef LIBGOLANG_CC_msc
# undef list_entry
# define list_entry(ptr, type, member) [&]() { \
const decltype( ((type *)0)->member ) *__mptr = (ptr); \
...
#ifndef _NXD_LIBGOLANG_RUNTIME_PLATFORM_H
#define _NXD_LIBGOLANG_RUNTIME_PLATFORM_H
// Copyright (C) 2023-2024 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
// it under the terms of the GNU General Public License version 3, or (at your
// option) any later version, as published by the Free Software Foundation.
//
// You can also Link and Combine this program with other software covered by
// the terms of any of the Free Software licenses or any of the Open Source
// Initiative approved licenses and Convey the resulting work. Corresponding
// source of such a combination shall include the source code for all other
// software used.
//
// This program is distributed WITHOUT ANY WARRANTY; without even the implied
// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
//
// See COPYING file for full licensing terms.
// See https://www.nexedi.com/licensing for rationale and options.
// Header platform.h provides preprocessor defines that describe target platform.
// LIBGOLANG_OS_<X> is defined on operating system X.
//
// List of supported operating systems: linux, darwin, windows.
#ifdef __linux__
# define LIBGOLANG_OS_linux 1
#elif defined(__APPLE__)
# define LIBGOLANG_OS_darwin 1
#elif defined(_WIN32) || defined(__CYGWIN__)
# define LIBGOLANG_OS_windows 1
#else
# error "unsupported operating system"
#endif
// LIBGOLANG_CC_<X> is defined on C/C++ compiler X.
//
// List of supported compilers: gcc, clang, msc.
#ifdef __clang__
# define LIBGOLANG_CC_clang 1
#elif defined(_MSC_VER)
# define LIBGOLANG_CC_msc 1
// NOTE gcc comes last because e.g. clang and icc define __GNUC__ as well
#elif __GNUC__
# define LIBGOLANG_CC_gcc 1
#else
# error "unsupported compiler"
#endif
#endif // _NXD_LIBGOLANG_RUNTIME_PLATFORM_H
# cython: language_level=2
# Copyright (C) 2018-2023 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
# option) any later version, as published by the Free Software Foundation.
#
# You can also Link and Combine this program with other software covered by
# the terms of any of the Free Software licenses or any of the Open Source
# Initiative approved licenses and Convey the resulting work. Corresponding
# source of such a combination shall include the source code for all other
# software used.
#
# This program is distributed WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See COPYING file for full licensing terms.
# See https://www.nexedi.com/licensing for rationale and options.
"""Package strconv provides Go-compatible string conversions.
See _strconv.pxd for package documentation.
"""
# redirect cimport: golang.strconv -> golang._strconv (see __init__.pxd for rationale)
from golang._strconv cimport *
# -*- coding: utf-8 -*-
-# Copyright (C) 2018-2022 Nexedi SA and Contributors.
+# Copyright (C) 2018-2023 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
@@ -21,174 +21,7 @@
from __future__ import print_function, absolute_import
-import unicodedata, codecs
-from six import text_type as unicode # py2: unicode py3: str
-from six.moves import range as xrange
-from golang import b, u
-from golang._golang import _py_utf8_decode_rune as _utf8_decode_rune, _py_rune_error as _rune_error, _xunichr
+from golang._strconv import \
+    pyquote as quote, \
+    pyunquote as unquote, \
+    pyunquote_next as unquote_next
# _bstr is like b but also returns whether input was unicode.
def _bstr(s): # -> sbytes, wasunicode
return b(s), isinstance(s, unicode)
# _ustr is like u but also returns whether input was bytes.
def _ustr(s): # -> sunicode, wasbytes
return u(s), isinstance(s, bytes)
# quote quotes unicode|bytes string into valid "..." unicode|bytes string always quoted with ".
def quote(s):
s, wasunicode = _bstr(s)
qs = _quote(s)
if wasunicode:
qs, _ = _ustr(qs)
return qs
def _quote(s):
assert isinstance(s, bytes)
outv = []
emit = outv.append
i = 0
while i < len(s):
c = s[i:i+1]
# fast path - ASCII only
if ord(c) < 0x80:
if c in b'\\"':
emit(b'\\'+c)
# printable ASCII
elif b' ' <= c <= b'\x7e':
emit(c)
# non-printable ASCII
elif c == b'\t':
emit(br'\t')
elif c == b'\n':
emit(br'\n')
elif c == b'\r':
emit(br'\r')
# everything else is non-printable
else:
emit(br'\x%02x' % ord(c))
i += 1
# slow path - full UTF-8 decoding + unicodedata
else:
r, size = _utf8_decode_rune(s[i:])
isize = i + size
# decode error - just emit raw byte as escaped
if r == _rune_error and size == 1:
emit(br'\x%02x' % ord(c))
# printable utf-8 characters go as is
elif unicodedata.category(_xunichr(r))[0] in _printable_cat0:
emit(s[i:isize])
# everything else goes in numeric byte escapes
else:
for j in xrange(i, isize):
emit(br'\x%02x' % ord(s[j:j+1]))
i = isize
return b'"' + b''.join(outv) + b'"'
# unquote decodes "-quoted unicode|byte string.
#
# ValueError is raised if there are quoting syntax errors.
def unquote(s):
us, tail = unquote_next(s)
if len(tail) != 0:
raise ValueError('non-empty tail after closing "')
return us
# unquote_next decodes next "-quoted unicode|byte string.
#
# it returns -> (unquoted(s), tail-after-")
#
# ValueError is raised if there are quoting syntax errors.
def unquote_next(s):
s, wasunicode = _bstr(s)
us, tail = _unquote_next(s)
if wasunicode:
us, _ = _ustr(us)
tail, _ = _ustr(tail)
return us, tail
def _unquote_next(s):
assert isinstance(s, bytes)
if len(s) == 0 or s[0:0+1] != b'"':
raise ValueError('no starting "')
outv = []
emit= outv.append
s = s[1:]
while 1:
r, width = _utf8_decode_rune(s)
if width == 0:
raise ValueError('no closing "')
if r == ord('"'):
s = s[1:]
break
# regular UTF-8 character
if r != ord('\\'):
emit(s[:width])
s = s[width:]
continue
if len(s) < 2:
raise ValueError('unexpected EOL after \\')
c = s[1:1+1]
# \<c> -> <c> ; c = \ "
if c in b'\\"':
emit(c)
s = s[2:]
continue
# \t \n \r
uc = None
if c == b't': uc = b'\t'
elif c == b'n': uc = b'\n'
elif c == b'r': uc = b'\r'
# accept also \a \b \v \f that Go might produce
# Python also decodes those escapes even though it does not produce them:
# https://github.com/python/cpython/blob/2.7.18-0-g8d21aa21f2c/Objects/stringobject.c#L677-L688
elif c == b'a': uc = b'\x07'
elif c == b'b': uc = b'\x08'
elif c == b'v': uc = b'\x0b'
elif c == b'f': uc = b'\x0c'
if uc is not None:
emit(uc)
s = s[2:]
continue
# \x?? hex
if c == b'x': # XXX also handle octals?
if len(s) < 2+2:
raise ValueError('unexpected EOL after \\x')
b = codecs.decode(s[2:2+2], 'hex')
emit(b)
s = s[2+2:]
continue
raise ValueError('invalid escape \\%s' % chr(ord(c[0:0+1])))
return b''.join(outv), s
_printable_cat0 = frozenset(['L', 'N', 'P', 'S']) # letters, numbers, punctuation, symbols
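
A quick usage sketch (illustrative; names as in the reworked golang.strconv shown above): quote/unquote keep their escaping rules, but now always return bstr, as the updated tests below assert.

```py
from golang import b, bstr
from golang.strconv import quote, unquote

s = b('привет\nмир')
q = quote(s)            # -> bstr '"привет\\nмир"': UTF-8 text is kept as is, \n is escaped
assert type(q) is bstr
assert unquote(q) == s  # unquote(quote(s)) is identity
```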
# -*- coding: utf-8 -*-
-# Copyright (C) 2018-2022 Nexedi SA and Contributors.
+# Copyright (C) 2018-2023 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
@@ -20,12 +20,16 @@
from __future__ import print_function, absolute_import
+from golang import bstr
from golang.strconv import quote, unquote, unquote_next
from golang.gcompat import qq
-from six import int2byte as bchr, PY3
+from six import int2byte as bchr
from six.moves import range as xrange
-from pytest import raises
+from pytest import raises, mark
+import codecs
def byterange(start, stop):
b = b""
@@ -34,16 +38,9 @@ def byterange(start, stop):
return b
-# asstr converts unicode|bytes to str type of current python.
-def asstr(s):
-    if PY3:
-        if isinstance(s, bytes):
-            s = s.decode('utf-8')
-    # PY2
-    else:
-        if isinstance(s, unicode):
-            s = s.encode('utf-8')
-    return s
+def assert_bstreq(x, y):
+    assert type(x) is bstr
+    assert x == y
def test_quote():
testv = (
@@ -72,6 +69,9 @@ def test_quote():
(u'\ufffd', u'�'),
)
+# quote/unquote* always give bstr
+BEQ = assert_bstreq
for tin, tquoted in testv:
# quote(in) == quoted
# in = unquote(quoted)
@@ -79,14 +79,13 @@
tail = b'123' if isinstance(tquoted, bytes) else '123'
tquoted = q + tquoted + q # add lead/trail "
-assert quote(tin) == tquoted
+BEQ(quote(tin), tquoted)
-assert unquote(tquoted) == tin
+BEQ(unquote(tquoted), tin)
-assert unquote_next(tquoted) == (tin, type(tin)())
+_, __ = unquote_next(tquoted); BEQ(_, tin); BEQ(__, "")
-assert unquote_next(tquoted + tail) == (tin, tail)
+_, __ = unquote_next(tquoted + tail); BEQ(_, tin); BEQ(__, tail)
with raises(ValueError): unquote(tquoted + tail)
-# qq always gives str
-assert qq(tin) == asstr(tquoted)
+BEQ(qq(tin), tquoted)
# also check how it works on complementary unicode/bytes input type
if isinstance(tin, bytes):
@@ -103,14 +102,13 @@ def test_quote():
tquoted = tquoted.encode('utf-8')
tail = tail.encode('utf-8')
-assert quote(tin) == tquoted
+BEQ(quote(tin), tquoted)
-assert unquote(tquoted) == tin
+BEQ(unquote(tquoted), tin)
-assert unquote_next(tquoted) == (tin, type(tin)())
+_, __ = unquote_next(tquoted); BEQ(_, tin); BEQ(__, "")
-assert unquote_next(tquoted + tail) == (tin, tail)
+_, __ = unquote_next(tquoted + tail); BEQ(_, tin); BEQ(__, tail)
with raises(ValueError): unquote(tquoted + tail)
-# qq always gives str
-assert qq(tin) == asstr(tquoted)
+BEQ(qq(tin), tquoted)
# verify that non-canonical quotation can be unquoted too.
@@ -143,3 +141,52 @@ def test_unquote_bad():
with raises(ValueError) as exc:
unquote(tin)
assert exc.value.args == (err,)
# ---- benchmarks ----

# quoting + unquoting
uchar_testv = ['a',            # ascii
               u'α',           # 2-bytes utf8
               u'\u65e5',      # 3-bytes utf8
               u'\U0001f64f']  # 4-bytes utf8

@mark.parametrize('ch', uchar_testv)
def bench_quote(b, ch):
    s = bstr_ch1000(ch)
    q = quote
    for i in xrange(b.N):
        q(s)

def bench_stdquote(b):
    s = b'a'*1000
    q = repr
    for i in xrange(b.N):
        q(s)

@mark.parametrize('ch', uchar_testv)
def bench_unquote(b, ch):
    s = bstr_ch1000(ch)
    s = quote(s)
    unq = unquote
    for i in xrange(b.N):
        unq(s)

def bench_stdunquote(b):
    s = b'"' + b'a'*1000 + b'"'
    escape_decode = codecs.escape_decode
    def unq(s): return escape_decode(s[1:-1])[0]
    for i in xrange(b.N):
        unq(s)

# bstr_ch1000 returns bstr with many repetitions of character ch occupying ~ 1000 bytes.
def bstr_ch1000(ch): # -> bstr
    assert len(ch) == 1
    s = bstr(ch)
    s = s * (1000 // len(s))
    if len(s) % 3 == 0:
        s += 'x'
    assert len(s) == 1000
    return s
@@ -18,7 +18,7 @@
#
# See COPYING file for full licensing terms.
# See https://www.nexedi.com/licensing for rationale and options.
-"""This program helps to verify _pystr and _pyunicode.
+"""This program helps to verify b, u and underlying bstr and ustr.
It complements golang_str_test.test_strings_print.
"""
@@ -31,8 +31,17 @@ from golang.gcompat import qq
def main():
    sb = b("привет αβγ b")
    su = u("привет αβγ u")
+    print("print(b):", sb)
+    print("print(u):", su)
    print("print(qq(b)):", qq(sb))
    print("print(qq(u)):", qq(su))
+    print("print(repr(b)):", repr(sb))
+    print("print(repr(u)):", repr(su))
+    # py2: print(dict) calls PyObject_Print(flags=0) for both keys and values,
+    # not with flags=Py_PRINT_RAW used by default almost everywhere else.
+    # this way we can verify whether bstr.tp_print handles flags correctly.
+    print("print({b: u}):", {sb: su})
if __name__ == '__main__':
...
print(b): привет αβγ b
print(u): привет αβγ u
print(qq(b)): "привет αβγ b"
print(qq(u)): "привет αβγ u"
print(repr(b)): b('привет αβγ b')
print(repr(u)): u('привет αβγ u')
print({b: u}): {b('привет αβγ b'): u('привет αβγ u')}
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (C) 2022-2023 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
# option) any later version, as published by the Free Software Foundation.
#
# You can also Link and Combine this program with other software covered by
# the terms of any of the Free Software licenses or any of the Open Source
# Initiative approved licenses and Convey the resulting work. Corresponding
# source of such a combination shall include the source code for all other
# software used.
#
# This program is distributed WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See COPYING file for full licensing terms.
# See https://www.nexedi.com/licensing for rationale and options.
"""This program helps to verify [:] handling for bstr and ustr.
It complements golang_str_test.test_strings_index2.
It needs to verify [:] only lightly because thorough verification is done in
test_string_index, and here we need to verify only that __getslice__, inherited
from builtin str/unicode, does not get into our way.
"""
from __future__ import print_function, absolute_import
from golang import b, u, bstr, ustr
from golang.gcompat import qq
def main():
    us = u("миру мир")
    bs = b("миру мир")

    def emit(what, uobj, bobj):
        assert type(uobj) is ustr
        assert type(bobj) is bstr
        print("u"+what, qq(uobj))
        print("b"+what, qq(bobj))

    emit("s", us, bs)
    emit("s[:]", us[:], bs[:])
    emit("s[0:1]", us[0:1], bs[0:1])
    emit("s[0:2]", us[0:2], bs[0:2])
    emit("s[1:2]", us[1:2], bs[1:2])
    emit("s[0:-1]", us[0:-1], bs[0:-1])

if __name__ == '__main__':
    main()
us "миру мир"
bs "миру мир"
us[:] "миру мир"
bs[:] "миру мир"
us[0:1] "м"
bs[0:1] "\xd0"
us[0:2] "ми"
bs[0:2] "м"
us[1:2] "и"
bs[1:2] "\xbc"
us[0:-1] "миру ми"
bs[0:-1] "миру ми\xd1"
# cython: language_level=2
# Copyright (C) 2023 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
# option) any later version, as published by the Free Software Foundation.
#
# You can also Link and Combine this program with other software covered by
# the terms of any of the Free Software licenses or any of the Open Source
# Initiative approved licenses and Convey the resulting work. Corresponding
# source of such a combination shall include the source code for all other
# software used.
#
# This program is distributed WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See COPYING file for full licensing terms.
# See https://www.nexedi.com/licensing for rationale and options.
"""Package utf8 mirrors Go package utf8.
See https://golang.org/pkg/unicode/utf8 for Go utf8 package documentation.
"""
from golang cimport rune
cdef extern from "golang/unicode/utf8.h" namespace "golang::unicode::utf8" nogil:
    rune RuneError
#ifndef _NXD_LIBGOLANG_UNICODE_UTF8_H
#define _NXD_LIBGOLANG_UNICODE_UTF8_H
// Copyright (C) 2023 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
// it under the terms of the GNU General Public License version 3, or (at your
// option) any later version, as published by the Free Software Foundation.
//
// You can also Link and Combine this program with other software covered by
// the terms of any of the Free Software licenses or any of the Open Source
// Initiative approved licenses and Convey the resulting work. Corresponding
// source of such a combination shall include the source code for all other
// software used.
//
// This program is distributed WITHOUT ANY WARRANTY; without even the implied
// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
//
// See COPYING file for full licensing terms.
// See https://www.nexedi.com/licensing for rationale and options.
// Package utf8 mirrors Go package utf8.
#include <golang/libgolang.h>
// golang::unicode::utf8::
namespace golang {
namespace unicode {
namespace utf8 {
constexpr rune RuneError = 0xFFFD; // unicode replacement character
}}} // golang::unicode::utf8::
#endif // _NXD_LIBGOLANG_UNICODE_UTF8_H
# cython: language_level=2
# Copyright (C) 2023 Nexedi SA and Contributors.
# Kirill Smelkov <kirr@nexedi.com>
#
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
# option) any later version, as published by the Free Software Foundation.
#
# You can also Link and Combine this program with other software covered by
# the terms of any of the Free Software licenses or any of the Open Source
# Initiative approved licenses and Convey the resulting work. Corresponding
# source of such a combination shall include the source code for all other
# software used.
#
# This program is distributed WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See COPYING file for full licensing terms.
# See https://www.nexedi.com/licensing for rationale and options.
"""Package utf8 mirrors Go package utf8.
See _utf8.pxd for package documentation.
"""
# redirect cimport: golang.unicode.utf8 -> golang.unicode._utf8 (see __init__.pxd for rationale)
from golang.unicode._utf8 cimport *
@@ -71,6 +71,12 @@ def test_golang_builtins():
assert error is golang.error
assert b is golang.b
assert u is golang.u
+assert bstr is golang.bstr
+assert ustr is golang.ustr
+assert biter is golang.biter
+assert uiter is golang.uiter
+assert bbyte is golang.bbyte
+assert uchr is golang.uchr
# indirectly verify golang.__all__
for k in golang.__all__:
...
@@ -19,6 +19,25 @@
# See COPYING file for full licensing terms.
# See https://www.nexedi.com/licensing for rationale and options.
+# patch cython to allow `cdef class X(bytes)` while building pygolang to
+# workaround https://github.com/cython/cython/issues/711
+# see `cdef class pybstr` in golang/_golang_str.pyx for details.
+# (should become unneeded with cython 3 once https://github.com/cython/cython/pull/5212 is finished)
+import inspect
+from Cython.Compiler.PyrexTypes import BuiltinObjectType
+def pygo_cy_builtin_type_name_set(self, v):
+    self._pygo_name = v
+def pygo_cy_builtin_type_name_get(self):
+    name = self._pygo_name
+    if name == 'bytes':
+        caller = inspect.currentframe().f_back.f_code.co_name
+        if caller == 'analyse_declarations':
+            # need anything different from 'bytes' to deactivate check in
+            # https://github.com/cython/cython/blob/c21b39d4/Cython/Compiler/Nodes.py#L4759-L4762
+            name = 'xxx'
+    return name
+BuiltinObjectType.name = property(pygo_cy_builtin_type_name_get, pygo_cy_builtin_type_name_set)
from setuptools import find_packages
from setuptools.command.install_scripts import install_scripts as _install_scripts
from setuptools.command.develop import develop as _develop
@@ -166,7 +185,8 @@ for pkg in R:
R['all'] = Rall
# ipython/pytest are required to test py2 integration patches
-R['all_test'] = Rall.union(['ipython', 'pytest']) # pip does not like "+" in all+test
+# zodbpickle is used to test pickle support for bstr/ustr
+R['all_test'] = Rall.union(['ipython', 'pytest', 'zodbpickle']) # pip does not like "+" in all+test
# extras_require <- R
extras_require = {}
@@ -207,6 +227,7 @@ setup(
['golang/runtime/libgolang.cpp',
'golang/runtime/internal/atomic.cpp',
'golang/runtime/internal/syscall.cpp',
+'golang/runtime.cpp',
'golang/context.cpp',
'golang/errors.cpp',
'golang/fmt.cpp',
@@ -218,9 +239,11 @@ setup(
'golang/time.cpp'],
depends = [
'golang/libgolang.h',
+'golang/runtime.h',
'golang/runtime/internal.h',
'golang/runtime/internal/atomic.h',
'golang/runtime/internal/syscall.h',
+'golang/runtime/platform.h',
'golang/context.h',
'golang/cxx.h',
'golang/errors.h',
@@ -249,7 +272,9 @@ setup(
ext_modules = [
Ext('golang._golang',
['golang/_golang.pyx'],
-depends = ['golang/_golang_str.pyx']),
+depends = [
+    'golang/_golang_str.pyx',
+    'golang/_golang_str_pickle.pyx']),
Ext('golang.runtime._runtime_thread',
['golang/runtime/_runtime_thread.pyx']),
@@ -301,6 +326,9 @@ setup(
Ext('golang.os._signal',
['golang/os/_signal.pyx']),
+Ext('golang._strconv',
+    ['golang/_strconv.pyx']),
Ext('golang._strings_test',
['golang/_strings_test.pyx',
'golang/strings_test.cpp']),
...