Uniform UTF8-based approach to strings

Context: together with Jérome we've been struggling with porting Zodbtools to Python3 for several years. Despite several incremental attempts[1,2,3] we are not there yet, with the main difficulty being the backward compatibility breakage that Python3 introduced for bytes and unicode. During my last trial this spring, after I tried once again to finish this porting and could not reach a satisfactory result, I finally decided to do something about this at the root of the cause: at the level of strings - where backward compatibility was broken - with the idea to fix everything once and for all.

In 2018 in "Python 3 Losses: Nexedi Perspective"[4] and the associated "cost overview"[5] Jean-Paul highlighted the breakage of backward compatibility for strings in Python 3 as the major problem. In 2019 we had some conversations with Jérome about this topic as well[6,7]. In 2020 I started to approach it with `b` and `u`, which provide always-working conversion between bytes and unicode[8], and via limited usage of custom bytes- and unicode-like types that are interoperable with both bytes and unicode simultaneously[9]. Today, with this work, I'm finally exposing those types for general usage, so that the bytes/unicode problem can be handled automatically. The overview of the functionality is provided below:

---- 8< ----

Pygolang, similarly to Go, provides a uniform UTF8-based approach to strings with the idea to make working with byte- and unicode- strings easy and transparently interoperable:

- `bstr` is byte-string: it is based on `bytes` and can automatically convert to/from `unicode` (*).
- `ustr` is unicode-string: it is based on `unicode` and can automatically convert to/from `bytes`.

The conversion, in both encoding and decoding, never fails and never loses information: `bstr→ustr→bstr` and `ustr→bstr→ustr` are always identity even if bytes data is not valid UTF-8.

Both `bstr` and `ustr` represent strings. They are two different *representations* of the same entity.
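
The bytes→unicode→bytes half of this round trip can be tried out with the standard library alone. The sketch below assumes Python 3 and uses the stdlib `surrogateescape` error handler (PEP 383) as a stand-in; the helper names `to_text` and `to_bytes` are hypothetical, and this is an illustration of the idea rather than of how `bstr`/`ustr` are implemented.

```python
# Stand-in for the bstr -> ustr -> bstr round trip using only the standard library.
# Assumption: Python 3; `surrogateescape` approximates, but is not, Pygolang's codec.

def to_text(data: bytes) -> str:
    # Undecodable bytes are mapped to lone surrogates U+DC80..U+DCFF instead of raising.
    return data.decode('utf-8', 'surrogateescape')

def to_bytes(text: str) -> bytes:
    # Lone surrogates produced by to_text are mapped back to the original bytes.
    return text.encode('utf-8', 'surrogateescape')

valid   = 'привет'.encode('utf-8')   # well-formed UTF-8
invalid = b'abc\xff\xfedef'          # \xff and \xfe can never appear in valid UTF-8

roundtrip_ok = all(to_bytes(to_text(x)) == x for x in (valid, invalid))
print(roundtrip_ok)
```

With plain `data.decode('utf-8')` the second input would raise `UnicodeDecodeError`; the escape handler is what keeps the conversion total and reversible.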

Semantically `bstr` is an array of bytes, while `ustr` is an array of unicode characters. Accessing their elements by `[index]` and iterating them yield byte and unicode character respectively (+). However it is possible to yield unicode characters when iterating `bstr` via `uiter`, and to yield byte characters when iterating `ustr` via `biter`. In practice `bstr` + `uiter` is enough 99% of the time, and `ustr` only needs to be used for random access to string characters. See [Strings, bytes, runes and characters in Go](https://blog.golang.org/strings) for an overview of this approach.

Operations between `bstr` and `ustr`/`unicode` / `bytes`/`bytearray` coerce to `bstr`, while operations between `ustr` and `bstr`/`bytes`/`bytearray` / `unicode` coerce to `ustr`. When the coercion happens, `bytes` and `bytearray`, similarly to `bstr`, are also treated as UTF8-encoded strings.

`bstr` and `ustr` are meant to be drop-in replacements for the standard `str`/`unicode` classes. They support all methods of `str`/`unicode` and in particular their constructors accept arbitrary objects and either convert or stringify them. For cases when no stringification is desired, and one only wants to convert `bstr`/`ustr` / `unicode`/`bytes`/`bytearray`, or an object with `buffer` interface (%), to a Pygolang string, `b` and `u` provide a way to make sure an object is either `bstr` or `ustr` respectively.

Usage example:

```py
s  = b('привет')     # s is bstr corresponding to UTF-8 encoding of 'привет'.
s += ' мир'          # s is b('привет мир')
for c in uiter(s):   # c will iterate through
    ...              #     [u(_) for _ in ('п','р','и','в','е','т',' ','м','и','р')]

# the following gives b('привет мир труд май')
b('привет %s %s %s') % (u'мир',                  # raw unicode
                        u'труд'.encode('utf-8'), # raw bytes
                        u('май'))                # ustr

def f(s):
    s = u(s)         # make sure s is ustr, decoding as UTF-8(^) if it was bstr, bytes, bytearray or buffer.
    ...              # (^) the decoding never fails nor loses information.
```

(*) `unicode` on Python2, `str` on Python3.
(+) the ordinal of such a byte or unicode character can be obtained via regular `ord`. For completeness `bbyte` and `uchr` are also provided for constructing a 1-byte `bstr` and a 1-character `ustr` from an ordinal.
(%) data in a buffer, similarly to `bytes` and `bytearray`, is treated as a UTF8-encoded string. Notice that only explicit conversion through `b` and `u` accepts objects with the buffer interface. Automatic coercion does not.

---- 8< ----

With this e.g. zodbtools is finally ported to Python3 easily[10].

One note is that we change `b` and `u` to return `bstr`/`ustr` instead of `bytes`/`unicode`. This is a change in behaviour, but I hope it won't break anything. The reason for this is that the now-returned `bstr` and `ustr` are meant to be drop-in replacements for standard string types, and that there are not many existing `b` and `u` users. We just need to make sure that the places that already use `b` and `u` continue to work. Those include Zodbtools, Nxdtest[11], and lonet[12], which should continue to work ok.

@klaus, you once said that you use `b` and `u` somewhere as well. Please do not hesitate to let me know if this change causes any issues for you, and we will, hopefully, try to find a solution.

Kirill

/cc @jerome, @klaus, @kazuhiko, @vpelletier, @yusei, @tatuya
/reviewed-and-discussed-on !21

[1] zodbtools!12
[2] zodbtools!13
[3] zodbtools!16
[4] https://www.nexedi.com/NXD-Presentation.Multicore.PyconFR.2018?portal_skin=CI_slideshow#/20/1
[5] https://www.nexedi.com/NXD-Presentation.Multicore.PyconFR.2018?portal_skin=CI_slideshow#/20
[6] zodbtools!8 (comment 73726)
[7] zodbtools!13 (comment 81646)
[8] bcb95cd5
[9] edc7aaab
[10] zodbtools@9861c136
[11] https://lab.nexedi.com/nexedi/nxdtest
[12] https://lab.nexedi.com/kirr/go123/blob/master/xnet/lonet/__init__.py

Commit 50b3808c authored Feb 20, 2025 by Kirill Smelkov
39 changed files
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -2,6 +2,9 @@ include COPYING README.rst CHANGELOG.rst tox.ini pyproject.toml trun .lsan-ignor
 include golang/libgolang.h
 include golang/runtime/libgolang.cpp
 include golang/runtime/libpyxruntime.cpp
+include golang/runtime/platform.h
+include golang/runtime.h
+include golang/runtime.cpp
 include golang/pyx/runtime.h
 include golang/pyx/testprog/golang_dso_user/dsouser/dso.h
 include golang/pyx/testprog/golang_dso_user/dsouser/dso.cpp

--- a/README.rst
+++ b/README.rst
@@ -10,7 +10,7 @@ Package `golang` provides Go-like features for Python:
 - `func` allows to define methods separate from class.
 - `defer` allows to schedule a cleanup from the main control flow.
 - `error` and package `errors` provide error chaining.
- `b` and `u` provide way to make sure an object is either bytes or unicode.
+- `b`, `u` and `bstr`/`ustr` provide uniform UTF8-based approach to strings.
 - `gimport` allows to import python modules by full path in a Go workspace.

 Package `golang.pyx` provides__ similar features for Cython/nogil.
@@ -229,19 +229,64 @@ __ https://www.python.org/dev/peps/pep-3134/
 Strings
 -------

-`b` and `u` provide way to make sure an object is either bytes or unicode.
-`b(obj)` converts str/unicode/bytes obj to UTF-8 encoded bytestring, while
-`u(obj)` converts str/unicode/bytes obj to unicode string. For example::
+Pygolang, similarly to Go, provides uniform UTF8-based approach to strings with
+the idea to make working with byte- and unicode- strings easy and transparently
+interoperable:

-   b("привет мир")   # -> gives bytes corresponding to UTF-8 encoding of "привет мир".
+- `bstr` is byte-string: it is based on `bytes` and can automatically convert to/from `unicode` [*]_.
+- `ustr` is unicode-string: it is based on `unicode` and can automatically convert to/from `bytes`.

-   def f(s):
-      s = u(s)       # make sure s is unicode, decoding as UTF-8(*) if it was bytes.
-      ...            # (*) but see below about lack of decode errors.
+The conversion, in both encoding and decoding, never fails and never looses
+information: `bstr→ustr→bstr` and `ustr→bstr→ustr` are always identity
+even if bytes data is not valid UTF-8.
+
+Both `bstr` and `ustr` represent stings. They are two different *representations* of the same entity.
+
+Semantically `bstr` is array of bytes, while `ustr` is array of
+unicode-characters. Accessing their elements by `[index]` and iterating them yield byte and
+unicode character correspondingly [*]_. However it is possible to yield unicode
+character when iterating `bstr` via `uiter`, and to yield byte character when
+iterating `ustr` via `biter`. In practice `bstr` + `uiter` is enough 99% of
+the time, and `ustr` only needs to be used for random access to string
+characters.  See `Strings, bytes, runes and characters in Go`__ for overview of
+this approach.
+
+__ https://blog.golang.org/strings
+
+Operations in between `bstr` and `ustr`/`unicode` / `bytes`/`bytearray` coerce to `bstr`, while
+operations in between `ustr` and `bstr`/`bytes`/`bytearray` / `unicode` coerce
+to `ustr`.  When the coercion happens, `bytes` and `bytearray`, similarly to
+`bstr`, are also treated as UTF8-encoded strings.

-The conversion in both encoding and decoding never fails and never looses
-information: `b(u(·))` and `u(b(·))` are always identity for bytes and unicode
-correspondingly, even if bytes input is not valid UTF-8.
+`bstr` and `ustr` are meant to be drop-in replacements for standard
+`str`/`unicode` classes. They support all methods of `str`/`unicode` and in
+particular their constructors accept arbitrary objects and either convert or stringify them. For
+cases when no stringification is desired, and one only wants to convert
+`bstr`/`ustr` / `unicode`/`bytes`/`bytearray`, or an object with `buffer`
+interface [*]_, to Pygolang string, `b` and `u` provide way to make sure an
+object is either `bstr` or `ustr` correspondingly.
+
+Usage example::
+
+   s  = b('привет')     # s is bstr corresponding to UTF-8 encoding of 'привет'.
+   s += ' мир'          # s is b('привет мир')
+   for c in uiter(s):   # c will iterate through
+        ...             #     [u(_) for _ in ('п','р','и','в','е','т',' ','м','и','р')]
+
+   # the following gives b('привет мир труд май')
+   b('привет %s %s %s') % (u'мир',                  # raw unicode
+                           u'труд'.encode('utf-8'), # raw bytes
+                           u('май'))                # ustr
+
+   def f(s):
+      s = u(s)          # make sure s is ustr, decoding as UTF-8(*) if it was bstr, bytes, bytearray or buffer.
+      ...               # (*) the decoding never fails nor looses information.
+
+.. [*] `unicode` on Python2, `str` on Python3.
+.. [*] | ordinal of such byte and unicode character can be obtained via regular `ord`.
+       | For completeness `bbyte` and `uchr` are also provided for constructing 1-byte `bstr` and 1-character `ustr` from ordinal.
+.. [*] | data in buffer, similarly to `bytes` and `bytearray`, is treated as UTF8-encoded string.
+       | Notice that only explicit conversion through `b` and `u` accept objects with buffer interface. Automatic coercion does not.


 Import
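
The README text added above distinguishes byte-wise access from character-wise iteration over the same data. A minimal sketch of that distinction, assuming Python 3 and only the standard library: `iter_chars` is a hypothetical helper that approximates what the README says `uiter` does, and is not Pygolang code.

```python
# Byte-wise vs character-wise access to the same UTF-8 data (Python 3 stdlib only).
# `iter_chars` is a hypothetical helper, not part of Pygolang.

def iter_chars(data: bytes):
    """Iterate unicode characters of UTF-8 `data`; undecodable bytes come out as lone surrogates."""
    return iter(data.decode('utf-8', 'surrogateescape'))

data = 'мир'.encode('utf-8')

first_byte = data[0:1]               # slicing bytes works on raw bytes
chars      = list(iter_chars(data))  # iterating characters decodes UTF-8 first

print(len(data), first_byte, chars)
```

Each Cyrillic letter here occupies two bytes, so the byte length and the character count differ even though both views describe the same string.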

--- a/golang/.gitignore
+++ b/golang/.gitignore
@@ -9,6 +9,7 @@
 /_io.cpp
 /_os.cpp
 /_os_test.cpp
+/_strconv.cpp
 /_strings_test.cpp
 /_sync.cpp
 /_sync_test.cpp

--- a/golang/__init__.py
+++ b/golang/__init__.py
 # -*- coding: utf-8 -*-
-# Copyright (C) 2018-2024  Nexedi SA and Contributors.
+# Copyright (C) 2018-2025  Nexedi SA and Contributors.
 #                          Kirill Smelkov <kirr@nexedi.com>
 #
 # This program is free software: you can Use, Study, Modify and Redistribute
@@ -24,7 +24,7 @@
 - `func` allows to define methods separate from class.
 - `defer` allows to schedule a cleanup from the main control flow.
 - `error` and package `errors` provide error chaining.
- `b` and `u` provide way to make sure an object is either bytes or unicode.
+- `b`, `u`, `bstr`/`ustr` and `biter`/`uiter` provide uniform UTF8-based approach to strings.
 - `gimport` allows to import python modules by full path in a Go workspace.

 See README for thorough overview.
@@ -36,7 +36,8 @@ from __future__ import print_function, absolute_import
 __version__ = "0.1"

 __all__ = ['go', 'chan', 'select', 'default', 'nilchan', 'defer', 'panic',
-           'recover', 'func', 'error', 'b', 'u', 'gimport']
+           'recover', 'func', 'error', 'b', 'u', 'bstr', 'ustr', 'biter', 'uiter', 'bbyte', 'uchr',
+           'gimport']

 import setuptools_dso
 setuptools_dso.dylink_prepare_dso('golang.runtime.libgolang')
@@ -369,12 +370,11 @@ from ._golang import    \
    pypanic     as panic,   \
    pyerror     as error,   \
    pyb         as b,       \
-    pyu         as u
-
-# import golang.strconv into _golang from here to workaround cyclic golang ↔ strconv dependency
-def _():
-    from . import _golang
-    from . import strconv
-    _golang.pystrconv = strconv
-_()
-del _
+    pybstr      as bstr,    \
+    pybbyte     as bbyte,   \
+    pyu         as u,       \
+    pyustr      as ustr,    \
+    pyuchr      as uchr,    \
+    pybiter     as biter,   \
+    pyuiter     as uiter,   \
+    _butf8b
--- a/golang/_golang.pxd
+++ b/golang/_golang.pxd
 # cython: language_level=2
-# Copyright (C) 2019-2022  Nexedi SA and Contributors.
+# Copyright (C) 2019-2023  Nexedi SA and Contributors.
 #                          Kirill Smelkov <kirr@nexedi.com>
 #
 # This program is free software: you can Use, Study, Modify and Redistribute
@@ -43,6 +43,7 @@ In addition to Cython/nogil API, golang.pyx provides runtime for golang.py:
 - Python-level channels are represented by pychan + pyselect.
 - Python-level error is represented by pyerror.
 - Python-level panic is represented by pypanic.
+- Python-level strings are represented by pybstr/pyustr and pyb/pyu.
 """


@@ -64,6 +65,9 @@ cdef extern from *:
 # on the edge of Python/nogil world.
 from libcpp.string cimport string  # golang::string = std::string
 cdef extern from "golang/libgolang.h" namespace "golang" nogil:
+    ctypedef unsigned char  byte
+    ctypedef signed int     rune  # = int32
+
    void panic(const char *)
    const char *recover()

@@ -265,4 +269,11 @@ cdef class pyerror(Exception):
    cdef object from_error (error err) # -> pyerror | None


+# strings
+cpdef pyb(s) # -> bstr
+cpdef pyu(s) # -> ustr
 cdef __pystr(object obj)
+
+
+cdef (rune, int) _utf8_decode_rune(const byte[::1] s)
+cdef unicode _xunichr(rune i)
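
The `(rune, int)` return type declared for `_utf8_decode_rune` follows the Go convention of returning the decoded code point together with the number of bytes consumed. The pure-Python sketch below illustrates that convention under stated assumptions: `RUNE_ERROR` is U+FFFD as in Go's `unicode/utf8`, and `decode_rune` is a hypothetical name, not the Cython function from this diff.

```python
# Go-style "decode one rune" over bytes, written against the Python 3 stdlib.
# Assumptions: RUNE_ERROR is U+FFFD as in Go's unicode/utf8; `decode_rune` is a
# hypothetical helper and not the Cython `_utf8_decode_rune` declared in the diff.

RUNE_ERROR = 0xFFFD

def decode_rune(data: bytes):
    """Return (rune, size) for the first UTF-8 character of `data`.

    Empty input gives (RUNE_ERROR, 0); an invalid sequence gives (RUNE_ERROR, 1).
    """
    if len(data) == 0:
        return RUNE_ERROR, 0
    for size in range(1, 5):                  # UTF-8 sequences are 1..4 bytes long
        try:
            ch = data[:size].decode('utf-8')  # strict: raises on invalid or incomplete input
        except UnicodeDecodeError:
            continue
        return ord(ch), size                  # shortest prefix that decodes is one character
    return RUNE_ERROR, 1

print(decode_rune('мир'.encode('utf-8')))
print(decode_rune(b'\xffabc'))
```

A genuinely encoded U+FFFD decodes with size 3, which is why callers that need to detect errors compare both the rune and the size.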
--- a/golang/_golang.pyx
+++ b/golang/_golang.pyx
@@ -3,7 +3,7 @@
 # cython: binding=False
 # cython: c_string_type=str, c_string_encoding=utf8
 # distutils: language = c++
-# distutils: depends = libgolang.h os/signal.h _golang_str.pyx
+# distutils: depends = libgolang.h os/signal.h unicode/utf8.h _golang_str.pyx _golang_str_pickle.pyx
 #
 # Copyright (C) 2018-2024  Nexedi SA and Contributors.
 #                          Kirill Smelkov <kirr@nexedi.com>

--- a/golang/_golang_str.pyx
+++ b/golang/_golang_str.pyx
--- a/golang/_golang_str_pickle.pyx
+++ b/golang/_golang_str_pickle.pyx
+# -*- coding: utf-8 -*-
+# Copyright (C) 2023-2025  Nexedi SA and Contributors.
+#                          Kirill Smelkov <kirr@nexedi.com>
+#
+# This program is free software: you can Use, Study, Modify and Redistribute
+# it under the terms of the GNU General Public License version 3, or (at your
+# option) any later version, as published by the Free Software Foundation.
+#
+# You can also Link and Combine this program with other software covered by
+# the terms of any of the Free Software licenses or any of the Open Source
+# Initiative approved licenses and Convey the resulting work. Corresponding
+# source of such a combination shall include the source code for all other
+# software used.
+#
+# This program is distributed WITHOUT ANY WARRANTY; without even the implied
+# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See COPYING file for full licensing terms.
+# See https://www.nexedi.com/licensing for rationale and options.
+"""_golang_str_pickle.pyx complements _golang_str.pyx and keeps everything
+related to pickling strings.
+
+It is included from _golang_str.pyx .
+"""
+
+if PY_MAJOR_VERSION >= 3:
+    import copyreg as pycopyreg
+else:
+    import copy_reg as pycopyreg
+
+cdef object zbinary  # = zodbpickle.binary | None
+try:
+    import zodbpickle
+except ImportError:
+    zbinary = None
+else:
+    zbinary = zodbpickle.binary
+
+
+# support for pickling bstr/ustr as standalone types.
+#
+# pickling is organized in such a way that
+# - what is saved by py2 can be loaded correctly on both py2/py3,  and similarly
+# - what is saved by py3 can be loaded correctly on both py2/py3   as well.
+cdef _bstr__reduce_ex__(self, protocol):
+    # Ideally we want to emit bstr(BYTES), but BYTES is not available for
+    # protocol < 3. And for protocol < 3 emitting bstr(STRING) is not an
+    # option because plain py3 raises UnicodeDecodeError on loading arbitrary
+    # STRING data. However emitting bstr(UNICODE) works universally because
+    # pickle supports arbitrary unicode - including invalid unicode - out of
+    # the box and in exactly the same way on both py2 and py3. For the
+    # reference upstream py3 uses surrogatepass on encode/decode UNICODE data
+    # to achieve that.
+    if protocol < 3:
+        # use UNICODE for data
+        #
+        # explicitly mark to unpickle via _butf8b because with the introduction
+        # of UTF-8bk the way bstr decodes unicode will change, and so if we
+        # would use `bstr UNICODE` for pickling it will result in corrupt data
+        # to be loaded after the switch to UTF-8bk.
+        #
+        # TODO pickle via bstr UNICODE REDUCE/NEWOBJ after switch from UTF-8b to UTF-8bk.
+        udata = _utf8_decode_surrogateescape(self)
+        if self.__class__ is pybstr:
+            return (_butf8b,                    # _butf8b UNICODE REDUCE
+                    (udata,))
+        else:
+            return (_butf8b,                    # _butf8b bstr UNICODE REDUCE
+                    (self.__class__, udata))
+    else:
+        # use BYTES for data
+        bdata = _bdata(self)
+        if PY_MAJOR_VERSION < 3:
+            # the only way we can get here on py2 and protocol >= 3 is zodbpickle
+            # -> similarly to py3 save bdata as BYTES
+            assert zbinary is not None
+            bdata = zbinary(bdata)
+        return (
+            pycopyreg.__newobj__,               # bstr BYTES   NEWOBJ
+            (self.__class__, bdata))
+
+cdef _ustr__reduce_ex__(self, protocol):
+    # emit ustr(UNICODE).
+    # TODO after UTF-8bk we might want to switch to emitting ustr(BYTES)
+    #      even if we do this, it should be backward compatible
+    if protocol < 2:
+        return (self.__class__, (_udata(self),))# ustr UNICODE REDUCE
+    else:
+        return (pycopyreg.__newobj__,           # ustr UNICODE NEWOBJ
+                (self.__class__, _udata(self)))
+
+# `_butf8b [bcls] udata` serves unpickling of bstr pickled with data
+# represented via UTF-8b decoded unicode.
+def _butf8b(*argv):
+    cdef object bcls = pybstr
+    cdef object udata
+    cdef int l = len(argv)
+    if l == 1:
+        udata = argv[0]
+    elif l == 2:
+        bcls, udata = argv
+    else:
+        raise TypeError("_butf8b() takes 1 or 2 arguments; %d given" % l)
+    return _pyb(bcls, _utf8_encode_surrogateescape(udata))
+_butf8b.__module__ = "golang"
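
The protocol-dependent choice made in `_bstr__reduce_ex__` can be sketched in plain Python 3. Everything below is an assumption made for illustration: `BStr` is a bare `bytes` subclass, `from_utf8b` stands in for `_butf8b`, and the stdlib `surrogateescape` handler stands in for `_utf8_decode_surrogateescape` / `_utf8_encode_surrogateescape`. The `rebuild` helper applies the returned `(callable, args)` pair directly, which is the step an unpickler would perform.

```python
import copyreg

def from_utf8b(*argv):
    """Rebuild a bytes-like object from escape-decoded text; called as ([cls,] udata)."""
    if len(argv) == 1:
        cls, udata = BStr, argv[0]
    elif len(argv) == 2:
        cls, udata = argv
    else:
        raise TypeError("from_utf8b() takes 1 or 2 arguments; %d given" % len(argv))
    return cls(udata.encode('utf-8', 'surrogateescape'))

class BStr(bytes):
    """Hypothetical bytes subclass; only the reduce logic is modelled here."""
    def __reduce_ex__(self, protocol):
        if protocol < 3:
            # No BYTES opcode before protocol 3: carry the data as unicode text.
            udata = bytes(self).decode('utf-8', 'surrogateescape')
            if self.__class__ is BStr:
                return (from_utf8b, (udata,))
            return (from_utf8b, (self.__class__, udata))
        # Protocol >= 3 can store raw bytes directly.
        return (copyreg.__newobj__, (self.__class__, bytes(self)))

def rebuild(obj, protocol):
    """Apply the (callable, args) pair the way an unpickler would."""
    func, args = obj.__reduce_ex__(protocol)
    return func(*args)

original = BStr(b'abc\xff')
copies = [rebuild(original, proto) for proto in range(6)]
print(all(c == original and type(c) is BStr for c in copies))
```

Routing the low protocols through a dedicated function, rather than through the class constructor, keeps old pickles loadable even if the constructor's treatment of unicode input changes later, which is the concern the comments in the diff raise about UTF-8bk.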
--- a/golang/_strconv.pxd
+++ b/golang/_strconv.pxd
+# -*- coding: utf-8 -*-
+# cython: language_level=2
+# Copyright (C) 2018-2023  Nexedi SA and Contributors.
+#                          Kirill Smelkov <kirr@nexedi.com>
+#
+# This program is free software: you can Use, Study, Modify and Redistribute
+# it under the terms of the GNU General Public License version 3, or (at your
+# option) any later version, as published by the Free Software Foundation.
+#
+# You can also Link and Combine this program with other software covered by
+# the terms of any of the Free Software licenses or any of the Open Source
+# Initiative approved licenses and Convey the resulting work. Corresponding
+# source of such a combination shall include the source code for all other
+# software used.
+#
+# This program is distributed WITHOUT ANY WARRANTY; without even the implied
+# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See COPYING file for full licensing terms.
+# See https://www.nexedi.com/licensing for rationale and options.
+"""Package strconv provides Go-compatible string conversions."""
+
+from golang cimport byte
+
+cpdef pyquote(s)
+cdef bytes _quote(const byte[::1] s, char quote, bint* out_nonascii_escape) # -> (quoted, nonascii_escape)
--- a/golang/_strconv.pyx
+++ b/golang/_strconv.pyx
+# -*- coding: utf-8 -*-
+# cython: language_level=2
+# Copyright (C) 2018-2024  Nexedi SA and Contributors.
+#                          Kirill Smelkov <kirr@nexedi.com>
+#
+# This program is free software: you can Use, Study, Modify and Redistribute
+# it under the terms of the GNU General Public License version 3, or (at your
+# option) any later version, as published by the Free Software Foundation.
+#
+# You can also Link and Combine this program with other software covered by
+# the terms of any of the Free Software licenses or any of the Open Source
+# Initiative approved licenses and Convey the resulting work. Corresponding
+# source of such a combination shall include the source code for all other
+# software used.
+#
+# This program is distributed WITHOUT ANY WARRANTY; without even the implied
+# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See COPYING file for full licensing terms.
+# See https://www.nexedi.com/licensing for rationale and options.
+"""_strconv.pyx implements strconv.pyx - see _strconv.pxd for package overview."""
+
+from __future__ import print_function, absolute_import
+
+import unicodedata, codecs
+
+from golang cimport pyb, byte, rune
+from golang cimport _utf8_decode_rune, _xunichr
+from golang.unicode cimport utf8
+
+from cpython cimport PyObject, _PyBytes_Resize
+
+cdef extern from "Python.h":
+    PyObject* PyBytes_FromStringAndSize(char*, Py_ssize_t) except NULL
+    char* PyBytes_AS_STRING(PyObject*)
+    void Py_DECREF(PyObject*)
+
+
+# quote quotes unicode|bytes string into valid "..." bytestring always quoted with ".
+cpdef pyquote(s):  # -> bstr
+    cdef bint _
+    q = _quote(pyb(s), '"', &_)
+    return pyb(q)
+
+
+cdef char[16] hexdigit # = '0123456789abcdef'
+for i, c in enumerate('0123456789abcdef'):
+    hexdigit[i] = ord(c)
+
+
+# XXX not possible to use `except (NULL, False)`
+#     (https://stackoverflow.com/a/66335433/9456786)
+cdef bytes _quote(const byte[::1] s, char quote, bint* out_nonascii_escape): # -> (quoted, nonascii_escape)
+    # 2*" + max(4)*each byte (+ 1 for tail \0 implicitly by PyBytesObject)
+    cdef Py_ssize_t qmaxsize = 1 + 4*len(s) + 1
+    cdef PyObject*  qout     = PyBytes_FromStringAndSize(NULL, qmaxsize)
+    cdef byte*      q        = <byte*>PyBytes_AS_STRING(qout)
+
+    cdef bint nonascii_escape = False
+    cdef Py_ssize_t i = 0, j
+    cdef Py_ssize_t isize
+    cdef int size
+    cdef rune r
+    cdef byte c
+    q[0] = quote;  q += 1
+    while i < len(s):
+        c = s[i]
+        # fast path - ASCII only
+        if c < 0x80:
+            if c in (ord('\\'), quote):
+                q[0] = ord('\\')
+                q[1] = c
+                q += 2
+
+            # printable ASCII
+            elif 0x20 <= c <= 0x7e:
+                q[0] = c
+                q += 1
+
+            # non-printable ASCII
+            elif c == ord('\t'):
+                q[0] = ord('\\')
+                q[1] = ord('t')
+                q += 2
+            elif c == ord('\n'):
+                q[0] = ord('\\')
+                q[1] = ord('n')
+                q += 2
+            elif c == ord('\r'):
+                q[0] = ord('\\')
+                q[1] = ord('r')
+                q += 2
+
+            # everything else is non-printable
+            else:
+                q[0] = ord('\\')
+                q[1] = ord('x')
+                q[2] = hexdigit[c >> 4]
+                q[3] = hexdigit[c & 0xf]
+                q += 4
+
+            i += 1
+
+        # slow path - full UTF-8 decoding + unicodedata
+        else:
+            r, size = _utf8_decode_rune(s[i:])
+            isize = i + size
+
+            # decode error - just emit raw byte as escaped
+            if r == utf8.RuneError  and  size == 1:
+                nonascii_escape = True
+                q[0] = ord('\\')
+                q[1] = ord('x')
+                q[2] = hexdigit[c >> 4]
+                q[3] = hexdigit[c & 0xf]
+                q += 4
+
+            # printable utf-8 characters go as is
+            elif _unicodedata_category(_xunichr(r))[0] in 'LNPS': # letters, numbers, punctuation, symbols
+                for j in range(i, isize):
+                    q[0] = s[j]
+                    q += 1
+
+            # everything else goes in numeric byte escapes
+            else:
+                nonascii_escape = True
+                for j in range(i, isize):
+                    c = s[j]
+                    q[0] = ord('\\')
+                    q[1] = ord('x')
+                    q[2] = hexdigit[c >> 4]
+                    q[3] = hexdigit[c & 0xf]
+                    q += 4
+
+            i = isize
+
+    q[0] = quote;  q += 1
+    q[0] = 0;      # don't q++ at last because size does not include tail \0
+    cdef Py_ssize_t qsize = (q - <byte*>PyBytes_AS_STRING(qout))
+    assert qsize <= qmaxsize
+    _PyBytes_Resize(&qout, qsize)
+
+    bqout = <bytes>qout
+    Py_DECREF(qout)
+    out_nonascii_escape[0] = nonascii_escape
+    return bqout
+
+
+# unquote decodes "-quoted unicode|byte string.
+#
+# ValueError is raised if there are quoting syntax errors.
+def pyunquote(s):  # -> bstr
+    us, tail = pyunquote_next(s)
+    if len(tail) != 0:
+        raise ValueError('non-empty tail after closing "')
+    return us
+
+# unquote_next decodes next "-quoted unicode|byte string.
+#
+# it returns -> (unquoted(s), tail-after-")
+#
+# ValueError is raised if there are quoting syntax errors.
+def pyunquote_next(s):  # -> (bstr, bstr)
+    us, tail = _unquote_next(pyb(s))
+    return pyb(us), pyb(tail)
+
+cdef _unquote_next(s):
+    assert isinstance(s, bytes)
+
+    if len(s) == 0 or s[0:0+1] != b'"':
+        raise ValueError('no starting "')
+
+    outv = []
+    emit = outv.append
+
+    s = s[1:]
+    while 1:
+        r, width = _utf8_decode_rune(s)
+        if width == 0:
+            raise ValueError('no closing "')
+
+        if r == ord('"'):
+            s = s[1:]
+            break
+
+        # regular UTF-8 character
+        if r != ord('\\'):
+            emit(s[:width])
+            s = s[width:]
+            continue
+
+        if len(s) < 2:
+            raise ValueError('unexpected EOL after \\')
+
+        c = s[1:1+1]
+
+        # \<c> -> <c>   ; c = \ "
+        if c in b'\\"':
+            emit(c)
+            s = s[2:]
+            continue
+
+        # \t \n \r
+        uc = None
+        if   c == b't':  uc = b'\t'
+        elif c == b'n':  uc = b'\n'
+        elif c == b'r':  uc = b'\r'
+        # accept also \a \b \v \f that Go might produce
+        # Python also decodes those escapes even though it does not produce them:
+        # https://github.com/python/cpython/blob/2.7.18-0-g8d21aa21f2c/Objects/stringobject.c#L677-L688
+        elif c == b'a':  uc = b'\x07'
+        elif c == b'b':  uc = b'\x08'
+        elif c == b'v':  uc = b'\x0b'
+        elif c == b'f':  uc = b'\x0c'
+
+        if uc is not None:
+            emit(uc)
+            s = s[2:]
+            continue
+
+        # \x?? hex
+        if c == b'x':   # XXX also handle octals?
+            if len(s) < 2+2:
+                raise ValueError('unexpected EOL after \\x')
+
+            b = codecs.decode(s[2:2+2], 'hex')
+            emit(b)
+            s = s[2+2:]
+            continue
+
+        raise ValueError('invalid escape \\%s' % chr(ord(c[0:0+1])))
+
+    return b''.join(outv), s
+
+
+cdef _unicodedata_category = unicodedata.category
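
The quoting rules implemented above can be sketched in plain Python 3 as follows. This is a minimal illustration, not the Cython code of this patch: the names `decode_rune`, `quote` and `unquote` are hypothetical stand-ins, the sketch works on plain `bytes` instead of `bstr`, it omits the `\a \b \v \f` escapes, and it copies unescaped bytes one at a time (safe for UTF-8, since `"` and `\` never occur inside a multi-byte sequence).

```python
import codecs
import unicodedata


def decode_rune(s):
    """Return (char, size) for the first UTF-8 character of s, or (None, 1) on error."""
    for size in (1, 2, 3, 4):
        try:
            return s[:size].decode('utf-8'), size
        except UnicodeDecodeError:
            pass
    return None, 1


def quote(s):
    """Quote bytes into a "..." literal, keeping printable UTF-8 characters as is."""
    out = [b'"']
    i = 0
    while i < len(s):
        c = s[i]
        ch = s[i:i+1]
        if c < 0x80:                        # fast path: ASCII
            if ch in (b'\\', b'"'):
                out.append(b'\\' + ch)
            elif 0x20 <= c <= 0x7e:         # printable ASCII
                out.append(ch)
            elif ch == b'\t':
                out.append(b'\\t')
            elif ch == b'\n':
                out.append(b'\\n')
            elif ch == b'\r':
                out.append(b'\\r')
            else:                           # other control characters
                out.append(b'\\x%02x' % c)
            i += 1
            continue

        # slow path: decode one UTF-8 character and consult unicodedata
        r, size = decode_rune(s[i:])
        if r is not None and unicodedata.category(r)[0] in 'LNPS':
            out.append(s[i:i+size])         # letters, numbers, punctuation, symbols
        else:
            for b in s[i:i+size]:           # invalid or non-printable: \xNN per byte
                out.append(b'\\x%02x' % b)
        i += size
    out.append(b'"')
    return b''.join(out)


_SIMPLE = {b't': b'\t', b'n': b'\n', b'r': b'\r', b'\\': b'\\', b'"': b'"'}


def unquote(q):
    """Decode a "..." literal produced by quote; raise ValueError on syntax errors."""
    if q[:1] != b'"':
        raise ValueError('no starting "')
    out = []
    i = 1
    while True:
        if i >= len(q):
            raise ValueError('no closing "')
        ch = q[i:i+1]
        if ch == b'"':
            i += 1
            break
        if ch != b'\\':                     # regular byte, copied verbatim
            out.append(ch)
            i += 1
            continue
        esc = q[i+1:i+2]
        if esc in _SIMPLE:
            out.append(_SIMPLE[esc])
            i += 2
        elif esc == b'x':
            hex2 = q[i+2:i+4]
            if len(hex2) < 2:
                raise ValueError('unexpected EOL after \\x')
            out.append(codecs.decode(hex2, 'hex'))
            i += 4
        else:
            raise ValueError('invalid escape')
    if i != len(q):
        raise ValueError('non-empty tail after closing "')
    return b''.join(out)
```

In this sketch every byte that is not part of a printable UTF-8 character is written as `\xNN`, so `unquote(quote(data)) == data` holds for arbitrary bytes, including invalid UTF-8.
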
--- a/golang/fmt.h
+++ b/golang/fmt.h
 #ifndef _NXD_LIBGOLANG_FMT_H
 #define _NXD_LIBGOLANG_FMT_H

-// Copyright (C) 2019-2023  Nexedi SA and Contributors.
+// Copyright (C) 2019-2024  Nexedi SA and Contributors.
 //                          Kirill Smelkov <kirr@nexedi.com>
 //
 // This program is free software: you can Use, Study, Modify and Redistribute
@@ -111,7 +111,7 @@ inline error errorf(const string& format, Argv... argv) {
 // `const char *` overloads just to catch format mistakes as
 // __attribute__(format) does not work with std::string.
 LIBGOLANG_API string sprintf(const char *format, ...)
-#ifndef _MSC_VER
+#ifndef LIBGOLANG_CC_msc
                                __attribute__ ((format (printf, 1, 2)))
 #endif
 	;

--- a/golang/golang_str_pickle_test.py
+++ b/golang/golang_str_pickle_test.py
--- a/golang/golang_str_test.py
+++ b/golang/golang_str_test.py
--- a/golang/libgolang.h
+++ b/golang/libgolang.h
@@ -169,6 +169,8 @@
 // [1] Libtask: a Coroutine Library for C and Unix. https://swtch.com/libtask.
 // [2] http://9p.io/magic/man2html/2/thread.

+#include "golang/runtime/platform.h"
+
 #include <stdbool.h>
 #include <stddef.h>
 #include <stdint.h>
@@ -177,21 +179,18 @@
 #include <sys/stat.h>

 #include <fcntl.h>
-#ifdef _MSC_VER // no mode_t on msvc
+#ifdef LIBGOLANG_CC_msc // no mode_t on msvc
 typedef int mode_t;
 #endif


 // DSO symbols visibility (based on https://gcc.gnu.org/wiki/Visibility)
-#if defined _WIN32 || defined __CYGWIN__
+#ifdef LIBGOLANG_OS_windows
  #define LIBGOLANG_DSO_EXPORT __declspec(dllexport)
  #define LIBGOLANG_DSO_IMPORT __declspec(dllimport)
-#elif __GNUC__ >= 4
+#else
  #define LIBGOLANG_DSO_EXPORT __attribute__ ((visibility ("default")))
  #define LIBGOLANG_DSO_IMPORT __attribute__ ((visibility ("default")))
-#else
-  #define LIBGOLANG_DSO_EXPORT
-  #define LIBGOLANG_DSO_IMPORT
 #endif

 #if BUILDING_LIBGOLANG
@@ -438,6 +437,10 @@ constexpr Nil nil = nullptr;
 // string is alias for std::string.
 using string = std::string;

+// byte/rune types related to string.
+using byte = uint8_t;
+using rune = int32_t;
+
 // func is alias for std::function.
 template<typename F>
 using func = std::function<F>;

--- a/golang/os.cpp
+++ b/golang/os.cpp
-// Copyright (C) 2019-2023  Nexedi SA and Contributors.
+// Copyright (C) 2019-2024  Nexedi SA and Contributors.
 //                          Kirill Smelkov <kirr@nexedi.com>
 //
 // This program is free software: you can Use, Study, Modify and Redistribute
@@ -38,7 +38,7 @@
 // cut this short
 // (on darwing sys_siglist declaration is normally provided)
 // (on windows sys_siglist is not available at all)
-#if !(defined(__APPLE__) || defined(_WIN32))
+#if !(defined(LIBGOLANG_OS_darwin) || defined(LIBGOLANG_OS_windows))
 extern "C" {
    extern const char * const sys_siglist[];
 }
@@ -287,7 +287,7 @@ string Signal::String() const {
    const Signal& sig = *this;
    const char *sigstr = nil;

-#ifdef _WIN32
+#ifdef LIBGOLANG_OS_windows
    switch (sig.signo) {
    case SIGABRT:   return "Aborted";
    case SIGBREAK:  return "Break";

--- a/golang/os.h
+++ b/golang/os.h
 #ifndef _NXD_LIBGOLANG_OS_H
 #define _NXD_LIBGOLANG_OS_H
 //
-// Copyright (C) 2019-2023  Nexedi SA and Contributors.
+// Copyright (C) 2019-2024  Nexedi SA and Contributors.
 //                          Kirill Smelkov <kirr@nexedi.com>
 //
 // This program is free software: you can Use, Study, Modify and Redistribute
@@ -96,7 +96,7 @@ private:
 // Open opens file @path.
 LIBGOLANG_API std::tuple<File, error> Open(const string &path, int flags = O_RDONLY,
        mode_t mode =
-#if !defined(_MSC_VER)
+#if !defined(LIBGOLANG_CC_msc)
                      S_IRUSR | S_IWUSR | S_IXUSR |
                      S_IRGRP | S_IWGRP | S_IXGRP |
                      S_IROTH | S_IWOTH | S_IXOTH

--- a/golang/os/signal.cpp
+++ b/golang/os/signal.cpp
-// Copyright (C) 2021-2023  Nexedi SA and Contributors.
+// Copyright (C) 2021-2024  Nexedi SA and Contributors.
 //                          Kirill Smelkov <kirr@nexedi.com>
 //
 // This program is free software: you can Use, Study, Modify and Redistribute
@@ -89,7 +89,7 @@
 #include <atomic>
 #include <tuple>

-#if defined(_WIN32)
+#if defined(LIBGOLANG_OS_windows)
 # include <windows.h>
 #endif

@@ -101,7 +101,7 @@
 #  define debugf(format, ...) do {} while (0)
 #endif

-#if defined(_MSC_VER)
+#ifdef LIBGOLANG_CC_msc
 # define HAVE_SIGACTION 0
 #else
 # define HAVE_SIGACTION 1
@@ -194,7 +194,7 @@ void _init() {
    if (err != nil)
        panic("os::newFile(_wakerx");
    _waketx = vfd[1];
-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
    if (sys::Fcntl(_waketx, F_SETFL, O_NONBLOCK) < 0)
        panic("fcntl(_waketx, O_NONBLOCK)");    // TODO +syserr
 #else

--- a/golang/pyx/build.py
+++ b/golang/pyx/build.py
-# Copyright (C) 2019-2023  Nexedi SA and Contributors.
+# Copyright (C) 2019-2024  Nexedi SA and Contributors.
 #                          Kirill Smelkov <kirr@nexedi.com>
 #
 # This program is free software: you can Use, Study, Modify and Redistribute
@@ -212,9 +212,11 @@ def _with_build_defaults(name, kw):   # -> (pygo, kw')
    dependv = kw.get('depends', [])[:]
    dependv.extend(['%s/golang/%s' % (pygo, _) for _ in [
        'libgolang.h',
+        'runtime.h',
        'runtime/internal.h',
        'runtime/internal/atomic.h',
        'runtime/internal/syscall.h',
+        'runtime/platform.h',
        'context.h',
        'cxx.h',
        'errors.h',
@@ -226,6 +228,7 @@ def _with_build_defaults(name, kw):   # -> (pygo, kw')
        'os.h',
        'os/signal.h',
        'pyx/runtime.h',
+        'unicode/utf8.h',
        '_testing.h',
        '_compat/windows/strings.h',
        '_compat/windows/unistd.h',
@@ -264,6 +267,8 @@ def Extension(name, sources, **kw):
        '_fmt.pxd',
        'io.pxd',
        '_io.pxd',
+        'strconv.pxd',
+        '_strconv.pxd',
        'strings.pxd',
        'sync.pxd',
        '_sync.pxd',
@@ -274,6 +279,8 @@ def Extension(name, sources, **kw):
        'os/signal.pxd',
        'os/_signal.pxd',
        'pyx/runtime.pxd',
+        'unicode/utf8.pxd',
+        'unicode/_utf8.pxd',
    ]])
    kw['depends'] = dependv


--- a/golang/runtime.cpp
+++ b/golang/runtime.cpp
+// Copyright (C) 2023-2024  Nexedi SA and Contributors.
+//                          Kirill Smelkov <kirr@nexedi.com>
+//
+// This program is free software: you can Use, Study, Modify and Redistribute
+// it under the terms of the GNU General Public License version 3, or (at your
+// option) any later version, as published by the Free Software Foundation.
+//
+// You can also Link and Combine this program with other software covered by
+// the terms of any of the Free Software licenses or any of the Open Source
+// Initiative approved licenses and Convey the resulting work. Corresponding
+// source of such a combination shall include the source code for all other
+// software used.
+//
+// This program is distributed WITHOUT ANY WARRANTY; without even the implied
+// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+//
+// See COPYING file for full licensing terms.
+// See https://www.nexedi.com/licensing for rationale and options.
+
+// Package runtime mirrors Go package runtime.
+// See runtime.h for package overview.
+
+#include "golang/runtime.h"
+
+
+// golang::runtime::
+namespace golang {
+namespace runtime {
+
+const string OS =
+#ifdef LIBGOLANG_OS_linux
+    "linux"
+#elif defined(LIBGOLANG_OS_darwin)
+    "darwin"
+#elif defined(LIBGOLANG_OS_windows)
+    "windows"
+#else
+# error "unsupported operating system"
+#endif
+    ;
+
+
+const string CC =
+#ifdef LIBGOLANG_CC_gcc
+    "gcc"
+#elif defined(LIBGOLANG_CC_clang)
+    "clang"
+#elif defined(LIBGOLANG_CC_msc)
+    "msc"
+#else
+# error "unsupported compiler"
+#endif
+    ;
+
+
+}}  // golang::runtime::
--- a/golang/runtime.h
+++ b/golang/runtime.h
+#ifndef _NXD_LIBGOLANG_RUNTIME_H
+#define _NXD_LIBGOLANG_RUNTIME_H
+
+// Copyright (C) 2023-2024  Nexedi SA and Contributors.
+//                          Kirill Smelkov <kirr@nexedi.com>
+//
+// This program is free software: you can Use, Study, Modify and Redistribute
+// it under the terms of the GNU General Public License version 3, or (at your
+// option) any later version, as published by the Free Software Foundation.
+//
+// You can also Link and Combine this program with other software covered by
+// the terms of any of the Free Software licenses or any of the Open Source
+// Initiative approved licenses and Convey the resulting work. Corresponding
+// source of such a combination shall include the source code for all other
+// software used.
+//
+// This program is distributed WITHOUT ANY WARRANTY; without even the implied
+// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+//
+// See COPYING file for full licensing terms.
+// See https://www.nexedi.com/licensing for rationale and options.
+
+// Package runtime mirrors Go package runtime.
+
+#include "golang/libgolang.h"
+
+
+// golang::runtime::
+namespace golang {
+namespace runtime {
+
+// OS indicates the operating system that is running the program.
+//
+// e.g. "linux", "darwin", "windows", ...
+extern LIBGOLANG_API const string OS;
+
+// CC indicates the C/C++ compiler that compiled the program.
+//
+// e.g. "gcc", "clang", "msc", ...
+extern LIBGOLANG_API const string CC;
+
+
+}} // golang::runtime::
+
+#endif  // _NXD_LIBGOLANG_RUNTIME_H
--- a/golang/runtime/_runtime_gevent.pyx
+++ b/golang/runtime/_runtime_gevent.pyx
@@ -40,7 +40,7 @@ ELSE:

 from gevent import sleep as pygsleep

-from libc.stdint cimport uint8_t, uint64_t, UINT64_MAX
+from libc.stdint cimport uint64_t, UINT64_MAX
 cdef extern from *:
    ctypedef bint cbool "bool"

@@ -52,7 +52,7 @@ from golang.runtime._libgolang cimport _libgolang_runtime_ops, _libgolang_sema,
 from golang.runtime.internal cimport syscall
 from golang.runtime cimport _runtime_thread
 from golang.runtime._runtime_pymisc cimport PyExc, pyexc_fetch, pyexc_restore
-from golang cimport topyexc
+from golang cimport byte, topyexc

 from libc.stdlib cimport calloc, free
 from libc.errno  cimport EBADF
@@ -351,7 +351,7 @@ cdef nogil:
 cdef:
    bint _io_read(IOH* ioh, int* out_n, void *buf, size_t count):
        pygfobj = <object>ioh.pygfobj
-        cdef uint8_t[::1] mem = <uint8_t[:count]>buf
+        cdef byte[::1] mem = <byte[:count]>buf
        xmem = memoryview(mem) # to avoid https://github.com/cython/cython/issues/3900 on mem[:0]=b''
        try:
            # NOTE buf might be on stack, so it must not be accessed, e.g. from
@@ -388,7 +388,7 @@ cdef nogil:
 cdef:
    bint _io_write(IOH* ioh, int* out_n, const void *buf, size_t count):
        pygfobj = <object>ioh.pygfobj
-        cdef const uint8_t[::1] mem = <const uint8_t[:count]>buf
+        cdef const byte[::1] mem = <const byte[:count]>buf

        # NOTE buf might be on stack, so it must not be accessed, e.g. from
        # FileObjectThread, while our greenlet is parked (see STACK_DEAD_WHILE_PARKED

--- a/golang/runtime/internal/atomic.cpp
+++ b/golang/runtime/internal/atomic.cpp
-// Copyright (C) 2022-2023  Nexedi SA and Contributors.
+// Copyright (C) 2022-2024  Nexedi SA and Contributors.
 //                          Kirill Smelkov <kirr@nexedi.com>
 //
 // This program is free software: you can Use, Study, Modify and Redistribute
@@ -20,7 +20,7 @@
 #include "golang/runtime/internal/atomic.h"
 #include "golang/libgolang.h"

-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
 #include <pthread.h>
 #endif

@@ -44,7 +44,7 @@ static void _forkNewEpoch() {

 void _init() {
 // there is no fork on windows
-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
    int e = pthread_atfork(/*prepare*/nil, /*inparent*/nil, /*inchild*/_forkNewEpoch);
    if (e != 0)
        panic("pthread_atfork failed");

--- a/golang/runtime/internal/syscall.cpp
+++ b/golang/runtime/internal/syscall.cpp
-// Copyright (C) 2021-2023  Nexedi SA and Contributors.
+// Copyright (C) 2021-2024  Nexedi SA and Contributors.
 //                          Kirill Smelkov <kirr@nexedi.com>
 //
 // This program is free software: you can Use, Study, Modify and Redistribute
@@ -58,9 +58,9 @@ string _Errno::Error() {

    char ebuf[128];
    bool ok;
-#if __APPLE__
+#ifdef LIBGOLANG_OS_darwin
    ok = (::strerror_r(-e.syserr, ebuf, sizeof(ebuf)) == 0);
-#elif defined(_WIN32)
+#elif defined(LIBGOLANG_OS_windows)
    ok = (::strerror_s(ebuf, sizeof(ebuf), -e.syserr) == 0);
 #else
    char *estr = ::strerror_r(-e.syserr, ebuf, sizeof(ebuf));
@@ -102,7 +102,7 @@ __Errno Close(int fd) {
    return err;
 }

-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
 __Errno Fcntl(int fd, int cmd, int arg) {
    int save_errno = errno;
    int err = ::fcntl(fd, cmd, arg);
@@ -124,7 +124,7 @@ __Errno Fstat(int fd, struct ::stat *out_st) {

 int Open(const char *path, int flags, mode_t mode) {
    int save_errno = errno;
-#ifdef _WIN32  // default to open files in binary mode
+#ifdef LIBGOLANG_OS_windows  // default to open files in binary mode
    if ((flags & (_O_TEXT | _O_BINARY)) == 0)
        flags |= _O_BINARY;
 #endif
@@ -141,9 +141,9 @@ __Errno Pipe(int vfd[2], int flags) {
        return -EINVAL;
    int save_errno = errno;
    int err;
-#ifdef __linux__
+#ifdef LIBGOLANG_OS_linux
    err = ::pipe2(vfd, flags);
-#elif defined(_WIN32)
+#elif defined(LIBGOLANG_OS_windows)
    err = ::_pipe(vfd, 4096, flags | _O_BINARY);
 #else
    err = ::pipe(vfd);
@@ -167,7 +167,7 @@ out:
    return err;
 }

-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
 __Errno Sigaction(int signo, const struct ::sigaction *act, struct ::sigaction *oldact) {
    int save_errno = errno;
    int err = ::sigaction(signo, act, oldact);

--- a/golang/runtime/internal/syscall.h
+++ b/golang/runtime/internal/syscall.h
 #ifndef _NXD_LIBGOLANG_RUNTIME_INTERNAL_SYSCALL_H
 #define _NXD_LIBGOLANG_RUNTIME_INTERNAL_SYSCALL_H

-// Copyright (C) 2021-2023  Nexedi SA and Contributors.
+// Copyright (C) 2021-2024  Nexedi SA and Contributors.
 //                          Kirill Smelkov <kirr@nexedi.com>
 //
 // This program is free software: you can Use, Study, Modify and Redistribute
@@ -63,13 +63,13 @@ LIBGOLANG_API int/*n|err*/ Read(int fd, void *buf, size_t count);
 LIBGOLANG_API int/*n|err*/ Write(int fd, const void *buf, size_t count);

 LIBGOLANG_API __Errno Close(int fd);
-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
 LIBGOLANG_API __Errno Fcntl(int fd, int cmd, int arg);
 #endif
 LIBGOLANG_API __Errno Fstat(int fd, struct ::stat *out_st);
 LIBGOLANG_API int/*fd|err*/ Open(const char *path, int flags, mode_t mode);
 LIBGOLANG_API __Errno Pipe(int vfd[2], int flags);
-#ifndef _WIN32
+#ifndef LIBGOLANG_OS_windows
 LIBGOLANG_API __Errno Sigaction(int signo, const struct ::sigaction *act, struct ::sigaction *oldact);
 #endif
 typedef void (*sighandler_t)(int);

--- a/golang/runtime/libgolang.cpp
+++ b/golang/runtime/libgolang.cpp
@@ -52,7 +52,7 @@
 #include <linux/list.h>
 // MSVC does not support statement expressions and typeof
 // -> redo list_entry via C++ lambda.
-#ifdef _MSC_VER
+#ifdef LIBGOLANG_CC_msc
 # undef list_entry
 # define list_entry(ptr, type, member) [&]() {                      \
        const decltype( ((type *)0)->member ) *__mptr = (ptr);      \

--- a/golang/runtime/platform.h
+++ b/golang/runtime/platform.h
+#ifndef _NXD_LIBGOLANG_RUNTIME_PLATFORM_H
+#define _NXD_LIBGOLANG_RUNTIME_PLATFORM_H
+
+// Copyright (C) 2023-2024  Nexedi SA and Contributors.
+//                          Kirill Smelkov <kirr@nexedi.com>
+//
+// This program is free software: you can Use, Study, Modify and Redistribute
+// it under the terms of the GNU General Public License version 3, or (at your
+// option) any later version, as published by the Free Software Foundation.
+//
+// You can also Link and Combine this program with other software covered by
+// the terms of any of the Free Software licenses or any of the Open Source
+// Initiative approved licenses and Convey the resulting work. Corresponding
+// source of such a combination shall include the source code for all other
+// software used.
+//
+// This program is distributed WITHOUT ANY WARRANTY; without even the implied
+// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+//
+// See COPYING file for full licensing terms.
+// See https://www.nexedi.com/licensing for rationale and options.
+
+// Header platform.h provides preprocessor defines that describe target platform.
+
+// LIBGOLANG_OS_<X> is defined on operating system X.
+//
+// List of supported operating systems: linux, darwin, windows.
+#ifdef __linux__
+# define LIBGOLANG_OS_linux     1
+#elif defined(__APPLE__)
+# define LIBGOLANG_OS_darwin    1
+#elif defined(_WIN32) || defined(__CYGWIN__)
+# define LIBGOLANG_OS_windows   1
+#else
+# error "unsupported operating system"
+#endif
+
+// LIBGOLANG_CC_<X> is defined on C/C++ compiler X.
+//
+// List of supported compilers: gcc, clang, msc.
+#ifdef __clang__
+# define LIBGOLANG_CC_clang     1
+#elif defined(_MSC_VER)
+# define LIBGOLANG_CC_msc       1
+// NOTE gcc comes last because e.g. clang and icc define __GNUC__ as well
+#elif defined(__GNUC__)
+# define LIBGOLANG_CC_gcc       1
+#else
+# error "unsupported compiler"
+#endif
+
+#endif  // _NXD_LIBGOLANG_RUNTIME_PLATFORM_H
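
The order of the compiler checks in platform.h matters because clang also defines `__GNUC__`. The small Python model below only illustrates that ordering: `detect_cc` is a hypothetical helper, and the set passed to it stands in for the macros a compiler predefines.

```python
def detect_cc(predefined):
    """Map a set of predefined macro names to a compiler tag (illustrative model)."""
    if '__clang__' in predefined:       # checked first: clang also defines __GNUC__
        return 'clang'
    if '_MSC_VER' in predefined:
        return 'msc'
    if '__GNUC__' in predefined:        # checked last, so only real gcc ends up here
        return 'gcc'
    raise ValueError('unsupported compiler')
```

If `__GNUC__` were tested first in this model, a clang build would be reported as gcc.
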
--- a/golang/strconv.pxd
+++ b/golang/strconv.pxd
+# cython: language_level=2
+# Copyright (C) 2018-2023  Nexedi SA and Contributors.
+#                          Kirill Smelkov <kirr@nexedi.com>
+#
+# This program is free software: you can Use, Study, Modify and Redistribute
+# it under the terms of the GNU General Public License version 3, or (at your
+# option) any later version, as published by the Free Software Foundation.
+#
+# You can also Link and Combine this program with other software covered by
+# the terms of any of the Free Software licenses or any of the Open Source
+# Initiative approved licenses and Convey the resulting work. Corresponding
+# source of such a combination shall include the source code for all other
+# software used.
+#
+# This program is distributed WITHOUT ANY WARRANTY; without even the implied
+# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See COPYING file for full licensing terms.
+# See https://www.nexedi.com/licensing for rationale and options.
+"""Package strconv provides Go-compatible string conversions.
+
+See _strconv.pxd for package documentation.
+"""
+
+# redirect cimport: golang.strconv -> golang._strconv (see __init__.pxd for rationale)
+from golang._strconv cimport *
--- a/golang/strconv.py
+++ b/golang/strconv.py
 # -*- coding: utf-8 -*-
-# Copyright (C) 2018-2022  Nexedi SA and Contributors.
+# Copyright (C) 2018-2023  Nexedi SA and Contributors.
 #                          Kirill Smelkov <kirr@nexedi.com>
 #
 # This program is free software: you can Use, Study, Modify and Redistribute
@@ -21,174 +21,7 @@

 from __future__ import print_function, absolute_import

-import unicodedata, codecs
-from six import text_type as unicode        # py2: unicode      py3: str
-from six.moves import range as xrange
-
-from golang import b, u
-from golang._golang import _py_utf8_decode_rune as _utf8_decode_rune, _py_rune_error as _rune_error, _xunichr
-
-
-# _bstr is like b but also returns whether input was unicode.
-def _bstr(s):   # -> sbytes, wasunicode
-    return b(s), isinstance(s, unicode)
-
-# _ustr is like u but also returns whether input was bytes.
-def _ustr(s):   # -> sunicode, wasbytes
-    return u(s), isinstance(s, bytes)
-
-
-# quote quotes unicode|bytes string into valid "..." unicode|bytes string always quoted with ".
-def quote(s):
-    s, wasunicode = _bstr(s)
-    qs = _quote(s)
-    if wasunicode:
-        qs, _ = _ustr(qs)
-    return qs
-
-def _quote(s):
-    assert isinstance(s, bytes)
-
-    outv = []
-    emit = outv.append
-    i = 0
-    while i < len(s):
-        c = s[i:i+1]
-        # fast path - ASCII only
-        if ord(c) < 0x80:
-            if c in b'\\"':
-                emit(b'\\'+c)
-
-            # printable ASCII
-            elif b' ' <= c <= b'\x7e':
-                emit(c)
-
-            # non-printable ASCII
-            elif c == b'\t':
-                emit(br'\t')
-            elif c == b'\n':
-                emit(br'\n')
-            elif c == b'\r':
-                emit(br'\r')
-
-            # everything else is non-printable
-            else:
-                emit(br'\x%02x' % ord(c))
-
-            i += 1
-
-        # slow path - full UTF-8 decoding + unicodedata
-        else:
-            r, size = _utf8_decode_rune(s[i:])
-            isize = i + size
-
-            # decode error - just emit raw byte as escaped
-            if r == _rune_error  and  size == 1:
-                emit(br'\x%02x' % ord(c))
-
-            # printable utf-8 characters go as is
-            elif unicodedata.category(_xunichr(r))[0] in _printable_cat0:
-                emit(s[i:isize])
-
-            # everything else goes in numeric byte escapes
-            else:
-                for j in xrange(i, isize):
-                    emit(br'\x%02x' % ord(s[j:j+1]))
-
-            i = isize
-
-    return b'"' + b''.join(outv) + b'"'
-
-
-# unquote decodes "-quoted unicode|byte string.
-#
-# ValueError is raised if there are quoting syntax errors.
-def unquote(s):
-    us, tail = unquote_next(s)
-    if len(tail) != 0:
-        raise ValueError('non-empty tail after closing "')
-    return us
-
-# unquote_next decodes next "-quoted unicode|byte string.
-#
-# it returns -> (unquoted(s), tail-after-")
-#
-# ValueError is raised if there are quoting syntax errors.
-def unquote_next(s):
-    s, wasunicode = _bstr(s)
-    us, tail = _unquote_next(s)
-    if wasunicode:
-        us, _   = _ustr(us)
-        tail, _ = _ustr(tail)
-    return us, tail
-
-def _unquote_next(s):
-    assert isinstance(s, bytes)
-
-    if len(s) == 0 or s[0:0+1] != b'"':
-        raise ValueError('no starting "')
-
-    outv = []
-    emit= outv.append
-
-    s = s[1:]
-    while 1:
-        r, width = _utf8_decode_rune(s)
-        if width == 0:
-            raise ValueError('no closing "')
-
-        if r == ord('"'):
-            s = s[1:]
-            break
-
-        # regular UTF-8 character
-        if r != ord('\\'):
-            emit(s[:width])
-            s = s[width:]
-            continue
-
-        if len(s) < 2:
-            raise ValueError('unexpected EOL after \\')
-
-        c = s[1:1+1]
-
-        # \<c> -> <c>   ; c = \ "
-        if c in b'\\"':
-            emit(c)
-            s = s[2:]
-            continue
-
-        # \t \n \r
-        uc = None
-        if   c == b't':  uc = b'\t'
-        elif c == b'n':  uc = b'\n'
-        elif c == b'r':  uc = b'\r'
-        # accept also \a \b \v \f that Go might produce
-        # Python also decodes those escapes even though it does not produce them:
-        # https://github.com/python/cpython/blob/2.7.18-0-g8d21aa21f2c/Objects/stringobject.c#L677-L688
-        elif c == b'a':  uc = b'\x07'
-        elif c == b'b':  uc = b'\x08'
-        elif c == b'v':  uc = b'\x0b'
-        elif c == b'f':  uc = b'\x0c'
-
-        if uc is not None:
-            emit(uc)
-            s = s[2:]
-            continue
-
-        # \x?? hex
-        if c == b'x':   # XXX also handle octals?
-            if len(s) < 2+2:
-                raise ValueError('unexpected EOL after \\x')
-
-            b = codecs.decode(s[2:2+2], 'hex')
-            emit(b)
-            s = s[2+2:]
-            continue
-
-        raise ValueError('invalid escape \\%s' % chr(ord(c[0:0+1])))
-
-    return b''.join(outv), s
-
-
-_printable_cat0 = frozenset(['L', 'N', 'P', 'S'])   # letters, numbers, punctuation, symbols
+from golang._strconv import \
+    pyquote             as quote,       \
+    pyunquote           as unquote,     \
+    pyunquote_next      as unquote_next
--- a/golang/strconv_test.py
+++ b/golang/strconv_test.py
 # -*- coding: utf-8 -*-
-# Copyright (C) 2018-2022  Nexedi SA and Contributors.
+# Copyright (C) 2018-2023  Nexedi SA and Contributors.
 #                          Kirill Smelkov <kirr@nexedi.com>
 #
 # This program is free software: you can Use, Study, Modify and Redistribute
@@ -20,12 +20,16 @@

 from __future__ import print_function, absolute_import

+from golang import bstr
 from golang.strconv import quote, unquote, unquote_next
 from golang.gcompat import qq

-from six import int2byte as bchr, PY3
+from six import int2byte as bchr
 from six.moves import range as xrange
-from pytest import raises
+from pytest import raises, mark
+
+import codecs
+

 def byterange(start, stop):
    b = b""
@@ -34,16 +38,9 @@ def byterange(start, stop):

    return b

-# asstr converts unicode|bytes to str type of current python.
-def asstr(s):
-    if PY3:
-        if isinstance(s, bytes):
-            s = s.decode('utf-8')
-    # PY2
-    else:
-        if isinstance(s, unicode):
-            s = s.encode('utf-8')
-    return s
+def assert_bstreq(x, y):
+    assert type(x) is bstr
+    assert x == y

 def test_quote():
    testv = (
@@ -72,6 +69,9 @@ def test_quote():
        (u'\ufffd',         u'�'),
    )

+    # quote/unquote* always give bstr
+    BEQ = assert_bstreq
+
    for tin, tquoted in testv:
        # quote(in) == quoted
        # in = unquote(quoted)
@@ -79,14 +79,13 @@ def test_quote():
        tail = b'123' if isinstance(tquoted, bytes) else '123'
        tquoted = q + tquoted + q   # add lead/trail "

-        assert quote(tin) == tquoted
-        assert unquote(tquoted) == tin
-        assert unquote_next(tquoted) == (tin, type(tin)())
-        assert unquote_next(tquoted + tail) == (tin, tail)
+        BEQ(quote(tin), tquoted)
+        BEQ(unquote(tquoted), tin)
+        _, __ = unquote_next(tquoted);          BEQ(_, tin);  BEQ(__, "")
+        _, __ = unquote_next(tquoted + tail);   BEQ(_, tin);  BEQ(__, tail)
        with raises(ValueError): unquote(tquoted + tail)

-        # qq always gives str
-        assert qq(tin) == asstr(tquoted)
+        BEQ(qq(tin), tquoted)

        # also check how it works on complementary unicode/bytes input type
        if isinstance(tin, bytes):
@@ -103,14 +102,13 @@ def test_quote():
            tquoted = tquoted.encode('utf-8')
            tail = tail.encode('utf-8')

-        assert quote(tin) == tquoted
-        assert unquote(tquoted) == tin
-        assert unquote_next(tquoted) == (tin, type(tin)())
-        assert unquote_next(tquoted + tail) == (tin, tail)
+        BEQ(quote(tin), tquoted)
+        BEQ(unquote(tquoted), tin)
+        _, __ = unquote_next(tquoted);          BEQ(_, tin);  BEQ(__, "")
+        _, __ = unquote_next(tquoted + tail);   BEQ(_, tin);  BEQ(__, tail)
        with raises(ValueError): unquote(tquoted + tail)

-        # qq always gives str
-        assert qq(tin) == asstr(tquoted)
+        BEQ(qq(tin), tquoted)


 # verify that non-canonical quotation can be unquoted too.
@@ -143,3 +141,52 @@ def test_unquote_bad():
        with raises(ValueError) as exc:
            unquote(tin)
        assert exc.value.args == (err,)
+
+
+# ---- benchmarks ----
+
+# quoting + unquoting
+uchar_testv = ['a',               # ascii
+               u'α',              # 2-bytes utf8
+               u'\u65e5',         # 3-bytes utf8
+               u'\U0001f64f']     # 4-bytes utf8
+
+@mark.parametrize('ch', uchar_testv)
+def bench_quote(b, ch):
+    s = bstr_ch1000(ch)
+    q = quote
+    for i in xrange(b.N):
+        q(s)
+
+def bench_stdquote(b):
+    s = b'a'*1000
+    q = repr
+    for i in xrange(b.N):
+        q(s)
+
+
+@mark.parametrize('ch', uchar_testv)
+def bench_unquote(b, ch):
+    s = bstr_ch1000(ch)
+    s = quote(s)
+    unq = unquote
+    for i in xrange(b.N):
+        unq(s)
+
+def bench_stdunquote(b):
+    s = b'"' + b'a'*1000 + b'"'
+    escape_decode = codecs.escape_decode
+    def unq(s): return escape_decode(s[1:-1])[0]
+    for i in xrange(b.N):
+        unq(s)
+
+
+# bstr_ch1000 returns bstr with many repetitions of character ch occupying ~ 1000 bytes.
+def bstr_ch1000(ch): # -> bstr
+    assert len(ch) == 1
+    s = bstr(ch)
+    s = s * (1000 // len(s))
+    if len(s) % 3 == 0:
+        s += 'x'
+    assert len(s) == 1000
+    return s
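
The sizing logic of `bstr_ch1000` can be checked with plain `bytes`. The sketch below is an assumption-labelled stand-in (`bytes_ch1000` is a hypothetical name and it does not use `bstr`); it shows why only the 3-byte character needs padding: 1000 is divisible by 1, 2 and 4, but 333 repetitions of a 3-byte character give 999 bytes.

```python
def bytes_ch1000(ch):
    """Repeat the UTF-8 encoding of ch and pad with b'x' to exactly 1000 bytes."""
    assert len(ch) == 1
    unit = ch.encode('utf-8')
    s = unit * (1000 // len(unit))
    s += b'x' * (1000 - len(s))     # only the 3-byte case is short, by one byte
    assert len(s) == 1000
    return s
```
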
--- a/golang/testprog/golang_test_str.py
+++ b/golang/testprog/golang_test_str.py
@@ -18,7 +18,7 @@
 #
 # See COPYING file for full licensing terms.
 # See https://www.nexedi.com/licensing for rationale and options.
-"""This program helps to verify _pystr and _pyunicode.
+"""This program helps to verify b, u and underlying bstr and ustr.

 It complements golang_str_test.test_strings_print.
 """
@@ -31,8 +31,17 @@ from golang.gcompat import qq
 def main():
    sb = b("привет αβγ b")
    su = u("привет αβγ u")
+    print("print(b):", sb)
+    print("print(u):", su)
    print("print(qq(b)):", qq(sb))
    print("print(qq(u)):", qq(su))
+    print("print(repr(b)):", repr(sb))
+    print("print(repr(u)):", repr(su))
+
+    # py2: print(dict) calls PyObject_Print(flags=0) for both keys and values,
+    #      not with flags=Py_PRINT_RAW used by default almost everywhere else.
+    #      this way we can verify whether bstr.tp_print handles flags correctly.
+    print("print({b: u}):", {sb: su})


 if __name__ == '__main__':

--- a/golang/testprog/golang_test_str.txt
+++ b/golang/testprog/golang_test_str.txt
+print(b): привет αβγ b
+print(u): привет αβγ u
 print(qq(b)): "привет αβγ b"
 print(qq(u)): "привет αβγ u"
+print(repr(b)): b('привет αβγ b')
+print(repr(u)): u('привет αβγ u')
+print({b: u}): {b('привет αβγ b'): u('привет αβγ u')}
--- a/golang/testprog/golang_test_str_index2.py
+++ b/golang/testprog/golang_test_str_index2.py
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# Copyright (C) 2022-2023  Nexedi SA and Contributors.
+#                          Kirill Smelkov <kirr@nexedi.com>
+#
+# This program is free software: you can Use, Study, Modify and Redistribute
+# it under the terms of the GNU General Public License version 3, or (at your
+# option) any later version, as published by the Free Software Foundation.
+#
+# You can also Link and Combine this program with other software covered by
+# the terms of any of the Free Software licenses or any of the Open Source
+# Initiative approved licenses and Convey the resulting work. Corresponding
+# source of such a combination shall include the source code for all other
+# software used.
+#
+# This program is distributed WITHOUT ANY WARRANTY; without even the implied
+# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See COPYING file for full licensing terms.
+# See https://www.nexedi.com/licensing for rationale and options.
+"""This program helps to verify [:] handling for bstr and ustr.
+
+It complements golang_str_test.test_strings_index2.
+
+It needs to verify [:] only lightly because thorough verification is done in
+test_strings_index, and here we only need to verify that __getslice__, inherited
+from builtin str/unicode, does not get in our way.
+"""
+
+from __future__ import print_function, absolute_import
+
+from golang import b, u, bstr, ustr
+from golang.gcompat import qq
+
+
+def main():
+    us = u("миру мир")
+    bs = b("миру мир")
+
+    def emit(what, uobj, bobj):
+        assert type(uobj) is ustr
+        assert type(bobj) is bstr
+        print("u"+what, qq(uobj))
+        print("b"+what, qq(bobj))
+
+    emit("s",       us,        bs)
+    emit("s[:]",    us[:],     bs[:])
+    emit("s[0:1]",  us[0:1],   bs[0:1])
+    emit("s[0:2]",  us[0:2],   bs[0:2])
+    emit("s[1:2]",  us[1:2],   bs[1:2])
+    emit("s[0:-1]", us[0:-1],  bs[0:-1])
+
+
+if __name__ == '__main__':
+    main()
--- a/golang/testprog/golang_test_str_index2.txt
+++ b/golang/testprog/golang_test_str_index2.txt
+us "миру мир"
+bs "миру мир"
+us[:] "миру мир"
+bs[:] "миру мир"
+us[0:1] "м"
+bs[0:1] "\xd0"
+us[0:2] "ми"
+bs[0:2] "м"
+us[1:2] "и"
+bs[1:2] "\xbc"
+us[0:-1] "миру ми"
+bs[0:-1] "миру ми\xd1"
--- a/golang/unicode/__init__.py
+++ b/golang/unicode/__init__.py
--- a/golang/unicode/_utf8.pxd
+++ b/golang/unicode/_utf8.pxd
+# cython: language_level=2
+# Copyright (C) 2023  Nexedi SA and Contributors.
+#                     Kirill Smelkov <kirr@nexedi.com>
+#
+# This program is free software: you can Use, Study, Modify and Redistribute
+# it under the terms of the GNU General Public License version 3, or (at your
+# option) any later version, as published by the Free Software Foundation.
+#
+# You can also Link and Combine this program with other software covered by
+# the terms of any of the Free Software licenses or any of the Open Source
+# Initiative approved licenses and Convey the resulting work. Corresponding
+# source of such a combination shall include the source code for all other
+# software used.
+#
+# This program is distributed WITHOUT ANY WARRANTY; without even the implied
+# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See COPYING file for full licensing terms.
+# See https://www.nexedi.com/licensing for rationale and options.
+"""Package utf8 mirrors Go package utf8.
+
+See https://golang.org/pkg/unicode/utf8 for Go utf8 package documentation.
+"""
+
+from golang cimport rune
+
+cdef extern from "golang/unicode/utf8.h" namespace "golang::unicode::utf8" nogil:
+    rune RuneError
--- a/golang/unicode/utf8.h
+++ b/golang/unicode/utf8.h
+#ifndef _NXD_LIBGOLANG_UNICODE_UTF8_H
+#define _NXD_LIBGOLANG_UNICODE_UTF8_H
+
+// Copyright (C) 2023  Nexedi SA and Contributors.
+//                     Kirill Smelkov <kirr@nexedi.com>
+//
+// This program is free software: you can Use, Study, Modify and Redistribute
+// it under the terms of the GNU General Public License version 3, or (at your
+// option) any later version, as published by the Free Software Foundation.
+//
+// You can also Link and Combine this program with other software covered by
+// the terms of any of the Free Software licenses or any of the Open Source
+// Initiative approved licenses and Convey the resulting work. Corresponding
+// source of such a combination shall include the source code for all other
+// software used.
+//
+// This program is distributed WITHOUT ANY WARRANTY; without even the implied
+// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+//
+// See COPYING file for full licensing terms.
+// See https://www.nexedi.com/licensing for rationale and options.
+
+// Package utf8 mirrors Go package utf8.
+
+#include <golang/libgolang.h>
+
+// golang::unicode::utf8::
+namespace golang {
+namespace unicode {
+namespace utf8 {
+
+constexpr rune RuneError = 0xFFFD;  // unicode replacement character
+
+}}} // golang::unicode::utf8::
+
+#endif  // _NXD_LIBGOLANG_UNICODE_UTF8_H
--- a/golang/unicode/utf8.pxd
+++ b/golang/unicode/utf8.pxd
+# cython: language_level=2
+# Copyright (C) 2023  Nexedi SA and Contributors.
+#                     Kirill Smelkov <kirr@nexedi.com>
+#
+# This program is free software: you can Use, Study, Modify and Redistribute
+# it under the terms of the GNU General Public License version 3, or (at your
+# option) any later version, as published by the Free Software Foundation.
+#
+# You can also Link and Combine this program with other software covered by
+# the terms of any of the Free Software licenses or any of the Open Source
+# Initiative approved licenses and Convey the resulting work. Corresponding
+# source of such a combination shall include the source code for all other
+# software used.
+#
+# This program is distributed WITHOUT ANY WARRANTY; without even the implied
+# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# See COPYING file for full licensing terms.
+# See https://www.nexedi.com/licensing for rationale and options.
+"""Package utf8 mirrors Go package utf8.
+
+See _utf8.pxd for package documentation.
+"""
+
+# redirect cimport: golang.unicode.utf8 -> golang.unicode._utf8 (see __init__.pxd for rationale)
+from golang.unicode._utf8 cimport *
--- a/gpython/gpython_test.py
+++ b/gpython/gpython_test.py
@@ -71,6 +71,12 @@ def test_golang_builtins():
    assert error  is golang.error
    assert b      is golang.b
    assert u      is golang.u
+    assert bstr   is golang.bstr
+    assert ustr   is golang.ustr
+    assert biter  is golang.biter
+    assert uiter  is golang.uiter
+    assert bbyte  is golang.bbyte
+    assert uchr   is golang.uchr

    # indirectly verify golang.__all__
    for k in golang.__all__:

--- a/setup.py
+++ b/setup.py
@@ -19,6 +19,25 @@
 # See COPYING file for full licensing terms.
 # See https://www.nexedi.com/licensing for rationale and options.

+# patch cython to allow `cdef class X(bytes)` while building pygolang, to
+# work around https://github.com/cython/cython/issues/711
+# see `cdef class pybstr` in golang/_golang_str.pyx for details.
+# (should become unneeded with cython 3 once https://github.com/cython/cython/pull/5212 is finished)
+import inspect
+from Cython.Compiler.PyrexTypes import BuiltinObjectType
+def pygo_cy_builtin_type_name_set(self, v):
+    self._pygo_name = v
+def pygo_cy_builtin_type_name_get(self):
+    name = self._pygo_name
+    if name == 'bytes':
+        caller = inspect.currentframe().f_back.f_code.co_name
+        if caller == 'analyse_declarations':
+            # need anything different from 'bytes' to deactivate check in
+            # https://github.com/cython/cython/blob/c21b39d4/Cython/Compiler/Nodes.py#L4759-L4762
+            name = 'xxx'
+    return name
+BuiltinObjectType.name = property(pygo_cy_builtin_type_name_get, pygo_cy_builtin_type_name_set)
+
 from setuptools import find_packages
 from setuptools.command.install_scripts import install_scripts as _install_scripts
 from setuptools.command.develop import develop as _develop
@@ -166,7 +185,8 @@ for pkg in R:
 R['all'] = Rall

 # ipython/pytest are required to test py2 integration patches
-R['all_test'] = Rall.union(['ipython', 'pytest']) # pip does not like "+" in all+test
+# zodbpickle is used to test pickle support for bstr/ustr
+R['all_test'] = Rall.union(['ipython', 'pytest', 'zodbpickle']) # pip does not like "+" in all+test

 # extras_require <- R
 extras_require = {}
@@ -207,6 +227,7 @@ setup(
                        ['golang/runtime/libgolang.cpp',
                         'golang/runtime/internal/atomic.cpp',
                         'golang/runtime/internal/syscall.cpp',
+                         'golang/runtime.cpp',
                         'golang/context.cpp',
                         'golang/errors.cpp',
                         'golang/fmt.cpp',
@@ -218,9 +239,11 @@ setup(
                         'golang/time.cpp'],
                        depends = [
                            'golang/libgolang.h',
+                            'golang/runtime.h',
                            'golang/runtime/internal.h',
                            'golang/runtime/internal/atomic.h',
                            'golang/runtime/internal/syscall.h',
+                            'golang/runtime/platform.h',
                            'golang/context.h',
                            'golang/cxx.h',
                            'golang/errors.h',
@@ -249,7 +272,9 @@ setup(
    ext_modules = [
                    Ext('golang._golang',
                        ['golang/_golang.pyx'],
-                        depends = ['golang/_golang_str.pyx']),
+                        depends = [
+                            'golang/_golang_str.pyx',
+                            'golang/_golang_str_pickle.pyx']),

                    Ext('golang.runtime._runtime_thread',
                        ['golang/runtime/_runtime_thread.pyx']),
@@ -301,6 +326,9 @@ setup(
                    Ext('golang.os._signal',
                        ['golang/os/_signal.pyx']),

+                    Ext('golang._strconv',
+                        ['golang/_strconv.pyx']),
+
                    Ext('golang._strings_test',
                        ['golang/_strings_test.pyx',
                         'golang/strings_test.cpp']),