- 20 Dec, 2024 17 commits
-
-
Kirill Smelkov authored
We want bstr/ustr pickling support to be robust. So we need to test it against all pickle modules that are in use. This includes python pickle version from stdlib (pickle.py), C pickle version from stdlib (cPickle on py2 and _pickle on py3) and, correspondingly, py and C versions from zodbpickle. -> Adjust pickling tests to cover all those variants.
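A minimal sketch of what covering several pickle implementations can look like on py3, using only the stdlib. It assumes CPython, where `pickle._dumps` / `pickle._loads` are the pure-Python variants kept next to the C-accelerated `pickle.dumps` / `pickle.loads`. zodbpickle is left out to keep the sketch dependency-free, and `roundtrip_everywhere` is a hypothetical helper, not the name used in the actual tests.

```python
import pickle

# (dumps, loads) per implementation.
# Assumption: CPython, where the private pickle._dumps/_loads are pure Python.
IMPLS = {
    "C":  (pickle.dumps,  pickle.loads),
    "py": (pickle._dumps, pickle._loads),
}

def roundtrip_everywhere(obj, proto):
    """Dump obj with every implementation and load it with every implementation."""
    checked = 0
    for dump_name, (dumps, _) in IMPLS.items():
        for load_name, (_, loads) in IMPLS.items():
            obj2 = loads(dumps(obj, proto))
            # the result must keep both the exact type and the value
            assert type(obj2) is type(obj), (dump_name, load_name, proto)
            assert obj2 == obj, (dump_name, load_name, proto)
            checked += 1
    return checked
```

Crossing dumpers with loaders matters because data pickled by one implementation is routinely loaded by another.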
-
Kirill Smelkov authored
In the future we will be adding more functionality and tests related to pickling. So it makes sense to keep pickle-related functionality in its own unit. -> Move the code to golang_str_pickle* as a preparatory step for that.
-
Kirill Smelkov authored
For pybstr/pyustr cython generates .tp_dealloc that refer to bytes/unicode types directly. That works ok in normal circumstances, but will lead to crash when gpython will start patching builtin str and unicode types with bstr and ustr: (py39.venv) kirr@deca:~/src/tools/go/pygolang-master$ gpython Ошибка сегментирования (образ памяти сброшен на диск) (py39.venv) kirr@deca:~/src/tools/go/pygolang-master$ gdb python core ... Core was generated by `/home/kirr/src/tools/go/py39.venv/bin/python3.9 /home/kirr/src/tools/go/py39.ve'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007f2edb247d5c in PyType_HasFeature (type=<error reading variable: Cannot access memory at address 0x7ffc6ca1bff8>, feature=<error reading variable: Cannot access memory at address 0x7ffc6ca1bff0>) at /home/kirr/local/py3.9/include/python3.9/object.h:622 622 { (gdb) bt #0 0x00007f2edb247d5c in PyType_HasFeature (type=<error reading variable: Cannot access memory at address 0x7ffc6ca1bff8>, feature=<error reading variable: Cannot access memory at address 0x7ffc6ca1bff0>) at /home/kirr/local/py3.9/include/python3.9/object.h:622 #1 0x00007f2edb2f4b28 in __pyx_tp_dealloc_6golang_7_golang__pyustr (o=0x7f2edae99030) at golang/_golang.cpp:88982 #2 0x00007f2edb2f4bc8 in __pyx_tp_dealloc_6golang_7_golang__pyustr (o=0x7f2edae99030) at golang/_golang.cpp:88986 #3 0x00007f2edb2f4bc8 in __pyx_tp_dealloc_6golang_7_golang__pyustr (o=0x7f2edae99030) at golang/_golang.cpp:88986 #4 0x00007f2edb2f4bc8 in __pyx_tp_dealloc_6golang_7_golang__pyustr (o=0x7f2edae99030) at golang/_golang.cpp:88986 #5 0x00007f2edb2f4bc8 in __pyx_tp_dealloc_6golang_7_golang__pyustr (o=0x7f2edae99030) at golang/_golang.cpp:88986 #6 0x00007f2edb2f4bc8 in __pyx_tp_dealloc_6golang_7_golang__pyustr (o=0x7f2edae99030) at golang/_golang.cpp:88986 #7 0x00007f2edb2f4bc8 in __pyx_tp_dealloc_6golang_7_golang__pyustr (o=0x7f2edae99030) at golang/_golang.cpp:88986 #8 0x00007f2edb2f4bc8 in __pyx_tp_dealloc_6golang_7_golang__pyustr 
(o=0x7f2edae99030) at golang/_golang.cpp:88986 #9 0x00007f2edb2f4bc8 in __pyx_tp_dealloc_6golang_7_golang__pyustr (o=0x7f2edae99030) at golang/_golang.cpp:88986 #10 0x00007f2edb2f4bc8 in __pyx_tp_dealloc_6golang_7_golang__pyustr (o=0x7f2edae99030) at golang/_golang.cpp:88986 ... -> Fix that crash by manually repointing .tp_dealloc of bstr/ustr to .tp_dealloc of original bytes and unicode.
-
Kirill Smelkov authored
When gpython starts patching the builtin str and unicode types with bstr and ustr, the first argument to assertDeepEQ will have builtin str or unicode type, and the existing assert not isinstance(a, (bstr, ustr)) will break. -> Rewrite that assert to do an equivalent check carefully, so that it does not break when str/unicode types are patched with bstr and ustr.
-
Kirill Smelkov authored
When gpython starts patching the builtin str and unicode types with bstr and ustr, it might be the case that b('abc') returns the same 'abc' object, and so the logic in this test will become broken. -> Avoid that by keeping the original data in a bytearray, which for sure won't overlap with bytes/str nor unicode regardless of whether those builtin types are patched or not.
-
Kirill Smelkov authored
Assert that input belongs to the set of expected types. Assert that the output has exactly the type we promised. No change in functionality. We are now just more certain that those functions work as intended and could be relied upon.
-
Kirill Smelkov authored
In 390fd810 (golang_str: bstr/ustr %-formatting) I've implemented percent formatting but missed to handle tuple-subclass argv correctly. For example the following works with std string: In [1]: import collections as cc In [5]: Point = cc.namedtuple('Point', ['x', 'y']) In [9]: 'α %s %s π' % Point('β','γ') Out[9]: '\xce\xb1 \xce\xb2 \xce\xb3 \xcf\x80' while it fails with ustr: In [8]: ustr('α %s %s π') % Point('β','γ') --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-8-4f1a97267f2a> in <module>() ----> 1 ustr('α %s %s π') % Point('β','γ') /home/kirr/src/tools/go/pygolang/golang/_golang_str.pyx in golang._golang._pyustr.__mod__() 850 # %-formatting 851 def __mod__(a, b): --> 852 return pyu(pyb(a).__mod__(b)) 853 def __rmod__(b, a): 854 # ("..." % x) calls "x.__rmod__()" for string subtypes /home/kirr/src/tools/go/pygolang/golang/_golang_str.pyx in golang._golang._pybstr.__mod__() 473 # %-formatting 474 def __mod__(a, b): --> 475 return _bprintf(a, b) 476 def __rmod__(b, a): 477 # ("..." % x) calls "x.__rmod__()" for string subtypes /home/kirr/src/tools/go/pygolang/golang/_golang_str.pyx in golang._golang._bprintf() 1648 1649 if isinstance(xarg, tuple): -> 1650 argv = xarg 1651 xarg = _missing 1652 TypeError: Expected tuple, got Point -> Fix that.
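The tuple-subclass case can be sketched in plain Python as follows. `split_format_args` and `format_percent` are hypothetical helpers made up for illustration; they are not the names used in `_golang_str.pyx`, and mapping arguments (`%(name)s`) are deliberately ignored here.

```python
import collections

Point = collections.namedtuple("Point", ["x", "y"])

def split_format_args(xarg):
    """Hypothetical helper: return the %-format arguments as an exact tuple."""
    if isinstance(xarg, tuple):
        # tuple(...) turns a tuple subclass such as Point into a plain tuple,
        # which is what code typed as "exactly tuple" expects
        return tuple(xarg)
    return (xarg,)

def format_percent(fmt, xarg):
    argv = split_format_args(xarg)
    assert type(argv) is tuple
    return fmt % argv
```

With this normalization a namedtuple behaves like the plain tuple it wraps, matching what builtin `str % Point(...)` does.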
-
Kirill Smelkov authored
Previously test_strings_mod_and_format was testing % and .format by comparing bstr and ustr results with the corresponding result for unicode. This works reasonably ok. However under gpython, when unicode is replaced with ustr, the test will no longer compare results of bstr/ustr methods with something good and external - indeed in that case the bstr/ustr result of e.g. % will be compared to the result of ustr %, which opens the door for bugs to stay unnoticed. -> Adjust the test, similarly to 9a075b17 (golang_str: tests: Make test_strings_methods more robust with upcoming unicode=ustr), to explicitly provide the expected result for all entries in the test vector. We make sure those results are good and match std python because we also assert that unicode % and .format match it.
-
Kirill Smelkov authored
NumPy uses s.translate(str) and under gpython/py3 with str patched to be ustr it breaks with: File ".../numpy-1.24.4-py3.9-linux-x86_64.egg/numpy/core/_string_helpers.py", line 40, in english_lower lowered = s.translate(LOWER_TABLE) File "golang/_golang_str.pyx", line 909, in golang._golang._pyustr.translate AttributeError: 'str' object has no attribute 'items' https://docs.python.org/3/library/stdtypes.html#str.translate documents translate to work on both mappings and sequences, so my usage of table.items() in ff24be3d (golang_str: bstr/ustr string methods) was not correct. -> Fix it by reworking ustr.translate to use our proxy mapping instead of going through all items of original table in the beginning.
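For illustration, here is a small py3 sketch of why a proxy works for both table kinds: `str.translate` only needs `table[codepoint]` and treats `LookupError` as "leave the character unchanged". `TableProxy` is a hypothetical stand-in, not the proxy class used in pygolang, and the bytes-to-str conversion is an assumption about what such a proxy might normalize.

```python
# A 256-entry sequence table in the style of NumPy's LOWER_TABLE:
# ASCII uppercase letters map to lowercase, everything else maps to itself.
LOWER_TABLE = "".join(chr(i + 32) if 65 <= i <= 90 else chr(i) for i in range(256))

class TableProxy:
    """Hypothetical proxy: forwards lookups to the original table lazily."""
    def __init__(self, table):
        self._table = table

    def __getitem__(self, codepoint):
        # works for mappings (KeyError) and sequences (IndexError) alike;
        # both are LookupError, which translate treats as "keep the character"
        value = self._table[codepoint]
        if isinstance(value, bytes):
            value = value.decode("utf-8")  # assumption: normalize bytes values
        return value
```

Because nothing calls `.items()`, the original table is never enumerated up front and a plain `str` table works the same way as a `dict`.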
-
Kirill Smelkov authored
On py2 \u does not work in str literals - only in unicode ones. This corrects all tests that were doing x32 incorrectly due to the thinko.
-
Kirill Smelkov authored
This behaviour is provided by builtin str and we were not following it: $ python3 Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> class SSS(str): pass ... >>> z = SSS('abc') >>> z 'abc' >>> type(z) <class '__main__.SSS'> >>> q = str(z) >>> q 'abc' >>> type(q) <class 'str'> >>> r = z.__str__() >>> r 'abc' >>> type(r) <class 'str'> <-- NOTE str, not __main__.SSS $ gpython # with str patched to be ustr >>> class SSS(str): pass >>> z = SSS('abc') >>> z 'abc' >>> type(z) <class '__main__.SSS'> >>> q = str(z) >>> q 'abc' >>> type(q) <class 'str'> >>> r = z.__str__() >>> r 'abc' >>> type(r) <class '__main__.SSS'> <-- NOTE not str which leads to crash during IPython startup on py3.11: $ gpython -m IPython # with str patched to be ustr Traceback (most recent call last): File "/home/kirr/src/tools/go/py3.venv/bin/gpython", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/kirr/src/tools/go/pygolang-master/gpython/__init__.py", line 478, in main pymain(argv, init) File "/home/kirr/src/tools/go/pygolang-master/gpython/__init__.py", line 291, in pymain run(mmain) File "/home/kirr/src/tools/go/pygolang-master/gpython/__init__.py", line 162, in run runpy._run_module_as_main(mod) File "<frozen runpy>", line 198, in _run_module_as_main File "<frozen runpy>", line 88, in _run_code File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/__main__.py", line 15, in <module> start_ipython() File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/__init__.py", line 128, in start_ipython return launch_new_instance(argv=argv, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/traitlets/config/application.py", line 1042, in launch_instance app.initialize(argv) File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/traitlets/config/application.py", line 113, in inner return 
method(app, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/terminal/ipapp.py", line 279, in initialize self.init_shell() File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/terminal/ipapp.py", line 293, in init_shell self.shell = self.interactive_shell_class.instance(parent=self, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/traitlets/config/configurable.py", line 551, in instance inst = cls(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^ File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/terminal/interactiveshell.py", line 856, in __init__ self.init_prompt_toolkit_cli() File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/terminal/interactiveshell.py", line 648, in init_prompt_toolkit_cli **self._extra_prompt_options(), ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/terminal/interactiveshell.py", line 751, in _extra_prompt_options "lexer": IPythonPTLexer(), ^^^^^^^^^^^^^^^^ File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/IPython/terminal/ptutils.py", line 177, in __init__ self.python_lexer = PygmentsLexer(l.Python3Lexer) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/prompt_toolkit/lexers/pygments.py", line 198, in __init__ self.pygments_lexer = pygments_lexer_cls( ^^^^^^^^^^^^^^^^^^^ File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/pygments/lexer.py", line 647, in __call__ cls._tokens = cls.process_tokendef('', cls.get_tokendefs()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/pygments/lexer.py", line 586, in process_tokendef cls._process_state(tokendefs, processed, state) File 
"/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/pygments/lexer.py", line 549, in _process_state tokens.extend(cls._process_state(unprocessed, processed, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/kirr/src/tools/go/py3.venv/lib/python3.11/site-packages/pygments/lexer.py", line 533, in _process_state assert type(state) is str, "wrong state name %r (%r)" % (state, type(state)) ^^^^^^^^^^^^^^^^^^ AssertionError: wrong state name 'keywords' (<class 'pygments.lexer.include'>) If you suspect this is an IPython 8.12.0 bug, please report it at: https://github.com/ipython/ipython/issues or send an email to the mailing list at ipython-dev@python.org You can print a more detailed traceback right now with "%tb", or use "%debug" to interactively debug it. Extra-detailed tracebacks for bug-reporting purposes can be enabled via: c.Application.verbose_crash=True Here pygments define class include(str): pass and wants `str(obj)` to return str, not include if obj was instance of include. -> Adjust bstr/ustr .__str__() to always return bstr/ustr even for subclassed. For consistency, do the same for .__unicode__ . In case a subclass wants its __str__, or __unicode__ to return self without casting to bstr/ustr, it can override those methods.
-
Kirill Smelkov authored
In bbbb58f0 (golang_str: bstr/ustr support for + and *) I've added support for binary string operations, but similarly to __eq__ did not handle correctly the case for arbitrary arguments that potentially define __radd__ and similar. As the result it breaks when running e.g. bstr + pyparsing.Regex File ".../pyparsing-2.4.7-py2.7.egg/pyparsing.py", line 6591, in pyparsing_common _full_ipv6_address = (_ipv6_part + (':' + _ipv6_part) * 7).setName("full IPv6 address") File "golang/_golang_str.pyx", line 469, in golang._golang._pybstr.__add__ return pyb(zbytes.__add__(a, _pyb_coerce(b))) File "golang/_golang_str.pyx", line 243, in golang._golang._pyb_coerce raise TypeError("b: coerce: invalid type %s" % type(x)) TypeError: b: coerce: invalid type <class 'pyparsing.Regex'> because pyparsing.Regex is a type, that does not inherit from str, but defines its own __radd__ to handle str + Regex as Regex. -> Fix it by returning NotImplemented from under __add__ and other operations where it is needed so that bstr and ustr behave in the same way as builtin str wrt third types, but care to handle bstr/ustr promise that only explicit conversion through `b` and `u` accept objects with buffer interface. Automatic coercion does not.
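The mechanism can be sketched with a toy str subclass on py3. `MyStr` and `Regex` below are stand-ins invented for this example; `Regex` only imitates the relevant trait of pyparsing.Regex, namely that it does not inherit from str but defines `__radd__`.

```python
class Regex:
    """Stand-in for a third-party type that handles `str + Regex` itself."""
    def __init__(self, pattern):
        self.pattern = pattern

    def __radd__(self, left):
        return Regex(str(left) + self.pattern)

class MyStr(str):
    """Toy string type whose + only accepts strings."""
    def __add__(a, b):
        if not isinstance(b, str):
            # let Python try b.__radd__(a) instead of failing right here
            return NotImplemented
        return MyStr(str.__add__(a, b))
```

If no `__radd__` accepts the operands either, Python itself raises the usual TypeError, so the user-visible behaviour for plainly wrong arguments stays the same.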
-
Kirill Smelkov authored
In 54c2a3cf (golang_str: Teach bstr/ustr to compare wrt any string with automatic coercion) I've added __eq__, __ne__, __lt__ etc methods to our strings, but made __lt__ and other comparisons raise TypeError against any non-string type. My idea was to mimic user-visible py3 behaviour such as

    >>> "abc" > 1
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: '>' not supported between instances of 'str' and 'int'

However it turned out that the implementation was not exactly matching what Python is doing internally, which led to incorrect behaviour when bstr or ustr is compared wrt another type with its own __cmp__. In the general case for `a op b` Python first queries a.__op__(b) and b.__op'__(a) and sometimes other methods before going to .__cmp__. This relies on the methods returning NotImplemented instead of raising an exception: if a trial raises TypeError, everything is stopped and that TypeError is returned to the caller.

Jérome reports a real breakage due to this when bstr is compared wrt distutils.version.LooseVersion . LooseVersion is basically

    class LooseVersion(Version):
        def __cmp__ (self, other):
            if isinstance(other, StringType):
                other = LooseVersion(other)
            return cmp(self.version, other.version)

but due to my thinko on `LooseVersion < bstr` the control flow was not getting into that LooseVersion.__cmp__ because bstr.__gt__ was tried first and raised TypeError.

-> Fix all comparison operations to return NotImplemented instead of raising TypeError and make sure in the tests that this behaviour exactly matches what the native str type does.

The fix is needed not only for py2, because the added test_strings_cmp_wrt_distutils_LooseVersion was failing on py3 as well without the fix.

/reported-by @jerome
/reported-on nexedi/slapos!1575 (comment 206080)
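A py3-only sketch of the difference between raising and returning NotImplemented. All three classes are made up for illustration: `Version` loosely imitates LooseVersion using rich comparisons (py3 has no `__cmp__`), while `RaisingStr` and `PoliteStr` model the behaviour before and after the fix.

```python
class Version:
    """Toy version type that knows how to compare itself with strings."""
    def __init__(self, s):
        self.parts = tuple(int(p) for p in s.split("."))

    def _coerce(self, other):
        if isinstance(other, Version):
            return other
        if isinstance(other, str):
            return Version(other)
        return None

    def __lt__(self, other):
        o = self._coerce(other)
        return NotImplemented if o is None else self.parts < o.parts

    def __gt__(self, other):
        o = self._coerce(other)
        return NotImplemented if o is None else self.parts > o.parts

class RaisingStr(str):
    def __lt__(a, b):
        if not isinstance(b, str):
            raise TypeError("cannot compare")   # stops the whole comparison
        return str.__lt__(a, b)

class PoliteStr(str):
    def __lt__(a, b):
        if not isinstance(b, str):
            return NotImplemented               # Python then tries b.__gt__(a)
        return str.__lt__(a, b)
```

For `PoliteStr("1.0") < Version("1.2")` Python falls through to `Version.__gt__`, while the raising variant never gives `Version` a chance.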
-
Kirill Smelkov authored
Without working unicode.decode gpython/py2 with unicode replaced by ustr fails when running ERP5 as follows: $ /srv/slapgrid/slappart49/t/ekg/i/5/bin/runTestSuite --help No handlers could be found for logger "SecurityInfo" Traceback (most recent call last): File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/bin/.runTestSuite.pyexe", line 296, in <module> main() File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/pygolang/gpython/__init__.py", line 484, in main pymain(argv, init) File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/pygolang/gpython/__init__.py", line 292, in pymain run(mmain) File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/pygolang/gpython/__init__.py", line 192, in run _execfile(filepath, mmain.__dict__) File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/pygolang/gpython/__init__.py", line 339, in _execfile six.exec_(code, globals, locals) File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/eggs/six-1.16.0-py2.7.egg/six.py", line 735, in exec_ exec("""exec _code_ in _globs_, _locs_""") File "<string>", line 1, in <module> File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/bin/runTestSuite", line 10, in <module> from Products.ERP5Type.tests.runTestSuite import main; sys.exit(main()) File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/erp5/product/ERP5Type/__init__.py", line 96, in <module> from . 
import ZopePatch File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/erp5/product/ERP5Type/ZopePatch.py", line 75, in <module> from Products.ERP5Type.patches import ZopePageTemplateUtils File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/parts/erp5/product/ERP5Type/patches/ZopePageTemplateUtils.py", line 58, in <module> convertToUnicode(u'', 'text/xml', ()) File "/srv/slapgrid/slappart49/t/ekg/soft/b5048b47894a7612651c7fe81c2c8636/eggs/Zope-4.8.9+slapospatched002-py2.7.egg/Products/PageTemplates/utils.py", line 73, in convertToUnicode return source.decode(encoding), encoding AttributeError: unreadable attribute

and in general, if we treat bstr and ustr as two different representations of the same entity, then if we have bstr.decode, having ustr.decode is also needed for symmetry, with both operations converting the bytes representation of the string into unicode.

Now there is full symmetry between bstr/ustr and encode/decode. Quoting the updated encode/decode text:

Encode encodes unicode representation of the string into bytes, leaving string domain. Decode decodes bytes representation of the string into ustr, staying inside string domain.

Both bstr and ustr are accepted by encode and decode treating them as two different representations of the same entity.

On encoding, for bstr, the string representation is first converted to unicode and encoded to bytes from there. For ustr unicode representation of the string is directly encoded.

On decoding, for ustr, the string representation is first converted to bytes and decoded to unicode from there. For bstr bytes representation of the string is directly decoded.
-
Kirill Smelkov authored
Initially in 023907ee (golang_str: bstr/ustr encode/decode) I implemented things in such a way that (b|u)str.__bytes__ were giving bstr and ustr.encode() was giving bstr as well. My logic here was that bstr is based on bytes and it is ok to give that. However this logic did not pass backward compatibility test: for example when LXML is imported it does cdef bytes _FILENAME_ENCODING = (sys.getfilesystemencoding() or sys.getdefaultencoding() or 'ascii').encode("UTF-8") and under gpython/py3 with unicode patched to be ustr it breaks with File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/bin/runwsgi", line 4, in <module> from Products.ERP5.bin.zopewsgi import runwsgi; sys.exit(runwsgi()) File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/parts/erp5/product/ERP5/__init__.py", line 36, in <module> from Products.ERP5Type.Utils import initializeProduct, updateGlobals File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/parts/erp5/product/ERP5Type/__init__.py", line 42, in <module> from .patches import pylint File "/srv/slapgrid/slappart47/srv/runner/software/7f1663e8148f227ce3c6a38fc52796e2/parts/erp5/product/ERP5Type/patches/pylint.py", line 524, in <module> __import__(module_name, fromlist=[module_name], level=0)) File "src/lxml/sax.py", line 18, in init lxml.sax File "src/lxml/etree.pyx", line 154, in init lxml.etree TypeError: Expected bytes, got golang.bstr The breakage highlights a thinko in my previous reasoning: yes bstr is based on bytes, but bstr has different semantics compared to bytes: even though e.g. __getitem__ works the same way for bytes on py2, it works differently compared to py3. This way if on py3 a program is doing bytes(x) or x.encode() it then expects the result to have bytes semantics of current python which is not the case if the result is bstr. 
-> Fix that by adjusting .encode() and .__bytes__() to produce bytes type of current python and leave string domain. I initially was contemplating for some time to introduce a third type, e.g. bvec also based on bytes, but having bytes semantic and that bvec.decode would return back to pygolang strings domain. But due to the fact that bytes semantic is different in between py2 and py3, it would mean that bvec provided by pygolang would need to have different behaviours dependent on current python version which is undesirable. In the end with leaving into native bytes the "bytes inconsistency" problem is left to remain under std python with pygolang targeting only to fix strings inconsistency in between py2 and py3 and providing the same semantic for bstr and ustr on all python versions. It also does not harm that bytes.decode() returns std unicode instead of ustr: for programs that run under unpatched python we have u() to convert the result to ustr, while under gpython std unicode is actually ustr which makes bytes.decode() behaviour still quite ok. P.S. we enable bstr.encode for consistency and because under py2, if not enabled, it will break when running pytest under gpython in File ".../_pytest/assertion/rewrite.py", line 352, in <module> RN = "\r\n".encode("utf-8") AttributeError: unreadable attribute
-
Kirill Smelkov authored
In a72c1c1a (golang_str: bstr/ustr iteration) things were initially implemented to follow Go semantic exactly with bytestring iteration yielding unicode characters as explained in https://blog.golang.org/strings. However this makes bstr not a 100% drop-in compatible replacement for std str under py2, and even though my initial testing was saying this change does not affect programs in practice it turned out to be not the case. For example with bstr.__iter__ yielding unicode characters running gpython on py2 with builtin str patched to be bstr will break sometimes when importing uuid: There uuid reads 16 bytes from /dev/random and then wants to iterate those 16 bytes as single bytes and then expects that the length of the resulting sequence is exactly 16: int = long(('%02x'*16) % tuple(map(ord, bytes)), 16) ( https://github.com/python/cpython/blob/2.7-0-g8d21aa21f2c/Lib/uuid.py#L147 ) which breaks if some of the read bytes are higher than 0x7f. Even though this particular problem could be worked-around with patching uuid, there is no evidence that there will be no similar problems later, which could be many. -> So adjust bstr semantic instead to follow semantic of str under py2 and introduce uiter() primitive to still be able to iterate bytestrings as unicode characters. This makes bstr, hopefully, to be fully compatible with str on py2 while still providing reasonably good approach for strings processing the Go-way when needed. Add biter as well for symmetry. See nexedi/pygolang!21 (comment 170754) nexedi/pygolang!21 (comment 170782) ... and nexedi/pygolang!21 (comment 206044) for discussion on iter(bstr) topic.
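The uuid breakage can be reproduced in miniature on py3 with plain bytes. `biter_sketch` and `uiter_sketch` are hypothetical helpers written for this example; they only approximate the idea behind biter/uiter, and the decoding rule used here (UTF-8 with `surrogateescape`) is an assumption that may differ from what pygolang does for invalid input.

```python
# 16 "random" bytes where the first two happen to form UTF-8 'α'
data = bytes([0xCE, 0xB1]) + bytes(range(14))

def biter_sketch(data):
    """Iterate a bytestring byte by byte (each item is a length-1 bytes)."""
    return (data[i:i + 1] for i in range(len(data)))

def uiter_sketch(data):
    """Iterate a bytestring as unicode characters.

    Assumption: UTF-8 decoding with surrogateescape for invalid bytes.
    """
    return iter(data.decode("utf-8", "surrogateescape"))

n_bytes = len(list(biter_sketch(data)))   # one item per byte
n_chars = len(list(uiter_sketch(data)))   # 0xCE 0xB1 collapse into one character

# what uuid does, expressed with py3 bytes iteration (items are ints)
as_hex = ("%02x" * 16) % tuple(data)
```

Feeding the character sequence into the same `'%02x'*16` format fails, because there are fewer characters than the 16 expected items; that is the kind of code that needs byte-wise iteration.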
-
Kirill Smelkov authored
Add type annotations and use C-level objects instead of py-ones where it is easy to do. We are not all-good yet, but this already brings some noticeable speedup:

    name                 old time/op  new time/op  delta
    quote[a]              786µs ± 1%    10µs ± 0%  -98.76%  (p=0.016 n=4+5)
    quote[\u03b1]        1.12ms ± 0%  0.41ms ± 0%  -63.37%  (p=0.008 n=5+5)
    quote[\u65e5]         738µs ± 2%   258µs ± 0%  -65.07%  (p=0.016 n=4+5)
    quote[\U0001f64f]     920µs ± 1%    78µs ± 0%  -91.46%  (p=0.016 n=5+4)
    stdquote             1.19µs ± 0%  1.19µs ± 0%     ~     (p=0.794 n=5+5)
    unquote[a]           1.08ms ± 0%  1.08ms ± 1%     ~     (p=0.548 n=5+5)
    unquote[\u03b1]       797µs ± 0%   807µs ± 1%   +1.23%  (p=0.008 n=5+5)
    unquote[\u65e5]       522µs ± 0%   520µs ± 1%     ~     (p=0.056 n=5+5)
    unquote[\U0001f64f]  3.21ms ± 0%  3.14ms ± 0%   -2.13%  (p=0.008 n=5+5)
    stdunquote            815ns ± 0%   836ns ± 0%   +2.63%  (p=0.008 n=5+5)
-
- 16 Dec, 2024 14 commits
-
-
Kirill Smelkov authored
Since 50b8cb7e (strconv: Move functionality related to UTF8 encode/decode into _golang_str) both golang_str and strconv import each other. Before this patch that import was done at py level at runtime from outside to workaround the import cycle. This results in that strconv functionality is not available while golang is only being imported. So far it was not a problem, but when builtin string types will become patched with bstr and ustr, that will become a problem because string repr starts to be used at import time, which for pybstr is implemented via strconv.quote .

-> Fix this by switching golang and strconv to cimport each other at pyx level. There, similarly to C, the cycle works just ok out of the box.

This also automatically helps performance a bit:

    name                 old time/op  new time/op  delta
    quote[a]              805µs ± 0%   786µs ± 1%   -2.40%  (p=0.016 n=5+4)
    quote[\u03b1]        1.21ms ± 0%  1.12ms ± 0%   -7.47%  (p=0.008 n=5+5)
    quote[\u65e5]         785µs ± 0%   738µs ± 2%   -5.97%  (p=0.016 n=5+4)
    quote[\U0001f64f]    1.04ms ± 0%  0.92ms ± 1%  -11.73%  (p=0.008 n=5+5)
    stdquote             1.18µs ± 0%  1.19µs ± 0%   +0.54%  (p=0.008 n=5+5)
    unquote[a]           1.26ms ± 0%  1.08ms ± 0%  -14.66%  (p=0.008 n=5+5)
    unquote[\u03b1]       911µs ± 1%   797µs ± 0%  -12.55%  (p=0.008 n=5+5)
    unquote[\u65e5]       592µs ± 0%   522µs ± 0%  -11.81%  (p=0.008 n=5+5)
    unquote[\U0001f64f]  3.46ms ± 0%  3.21ms ± 0%   -7.34%  (p=0.008 n=5+5)
    stdunquote            812ns ± 1%   815ns ± 0%     ~     (p=0.183 n=5+5)
-
Kirill Smelkov authored
So far this is plain code movement with no type annotations added and internal from-strconv imports still being done via py level. As expected this does not help practically for performance yet:

    name                 old time/op  new time/op  delta
    quote[a]              910µs ± 0%   805µs ± 0%  -11.54%  (p=0.008 n=5+5)
    quote[\u03b1]        1.23ms ± 0%  1.21ms ± 0%   -1.24%  (p=0.008 n=5+5)
    quote[\u65e5]         800µs ± 0%   785µs ± 0%   -1.86%  (p=0.016 n=4+5)
    quote[\U0001f64f]    1.06ms ± 1%  1.04ms ± 0%   -1.92%  (p=0.008 n=5+5)
    stdquote             1.17µs ± 0%  1.18µs ± 0%   +0.80%  (p=0.008 n=5+5)
    unquote[a]           1.33ms ± 1%  1.26ms ± 0%   -5.13%  (p=0.008 n=5+5)
    unquote[\u03b1]       952µs ± 2%   911µs ± 1%   -4.25%  (p=0.008 n=5+5)
    unquote[\u65e5]       613µs ± 2%   592µs ± 0%   -3.48%  (p=0.008 n=5+5)
    unquote[\U0001f64f]  3.62ms ± 1%  3.46ms ± 0%   -4.32%  (p=0.008 n=5+5)
    stdunquote            788ns ± 0%   812ns ± 1%   +3.07%  (p=0.016 n=4+5)
-
Kirill Smelkov authored
We will soon need to use error rune codepoint from both golang_str.pyx and strconv.pyx - so we need to move that definition into shared place. What fits best is unicode/utf8, so start that package and move the constant there.
-
Kirill Smelkov authored
We added byte and rune types in the previous patch. Let's use them now throughout whole codebase where appropriate. Currently the only place where unicode-codepoint is used is _utf8_decode_rune. uint8_t was used in many places.
-
Kirill Smelkov authored
Those types are the base when working with byte- and unicode strings. It will be clearer to use them explicitly instead of uint8_t and int32_t when processing string.
-
Kirill Smelkov authored
These functions are currently relatively slow. They were initially used in zodbdump and zodbrestore, where their speed did not matter much, but with bstr and ustr, since e.g. quote is used in repr, not having them perform with speed similar to builtin string escaping starts to be an issue. Tatuya Kamada reports at nexedi/pygolang!21 (comment 170833) :

### 3. `u` seems slow with large arrays especially when `repr` it

I have faced a slowness while testing `u`, `b` with python 2.7, especially with `repr`.

```python
>>> timeit.timeit("from golang import b,u; u('あ'*199998)", number=10)
2.02020001411438
>>> timeit.timeit("from golang import b,u; repr(u('あ'*199998))", number=10)
54.60263395309448
```

`bytes`(str) is very fast.

```python
>>> timeit.timeit("from golang import b,u; bytes('あ'*199998)", number=10)
0.000392913818359375
>>> timeit.timeit("from golang import b,u; repr(bytes('あ'*199998))", number=10)
0.4604980945587158
```

`b` is much faster than `u`, but still the repr seems slow.

```
>>> timeit.timeit("from golang import b,u; b('あ'*199998)", number=10)
0.0009968280792236328
>>> timeit.timeit("from golang import b,u; repr(b('あ'*199998))", number=10)
25.498882055282593
```

The "repr" part of this problem is due to the fact that both bstr.__repr__ and ustr.__repr__ use custom quoting routines which currently are implemented in pure python in strconv module:

https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L282-291
https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L582-591
https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/_golang_str.pyx#L941-970
https://lab.nexedi.com/kirr/pygolang/blob/300d7dfa/golang/strconv.py#L31-92

The fix would be to move strconv.py to Cython and to correspondingly rework it to avoid using python-level constructs during quoting internally.
Working on that was not a priority, but soon I will need to move strconv to Cython for another reason: to be able to break import cycle in between _golang and strconv. So it makes sense to add strconv benchmark first - since we'll start moving it to Cython anyway - to see where we are and how further changes will help performance-wise.

Currently we are at

    name                 time/op
    quote[a]              910µs ± 0%
    quote[\u03b1]        1.23ms ± 0%
    quote[\u65e5]         800µs ± 0%
    quote[\U0001f64f]    1.06ms ± 1%
    stdquote             1.17µs ± 0%
    unquote[a]           1.33ms ± 1%
    unquote[\u03b1]       952µs ± 2%
    unquote[\u65e5]       613µs ± 2%
    unquote[\U0001f64f]  3.62ms ± 1%
    stdunquote            788ns ± 0%

i.e. on py2 quoting is ~ 1000x slower than builtin string escaping, and unquoting is even slower.

On py3 the situation is better, but still not good:

    name                 time/op
    quote[a]              579µs ± 1%
    quote[\u03b1]         942µs ± 1%
    quote[\u65e5]         595µs ± 0%
    quote[\U0001f64f]     274µs ± 1%
    stdquote             2.70µs ± 0%
    unquote[a]            696µs ± 1%
    unquote[\u03b1]       763µs ± 0%
    unquote[\u65e5]       474µs ± 1%
    unquote[\U0001f64f]   187µs ± 0%
    stdunquote            808ns ± 0%

δ(py2, py3) for the reference:

    name                 py2 time/op  py3 time/op  delta
    quote[a]              910µs ± 0%   579µs ± 1%   -36.42%  (p=0.008 n=5+5)
    quote[\u03b1]        1.23ms ± 0%  0.94ms ± 1%   -23.17%  (p=0.008 n=5+5)
    quote[\u65e5]         800µs ± 0%   595µs ± 0%   -25.63%  (p=0.016 n=4+5)
    quote[\U0001f64f]    1.06ms ± 1%  0.27ms ± 1%   -74.23%  (p=0.008 n=5+5)
    stdquote             1.17µs ± 0%  2.70µs ± 0%  +129.71%  (p=0.008 n=5+5)
    unquote[a]           1.33ms ± 1%  0.70ms ± 1%   -47.71%  (p=0.008 n=5+5)
    unquote[\u03b1]       952µs ± 2%   763µs ± 0%   -19.82%  (p=0.008 n=5+5)
    unquote[\u65e5]       613µs ± 2%   474µs ± 1%   -22.76%  (p=0.008 n=5+5)
    unquote[\U0001f64f]  3.62ms ± 1%  0.19ms ± 0%   -94.84%  (p=0.016 n=5+4)
    stdunquote            788ns ± 0%   808ns ± 0%    +2.59%  (p=0.016 n=4+5)
-
Kirill Smelkov authored
And let pybstr/pyustr point to the version of bstr/ustr types that is actually in use:

  - when bytes/unicode are not patched -> to _pybstr/_pyustr
  - when bytes/unicode will be patched -> to bytes/unicode, to where the original _pybstr/_pyustr were copied during bytes/unicode patching.

At runtime the code uses pybstr/pyustr instead of _pybstr/_pyustr.
-
Kirill Smelkov authored
GPython will patch builtin bytes and unicode types. zbytes and zunicode will refer to original unpatched types. We will use them to invoke original bytes/unicode methods. NOTE we will test against bytes/unicode - not zbytes/zunicode - when inspecting type of objects. In other words we will use original bytes/unicode types only to refer to their original methods and code.
-
Kirill Smelkov authored
For gpython to switch builtin str/unicode to bstr/ustr we will need bstr/ustr to have exactly the same C layout as builtin string types. This is possible to achieve only via `cdef class`. It is also good to switch to `cdef class` for RAM savings - from https://github.com/cython/cython/pull/5212#issuecomment-1387659026 : # what Cython does at runtime for `class MyBytes(bytes)` In [3]: MyBytes = type('MyBytes', (bytes,), {'__slots__': ()}) In [4]: MyBytes Out[4]: __main__.MyBytes In [5]: a = bytes(b'123') In [6]: b = MyBytes(b'123') In [7]: a Out[7]: b'123' In [8]: b Out[8]: b'123' In [9]: a == b Out[9]: True In [10]: import sys In [11]: sys.getsizeof(a) Out[11]: 36 In [12]: sys.getsizeof(b) Out[12]: 52 So with `cdef class` we gain more control and optimize memory usage. This was not done before because cython forbids to `cdef class X(bytes)` due to https://github.com/cython/cython/issues/711. We work it around in setup.py with draft for proper patch pre-posted to upstream in https://github.com/cython/cython/pull/5212 .
-
Kirill Smelkov authored
Previously test_strings_methods was testing a method by comparing bstr and ustr results of .method() with the corresponding result of unicode.method(). This works reasonably ok. However under gpython, when unicode is replaced with ustr, the test will no longer compare results of bstr/ustr methods with something good and external - indeed in that case bstr/ustr .method() will be compared to the result of ustr.method(), which opens the door for bugs to stay unnoticed. -> Adjust the test to explicitly provide the expected result for all entries in the test vector. We make sure those results are good and match std python because we also assert that unicode.method() matches them.
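For illustration, the shape of such a test vector could be sketched as below. This is only a sketch under assumptions: `MyStr`, `TESTV` and `check_methods` are hypothetical names and not the actual pygolang test code. Every entry states its expected result explicitly, and the standard type is asserted against the same value:

```python
class MyStr(str):
    """Hypothetical stand-in for a string type that replaces the builtin one."""
    pass

# each entry: (method name, receiver, args, explicitly stated expected result)
TESTV = [
    ("upper",      u"мир",    (),           u"МИР"),
    ("replace",    u"a-b",    (u"-", u"+"), u"a+b"),
    ("startswith", u"hello",  (u"he",),     True),
    ("split",      u"a b  c", (),           [u"a", u"b", u"c"]),
]

def check_methods(candidate):
    """Check candidate string type against TESTV; return number of entries checked."""
    for meth, s, args, expected in TESTV:
        # the stated result must match std python ...
        assert getattr(str, meth)(s, *args) == expected
        # ... and the candidate type must match the stated result, not merely
        # whatever another string type happens to return
        assert getattr(candidate(s), meth)(*args) == expected
    return len(TESTV)
```

With this layout the expected values stay valid even if the type used as reference is itself replaced at runtime.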
-
Kirill Smelkov authored
On py2 str.decode('string-escape') returns str, not unicode and this property is actually being used and relied upon by Lib/pickle.py: https://github.com/python/cpython/blob/v2.7.18-0-g8d21aa21f2c/Lib/pickle.py#L967-L977 We promised bstr to be drop-in replacement for str on py2, so let's adjust its behaviour to match the original because if we do not, unpickling strings will break when str is replaced by bstr under gpython. Do not add bstr.encode yet until we hit a real case where it is actually used.
-
Kirill Smelkov authored
repr(ustr|bstr) will change behaviour depending on whether we are running under regular python, or gpython with string types replaced by bstr/ustr. But this test is completely orthogonal to that. -> Let's untie it from particular repr behaviour by emitting verified items in quoted form + asserting their types in the code.
-
Kirill Smelkov authored
In ebd18f3f the code was ok but there is a thinko in test: it needs to test all pickle protocols from 0 to _including_ HIGHEST_PROTOCOL.
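A minimal sketch of the corrected iteration, assuming only the stdlib pickle module (`all_protocols` is a hypothetical helper name): since range excludes its upper bound, HIGHEST_PROTOCOL + 1 is needed to cover the highest protocol as well.

```python
import pickle

def all_protocols():
    # range() excludes its upper bound, hence the +1 to include HIGHEST_PROTOCOL
    return range(0, pickle.HIGHEST_PROTOCOL + 1)

# round-trip a value through every protocol, including the highest one
for proto in all_protocols():
    data = pickle.dumps(u"мир", proto)
    assert pickle.loads(data) == u"мир"
```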
-
Kirill Smelkov authored
-
- 04 Dec, 2024 3 commits
-
-
Kirill Smelkov authored
Since the beginning of pygolang it is possible to define methods separate from class. For example

    @func(MyClass)
    def my_method(self, ...):
        ...

will define MyClass.my_method(*). This works for regular functions and staticmethod/classmethod as well. But support for properties was missing because there was no use case so far.

-> Add support for properties as well, as I hit the need for it during my work on wendelin.core monitoring.

Test class changed to inherit from object since on py2 properties work only for new-style classes.

(*) see afa46cf5 (Turn pygopath into full pygolang) and 942ee900 (golang: Deprecate @method(cls) in favour of @func(cls)) for details.

/reviewed-by @levin.zimmermann
/reviewed-on nexedi/pygolang!31
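To illustrate the idea, here is a self-contained sketch. `attach` is a hypothetical stand-in for `@func(cls)`, not the pygolang implementation; it only shows how the attribute name can be recovered for plain functions, staticmethod/classmethod and properties. The stacking order of the decorators is an assumption as well:

```python
def attach(cls):
    """Hypothetical stand-in for @func(cls): install f on cls under f's own name."""
    def deco(f):
        if isinstance(f, property):
            name = f.fget.__name__          # property keeps its getter in .fget
        elif isinstance(f, (staticmethod, classmethod)):
            name = f.__func__.__name__      # wrapped function lives in .__func__
        else:
            name = f.__name__
        setattr(cls, name, f)
        return f
    return deco


class MyClass(object):
    def __init__(self, v):
        self._v = v

@attach(MyClass)
def double(self):
    return 2 * self._v

@attach(MyClass)
@property
def value(self):
    return self._v

@attach(MyClass)
@staticmethod
def describe():
    return "MyClass helper"
```

After this `MyClass(21).double()` gives 42, `MyClass(21).value` gives 21 and `MyClass.describe()` works without an instance.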
-
Kirill Smelkov authored
i.e. make a double call func(func(f)) return exactly the same as func(f). This is correct to do because the first func call already returns a wrapper that sets up an additional frame for defer. The second func call, if doing the same, would wrap the thing one more time and there would be two frames for defer, but defer needs only one to work correctly.

So far we had no case where such double func calls would appear in practice, because

    @func
    @func
    def f(): ...

would immediately catch attention. However in the next patch this case will appear internally when handling properties. So it is better to make sure beforehand that no resources are wasted.

/reviewed-by @levin.zimmermann
/reviewed-on !31
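One way to get such idempotence can be sketched as follows. This is an assumption-laden sketch, not the pygolang code: `func_sketch`, `_worker` and the `_is_func_wrapper` marker are hypothetical names, and the real detection mechanism may differ.

```python
import functools

def _worker(f, *args, **kw):
    # placeholder for the real work of setting up a frame for defer
    return f(*args, **kw)

def func_sketch(f):
    """Wrap f once; wrapping an already wrapped function returns it unchanged."""
    if getattr(f, "_is_func_wrapper", False):
        return f                        # already wrapped -> no second frame
    @functools.wraps(f)
    def wrapper(*args, **kw):
        return _worker(f, *args, **kw)
    wrapper._is_func_wrapper = True     # hypothetical marker used for detection
    return wrapper


def plain(x):
    return x + 1

once  = func_sketch(plain)
twice = func_sketch(once)
```

Here `twice is once` holds, so a call goes through exactly one extra frame no matter how many times the decorator was applied.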
-
Kirill Smelkov authored
Since 5146eb0b (Add support for defer & recover) we have func, which for

    @func
    def f(): ...

will turn f to be run with an additional frame where defer can register calls. This works ok, but so far the worker of the wrapper was defined inside func itself - anew each time func was used - and the worker also had the non-descriptive name _. The latter was making tracebacks a bit harder to read.

-> Move the wrapper to be a standalone function named _goframe. This removes a bit of import-time overhead when @func is called, and makes tracebacks a bit more readable.

But my original motivation here is to be able to detect double func(func(·)) calls and make func idempotent - see next patch for that.

/reviewed-by @levin.zimmermann
/reviewed-on !31
-
- 25 Sep, 2024 3 commits
-
-
Kirill Smelkov authored
Tracing import statements might be handy while debugging things related to initialization. The implementation is a simple reexecution of the underlying python with that same -v, like we already do for -O, -E and -X.

/reviewed-by @jerome
/reviewed-on !30
-
Kirill Smelkov authored
We already handle -X gpython.* starting from a6b993c8 (gpython: Add way to run it with threads runtime). However any other non-gpython -X option was leading to failure - for example:

    (z-dev) kirr@deca:~/src/tools/go/pygolang$ gpython -X faulthandler
    unknown option: '-X'

(well, the error message was also not good)

However on py3 there are useful -X options that might be handy to use, for example `-X faulthandler` and `-X importtime`.

-> Add support to pymain to handle those via reexecuting the underlying interpreter like we already do for -O and -E.

/reviewed-by @jerome
/reviewed-on nexedi/pygolang!30
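The option handling can be sketched like this. `split_lowlevel_opts` and `reexec_argv` are hypothetical helpers, the `gpython.runtime=threads` value is used only as an illustration, and how gpython actually bootstraps itself in the reexecuted interpreter is left out:

```python
def split_lowlevel_opts(argv):
    """Split leading options into (gpython -X opts, opts for reexec, the rest)."""
    own_x, reexec, i = [], [], 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("-O", "-E", "-v"):
            reexec.append(arg)                  # low-level interpreter option
        elif arg == "-X" and i + 1 < len(argv):
            i += 1
            opt = argv[i]
            if opt.startswith("gpython."):
                own_x.append(opt)               # handled by gpython itself
            else:
                reexec += ["-X", opt]           # forwarded to underlying python
        else:
            break                               # first non-option ends parsing
        i += 1
    return own_x, reexec, argv[i:]

def reexec_argv(executable, reexec, rest):
    # argv that could be passed to os.execv(executable, ...) for the reexecution
    return [executable] + reexec + rest


own_x, reexec, rest = split_lowlevel_opts(
    ["-X", "faulthandler", "-X", "gpython.runtime=threads", "-E", "script.py", "-v"])
```

Options after the script name belong to the script and are left untouched.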
-
Kirill Smelkov authored
Let's teach gpython and pymain about -E (ignore $PYTHON* environment variables) because new buildout runs python -E inside. Xavier reports:

    Since slapos was upgraded zc.buildout 3.0.1+slapos004, tests for slapos.rebootstrap and slapos.recipe.template fail because buildout now installs in develop with pip install --editable instead of python setup.py develop and in the process pip runs python -E, e.g. https://erp5js.nexedi.net/#/test_result_module/20240912-837A12F7/10

For the implementation use the same approach of reexecuting the underlying interpreter with the given low-level option as we already did for -O in 8564dfdd (gpython: Implement -O).

/reported-and-tested-by @xavier_thompson
/reviewed-by @jerome
/reviewed-on !30
-
- 23 Sep, 2024 1 commit
-
-
Kirill Smelkov authored
In 74a9838c (golang: tests: Fix for Pytest ≥ 7.4) I fixed test_defer_excchain_dump_pytest for Pytest ≥ 7.4 but missed that pytest.version_tuple is not available for Pytest < 7.0(*), which started to lead to pygolang test failures on py3 under SlapOS because there we are still using pytest 4.6.11:

    _______________________ test_defer_excchain_dump_pytest ________________________

        def test_defer_excchain_dump_pytest():
            # pytest 7.4 also changed traceback output format
            # similarly to ipython we do not need to test it becase we activate
            # pytest-related patch only on py2 for which latest pytest version is 4.6.11 .
            import pytest
    >       if six.PY3 and pytest.version_tuple >= (7,4):
    E       AttributeError: module 'pytest' has no attribute 'version_tuple'

    https://stack.nexedi.com/test_result_module/20240920-666C5CF1/3

-> Fix that by checking pytest.version_tuple more carefully.

(*) see https://docs.pytest.org/en/stable/reference/reference.html#pytest-version-tuple

/reviewed-by @jerome
/reviewed-on nexedi/pygolang!29
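A more careful check could look like the sketch below. `pytest_at_least` is a hypothetical helper and not necessarily what the fix does; it is exercised here with stand-in objects so that it does not depend on which pytest is installed. Treating a missing version_tuple as "older" is only valid for minimums of (7, 0) and above:

```python
import types

def pytest_at_least(pytest_module, minimum):
    """Report whether pytest_module is at least `minimum`, e.g. (7, 4).

    pytest.version_tuple exists only on Pytest >= 7.0, so a missing attribute
    is treated as "older than minimum" (valid for minimum >= (7, 0)).
    """
    vt = getattr(pytest_module, "version_tuple", None)
    if vt is None:
        return False
    return tuple(vt[:len(minimum)]) >= tuple(minimum)


# stand-ins for real pytest modules of different versions
old_pytest = types.SimpleNamespace(__version__="4.6.11")
mid_pytest = types.SimpleNamespace(version_tuple=(7, 3, 1))
new_pytest = types.SimpleNamespace(version_tuple=(7, 4, 2))
```

In a test this would be used as `pytest_at_least(pytest, (7, 4))` after `import pytest`.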
-
- 20 Jun, 2024 2 commits
-
-
Kirill Smelkov authored
This is take 2 after 924a808c (golang: Fix `@func(cls) def name` not to override `name` in calling context). There we fixed it not to override name if name was already set, but for the case of unset name it was still set. The following example was thus not working correctly as builtin `next` was shadowed:

    class BitSync: ...

    @func(BitSync)
    def next(): ...      # this was shadowing access to builtin next

    def peek(seq):
        return next(...) # here next was taken not from builtin, but
                         # from result of above shadowing

To solve the problem in the patch from 2019 I initially contemplated patching bytecode because python unconditionally does STORE_NAME after a function is defined with decorator:

    In [2]: c = """
       ...: @fff
       ...: def ccc():
       ...:     return 1
       ...: """

    In [3]: cc = compile(c, "file", "exec")

    In [4]: dis(cc)
      2           0 LOAD_NAME                0 (fff)
                  3 LOAD_CONST               0 (<code object ccc at 0x7fafe58d0130, file "file", line 2>)
                  6 MAKE_FUNCTION            0
                  9 CALL_FUNCTION            1
                 12 STORE_NAME               1 (ccc)    <-- NOTE means: ccc = what fff() call returns
                 15 LOAD_CONST               1 (None)
                 18 RETURN_VALUE

However after hitting this problem for real again and taking a fresh look I found a way to arrange for the good end result without bytecode magic: if name is initially unset @func can install its own custom object, which, when overwritten by normal python codeflow of invoking STORE_NAME after decorator, unsets the attribute. That works quite ok and the patch with the fix is small.

/cc @jerome
/proposed-for-review-on nexedi/pygolang!28
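One possible shape of such an object is sketched below. Everything in it is an assumption made for illustration: `attach_to` and `_UnsetOnOverwrite` are hypothetical names, the trick relies on CPython reference counting (the placeholder is finalized at the moment STORE_NAME replaces it), and the actual pygolang fix may be arranged differently. The example code is run via exec with an explicit namespace so that the caller's namespace is a plain dict:

```python
import sys

class _UnsetOnOverwrite(object):
    """Placeholder that removes `name` from `namespace` once it is overwritten."""
    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
    def __del__(self):
        # runs when STORE_NAME replaces the placeholder and its refcount drops
        # to zero (CPython-specific); by then `name` holds the decorator result
        self.namespace.pop(self.name, None)

def attach_to(cls):
    """Hypothetical stand-in for @func(cls)."""
    def deco(f):
        setattr(cls, f.__name__, f)
        ns = sys._getframe(1).f_locals      # namespace of the caller
        if f.__name__ in ns:
            return ns[f.__name__]           # name already set -> keep it as is
        ns[f.__name__] = _UnsetOnOverwrite(ns, f.__name__)
        return None                         # STORE_NAME will store this None
    return deco

SRC = '''
class BitSync(object):
    pass

@attach_to(BitSync)
def next(self):
    return "BitSync.next"

def peek(seq):
    return next(iter(seq))      # must resolve to builtin next
'''

ns = {"attach_to": attach_to}
exec(SRC, ns)
```

After the exec, `next` is absent from `ns`, so `peek` reaches the builtin, while `BitSync.next` is still defined on the class.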
-
Kirill Smelkov authored
After the previous patch it became unused. Should we need it again, we can revert this commit or write it anew.
-