- 24 Sep, 2018 2 commits
-
-
Kirill Smelkov authored
Even though we tried to catch whether dict keys are ok to be used via reflect.TypeOf(key).Comparable() (see da5f0342 "decoder: Fix crashes found by fuzzer (#32)"), that turned out to be not enough. For example if key is a struct, e.g. of the following type type Ref struct { Pid interface{} } it will be comparable. But the comparision, depending on dynamic .Pid type, might panic. This is what was actually cauht by fuzz-testing recently: https://github.com/kisielk/og-rek/issues/50 (second part of the report) So instead of recursively walking a key type and checking each subfield with reflect.TypeOf().Comparable(), switch for using panic/recover for detecting the "unhashable key" situation. This slows down decoding a bit (only cumulative figure for all-test-vectors decoding): name old time/op new time/op delta DecodeLong-4 361ns ± 0% 362ns ± 0% ~ (p=0.238 n=5+4) Decode-4 93.2µs ± 0% 95.6µs ± 0% +2.54% (p=0.008 n=5+5) Encode-4 16.5µs ± 0% 16.6µs ± 0% ~ (p=0.841 n=5+5) but that is the price of correctness. And with manually recursively walking key type I doubt it would be faster. The defer overhead should be less once https://github.com/golang/go/issues/14939 is fixed. Updates: https://github.com/kisielk/og-rek/issues/30
-
Kirill Smelkov authored
In 06e06939 (encoder: Allow to specify pickle protocol version) I added ability to add to tests error cases - object inputs that on encoding should produce error. For decoding we should skip those cases as there pickle.data = "", and if not skipped it leads to --- FAIL: BenchmarkDecode ogorek_test.go:803: unexpected # of decode steps: got 100 ; want 102
-
- 21 Sep, 2018 12 commits
-
-
Kirill Smelkov authored
Updates coming via `go generate` since main tests table was amended.
-
Kirill Smelkov authored
Appologize for the breakage there.
-
Kirill Smelkov authored
When encoding tests fails, the "want" and "have" pickles are printed. It is handy to copy-paste those pickles into Python console and check them further there. Pickle printing currently uses %q. However in Go fmt's %q can use \u and \U if byte sequence form a valid UTF-8 character. That poses a problem: in Python str (py2) or bytes (py3) literal \uXXXX are not processed as unicode-escapes and enter the string as is. This result in different pickle data pasted into Python and further confusion. Entering data into Python as unicode literals (where \u works) and then adding .encode('utf-8') also does not generally work - as pickle data is generally arbitrary it can be a not valid UTF-8, for example: "\x80\u043c\u0438\u0440" (= "\x80мир" = "\x80\xd0\xbc\xd0\xb8\xd1\x80") end unicode-encoding them in python also gives different data: In [1]: u"\x80\u043c\u0438\u0440".encode('utf-8') Out[1]: '\xc2\x80\xd0\xbc\xd0\xb8\xd1\x80' (note leading extra \xc2) For this reason let's implement quoting - that Python can understand - ourselves. This dumping functionality was very handy during recent encoder fixes debugging.
-
Kirill Smelkov authored
- we can use STACK_GLOBAL only if protocol >= 4. - for earlier protocols we have to use text-based GLOBAL.
-
Kirill Smelkov authored
Similarly to dict, for struct encoding switch from protocol 1 opcodes into always using protocol 0 opcodes, which is by the way 1 byte shorter. For the reference - for structs, unlike maps, the order of emitted keys is well-defined - it is the order of fields as they are defined in the struct. This way we can precisely test encoder output on structs with more than 1 field.
-
Kirill Smelkov authored
- we can use EMPTY_DICT only if protocol >= 1 Also: similarly to list (33d1926f), since we are now using EMPTY_DICT only optionally, it is logical to swit to MARK + ... + DICT from EMPTY_DICT (or MARK + DICT @proto=0) + MARK + ... + SETITEMS which is at least 1 byte longer. For the reference - SETITEMS is also from protocol 1, while DICT is from protocol 0.
-
Kirill Smelkov authored
More corpus files appeared while running fuzz testing today for ~ 1 hour.
-
Kirill Smelkov authored
Should be better in 302c79ea (fuzz: Hook encoder into the loop), but it is hopefully never too late.
-
Kirill Smelkov authored
Found via fuzzing: "I-7\n." panic: protocol 1: decode·encode != identity: have: 4294967289 want: -7 goroutine 1 [running]: github.com/kisielk/og-rek.Fuzz(0x7f99bd8b4000, 0x5, 0x200000, 0x3) /tmp/go-fuzz-build914098789/gopath/src/github.com/kisielk/og-rek/fuzz.go:50 +0x604 go-fuzz-dep.Main(0x524df8) /tmp/go-fuzz-build914098789/goroot/src/go-fuzz-dep/main.go:49 +0xad main.main() /tmp/go-fuzz-build914098789/gopath/src/github.com/kisielk/og-rek/go.fuzz.main/main.go:10 +0x2d exit status 2 I've checked other handlers, like BININT1 and BININT2, and since there everywhere argument is unsigned, there is no similar problem. We needed previous patch on proper readLine EOF detection, because else the testcase for P0("I-7\n.") would be breaking: --- FAIL: TestDecode/int(-7)/"I-7\n." (0.00s) ogorek_test.go:401: no ErrUnexpectedEOF on [:2] truncated stream: v = <nil> err = &strconv.NumError{Func:"ParseInt", Num:"-", Err:(*errors.errorString)(0xc00000e1b0)}
-
Kirill Smelkov authored
Currently we use bufio.Reader.ReadLine which accepts either \n or \r\n as line ending. That is however not correct: - we should not accept e.g. "S'abc'\r\n." pickle, because it is invalid: In [32]: pickle.loads(b"S'abc'\r\n.") --------------------------------------------------------------------------- UnpicklingError Traceback (most recent call last) <ipython-input-32-b1da1988bae1> in <module>() ----> 1 pickle.loads(b"S'abc'\r\n.") UnpicklingError: the STRING opcode argument must be quoted - we should not accept e.g. "L123L\r\n.", because it is also invalid: In [33]: pickle.loads(b"L123L\r\n.") --------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-33-7231ec07f5c4> in <module>() ----> 1 pickle.loads(b"L123L\r\n.") ValueError: invalid literal for int() with base 10: '123L\r\n' - treating \r as part of EOL in e.g. UNICODE pickle would just drop encoded information: # python In [34]: pickle.loads(b"Vabc\r\n.") Out[34]: 'abc\r' while ogórek currently decodes it as just 'abc' (no trailing \r). For this reason let's fix Decoder.readLine to treat only \n as EOL. Besides this fix, we now get another property: previously, when internally using bufio.Reader.ReadLine we were not able to distinguish two situations: - a line was abruptly ended without any EOL characters at all, - a line was properly ended with EOL character. Now after we switched to internally using bufio.Reader.ReadSlice, we will be able to properly detect EOF and return that as error. This property will be needed in the following patch.
-
Kirill Smelkov authored
This way whatever/whenever we add a tricky test pickle into main tests table, it should be automatically also be present as a starting point in the fuzz corpus. This should hopefully improve fuzzing coverage.
-
Kirill Smelkov authored
Continuing 5dbc8a1b (decoder: Don't allow mark to be returned as pickle result) I discovered that the mark object can be still exposed to user, but not directly. For example the following pickle: "(\x85." // MARK + TUPLE1 was creating Tuple{mark} and returning it just ok to the user. As marker must be used only internally it is invalid to do so. Python also forbids this: In [3]: s = b"(\x85." In [4]: dis(s) 0: ( MARK 1: \x85 TUPLE1 2: . STOP highest protocol among opcodes = 2 In [5]: pickle.loads(s) --------------------------------------------------------------------------- UnpicklingError Traceback (most recent call last) <ipython-input-5-764e4625bc41> in <module>() ----> 1 pickle.loads(s) UnpicklingError: unexpected MARK found So let's close all (hopefully) holes where mark object could be returned to user in a similar way.
-
- 20 Sep, 2018 5 commits
-
-
Kirill Smelkov authored
New test vectors caught by decode·encode idempotency fuzzing.
-
Kirill Smelkov authored
A pickle is considered as invalid if it tries to return MARK as the result by both Python2 and Python3, e.g.: In [2]: pickle.loads(b"(.") --------------------------------------------------------------------------- UnpicklingError Traceback (most recent call last) <ipython-input-2-0c142c82b126> in <module>() ----> 1 pickle.loads(b"(.") UnpicklingError: unexpected MARK found However until now, despite mark is unexported ogórek type, we were allowing for it to be returned just ok. The problem was caught by decode/encode roundtrip fuzz tests, e.g. "(Q." panic: protocol 1: decode·encode != identity: have: ogórek.Ref{Pid:map[interface {}]interface {}{}} want: ogórek.Ref{Pid:ogórek.mark{}} goroutine 1 [running]: github.com/kisielk/og-rek.Fuzz(0x7fbe6c15f000, 0x3, 0x200000, 0x3) /tmp/go-fuzz-build697921479/gopath/src/github.com/kisielk/og-rek/fuzz.go:87 +0x604 go-fuzz-dep.Main(0x524d78) /tmp/go-fuzz-build697921479/goroot/src/go-fuzz-dep/main.go:49 +0xad main.main() /tmp/go-fuzz-build697921479/gopath/src/github.com/kisielk/og-rek/go.fuzz.main/main.go:10 +0x2d exit status 2
-
Kirill Smelkov authored
We can enhance our fuzz-testing coverage by hooking Encoder also into the loop: if input data is suceessfully decoded, we have an object that can be passed back to Encoder to generate a pickle. We can't tell that that pickle must be the same as original input data, since pickle machine allows multiple representations of the same data. However we can assert that when that pickle is decoded back it should be the same as encoded object. This catches several problems: - marker is currently returned as pickle result (see next patch). - text-based STRING and UNICODE are not properly decoded (no fix yet). - self-referencing data structures kill Encoder (no fix yet).
-
Kirill Smelkov authored
Add more files go-fuzz put to corpus while discovering the crash fixed in previous commit.
-
Kirill Smelkov authored
BINSTRING opcode follows 4-byte length of the data and the data itself. Upon seeing the BINSTRING len header we were preallocating destination buffer with len as optimization (see 14aaa14f "decoder: Preallocate .buf capacity when we know (approximate) size of output to it"). However this way it is easy for a malicious pickle to specify BINSTRING biglength (4GB) and no data, and then the goroutine that processes such input will allocate 4GB immediately, which in turn might cause out-of-memory DOS. The other places where we currently grow Decoder.buf all either have 1 or 2 byte length (thus limited to 64K), or the length of the data that was already read from input stream. The problem was found by rerunning fuzz tests: "T0000" fatal error: runtime: out of memory runtime stack: runtime.throw(0x514fff, 0x16) /tmp/go-fuzz-build515164548/goroot/src/runtime/panic.go:608 +0x72 runtime.sysMap(0xc004000000, 0x34000000, 0x5f83d8) /tmp/go-fuzz-build515164548/goroot/src/runtime/mem_linux.go:156 +0xc7 runtime.(*mheap).sysAlloc(0x5dfd40, 0x34000000, 0x43c0e3, 0x7ffc94fe56b8) /tmp/go-fuzz-build515164548/goroot/src/runtime/malloc.go:619 +0x1c7 runtime.(*mheap).grow(0x5dfd40, 0x18182, 0x0) /tmp/go-fuzz-build515164548/goroot/src/runtime/mheap.go:920 +0x42 runtime.(*mheap).allocSpanLocked(0x5dfd40, 0x18182, 0x5f83e8, 0x203000) /tmp/go-fuzz-build515164548/goroot/src/runtime/mheap.go:848 +0x337 runtime.(*mheap).alloc_m(0x5dfd40, 0x18182, 0x7ffc94fe0101, 0x40a87f) /tmp/go-fuzz-build515164548/goroot/src/runtime/mheap.go:692 +0x119 runtime.(*mheap).alloc.func1() /tmp/go-fuzz-build515164548/goroot/src/runtime/mheap.go:759 +0x4c runtime.(*mheap).alloc(0x5dfd40, 0x18182, 0x7ffc94010101, 0x4135f5) /tmp/go-fuzz-build515164548/goroot/src/runtime/mheap.go:758 +0x8a runtime.largeAlloc(0x30303030, 0x440101, 0xc00005e240) /tmp/go-fuzz-build515164548/goroot/src/runtime/malloc.go:1019 +0x97 runtime.mallocgc.func1() /tmp/go-fuzz-build515164548/goroot/src/runtime/malloc.go:914 +0x46 runtime.systemstack(0x44eea9) /tmp/go-fuzz-build515164548/goroot/src/runtime/asm_amd64.s:351 +0x66 runtime.mstart() /tmp/go-fuzz-build515164548/goroot/src/runtime/proc.go:1229 goroutine 1 [running]: runtime.systemstack_switch() /tmp/go-fuzz-build515164548/goroot/src/runtime/asm_amd64.s:311 fp=0xc000034310 sp=0xc000034308 pc=0x44efa0 runtime.mallocgc(0x30303030, 0x4edc00, 0x1, 0xc0000343e8) /tmp/go-fuzz-build515164548/goroot/src/runtime/malloc.go:913 +0x896 fp=0xc0000343b0 sp=0xc000034310 pc=0x40ae26 runtime.makeslice(0x4edc00, 0x30303030, 0x30303030, 0xc000018108, 0x4, 0x4) /tmp/go-fuzz-build515164548/goroot/src/runtime/slice.go:70 +0x77 fp=0xc0000343e0 sp=0xc0000343b0 pc=0x43b1d7 bytes.makeSlice(0x30303030, 0x0, 0x0, 0x0) /tmp/go-fuzz-build515164548/goroot/src/bytes/buffer.go:231 +0x9d fp=0xc000034420 sp=0xc0000343e0 pc=0x4b204d bytes.(*Buffer).grow(0xc000084030, 0x30303030, 0x0) /tmp/go-fuzz-build515164548/goroot/src/bytes/buffer.go:144 +0x2e4 fp=0xc000034470 sp=0xc000034420 pc=0x4b1604 bytes.(*Buffer).Grow(0xc000084030, 0x30303030) /tmp/go-fuzz-build515164548/goroot/src/bytes/buffer.go:163 +0x86 fp=0xc000034498 sp=0xc000034470 pc=0x4b18b6 github.com/kisielk/og-rek.(*Decoder).loadBinString(0xc000084000, 0x203054, 0x0) /tmp/go-fuzz-build515164548/gopath/src/github.com/kisielk/og-rek/ogorek.go:646 +0x143 fp=0xc000034520 sp=0xc000034498 pc=0x4d5103 github.com/kisielk/og-rek.(*Decoder).Decode(0xc000084000, 0xc000080000, 0xc000084000, 0xf7eb9fa, 0x586dc) /tmp/go-fuzz-build515164548/gopath/src/github.com/kisielk/og-rek/ogorek.go:227 +0xf94 fp=0xc0000346d8 sp=0xc000034520 pc=0x4d09b4 github.com/kisielk/og-rek.Fuzz(0x7fbccd400000, 0x5, 0x200000, 0xc000034748) /tmp/go-fuzz-build515164548/gopath/src/github.com/kisielk/og-rek/fuzz.go:12 +0xb0 fp=0xc000034710 sp=0xc0000346d8 pc=0x4cf640 go-fuzz-dep.Main(0x519cf0) /tmp/go-fuzz-build515164548/goroot/src/go-fuzz-dep/main.go:49 +0xad fp=0xc000034780 sp=0xc000034710 pc=0x4642fd main.main() /tmp/go-fuzz-build515164548/gopath/src/github.com/kisielk/og-rek/go.fuzz.main/main.go:10 +0x2d fp=0xc000034798 sp=0xc000034780 pc=0x4db1ad runtime.main() /tmp/go-fuzz-build515164548/goroot/src/runtime/proc.go:201 +0x207 fp=0xc0000347e0 sp=0xc000034798 pc=0x428ec7 runtime.goexit() /tmp/go-fuzz-build515164548/goroot/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc0000347e8 sp=0xc0000347e0 pc=0x450f01 exit status 2
-
- 19 Sep, 2018 16 commits
-
-
Kirill Smelkov authored
- we can use BINSTRING* only if protocol >= 1; - at protocol 0 we thus have to use text STRING; - if protocol >= 3 we have to emit the string as unicode pickle object the same way as Python3 does. If we don't do - Python3 won't be generally able to load our pickle: In [1]: s = b'U\x06\xd0\xbc\xd0\xb8\xd1\x80q\x00.' In [2]: from pickletools import dis In [3]: dis(s) 0: U SHORT_BINSTRING 'миÑ\x80' 8: q BINPUT 0 10: . STOP highest protocol among opcodes = 1 In [4]: import pickle In [5]: pickle.loads(s) --------------------------------------------------------------------------- UnicodeDecodeError Traceback (most recent call last) <ipython-input-5-764e4625bc41> in <module>() ----> 1 pickle.loads(s) UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128) We already decode unicode pickle objects into string, this way decode(encode(string)) remains always idempotent.
-
Kirill Smelkov authored
- we can use BINPERSID only if protocol >= 1 - we can use text PERSID only if protocol < 4 and Ref string does not contain \n - else encoding have to fail. Tests for Ref encoding at protocols 3 & 4 will follow after string encoding is fixed.
-
Kirill Smelkov authored
- we can use EMPTY_LIST only if protocol >= 1 Also: since we are now using EMPTY_LIST only optionally, it is logical to switch to MARK + ... + LIST instead of EMPTY_LIST (or MARK + LIST @proto=0) + MARK + ... + APPENDS which is at least 1 byte longer. For the reference - APPENDS is also from protocol 1, while LIST is from protocol 0.
-
Kirill Smelkov authored
- we can use EMPTY_TUPLE only if protocol >= 1 - also: if protocol >= 2 we can now use TUPLE{1,2,3} opcodes.
-
Kirill Smelkov authored
Corresponding TODO in the code was added in 4fd6be93 (encoder: Adjust it so that decode(encode(v)) == v)
-
Kirill Smelkov authored
We can use BINFLOAT opcode only starting from protocol >= 1. At protocol 0 we must use ASCII FLOAT.
-
Kirill Smelkov authored
We can use BININT* opcodes only starting from protocol >= 1. At protocol 0 we must use ASCII INT.
-
Kirill Smelkov authored
Starting from protocol 2 1-byte NEWTRUE/NEWFALSE opcodes are more efficient compared to 5-bytes e.g. "I01\n.". It is not only about efficiency, as protocol 4 _forbids_ use of variable length ASCII-only opcodes - whose data length is determined by doing forward scan for '\n'. Without encodeBool changes and only with the tests added it would fail this way: --- FAIL: TestEncode/True/proto=2 (0.00s) ogorek_test.go:383: encode: have: "\x80\x02I01\n." want: "\x80\x02\x88." --- FAIL: TestEncode/True/proto=3 (0.00s) ogorek_test.go:383: encode: have: "\x80\x03I01\n." want: "\x80\x03\x88." --- FAIL: TestEncode/True/proto=4 (0.00s) ogorek_test.go:383: encode: have: "\x80\x04I01\n." want: "\x80\x04\x88." --- FAIL: TestEncode/False/proto=2 (0.00s) ogorek_test.go:383: encode: have: "\x80\x02I00\n." want: "\x80\x02\x89." --- FAIL: TestEncode/False/proto=3 (0.00s) ogorek_test.go:383: encode: have: "\x80\x03I00\n." want: "\x80\x03\x89." --- FAIL: TestEncode/False/proto=4 (0.00s) ogorek_test.go:383: encode: have: "\x80\x04I00\n." want: "\x80\x04\x89."
-
Kirill Smelkov authored
There are many pickle protocol versions - 0 to 4. Python2 for example understands only versions 0 - 2. However we currently unconditionally emit opcodes from higher versions, for example STACK_GLOBAL - from version 4 - when encoding a Class, which leads to inability to decode pickles generated by ogórek on Python2. Similarly protocol 0 states that only text opcodes should be used, however we currently unconditionally emit e.g. BININT (from protocol 1) when encoding integers. Changing to always using protocol 0 opcodes would be not good, since many opcodes for efficiently encoding either integers, booleans, unicode etc are available only in protocol versions 2 and 4. For this reason, similarly to Python[1], let's allow users to specify desired pickle protocol when creating Encoder with config. For backward compatibility and common sense the protocol version that plain NewEncoder selects is 2. This commit adds only above-described user interface and testing infrastructure for verifying what was the result of encoding an object at particular protocol version. For now only a few of pickle test vectors are right wrt what the encoder should be or currently generates. Thus in the next patches we'll be step-by-step fixing encoder on this topic. [1] https://docs.python.org/3/library/pickle.html#pickle.dump
-
Kirill Smelkov authored
For now all decoding all those pickles is tested to give the same object, as in e.g. "(I1\nI2\ntp0\n.", // MARK + TUPLE + INT and "I1\nI2\n\x86."), // TUPLE2 + INT But having a way to specify several pickles to a test case will become even more handy when later adding precise tests for Encoder - there we will need to assert that at such and such protocol encoding gives such and such pickles. And there are 5 protocol versions to test...
-
Kirill Smelkov authored
Previously there were two separate tables - for decode and encode tests. The table for encode tests was very small. TestDecode, which was operating on data from decode table, was also performing checks that decode(encode(object)) is idempotent - the work which is already too by TestEncode. However the coverage of input objects from decode table was not a strict superset of encode table objects. For the reasons above let's stop this divergence. Let's have a common table that define tests where for every test case there can be: - an "in" object, - a pickle, and - an "out" object where 1. pickle must decode to "out" object, and 2. encoding "in" object must give some pickle that decodes to "out" object. NOTE: In the usual case "in" object == "out" object and they can only differ if "in" object contains a Go struct. This will allow us to cover all existing decode and encode tests logic. However the coverage of each logic is now higher - for example Encoder tests are now run on every object from main table, not only for 3 cases like it was before.
-
Kirill Smelkov authored
Similarly to TestDecode. No integration with main tests table yet.
-
Kirill Smelkov authored
We are going to integrate encoding tests with the main tests data table in several steps. Only code movement here, no other change.
-
Kirill Smelkov authored
We are going to modify the main tests table and for every case (e.g. int(1) add pickles with various encoding that all represent the same object when decoded. The main TestDecode driver will iterate over all those input data and hand it over for particular testing to testDecode worker. Use t.Run for spawning each case - it is less typing (we do not need to manually print test.name on errors, etc), and it allows to select which subtests to run via e.g. `go test -run Decode/int`
-
Kirill Smelkov authored
Having that long data makes the table clumsy and harder to understand. By moving such data definition out of the table, we make it a bit easier to understand. In the case of "long line", the pickle input and the line itself were almost duplicating each other, so instead of having two long lines explicitly pasted, let's have the test input be defined as + {"too long line", "V" + longLine + "\n.", longLine}, In the case of Graphite messages, one long Graphite object was also duplicated in encode_test.go . Just a cleanup, no semantic change.
-
Kirill Smelkov authored
It will be handy to use testing.T.Run in the upcoming patches that add more decode/encode tests. However since testing.T.Run was was added in Go1.7 the code won't work with Go1.6 . Since Go1.6 was released in the beginning of 2016 - more than 2.5 years ago, and upstream Go policy is to support only current stable (currently Go1.11) and two previous releases (currently Go1.10 and Go1.9), as of today Go1.6 is both not supported and outdated. So let's drop explicit support for it in the name of progress. By above logic we could also drop support for 1.7 and 1.8, but since there is no pressing particular need to do so I'm not dropping them for now.
-
- 18 Sep, 2018 5 commits
-
-
Kirill Smelkov authored
Currently we often use constructs like _, err := e.w.Write([]byte{opNone}) return err however with added Encoder.emit helper it can become only return e.emit(opNone) which is more clearly readable. We can do similarly for formatted output: - _, err := fmt.Fprintf(e.w, "%c%dL\n", opLong, b) - return err + return e.emitf("%c%dL\n", opLong, b) Due to much boilerplate in one place (encodeInt) the return of fmt.Fprintf was not even checked. We change the code there with - _, err = e.w.Write([]byte{opInt}) - if err != nil { - return err - } - fmt.Fprintf(e.w, "%d\n", i) + return e.emitf("%c%d\n", opInt, i) which is now hopefully more visible for what is going on and is easier to catch potential mistake by eyes. A corresponding test for "int64 encoded as string" will be added later.
-
Kirill Smelkov authored
Protocol 3 adds opcodes to represent Python bytes. For Protocol 4 we were missing definitions of several of the opcodes, such as BINUNICODE8, EMPTY_SET etc. Add definitions for them all.
-
Kirill Smelkov authored
We already keep "Protocol 2" and "Protocol 4" opcodes separately, and for a good reason - it must be clear for a person just by looking at opcodes definition from which protocol version an opcode comes. However at present Protocol 0 and Protocol 1 opcodes come all intermixed with each other. In the upcoming patches we'll be teaching Encoder to take desired pickle protocol version into account (similarly to how pickle.dumps accepts desired protocol version in Python), and then the encoder will have to use different opcodes for different versions: for example if user asks to produce "protocol 0" pickle stream it will be invalid to use "protocol 1" opcodes - e.g. BININT, EMPTY_TUPLE, etc. So I went through https://github.com/python/cpython/blob/master/Lib/pickletools.py for each opcode searching its protocol version and separated the opcodes with proto=1 from protocol 0 opcodes.
-
Kirill Smelkov authored
It was probably a thinko in d00e99e7 (Add more opcodes so we can read Graphite's Carbon stream) which changed opcodes from string ("X") to character ('X') constants and marked the first opcode with byte type, probably with the idea that following opcodes will have the same type. Unfortunately it is not so as the following program demonstrates: package main const ( opAAA byte = 'a' opBBB = 'b' ) func main() { op := opBBB if true { op = opAAA // <-- line 11 } println(op) } --> ./bc.go:11:6: cannot use opAAA (type byte) as type rune in assignment Similarly if we try comparing opcodes, it also results in compile error: func main() { op := opBBB if op == opAAA { // <-- line 10 panic(0) } println(op) } --> ./bc.go:10:8: invalid operation: op == opAAA (mismatched types rune and byte) Since in the following patches it will be handy for encoder to e.g. set default opcode first, and then change it under some conditions to another opcode, exactly the same compile error(s) will pop up. So let's fix all opcode constants to have their type as byte to avoid such unexpected errors.
-
Kirill Smelkov authored
For example d.push(math.Float64frombits(u)) looks more clear compared to d.stack = append(d.stack, math.Float64frombits(u)) and we already use push in many other places.
-