Commit b28613c2 authored by Kirill Smelkov, committed by Kamil Kisiel

Add StrictUnicode mode

Up till now ogórek has worked relatively well for me, but experience gained
with ZODB/go and Wendelin.core highlighted two problems with strings:

1) loading, then re-saving string data is not generally identity, and
2) there is no way to distinguish binary data saved via py2 str from
   unicode strings saved via either py2 unicode or py3 str.

Let me explain those problems:

1) Loading, then re-saving string data is not generally identity
----------------------------------------------------------------

On decoding ogórek currently loads both byte-string (*STRING opcodes)
and unicode string (*UNICODE opcodes) into the same Go type string. And
on encoding that string type is encoded as *STRING for protocol <= 2 and
as *UNICODE for protocol >= 3. I implemented that encoding logic in 2018
in e7d96969 (encoder: Fix string wrt protocol version), where in
particular I wrote

    - if protocol >= 3 we have to emit the string as unicode pickle object
      the same way as Python3 does. If we don't do - Python3 won't be
      generally able to load our pickle ...

with the idea that protocol=3 always means that the pickle is intended
to be for Python3.

But there I missed that zodbpickle can use and generate pickles with
protocol=3 even on py2 and that ZODB/py2 actually uses this protocol=3
mode since https://github.com/zopefoundation/ZODB/commit/12ee41c4
authored in the same 2018.

So, there can be pickles saved under protocol=3 that do contain both
*STRING and *UNICODE opcodes, and when ogórek sees those it loads that
*STRING and *UNICODE data as just string, losing the information about
which variant each value was. And then on encoding the data is saved as
all UNICODE, or all STRING, thus breaking the decode/encode=identity
property.

This breakage is there even with plain pickle and protocol=2 - when both
*STRING and *UNICODE are in the database, ogórek loads them both as the
same Go type string, and then saving back under the same protocol=2 goes
as all STRING resulting in unicode objects becoming str on resave
without any intended change.
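
The loss described above can be reproduced with a small self-contained
model. This is only a sketch: `opKind`, `item`, `decodeDefault`,
`encodeDefault` and `roundTrip` are hypothetical names invented for the
illustration and are not part of ogórek; the only thing taken from the
text is the rule "one Go string on load, opcode family chosen from the
protocol on save".

```go
package main

import "fmt"

// opKind is a hypothetical stand-in for the two pickle opcode families.
type opKind int

const (
	opString  opKind = iota // *STRING opcodes (py2 str)
	opUnicode               // *UNICODE opcodes (py2 unicode / py3 str)
)

// item models one string value as it sits in a pickle stream.
type item struct {
	kind opKind
	data string
}

// decodeDefault models the default mode: both opcode families are loaded
// into the same Go type, so the opcode family is forgotten.
func decodeDefault(it item) string {
	return it.data
}

// encodeDefault models the default mode: the opcode family is chosen from
// the protocol alone, because a Go string does not carry its origin.
func encodeDefault(s string, protocol int) item {
	if protocol >= 3 {
		return item{opUnicode, s}
	}
	return item{opString, s}
}

// roundTrip decodes every item and re-encodes it under the given protocol.
func roundTrip(src []item, protocol int) []item {
	out := make([]item, 0, len(src))
	for _, it := range src {
		out = append(out, encodeDefault(decodeDefault(it), protocol))
	}
	return out
}

func main() {
	src := []item{{opString, "raw"}, {opUnicode, "text"}}
	for _, proto := range []int{2, 3} {
		fmt.Printf("protocol %d: %v -> %v\n", proto, src, roundTrip(src, proto))
	}
}
```

Whatever protocol is used for the resave, one of the two items comes
back with a different opcode family than it had on load.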

2) there is no way to distinguish binary data saved via py2 str from unicode strings saved via either py2 unicode or py3 str
----------------------------------------------------------------------------------------------------------------------------

Continuing the above example of a py2 database with both *STRING and
*UNICODE opcodes present, there is currently no way for the application
to distinguish those from each other. In other words, the application
currently cannot tell binary data coming from the py2 protocol <= 2 era
apart from unicode text.

The latter problem I hit for real: with Wendelin.core we have lots of
data saved from under Python2 as just str. And the Go part of
Wendelin.core, upon loading a block of data, wants to accept only binary -
either bytes from py3 or bytestring from py2, but not unicode, because
it indicates a mistake if e.g. a ZBlk object would come with unicode data

https://lab.nexedi.com/nexedi/wendelin.core/-/blob/07087ec8/bigfile/file_zodb.py#L267-300
https://lab.nexedi.com/nexedi/wendelin.core/-/blob/07087ec8/wcfs/internal/zdata/zblk.go#L31-32
https://lab.nexedi.com/nexedi/wendelin.core/-/blob/07087ec8/wcfs/internal/zdata/zblk.go#L98-107

but there is currently no way to distinguish whether it was unicode or
bytestring saved into the database because they both are represented as
the same Go string type after decoding.

----------------------------------------

So to solve those problems I thought it over and understood that the
issues start to appear because we let *STRING and *UNICODE become
mixed into the same entity on loading. This behaviour has been there
from ogórek's beginning and the intention, it seems, was to get the data
from the pickle stream in an easily-accepted form on the Go side. However
the convenience turned out to come at the cost of losing some correctness
in the general case, as explained above.

So if we are to fix the correctness we need to change that behaviour and
load *STRING and *UNICODE opcodes into distinct types, so that the
information about what was what is preserved and it becomes possible to
distinguish bytestrings from unicode strings and resave the data in
exactly the same form as loaded. Though we can do this only under an
opt-in option, with the default behaviour staying as it was before, to
preserve backward compatibility.
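
A minimal sketch of that idea, under stated assumptions: the
`ByteString` type below is a local stand-in for the type this commit
adds, while `opKind`, `item`, `decodeStrict` and `encodeStrict` are
hypothetical helpers made up for the illustration. It only shows why
keeping the opcode family in the Go type makes the round trip an
identity; it is not the decoder or encoder code itself.

```go
package main

import "fmt"

// ByteString is a local stand-in for the dedicated py2 str type.
type ByteString string

// opKind is a hypothetical stand-in for the two pickle opcode families.
type opKind int

const (
	opString  opKind = iota // *STRING opcodes (py2 str)
	opUnicode               // *UNICODE opcodes (py2 unicode / py3 str)
)

// item models one string value as it sits in a pickle stream.
type item struct {
	kind opKind
	data string
}

// decodeStrict keeps the opcode family in the Go type:
// *STRING -> ByteString, *UNICODE -> string.
func decodeStrict(it item) interface{} {
	if it.kind == opString {
		return ByteString(it.data)
	}
	return it.data
}

// encodeStrict picks the opcode family from the Go type of the value,
// independently of any protocol version.
func encodeStrict(v interface{}) (item, error) {
	switch x := v.(type) {
	case ByteString:
		return item{opString, string(x)}, nil
	case string:
		return item{opUnicode, x}, nil
	}
	return item{}, fmt.Errorf("unsupported type %T", v)
}

func main() {
	src := []item{{opString, "raw"}, {opUnicode, "text"}}
	for _, it := range src {
		out, err := encodeStrict(decodeStrict(it))
		fmt.Printf("%v -> %v (err=%v, identity=%v)\n", it, out, err, out == it)
	}
}
```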

-> Do it.

Below are excerpts from the doc.go, DecoderConfig and EncoderConfig
changes that describe the new system:

    For strings there are two modes. In the first, default, mode both py2/py3
    str and py2 unicode are decoded into string with py2 str being considered
    as UTF-8 encoded. Correspondingly for protocol ≤ 2 Go string is encoded as
    UTF-8 encoded py2 str, and for protocol ≥ 3 as py3 str / py2 unicode.
    ogórek.ByteString can be used to produce bytestring objects after encoding
    even for protocol ≥ 3. This mode tries to match Go string with str type of
    target Python depending on protocol version, but looses information after
    decoding/encoding cycle:

        py2/py3 str  ↔  string                       StrictUnicode=n mode, default
        py2 unicode  →  string
        py2 str      ←  ogórek.ByteString

    However with StrictUnicode=y mode there is 1-1 mapping in between py2
    unicode / py3 str vs Go string, and between py2 str vs ogórek.ByteString.
    In this mode decoding/encoding and encoding/decoding operations are always
    identity with respect to strings:

        py2 unicode / py3 str  ↔  string             StrictUnicode=y mode
        py2 str                ↔  ogórek.ByteString

    For bytes, unconditionally to string mode, there is direct 1-1 mapping in
    between Python and Go types:

        bytes        ↔  ogórek.Bytes   (~)
        bytearray    ↔  []byte

    --------

    type DecoderConfig struct {
        // StrictUnicode, when true, requests to decode to Go string only
        // Python unicode objects. Python2 bytestrings (py2 str type) are
        // decoded into ByteString in this mode...
        StrictUnicode bool
    }

    type EncoderConfig struct {
        // StrictUnicode, when true, requests to always encode Go string
        // objects as Python unicode independently of used pickle protocol...
        StrictUnicode bool
    }
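
How the two flags interact with the protocol version can be tabulated
with a small model. This is a sketch only: the structs below are local
copies holding just the `Protocol` and `StrictUnicode` fields, and
`decodeSTRING` and `opcodeFamily` are hypothetical helpers that restate
the rules from the excerpt above, not functions of the package.

```go
package main

import "fmt"

// ByteString is a local stand-in for ogórek.ByteString.
type ByteString string

// DecoderConfig and EncoderConfig are local models that carry only the
// fields discussed in the text.
type DecoderConfig struct {
	StrictUnicode bool
}

type EncoderConfig struct {
	Protocol      int
	StrictUnicode bool
}

// decodeSTRING models what a *STRING opcode payload becomes on the Go side.
func decodeSTRING(cfg DecoderConfig, payload string) interface{} {
	if cfg.StrictUnicode {
		return ByteString(payload)
	}
	return payload
}

// opcodeFamily models which opcode family a plain Go string is encoded with.
func opcodeFamily(cfg EncoderConfig) string {
	if cfg.StrictUnicode || cfg.Protocol >= 3 {
		return "UNICODE"
	}
	return "STRING"
}

func main() {
	for _, strict := range []bool{false, true} {
		v := decodeSTRING(DecoderConfig{StrictUnicode: strict}, "abc")
		fmt.Printf("StrictUnicode=%v: *STRING payload decodes to %T\n", strict, v)
		for proto := 0; proto <= 5; proto++ {
			cfg := EncoderConfig{Protocol: proto, StrictUnicode: strict}
			fmt.Printf("  protocol %d: Go string -> *%s\n", proto, opcodeFamily(cfg))
		}
	}
}
```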

Since strings are now split into two types, string and ByteString, and
ByteString can mean either text or binary data, new AsString and AsBytes
helpers are also added to handle string and binary data in a uniform way,
supporting both py2 and py3 databases. The corresponding excerpts from
the doc.go and typeconv.go changes describing those helpers follow:

    On Python3 strings are unicode strings and binary data is represented by
    bytes type. However on Python2 strings are bytestrings and could contain
    both text and binary data. In the default mode py2 strings, the same way as
    py2 unicode, are decoded into Go strings. However in StrictUnicode mode py2
    strings are decoded into ByteString - the type specially dedicated to
    represent them on Go side. There are two utilities to help programs handle
    all those bytes/string data in the pickle stream in uniform way:

        - the program should use AsString if it expects text   data -
          either unicode string, or byte string.
        - the program should use AsBytes  if it expects binary data -
          either bytes, or byte string.

    Using the helpers fits into Python3 strings/bytes model but also allows to
    handle the data generated from under Python2.

    --------

    // AsString tries to represent unpickled value as string.
    //
    // It succeeds only if the value is either string, or ByteString.
    // It does not succeed if the value is Bytes or any other type.
    //
    // ByteString is treated related to string because ByteString represents str
    // type from py2 which can contain both string and binary data.
    func AsString(x interface{}) (string, error) {

    // AsBytes tries to represent unpickled value as Bytes.
    //
    // It succeeds only if the value is either Bytes, or ByteString.
    // It does not succeed if the value is string or any other type.
    //
    // ByteString is treated related to Bytes because ByteString represents str
    // type from py2 which can contain both string and binary data.
    func AsBytes(x interface{}) (Bytes, error) {
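
The acceptance rules documented above can be modelled as follows. This
is a sketch under assumptions, not the typeconv.go implementation:
`Bytes` and `ByteString` are local stand-ins for the package types, and
the lower-case `asString` / `asBytes` are hypothetical re-statements of
the documented behaviour, including made-up error messages.

```go
package main

import "fmt"

// Local stand-ins for the package types named in the text.
type Bytes string
type ByteString string

// asString models the documented rule: accept string or ByteString only.
func asString(x interface{}) (string, error) {
	switch v := x.(type) {
	case string:
		return v, nil
	case ByteString:
		return string(v), nil
	}
	return "", fmt.Errorf("expect string|ByteString; got %T", x)
}

// asBytes models the documented rule: accept Bytes or ByteString only.
func asBytes(x interface{}) (Bytes, error) {
	switch v := x.(type) {
	case Bytes:
		return v, nil
	case ByteString:
		return Bytes(v), nil
	}
	return "", fmt.Errorf("expect Bytes|ByteString; got %T", x)
}

func main() {
	values := []interface{}{"text", ByteString("py2 str"), Bytes("py3 bytes"), int64(1)}
	for _, v := range values {
		s, errS := asString(v)
		b, errB := asBytes(v)
		fmt.Printf("%T: asString=(%q, %v) asBytes=(%q, %v)\n", v, s, errS, b, errB)
	}
}
```

A consumer that must reject unicode, like the block loader mentioned
earlier, would call the bytes-flavoured helper and treat an error as a
data mistake; ByteString is the only type accepted by both helpers.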

ZODB/go and Wendelin.core intend to switch to using StrictUnicode mode,
while ogórek remains 100% backward-compatible in its default mode for
other users.
parent 010fbd2e
...@@ -26,11 +26,37 @@ ...@@ -26,11 +26,37 @@
// tuple ↔ ogórek.Tuple // tuple ↔ ogórek.Tuple
// dict ↔ map[interface{}]interface{} // dict ↔ map[interface{}]interface{}
// //
// str ↔ string (+) //
// For strings there are two modes. In the first, default, mode both py2/py3
// str and py2 unicode are decoded into string with py2 str being considered
// as UTF-8 encoded. Correspondingly for protocol ≤ 2 Go string is encoded as
// UTF-8 encoded py2 str, and for protocol ≥ 3 as py3 str / py2 unicode.
// ogórek.ByteString can be used to produce bytestring objects after encoding
// even for protocol ≥ 3. This mode tries to match Go string with str type of
// target Python depending on protocol version, but looses information after
// decoding/encoding cycle:
//
// py2/py3 str ↔ string StrictUnicode=n mode, default
// py2 unicode → string
// py2 str ← ogórek.ByteString
//
// However with StrictUnicode=y mode there is 1-1 mapping in between py2
// unicode / py3 str vs Go string, and between py2 str vs ogórek.ByteString.
// In this mode decoding/encoding and encoding/decoding operations are always
// identity with respect to strings:
//
// py2 unicode / py3 str ↔ string StrictUnicode=y mode
// py2 str ↔ ogórek.ByteString
//
//
// For bytes, unconditionally to string mode, there is direct 1-1 mapping in
// between Python and Go types:
//
// bytes ↔ ogórek.Bytes (~) // bytes ↔ ogórek.Bytes (~)
// bytearray ↔ []byte // bytearray ↔ []byte
// //
// //
//
// Python classes and instances are mapped to Class and Call, for example: // Python classes and instances are mapped to Class and Call, for example:
// //
// Python Go // Python Go
...@@ -112,15 +138,28 @@ ...@@ -112,15 +138,28 @@
// stream. For example AsInt64 tries to represent unpickled value as int64 if // stream. For example AsInt64 tries to represent unpickled value as int64 if
// possible and errors if not. // possible and errors if not.
// //
// For strings the situation is similar, but a bit different.
// On Python3 strings are unicode strings and binary data is represented by
// bytes type. However on Python2 strings are bytestrings and could contain
// both text and binary data. In the default mode py2 strings, the same way as
// py2 unicode, are decoded into Go strings. However in StrictUnicode mode py2
// strings are decoded into ByteString - the type specially dedicated to
// represent them on Go side. There are two utilities to help programs handle
// all those bytes/string data in the pickle stream in uniform way:
//
// - the program should use AsString if it expects text data -
// either unicode string, or byte string.
// - the program should use AsBytes if it expects binary data -
// either bytes, or byte string.
//
// Using the helpers fits into Python3 strings/bytes model but also allows to
// handle the data generated from under Python2.
//
// //
// -------- // --------
// //
// (*) ogórek is Polish for "pickle". // (*) ogórek is Polish for "pickle".
// //
// (+) for Python2 both str and unicode are decoded into string with Python
// str being considered as UTF-8 encoded. Correspondingly for protocol ≤ 2 Go
// string is encoded as UTF-8 encoded Python str, and for protocol ≥ 3 as unicode.
//
// (~) bytes can be produced only by Python3 or zodbpickle (https://pypi.org/project/zodbpickle), // (~) bytes can be produced only by Python3 or zodbpickle (https://pypi.org/project/zodbpickle),
// not by standard Python2. Respectively, for protocol ≤ 2, what ogórek produces // not by standard Python2. Respectively, for protocol ≤ 2, what ogórek produces
// is unpickled as bytes by Python3 or zodbpickle, and as str by Python2. // is unpickled as bytes by Python3 or zodbpickle, and as str by Python2.
......
...@@ -14,7 +14,7 @@ import ( ...@@ -14,7 +14,7 @@ import (
const highestProtocol = 5 // highest protocol version we support generating const highestProtocol = 5 // highest protocol version we support generating
// unicode is string that always encodes as unicode pickle object. // unicode is string that always encodes as unicode pickle object.
// (regular string encodes to unicode pickle object only for protocol >= 3) // (regular string encodes to unicode pickle object only for protocol >= 3 by default)
type unicode string type unicode string
type TypeError struct { type TypeError struct {
...@@ -45,6 +45,12 @@ type EncoderConfig struct { ...@@ -45,6 +45,12 @@ type EncoderConfig struct {
// //
// See Ref documentation for more details. // See Ref documentation for more details.
PersistentRef func(obj interface{}) *Ref PersistentRef func(obj interface{}) *Ref
// StrictUnicode, when true, requests to always encode Go string
// objects as Python unicode independently of used pickle protocol.
// See StrictUnicode mode documentation in top-level package overview
// for details.
StrictUnicode bool
} }
// NewEncoder returns a new Encoder struct with default values // NewEncoder returns a new Encoder struct with default values
...@@ -120,6 +126,8 @@ func (e *Encoder) encode(rv reflect.Value) error { ...@@ -120,6 +126,8 @@ func (e *Encoder) encode(rv reflect.Value) error {
return e.encodeUnicode(rv.String()) return e.encodeUnicode(rv.String())
case Bytes: case Bytes:
return e.encodeBytes(Bytes(rv.String())) return e.encodeBytes(Bytes(rv.String()))
case ByteString:
return e.encodeByteString(rv.String())
default: default:
return e.encodeString(rv.String()) return e.encodeString(rv.String())
} }
...@@ -302,7 +310,7 @@ func (e *Encoder) encodeBytes(byt Bytes) error { ...@@ -302,7 +310,7 @@ func (e *Encoder) encodeBytes(byt Bytes) error {
return e.encodeCall(&Call{ return e.encodeCall(&Call{
Callable: Class{Module: "_codecs", Name: "encode"}, Callable: Class{Module: "_codecs", Name: "encode"},
Args: Tuple{ulatin1, "latin1"}, Args: Tuple{ulatin1, ByteString("latin1")},
}) })
} }
...@@ -329,12 +337,17 @@ func (e *Encoder) encodeByteArray(bv []byte) error { ...@@ -329,12 +337,17 @@ func (e *Encoder) encodeByteArray(bv []byte) error {
} }
func (e *Encoder) encodeString(s string) error { func (e *Encoder) encodeString(s string) error {
// protocol >= 3 -> encode string as unicode object // StrictUnicode || protocol >= 3 -> encode string as unicode object as py3 does
// (as python3 does) if e.config.StrictUnicode || e.config.Protocol >= 3 {
if e.config.Protocol >= 3 {
return e.encodeUnicode(s) return e.encodeUnicode(s)
// !StrictUnicode && protocol <= 2 -> encode string as bytestr object as py2 does
} else {
return e.encodeByteString(s)
} }
}
func (e *Encoder) encodeByteString(s string) error {
l := len(s) l := len(s)
// protocol >= 1 -> BINSTRING* // protocol >= 1 -> BINSTRING*
......
...@@ -9,9 +9,22 @@ import ( ...@@ -9,9 +9,22 @@ import (
) )
func Fuzz(data []byte) int { func Fuzz(data []byte) int {
f1 := fuzz(data, false)
f2 := fuzz(data, true)
f := f1+f2
if f > 1 {
f = 1
}
return f
}
func fuzz(data []byte, strictUnicode bool) int {
// obj = decode(data) - this tests things like stack overflow in Decoder // obj = decode(data) - this tests things like stack overflow in Decoder
buf := bytes.NewBuffer(data) buf := bytes.NewBuffer(data)
dec := NewDecoder(buf) dec := NewDecoderWithConfig(buf, &DecoderConfig{
StrictUnicode: strictUnicode,
})
obj, err := dec.Decode() obj, err := dec.Decode()
if err != nil { if err != nil {
return 0 return 0
...@@ -25,9 +38,12 @@ func Fuzz(data []byte) int { ...@@ -25,9 +38,12 @@ func Fuzz(data []byte) int {
// because obj - as we got it as decoding from input - is known not to // because obj - as we got it as decoding from input - is known not to
// contain arbitrary Go structs. // contain arbitrary Go structs.
for proto := 0; proto <= highestProtocol; proto++ { for proto := 0; proto <= highestProtocol; proto++ {
subj := fmt.Sprintf("strictUnicode %v: protocol %d", strictUnicode, proto)
buf.Reset() buf.Reset()
enc := NewEncoderWithConfig(buf, &EncoderConfig{ enc := NewEncoderWithConfig(buf, &EncoderConfig{
Protocol: proto, Protocol: proto,
StrictUnicode: strictUnicode,
}) })
err = enc.Encode(obj) err = enc.Encode(obj)
if err != nil { if err != nil {
...@@ -46,19 +62,21 @@ func Fuzz(data []byte) int { ...@@ -46,19 +62,21 @@ func Fuzz(data []byte) int {
// we cannot encode Class (GLOBAL opcode) with \n at proto <= 4 // we cannot encode Class (GLOBAL opcode) with \n at proto <= 4
continue continue
} }
panic(fmt.Sprintf("protocol %d: encode error: %s", proto, err)) panic(fmt.Sprintf("%s: encode error: %s", subj, err))
} }
encoded := buf.String() encoded := buf.String()
dec = NewDecoder(bytes.NewBufferString(encoded)) dec = NewDecoderWithConfig(bytes.NewBufferString(encoded), &DecoderConfig{
StrictUnicode: strictUnicode,
})
obj2, err := dec.Decode() obj2, err := dec.Decode()
if err != nil { if err != nil {
// must succeed, as buf should contain valid pickle from encoder // must succeed, as buf should contain valid pickle from encoder
panic(fmt.Sprintf("protocol %d: decode back error: %s\npickle: %q", proto, err, encoded)) panic(fmt.Sprintf("%s: decode back error: %s\npickle: %q", subj, err, encoded))
} }
if !reflect.DeepEqual(obj, obj2) { if !reflect.DeepEqual(obj, obj2) {
panic(fmt.Sprintf("protocol %d: decode·encode != identity:\nhave: %#v\nwant: %#v", proto, obj2, obj)) panic(fmt.Sprintf("%s: decode·encode != identity:\nhave: %#v\nwant: %#v", subj, obj2, obj))
} }
} }
......
V\u65e5\u672c\u8a9e
.
\ No newline at end of file
...@@ -132,11 +132,19 @@ type Tuple []interface{} ...@@ -132,11 +132,19 @@ type Tuple []interface{}
// Bytes represents Python's bytes. // Bytes represents Python's bytes.
type Bytes string type Bytes string
// make Bytes and unicode to be represented by %#v distinctly from string // ByteString represents str from Python2 in StrictUnicode mode.
//
// See StrictUnicode mode documentation in top-level package overview for details.
type ByteString string
// make Bytes, ByteString and unicode to be represented by %#v distinctly from string
// (without GoString %#v emits just "..." for all string, Bytes and unicode) // (without GoString %#v emits just "..." for all string, Bytes and unicode)
func (v Bytes) GoString() string { func (v Bytes) GoString() string {
return fmt.Sprintf("%T(%#v)", v, string(v)) return fmt.Sprintf("%T(%#v)", v, string(v))
} }
func (v ByteString) GoString() string {
return fmt.Sprintf("%T(%#v)", v, string(v))
}
func (v unicode) GoString() string { func (v unicode) GoString() string {
return fmt.Sprintf("%T(%#v)", v, string(v)) return fmt.Sprintf("%T(%#v)", v, string(v))
} }
...@@ -175,6 +183,12 @@ type DecoderConfig struct { ...@@ -175,6 +183,12 @@ type DecoderConfig struct {
// //
// See Ref documentation for more details. // See Ref documentation for more details.
PersistentLoad func(ref Ref) (interface{}, error) PersistentLoad func(ref Ref) (interface{}, error)
// StrictUnicode, when true, requests to decode to Go string only
// Python unicode objects. Python2 bytestrings (py2 str type) are
// decoded into ByteString in this mode. See StrictUnicode mode
// documentation in top-level package overview for details.
StrictUnicode bool
} }
// NewDecoder constructs a new Decoder which will decode the pickle stream in r. // NewDecoder constructs a new Decoder which will decode the pickle stream in r.
...@@ -687,7 +701,7 @@ var errCallNotHandled = errors.New("handleCall: call not handled") ...@@ -687,7 +701,7 @@ var errCallNotHandled = errors.New("handleCall: call not handled")
func (d *Decoder) handleCall(class Class, argv Tuple) error { func (d *Decoder) handleCall(class Class, argv Tuple) error {
// for protocols <= 2 Python3 encodes bytes as `_codecs.encode(byt.decode('latin1'), 'latin1')` // for protocols <= 2 Python3 encodes bytes as `_codecs.encode(byt.decode('latin1'), 'latin1')`
if class.Module == "_codecs" && class.Name == "encode" && if class.Module == "_codecs" && class.Name == "encode" &&
len(argv) == 2 && argv[1] == "latin1" { len(argv) == 2 && stringEQ(argv[1], "latin1") {
// bytes as latin1-decoded unicode // bytes as latin1-decoded unicode
data, err := decodeLatin1Bytes(argv[0]) data, err := decodeLatin1Bytes(argv[0])
...@@ -713,7 +727,7 @@ func (d *Decoder) handleCall(class Class, argv Tuple) error { ...@@ -713,7 +727,7 @@ func (d *Decoder) handleCall(class Class, argv Tuple) error {
} }
// bytearray(unicode, encoding) // bytearray(unicode, encoding)
if len(argv) == 2 && argv[1] == "latin-1" { if len(argv) == 2 && stringEQ(argv[1], "latin-1") {
// bytes as latin1-decode unicode // bytes as latin1-decode unicode
data, err := decodeLatin1Bytes(argv[0]) data, err := decodeLatin1Bytes(argv[0])
if err != nil { if err != nil {
...@@ -728,6 +742,15 @@ func (d *Decoder) handleCall(class Class, argv Tuple) error { ...@@ -728,6 +742,15 @@ func (d *Decoder) handleCall(class Class, argv Tuple) error {
return errCallNotHandled return errCallNotHandled
} }
// pushByteString pushes str as either ByteString or string depending on StrictUnicode setting.
func (d *Decoder) pushByteString(str string) {
if d.config.StrictUnicode {
d.push(ByteString(str))
} else {
d.push(str)
}
}
// Push a string // Push a string
func (d *Decoder) loadString() error { func (d *Decoder) loadString() error {
line, err := d.readLine() line, err := d.readLine()
...@@ -758,7 +781,7 @@ func (d *Decoder) loadString() error { ...@@ -758,7 +781,7 @@ func (d *Decoder) loadString() error {
return err return err
} }
d.push(s) d.pushByteString(s)
return nil return nil
} }
...@@ -812,7 +835,7 @@ func (d *Decoder) loadBinString() error { ...@@ -812,7 +835,7 @@ func (d *Decoder) loadBinString() error {
if err != nil { if err != nil {
return err return err
} }
d.push(d.buf.String()) d.pushByteString(d.buf.String())
return nil return nil
} }
...@@ -847,7 +870,7 @@ func (d *Decoder) loadShortBinString() error { ...@@ -847,7 +870,7 @@ func (d *Decoder) loadShortBinString() error {
if err != nil { if err != nil {
return err return err
} }
d.push(d.buf.String()) d.pushByteString(d.buf.String())
return nil return nil
} }
......
...@@ -83,6 +83,9 @@ type TestEntry struct { ...@@ -83,6 +83,9 @@ type TestEntry struct {
objectIn interface{} objectIn interface{}
picklev []TestPickle picklev []TestPickle
objectOut interface{} objectOut interface{}
strictUnicodeN bool // whether to test with StrictUnicode=n while decoding/encoding
strictUnicodeY bool // whether to test with StrictUnicode=y while decoding/encoding
} }
// X, I, P0, P1, P* form a language to describe decode/encode tests: // X, I, P0, P1, P* form a language to describe decode/encode tests:
...@@ -98,8 +101,25 @@ type TestEntry struct { ...@@ -98,8 +101,25 @@ type TestEntry struct {
// Decoding the pickle data must give the object. // Decoding the pickle data must give the object.
// X is syntatic sugar to prepare one TestEntry. // X is syntatic sugar to prepare one TestEntry.
//
// the entry is tested under both StrictUnicode=n and StrictUnicode=y modes.
func X(name string, object interface{}, picklev ...TestPickle) TestEntry { func X(name string, object interface{}, picklev ...TestPickle) TestEntry {
return TestEntry{name: name, objectIn: object, objectOut: object, picklev: picklev} return TestEntry{name: name, objectIn: object, objectOut: object, picklev: picklev,
strictUnicodeN: true, strictUnicodeY: true}
}
// Xuauto is syntactic sugar to prepare one TestEntry that is tested only under StrictUnicode=n mode.
func Xuauto(name string, object interface{}, picklev ...TestPickle) TestEntry {
x := X(name, object, picklev...)
x.strictUnicodeY = false
return x
}
// Xustrict is syntactic sugar to prepare one TestEntry that is tested only under StrictUnicode=y mode.
func Xustrict(name string, object interface{}, picklev ...TestPickle) TestEntry {
x := X(name, object, picklev...)
x.strictUnicodeN = false
return x
} }
// Xloosy is syntatic sugar to prepare one TestEntry with loosy incoding. // Xloosy is syntatic sugar to prepare one TestEntry with loosy incoding.
...@@ -235,7 +255,9 @@ var tests = []TestEntry{ ...@@ -235,7 +255,9 @@ var tests = []TestEntry{
P2_("(K\x01K\x02K\x03\x88l."), // MARK + BININT1 + NEW_TRUE + LIST P2_("(K\x01K\x02K\x03\x88l."), // MARK + BININT1 + NEW_TRUE + LIST
I("(lp0\nI1\naI2\naI3\naI01\na.")), I("(lp0\nI1\naI2\naI3\naI01\na.")),
X("str('abc')", "abc", // strings in default StrictUnicode=n mode
Xuauto("str('abc')", "abc",
P0("S\"abc\"\n."), // STRING P0("S\"abc\"\n."), // STRING
P12("U\x03abc."), // SHORT_BINSTRING P12("U\x03abc."), // SHORT_BINSTRING
P3("X\x03\x00\x00\x00abc."), // BINUNICODE P3("X\x03\x00\x00\x00abc."), // BINUNICODE
...@@ -244,7 +266,7 @@ var tests = []TestEntry{ ...@@ -244,7 +266,7 @@ var tests = []TestEntry{
I("S'abc'\np0\n."), I("S'abc'\np0\n."),
I("S'abc'\n.")), I("S'abc'\n.")),
X("unicode('日本語')", "日本語", Xuauto("unicode('日本語')", "日本語",
P0("S\"日本語\"\n."), // STRING P0("S\"日本語\"\n."), // STRING
P12("U\x09日本語."), // SHORT_BINSTRING P12("U\x09日本語."), // SHORT_BINSTRING
P3("X\x09\x00\x00\x00日本語."), // BINUNICODE P3("X\x09\x00\x00\x00日本語."), // BINUNICODE
...@@ -254,7 +276,7 @@ var tests = []TestEntry{ ...@@ -254,7 +276,7 @@ var tests = []TestEntry{
I("X\x09\x00\x00\x00\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e.")), // BINUNICODE I("X\x09\x00\x00\x00\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e.")), // BINUNICODE
// TODO BINUNICODE8 // TODO BINUNICODE8
X("unicode('\\' 知事少时烦恼少、识人多处是非多。')", "' 知事少时烦恼少、识人多处是非多。", Xuauto("unicode('\\' 知事少时烦恼少、识人多处是非多。')", "' 知事少时烦恼少、识人多处是非多。",
// UNICODE // UNICODE
I("V' \\u77e5\\u4e8b\\u5c11\\u65f6\\u70e6\\u607c\\u5c11\\u3001\\u8bc6\\u4eba\\u591a\\u5904\\u662f\\u975e\\u591a\\u3002\n."), I("V' \\u77e5\\u4e8b\\u5c11\\u65f6\\u70e6\\u607c\\u5c11\\u3001\\u8bc6\\u4eba\\u591a\\u5904\\u662f\\u975e\\u591a\\u3002\n."),
...@@ -266,8 +288,35 @@ var tests = []TestEntry{ ...@@ -266,8 +288,35 @@ var tests = []TestEntry{
// TODO BINUNICODE8 // TODO BINUNICODE8
// NOTE loosy because *UNICODE currently decodes as string // strings in StrictUnicode=y mode
Xloosy("unicode(non-utf8)", unicode("\x93"), "\x93",
Xustrict("str('abc')", ByteString("abc"),
P0("S\"abc\"\n."), // STRING
P1_("U\x03abc."), // SHORT_BINSTRING
I("T\x03\x00\x00\x00abc."), // BINSTRING
I("S'abc'\np0\n."),
I("S'abc'\n.")),
Xustrict("unicode('abc')", "abc",
P0("Vabc\n."), // UNICODE
P123("X\x03\x00\x00\x00abc."), // BINUNICODE
P4_("\x8c\x03abc.")), // SHORT_BINUNICODE
// TODO BINUNICODE8
Xustrict("str('日本語')", ByteString("日本語"),
P0("S\"日本語\"\n."), // STRING
P1_("U\x09日本語.")), // SHORT_BINSTRING
Xustrict("unicode('日本語')", "日本語",
P0("V\\u65e5\\u672c\\u8a9e\n."), // UNICODE
P123("X\x09\x00\x00\x00日本語."), // BINUNICODE
P4_("\x8c\x09\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e."), // SHORT_BINUNICODE
I("V\\u65e5\\u672c\\u8a9e\np0\n."), // UNICODE
I("X\x09\x00\x00\x00\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e.")), // BINUNICODE
// TODO BINUNICODE8
Xustrict("unicode(non-utf8)", "\x93",
P0(errP0UnicodeUTF8Only), // UNICODE cannot represent non-UTF8 sequences P0(errP0UnicodeUTF8Only), // UNICODE cannot represent non-UTF8 sequences
P123("X\x01\x00\x00\x00\x93."), // BINUNICODE P123("X\x01\x00\x00\x00\x93."), // BINUNICODE
P4_("\x8c\x01\x93.")), // SHORT_BINUNICODE P4_("\x8c\x01\x93.")), // SHORT_BINUNICODE
...@@ -275,15 +324,15 @@ var tests = []TestEntry{ ...@@ -275,15 +324,15 @@ var tests = []TestEntry{
// str/unicode with many control characters at P0 // str/unicode with many control characters at P0
// this exercises escape-based STRING/UNICODE coding // this exercises escape-based STRING/UNICODE coding
X(`str('\x80ми\nр\r\u2028\\u1234\\U00004321') # text escape`, "\x80ми\nр\r\u2028\\u1234\\U00004321", Xustrict(`str('\x80ми\nр\r\u2028\\u1234\\U00004321') # text escape`, ByteString("\x80ми\nр\r\u2028\\u1234\\U00004321"),
P0("S\"\\x80ми\\\\r\\xe2\\x80\\xa8\\\\u1234\\\\U00004321\"\n."), P0("S\"\\x80ми\\\\r\\xe2\\x80\\xa8\\\\u1234\\\\U00004321\"\n."),
I("S\"\\x80ми\\\\r\\xe2\\x80\\xa8\\u1234\\U00004321\"\n.")), // \u and \U not decoded I("S\"\\x80ми\\\\r\\xe2\\x80\\xa8\\u1234\\U00004321\"\n.")), // \u and \U not decoded
X(`str("hel'lo")`, "hel'lo", I("S'hel'lo'\n.")), // non-escaped ' inside '-quotes Xustrict(`str("hel'lo")`, ByteString("hel'lo"), I("S'hel'lo'\n.")), // non-escaped ' inside '-quotes
X(`str("hel\"lo")`, "hel\"lo", I("S\"hel\"lo\"\n.")), // non-escaped " inside "-quotes Xustrict(`str("hel\"lo")`, ByteString("hel\"lo"), I("S\"hel\"lo\"\n.")), // non-escaped " inside "-quotes
X(`unicode(r'мир\n\r\x00'+'\r') # text escape`, `мир\n\r\x00`+"\r", Xuauto(`unicode(r'мир\n\r\x00'+'\r') # text escape`, `мир\n\r\x00`+"\r",
I("V\\u043c\\u0438\\u0440\\n\\r\\x00" + // only \u and \U are decoded - not \n \r ... I("V\\u043c\\u0438\\u0440\\n\\r\\x00" + // only \u and \U are decoded - not \n \r ...
"\r" + // raw \r - ok, not lost "\r" + // raw \r - ok, not lost
"\n.")), "\n.")),
...@@ -330,13 +379,13 @@ var tests = []TestEntry{ ...@@ -330,13 +379,13 @@ var tests = []TestEntry{
P1_("}."), // EMPTY_DICT P1_("}."), // EMPTY_DICT
I("(dp0\n.")), I("(dp0\n.")),
X("dict({'a': '1'})", map[interface{}]interface{}{"a": "1"}, Xuauto("dict({'a': '1'})", map[interface{}]interface{}{"a": "1"},
P0("(S\"a\"\nS\"1\"\nd."), // MARK + STRING + DICT P0("(S\"a\"\nS\"1\"\nd."), // MARK + STRING + DICT
P12("(U\x01aU\x011d."), // MARK + SHORT_BINSTRING + DICT P12("(U\x01aU\x011d."), // MARK + SHORT_BINSTRING + DICT
P3("(X\x01\x00\x00\x00aX\x01\x00\x00\x001d."), // MARK + BINUNICODE + DICT P3("(X\x01\x00\x00\x00aX\x01\x00\x00\x001d."), // MARK + BINUNICODE + DICT
 		P4_("(\x8c\x01a\x8c\x011d.")), // MARK + SHORT_BINUNICODE + DICT
-	X("dict({'a': '1', 'b': '2'})", map[interface{}]interface{}{"a": "1", "b": "2"},
+	Xuauto("dict({'a': '1', 'b': '2'})", map[interface{}]interface{}{"a": "1", "b": "2"},
 		// map iteration order is not stable - test only decoding
 		I("(S\"a\"\nS\"1\"\nS\"b\"\nS\"2\"\nd."), // P0: MARK + STRING + DICT
 		I("(U\x01aU\x011U\x01bU\x012d."), // P12: MARK + SHORT_BINSTRING + DICT
@@ -349,7 +398,7 @@ var tests = []TestEntry{
 		I("}(U\x01aU\x011U\x01bU\x012u."), // EMPTY_DICT + MARK + SHORT_BINSTRING + SETITEMS
 		I("(dp0\nS'a'\np1\nS'1'\np2\nsS'b'\np3\nS'2'\np4\ns.")),
-	X("foo.bar # global", Class{Module: "foo", Name: "bar"},
+	Xuauto("foo.bar # global", Class{Module: "foo", Name: "bar"},
 		P0123("cfoo\nbar\n."), // GLOBAL
 		P4_("\x8c\x03foo\x8c\x03bar\x93."), // SHORT_BINUNICODE + STACK_GLOBAL
 		I("S'foo'\nS'bar'\n\x93.")), // STRING + STACK_GLOBAL
@@ -358,20 +407,20 @@ var tests = []TestEntry{
 		P0123(errP0123GlobalStringLineOnly),
 		P4_("\x8c\x05foo\n2\x8c\x03bar\x93.")), // SHORT_BINUNICODE + STACK_GLOBAL
-	X(`foo.bar("bing") # global + reduce`, Call{Callable: Class{Module: "foo", Name: "bar"}, Args: []interface{}{"bing"}},
+	Xuauto(`foo.bar("bing") # global + reduce`, Call{Callable: Class{Module: "foo", Name: "bar"}, Args: []interface{}{"bing"}},
 		P0("cfoo\nbar\n(S\"bing\"\ntR."), // GLOBAL + MARK + STRING + TUPLE + REDUCE
 		P1("cfoo\nbar\n(U\x04bingtR."), // GLOBAL + MARK + SHORT_BINSTRING + TUPLE + REDUCE
 		P2("cfoo\nbar\nU\x04bing\x85R."), // GLOBAL + SHORT_BINSTRING + TUPLE1 + REDUCE
 		P3("cfoo\nbar\nX\x04\x00\x00\x00bing\x85R."), // GLOBAL + BINUNICODE + TUPLE1 + REDUCE
 		P4_("\x8c\x03foo\x8c\x03bar\x93\x8c\x04bing\x85R.")), // SHORT_BINUNICODE + STACK_GLOBAL + TUPLE1 + REDUCE
-	X(`persref("abc")`, Ref{"abc"},
+	Xuauto(`persref("abc")`, Ref{"abc"},
 		P0("Pabc\n."), // PERSID
 		P12("U\x03abcQ."), // SHORT_BINSTRING + BINPERSID
 		P3("X\x03\x00\x00\x00abcQ."), // BINUNICODE + BINPERSID
 		P4_("\x8c\x03abcQ.")), // SHORT_BINUNICODE + BINPERSID
-	X(`persref("abc\nd")`, Ref{"abc\nd"},
+	Xuauto(`persref("abc\nd")`, Ref{"abc\nd"},
 		P0(errP0PersIDStringLineOnly), // cannot be encoded
 		P12("U\x05abc\ndQ."), // SHORT_BINSTRING + BINPERSID
 		P3("X\x05\x00\x00\x00abc\ndQ."), // BINUNICODE + BINPERSID
@@ -388,10 +437,10 @@ var tests = []TestEntry{
 	X("LONG_BINPUT", []interface{}{int64(17)},
 		I("(lr0000I17\na.")),
-	X("graphite message1", graphiteObject1, graphitePickle1),
-	X("graphite message2", graphiteObject2, graphitePickle2),
-	X("graphite message3", graphiteObject3, graphitePickle3),
-	X("too long line", longLine, I("V" + longLine + "\n.")),
+	Xuauto("graphite message1", graphiteObject1, graphitePickle1),
+	Xuauto("graphite message2", graphiteObject2, graphitePickle2),
+	Xuauto("graphite message3", graphiteObject3, graphitePickle3),
+	Xuauto("too long line", longLine, I("V" + longLine + "\n.")),
 	// opcodes from protocol 4
@@ -430,6 +479,15 @@ var protoPrefixTemplate = string([]byte{opProto, 0xff})
 // TestDecode verifies ogórek decoder.
 func TestDecode(t *testing.T) {
 	for _, test := range tests {
+	for _, strictUnicode := range []bool{false, true} {
+		if strictUnicode && !test.strictUnicodeY {
+			continue
+		}
+		if !strictUnicode && !test.strictUnicodeN {
+			continue
+		}
+		testname := fmt.Sprintf("%s/StrictUnicode=%s", test.name, yn(strictUnicode))
 		for _, pickle := range test.picklev {
 			if pickle.err != nil {
 				continue
@@ -442,22 +500,32 @@ func TestDecode(t *testing.T) {
 					data := string([]byte{opProto, byte(proto)}) +
 						pickle.data[len(protoPrefixTemplate):]
-					t.Run(fmt.Sprintf("%s/%q/proto=%d", test.name, data, proto), func(t *testing.T) {
-						testDecode(t, test.objectOut, data)
+					t.Run(fmt.Sprintf("%s/%q/proto=%d", testname, data, proto), func(t *testing.T) {
+						testDecode(t, strictUnicode, test.objectOut, data)
 					})
 				}
 			} else {
-				t.Run(fmt.Sprintf("%s/%q", test.name, pickle.data), func(t *testing.T) {
-					testDecode(t, test.objectOut, pickle.data)
+				t.Run(fmt.Sprintf("%s/%q", testname, pickle.data), func(t *testing.T) {
+					testDecode(t, strictUnicode, test.objectOut, pickle.data)
 				})
 			}
 		}
 	}
+	}
 }
 // TestEncode verifies ogórek encoder.
 func TestEncode(t *testing.T) {
 	for _, test := range tests {
+	for _, strictUnicode := range []bool{false, true} {
+		if strictUnicode && !test.strictUnicodeY {
+			continue
+		}
+		if !strictUnicode && !test.strictUnicodeN {
+			continue
+		}
+		testname := fmt.Sprintf("%s/StrictUnicode=%s", test.name, yn(strictUnicode))
 		alreadyTested := make(map[int]bool) // protocols we tested encode with so far
 		for _, pickle := range test.picklev {
 			for _, proto := range pickle.protov {
@@ -467,8 +535,8 @@ func TestEncode(t *testing.T) {
 					dataOk = string([]byte{opProto, byte(proto)}) + dataOk
 				}
-				t.Run(fmt.Sprintf("%s/proto=%d", test.name, proto), func(t *testing.T) {
-					testEncode(t, proto, test.objectIn, test.objectOut, dataOk, pickle.err)
+				t.Run(fmt.Sprintf("%s/proto=%d", testname, proto), func(t *testing.T) {
+					testEncode(t, proto, strictUnicode, test.objectIn, test.objectOut, dataOk, pickle.err)
 				})
 				alreadyTested[proto] = true
@@ -481,21 +549,28 @@ func TestEncode(t *testing.T) {
 				continue
 			}
-			t.Run(fmt.Sprintf("%s/proto=%d(roundtrip)", test.name, proto), func(t *testing.T) {
-				testEncode(t, proto, test.objectIn, test.objectOut, "", nil)
+			t.Run(fmt.Sprintf("%s/proto=%d(roundtrip)", testname, proto), func(t *testing.T) {
+				testEncode(t, proto, strictUnicode, test.objectIn, test.objectOut, "", nil)
 			})
 		}
 	}
+	}
 }
 // testDecode decodes input and verifies it is == object.
 //
 // It also verifies decoder robustness - via feeding it various kinds of
 // corrupt data derived from input.
-func testDecode(t *testing.T, object interface{}, input string) {
+func testDecode(t *testing.T, strictUnicode bool, object interface{}, input string) {
+	newDecoder := func(r io.Reader) *Decoder {
+		return NewDecoderWithConfig(r, &DecoderConfig{
+			StrictUnicode: strictUnicode,
+		})
+	}
 	// decode(input) -> expected
 	buf := bytes.NewBufferString(input)
-	dec := NewDecoder(buf)
+	dec := newDecoder(buf)
 	v, err := dec.Decode()
 	if err != nil {
 		t.Error(err)
@@ -514,7 +589,7 @@ func testDecode(t *testing.T, object interface{}, input string) {
 	// decode(truncated input) -> must return io.ErrUnexpectedEOF
 	for l := len(input) - 1; l > 0; l-- {
 		buf := bytes.NewBufferString(input[:l])
-		dec := NewDecoder(buf)
+		dec := newDecoder(buf)
 		v, err := dec.Decode()
 		if !(v == nil && err == io.ErrUnexpectedEOF) {
 			t.Errorf("no ErrUnexpectedEOF on [:%d] truncated stream: v = %#v err = %#v", l, v, err)
@@ -525,7 +600,7 @@ func testDecode(t *testing.T, object interface{}, input string) {
 	// it must not panic.
 	for i := 0; i < len(input); i++ {
 		buf := bytes.NewBufferString(input[i:])
-		dec := NewDecoder(buf)
+		dec := newDecoder(buf)
 		func() {
 			defer func() {
 				if r := recover(); r != nil {
@@ -547,11 +622,22 @@ func testDecode(t *testing.T, object interface{}, input string) {
 // encode-back tests are still performed.
 //
 // If errOk != nil, object encoding must produce that error.
-func testEncode(t *testing.T, proto int, object, objectDecodedBack interface{}, dataOk string, errOk error) {
-	buf := &bytes.Buffer{}
-	enc := NewEncoderWithConfig(buf, &EncoderConfig{
+func testEncode(t *testing.T, proto int, strictUnicode bool, object, objectDecodedBack interface{}, dataOk string, errOk error) {
+	newEncoder := func(w io.Writer) *Encoder {
+		return NewEncoderWithConfig(w, &EncoderConfig{
 			Protocol: proto,
+			StrictUnicode: strictUnicode,
+		})
+	}
+	newDecoder := func(r io.Reader) *Decoder {
+		return NewDecoderWithConfig(r, &DecoderConfig{
+			StrictUnicode: strictUnicode,
 		})
+	}
+	buf := &bytes.Buffer{}
+	enc := newEncoder(buf)
 	// encode(object) == expected data
 	err := enc.Encode(object)
@@ -573,9 +659,7 @@ func testEncode(t *testing.T, proto int, object, objectDecodedBack interface{},
 	// encode | limited writer -> write error
 	for l := int64(len(data))-1; l >= 0; l-- {
 		buf.Reset()
-		enc = NewEncoderWithConfig(LimitWriter(buf, l), &EncoderConfig{
-			Protocol: proto,
-		})
+		enc = newEncoder(LimitWriter(buf, l))
 		err = enc.Encode(object)
 		if err != io.EOF {
@@ -584,7 +668,7 @@ func testEncode(t *testing.T, proto int, object, objectDecodedBack interface{},
 	}
 	// decode(encode(object)) == object
-	dec := NewDecoder(bytes.NewBufferString(data))
+	dec := newDecoder(bytes.NewBufferString(data))
 	v, err := dec.Decode()
 	if err != nil {
 		t.Errorf("encode -> decode -> error: %s", err)
@@ -971,6 +1055,7 @@ func TestStringsFmt(t *testing.T) {
 	}{
 		{"мир", `"мир"`},
 		{Bytes("мир"), `ogórek.Bytes("мир")`},
+		{ByteString("мир"), `ogórek.ByteString("мир")`},
 		{unicode("мир"), `ogórek.unicode("мир")`},
 	}
@@ -1002,3 +1087,12 @@ func (l *LimitedWriter) Write(p []byte) (n int, err error) {
 }
 func LimitWriter(w io.Writer, n int64) io.Writer { return &LimitedWriter{w, n} }
+
+// yn returns "y" or "n" for a boolean.
+func yn(b bool) string {
+	if b {
+		return "y"
+	}
+	return "n"
+}
@@ -24,3 +24,50 @@ func AsInt64(x interface{}) (int64, error) {
 	}
 	return 0, fmt.Errorf("expect int64|long; got %T", x)
 }
+
+// AsBytes tries to represent unpickled value as Bytes.
+//
+// It succeeds only if the value is either Bytes or ByteString.
+// It does not succeed if the value is string or any other type.
+//
+// ByteString is treated as related to Bytes because ByteString represents
+// the str type from py2, which can contain both string and binary data.
+func AsBytes(x interface{}) (Bytes, error) {
+	switch x := x.(type) {
+	case Bytes:
+		return x, nil
+	case ByteString:
+		return Bytes(x), nil
+	}
+	return "", fmt.Errorf("expect bytes|bytestr; got %T", x)
+}
+
+// AsString tries to represent unpickled value as string.
+//
+// It succeeds only if the value is either string or ByteString.
+// It does not succeed if the value is Bytes or any other type.
+//
+// ByteString is treated as related to string because ByteString represents
+// the str type from py2, which can contain both string and binary data.
+func AsString(x interface{}) (string, error) {
+	switch x := x.(type) {
+	case string:
+		return x, nil
+	case ByteString:
+		return string(x), nil
+	}
+	return "", fmt.Errorf("expect unicode|bytestr; got %T", x)
+}
+
+// stringEQ compares arbitrary x to string y.
+//
+// It succeeds only if AsString(x) succeeds and the string data of x equals y.
+func stringEQ(x interface{}, y string) bool {
+	s, err := AsString(x)
+	if err != nil {
+		return false
+	}
+	return s == y
+}
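To see which values each helper accepts, here is a minimal sketch that runs without the library. `Bytes` and `ByteString` are declared locally as string-based stand-ins (an assumption made so the example is self-contained), and `asBytes` and `asString` are lowercase copies of the two functions in the hunk above, not the package's exported API.

```go
package main

import "fmt"

// Local stand-ins for the ogórek types named in the diff (assumption:
// both are string-based, which the conversions in the diff suggest).
type Bytes string      // py3 bytes
type ByteString string // py2 str: may hold text or binary data

// asBytes mirrors AsBytes from the diff: only Bytes and ByteString convert.
func asBytes(x interface{}) (Bytes, error) {
	switch x := x.(type) {
	case Bytes:
		return x, nil
	case ByteString:
		return Bytes(x), nil
	}
	return "", fmt.Errorf("expect bytes|bytestr; got %T", x)
}

// asString mirrors AsString from the diff: only string and ByteString convert.
func asString(x interface{}) (string, error) {
	switch x := x.(type) {
	case string:
		return x, nil
	case ByteString:
		return string(x), nil
	}
	return "", fmt.Errorf("expect unicode|bytestr; got %T", x)
}

func main() {
	// Report, for each input type, whether each conversion succeeds.
	for _, v := range []interface{}{"мир", Bytes("мир"), ByteString("мир"), 1.0} {
		_, berr := asBytes(v)
		_, serr := asString(v)
		fmt.Printf("%T: asBytes ok=%v, asString ok=%v\n", v, berr == nil, serr == nil)
	}
}
```

ByteString is the only type that both helpers accept, which matches its role as the py2 str type whose contents may be either text or binary data.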
@@ -52,3 +52,62 @@ func TestAsInt64(t *testing.T) {
 	}
 }
+
+func TestAsBytesString(t *testing.T) {
+	Ebytes := func(x interface{}) error {
+		return fmt.Errorf("expect bytes|bytestr; got %T", x)
+	}
+	Estring := func(x interface{}) error {
+		return fmt.Errorf("expect unicode|bytestr; got %T", x)
+	}
+	const y = true
+	const n = false
+	testv := []struct {
+		in interface{}
+		bok bool // AsBytes succeeds
+		sok bool // AsString succeeds
+	}{
+		{"мир", n, y},
+		{Bytes("мир"), y, n},
+		{ByteString("мир"), y, y},
+		{1.0, n, n},
+		{None{}, n, n},
+	}
+	for _, tt := range testv {
+		bout, berr := AsBytes(tt.in)
+		sout, serr := AsString(tt.in)
+		sin := ""
+		xin := reflect.ValueOf(tt.in)
+		if xin.Kind() == reflect.String {
+			sin = xin.String()
+		}
+		boutOK := Bytes(sin)
+		var berrOK error
+		if !tt.bok {
+			boutOK = ""
+			berrOK = Ebytes(tt.in)
+		}
+		soutOK := sin
+		var serrOK error
+		if !tt.sok {
+			soutOK = ""
+			serrOK = Estring(tt.in)
+		}
+		if !(bout == boutOK && reflect.DeepEqual(berr, berrOK)) {
+			t.Errorf("%#v: AsBytes:\nhave %#v %#v\nwant %#v %#v",
+				tt.in, bout, berr, boutOK, berrOK)
+		}
+		if !(sout == soutOK && reflect.DeepEqual(serr, serrOK)) {
+			t.Errorf("%#v: AsString:\nhave %#v %#v\nwant %#v %#v",
+				tt.in, sout, serr, soutOK, serrOK)
+		}
+	}
+}