go-codec is a high performance and feature rich library that provides idiomatic encoding and decoding support for msgpack, binc, cbor, json and simple formats. It supports both runtime introspection (reflection) and code generation. Below, we will walk you through using it for your serialization needs.
Supported formats: msgpack, binc, cbor, json and simple.
Source code: http://github.com/ugorji/go
godoc documentation: http://godoc.org/github.com/ugorji/go/codec
go get -u github.com/ugorji/go/codec/...
This will install:
go/codec: a runtime library for encoding/decoding via runtime introspection of named types.
This package understands the safe tag, which ensures that the unsafe package is not used for performance optimizations. Using unsafe bypasses the allocation and copying overhead of []byte->string conversion.
To avoid using the unsafe package for performance (e.g. in an appengine environment), you MUST pass the safe (or appengine) tag during build:
go get -tags=safe -u github.com/ugorji/go/codec/...
Unfortunately, many users already depend on the import path “github.com/ugorji/go/codec”, so I cannot change it to a preferred “github.com/ugorji/go-codec”.
As go-codec supports multiple formats, a user will need to configure a Handle. The handle tells what format to use.
The handles supported are:
MsgpackHandle
BincHandle
CborHandle
SimpleHandle
JsonHandle
In the rest of the article, I will mostly use JsonHandle. It is straightforward to use a different Handle.
Let us first show how to use go-codec without much configuration.
We will use these go values in the rest of the article:
type A struct {
	I int
	S string
}
type B float64
var v1 A
var v2 *A = &v1
var v3 int = 9
var v4 bool = false
var v5 interface{} = v3
var v6 interface{} = nil
var v7 B
var v8 *B = &v7
The Handle is SAFE for concurrent READ, but NOT SAFE for concurrent modification. This means that you should configure the Handle completely before use, and then pass it to each Encoder or Decoder you create.
The Encoder and Decoder are NOT SAFE for concurrent use. However, they support a Reset method that allows them to be re-used. Reuse leverages the state they maintain, e.g. the mapping of type ids to dedicated functions, the byte buffers used, etc.
The general usage model for go-codec is:
Any go value can be encoded. In the example above, any of v1 through v8 can be encoded.
A user may want to encode a value as a []byte. Doing this writes directly into the []byte, bypassing the interface calls and overhead of the io.Writer interface. It strives for a zero-copy model.
var b []byte = make([]byte, 0, 64)
var h codec.Handle = new(codec.JsonHandle)
var enc *codec.Encoder = codec.NewEncoderBytes(&b, h)
var err error = enc.Encode(v1) //any of v1 ... v8
// b now contains the encoded value.
A value can also be serialized into an io.Writer.
var w io.Writer = new(bytes.Buffer)
var h codec.Handle = new(codec.JsonHandle)
var enc *codec.Encoder = codec.NewEncoder(w, h)
var err error = enc.Encode(v1) //any of v1 ... v8
We recommend that the user specifies a buffer size, and we will internally use a buffered writer for performance.
// ...
var h = new(codec.JsonHandle) // keep the concrete type, so the option fields are accessible
h.WriterBufferSize = 8192
// ...
To decode, pass a pointer to a value. go-codec will then decode into that value.
A pointer must be passed, so that we can decode into the value behind the pointer.
A user may want to decode directly from a []byte. This is fastest, bypasses interface calls and other overhead of io.Reader, and strives for zero-copy mode while reading.
var b []byte
// ... assume b contains the bytes to decode from
var h codec.Handle = new(codec.JsonHandle)
var dec *codec.Decoder = codec.NewDecoderBytes(b, h)
var err error = dec.Decode(v2) //v2 or v8, or a pointer to v1, v3, v4, v5, v6, v7
A user may also decode from an io.Reader.
var r io.Reader
// ... assume r contains the data to decode from
var h codec.Handle = new(codec.JsonHandle)
var dec *codec.Decoder = codec.NewDecoder(r, h)
var err error = dec.Decode(v2) //v2 or v8, or a pointer to v1, v3, v4, v5, v6, v7
We recommend that the user specifies a buffer size, and we will internally use a buffered reader for performance. If not specified, go-codec will NOT buffer internally, because users may want to combine encoded data with other data, and want go-codec to read only the bytes needed.
// ...
var h = new(codec.JsonHandle) // keep the concrete type, so the option fields are accessible
h.ReaderBufferSize = 8192
// ...
Decode will update the value passed.
The key thing to note is that we update during a decode.
Consequently, if you have the following:
var m = map[string]*A{"1": &A{I:1, S:"one"}, "2": &A{I:2, S:"two"} }
fmt.Printf("before: %v\n", m)
var b = []byte(`{"1": {"I":111}, "3": {"I": 333} }`)
var err error = codec.NewDecoderBytes(b, new(codec.JsonHandle)).Decode(&m)
fmt.Printf(" after: %v\n", m)
for k, v := range m {
	fmt.Printf("\t%v: %v\n", k, v)
}
Running that code should output:
before: map[2:0xc20801f100 1:0xc20801f0e0]
after: map[1:0xc20801f0e0 2:0xc20801f100 3:0xc20801f220]
1: &{111 one}
2: &{2 two}
3: &{333 }
You will notice the following:
When decoding into a map, we DO NOT delete map keys which do not exist in the stream. This gives symmetry as we only update tables (maps, structs) and never truncate.
To have a decoded value mirror exactly what was in the encoded stream, you should decode into a zero’ed value e.g. empty map, empty slice, new(XYZ) where XYZ is a struct, etc.
What if you don’t know the structure of your data beforehand?
We use the excellent support for interfaces in go.
Every go value can be converted to an interface{}. A type switch or type assertion can be used to retrieve the value back from the interface{}.
A nil interface{} is an interface{} without a value inside. When a pointer to this is passed into Decode(), go-codec will decode a value based on the structure of the stream as it is parsed.
Sample code to decode is below:
var b = []byte(`{"1": {"I":111}, "3": {"I": 333} }`)
var m interface{}
var err error = codec.NewDecoderBytes(b, new(codec.JsonHandle)).Decode(&m)
fmt.Printf("decoded type : %T\n", m)
fmt.Printf("decoded value: %v\n", m)
Output:
decoded type : map[interface {}]interface {}
decoded value: map[1:map[I:111] 3:map[I:333]]
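Once a value has been decoded into an interface{}, a type switch recovers its structure. The sketch below uses only the standard library and walks a hand-built value with the same shape as the output above. The describe helper is hypothetical, and the numeric cases (int64, uint64, float64) are an assumption about the dynamic types a decoder produces; print %T on your own data to confirm.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// describe renders a dynamically typed value, naming the type found at each leaf.
func describe(v interface{}) string {
	switch x := v.(type) {
	case nil:
		return "nil"
	case bool:
		return fmt.Sprintf("bool(%v)", x)
	case int64:
		return fmt.Sprintf("int(%d)", x)
	case uint64:
		return fmt.Sprintf("uint(%d)", x)
	case float64:
		return fmt.Sprintf("float(%v)", x)
	case string:
		return fmt.Sprintf("string(%q)", x)
	case []interface{}:
		parts := make([]string, len(x))
		for i, e := range x {
			parts[i] = describe(e)
		}
		return "[" + strings.Join(parts, ", ") + "]"
	case map[interface{}]interface{}:
		parts := make([]string, 0, len(x))
		for k, e := range x {
			parts = append(parts, fmt.Sprint(k)+": "+describe(e))
		}
		sort.Strings(parts) // map iteration order is random; sort for stable output
		return "{" + strings.Join(parts, ", ") + "}"
	default:
		return fmt.Sprintf("other(%T)", x)
	}
}

func main() {
	// Hand-built stand-in for a decoded value.
	m := map[interface{}]interface{}{
		"1": map[interface{}]interface{}{"I": uint64(111)},
		"3": map[interface{}]interface{}{"I": uint64(333)},
	}
	fmt.Println(describe(m)) // {1: {I: uint(111)}, 3: {I: uint(333)}}
}
```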
go-codec will never silently skip data in the stream without allowing the user to decide how to handle it.
There are a few scenarios where there is ambiguity:
FieldN but there is no corresponding field in the struct.
For both of these, go-codec allows the user to configure whether an error should be returned or the data silently skipped. See the ErrorIfNoField configuration below.
go-codec supports configuration at 2 levels:
go-codec allows you to configure how a struct is encoded, using struct tags on its fields.
The value of the “codec” key in a struct field’s tag is the key name, followed by an optional comma and options. Note that the “json” key is used in the absence of the “codec” key.
To set an option on all fields (e.g. omitempty on all fields), you can create a field called _struct, and set flags on it.
Struct values “usually” encode as maps. Each exported struct field is encoded unless:
Note that omitempty is ignored when encoding struct values as arrays, as an entry must be encoded for each field, to maintain its position.
When encoding as a map, the first string in the tag (before the comma) is the map key string to use when encoding.
However, struct values may encode as arrays. This happens when the tag on the _struct field sets the “toarray” option.
Values with types that implement codec.MapBySlice are encoded as stream maps.
The empty values (for omitempty option) are
Note that omitempty does not apply to structs, as there is no efficient way to test that a struct is equal to its zero value.
Anonymous fields are encoded inline except
When encoding a struct, all unexported fields are skipped. Exported fields can be:
The code snippet below illustrates configuration of a struct.
type Anon struct {
	S string
}
// NOTE: 'json:' can be used as struct tag key, in place of 'codec:' below.
type My struct {
	_struct struct{} `codec:",omitempty"` //set omitempty for every field
	Field1 string `codec:"-"` //skip this field
	Field2 int `codec:"myName"` //Use key "myName" in encode stream
	Field3 int32 `codec:",omitempty"` //use key "Field3". Omit if empty.
	Field4 bool `codec:"f4,omitempty"` //use key "f4". Omit if empty.
	field5 bool // unexported, so skipped
	Anon // anonymous field, S is inlined.
	// stream will contain S as if a regular field.
}
// NOTE: 'json:' can be used as struct tag key, in place of 'codec:' below.
type My2 struct {
	_struct bool `codec:",omitempty,toarray"` //set omitempty for every field
	//and encode struct as an array
	*Anon `codec:""` // anonymous field, with no struct name, so inline it.
	// stream will contain "S"
	// OR (only one of the two declarations can be present at a time)
	// *Anon `codec:"abc"` // anonymous field, with specified struct name, so DO NOT inline it.
	// stream will contain "abc.S"
}
Every Handle has a set of basic options:
General options include:
EncodeOptions take precedence over values defined using struct tags.
DecodeOptions configure what happens during a decode:
Note that there are more options available, all of which are viewable in the package documentation at
https://godoc.org/github.com/ugorji/go/codec#EncodeOptions
https://godoc.org/github.com/ugorji/go/codec#DecodeOptions .
As an example, you can configure your handle as below:
var jh codec.JsonHandle
jh.MapType = reflect.TypeOf(map[string]int(nil))
jh.SliceType = reflect.TypeOf([]string(nil))
// for encoding
var w io.Writer
var enc *codec.Encoder = codec.NewEncoder(w, &jh)
// for decoding
var r io.Reader
var dec *codec.Decoder = codec.NewDecoder(r, &jh)
Some formats support extra configuration options.
Up until 2013, Messagepack had a single type: raw, which was used for raw bytes. Different libraries interpreted it either as a binary array of bytes, or a unicode-style string. For languages which supported different binary vs string types (e.g. java, go, python, etc), this presented a problem.
In 2013, the spec was updated: raw was renamed to Str, and a new Bin was introduced to represent binary data.
However, libraries want to maintain compatibility with the choices they made previously in interpreting raw. The go-codec library previously treated raw as []byte by default, with an option to treat it as string.
Furthermore, formal extension support was added in the updated spec.
Consequently, legacy applications do not understand the ext or the Bin messagepack type.
The options below are set up so that the Messagepack handle is compatible with the legacy spec by default.
If a user wants to be compatible with the updated spec, they just have to set up their MsgpackHandle as below:
var h codec.MsgpackHandle
h.WriteExt = true
See the legacy and new/updated messagepack specs for more information.
Anonymous fields are encoded inline by default.
To encode an anonymous field as a separate regular field, specify a name in the struct tag (the first value in the struct tag).
go-codec supports anonymous fields which are pointers or non-pointers.
When encoding named types, we follow the following sequence:
When decoding named types, we follow a similar sequence:
From this, we see that users have a few options for controlling how values are encoded or decoded into:
The most robust solution is an extension. This is detailed below.
An extension plays nicely with decoding into a nil interface{}. This is because we can see the tag in the stream, and find out what type is mapped to that tag, and decode into a new instance of it.
To use an extension, register a codec.BytesExt or codec.InterfaceExt using one of SetBytesExt or SetInterfaceExt exported by the Handle.
To illustrate, the snippet below creates an extension that encodes/decodes a time.Time to/from a 64-bit integer. It then sets it on a CborHandle which is passed into NewDecoder or NewEncoder functions.
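type TimeExt struct{}
func (x TimeExt) ConvertExt(v interface{}) interface{} {
	switch v2 := v.(type) {
	case time.Time:
		return v2.UTC().UnixNano()
	case *time.Time: // structs are encoded by passing the ptr
		return v2.UTC().UnixNano()
	default:
		panic(fmt.Sprintf("unsupported type for TimeExt: %T", v))
	}
}
func (x TimeExt) UpdateExt(dest interface{}, v interface{}) {
	tt := dest.(*time.Time)
	switch v2 := v.(type) { // the stream may hold a signed or unsigned integer
	case int64:
		*tt = time.Unix(0, v2).UTC()
	case uint64:
		*tt = time.Unix(0, int64(v2)).UTC()
	default:
		panic(fmt.Sprintf("unsupported value for TimeExt: %T", v))
	}
}
func main() {
	var h codec.CborHandle
	h.SetInterfaceExt(reflect.TypeOf(time.Time{}), 1, TimeExt{})
	// now use h as your handle.
	// time.Time will now be encoded as a uint64, and decoded from a uint64 or int64
}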
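The conversion logic of such an extension can be exercised on its own, without any handle, which is a cheap way to check it before registering it. The sketch below uses only the standard library. It declares its own TimeExt with the same two methods, written to accept both signed and unsigned integers; decodeNanos and roundTrip are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

type TimeExt struct{}

// ConvertExt turns a time.Time (or pointer to one) into nanoseconds since the Unix epoch.
func (TimeExt) ConvertExt(v interface{}) interface{} {
	switch t := v.(type) {
	case time.Time:
		return t.UTC().UnixNano()
	case *time.Time:
		return t.UTC().UnixNano()
	default:
		panic(fmt.Sprintf("TimeExt: unsupported type %T", v))
	}
}

// UpdateExt sets the *time.Time in dest from an int64 or uint64 nanosecond count.
func (TimeExt) UpdateExt(dest interface{}, v interface{}) {
	tt := dest.(*time.Time)
	switch n := v.(type) {
	case int64:
		*tt = time.Unix(0, n).UTC()
	case uint64:
		*tt = time.Unix(0, int64(n)).UTC()
	default:
		panic(fmt.Sprintf("TimeExt: unsupported value %T", v))
	}
}

// decodeNanos feeds a stream-side value through UpdateExt and reports the
// resulting instant in nanoseconds since the Unix epoch.
func decodeNanos(v interface{}) int64 {
	var out time.Time
	TimeExt{}.UpdateExt(&out, v)
	return out.UnixNano()
}

// roundTrip converts an instant to its stream form and back again.
func roundTrip(unixNano int64) int64 {
	in := time.Unix(0, unixNano)
	return decodeNanos(TimeExt{}.ConvertExt(&in))
}

func main() {
	const instant = int64(1425213000000000000)
	fmt.Println(roundTrip(instant) == instant)            // true
	fmt.Println(decodeNanos(uint64(instant)) == instant) // true
}
```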
Some users may represent a table with key-value pairs as a slice with an even length.
type X []interface{}
func (_ X) MapBySlice() { }
v := X{"key1", 1, "key2", 2, "key3", 3}
When we encode v above, it will be encoded in the stream as a map.
This is the best way to enforce a specific order in a map, as iteration of a go map has no defined order.
Some users will convert a map into a MapBySlice implementation, then encode that to force a specific order in the stream.
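That conversion might look like the sketch below. It needs only the standard library, since MapBySlice is just a marker method on the slice type. The helper name ordered and the choice of sorted string keys are assumptions for illustration; any ordering can be imposed the same way.

```go
package main

import (
	"fmt"
	"sort"
)

// X is a key-value table stored as a flat slice: key1, value1, key2, value2, ...
type X []interface{}

// MapBySlice marks X as a type to be encoded as a map rather than an array.
func (X) MapBySlice() {}

// ordered flattens m into an X whose keys appear in sorted order.
func ordered(m map[string]int) X {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	x := make(X, 0, 2*len(m))
	for _, k := range keys {
		x = append(x, k, m[k])
	}
	return x
}

func main() {
	v := ordered(map[string]int{"key3": 3, "key1": 1, "key2": 2})
	fmt.Println(v) // [key1 1 key2 2 key3 3]
}
```

Passing the resulting X to an Encoder then emits the entries in exactly this order.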
go-codec will encode a chan as an array in the stream. It will also decode an array in the stream into a chan.
This allows a typical request, where a user wants to decode a very large array without loading up all the values in memory first in a slice. This will save memory and CPU time in a BIG WAY.
Sample usecase:
// You have a large number of values encoded in json format as below:
// [
//   { object },
//   { object },
//   (trillions of entries)
// ]
//
// For best performance, you will decode the elements into a channel and
// concurrently process them one by one.
//
var h codec.JsonHandle // or CborHandle, or MsgpackHandle, etc
if _, ok := r.(io.ByteScanner); !ok {
	r = bufio.NewReader(r)
} // use a buffered reader for efficiency
ch := make(chan Adresse, 128) // channel to decode into
finish := make(chan struct{}) // unbuffered channel for signaling goroutine finish
go func() {
	for e := range ch { // process till all values are received and channel is closed
		// process e
	}
	finish <- struct{}{} // send signal closing channel
}()
var dec *codec.Decoder = codec.NewDecoder(r, &h)
var err error = dec.Decode(&ch)
close(ch)
<-finish // wait for goroutine to finish processing channel
Note: When encoding, the user can configure the ChanRecvTimeout parameter to specify whether to receive only the elements already available in the chan, all elements received within a timeout, or all elements until the chan is closed.
We have already seen how structs can be encoded as a map or an array:
Also, we see above that a Slice type can be encoded as a map:
Consequently,
Canonical representation means that encoding a value will always result in the same sequence of bytes. This applies ONLY to maps, which iterate (via range call) in random order.
codec will attempt to sort based on the natural ordering of the keys (numerically or lexicographically). However, if there is no natural ordering, then the keys will be encoded out of band to []byte, and the []byte sorted instead.
There is a slight performance hit if Canonical flag is on, as we MAY have to encode the keys out-of-band, and then sort them, before encoding the whole map.
This is configured using the Canonical flag on the Handle.
** Canonical flag is ignored by codecgen (code generation). **
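The out-of-band fallback described above can be illustrated with a short standard-library sketch. It is not go-codec's implementation: encoding/json stands in as the encoder, and the point type and canonicalOrder helper are made up for the example. Each key is encoded to []byte first, and the encoded forms are then sorted to fix the order in which map entries would be written.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

// point is a key type with no natural ordering.
type point struct{ X, Y int }

// canonicalOrder returns the keys of m in a stable order: every key is
// encoded out of band to []byte, and the encoded forms are sorted.
func canonicalOrder(m map[point]string) ([]point, error) {
	type entry struct {
		key     point
		encoded []byte
	}
	entries := make([]entry, 0, len(m))
	for k := range m {
		b, err := json.Marshal(k)
		if err != nil {
			return nil, err
		}
		entries = append(entries, entry{key: k, encoded: b})
	}
	sort.Slice(entries, func(i, j int) bool {
		return bytes.Compare(entries[i].encoded, entries[j].encoded) < 0
	})

	keys := make([]point, len(entries))
	for i, e := range entries {
		keys[i] = e.key
	}
	return keys, nil
}

func main() {
	m := map[point]string{{2, 0}: "c", {1, 5}: "b", {1, 10}: "a"}
	keys, err := canonicalOrder(m)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	// Byte order, not numeric order: "10" sorts before "5".
	fmt.Println(keys) // [{1 10} {1 5} {2 0}]
}
```

The extra encode-then-sort pass is where the performance hit mentioned above comes from.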
go-codec will treat NIL in a stream as the zero value.
It will then set the value being decoded into to its zero value.
This means that NIL can come in as the value of an int, and we decode that int as 0.
This is important for usecases as below:
encoding/json uses the json: key in the struct tag value to configure how the struct is encoded.
go-codec will use the json key as a fallback, if the codec key is unavailable in the struct tag value.
This allows go-codec to be used as a drop-in replacement for encoding/json without having to make changes to the structs.
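The fallback rule can be pictured with a small reflection sketch. This is an illustration of the lookup order only, written against the standard library, and is not how go-codec is implemented internally; User and encodedNames are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

type User struct {
	ID    int    `codec:"id"`
	Name  string `json:"name,omitempty"` // no codec key: json is used
	Email string // no tag at all: the field name is used
	Phone string `codec:"tel" json:"phone"` // codec wins over json
}

// encodedNames lists the key each exported field of the struct v would be
// encoded under: the "codec" tag if present, else the "json" tag, else the
// Go field name. It assumes v is a struct value.
func encodedNames(v interface{}) []string {
	t := reflect.TypeOf(v)
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.PkgPath != "" {
			continue // unexported fields are skipped
		}
		tag, ok := f.Tag.Lookup("codec")
		if !ok {
			tag = f.Tag.Get("json")
		}
		name := strings.Split(tag, ",")[0] // options follow the first comma
		switch name {
		case "-":
			continue // explicitly skipped field
		case "":
			name = f.Name
		}
		names = append(names, name)
	}
	return names
}

func main() {
	fmt.Println(encodedNames(User{})) // [id name Email tel]
}
```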
go-codec also provides RPC support that integrates with the net/rpc package.
Please read the net/rpc package documentation to see how to use it. Fundamentally, the net/rpc package requires a ServerCodec and ClientCodec.
go-codec provides these implementations.
go-codec also supports messagepack’s custom RPC communication model.
To use, replace codec.GoRpc with codec.MsgpackSpecRpc in snippets below.
RPC Server would look like this:
//RPC Server
go func() {
	for {
		conn, err := listener.Accept()
		if err != nil {
			log.Printf("accept failed: %v", err)
			return
		}
		rpcCodec := codec.GoRpc.ServerCodec(conn, h) // OR codec.MsgpackSpecRpc...
		rpc.ServeCodec(rpcCodec)
	}
}()
RPC Clients would look like this:
//RPC Communication (client side)
conn, err := net.Dial("tcp", "localhost:5555")
if err != nil {
	log.Fatalf("dial failed: %v", err)
}
rpcCodec := codec.GoRpc.ClientCodec(conn, h) // OR codec.MsgpackSpecRpc...
client := rpc.NewClientWithCodec(rpcCodec)
go-codec can be used via its runtime introspection or its code generation support.
The code generation support works by creating codec.Selfer implementation methods.
It integrates seamlessly with everything written in this article.