By Ugorji Nwoke   15 Dec 2014   /blog   technology go-codec

Re-Introducing Go Codec Library: msgpack, binc, cbor, json and more formats

The go-codec library is a high-performance, feature-rich and idiomatic Go encoding/decoding library for binc, msgpack, cbor and json, with support for both runtime reflection and compile-time code generation. View the source at http://github.com/ugorji/go.

Sometime in 2013, we announced go-codec as a library for msgpack. The go-codec library has come a long way since then.


NOTE: *This article is part of a series on go-codec, which includes*:

  1. Serialization in Go
  2. Re-Introducing Go Codec Library: msgpack, binc, cbor, json and more formats
  3. Supporting CBOR Binary Format
  4. Yet Another JSON library???
  5. Benchmarking Serialization in Go
  6. How we gain such extreme performance
  7. Code generation for even more performance
  8. Detailed primer on how to use the go-codec

To whet your appetite, here are some benchmark numbers comparing encoding/json from the standard library with the json support offered by go-codec. The raw data and a quick analysis of the results appear at the end of this article.

New Features provided by go-codec

Currently, go-codec provides best-of-breed support for the following formats:

  1. msgpack: binary
  2. cbor: binary, streaming, explicit map/array delimited NEW
  3. json: text, streaming, explicit delimited NEW
  4. binc: binary, symbols

All these formats inherit the go-codec features described below.

This update provides the following new features:

  1. Much improved performance
  2. Fast (no-reflection) encoding/decoding of common maps and slices.
    The fast non-reflection support is enabled for all combinations of builtin types of maps and slices e.g. map[string]uint32, []int16, etc.
  3. Support for code generation
    This gives a 2X to 20X performance improvement over the already stellar runtime performance.
  4. Support for text-based formats i.e. json
  5. Support for IETF proposed Internet-Of-Things format i.e. cbor
  6. Support for indefinite-length formats to enable true streaming
  7. Read only what is needed
    This allows a stream to mix encoded data with other data. For example, a stream may contain some msgpack-encoded data, then a \r\n delimiter, then some json-encoded data. go-codec supports this efficiently, because it never reads more from the stream than it needs and it does no buffering of its own.
  8. NEVER silently skip data when decoding
    User decides whether to return an error or silently skip data when keys or indexes in the data stream do not map to fields in the struct.
  9. Drop-in replacement for encoding/json. json: key in struct tag supported.

This is in addition to the features already supported, but now made more robust and fully supported via all encoding/decoding paths i.e. runtime reflection, code generation and fast-path for common maps and slices:

  1. Decode based on the destination data structure.
    For example,
    • decode a uint64 from any kind of number in the data stream (float, unsigned integer, signed integer, etc)
    • decode a string from a string or binary byte array in the data stream
  2. Support NIL in the stream in multiple contexts,
    decoding it as the zero-value of the data structure.
  3. Full support for encoding.(Text|Binary)(M|Unm)arshaler interfaces.
  4. Decoding without a schema (into an interface{}).
    This means decoding into a nil interface{}. Users can specify the type of maps and slices to use; these default to map[interface{}]interface{} and []interface{} respectively.
  5. RPC Server and Client Codecs for integration with net/rpc. This allows the seamless use of the go-codec library for rpc. You do not have to use gob.
  6. Standard field renaming via tags
  7. Support for omitting empty fields during an encoding.
    If a field has the zero value, it can be skipped. This reduces the encoded length and the decoding time. Because skipped fields are absent from the stream, decoding leaves the corresponding destination fields untouched, so make sure that the value being decoded into is a zero-value or a struct which has all fields initialized to their zero-values.
  8. Extensions to support efficient encoding/decoding of any named types.
    For example, type XYZ [8]uint8, type Point3D struct { X, Y, Z uint8 }. A user can encode and decode XYZ or Point3D above to/from a single unsigned integer.
  9. Encode a struct as an array, and decode struct from an array in the data stream.
    This is more compact and efficient, but requires that the exported fields of the struct stay in the same order.
  10. Comprehensive support for anonymous fields.
    Whether the anonymous field is a pointer or a value, codec will handle it.
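
To make the extension idea in item 8 concrete, here is a rough sketch of the conversion such an extension could perform: packing each named type into a single unsigned integer and back. It uses only the standard library and does not show how an extension is registered with go-codec. The function names are hypothetical, and the struct is spelled Point3D because a Go identifier cannot start with a digit.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// XYZ and Point3D mirror the named types used as examples in the text.
type XYZ [8]uint8

type Point3D struct{ X, Y, Z uint8 }

// packPoint stores X, Y and Z in the low three bytes of a uint64.
func packPoint(p Point3D) uint64 {
	return uint64(p.X)<<16 | uint64(p.Y)<<8 | uint64(p.Z)
}

// unpackPoint reverses packPoint.
func unpackPoint(u uint64) Point3D {
	return Point3D{X: uint8(u >> 16), Y: uint8(u >> 8), Z: uint8(u)}
}

// packXYZ treats the eight bytes as one big-endian unsigned integer.
func packXYZ(a XYZ) uint64 {
	return binary.BigEndian.Uint64(a[:])
}

// unpackXYZ reverses packXYZ.
func unpackXYZ(u uint64) XYZ {
	var a XYZ
	binary.BigEndian.PutUint64(a[:], u)
	return a
}

func main() {
	p := Point3D{X: 1, Y: 2, Z: 3}
	u := packPoint(p)
	fmt.Printf("%+v -> %#x -> %+v\n", p, u, unpackPoint(u))

	a := XYZ{1, 2, 3, 4, 5, 6, 7, 8}
	v := packXYZ(a)
	fmt.Printf("%v -> %#x -> %v\n", a, v, unpackXYZ(v))
}
```

The value in the stream is then one integer instead of a three-field map or an eight-element array, which is where the space and time savings would come from.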

Accompanying raw data for timing results shown

ENCODE (RUNTIME)
Benchmark__Std_Json___Encode	    5000	    124477 ns/op	   16313 B/op	     207 allocs/op
Benchmark__Json_______Encode	   10000	    108092 ns/op	    9267 B/op	      70 allocs/op

ENCODE (CODE GENERATION)
Benchmark__Json_______Encode	   20000	     33873 ns/op	     304 B/op	       3 allocs/op

DECODE (RUNTIME)
Benchmark__Std_Json___Decode	    2000	    359771 ns/op	   17992 B/op	     629 allocs/op
Benchmark__Json_______Decode	    3000	    205337 ns/op	   15344 B/op	     455 allocs/op

DECODE (CODE GENERATION)
Benchmark__Json_______Decode	    5000	    103326 ns/op	    8680 B/op	     210 allocs/op

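You can see that, even without code generation, go-codec outperforms the standard library. During encode, the standard library takes about 15% more time and makes almost three times as many allocations; during decode, it takes about 75% more time. With code generation, go-codec encodes about 3.7 times faster and decodes about 3.5 times faster than the standard library.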
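
The comparison can be recomputed directly from the raw data. The short program below is only a convenience sketch: the numbers are copied from the benchmark output above, and the type and function names are made up for illustration.

```go
package main

import "fmt"

// result holds two columns of one benchmark line from the raw data.
type result struct {
	nsPerOp     float64
	allocsPerOp float64
}

// ratio reports how many times larger a is than b.
func ratio(a, b float64) float64 { return a / b }

func main() {
	// Values copied from the benchmark output in this article.
	stdEncode := result{nsPerOp: 124477, allocsPerOp: 207}
	codecEncode := result{nsPerOp: 108092, allocsPerOp: 70}
	codecEncodeGen := result{nsPerOp: 33873, allocsPerOp: 3}

	stdDecode := result{nsPerOp: 359771, allocsPerOp: 629}
	codecDecode := result{nsPerOp: 205337, allocsPerOp: 455}
	codecDecodeGen := result{nsPerOp: 103326, allocsPerOp: 210}

	fmt.Printf("encode time,   std vs runtime: %.2fx\n", ratio(stdEncode.nsPerOp, codecEncode.nsPerOp))
	fmt.Printf("encode allocs, std vs runtime: %.2fx\n", ratio(stdEncode.allocsPerOp, codecEncode.allocsPerOp))
	fmt.Printf("encode time,   std vs codegen: %.2fx\n", ratio(stdEncode.nsPerOp, codecEncodeGen.nsPerOp))

	fmt.Printf("decode time,   std vs runtime: %.2fx\n", ratio(stdDecode.nsPerOp, codecDecode.nsPerOp))
	fmt.Printf("decode allocs, std vs runtime: %.2fx\n", ratio(stdDecode.allocsPerOp, codecDecode.allocsPerOp))
	fmt.Printf("decode time,   std vs codegen: %.2fx\n", ratio(stdDecode.nsPerOp, codecDecodeGen.nsPerOp))
}
```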

Getting this level of performance was no easy feat. But it was possible because we built the go-codec library to be high-performance, and support pluggable Handles. Each of these handles comes to about 500 lines of code.

The other articles in this go-codec series dig deeper into features of the library.

© Ugorji Nwoke