View articles in the go-codec series, source at http://github.com/ugorji/go
For data transfer between systems to occur, the sending side must encode the data structures into a stream of bytes, and the receiving side must efficiently decode the stream of bytes into a representative data structure.
There is efficient and extensive support for this when using Go as your language runtime. The standard library provides support for two general-purpose encodings: gob and JSON.
In addition, the Go Authors and the community at large provide libraries for popular encodings including msgpack, cbor, protocol buffers, cap’n’proto and binc.
Let’s compare these in the table below:
Encoding | Binary | Streaming | Mandatory Codegen Phase | Symbols |
---|---|---|---|---|
JSON | N | Y | N | N |
msgpack | Y | N | N | N |
cbor | Y | Y | N | N |
protocol buffers | Y | N | Y | Y |
cap’n’proto | Y | N | Y | N |
binc | Y | N | N | Y |
gob | Y | N | N | N |
Each column in the table above shows an orthogonal way of comparing encoding formats.
Text encodings are human-readable and human-writable, and can be examined by the human eye. JSON is a perfect example.
The format of a binary encoding is usually much simpler. Binary encodings do not need separators between values in the stream. They do not need delimiters to separate key-value mappings from sequences, etc.
Because of this, binary encodings have several advantages, compactness among them.
Some would argue that the “compactness” argument can be mitigated by using post-compression. This is true, but that increases the CPU and memory usage when encoding into and decoding from a stream.
Streaming support refers to the ability to encode a sequence of items or key-value pairs into a stream, or decode the same sequence from a stream, without knowing the number of elements in the sequence.
This is required for proper memory management. The encoder need not know the full number of elements before encoding starts. On the other side, the decoder need not reserve a large amount of memory before decoding starts.
Because they use separators and delimiters, many text formats support streaming implicitly. However, many binary formats (e.g. msgpack) do not support streaming natively.
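As a rough illustration of streaming on the decoding side, the sketch below uses the standard library's `encoding/json` decoder to walk a JSON array one element at a time, without knowing its length up front and without holding the whole array in memory. The function names `sumStream` and `sumJSONArray` are hypothetical helpers for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// sumStream decodes a JSON array of numbers element by element and
// returns their sum. It never needs the element count in advance.
func sumStream(r io.Reader) (float64, error) {
	dec := json.NewDecoder(r)

	// Consume the opening '[' delimiter.
	tok, err := dec.Token()
	if err != nil {
		return 0, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return 0, fmt.Errorf("expected start of array, got %v", tok)
	}

	var sum float64
	// More reports whether another element remains in the current array.
	for dec.More() {
		var v float64
		if err := dec.Decode(&v); err != nil {
			return 0, err
		}
		sum += v
	}

	// Consume the closing ']' delimiter.
	if _, err := dec.Token(); err != nil {
		return 0, err
	}
	return sum, nil
}

// sumJSONArray is a convenience wrapper for in-memory input.
func sumJSONArray(doc string) (float64, error) {
	return sumStream(strings.NewReader(doc))
}

func main() {
	total, err := sumJSONArray(`[1, 2, 3.5]`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(total) // 6.5
}
```

The same function works unchanged on a network connection or a file, since it only depends on `io.Reader`.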
Some encoding formats pride themselves on requiring a schema, to dictate the structure of the data and ensure that binary compatibility is maintained as the data changes.
I believe this was a great idea at one time. However, many languages have strict type systems that can enforce the schema without requiring an external compiler to specify the format.
This was a motivating factor in creating gob, Go’s native binary format provided with the standard library.
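A minimal sketch of what this looks like in practice with `encoding/gob`: the Go struct definition plays the role of the schema, and no code generation step is involved. The `Point` type and the `roundTrip` helper are hypothetical names chosen for this example.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Point is a hypothetical type. Its Go definition is the only
// "schema" that gob needs; fields must be exported to be encoded.
type Point struct {
	X, Y  int
	Label string
}

// roundTrip encodes in with gob into a buffer and decodes it back.
func roundTrip(in Point) (Point, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return Point{}, err
	}
	var out Point
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := roundTrip(Point{X: 3, Y: 4, Label: "corner"})
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Printf("%+v\n", out)
}
```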
Symbols are a way of de-duplicating values (especially strings) which repeat a lot in the stream.
Consider a key-value map which has the same set of keys for each object in the stream.
Without symbols, the keys will be repeated unnecessarily, leading to increased CPU and wall time during encoding and decoding.
With symbols, the key is stored once and assigned a symbol, and the symbol is put in the stream wherever that key would have been. This reduces the length of the encoded byte stream, at the expense of slightly increased encoding time, because the encoder must look up the integer symbol mapped to each string value.
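The idea can be sketched with a toy symbol table. This is not the wire format of binc or protocol buffers; the `token` type and the `symbolize` and `desymbolize` functions are hypothetical, and only show the principle: the first occurrence of a string carries its text, and later occurrences carry just an integer.

```go
package main

import "fmt"

// token is one item in a hypothetical symbolized stream.
// A definition (Def == true) carries the string once; every later
// occurrence is a reference that carries only the integer ID.
type token struct {
	ID   int
	Text string
	Def  bool
}

// symbolize replaces repeated strings with integer symbols.
// The map lookup is the extra encoding cost that symbols introduce.
func symbolize(keys []string) []token {
	ids := make(map[string]int)
	out := make([]token, 0, len(keys))
	for _, k := range keys {
		id, seen := ids[k]
		if !seen {
			id = len(ids)
			ids[k] = id
			out = append(out, token{ID: id, Text: k, Def: true})
			continue
		}
		out = append(out, token{ID: id})
	}
	return out
}

// desymbolize rebuilds the original strings from the token stream.
func desymbolize(toks []token) []string {
	names := make(map[int]string)
	out := make([]string, 0, len(toks))
	for _, t := range toks {
		if t.Def {
			names[t.ID] = t.Text
		}
		out = append(out, names[t.ID])
	}
	return out
}

func main() {
	// Two objects that share the same set of keys.
	keys := []string{"name", "email", "name", "email"}
	toks := symbolize(keys)
	for _, t := range toks {
		fmt.Printf("%+v\n", t)
	}
	fmt.Println(desymbolize(toks))
}
```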
The standard library provides support for gob and json.
The argument against gob is that:
There are other high-quality encodings with high-quality libraries available, if gob does not fit your use-case.
Serialization formats enable more efficient remote procedure calls in Go.
Go has a net/rpc package which allows you to use any serialization format of your choosing.
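As a hedged sketch of how a format is plugged into net/rpc, the example below uses the JSON codec from the standard library's `net/rpc/jsonrpc` package over an in-memory `net.Pipe` connection. The `Arith` service, the `Args` type and the `callMultiply` helper are hypothetical; a codec for another format would be passed to `ServeCodec` and `NewClientWithCodec` in the same way.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
	"net/rpc/jsonrpc"
)

// Args is a hypothetical request type.
type Args struct {
	A, B int
}

// Arith is a hypothetical RPC service.
type Arith struct{}

// Multiply follows the method shape that net/rpc requires:
// exported method, two arguments, the second a pointer, error result.
func (*Arith) Multiply(args *Args, reply *int) error {
	*reply = args.A * args.B
	return nil
}

// callMultiply serves Arith over an in-memory pipe using the JSON
// codec and makes a single call against it.
func callMultiply(a, b int) (int, error) {
	srv := rpc.NewServer()
	if err := srv.Register(new(Arith)); err != nil {
		return 0, err
	}

	serverConn, clientConn := net.Pipe()
	// The codec decides the wire format; here it is JSON-RPC.
	go srv.ServeCodec(jsonrpc.NewServerCodec(serverConn))

	client := rpc.NewClientWithCodec(jsonrpc.NewClientCodec(clientConn))
	defer client.Close()

	var product int
	err := client.Call("Arith.Multiply", &Args{A: a, B: b}, &product)
	return product, err
}

func main() {
	product, err := callMultiply(6, 7)
	if err != nil {
		fmt.Println("rpc call failed:", err)
		return
	}
	fmt.Println(product) // 42
}
```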
There are implementations for:
The Go ecosystem provides a number of high-quality packages for popular encoding formats. These packages have been extensively tested and used by companies that trust that the libraries do not introduce dirty data into their systems.
Go forth and transfer your data, knowing that the Go community has your back.