View articles in the go-codec
series, source at http://github.com/ugorji/go
go-codec supports compile-time generation of encoders and decoders for named types. The generated code does not incur the overhead of reflection in the typical case, giving a 40% to 100% performance improvement over the idiomatic runtime introspection mode.
Idiomatic encoding and decoding of types in go typically relies on the reflection capabilities of the go runtime. This affords flexibility without the need for a pre-compilation step: the go types contain all the information needed, and the runtime exposes the full types via reflection. However, introspecting the runtime to get this information has a noticeable overhead.
To eliminate that overhead, a pre-compilation (code-generation) step must create, ahead of time, the code which would otherwise have been inferred at runtime. This is why Protocol Buffers, Avro, etc. have better performance than runtime-based systems.
go-codec now provides the same capabilities, with the accompanying 2X-20X performance improvement depending on the size and structure of the named type.
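To make the contrast concrete, here is a minimal sketch using only the standard library. It is not go-codec code: `Point`, `encodeReflect` and `encodeGenerated` are hypothetical names invented for this illustration, and the output format is made up. The first function discovers the fields at runtime, while the second is the kind of type-specific code a generator could write ahead of time.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"strings"
)

// Point is a hypothetical named type used only for this illustration.
type Point struct {
	X int
	Y int
}

// encodeReflect discovers the fields of a struct of ints at runtime.
func encodeReflect(v interface{}) string {
	rv := reflect.ValueOf(v)
	rt := rv.Type()
	var sb strings.Builder
	sb.WriteByte('{')
	for i := 0; i < rt.NumField(); i++ {
		if i > 0 {
			sb.WriteByte(',')
		}
		sb.WriteString(rt.Field(i).Name)
		sb.WriteByte(':')
		sb.WriteString(strconv.FormatInt(rv.Field(i).Int(), 10))
	}
	sb.WriteByte('}')
	return sb.String()
}

// encodeGenerated is what a generator could emit for Point:
// field names and types are fixed when the code is written.
func encodeGenerated(p *Point) string {
	return "{X:" + strconv.Itoa(p.X) + ",Y:" + strconv.Itoa(p.Y) + "}"
}

func main() {
	p := Point{X: 1, Y: 2}
	fmt.Println(encodeReflect(p))
	fmt.Println(encodeGenerated(&p))
}
```

Both functions produce the same text. The difference is that the second needs no type lookups and no `reflect.Value` wrappers while it runs.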
Let us start with some benchmark numbers to whet your appetite.
Encoding - Runtime
Benchmark__Msgpack____Encode-8 14095 84318 ns/op 3192 B/op 44 allocs/op
Benchmark__Binc_______Encode-8 14058 85184 ns/op 3192 B/op 44 allocs/op
Benchmark__Simple_____Encode-8 13978 85796 ns/op 3192 B/op 44 allocs/op
Benchmark__Cbor_______Encode-8 13983 87215 ns/op 3192 B/op 44 allocs/op
Benchmark__Json_______Encode-8 6051 188551 ns/op 3256 B/op 44 allocs/op
Benchmark__Std_Json___Encode-8 5514 218973 ns/op 74474 B/op 444 allocs/op
Benchmark__Gob________Encode-8 6646 177393 ns/op 170414 B/op 591 allocs/op
Benchmark__Bson_______Encode-8 4936 239069 ns/op 222828 B/op 364 allocs/op
Encoding - CodeGen
Benchmark__Msgpack____Encode-8 28369 41501 ns/op 288 B/op 2 allocs/op
Benchmark__Binc_______Encode-8 26284 45098 ns/op 288 B/op 2 allocs/op
Benchmark__Simple_____Encode-8 26959 44700 ns/op 288 B/op 2 allocs/op
Benchmark__Cbor_______Encode-8 26628 44320 ns/op 288 B/op 2 allocs/op
Benchmark__Json_______Encode-8 8064 141844 ns/op 352 B/op 2 allocs/op
Decoding - Runtime
Benchmark__Msgpack____Decode-8 5866 203320 ns/op 67387 B/op 913 allocs/op
Benchmark__Binc_______Decode-8 5438 223080 ns/op 67390 B/op 913 allocs/op
Benchmark__Simple_____Decode-8 5958 203158 ns/op 67360 B/op 913 allocs/op
Benchmark__Cbor_______Decode-8 5793 206755 ns/op 67373 B/op 913 allocs/op
Benchmark__Json_______Decode-8 3105 390624 ns/op 89300 B/op 1041 allocs/op
Benchmark__Std_Json___Decode-8 1365 855218 ns/op 138558 B/op 3032 allocs/op
Benchmark__Gob________Decode-8 4135 296280 ns/op 156140 B/op 2242 allocs/op
Benchmark__Bson_______Decode-8 2582 467415 ns/op 183853 B/op 4085 allocs/op
Decoding - CodeGen
Benchmark__Msgpack____Decode-8 9934 121373 ns/op 64070 B/op 871 allocs/op
Benchmark__Binc_______Decode-8 9210 131006 ns/op 64072 B/op 871 allocs/op
Benchmark__Simple_____Decode-8 9733 122189 ns/op 64068 B/op 871 allocs/op
Benchmark__Cbor_______Decode-8 9968 123628 ns/op 64085 B/op 871 allocs/op
Benchmark__Json_______Decode-8 4257 283405 ns/op 87471 B/op 1002 allocs/op
We see that the encoding and decoding times for the binary formats supported by go-codec are pretty similar, so we will just use cbor as representative of the binary formats, and also compare json benchmark numbers.
The table below shows the cost of encoding using runtime support only, expressed as a multiple of the code generation baseline.
Format | Time | Memory | Allocations |
---|---|---|---|
Cbor | 2.0 X | 11 X | 22 X |
Json | 1.3 X | 9 X | 22 X |
The table below shows the cost of decoding using runtime support only, expressed as a multiple of the code generation baseline.
Format | Time | Memory | Allocations |
---|---|---|---|
Cbor | 1.7 X | 1.05 X | 1.05 X |
Json | 1.4 X | 1.02 X | 1.04 X |
There is a very clear benefit to code generation. Code generation gives you better performance in clock time, cpu time and memory usage/allocations. The benefits, especially in memory use, are more pronounced during encoding than during decoding.
I call shenanigans! Reflection in go is not slow. In fact, interfaces, type switches, etc. use the same runtime introspection mechanism under the hood that reflection does.
Let me explain. In go, reflection is a thin layer of runtime introspection support. There is a small computational cost to compute or expose requested information about the types already known to the runtime, or to create new values and return a wrapper (reflect.Value) around them.
However, that thin layer requires that most values be allocated on the heap, and the use of interfaces prevents benefits of escape analysis and inlining. We see a consistent overhead of about 35% added by the runtime.
Note that reflection is an intrinsic part of the go runtime, and is used in core foundational packages like fmt.
codecgen works off a single interface.
type Selfer interface {
	CodecEncodeSelf(*Encoder)
	CodecDecodeSelf(*Decoder)
}
When encoding or decoding a type, if it implements the codec.Selfer interface above, then it will handle its own encoding and decoding. The Encoder/Decoder checks for this first, before it checks for extension support or whether the type implements the encoding.(Text|Binary)(M|Unm)arshaler interfaces.
NOTE: the Canonical option is ignored (not supported AT THIS TIME). If you need Canonical support (e.g. for cbor), then do not use codecgen.
codecgen uses this knowledge to generate type-safe code which does exactly what the regular runtime introspection code does at run-time. It is an amazing feat.
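The order of checks can be sketched with stand-in types. This is not the real codec package: `Encoder` here is a hypothetical toy that writes to a string, `Selfer` mirrors only the encode half of the interface, extension support is left out, and `Celsius` is an invented example type.

```go
package main

import (
	"encoding"
	"fmt"
	"strings"
)

// Encoder is a hypothetical stand-in for codec.Encoder.
type Encoder struct{ out strings.Builder }

// Selfer mirrors the encode half of the interface shown above.
type Selfer interface {
	CodecEncodeSelf(*Encoder)
}

// Encode checks for Selfer first, then for encoding.TextMarshaler,
// and only then falls back to a generic path.
func (e *Encoder) Encode(v interface{}) string {
	e.out.Reset()
	switch x := v.(type) {
	case Selfer:
		x.CodecEncodeSelf(e)
	case encoding.TextMarshaler:
		b, _ := x.MarshalText() // error ignored for brevity
		e.out.Write(b)
	default:
		fmt.Fprintf(&e.out, "generic(%v)", v)
	}
	return e.out.String()
}

// Celsius implements both interfaces; the Selfer method wins.
type Celsius float64

func (c Celsius) CodecEncodeSelf(e *Encoder) {
	fmt.Fprintf(&e.out, "selfer(%.1f)", float64(c))
}

func (c Celsius) MarshalText() ([]byte, error) {
	return []byte("text"), nil
}

func main() {
	var e Encoder
	fmt.Println(e.Encode(Celsius(21.5)))
	fmt.Println(e.Encode(42))
}
```

Because the Selfer case comes first in the type switch, a type with generated methods never reaches the slower fallback paths.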
With codecgen, the full feature-set of codec is still supported, including:
codecgen builds fully atop the go-codec package. We needed it to work exactly as the runtime introspection works, so we can leverage all the IP built into the package already.
go-codec at runtime will parse each type needed and create an in-memory structure specifying all important information about the type. codecgen uses all that information and replicates the runtime logic exactly.
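codecgen runs in multiple phases:
1. Parse the input files and gather the named types that need a codec.Selfer implementation.
2. Create a transient file that calls the codec.Gen(...) function, passing in all the types gathered.
3. Execute the transient file with go run -tags=XYZ transient-file.go.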
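Gathering the types has to happen before anything can be passed to codec.Gen(...). One plausible way to collect the named types declared in a Go source file is with the standard go/parser and go/ast packages, as sketched below. This is an assumption about the approach, not a copy of codecgen's code; `namedTypes` and `sample` are names made up for the example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// namedTypes returns the names of all types declared at the top level of src.
func namedTypes(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		gen, ok := decl.(*ast.GenDecl)
		if !ok || gen.Tok != token.TYPE {
			continue // skip funcs, vars, consts and imports
		}
		for _, spec := range gen.Specs {
			if ts, ok := spec.(*ast.TypeSpec); ok {
				names = append(names, ts.Name.Name)
			}
		}
	}
	return names, nil
}

// sample is a tiny input file used only for this illustration.
const sample = `package demo

type Order struct{ ID int }
type Tags []string

func helper() {}
`

func main() {
	names, err := namedTypes(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```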
The transient file looks like this (error handling removed for conciseness):
fout, err := os.Create("values_codecgen_generated_test.go")
var out bytes.Buffer
var typs []reflect.Type
var t0 codec.AnonInTestStruc
typs = append(typs, reflect.TypeOf(t0))
var t1 codec.AnonInTestStrucIntf
typs = append(typs, reflect.TypeOf(t1))
// <snip>
codec.Gen(&out, "codecgen", "codec", false, typs...)
bout, err := format.Source(out.Bytes())
fout.Write(bout)
The generated file looks like this (details elided):
func (x *MyType) CodecEncodeSelf(e *Encoder) {
	// generated encoding logic elided
}
func (x *MyType) CodecDecodeSelf(d *Decoder) {
	// generated decoding logic elided
}
Using codecgen is very straightforward.
Download and install the tool
go get -u github.com/ugorji/go/codec/codecgen
Run the tool on your files
The command line format is:
codecgen [options] (-o outfile) (infile ...)
% codecgen -?
Usage of codecgen:
-c string
codec path (default "github.com/ugorji/go/codec")
-d int
random identifier for use in generated code
-nr string
regex for type name to exclude (default "^$")
-nx
do not support extensions - support of extensions may cause extra allocation
-o string
out file
-r string
regex for type name to match (default ".*")
-rt string
tags for go run
-st string
struct tag keys to introspect (default "codec,json")
-t string
build tag to put in file
-x
keep temp file
% codecgen -o values_codecgen.go values.go values2.go moretypedefs.go
That is it.
Option | Description |
---|---|
-o | codecgen will generate a single output file. |
-c | If you have vendored the codec package into a different place, use this option to specify a different package path for the codec package. Most users do not need this. |
-t | Users may want the generated file to be used only when specific build tags are specified. You can pass some tags and the generated file will have them. |
-st | Customize the struct tag keys to introspect (default "codec,json"). |
-rt | codecgen runs by creating a temporary file, and then using go run to execute it. If the file that you are generating values against needs a build tag, specify it to the codecgen tool. |
-x | A debugging switch: keep the transient file which is passed to go run, instead of deleting it. |
-d | Specify the random integer used during codecgen. This helps reduce churn in generated output, etc. |
-r | Specify regex for type name to match (default ".*"). |
-nr | Specify regex for type name to exclude (default "^$"). |
-nx | Do not support extensions in generated files. This may help reduce some allocation if you know that you never use extensions. |
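The interplay of -r and -nr can be pictured as an include filter followed by an exclude filter over type names. The sketch below is an assumption about the semantics, not the tool's actual code: it uses unanchored `regexp.MatchString`, and `selectTypes` is a made-up helper name. With the defaults, ".*" includes every name and "^$" excludes only the empty name, so nothing is filtered out.

```go
package main

import (
	"fmt"
	"regexp"
)

// selectTypes keeps the names that match include and do not match exclude.
func selectTypes(names []string, include, exclude string) ([]string, error) {
	inc, err := regexp.Compile(include)
	if err != nil {
		return nil, err
	}
	exc, err := regexp.Compile(exclude)
	if err != nil {
		return nil, err
	}
	var kept []string
	for _, n := range names {
		if inc.MatchString(n) && !exc.MatchString(n) {
			kept = append(kept, n)
		}
	}
	return kept, nil
}

func main() {
	names := []string{"Order", "OrderTest", "Tags"}

	all, _ := selectTypes(names, ".*", "^$") // the documented defaults
	fmt.Println(all)

	some, _ := selectTypes(names, "^Order", "Test$")
	fmt.Println(some)
}
```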
Yes, codecgen can be used easily with go generate.
The easiest way is to create a file, add the generate tag to it, and call codecgen in it. A sample file looks like this:
//go:build generate
// +build generate

package mypackage

//go:generate codecgen -o values.generated.go file1.go file2.go file3.go
Run go generate in the directory containing the file.
go-codec updates an internal version each time an incompatible change occurs to the library. Within an init function, we check that the generated code matches the current supporting library. If the check fails, we panic in the init so that the application never starts until the user updates.
The error message looks like:
codecgen version mismatch: current: 1, need 2. Re-generate file: /home/ugorji/depot/repo/src/ugorji.net/codec/values_codecgen_generated_test.go
If you get a similar panic message, either use the older version of the library that matches your generated code, or regenerate your file.
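A minimal sketch of such a check follows. The constants, the `checkVersion` helper and the file name are hypothetical, and the assumption that "current" is the version baked into the generated file while "need" is the library's version is an interpretation of the message above, not something taken from the generated code.

```go
package main

import "fmt"

// Hypothetical versions: one recorded in the generated file when it
// was produced, one reported by the supporting library.
const (
	generatedVersion = 2
	libraryVersion   = 2
)

// checkVersion returns an error shaped like the message above
// when the two versions differ.
func checkVersion(current, need int, file string) error {
	if current != need {
		return fmt.Errorf("codecgen version mismatch: current: %d, need %d. Re-generate file: %s",
			current, need, file)
	}
	return nil
}

// init runs before main, so a mismatch stops the program at startup.
func init() {
	if err := checkVersion(generatedVersion, libraryVersion, "values_codecgen.go"); err != nil {
		panic(err)
	}
}

func main() {
	// Demonstrate the failing case without panicking.
	fmt.Println(checkVersion(1, 2, "values_codecgen.go"))
}
```

Failing in init means stale generated code is caught at startup rather than showing up later as a subtle encoding bug.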
There are a few other code-generation libraries created for specific formats. They had issues which I will list below:
msgp https://github.com/philhofer/msgp/
The others were non-starters, as they failed to generate implementations for TestStruc.
megajson https://github.com/benbjohnson/megajson
Error: Field contains no name: &{<nil> [] AnonInTestStruc <nil> <nil>}:
ffjson https://github.com/pquerna/ffjson
panic: runtime error: index out of range
bsongen http://godoc.org/github.com/youtube/vitess/go/cmd/bsongen
&{Struct:696 Fields:0xc208063380 Incomplete:false} is not a simple type