By Ugorji Nwoke   30 May 2013

Announcing Binc data interchange format

Binc is a lightweight, compact, limitless, schema-free, precise, binary, high-performance, feature-rich, language-independent, multi-domain, extensible data interchange format for structured data.

See the format documented at http://www.ugorji.net/project/binc

UPDATE:

See Announcement of enhancements. Highlights:

  1. The Binc spec is now stored at https://github.com/ugorji/binc
  2. Binc now supports symbols, compact variable-length integers and compact floats
  3. Encoded size is now 25% less than in v0.1.0 for representative datasets. v0.1.0 size was already lower than all compared encodings.
  4. Performance is still better than compared encodings.
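One common way to make integers compact is to store only as many big-endian bytes as the value actually needs. The Go sketch below illustrates that general idea only: the function names compactUint and expandUint are hypothetical, and the scheme is an assumption for illustration, not the exact Binc wire format. A real format would also have to record the byte count somewhere, for example in a type/length byte.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// compactUint returns the big-endian bytes of v with leading zero bytes
// removed; at least one byte is always kept. Hypothetical illustration,
// not the Binc wire format.
func compactUint(v uint64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], v)
	i := 0
	for i < 7 && buf[i] == 0 {
		i++
	}
	return buf[i:]
}

// expandUint reverses compactUint by left-padding the bytes with zeros.
func expandUint(b []byte) uint64 {
	var buf [8]byte
	copy(buf[8-len(b):], b)
	return binary.BigEndian.Uint64(buf[:])
}

func main() {
	for _, v := range []uint64{0, 200, 70000, 1 << 40} {
		b := compactUint(v)
		fmt.Printf("%d -> %d byte(s), round-trip %d\n", v, len(b), expandUint(b))
	}
}
```

Small values such as 200 need one byte instead of eight, which is where most of the savings on typical data come from.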

Let’s talk about each of these descriptions one by one:

  1. Lightweight/Compact:
    In tests, Binc encoding has been shown to take up less than 60% of the size of JSON, BSON and other lightweight encodings.

    Care was taken to support compact encodings for common values. For example, signed integers from -1 to 16, booleans, and other special values are encoded in only one byte. For small containers, the size (length) is encoded into a single byte.

  2. Limitless:
    Binc allows for extremely high-precision integers (up to 2^15 bits of precision), both signed and unsigned, and the full spectrum of IEEE 754 floating point types (including decimals, extended-precision binary floats, etc.). Maps and arrays can have lengths that fit into an unsigned 64-bit integer value.

  3. Schema-Free:
    Just like JSON, Binc does not require a schema. This is conceptually an advantage over formats like Protocol Buffers, Thrift, etc., which require a schema and a compilation step before use.

  4. Precise:
    Binc aims to remove all ambiguity in the format. There are distinct signed and unsigned integers, distinct precisions, distinct Unicode string encodings (UTF-8 vs. UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE), a distinct byte array (binary) type, etc.

  5. Binary:
    Binc is a binary encoding format. This affords significant benefits in space (encoded size) and time (real and CPU time for encoding and decoding).

  6. High-Performance:
    By “packing” the bits intelligently while still allowing easy traversal, Binc achieves high encoding and decoding performance. Our tests show encoding and decoding the same structure taking less than 40% of the time that JSON takes.

  7. Feature-Rich:
    By not taking a lowest-common-denominator approach, the codec can represent a larger surface area, even beyond the types natively supported by any target language. JSON, for example, has types which are limited to what JavaScript supports. Binc goes beyond that to support arbitrary-precision signed and unsigned integers, all IEEE 754-2008 floating point types, very large arrays and maps (with sizes up to the maximum value of an unsigned 64-bit integer), special values like NaN, +/- Infinity, etc. Binc also supports rich timestamps (with timezone data, a DST flag, and nanosecond precision) using only 4 to 14 bytes.

  8. Language-Independent:
    Binc is not limited by any specific language. Instead, implementations are free to expose the extent of their support.

  9. Multi-Domain Use:
    By supporting arbitrary precision integers, it is a good fit for scientific data interchange. By supporting precise decimal types, it is a good fit for financial data interchange. Different domains would require different levels of support.

  10. Extensible:
    Binc natively supports user-defined extensions. This allows users to transfer custom types and define how they are encoded and decoded.
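The one-byte encodings described under Lightweight/Compact rely on packing a type and a small value into a single byte. The Go sketch below illustrates that idea only: the tag values, the 4-bit/4-bit nibble split, and the function names (descriptor, encodeSmallInt, decodeSmallInt, encodeBool, encodeArrayHeader) are hypothetical and do not reproduce the byte values defined in the Binc spec.

```go
package main

import "fmt"

// Hypothetical 4-bit type tags stored in the high nibble of a descriptor
// byte. These values are illustrative only; they are NOT Binc's actual tags.
const (
	tagSpecial  byte = 0x0 // low nibble: 0=false, 1=true, 2=zero, 3=minus one
	tagSmallInt byte = 0x1 // low nibble: value-1, covering 1..16
	tagArray    byte = 0x2 // low nibble: element count, covering 0..14
)

// descriptor packs a type tag and a 4-bit payload into one byte.
func descriptor(tag, low byte) byte {
	return tag<<4 | low&0x0F
}

// encodeSmallInt returns the one-byte encoding of v, or ok=false when v lies
// outside -1..16 and would need a longer (not shown) encoding.
func encodeSmallInt(v int64) (b byte, ok bool) {
	switch {
	case v == -1:
		return descriptor(tagSpecial, 3), true
	case v == 0:
		return descriptor(tagSpecial, 2), true
	case v >= 1 && v <= 16:
		return descriptor(tagSmallInt, byte(v-1)), true
	}
	return 0, false
}

// decodeSmallInt reverses encodeSmallInt.
func decodeSmallInt(b byte) (v int64, ok bool) {
	tag, low := b>>4, b&0x0F
	switch {
	case tag == tagSmallInt:
		return int64(low) + 1, true
	case tag == tagSpecial && low == 2:
		return 0, true
	case tag == tagSpecial && low == 3:
		return -1, true
	}
	return 0, false
}

// encodeBool stores a boolean entirely in the descriptor byte.
func encodeBool(v bool) byte {
	if v {
		return descriptor(tagSpecial, 1)
	}
	return descriptor(tagSpecial, 0)
}

// encodeArrayHeader puts a small array length in the same byte as the type.
// Arrays with 15 or more elements would need extra length bytes (not shown).
func encodeArrayHeader(n int) (b byte, ok bool) {
	if n >= 0 && n < 15 {
		return descriptor(tagArray, byte(n)), true
	}
	return 0, false
}

func main() {
	for _, v := range []int64{-1, 0, 1, 16, 17} {
		b, ok := encodeSmallInt(v)
		fmt.Printf("%3d -> 0x%02X (one byte: %v)\n", v, b, ok)
	}
	h, _ := encodeArrayHeader(3)
	fmt.Printf("true -> 0x%02X, 3-element array header -> 0x%02X\n", encodeBool(true), h)
}
```

Because the integers -1 to 16 span 18 values but a nibble holds only 16, the sketch places -1 and 0 among the "special" values alongside booleans; values outside the range fall back to a longer encoding.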

The decision to create Binc was not made lightly. Features and performance were analyzed extensively, and other schema-free binary codecs were evaluated before Binc was created. These include:

  1. bson: verbose format with features in use only by mongodb
  2. bjson: simplistic, lacks features
  3. ubjson: stays too true to json. lacks extensions, binary support
  4. msgpack: lacks timestamp, binary and extensions
  5. tnetstrings: simplistic and lacking features
  6. smile: complex. lacking features
  7. binary plist: simplistic and lacking features
  8. protocol buffers, thrift, avro: require schema and pre-compilation step

In particular, my application's use case required extreme compactness and high encoding/decoding performance without compression. I also required precise support for timestamps, user-defined extensions, and distinct binary and string types. None of these encodings supported these features natively.

The closest I got was msgpack, which I had standardized on. I engaged the community and the author to include timestamps and distinct binary and string types. However, after a few months of working on it, progress halted and could not be restarted (see https://github.com/msgpack/msgpack/issues/128).

Regardless, I believe Binc has significant features beyond those provided by msgpack, and stands tall on its own.

We implemented a Binc encoder/decoder using the same high-performance codec library used to build the de facto and best-performing msgpack encoder/decoder for the Go language, and ran extensive benchmarks against other encoders. The results are reproduced below, and show the 40% savings in data size and 60% savings in time for encoding and decoding vs. others.

..............................................
Benchmark: 
    Struct recursive Depth:             2
    ApproxDeepSize Of benchmark Struct: 136311 bytes
Benchmark One-Pass Run (with Unscientific Encode/Decode times): 
       msgpack: len: 72599 bytes,    encode: 518.595µs,  decode: 485.455µs
          binc: len: 65777 bytes,    encode: 270.782µs,  decode: 457.233µs
        simple: len: 74465 bytes,    encode: 264.539µs,  decode: 407.068µs
          cbor: len: 72131 bytes,    encode: 242.007µs,  decode: 479.807µs
          json: len: 92013 bytes,    encode: 635.15µs,   decode: 839.028µs
      std-json: len: 92429 bytes,    encode: 684.876µs,  decode: 2.338646ms
           gob: len: 64701 bytes,    encode: 524.783µs,  decode: 428.706µs
     v-msgpack: len: 72573 bytes,    encode: 1.184547ms, decode: 974.376µs
          bson: len: 100582 bytes,   encode: 884.651µs,  decode: 1.19786ms
..............................................
Benchmark__Msgpack____Encode-8             10000        183961 ns/op       10224 B/op         75 allocs/op
Benchmark__Binc_______Encode-8             10000        206362 ns/op       12551 B/op         80 allocs/op
Benchmark__Simple_____Encode-8             10000        193966 ns/op       10224 B/op         75 allocs/op
Benchmark__Cbor_______Encode-8             10000        192666 ns/op       10224 B/op         75 allocs/op
Benchmark__Json_______Encode-8              3000        475767 ns/op       10352 B/op         75 allocs/op
Benchmark__Std_Json___Encode-8              3000        525223 ns/op      256049 B/op        835 allocs/op
Benchmark__Gob________Encode-8              5000        270550 ns/op      333548 B/op        959 allocs/op
Benchmark__Bson_______Encode-8              2000        747360 ns/op      715539 B/op       5629 allocs/op
Benchmark__VMsgpack___Encode-8              2000        637388 ns/op      320385 B/op        542 allocs/op
Benchmark__Msgpack____Decode-8              5000        370340 ns/op      120352 B/op       1210 allocs/op
Benchmark__Binc_______Decode-8              3000        443650 ns/op      126144 B/op       1263 allocs/op
Benchmark__Simple_____Decode-8              3000        381155 ns/op      120352 B/op       1210 allocs/op
Benchmark__Cbor_______Decode-8              5000        370754 ns/op      120352 B/op       1210 allocs/op
Benchmark__Json_______Decode-8              2000        719658 ns/op      159289 B/op       1478 allocs/op
Benchmark__Std_Json___Decode-8              1000       2204258 ns/op      276336 B/op       6959 allocs/op
Benchmark__Gob________Decode-8              5000        383884 ns/op      256684 B/op       3261 allocs/op
Benchmark__Bson_______Decode-8              2000       1146851 ns/op      373121 B/op      15703 allocs/op
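Relative sizes can be read directly off the len column of the one-pass run. The short Go program below (the names encodedSizes and bincRatio are illustrative) computes Binc's encoded size as a percentage of each other encoding in that run. It uses only the numbers printed above and makes no new measurements.

```go
package main

import (
	"fmt"
	"sort"
)

// encodedSizes holds the "len" column (bytes) from the one-pass run above.
var encodedSizes = map[string]int{
	"msgpack":   72599,
	"binc":      65777,
	"simple":    74465,
	"cbor":      72131,
	"json":      92013,
	"std-json":  92429,
	"gob":       64701,
	"v-msgpack": 72573,
	"bson":      100582,
}

// bincRatio returns Binc's encoded size as a fraction of another encoding's.
func bincRatio(other string) float64 {
	return float64(encodedSizes["binc"]) / float64(encodedSizes[other])
}

func main() {
	names := make([]string, 0, len(encodedSizes))
	for name := range encodedSizes {
		if name != "binc" {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	for _, name := range names {
		fmt.Printf("binc vs %-9s: %5.1f%% of the size\n", name, 100*bincRatio(name))
	}
}
```

The same approach applies to the ns/op column of the go test benchmarks if time ratios are wanted instead of size ratios.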

We hope you find good use for the Binc format.

If you are looking for a high-performance library for it, please check out the Go library codec at https://github.com/ugorji/go/tree/master/codec#readme . You can find API docs for it at http://godoc.org/github.com/ugorji/go/codec .

© Ugorji Nwoke