
Data Serialization

I always read that JSON is text based and protobuf is binary. Still, one thing confused me: in the end, everything is stored and sent as binary, so what is the actual difference?


Text Based

In text based methods like JSON, the whole payload is text. Every character is turned into bytes with a character encoding, usually UTF-8.

  1. Integers and decimals are also treated as characters. Each digit is encoded separately, so 150 takes 3 bytes.
  2. Even the boolean values true and false are written out as 4 and 5 characters.
  3. Every quote, bracket, comma and colon is encoded as a character too.

Text based serialization provides human readability. Because everything is stored as characters, it is easy to decode the whole payload and read the contents.
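
To see this for yourself, here is a minimal Python sketch. The payload and its field names (`id`, `active`) are made up for illustration. It serializes a small object to JSON and prints the bytes that would go on the wire.

```python
import json

# Hypothetical payload, used only for illustration.
payload = {"id": 150, "active": True}

# Compact separators drop the optional spaces after ',' and ':'.
text = json.dumps(payload, separators=(",", ":"))
data = text.encode("utf-8")

print(text)       # {"id":150,"active":true}
print(len(data))  # 24

# One byte per character: the number 150 becomes the three
# ASCII digits '1', '5', '0' (hex 31 35 30).
print(" ".join(f"{b:02x}" for b in data))
```

The 24 bytes include the quotes, the braces, the key names, and the word `true`, which is why the payload stays readable in any text editor.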

Binary Based

In binary based methods like Protobuf, the encoding aims to shrink the payload. It writes binary values for the data directly.

  1. Integers are converted to binary as varints, a variable-length encoding where small values take fewer bytes.
  2. Decimals are converted to binary as IEEE 754 floating point numbers.
  3. Strings are still encoded using UTF-8.
  4. A boolean value takes just 1 byte.
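
The four rules above can be sketched in plain Python. This is a hand-written illustration and not the protobuf library itself. `encode_varint` is a helper name I made up, and it only handles non-negative integers. It also shows only the value bytes. On the wire, every field carries a small tag as well, and strings carry a length prefix.

```python
import struct


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (illustrative helper)."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = n & 0x7F          # take the lowest 7 bits
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # set the high bit: more bytes follow
        else:
            out.append(low7)         # last byte: high bit stays clear
            return bytes(out)


# 1. Integer: 150 fits in 2 bytes instead of the 3 characters "150".
print(encode_varint(150).hex())        # 9601

# 2. Decimal: an 8-byte little-endian IEEE 754 double.
print(struct.pack("<d", 1.5).hex())    # 000000000000f83f

# 3. String: still plain UTF-8 bytes.
print("hello".encode("utf-8").hex())   # 68656c6c6f

# 4. Boolean: True is stored as the varint 1, which is a single byte.
print(encode_varint(int(True)).hex())  # 01
```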

Binary encoding is just one protobuf feature. It has others too. For example, it does not write the key names into the payload. Each field is identified by a small field number defined in the schema. This shrinks the payload even more.
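
Here is a rough sketch of what dropping the key names buys you. I assume a hypothetical schema where `id` is field 1 and `active` is field 2, both stored as varints. The helper names are mine and the encoding is written by hand, so treat it as an illustration of the wire format and not as the real protobuf API.

```python
import json


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (illustrative helper)."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Write one field as a tag followed by a varint value."""
    wire_type_varint = 0
    tag = (field_number << 3) | wire_type_varint  # field number replaces the key name
    return encode_varint(tag) + encode_varint(value)


# Hypothetical schema: id = 1, active = 2.
message = encode_varint_field(1, 150) + encode_varint_field(2, int(True))

as_json = json.dumps({"id": 150, "active": True}, separators=(",", ":")).encode("utf-8")

print(message.hex())                # 0896011001
print(len(message), len(as_json))   # 5 24
```

The same two values take 5 bytes here against 24 bytes of JSON. The price is that the bytes mean nothing without the schema that maps field numbers back to names.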