An Introduction to Protocol Buffers with Ruby

If you build APIs for the web, you’re likely familiar with JSON. It’s a well-known format for exchanging data between servers, it has great serialization/deserialization support across languages, and it’s human readable. All around, JSON is a great tool with many benefits.

Choosing a data exchange format for your services can feel like a no-brainer: use JSON. But before settling on a format, it’s worth knowing the alternatives.

An alternative to JSON is a technology called Protocol Buffers. Developed by Google, it was an internal tool for about seven years before being released to the public in 2008. If you’re familiar with Apache Thrift, you should know that Protocol Buffers are quite similar. From the website:

Protocol buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data.

So how does it work? The process looks like this:

  • Define your data in a .proto file
  • Use the protoc compiler to generate source code in one of the supported languages
  • Then, serialize away!

Define your data in a .proto file

The first step in using Protocol Buffers (or protobuf for short) is to define your data in .proto files. (Read the language guide for an understanding of the types that protobuf supports.) Defining your data requires you to define messages for your types. Here is an example message from a user.proto file:

syntax = "proto3";

message User {
  string first_name = 1;
  string last_name = 2;
  string email = 3;
}

The first line declares the syntax version the compiler should target (here, proto3). The meat of the .proto file is the User message. The User message defines three string-type fields: first_name, last_name, and email. When defining a field you must give it at least three pieces of information: a type, a name, and a unique numbered tag. The unique numbered tag identifies the field in the binary format of a message.
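To see how the numbered tag shows up on the wire, here is a minimal hand-rolled sketch (an illustration only, not the google-protobuf gem’s API) that encodes a single string field the way protobuf does for short strings:

```ruby
# Minimal sketch of protobuf's wire format for one string field.
# The key byte packs the field tag together with the wire type:
# (tag << 3) | 2, where wire type 2 means "length-delimited".
# NOTE: for simplicity the length is a single byte, which only works
# for strings shorter than 128 bytes (real protobuf uses varints).
def encode_string_field(tag, value)
  key = (tag << 3) | 2
  [key, value.bytesize].pack('C2') + value
end

encoded = encode_string_field(1, 'Andrew')  # first_name = "Andrew"
puts encoded.bytes.first  # => 10 (0x0A: tag 1, wire type 2)
puts encoded.bytes[1]     # => 6  (length of "Andrew")
```

Changing a field’s tag changes its key byte, which is why tags must stay stable once a message is in use.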

After writing your proto file you’re ready to compile it using the protoc compiler.

Use the protoc compiler

First, download a release of the protoc compiler from the project’s releases page. Then, compile your .proto file using the following command:

protoc -I ./ user.proto --ruby_out=./

You can use the --help flag to see an explanation of the options and more usages of the protoc command. The command above asks the protoc compiler to compile a user.proto file and output Ruby source code. The compiler will generate a user_pb.rb file in the path specified by the --ruby_out flag. It should contain something like the following:

# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: user.proto

require 'google/protobuf'

Google::Protobuf::DescriptorPool.generated_pool.build do
  add_message "User" do
    optional :first_name, :string, 1
    optional :last_name, :string, 2
    optional :email, :string, 3
  end
end

User = Google::Protobuf::DescriptorPool.generated_pool.lookup("User").msgclass

You can now use the User class to serialize and deserialize your data using protobuf:

# NOTE: You'll need to install the 'google-protobuf' gem
require 'google/protobuf'
require_relative 'user_pb'

user = User.new
user.first_name = 'Andrew'
user.last_name = 'Sinner'
user.email = '[email protected]'
encoded = user.to_proto
user = User.decode(encoded)
puts user.first_name
# => "Andrew"

Benefits of using Protocol Buffers

The example above demonstrates one of the powerful features of Protocol Buffers. Because protobuf is language-neutral, you define your data only once. You can integrate the compiler into your build process to keep every consumer up to date.
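As one way to do that, a build step could be as small as a hypothetical Rake task (the paths and task name here are assumptions, and protoc must be on your PATH):

```ruby
# Rakefile (sketch) -- regenerate Ruby classes from .proto definitions.
# Assumes .proto files live in proto/ and generated code goes to lib/.
task :protobuf do
  sh 'protoc -I proto proto/user.proto --ruby_out=lib'
end
```

Running rake protobuf before each build keeps the generated user_pb.rb in sync with the schema.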

Besides being language-neutral, protobuf is very efficient at serialization. Auth0 has published its own analysis of protobuf’s efficiency. Saving a few bytes per message may not seem like a big deal, but when running high-volume services you quickly learn that bloat adds up fast.
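To get a feel for where the savings come from, here is a rough comparison using a hand-rolled single-field encoder (an illustration, not the gem): JSON repeats the field name on the wire, while protobuf sends only a one-byte key.

```ruby
require 'json'

# JSON carries the key "first_name" inside every message;
# protobuf replaces it with a single key byte (tag + wire type),
# followed by a one-byte length for short strings.
def proto_string_field(tag, value)
  [(tag << 3) | 2, value.bytesize].pack('C2') + value
end

json_bytes  = JSON.generate(first_name: 'Andrew').bytesize
proto_bytes = proto_string_field(1, 'Andrew').bytesize

puts json_bytes   # => 23
puts proto_bytes  # => 8
```

The gap widens with every field, since each JSON key is repeated in full on every message.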

In summary, Protocol Buffers provide performance advantages over JSON, which benefits services operating (or expecting to operate) at high volume. Even if you don’t need the performance, you can still leverage the language-neutral benefit: maintaining a single source of truth for your data keeps your clients (services, UIs, etc.) up to date.