Google (my current employer) has finally open sourced protocol buffers, the data interchange format we use for internal server-to-server communication. The blogosphere’s response? “No wireless. Less space than a Nomad. Lame.”

Aaaaanyway…

Protocol buffers are “just” cross-platform data structures. All you have to write is the schema (a .proto file), then generate bindings in C++, Java, or Python. (Or Haskell. Or Perl.) The .proto file is just a schema; it doesn’t contain any data except default values. All getting and setting is done in code. The serialized over-the-wire format is designed to minimize network traffic, and deserialization (especially in C++) is designed to maximize performance. I can’t begin to describe how much effort Google spends maximizing performance at every level. We would tear down our data centers and rewire them with $500 ethernet cables if you could prove that it would reduce latency by 1%.
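To give a rough feel for why the wire format is so compact, here is a minimal Python sketch of the varint scheme described in the public encoding documentation. An integer field is written as a key (the field number shifted left three bits, combined with a wire type) followed by the value in 7-bit groups. The function names are illustrative assumptions, not part of any generated binding, and the sketch only handles non-negative integers.

```python
WIRE_VARINT = 0  # wire type used for plain integer fields


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (illustrative helper)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one integer field as key + value (hypothetical helper name)."""
    key = (field_number << 3) | WIRE_VARINT
    return encode_varint(key) + encode_varint(value)


if __name__ == "__main__":
    # Field 1 holding the value 150 fits in three bytes.
    print(encode_int_field(1, 150).hex())
```

Small numbers in low-numbered fields cost two bytes each, with no field names or delimiters on the wire. That is where most of the savings over a text format come from.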

Besides being blindingly fast, protocol buffers have lots of neat features. A zero-size PB returns default values. You can nest PBs inside each other. And most importantly, PBs are both backward and forward compatible, which means you can upgrade servers gradually and they can still talk to each other in the interim. (When you have as many machines as Google has, it’s always the interim somewhere.)
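The defaults and the compatibility story both fall out of how a reader walks the bytes. Below is a hedged Python sketch, not the real library: `parse_known_fields` and its `defaults` argument are hypothetical names, and it simply drops fields it does not recognize. Because every key carries a wire type, an old reader can step over a field added by a newer writer, and an empty buffer just yields the defaults. Nested messages travel as length-delimited fields (wire type 2), so they can be skipped the same way.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_known_fields(buf: bytes, defaults: dict) -> dict:
    """Return defaults overridden by any known fields found in buf.

    `defaults` maps known field numbers to default values (an assumption of
    this sketch; real bindings take this information from the .proto file).
    """
    message = dict(defaults)
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:                    # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:                    # length-delimited (strings, nested messages)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:                    # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in message:
            message[field_number] = value
        # Unknown field numbers fall through: the bytes were consumed, nothing stored.
    return message


if __name__ == "__main__":
    old_schema = {1: 0}                          # this reader only knows field 1
    newer_bytes = b"\x08\x96\x01\x18\x07"        # field 1 = 150, plus a newer field 3 = 7
    print(parse_known_fields(b"", old_schema))
    print(parse_known_fields(newer_bytes, old_schema))
```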

Comparisons to other data formats were, I suppose, inevitable. Old-timers may remember ASN.1 or IIOP. Kids these days seem to compare everything to XML or JSON. Protocol buffers are actually closer to Facebook’s Thrift (written by ex-Googlers) or SQL Server’s TDS. Protocol buffers won’t kill XML (no matter how much you wish they would), nor will they replace JSON, ASN.1, or carrier pigeon. But they’re simple and they’re fast and they scale like crazy, and that’s the way Google likes it.

§

Twenty comments here

I didn’t understand a single word you just said, but it still sounds pretty cool.

— Jake #

The protests against Protocol Buffers are hilarious.

XML has but one justification for its existence. Not RPC. Not document formats. Not databases. Not configuration files. Not programming languages (We hate you, CF). Standardised data exchange is the word.

— Mikkel Høgh #

Well said. The hating rants were a fun read.

— Tyler F #

What about Erlang?

— anonymouse #

As someone who was irritated every time he had to hand-roll packing and unpacking code for yet another proprietary IP-based protocol (htonl(), anyone?), the utility of this is not lost on me. Very nice.

— Avdi #

I’d almost forgotten about IIOP. Actually though, PB reminds me the most of good old XDR (RFC 1832 & 4506).

Instead of a .proto you have a .x file. And although originally it was only used with C (later C++), I’ve used it to exchange data between C and Python and on different endian architectures. XDR though tries to keep everything 4-byte aligned at the expense of adding padding bytes, which was a good performance choice back when CPUs were so slow and had data alignment constraints.

— Deron Meranda #

Holy fuck! Links! And since when did your blog start dissolving into a fractal?

— Justin #

I think this post may be the most cogent piece of writing I’ve read on them (including the announcement blog post).
That may be part of your problem?

— Phill #

If I were going to send/receive Protocol Buffer messages over HTTP, is there any sort of media type I should be using?

— Dan #

It reminded me of the IFF format of the eighties (http://en.wikipedia.org/wiki/Interchange_File_Format). No schema support but it had all the other nice features like skipping parts of the format unknown to the reader and standard dictionary and list chunks.

— Carlos #

Protocol Buffers: Leaky RPC :: Steve Vinoski’s Blog (pingback)

Dan, if you’re going to use protocol buffers over HTTP, you might as well use JSON. The overhead of text protocols like HTTP is exactly what binary packet interfaces like proto bufs are trying to avoid.

— dbt #

In Practical Cryptography, Bruce Schneier points out that one advantage of XML for communicating over a secure channel is that you can do an integrity check of your data on the other end. That is, because the data is self-describing, you can make sure that dropping or inserting a random byte hasn’t shifted the offsets for all of your fields and corrupted the underlying data. If Google is optimizing for low latency within their own networks, I would not think that this would be an issue, but is there a method for doing an integrity check where relevant? (A corner case might be checking that you’re not parsing a stream with the wrong .proto schema.)

— Mark #

“Protocol buffers are “just” cross-platform data structures.”
That’s how it always starts: it’s just a cross-language type serializer. And then you get RPC, or some dressed-up equivalent. You only need to read the bottom of this page to see that:
http://code.google.com/apis/protocolbuffers/docs/overview.html
Sure, if you control the endpoints, and your system is in capable hands, etc, it won’t spiral out of control, etc, etc. But that’s almost always true, even for many of the things you’ve criticized in the past. We’ve seen this movie. Come back in a few years.

— Ali #

I’ve published some more comments, as well as some questions, in the blog posting following the one you link to:

Any chance someone from Google could answer the questions?

— Steve Vinoski #

Sprazzi di Lucidità » links for 2008-07-14 (pingback)

links for 2008-07-14 « Breyten’s Dev Blog (pingback)

Dare Obasanjo aka Carnage4Life - Scalability: I Don't Think That Word Means What You Think It Does (pingback)

I wrote an article on XML.Com on how Google hates XML.
I am _STILL_ trying to understand why you did not use JSON. I KNOW you serialize over the wire in a different format, so WHY invent something new? It seems YOUR format is very similar to JSON - it is just missing a comma.
WTF?

— Ric #

Protocol Buffers: deja yawn | soabloke (pingback)

§


© 2001–8 Mark Pilgrim

