The case for contract testing Protobufs, gRPC and Avro

Matt Fellows

When talking about Pact and contract-testing with customers and prospects, we commonly hear statements like "this problem would all but disappear if we just used Protobufs". This makes sense, because these sorts of self-describing data serialisation formats advertise themselves as a solution to the problem of passing data structures over the internet, with in-built versioning or backwards/forwards compatibility.

From this excellent post on the topic, we learn that schema evolution is core to the design goals of these technologies: you can change the schema, you can have producers and consumers with different versions of the schema at the same time, and it all continues to work. That is an extremely valuable feature when you’re dealing with a big production system, because it allows you to update different components of the system independently, at different times, without worrying about compatibility.

If it is true that Protobufs and similar projects that use an Interface Definition Language (IDL), such as Thrift, Avro and Flatbuffers, had backwards compatibility in mind when they were designed, why is support for them the #1 requested feature on our open source product feature roadmap?

The case for contracts

Colourless green ideas sleep furiously

This absurd statement gives us a glimpse into the challenges any self-describing data format or abstract specification faces. The sentence is grammatically and syntactically correct, but carries no meaning (credit to Tim Jones who made this point in our AMA discussion on schemas vs contracts). That is to say, simply being able to communicate is not enough - we must be able to understand the semantics of the message.

Whilst the ability for old programs to parse a newer data structure and new programs to read an older one is helpful in certain contexts, that doesn't magically fix bugs with how your programs actually use the data. Under a new light, we'll begin to see these IDLs in much the same way as we think of schemas.

Let's start with a motivating example.

We have an Order message that has the field price. Let's say you have a consumer that reads these messages and tallies up the total (in 💰) for all the orders in a given time period. If you can remove price from the message and have the programs still communicate, that's great - you don't get network exceptions in your logs and your SRE team is happy - but the application is still very much interested in price, so you will still have a problem. This is the case whether or not a "read schema" is available at the time the message is consumed.
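A minimal sketch of this failure, in Python. The `decode_order` helper is hypothetical: it only mimics proto3-style decoding, where a scalar field absent from the payload is read as its zero value, and is not a real Protobuf API.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str = ""
    price: float = 0.0  # proto3-style zero value when the field is absent


def decode_order(payload: dict) -> Order:
    """Hypothetical decoder: ignores unknown fields, defaults missing ones."""
    return Order(
        order_id=payload.get("order_id", ""),
        price=payload.get("price", 0.0),
    )


def tally(payloads: list) -> float:
    """The consumer's business logic: sum the price of every order."""
    return sum(decode_order(p).price for p in payloads)


# Producer v1 still sends price; producer v2 has removed it.
v1_orders = [{"order_id": "a", "price": 10.0}, {"order_id": "b", "price": 5.5}]
v2_orders = [{"order_id": "a"}, {"order_id": "b"}]

print(tally(v1_orders))  # 15.5
print(tally(v2_orders))  # 0.0 -- no exception, but the total is silently wrong
```

Both calls succeed at the parsing level, which is exactly the problem: nothing fails until someone notices the totals are wrong.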

You need only to google "breaking change for x" to convince yourself that none of the major data serialisation formats solve this problem.

Self-describing data serialisation formats such as Protobufs, Avro, Thrift, GraphQL etc. do not provide any semantic guarantees

This alone would be enough to warrant additional testing, however as we'll see, there are a number of common problems they share, as well as unique idiosyncrasies that leave them susceptible to incompatibility issues.


Upon closer inspection of these technologies, a number of failure modes and challenges emerge that must be overcome to bring safety to these systems:

  1. Managing breaking changes
  2. Message semantics
  3. Coordinating changes (forwards compatibility) and dependency management
  4. Providing transport layer safety
  5. Ensuring narrow type safety (strict encodings)
  6. Loss of visibility into real-world client usage
  7. Optionals and defaults: a race to incomprehensible APIs

Let's go through each of these in turn.

1. Managing breaking changes

Each technology has specific idiosyncrasies that must be managed to prevent breaking changes.

From the Buf docs (a lint tool to help prevent protobuf breaking changes):

Forwards and backwards compatibility is not enforced: While forwards and backwards compatibility is a promise of Protobuf, actually maintaining backwards-compatible Protobuf APIs isn't widely practiced, and is hard to enforce.

For example, you should never change an existing field's number. But this is just the tip of the iceberg. It gets more complicated when looking at things like optional fields and their default values, oneof, which enables a level of polymorphism, or the Any type, which allows the caller to embed an arbitrary serialised message as a series of bytes.

Here's a quote from a colleague on the dangers of default values with Protobufs, experienced on a client project:

A service that had create and update methods for an entity reused the same entity even though some of the fields were not updateable. This meant that fields were mistakenly overwritten when they omitted to pass the current value for them in update requests. When not setting a field explicitly, the standard library behaviour is to send a default value for the data type.

This is a case where forwards and backwards compatibility can actually be a bad thing - a network failure would have been better when omitting a required field (proto3 removes the ability to mark fields as required altogether).
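The overwrite described in the quote can be sketched as follows. The `Customer` entity and `handle_update` handler are hypothetical names chosen for illustration; the dataclass defaults stand in for proto3's zero values for unset scalar fields.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical entity reused for both create and update requests.
    # Unset fields take the type's zero value, as proto3 scalars do.
    customer_id: str = ""
    name: str = ""
    email: str = ""


def handle_update(store: dict, request: Customer) -> None:
    """Naive handler: cannot tell 'not set' apart from 'set to empty'."""
    store[request.customer_id] = request


store = {"c1": Customer(customer_id="c1", name="Ada", email="ada@example.com")}

# The caller only wants to rename the customer and omits email.
handle_update(store, Customer(customer_id="c1", name="Ada Lovelace"))

print(store["c1"].email == "")  # True: the stored email was overwritten
```

The request is perfectly valid as far as the schema is concerned, so neither side sees an error.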

Avro also has a number of similar limitations. Changing the name of the field needs coordination, and is tricky. Adding, modifying or removing fields is possible, but generally speaking requires making each type a union type to enable safety at the parser level (by providing a default value). This may be hard to retrofit if not initially considered, and has the effect of making the data structure less comprehensible (see 7).
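As a rough illustration of why defaults matter here, the sketch below is a simplified model of Avro's reader/writer schema resolution written in plain Python. It does not use the Avro library, and `resolve` and `MISSING` are names invented for this example.

```python
MISSING = object()  # marks a reader field declared without a default


def resolve(record: dict, reader_fields: list) -> dict:
    """Project a decoded writer record onto the reader's fields.

    reader_fields is a list of (name, default) pairs.
    """
    resolved = {}
    for name, default in reader_fields:
        if name in record:
            resolved[name] = record[name]
        elif default is not MISSING:
            resolved[name] = default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return resolved  # writer fields the reader does not know are dropped


written = {"id": "o-1", "price": 12.0}  # written before "currency" existed

# Reader declares currency as a ["null", "string"] union defaulting to null.
print(resolve(written, [("id", MISSING), ("price", MISSING), ("currency", None)]))

# Without a default, the same read fails at the parser level.
try:
    resolve(written, [("id", MISSING), ("currency", MISSING)])
except ValueError as err:
    print(err)
```

The first read succeeds only because the new field was declared as a nullable union with a default, which is the retrofit cost mentioned above.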

Lesson: you can definitely introduce a breaking change at the level of the IDL itself.

2. Message semantics

The problem of message semantics as discussed above extends to the need to evolve those semantics over time. We need a mechanism to ensure that the meaning of a field is preserved as we introduce change, or can be accurately managed to prevent breaking changes with its consumers as each evolves.

Precisely because of some of the forwards/backwards guarantees, it's also possible to get into a situation where a client is able to receive a different type of message it wasn't intended to receive.

An unlikely, but illuminating example is the following situation:

message ProductUpdate {
  string event_type = 1;
  string product_name = 2;
  float product_price = 3;
}

message OrderUpdate {
  string event_type = 1;
  string order_id = 2;
  float order_total = 3;
  // ...
}

Two compatible protobuf messages with completely different meanings

A client that knows how to parse ProductUpdate could actually receive an OrderUpdate, because it just so happens that the first three fields of both of the above messages share the same field numbers and types! This is obviously likely to lead to strange problems.
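To see why this works on the wire, here is a sketch that hand-encodes an OrderUpdate and decodes it as a ProductUpdate. It implements only the two Protobuf wire types these messages need (length-delimited strings and 32-bit floats) and is an illustration, not a substitute for a real Protobuf library; the `encode` and `decode` helpers are written for this example.

```python
import struct


def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode(fields: list) -> bytes:
    """fields: (field_number, value) pairs; str -> wire type 2, float -> 5."""
    out = bytearray()
    for number, value in fields:
        if isinstance(value, str):
            data = value.encode("utf-8")
            out += encode_varint((number << 3) | 2) + encode_varint(len(data)) + data
        else:
            out += encode_varint((number << 3) | 5) + struct.pack("<f", value)
    return bytes(out)


def decode(buf: bytes, names: dict) -> dict:
    """names maps field numbers to the names the *reader* believes in."""
    pos, message = 0, {}
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        elif wire_type == 5:
            (value,) = struct.unpack("<f", buf[pos:pos + 4])
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in names:
            message[names[number]] = value
    return message


PRODUCT_UPDATE = {1: "event_type", 2: "product_name", 3: "product_price"}

# The producer sends an OrderUpdate: event_type, order_id, order_total.
wire_bytes = encode([(1, "order.updated"), (2, "ORD-1234"), (3, 99.5)])

# The consumer parses it as a ProductUpdate without any error.
misread = decode(wire_bytes, PRODUCT_UPDATE)
print(misread)
# {'event_type': 'order.updated', 'product_name': 'ORD-1234', 'product_price': 99.5}
```

Field names never travel on the wire, only numbers and wire types, so the decoder has no way to notice that an order id has become a product name.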

Lastly, we need a way to know which concrete sets of requests and responses (i.e. a collection of fields) are actually valid within the confines of a message. If you have a message with many optional fields, and only some combinations of them are valid, the schema and the compiler won't catch problems with those combinations.

This is particularly relevant when making heavy use of optional fields and polymorphic types (e.g. enums / oneof).

For example, given the following message, it's unclear which variant will be returned under which circumstances:

message UpdateEvent {
  oneof event {
    ProductUpdate a = 1;
    OrderUpdate b = 2;
  }
}

Example polymorphic protobuf

Whilst this example is trivial and likely manageable, you can see how large bodies or domain entities may complicate things.

Lesson: schemas are abstract, and introduce ambiguity which can lead to misinterpretations and bugs.

3. Coordinating changes and dependency management

If we accept (1) and (2), we then need a mechanism to ensure that collaborating systems remain compatible as we introduce change.

It must be said that Avro does have a novel solution to (1) - the ability to ship the schema along with the messages (see Object Container File). This does not solve (2), however, nor (4) to (7).

Lesson: schemas are static, and still require some level of version control and coordination among parties.

4. Providing transport layer safety

The protocol is separate to any transport layer, and there’s nothing intrinsic that enforces that a particular route exists and has the expected schema.

For example, if you were sending protobufs over HTTP, the endpoint/path may change.

Lesson: IDLs don't provide any guarantees for the transport layer.

5. Ensuring narrow type safety (strict encodings)

Just like with XML and JSON, the types that can be represented over the wire are limited to the set of types provided by the schema.

For example, you may want to ensure a given field must be a uuid, a specific date format or a semver compatible string. Depending on the IDL, you'll have more or less options at your disposal to help address this problem.
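As a sketch of the kind of check a plain string field cannot express on its own, the following Python validates those three narrow types. The field names (`id`, `version`, `released_on`) are hypothetical, and the semver pattern is deliberately simplified to MAJOR.MINOR.PATCH.

```python
import re
from datetime import date

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)
# Simplified semver: MAJOR.MINOR.PATCH only, no pre-release or build metadata.
SEMVER_RE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def is_iso_date(value: str) -> bool:
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def check(message: dict) -> list:
    """Return the names of fields whose string value breaks its narrow type."""
    rules = {
        "id": lambda v: bool(UUID_RE.match(v)),
        "version": lambda v: bool(SEMVER_RE.match(v)),
        "released_on": is_iso_date,
    }
    return [name for name, ok in rules.items() if not ok(message[name])]


good = {
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "version": "1.2.3",
    "released_on": "2024-01-31",
}
bad = {**good, "version": "1.2", "released_on": "31/01/2024"}

print(check(good))  # []
print(check(bad))   # ['version', 'released_on']
```

Every value in `bad` is still a valid string as far as the schema is concerned; only the additional rules catch the problem.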

Lesson: schemas don't give you guarantees of narrow types.

6. Loss of visibility into real-world client usage

Without a consumer contract specifying the actual real-world use by a given API's users, or log data that tracks actual runtime usage, the consumed API surface area becomes invisible to API owners.

This likely also means losing track of which version of a schema a particular consumer is on.

All of this makes evolution more difficult, because you can't track the specific needs of each consumer: API owners must therefore assume all users consume the entire schema, at both the message and field level.

This is especially true for messaging systems like Kafka, because you may not have visibility into who your consumers are.
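By way of contrast, here is a sketch of the questions an API owner could answer if each consumer's actual field usage were recorded somewhere. The service names, fields and data structures are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical data: fields each consumer's contract says it actually reads.
consumer_contracts = {
    "billing-service": {"Order": {"order_id", "price"}},
    "email-service": {"Order": {"order_id", "customer_email"}},
}

# Fields the provider's schema currently exposes.
provider_schema = {"Order": {"order_id", "price", "customer_email", "legacy_code"}}


def unused_fields(schema: dict, contracts: dict) -> dict:
    """Fields no consumer contract depends on, per message."""
    unused = {}
    for message, fields in schema.items():
        used = set()
        for contract in contracts.values():
            used |= contract.get(message, set())
        unused[message] = fields - used
    return unused


def consumers_of(message: str, field: str, contracts: dict) -> list:
    """Consumers that would break if this field were removed."""
    return sorted(
        name
        for name, contract in contracts.items()
        if field in contract.get(message, set())
    )


print(unused_fields(provider_schema, consumer_contracts))  # {'Order': {'legacy_code'}}
print(consumers_of("Order", "price", consumer_contracts))  # ['billing-service']
```

With only the IDL to go on, neither question can be answered, so every field has to be treated as in use.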

Lesson: IDL usage makes it hard to know real-world consumer behaviour, making it harder to evolve safely.

7. Optionals: a race to incomprehensible APIs

Optionals help solve one problem (wire compatibility) but introduce some new ones.

From the Protobuf guide:

Some engineers at Google have come to the conclusion that using required does more harm than good; they prefer to use only optional and repeated. However, this view is not universal.

In fact, in proto3 the required keyword has been removed altogether.

Avro has a similar problem: to support adding and removing fields, you must supply a default, such as null, using a union type.

Eventually, with enough iterations, you may find your schema devolving into a pit of optional fields.

This reduces comprehensibility and the ability to reason about the interface, and it makes the challenge of (2) harder, because the valid request/response combinations may not all be clear.

Indeed, default value behaviour is more nefarious than it may first seem.

Unlike REST where verbs have implied behaviour with respect to the presence or absence of fields (e.g. POST vs PUT vs PATCH semantics), clients may not be able to distinguish between a "not set" and a "default" value (and in some cases this is language dependent). Following from the quote in (1):

I lost count of how many bugs we had at <client> because people were unaware of the default value behaviour

Lesson: optional fields and their default values impact developer comprehension and can lead to hard-to-detect bugs.

Contract Testing

Given these concerns to manage, how might contract testing help to address these issues?

| Problem | Contract testing pros |
|---|---|
| 1 | Reduced chance of implementation drift, as real application code is executed to produce the contract (consumer) and real code is invoked to verify the contract (provider). Primary objective of contract testing. |
| 2 | Specification by example removes ambiguity that abstract specifications may create, making clear the situations for which certain fields and messages will be communicated. |
| 3 | Provides a clear, first-class process for service evolution through dependency management of the contract and interacting components (see can-i-deploy). |
| 4 | Transport concerns like HTTP verbs, paths and headers may be encapsulated in a contract. |
| 5 | Matchers provide advanced field level type checking, including regex, semantic matchers (such as dates) and custom matchers*. |
| 6 | Consumed API surface is made visible, because consumer expectations are made clear in their contract tests, down to the specific message and field level. This also enables removal of unused fields and entire messages. |
| 7 | Specification by example increases comprehension - contract tests use representative examples of interactions to ensure the system works appropriately. This provides useful "how to use" documentation, as well as "how it works" documentation. |

* via custom plugins
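To give a feel for specification by example with matchers, here is a hand-rolled sketch in Python. It is an illustration of the approach only: the contract structure and the `verify` function are invented for this example and are not Pact's API or file format.

```python
import re

# Consumer side: an example-based expectation. Each field pairs a
# representative example with a matcher.
contract = {
    "description": "an order update for an existing order",
    "response": {
        "event_type": {"example": "order.updated", "equals": "order.updated"},
        "order_id": {"example": "ORD-1234", "regex": r"^ORD-\d+$"},
        "order_total": {"example": 99.5, "type": float},
    },
}


def verify(contract: dict, actual: dict) -> list:
    """Provider side: list every way the real response breaks the contract."""
    failures = []
    for name, rule in contract["response"].items():
        if name not in actual:
            failures.append(f"{name}: missing")
        elif "equals" in rule and actual[name] != rule["equals"]:
            failures.append(f"{name}: expected {rule['equals']!r}")
        elif "regex" in rule and not re.match(rule["regex"], str(actual[name])):
            failures.append(f"{name}: does not match {rule['regex']}")
        elif "type" in rule and not isinstance(actual[name], rule["type"]):
            failures.append(f"{name}: expected {rule['type'].__name__}")
    return failures


ok = {"event_type": "order.updated", "order_id": "ORD-7", "order_total": 12.0}
broken = {"event_type": "order.updated", "order_id": "7"}

print(verify(contract, ok))      # []
print(verify(contract, broken))  # ['order_id: does not match ^ORD-\\d+$', 'order_total: missing']
```

The contract only lists the fields this consumer reads, so the provider remains free to add or remove anything else without failing verification.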


Whilst IDLs like Protobufs have a number of benefits, such as high performance and code generation, they don't seem to provide any more practical guarantees than the current predominant communication style of RESTful APIs using JSON.

Pact, and contract testing as an approach, is designed to address all 7 of the failure modes described above. Whilst Pact doesn't (yet) officially support these technologies, we're building the plugin infrastructure in Pact to enable gRPC and Protobufs to be tested in this fashion. If you want to stay in the loop and provide early feedback, please join our Developer Preview Program.

If you're sold by these arguments and can't wait a minute longer, there are other ways to make it work right now.

Here's a sneak peek of the current prototype in action, verifying the Protobuf plugin (yes, we'll use Pact to test the Pact plugins 🤯):

Demonstration of a failing protobuf contract test, because a field is not semver compatible