Unexpected Adventures in JSON Marshaling

Recently, one of our engineering teams encountered what seemed like a fairly straightforward issue: when they attempted to store UUID values in a database, the write failed with an error claiming that the value was invalid. With a few tweaks to one of our internal libraries, the team resolved the issue. Or so we thought.

Fast forward one month, and a different team noticed a peculiar problem. After deploying a new release, their service began logging strange errors reporting that UUID values from the redrive queue could not be read.

So what went wrong? We soon realized that when we added new behavior to our UUID library to solve the first problem, we inadvertently created a new one. In this blog post, we explore how adding seemingly benign new methods can actually be a breaking change, especially when working with JSON support in Go. We will explain what we did wrong and how we dug our way out of it. We’ll also outline some best practices for managing this type of change, along with some thoughts on how to avoid breaking things in the first place.

When Closing a Functional Gap Turns Into a Bug

This all started when one of our engineering teams added a new PostgreSQL database and ran into issues. They were attempting to store UUID values in a JSONB column in the PostgreSQL database using our internal csuuid library, which wraps a UUID value and adds some additional functionality specific to our systems. Strangely, the generated SQL being sent to the database always contained an empty string for that column, which is an invalid value.

INSERT INTO table (id, uuid_val) VALUES (42, '');

ERROR: invalid input syntax for type json

Checking the code, we saw that there was no specific logic for supporting database persistence. Conveniently, the Go standard library already provides the scaffolding for making types compatible with database drivers in the form of the database/sql.Scanner and database/sql/driver.Valuer interfaces. The former is used when reading data from a database driver and the latter for writing values to the driver. Each interface consists of a single method and, since a csuuid.UUID wraps a github.com/gofrs/uuid.UUID value that already provides the correct implementations, extending the code was straightforward.
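
As a rough illustration of what satisfying the two interfaces involves, here is a minimal sketch that uses only the standard library. The UUID type below is a hypothetical, simplified stand-in for csuuid.UUID: the real type delegates to the wrapped github.com/gofrs/uuid.UUID value, whereas this sketch stores the bytes directly and, for brevity, encodes them as 32 bare hex digits. Storing an unset UUID as NULL is also an assumption.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"encoding/hex"
	"fmt"
)

// UUID is a hypothetical, simplified stand-in for csuuid.UUID. The real type
// wraps a github.com/gofrs/uuid.UUID and delegates to its implementations.
type UUID struct {
	inner *[16]byte
}

// Compile-time checks that both interfaces are satisfied.
var (
	_ driver.Valuer = UUID{}
	_ sql.Scanner   = (*UUID)(nil)
)

// Value implements driver.Valuer, which is used when writing to the driver.
func (u UUID) Value() (driver.Value, error) {
	if u.inner == nil {
		return nil, nil // assumption: an unset UUID is stored as SQL NULL
	}
	return hex.EncodeToString(u.inner[:]), nil
}

// Scan implements sql.Scanner, which is used when reading from the driver.
func (u *UUID) Scan(src interface{}) error {
	var text string
	switch v := src.(type) {
	case nil:
		u.inner = nil
		return nil
	case string:
		text = v
	case []byte:
		text = string(v)
	default:
		return fmt.Errorf("uuid: cannot scan %T", src)
	}
	raw, err := hex.DecodeString(text)
	if err != nil {
		return fmt.Errorf("uuid: %w", err)
	}
	if len(raw) != 16 {
		return fmt.Errorf("uuid: expected 16 bytes, got %d", len(raw))
	}
	var arr [16]byte
	copy(arr[:], raw)
	u.inner = &arr
	return nil
}

func main() {
	var u UUID
	if err := u.Scan("6ba7b8109dad11d180b400c04fd430c8"); err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	v, err := u.Value()
	fmt.Println(v, err)
}
```

With both methods in place, a value of this type can be passed directly as a query argument or as a Scan destination when using database/sql.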

With this change, the team was now able to successfully store and retrieve csuuid.UUID values in the database.

Free Wins

As often happens, the temptation of “As long as we’re updating things …” crept in. We noticed that csuuid.UUID also did not include any explicit support for JSON marshaling. As with the database driver support, the underlying github.com/gofrs/uuid.UUID type already provided the necessary functionality, so extending csuuid.UUID for this feature felt like a free win.

If a type can be represented as a string in a JSON document, then you can satisfy the encoding.TextMarshaler and encoding.TextUnmarshaler interfaces to convert your Go struct to/from a JSON string, rather than satisfying the potentially more complex Marshaler and Unmarshaler interfaces from the encoding/json package.

The excerpt below, from the documentation for the Go standard library’s json.Marshal() function, calls out this behavior:

Marshal traverses the value v recursively. If an encountered value implements the Marshaler interface and is not a nil pointer, Marshal calls its MarshalJSON method to produce JSON. If no MarshalJSON method is present but the value implements encoding.TextMarshaler instead, Marshal calls its MarshalText method and encodes the result as a JSON string. The nil pointer exception is not strictly necessary but mimics a similar, necessary exception in the behavior of UnmarshalJSON.
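
The order of precedence described in that excerpt can be seen with a small sketch. The three types below (textOnly, both, plain) and the encode helper are hypothetical and exist only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// textOnly is a hypothetical type that implements only encoding.TextMarshaler.
type textOnly struct{}

func (textOnly) MarshalText() ([]byte, error) { return []byte("from-text"), nil }

// both is a hypothetical type that implements json.Marshaler as well as
// encoding.TextMarshaler; MarshalJSON takes precedence.
type both struct{}

func (both) MarshalText() ([]byte, error) { return []byte("from-text"), nil }
func (both) MarshalJSON() ([]byte, error) { return []byte(`"from-json"`), nil }

// plain is a hypothetical type with neither method and no exported fields.
type plain struct {
	hidden int
}

// encode returns the JSON produced for v, or the error text if marshaling fails.
func encode(v interface{}) string {
	out, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	fmt.Println(encode(textOnly{})) // prints "from-text" (a JSON string)
	fmt.Println(encode(both{}))     // prints "from-json" (MarshalJSON wins)
	fmt.Println(encode(plain{}))    // prints {} (reflection finds no exported fields)
}
```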

A UUID is a 128-bit value that can easily be represented as a string of 32 hex digits (conventionally hyphenated into a 36-character form); that string format is the typical way UUIDs are stored in JSON. Armed with this knowledge, extending csuuid.UUID to “correctly” support converting to/from JSON was another simple bit of code.

Other than a bit of logic to account for the pointer field within csuuid.UUID, these two new methods only had to delegate things to the inner github.com/gofrs/uuid.UUID value.
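
A minimal sketch of what such a pair of methods might look like, again using a hypothetical stand-in for csuuid.UUID rather than the real type. Instead of delegating to github.com/gofrs/uuid, the sketch encodes the bytes as 32 bare hex digits, and it assumes that an unset (nil) inner value maps to the empty string. The event type and roundTrip helper are also hypothetical.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
)

// UUID is a hypothetical, simplified stand-in for csuuid.UUID.
type UUID struct {
	inner *[16]byte
}

// MarshalText implements encoding.TextMarshaler.
func (u UUID) MarshalText() ([]byte, error) {
	if u.inner == nil {
		return []byte{}, nil // assumption: an unset UUID becomes ""
	}
	return []byte(hex.EncodeToString(u.inner[:])), nil
}

// UnmarshalText implements encoding.TextUnmarshaler.
func (u *UUID) UnmarshalText(text []byte) error {
	if len(text) == 0 {
		u.inner = nil // assumption: "" means "no value"
		return nil
	}
	if len(text) != 32 {
		return errors.New("uuid: expected 32 hex digits")
	}
	var raw [16]byte
	if _, err := hex.Decode(raw[:], text); err != nil {
		return err
	}
	u.inner = &raw
	return nil
}

// event is a hypothetical message type that carries a trace ID.
type event struct {
	TraceID UUID `json:"trace_id"`
}

// roundTrip decodes doc into an event and encodes it again.
func roundTrip(doc string) (string, error) {
	var e event
	if err := json.Unmarshal([]byte(doc), &e); err != nil {
		return "", err
	}
	out, err := json.Marshal(e)
	return string(out), err
}

func main() {
	out, err := roundTrip(`{"trace_id":"6ba7b8109dad11d180b400c04fd430c8"}`)
	fmt.Println(out, err)
}
```

MarshalText uses a value receiver so that it applies to both UUID and *UUID fields, while UnmarshalText needs a pointer receiver because it modifies the value.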

At this point, we felt like we had solved the original issue and gotten a clear bonus win. We danced a little jig and moved on to the next set of problems.

Celebrations all around!

A Trap Awaits

Unfortunately, all was not well in JSON Land. Several months after applying these changes, we deployed a new release of another of our services and started seeing logged errors about it being unable to read values from its AWS Simple Queue Service (SQS) queue. For system stability, we always do canary deployments of new services before rolling out changes to the entire fleet. The new error logs started when the canary for this service was deployed.

Below are examples of the log messages:

From the new instances:
[ERROR] ..../sqs_client.go:42 - error unmarshaling Message from SQS: json: cannot unmarshal object into Go struct field event.trace_id of type *csuuid.UUID error='json: cannot unmarshal object into Go struct field event.trace_id of type *csuuid.UUID'

From both old and new instances:
[ERROR] ..../sqs_client.go:1138 - error unmarshaling Message from SQS: json: cannot unmarshal string into Go struct field event.trace_id of type csuuid.UUID error='json: cannot unmarshal string into Go struct field event.trace_id of type csuuid.UUID'

After some investigation, we were able to determine that the error was happening because we had inadvertently introduced an incompatibility in the JSON marshaling logic for csuuid.UUID. When one of the old instances wrote a message to the SQS queue and one of the new ones processed it, or vice versa, the code would fail to read in the JSON data, thus logging one of the above messages.

json.Marshal() and json.Unmarshal() Work, Even If by Accident

The hint that unlocked the mystery was noticing the slight difference in the two log messages. Some showed “cannot unmarshal object into Go struct field” and the others showed “cannot unmarshal string into Go struct field.” This difference triggered a memory of that “free win” we celebrated earlier.

The root cause of the bug was that, in prior versions of the csuuid module, the csuuid.UUID type contained only unexported fields, and it had no explicit support for converting to/from JSON. In this case, the fallback behavior of json.Marshal() is to output an empty JSON object, {}. Conversely, in the old code, json.Unmarshal() was able to use reflection to convert that same {} into an empty csuuid.UUID value.

The example Go program below demonstrates this behavior:
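
A minimal sketch of such a program, assuming a simplified stand-in for the prior csuuid.UUID. The names UUID, event, marshalEvent and unmarshalEvent are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UUID is a hypothetical stand-in for the prior csuuid.UUID: it has only an
// unexported field and no JSON-related methods.
type UUID struct {
	inner *[16]byte
}

// event is a hypothetical message type that carries a trace ID.
type event struct {
	ID      int  `json:"id"`
	TraceID UUID `json:"trace_id"`
}

// marshalEvent returns the JSON for e, or the error text on failure.
func marshalEvent(e event) string {
	out, err := json.Marshal(e)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

// unmarshalEvent decodes doc and reports whether a trace ID was populated.
func unmarshalEvent(doc string) (bool, error) {
	var e event
	err := json.Unmarshal([]byte(doc), &e)
	return e.TraceID.inner != nil, err
}

func main() {
	raw := [16]byte{0x6b, 0xa7, 0xb8, 0x10}

	// The UUID holds a value, but reflection sees no exported fields.
	fmt.Println(marshalEvent(event{ID: 42, TraceID: UUID{inner: &raw}}))
	// prints {"id":42,"trace_id":{}}

	// Reading the empty object back succeeds and leaves the UUID unset.
	fmt.Println(unmarshalEvent(`{"id":42,"trace_id":{}}`))
	// prints false <nil>
}
```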

With the new code, we were trying to read that empty JSON object {} (which was produced by the old code on another node) as a string containing the hex digits of a UUID. This was because json.Unmarshal() was calling our new UnmarshalText() method and failing, which generated the log messages shown above. Similarly, the new code was producing a string of hex digits where the old code, without the new UnmarshalText() method, expected to get a JSON object.
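
Both directions of the mismatch can be reproduced with a small sketch. The oldUUID and newUUID types are hypothetical stand-ins for the two versions of csuuid.UUID, and the mismatch helper is an illustrative function that reports which kind of JSON value encoding/json refused to decode.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
)

// oldUUID is a hypothetical stand-in for the prior version: no JSON methods.
type oldUUID struct {
	inner *[16]byte
}

// newUUID is a hypothetical stand-in for the updated version, which reads
// JSON strings through UnmarshalText.
type newUUID struct {
	inner *[16]byte
}

// UnmarshalText implements encoding.TextUnmarshaler.
func (u *newUUID) UnmarshalText(text []byte) error {
	if len(text) != 32 {
		return errors.New("uuid: expected 32 hex digits")
	}
	var raw [16]byte
	if _, err := hex.Decode(raw[:], text); err != nil {
		return err
	}
	u.inner = &raw
	return nil
}

type oldEvent struct {
	TraceID oldUUID `json:"trace_id"`
}

type newEvent struct {
	TraceID newUUID `json:"trace_id"`
}

// mismatch decodes doc into dst and returns the kind of JSON value that
// encoding/json could not store in the target type, or "" if there was no
// type mismatch.
func mismatch(doc string, dst interface{}) string {
	var typeErr *json.UnmarshalTypeError
	if err := json.Unmarshal([]byte(doc), dst); errors.As(err, &typeErr) {
		return typeErr.Value
	}
	return ""
}

func main() {
	// New code reading what the old code wrote.
	fmt.Println(mismatch(`{"trace_id":{}}`, &newEvent{})) // prints object

	// Old code reading what the new code writes.
	fmt.Println(mismatch(`{"trace_id":"6ba7b8109dad11d180b400c04fd430c8"}`, &oldEvent{})) // prints string
}
```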

We encountered a bit of serendipity here, though, because we accidentally discovered that the updated service had been losing those trace ID values called out in the logs for messages that went through the redrive logic. Fortunately, this hidden bug hadn’t caused any actual issues for us.

The snippet below highlights the behavior of the prior versions.
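
A minimal sketch of that behavior, assuming a simplified stand-in for the prior csuuid.UUID. The redrive function is a hypothetical helper that simulates writing an event to a queue as JSON and reading it back; IsSet is likewise added only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UUID is a hypothetical stand-in for the prior csuuid.UUID: it has only an
// unexported field and no JSON-related methods.
type UUID struct {
	inner *[16]byte
}

// IsSet reports whether the UUID holds a value (illustrative helper).
func (u UUID) IsSet() bool { return u.inner != nil }

// event is a hypothetical message type that carries a trace ID.
type event struct {
	TraceID UUID `json:"trace_id"`
}

// redrive simulates writing e to a queue as JSON and reading it back.
func redrive(e event) (event, error) {
	payload, err := json.Marshal(e)
	if err != nil {
		return event{}, err
	}
	var out event
	err = json.Unmarshal(payload, &out)
	return out, err
}

func main() {
	raw := [16]byte{0x6b, 0xa7, 0xb8, 0x10}
	sent := event{TraceID: UUID{inner: &raw}}

	received, err := redrive(sent)

	// No error is reported, yet the trace ID did not survive the trip.
	fmt.Println(sent.TraceID.IsSet(), received.TraceID.IsSet(), err)
	// prints true false <nil>
}
```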

With this bug identified, we were in a quandary. The new code was correct and even fixed the data loss bug illustrated above. However, it was unable to read JSON data produced by the old code. As a result, it was dropping those events from the service’s SQS queue, which was not acceptable. Additionally, the same issue could exist in many other services.

A Way Out Presents Itself

Since a Big Bang, deploy-everything-at-once-and-lose-data solution wasn’t tenable, we needed to find a way for csuuid.UUID to support both the existing, invalid JSON data and the new, correct format.

Going back to the documentation for JSON marshaling, UnmarshalText() is the second option for converting from JSON. If a type satisfies encoding/json.Unmarshaler, by providing UnmarshalJSON([]byte) error, then json.Unmarshal() will call that method, passing in the bytes of the JSON data. By implementing that method and using a json.Decoder to process the raw bytes of the JSON stream, we were able to accomplish what we needed.

The core of the solution relied on taking advantage of the previously unknown bug where the prior versions of csuuid.UUID always generated an empty JSON object when serialized. Using that knowledge, we created a json.Decoder to inspect the contents of the raw bytes before populating the csuuid.UUID value.
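
The approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual csuuid code: UUID is a hypothetical stand-in that encodes its bytes as 32 bare hex digits, the legacy form is assumed to be exactly the empty object, and event and decodeTraceID are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
)

// UUID is a hypothetical, simplified stand-in for csuuid.UUID.
type UUID struct {
	inner *[16]byte
}

// MarshalText keeps the new, string-based output format.
func (u UUID) MarshalText() ([]byte, error) {
	if u.inner == nil {
		return []byte{}, nil
	}
	return []byte(hex.EncodeToString(u.inner[:])), nil
}

// UnmarshalText parses the string form.
func (u *UUID) UnmarshalText(text []byte) error {
	if len(text) == 0 {
		u.inner = nil
		return nil
	}
	if len(text) != 32 {
		return errors.New("uuid: expected 32 hex digits")
	}
	var raw [16]byte
	if _, err := hex.Decode(raw[:], text); err != nil {
		return err
	}
	u.inner = &raw
	return nil
}

// UnmarshalJSON takes precedence over UnmarshalText, so it can inspect the
// raw JSON and accept both the legacy {} form and the string form.
func (u *UUID) UnmarshalJSON(data []byte) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	switch v := tok.(type) {
	case nil:
		return nil // JSON null: a no-op, by convention
	case string:
		return u.UnmarshalText([]byte(v))
	case json.Delim:
		if v != '{' {
			return fmt.Errorf("uuid: unexpected JSON delimiter %q", v.String())
		}
		// Legacy payload: only the empty object is accepted (assumption).
		next, err := dec.Token()
		if err != nil {
			return err
		}
		if d, ok := next.(json.Delim); !ok || d != '}' {
			return errors.New("uuid: expected an empty legacy object")
		}
		u.inner = nil
		return nil
	default:
		return fmt.Errorf("uuid: unexpected JSON token %v", tok)
	}
}

// event is a hypothetical message type that carries a trace ID.
type event struct {
	TraceID UUID `json:"trace_id"`
}

// decodeTraceID decodes doc and returns the trace ID in its text form.
func decodeTraceID(doc string) (string, error) {
	var e event
	if err := json.Unmarshal([]byte(doc), &e); err != nil {
		return "", err
	}
	text, err := e.TraceID.MarshalText()
	return string(text), err
}

func main() {
	for _, doc := range []string{
		`{"trace_id":{}}`, // written by the old code
		`{"trace_id":"6ba7b8109dad11d180b400c04fd430c8"}`, // written by the new code
	} {
		id, err := decodeTraceID(doc)
		fmt.Printf("%q %v\n", id, err)
	}
}
```

Rejecting non-empty objects is a deliberate choice in this sketch: the old code only ever produced {}, so anything else is more likely corrupt input than legacy data.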

With this code in place, we were able to: 

  1. Confirm that the service could successfully queue and process messages across versions 
  2. Ensure any csuuid.UUID values are “correctly” marshaled to JSON as hex strings
  3. Write csuuid.UUID values to a database and read them back

Time to celebrate!

Lessons for the Future

Now that our team has resolved this issue, and all is well once again in JSON Land, let’s review a few lessons that we learned from our adventure:

  1. Normally, adding new methods to a type would not be a breaking change, as no consumers would be affected. Unfortunately, some special methods, like those that are involved in JSON marshaling, can generate breaking behavioral changes despite not breaking the consumer-facing API. This is something we overlooked when we got excited about our “free win.”
  2. Even if you don’t do it yourself, future consumers that you never thought of may decide to write values of your type to JSON. If you don’t consider what that representation should look like, the default behavior of Go’s encoding/json package may well do something that is deterministic but most definitely wrong, as was the case when it generated {} as the JSON value for our csuuid.UUID type. Take some time to think about what your type should look like when written to JSON, especially if the type is exported outside of the local module/package.
  3. Don’t forget that the simple, straightforward solutions are not the only ones available. In this scenario, introducing the new MarshalText()/UnmarshalText() methods was the simple, well-documented way to correctly support converting csuuid.UUID values to/from JSON. However, doing the simple thing is what introduced the bug. By switching to the lower-level json.Decoder, we were able to extend csuuid.UUID to be backwards compatible with the previous code while also providing the “correct” behavior going forward.

Do you love solving technical challenges and want to embark on exciting engineering adventures? Browse our Engineering job listings and hear from some of the world’s most talented engineers.

Author: Dylan Bourque
