Nathaniel Knight

Reflections, diversions, and opinions from a progressive ex-physicist programmer dad with a sore back.

Is OpenTelemetry Excessive?

This article is a brief account of my experience setting up, operating, and using OpenTelemetry on a very small software development project, wherein I reach the surprising conclusion that it's probably worthwhile much earlier, and at much smaller scales, than you might expect.

The project in question was the back end for a proof-of-concept mobile app that I worked on as part of my day job. This wasn't even a Minimum Viable Product; it was more of an experiment to demonstrate what an MVP might look like. When I adopted OpenTelemetry I was worried that it might add needless complexity and overhead to a very basic app, but to my surprise and delight it paid for itself several times over.

OpenTelemetry

OpenTelemetry describes itself as

High-quality, ubiquitous, and portable telemetry to enable effective observability

It's pitched as a tool for tackling enterprise-grade-highly-distributed-microservice-enabled complexity: the sort of thing that Charity, Liz, and Jessica talk about on the O11ycast.

Concretely, it's a set of standards for representing telemetry data (traces, metrics, and logs), as well as APIs, SDKs, and tooling for producing, collecting, and exporting that data.

Once you've set it up, you can turn on "auto-instrumentation" for common software components, which ended up being very valuable.
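
For concreteness, here's a minimal sketch of what that looks like in a Node service, assuming the published @opentelemetry/sdk-node and @opentelemetry/auto-instrumentations-node packages (exporter configuration is omitted, and the file name is hypothetical):

```typescript
// instrumentation.ts (hypothetical) -- imported before anything else in the
// app, so that library patching happens before http, pg, etc. are first used.
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";

const sdk = new NodeSDK({
  // Patches a long list of common libraries (http, express, pg, ...) so they
  // emit spans without any changes to application code.
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```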

What I put into it

Unfortunately, it's not all good news: setting up OpenTelemetry was more work than I was expecting. The NodeJS libraries are complex (and seem to be in a state of flux?). There's a lot of configuration and setup. The library's interface is also more complicated (and quite a bit more powerful) than console.(log|info|error|debug), which is what I would usually be doing. This all took work and precious time to learn.
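
To give a flavour of that difference: where console.log is a one-liner, the tracing API wants a tracer, a span, and attributes. A rough sketch, with an invented function name and attribute:

```typescript
import { trace } from "@opentelemetry/api";

// One tracer per module is the usual pattern; the name is illustrative.
const tracer = trace.getTracer("poc-backend");

// Where a console.log("saving order", orderId) might have gone, a span
// records the same information plus timing and parent/child context.
export async function saveOrder(orderId: string): Promise<void> {
  await tracer.startActiveSpan("saveOrder", async (span) => {
    try {
      span.setAttribute("order.id", orderId);
      // ... actual work would happen here ...
    } finally {
      span.end();
    }
  });
}
```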

I ended up sending logs to stdout as nicely formatted JSON. More sophisticated setups are available, but this 12-factor sort of approach served me well in development (Docker Compose, where I could inspect the logs with docker-compose logs) and in production (SystemD services on EC2, where I used journalctl).
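
The exact wiring isn't worth reproducing here, but with the (still-evolving) logs SDK, getting structured records onto stdout looks roughly like the following sketch. The logger name, message, and attribute are invented, and newer SDK releases pass processors to the LoggerProvider constructor instead of calling addLogRecordProcessor:

```typescript
import { logs, SeverityNumber } from "@opentelemetry/api-logs";
import {
  LoggerProvider,
  SimpleLogRecordProcessor,
  ConsoleLogRecordExporter,
} from "@opentelemetry/sdk-logs";

// Send every log record straight to stdout as structured output.
const provider = new LoggerProvider();
provider.addLogRecordProcessor(
  new SimpleLogRecordProcessor(new ConsoleLogRecordExporter())
);
logs.setGlobalLoggerProvider(provider);

// Elsewhere in the app: emit a structured record where a console.log
// call would otherwise have gone.
const logger = logs.getLogger("poc-backend");
logger.emit({
  severityNumber: SeverityNumber.INFO,
  severityText: "INFO",
  body: "order saved",
  attributes: { "order.id": "123" },
});
```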

What I got out of it

Once I got the SDK configured properly and wrapped my head around how to use it, I was able to instrument my own code, which was valuable as expected. What I wasn't expecting was the comprehensive auto-instrumentation for things like NodeJS's HTTP stack and the Postgres client.
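
As an illustration, those two instrumentations can also be registered on their own rather than via the catch-all bundle; the package names are the published OpenTelemetry ones, but this is a sketch rather than the project's actual configuration:

```typescript
import { registerInstrumentations } from "@opentelemetry/instrumentation";
import { HttpInstrumentation } from "@opentelemetry/instrumentation-http";
import { PgInstrumentation } from "@opentelemetry/instrumentation-pg";

registerInstrumentations({
  instrumentations: [
    // Spans for every inbound and outbound HTTP request: method, route,
    // status code, and timing, with no changes to application code.
    new HttpInstrumentation(),
    // Spans for each query issued through the `pg` client, so slow or
    // failing SQL shows up alongside the request that triggered it.
    new PgInstrumentation(),
  ],
});
```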

This let me inspect the details of:

This helped me catch and fix:

These were bugs that slipped past a decent test suite and TypeScript annotations, and I diagnosed them without modifying my app. That's the promise of observability: you can't predict what you should be recording, but if you're disciplined and systematic about instrumenting your code, you'll be able to figure it out when you discover what you need.

This seemed like common sense for big, complicated distributed systems, but I might be starting to believe it for small, straightforward greenfield projects as well.