On product configuration and operational responsibility

Posted 2020-06-07

A number of operational patterns have emerged in the tech industry: agile software development, continuous delivery, developer operational responsibility, etc. If you work in software in Silicon Valley, no doubt you believe that you either practise these things, or practise some "better" variant of what you imagine each of them to be. In case you don't, my one-sentence definition of each follows:

Agile software development is a collection of concepts related to releasing simple software changes often, using scoped and clear definitions of requirements.
Continuous delivery is the concept of your master code branch matching your deployed code as closely as possible by means of automation - both to achieve the desired cadence, and to simplify operations/stability.
Developer operational responsibility is the concept of putting software execution in the hands of software developers, which would typically include rollout, rollback, and configuration change.

I have no doubt that software developers are well-placed to be operationally responsible for their software, but this post is to challenge the claim that they're the right team to be responsible for configuration.

This post is not about techniques for making configuration less painful (devops, schema validation, etc.).

Power structures

Part of why this responsibility exists is related to power structures, and ensuring that configuration consumers have the agency to control the meaning of the configuration they're consuming. In other words, software developers are expected to set configuration because they're in control of the complexity of that configuration.

This is a noble goal: engineer agency is incredibly important. However, I challenge the claim that developers have agency over configuration. Rather, configuration takes shape based on requirements and the fracturing of customer use-cases, both of which developers are normally handed rather than get to choose.

Clearly this power structure still puts some of the onus on developers to write good configuration, but users and requirements-specifiers must take on some of the operational pain if they are to create good feedback loops and reasonable flexibility demands.

The pain of managing your product deployment YAMLs should be put on those who set requirements - otherwise, product requirements will continue to demand more configuration and more fragmentation of product behavior.

Make users and requirements-specifiers as close to the same thing as possible.
Set and prioritise requirements for sensible default configuration, and for safe behavior on overrides.
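
As a minimal sketch of what such requirements might look like in code (Python here, and every name is hypothetical: `ServiceConfig`, `load_config`, the two settings and their bounds are all invented for illustration), each setting carries a default so that an empty override is a valid deployment, and an override is rejected if it names an unknown key or falls outside a bounded range:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ServiceConfig:
    # Hypothetical settings. Every field has a default, so no override is required.
    request_timeout_s: float = 5.0
    max_connections: int = 100

    def __post_init__(self):
        # Safe behaviour on overrides: refuse values outside a known-good range.
        if not 0 < self.request_timeout_s <= 60:
            raise ValueError("request_timeout_s must be in (0, 60]")
        if not 1 <= self.max_connections <= 10_000:
            raise ValueError("max_connections must be in [1, 10000]")


def load_config(overrides: dict) -> ServiceConfig:
    """Build a config from operator overrides, rejecting unknown keys."""
    known = {f.name for f in fields(ServiceConfig)}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"unknown configuration keys: {sorted(unknown)}")
    return ServiceConfig(**overrides)
```

With this shape, `load_config({})` describes a complete deployment, and a misspelt key fails at load time rather than being silently ignored.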

Being prescriptive

I see a cycle repeat itself. Some libraries are made to be generic - to work with any backend; to work with any front-end. Some libraries are made to be prescriptive - to simplify the consumer experience by imposing demands on implementation details and on dependencies. We seem to oscillate between them (but that is a topic for another blog post).

By writing an application with dependencies, you're probably contributing to this ecosystem/cycle. Good software design is to reuse, and to expose scoped APIs. I believe this to be obvious, so I won't belabour the point. While I have your attention though, I do encourage the following advice:

Don't expose configuration for your dependencies in your product's config - this leads to proliferation of YAMLs.
Be prescriptive with your application - in the absence of requirements on configuration, err on the side of no configuration. Too often I see engineers treat any static value as a red flag in need of configuration.

As an extreme example of this, I previously worked on a product with hand-rolled configuration templating that let it run either as a collection of disparate services in k8s, or as an abomination running on a single VM. Deep under the covers lived multiple nginx instances, hand-rolled discovery based on templating, DPDK, systemd configuration, not to mention application code written in Bash, Python, Rust, C++, and C. Depending on the configuration change needed, it could take as little as a single CLI command, or as much as 6 nested product recompilations/template renderings. Deeply nested configuration wasn't prescriptive enough.
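
The two pieces of advice above can be sketched roughly as follows. This is an illustration in Python, not taken from any real product: `HttpClientSettings` stands in for some dependency's own configuration, and `ProductConfig`, `build_http_settings`, and the constant values are hypothetical. The point is that the dependency's knobs are decided in code, and the operator sees only a product-level setting.

```python
from dataclasses import dataclass

# Prescribed values: fixed in code, changed through code review rather than a YAML edit.
_HTTP_POOL_SIZE = 32
_HTTP_RETRIES = 3


@dataclass(frozen=True)
class HttpClientSettings:
    """Stand-in for a dependency's own configuration object (hypothetical)."""
    base_url: str
    pool_size: int
    retries: int


@dataclass(frozen=True)
class ProductConfig:
    """The only operator-facing configuration: a single product-level field."""
    upstream_url: str


def build_http_settings(config: ProductConfig) -> HttpClientSettings:
    # The dependency's settings are chosen here, not passed through from the operator.
    return HttpClientSettings(
        base_url=config.upstream_url,
        pool_size=_HTTP_POOL_SIZE,
        retries=_HTTP_RETRIES,
    )
```

If a real requirement later demands a different pool size for some deployment, that is the moment to promote it to configuration, and not before.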

Devops

Devops, a core component that enables some of the modern software development I mentioned above (continuous delivery etc.), is an incredibly valuable concept. It commoditises deployment, removes enormous mental burden, and simplifies the landscape of expected models for various operations. It also enables developers to be irresponsible and spoiled, which I suppose is also good (a topic for another blog post).

Being great at operations comes with a number of costs, at least one of which is directly relevant here: if configuration is easy, then the feedback loop of configuration pain is lost. Worse, certain kinds of pain become so easy to handle that the relief is more visible than the pain itself.

Do not allow yourself to be satisfied with powerful pain killers - instead, ask why the pain exists in the first place, and work to resolve it.

As an extreme example of this, I work on a product with over a hundred unique deployments, each with its own configuration. Powerful devops tools make this feasible: rolling out configuration changes across deployments, CD, blue-green (BG) upgrades, heterogeneous product version blacklisting, semver automation, RPC compatibility automation, product dependency automation, etc. I can imagine a world where my product configuration would be simple enough to survive with just half of this devops tooling, but with my config shaped as it is now, I cannot. This feels like a generalisable red flag.

Summary

Product developers shouldn't be solely accountable for operational burden if they're not unilaterally responsible for flexibility requirements. Be prescriptive. Managing pain via devops improvements is great, but each improvement should be recorded as a signal to simplify configuration, lest one risk putting up with more pain and demanding more from devops.

I challenge you to read the entirety of your product's (operator-exposed) configuration. If this doesn't sound feasible, or if this includes more than a handful of YAMLs, then the advice above is for you.