I spent the first few weeks of my new job building a map of the domain I would be working in – in part to manage the firehose of information I’d have to ingest, and in part as an exercise in model-building. By taking notes and mapping out where to find various things, I was also slowly building a mental model of the domain. There’s a sprawling field of information that I need to be aware of in my job, and to be honest, not all of it is fully documented; having at least a mental model of things, I thought, would be necessary if not entirely sufficient.
After all, in many ways model-building is part and parcel of understanding, or at least it is for me. A mental model implies a description of inputs and outputs: if A then B, but not when C and so on. A model captures how we understand things to be.
Of course, those models can be wrong; in building them, we inadvertently also encode our own assumptions and presumptions about how the world works (or, in my case, some details of the domain), rightly or wrongly. The map is not the territory, and sometimes the map is simply wrong.
The map is not the territory - Alfred Korzybski
If the code we write is an encoding of those models, debugging, then, is an exercise in shaking out the incorrect assumptions we’ve encoded in them. And there’s a process that’s proven itself to work pretty well at shaking out assumptions: the scientific method.
Fundamentally, when debugging something we’re both building and testing our mental models of the software in question and its constituent parts: the domain, the code, the platform, all the way down to the bare silicon. It’s abstraction all the way down, and the way we understand each abstraction (or rather, the abstraction we use as a useful fiction) rests on some model of the system’s behavior. It’s when those models don’t quite fit the system’s actual behavior that we run into problems.
So, in helping a colleague with a problem a few weeks back, we both had to apply a systematic approach to sussing out those assumptions: hypothesis, test, observe. We walked through each particular bit and tested, even if each seemed obvious or trivial – starting from the top going down.
We were trying to use AssumeRole in one of the services we were building, and we were using some sample code as a template. For one reason or another, it wasn’t working for us, and we had to slowly go through possible failure points and scenarios – essentially, checking our model against reality. Was it something in our code? Did we grant the proper access to the IAM user we were running our service as? Did we specify the target ARN correctly?
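For context, here’s a minimal sketch of the kind of call we were making – not our actual code, and assuming boto3 as the SDK; the role ARN and session name below are placeholders, not the real values from our service:

import boto3

def get_temporary_credentials(role_arn: str) -> dict:
    """Assume an IAM role and return temporary credentials.

    This only succeeds if the calling identity is allowed to call
    sts:AssumeRole on the target role, and the role's trust policy
    names that caller – two of the assumptions we had to check.
    """
    sts = boto3.client("sts")
    response = sts.assume_role(
        RoleArn=role_arn,                     # did we specify the target ARN correctly?
        RoleSessionName="debugging-session",  # arbitrary label, shows up in CloudTrail
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken, Expiration.
    return response["Credentials"]

creds = get_temporary_credentials("arn:aws:iam::123456789012:role/example-role")

Each of those comments maps to one of the hypotheses we walked through: the code itself, the permissions on the caller, and the ARN of the target role.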
Eventually, we discovered that the sample code we were using as the basis for our own had a small but fatal bug. And because we had assumed the sample worked as-is, we never thought to question it at first.
This isn’t the first time this has happened, either; in many of the cases where I’ve had to fix a bug in my own code, it was because I had assumed some behavior or some invariant held true – and, inevitably, it didn’t.
The map isn’t the territory, and often we have to be brutally honest about that. We have to be ready to tear down our models and build new ones when we find we’ve baked too many incorrect assumptions into them.