Sensible Commits

Administrivia

I realized after coming home yesterday that publishing a post every Friday is a bit untenable, considering that I’ve not yet picked up the habits and discipline to write regularly. So, to give myself some breathing room, I’ve decided to post every Saturday evening instead. To be honest, I should probably write these things every Saturday anyway, and then publish them the following Friday evening, but until I’ve gotten a stable workflow down, it’s best that I give myself enough time to think and write.

Workflows

Speaking of workflows, one of the things I’ve picked up over the years writing code is the importance of writing sensible commit messages — and preparing sensible commits, in fact. I admit freely though that it is a skill, a form of technical writing, where your audience is you and your teammates. Similar to documentation, it’s something most developers hate to do, but I think it’s at least worth spending a few minutes on every time you prepare a commit.

Mind you, you don’t have to write the final commit message immediately: the beauty of Git (and other modern version control systems) is that you can edit your work branch’s history. Rebasing, and interactive rebasing in particular, is a powerful tool that I think every developer should know how to use.

When I was in college, I attended a class on writing and literature, and one of the things taught to us that I still quite remember is the aphorism:

Write in white heat, edit in cold blood

which basically means that you shouldn’t spend too much time overthinking everything at the moment of conception; simply write. Pour your heart out, and write freely without censoring yourself. However, when it comes time to go over what you’ve written, view your work coolly, without passion: divorce yourself from it, then analyze and critique it.

I think that’s quite applicable to the act of preparing a good commit, and in particular, working on a good flow of commits that describes and builds up the feature you’re working on so that it becomes sensible to your teammates, or even future self.

So how does the aphorism translate to writing software, and not literature? Is it applicable to commit messages, which are often unread? How do you use git rebase -i as an editing tool?

I find that I usually work best on a feature or bug fix by experimentation: change a little here, observe/test, change a little there. I try to keep that feedback loop as tight as possible, so I will often write as much code as I can to implement something. Add a test, run it, make it pass; or maybe refactor a method, add a new test, et cetera.

Side note: I admit that I go against the common advice to commit early and often. I simply disagree that you should stop to commit your work, because that particular context switch slows me down. But that’s me: if it does work for you, then go ahead and commit your work early and often, but rebase at the end to edit those commits and make them more sensible.

Personally, as I said, I tend to write a bunch of code in tight feedback loops. Once I know a part of the feature I’m building is done, I pause to prepare a commit for it. In particular, my ideal commit is one that encapsulates a set of related changes that can stand alone to describe a single step in the process of building a particular feature. In other words, for me, a commit is a single atomic change to a codebase that one should be able to apply independently of other changes. In reality, however, most commits do have some interdependencies with each other: this method over here depends on a class created in a previous commit, or the effect of these changes to the configuration files depends on a whole other set of changes elsewhere. I don’t really think about that, though, when I make my initial commit.

Part of the preparation for a commit, of course, is choosing what goes into it. That means reviewing the current state of my working copy and selecting only the bits necessary to effect a particular coherent change. I then write a message to describe why and how these bits are necessary. Other people have written on how to write a good commit message, so I won’t elaborate much on the details of writing one.
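As a rough sketch of that selection step, the script below builds a throwaway repository, changes two files, and stages only one of them before committing. The file names and commit messages are hypothetical, chosen purely for illustration; the Git commands themselves are standard.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch never touches real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity supplied through the environment, so no global config is needed.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

# Hypothetical files standing in for two unrelated areas of a codebase.
printf 'v1\n' > parser.txt
printf 'v1\n' > config.txt
git add parser.txt config.txt
git commit -q -m "Initial commit."

# Two unrelated edits pile up in the working copy.
printf 'v2\n' > parser.txt
printf 'v2\n' > config.txt

# Stage only the change that belongs in this commit.
git add parser.txt

# Review what will and will not go into the commit before writing the message.
staged=$(git diff --cached --name-only)
unstaged=$(git diff --name-only)
printf 'staged: %s\nleft for later: %s\n' "$staged" "$unstaged"

git commit -q -m "Update the parser."
```

When a single file mixes unrelated edits, `git add -p` offers the same selection hunk by hunk, though it is interactive and so is left out of the script.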

At the end of implementing a particular feature or fix, I spend a decent chunk of time ensuring that the commits on my branch present a sensible story. I spend that time editing my branch through rebase -i: reordering commits, splitting or squashing commits, or even throwing out commits altogether. My goal in this process is to ensure that a reviewer looking through the commit history can understand how each change flows to the next.

At this stage, I put on my code reviewer’s hat and try to look at the branch as a third party would: how does this commit relate to the previous one? Maybe this commit over here should be moved there, as we’re changing the same class anyway; maybe this other commit should be squashed into the next, seeing as the next commit is the test for it.
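To make that editing pass concrete, here is a minimal, non-interactive sketch of one kind of cleanup: folding a late fix into the commit it belongs to. It assumes a fixup-based variant of the workflow (`git commit --fixup` followed by `git rebase -i --autosquash`), which is not necessarily how the branch above was edited. Setting `GIT_SEQUENCE_EDITOR=true` accepts the generated todo list unchanged so the script can run unattended; in real use you would leave it unset and reorder, squash, or drop lines by hand. File names and messages are hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch never touches real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

printf 'base\n' > notes.txt
git add notes.txt
git commit -q -m "Initial commit."
base=$(git rev-parse HEAD)        # the point the feature work starts from

# First step of the (hypothetical) feature.
printf 'feature\n' > feature.txt
git add feature.txt
git commit -q -m "Add feature."
target=$(git rev-parse HEAD)

# An unrelated step lands in between.
printf 'more notes\n' >> notes.txt
git commit -q -a -m "Update notes."

# A late fix that really belongs to "Add feature."
printf 'feature, fixed\n' > feature.txt
git commit -q -a --fixup="$target"

# --autosquash moves the fixup commit next to its target and marks it "fixup".
# GIT_SEQUENCE_EDITOR=true accepts that todo list as generated.
GIT_SEQUENCE_EDITOR=true GIT_EDITOR=true git rebase -q -i --autosquash "$base"

# Oldest first: the history a reviewer would now read.
subjects=$(git log --reverse --format=%s "$base"..HEAD)
printf '%s\n' "$subjects"
```

After the rebase the branch holds two commits instead of three, and the fix no longer appears as a separate step in the story.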

In the same way that code itself should be readable, I think that the commit log should also be readable, in the sense that each change in the log should flow logically to the next, as much as possible. Think of each branch as a chapter in a book, and each commit as a paragraph in that chapter: explain why you’re making this change in this commit, and then continue your explanation in the next.

A good way, I find, to check that the sequence of your commits is sensible is to concatenate the commit messages together: if the concatenation makes logical sense, and if the messages explain well what you’re trying to achieve, then you know the commits flow into each other.
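One way to produce that concatenation, sketched here under the assumption that the branch forks from a known base commit, is `git log --reverse --format=%B`, which prints the full messages oldest first with no hashes or dates in between. The branch name and commit messages below are hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch never touches real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

git commit -q --allow-empty -m "Initial commit."
base=$(git rev-parse HEAD)        # where the feature branch forks off
git checkout -q -b feature        # hypothetical branch name

# Two empty commits stand in for real work; only the messages matter here.
git commit -q --allow-empty -m "Add a registry interface." \
    -m "This gives callers one place to look up instances."
git commit -q --allow-empty -m "Wire the registry into the sampler." \
    -m "With the interface in place, the sampler can stop tracking instances itself."

# Oldest first, message text only: the branch read as continuous prose.
story=$(git log --reverse --format=%B "$base"..HEAD)
printf '%s\n' "$story"
```

On a real branch, the range would be something like the main branch name followed by `..HEAD` rather than a saved commit hash.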

For instance, a personal project of mine is a JMeter sampler and proxy for RMI, and I always try to keep that philosophy of preparing commits in mind. Here’s a sample of a sequence of commit messages, without any code, on a branch implementing a particular feature:

RMI Sampler: Bind BeanShell interpreter on test start.

Because JMeter serializes all bean properties to the JMX test plan, the BeanShell interpreter ends up also being serialized, dumping large amounts of interpreter state if the test plan is saved after a test run. This is bad: we don’t need to have interpreter state persisted between runs, hence us marking the property as temporary. Apparently, this is not enough.

Since it’s fine for us to have a single interpreter per test thread, just create the interpreter and do the initial eval of the argument script at test start, and remove the interpreter at test end. Also, make the getter for the interpreter private so it isn’t visible to the serialization code.

RMI Sampler: Make remote object binding setter private.

As with the interpreter, make the setter for the remote object config private so it doesn’t get serialized.

Introduce interface for registering remotes.

This abstracts how we reference RMI remotes, whether in the proxy or by samplers.

In the case of the capture side (the RMI proxy), the proxy itself is an instance registry, where later on we wire up the proxy invocation handler to call into it when it sees a java.rmi.Remote instance being returned to the client. Meanwhile, on the playback side, the RMI remote object config contains an instance registry that is bound to the thread test context.

Note that we’ve changed RMIRemoteObjectConfig to initialize the registry and bind the registry to the context variables on test start, and clear the registry on test end.

Admittedly, I’m tooting my own horn here, but I do think that the prose is sensible and flows logically: I’ve explained what I’m changing in the code and why, and each commit reads into the next. (There’s also still a lot to improve, I’m sure, but I never said I was perfect.)

Does this workflow eat up time? Sure, but I think that in the same way that one should be careful and deliberate when writing code, one should also be careful and deliberate about writing and preparing commits. The time spent on such editing pays for itself later during code review, and when you need to understand why a bit of code evolved the way it did, especially when you’re trying to fix something. The added context helps.
