Estimating debugging

27 Feb 2017

Let's play a simple dice game. The equipment for this game will be a pile of tokens, like pennies or poker chips, and four six-sided dice, two red and two white.

You start with one token, and the object of the game is to get to zero tokens.

On each turn, you roll your dice, and if the sum of the red dice exceeds the sum of the white dice, then you gain the difference, in tokens. If the sum of the white dice exceeds the sum of the red dice, then you get rid of the difference, in tokens, or all of your tokens if you don't have enough.

If the sum of white dice equals the sum of red dice, then you are required to provide an estimate of how many turns it will take you to finish.
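To make the rules concrete, here is a minimal Python sketch of one game. The function name and the optional turn cap are my own choices for illustration; the cap only matters for the simulations further down.

```python
import random

def play_game(tokens=1, max_turns=None):
    """Play the dice game until the pile is empty; return how many turns it took.

    max_turns is an optional cap so that simulations are guaranteed to stop.
    """
    turns = 0
    while tokens > 0 and (max_turns is None or turns < max_turns):
        turns += 1
        red = random.randint(1, 6) + random.randint(1, 6)
        white = random.randint(1, 6) + random.randint(1, 6)
        if red > white:
            tokens += red - white                    # gain the difference (bad)
        elif white > red:
            tokens = max(0, tokens - (white - red))  # shed the difference (good)
        # on a tie, the pile is unchanged; you just owe an estimate
    return turns
```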

What is a good strategy for these estimates? Clearly, the more tokens you have, the worse off you are; so you do have some information to base an estimate on.

However, the average number of turns it will take you to finish is infinite. (You can google "random walk with one absorbing barrier" and do the math yourself if you don't believe me.) So if you report "infinite" as your estimate, then you're being truthful, but not informative.
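You can get a feel for this empirically. Reusing the play_game sketch above (with a cap so every run terminates), the sample median of game lengths settles down quickly, while the capped sample mean keeps climbing as you raise the cap and the number of games; that climbing mean is what an infinite expectation looks like in simulation.

```python
import statistics

def duration_summary(n_games=2000, cap=20_000):
    """Median and (capped) mean duration over many simulated games."""
    durations = [play_game(tokens=1, max_turns=cap) for _ in range(n_games)]
    return statistics.median(durations), statistics.mean(durations)

# The median is small and stable; the mean keeps growing as cap and n_games
# increase, because a few games run for an enormous number of turns.
print(duration_summary())
```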

There are various changes one could make to the reporting structure. One, which I think is quite common in practice, is to report the minimum duration to finish instead of the average. The maximum progress you can make in one turn is 12 - 2 = 10 tokens, so if you report your number of tokens divided by 10, that has a clear meaning, it's in units of turns, and it's informative.

One objection to making "minimum duration to finish" the explicit meaning of a software estimate is that it doesn't have the right incentive structure. That is, someone could just write down "1" over and over again, and you couldn't say that they're wrong: if they're not done, then 1 certainly is a lower bound on the number of turns until they are done.

So to answer that objection, maybe we should ask for an interval estimate instead of a point estimate - or if we want to report a single number, just one end of an interval estimate. Then we can detect the "always-1" uninformative estimates, by saying something like "Our policy is that estimates are 25% quantiles, but out of your last 25 tasks, none of the actual durations came in at or below your estimate (which was always 1). So you're miscalibrated, and you need to raise your estimates."
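Here is a sketch of what that calibration check might look like; the function name and the example numbers are made up for illustration.

```python
def check_calibration(estimates, actuals):
    """Fraction of tasks whose actual duration came in at or below the estimate.

    If the estimates are honest 25% quantiles, this fraction should be near
    0.25; an always-1 estimator will score far below that.
    """
    hits = sum(1 for est, act in zip(estimates, actuals) if act <= est)
    return hits / len(estimates)

# Hypothetical record of 25 tasks, each estimated as "1 turn remaining".
estimates = [1] * 25
actuals = [7, 12, 3, 40, 9, 2, 15, 6, 22, 5, 11, 8, 30, 4, 19,
           9, 3, 14, 27, 6, 10, 2, 16, 8, 21]
print(check_calibration(estimates, actuals))  # 0.0 -- miscalibrated; raise the estimates
```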

Another objection is that this dice game isn't representative of real debugging. The analogy is that a token corresponds to a gap between theory and reality. Something like "In theory, it should work, but in practice, it doesn't work." Then after spending a day debugging, you might resolve the gap, or you might break down the gap into several gaps. So far this is pretty reasonable. The objectionable approximation is that there is no representation, in the model, that the several gaps ought to be in some sense "smaller" than the original gap; each of the gaps is fully equivalent to the original.

Here is a simple card game:

You start with a very small deck of cards - possibly just 5 cards. Each turn, draw one card. If it is a black card, discard it. If it is a red card, discard it, but add that card's number (1-10) of black cards to the deck.

This model has two levels of "bigger" and "smaller". Big tasks (red cards) always decompose into some number of small tasks (black cards), and whenever you work on a small task you always make progress on it.
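A minimal sketch of the card game, assuming (for illustration) that the starting deck is all red cards with values drawn uniformly from 1 to 10:

```python
import random

def play_card_game(n_red=5):
    """Play the card game until the deck is empty; return the number of turns."""
    # Red cards are represented by their value (1-10); black cards by 0.
    deck = [random.randint(1, 10) for _ in range(n_red)]
    turns = 0
    while deck:
        turns += 1
        card = deck.pop(random.randrange(len(deck)))  # draw a card at random
        if card > 0:
            deck.extend([0] * card)  # a red card decomposes into that many black tasks
        # a black card (0) is simply discarded
    return turns
```

Under these assumptions every red card contributes exactly one turn plus its value, so the total duration is always finite.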

In real debugging, however, there are often many levels, not just two, and the uncertainty about how deep the rabbit hole goes is perhaps the most characteristic quality of debugging. By introducing the objectionable approximation, we simplify the model (only one kind of token to keep track of) and also introduce the "paradox" of infinite expected duration. I believe that the paradox is a direct consequence of that approximation.

What if we assumed we had drift?

You start with a very small deck of cards, and a very small pile of tokens. On your turn, there are two phases. First, you draw a card. If it is red, then you shuffle it, and a new red card, into the deck, and you gain a token. If it is black, then you shuffle it, and a new black card, into the deck, and you lose a token. Then, as before, you roll the red and white dice and gain or lose the difference in tokens.

When the drift is advantageous (more black cards than red cards in the deck), then the expected number of turns remaining is finite. However, when the drift is disadvantageous, then the expected number of turns remaining is infinite.

(Note that we're using a Polya urn process coupled to the random walk to make the drift unknown, but discoverable over time.)
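Here is a sketch of that drifting game. The starting pile and deck sizes are illustrative, the urn is tracked as two counts rather than a literal deck, the pile is only checked at the end of each full turn (a simplification of the rules above), and the optional cap keeps the unlucky runs from looping forever.

```python
import random

def play_drift_game(tokens=3, reds=1, blacks=1, max_turns=None):
    """Play the drifting game until the pile empties; return the number of turns."""
    turns = 0
    while tokens > 0 and (max_turns is None or turns < max_turns):
        turns += 1
        # Phase 1: draw from the Polya urn; the drawn colour reinforces itself.
        if random.random() < reds / (reds + blacks):
            reds += 1     # red goes back in along with another red card
            tokens += 1   # and you gain a token (bad)
        else:
            blacks += 1   # black goes back in along with another black card
            tokens -= 1   # and you lose a token (good)
        # Phase 2: the dice, as before.
        red = random.randint(1, 6) + random.randint(1, 6)
        white = random.randint(1, 6) + random.randint(1, 6)
        tokens = max(0, tokens + red - white)
    return turns
```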

This resonates with my experience debugging, that different problems have different degrees of knottiness.