Source: Allen Pike allenpike.com/2018/the-great-bug-hunt
At that time, the team was working on what was to be one of the first games ever for a brand new game console, called the “Xbox”. As final testing ramped up, QA set up three pre-release Xboxen to run automated tests overnight. If the previous day’s build of the game was still running the next morning, it was a sign of a stable build.
Unfortunately, by morning one of the consoles had crashed. Crashes are never good, but this was a particularly bad crash: something running on the graphics card had brought the whole system down. Diagnosing a GPU crash meant hard mode: no debuggers, no stack traces, no “printf” debugging. All you could do is read the code and try things, like an animal.
And so began the Bug Hunt. Each day the lead engineer would comb through the evidence they had, develop a hypothesis, and work to eliminate possibilities. Each night, QA would get a “random” crash, without a clear cause. “That’s not possible.” “How could this even happen?” “Could there be a bug in the compiler?” All the greatest hits.
Naturally, the game ran perfectly well on the engineer’s machine – for multiple days, even. This was little consolation, as the deadline to print and ship the game loomed large.
Luckily, a pattern was soon detected – albeit a strange one. The game was only crashing overnight on one of the three Xboxes. A search for differences ensued. It wasn’t the power cables. It wasn’t the controllers. It wasn’t the order they were burning the DVDs. Bring the Xbox back to your desk – no crash. Put it back – crash. It was something about the very specific setup QA was using.
Now, the process of elimination requires eliminating all variables. So eventually, in desperation, the engineer tried one more thing: he shuffled which console sat on what table. Boom.
It wasn’t that Xbox that crashed, it was any Xbox sitting on that table that crashed. In the middle of the night.
Now, sometimes you need to do strange things for science, and this was one of those times. Stoically, the engineer set up a chair, gathered the requisite quantity of Red Bull, and the Bug Hunt became a Bug Watch. He vowed to watch the Xbox run automated testing on that cursed table until he saw the problem with his own two eyes.
The night passed by slowly, then quickly, and eventually dawn approached. The game still ran. Infuriatingly, it ran. The sun began to rise.
As he started to consider calling it for the night, something interesting finally happened: The Table was struck by a ray of light from the rising sun. Minute by minute, a sunbeam crept across the table towards the Xbox, its warm glow quietly enveloping that black blob of a console.
Which promptly crashed.
It turns out, early Xbox units had an issue where the graphics card could fault when the console reached a certain temperature. There was nothing you could do about it in software. The hardware issue was escalated, the game was cleared for release, and the Red Bull was swapped out for beer – or let’s be honest, probably whisky. Science one, Bug zero.