A day in the life of atest failure

I love my job. Fact.

But some days are easier than others. If you’ve been reading my tweets this week, you’ll have sensed a certain frustration.

Don’t worry. None of this is your fault.

But here’s what happened.

Over the weekend it had occurred to me, while I was asleep at about 6am – seriously, this happens – it occurred to me that I’d forgotten to submit my latest test fixes before I merged my branch. This would screw up the branch for the weekend, for the whole team, and delay the deploy on Monday. Bad news.

As soon as I woke up, I popped open the laptop, preemptively apologised to all and sundry – as is the English way – and proceeded the fix the problem. It turned out my fixes only fixed half the problem. And it’s the weekend.

So, I cheated. I knew this code was in a hidden section, and wouldn’t be live when deployed. I could just comment out the tests, ensuring the branch would pass even if the code didn’t work properly.

First thing Monday morning, I returned to the problem. I looked back over the logs to find the faulty test and dutifully kicked off a test run.

And this was the start of my problem.

It passed.

I was a little surprised. There were failures last week, I could see there were failures. Maybe somebody else fixed it. I check the logs. Nope. I run the tests again.

It passed.

I should mention that running the tests isn’t instantaneous. It’s a complex test suite we’re running here. We’re currently migrating our server code away from Ruby (we’re always migrating to faster systems). In the interim, we’re running tests against both systems. The various compilation and startup sequences, combined with the real browser used, and the queues of other developers waiting on the system, gives us an average runtime of about 15 minutes.

It’s optimised to run a suite of tests, rather than individual tests, so both cases take around the same amount of time to run. In fact, we have some excellently comprehensive test suites with thousands of tests, and they can all be run in about 15 mins. Running single tests is still slow, because the task can’t be distributed.

From this point on, I’ll indicate the test timings like this: (15).

I can of course run a test locally. Again, this can take time. Usually less time – let’s say 5 minutes or so. It takes less time because I already have the servers running locally.

So my tests are passing. Usually, a cause for joy. In this case, a worry. Why did they fail last week? I dig out the code from last week, and run the tests (15).

It fails, as expected.

Well, that’s something. I diff the code. And here I win the Goof of the Week award. Prize chump right here. The tests are passing because I commented them out.

Duh.

I go to lunch and try to apologise to my own brain cells for letting them down.

I kick off a real test run (15). It fails. Good. But weird – I swear these were working once. I test the Ruby code instead (15). It passes.

Interesting. So I’ve found a difference. This is weird because I haven’t actually changed any of the server code. It shouldn’t be possible to be different here.

What’s worse is that we can’t easily test the new code locally. It takes longer. I run them (10). They pass. I run the Ruby tests (5). They pass. I run the new tests again, to be sure it’s not random. Pass (10). Pass (10).

I spike the code full of debugger statements. Run the tests (15). More debugging (15). Add screenshots (15). Everything is the same between systems, except that the new tests are failing.

” is not true”. Thanks, error messages.

Hmm. I look back at the screenshots. While the systems show the same screens, there’s something I didn’t expect. When you’re a new user on Twitter, we show a little helper module at the top of the page. I’d forgotten we did that. It also loads asynchronously.

Aha. Here it is. The new code runs so much faster that this module loads earlier. This explains the difference. I go home, happy that I’ve established the problem.

Next day, I make allowances for the helper module. I pull out the debugging code, happily run the tests (15) one last time, and prep the branch for shipping.

It fails.

Bollocks. I unprep the branch and put the debugging code back in. I run the tests (15).

It fails. Same debugging result as before.

That’s weird. I was so convinced the module was the problem. I hack the code and remove the module entirely. Another run (15).

It passes.

Ok, so it’s the module. No question. I just got the allowances wrong. I tweak them, and run (15). Fail. I add debugging (15). Nothing. And again (15). No.

This is crazypants. I am so not happy.

I try to reproduce locally. Passes (10) every (10) time (10). No amount of debugging and breakpoints will help if I can’t reproduce the error.

I look back to the screenshots. Aha. No images in the module.

I knew the error was caused by an bad height calculation. And the missing images are (whips out an onscreen ruler), yes, the images are 137px – exactly the difference.

Eureka.

Our images don’t load on the testing machines because they’re insulated against the outside network. That makes sense. Locally, we can access the images, which must be why they pass. It’s perfect.

Furthermore, our testing machines are using Firefox, which won’t render broken images, while I’m using Chrome locally – which does. This would account for the height difference.

I curse and laugh, since this quirk of Firefox has bitten me many times in the distant past.

To fix the test, I simply break all the images – to be consistent. I prep the branch for shipping, throw it out to the test cluster one last time, and go and get a fifteenth cup of coffee.

It fails (15).

What. The. Fuck.
What. Thefuck.
Whatthefuck.
Fuck.

How can this be? I had this nailed.

Worse than that, the tests is now passing occasionally. When I run the tests on the test cluster, I run them about sixty times in parallel. This ensures we don’t get “flaky” tests, which sometimes pass and sometimes don’t. Flaky tests are notoriously disruptive and hard to fix.

Sometimes, all you can do is try it yourself. I open a browser. I downgrade it to the exact same version as our test cluster. I’ve done this before, but I’m doing it again. I run the tests (10). I click around. I fiddle with the code. I break the images, to match. Refresh, refresh, refresh.

Hey, what was that?

I refresh again. And again. There it is.

Firefox won’t render a broken image – it hides it away completely. I knew that. But what I’m seeing here is a flash of broken image. There’s a flash of a broken image in this helper module, before Firefox hides it away.

If my code is making calculations just at that moment, it’ll get a different result than, say, half a second later.

I don’t get my hopes up again. I tweak the test. There’s actually a little css property:
-moz-force-broken-image-icon:1;
This will prevent Firefox from hiding the broken images. I run the tests. All the tests (15).

Pass.

I prep. I ship. I smile.
I move on to the next thing.

(btw, if testing tools is your kind of thing, and you want to help me make things faster, we’re hiring and I will buy you cookies)

Waiting for Bill

I throw another log onto the fire. I’m not sure we really need it – it’ll be warm enough all night long. Out here in Palm Springs, it’s only bearable when the sun sets anyway. The fire simply gives us something to watch while we wait.

Bill Gates is due for dinner. Most of the other guests have already arrived, so he should be next. Or the one after.

I poke the fire. Sparks fly. It’s been a few good hours now, and I still haven’t decided.

Do I shake his hand, or kick him in the fork?

This was about ten years ago. I’d never suggest or condone violence in any form these days. Hell, even then I wouldn’t really. So let’s moderate the language.

Do I shake his hand heartily, welcome him like an old friend, thank him for all he’s done for my career, and propose my next big idea? Or do I ignore him, snub him, turn the other cheek?

I should clarify my position. I don’t have dinner with billionaires. That doesn’t happen. I’m really not that kind of guy.

I’m here on holiday. Vacation, if you will. I’m visiting an old friend from London, who is originally from Perth, who is currently working as a hot air balloon pilot in Palm Desert. And in the evenings, he works as a parking valet at The Reserve, an exclusive golf club in Palm Springs. And, since I’ve got nothing else to do, I’m here to help.

I say “parking valet” to emphasise that I’m using the American meaning of “valet”. Someone who parks your car. Not the English meaning, which you might be familiar with from Downton Abbey, of someone who dresses you in the morning.

To make this slightly more bizarre, I can’t actually drive. No license. Never even sat behind the wheel. I’m opening the car door, welcoming them, opening the restaurant door, that kind of thing. Oh, and I get to drive the golf cart, ferrying my fellow valets back from the car park. This is a nice place – you expect your valet to be fast. This is a classy place – even the valets have a little nook with a fireplace.

When I arrived, they told me Bill Gates was booked for dinner. Yes, that Bill Gates. That one.

Obviously I was somewhat in awe. He was, I believe, the richest man in the world at the time. And I’d been working with his software for years. My job, my life, was founded upon it.

And yet, that brought so much trauma. I was out here for a much-needed break. I’d been fighting poorly implemented APIs, buggy applications, incomprehensible error codes, absent or irrelevant documentation, and that blasted comedy paperclip. In no small way did I hate this software as much as I loved it. I should snub this man for all the suffering he put me through.

But what a wasted opportunity. I should be pitching an idea. I’ve got maybe 20 seconds with the most powerful man in technology. I’ve got to plant a thought in his head, one that will distract him over dinner, so that he’ll emerge with a warm smile and an offer of partnership.

All I needed now was the idea.

Ideas are funny things. After a few pints, or several sleepless hours, your mind is full of amazing, wonderful, life-changing, world-changing, revolutionary ideas. When you’re put on the spot, nothing.

Fast forward to the future, and hindsight brings the answers. I could’ve pitched Twitter. Facebook. Myspace. The cloud. Groupon. Well, maybe not Groupon. Instagram. The iPhone. I could’ve begged him to continue development of IE.

If you were put on the spot, right here, right now – could you pitch? What’s your idea? Are you ready?

I wasn’t ready, but that was ok. Bill never showed up.

The other guys were fine with it. Apparently he’s not a big tipper.

Since then of course, he’s donated most of his wealth to charity. Seems like a generous tip to me. I’d heartily shake his hand. Or fist bump. Or whatever he’d prefer.

And he tweets. I build the software he uses every day. That’s pretty awesome.

Maybe it’s a good thing he didn’t show.