Make the Automatic Explicit

Never assume the customer will recognize important details.

I once got a support call from a customer working in a secure environment. Our application was crashing on startup. He couldn’t send me any logs or screenshots, so we had to diagnose the issue over the phone: I told him what to try and where to look, and he told me what happened and what he saw.

I had him read me the latest line from the application log file, which turned out to be an error I’d seen before. I walked him through the corresponding fix and we tried starting the software again. It still failed, and when he checked the log file again it was still showing the same error.

This confused me, since we should have just fixed that problem. We went through the fix again, double-checking every step, and there was no change. I suggested everything I could think of, and the error message stubbornly remained.

It was late in his workday, and we had to leave the issue unresolved when his secure environment closed for the night. When we picked it up again in the morning, the application was still crashing but there was no error in the log file, and I realized what had been happening.

I never told the customer to check the timestamp of the error message in the log file. We actually had fixed the logged problem at the very beginning. Every time he’d checked the log file, the latest line was still that original error message because nothing new was being written to the file. We’d been trying to fix an already-resolved issue and only realized it when the automatic log archiver moved us to a new log file for a new day.

Once we recognized this, we stopped looking in the application log file. We found and fixed the actual remaining problem and within a few minutes the customer was up and running.

If I’d had the log file in front of me, I’d have noticed the non-updating timestamp immediately. That’s an automatic-enough process for Product Support folks that I didn’t even think to ask about it. But the customer didn’t have a habit of studying our log files, and for him checking the timestamp was not automatic. Because I didn’t call it out as an important detail, we wasted hours and the application was down all night.

It’s not easy to recognize and externalize the automatic parts of your troubleshooting process, but in cases where you don’t have direct access to the customer environment, it can save a lot of time and pain.
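
One way to externalize a habit like the timestamp check is to write it into a small helper, so the check happens whether or not anyone remembers to ask for it. The following is a minimal Python sketch, not something from the incident described above. The function name, the five-minute staleness threshold, and the returned fields are all assumptions chosen for illustration. It reports the last non-empty line of a log file together with how long ago the file was last written.

```python
import os
import time


def describe_latest_log_line(path, stale_after_seconds=300, now=None):
    """Return the last non-empty log line and how long ago the file was written.

    The 300-second threshold is an arbitrary illustrative default, not a
    recommendation for any particular application.
    """
    now = time.time() if now is None else now
    # Seconds since the file was last modified, taken from the filesystem.
    age_seconds = now - os.path.getmtime(path)

    last_line = ""
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            if line.strip():  # skip blank lines at the end of the file
                last_line = line.rstrip("\n")

    return {
        "last_line": last_line,
        "age_seconds": age_seconds,
        "is_stale": age_seconds > stale_after_seconds,
    }
```

If `is_stale` is true, the "latest" error may describe a problem that has already been fixed, and the real failure is probably happening somewhere that never reaches this file. Reading the age out loud along with the message makes that visible to both people on the call.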