Wednesday 13 June 2012

Troubleshooting Checklist

We can't guess which order will get to the solution quickest, but here's a stab at some things to remember:

1)   What does the error say?  (I can jump to conclusions and miss the clues right in front of my eyes.)
      If it says the hostfile is missing an entry, then an entry might really be missing.

2)   What has changed since things were working?  Suspect the thing that was changed 5 minutes ago or last week before you test every single link in the chain.

3)   Which component could have the root cause?  If other network connections are fine, then it's not the entire network (but it might be one port, one module, one switch, or one data center that owns the problem).

4)   Draw a picture.  It's all about isolating what is not wrong: when you rule out all but one thing, the one remaining thing is the culprit!  So draw a picture that shows how the components connect, so you can see the whole chain at a glance.  Remember #1 above and try not to make assumptions.  That's where we usually miss the cause of the problem: it hides within the sensible and understandable but incorrect assumptions!

5)   Two heads are better than one.  It might just be that explaining and drawing the problem for a colleague, which forces you to explain it simply and think clearly, will lead you to the "Aha!" moment.  Or they might see something you've missed, or get lucky where you've been unlucky.
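
For anyone who likes to see the idea in code, here's a rough sketch of items 1 and 4: take the error at its word, then rule things out one at a time.  Everything in it is made up for illustration.  The function names hosts_entry_exists and rule_out, the sample hostfile text, and the "appserver" hostname are all assumptions, and it assumes a hostfile laid out like a typical hosts file (address first, then names, with # starting a comment).

```python
def hosts_entry_exists(hosts_text, hostname):
    """Return True if hostname is listed on a non-comment line.

    Assumes a hosts-file layout: address first, then one or more names,
    with '#' starting a comment.
    """
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if hostname in fields[1:]:  # fields[0] is the address
            return True
    return False


def rule_out(checks):
    """Run each named check and sort the names into two piles.

    A check that passes is ruled out; a check that fails stays a suspect.
    """
    ruled_out, suspects = [], []
    for name, check in checks:
        (ruled_out if check() else suspects).append(name)
    return ruled_out, suspects


# Hypothetical sample data: the appserver entry is commented out.
sample_hosts = "127.0.0.1 localhost\n# 10.0.0.5 appserver\n"

checks = [
    ("network link", lambda: True),  # assumed: other connections are fine
    ("hosts entry", lambda: hosts_entry_exists(sample_hosts, "appserver")),
]

ruled_out, suspects = rule_out(checks)
print("ruled out:", ruled_out)
print("suspects:", suspects)
```

Each check here is only a stand-in.  In real life every check is a test you run by hand or by script, and the point is to write down what has actually been ruled out instead of what you assume is fine.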

This can all be very frustrating.  Take a step back and try not to get angry.


People won't like you when you're angry.
