Here is another story from the trenches. My current project was having mysterious errors coming from its Oracle database, but only under load. Specifically we were getting the following:
ORA-01013: user requested cancel of current operation
This one had me stumped. How could this happen. I searched the code line by line. I got assurances from the testing team that the testing tool was waiting for the last byte to return before navigating away from the page. I scoured the web for any hint of what could cause this.
We ran some more tests with the DBA monitoring the database. What we found was large number of locks being held for an unusually long time. This of course results in other requests being blocked and eventually timing out.
Unfortunately I couldn't execute my own explain plans. For any who don't already know, this is a function within Oracle that tells you the cost of the query you are trying to execute. Once the DBA ran them for us it showed that the query that was getting a lock was doing a table scan. It was reading row by row until it finally found the one it was looking for.
We added an index specifically for this query and the result was that the locks were being released so quickly that the DBA's monitoring tool didn't even see them.
Lesson: Know they query plan!