I just spent the last 3 days chasing down 3 deadlocks that were teaming up against me. Man, I hate deadlocks. They were definitely the biggest obstacles so far in the development process, simply because their origins are so hard to locate. And that's even though they are always caused by the same type of situation: 2 threads getting tangled up in each other's locking scheme.
Put into sequence:
- Thread 1 locks x and continues execution (no waiting)
- Thread 2 locks y and continues execution (no waiting)
- Thread 1 requests y and waits
- Thread 2 requests x and waits... oops
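To make the sequence concrete, here's a minimal Python sketch of it. This is not the network-core code; the names `run_demo` and `worker` are made up for illustration, and the second request uses a timeout so the demo gives up instead of hanging forever like a real deadlock would.

```python
import threading


def run_demo(timeout=0.2):
    """Replay the 4 steps: each thread holds one lock and requests the other."""
    x = threading.Lock()
    y = threading.Lock()
    both_hold_first = threading.Barrier(2)    # steps 1 and 2 are done
    both_tried_second = threading.Barrier(2)  # steps 3 and 4 are done
    results = {}

    def worker(name, first, second):
        with first:                                # lock x (or y), no waiting
            both_hold_first.wait()                 # wait until the other thread holds its lock too
            got = second.acquire(timeout=timeout)  # request the other lock and wait
            results[name] = got
            both_tried_second.wait()               # keep 'first' locked until both requests ended
            if got:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("thread-1", x, y), name="thread-1"),
        threading.Thread(target=worker, args=("thread-2", y, x), name="thread-2"),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_demo()` reports `False` for both threads: neither one ever gets its second lock. Without the timeout, both would simply wait forever.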
The most common way to locate this situation is by examining the execution stack in the debugger. But what if you do out-of-sync locking: the request in function 1 and the release in function 2, without any relationship between the 2 functions? As an example, take the network-core's new lock expression. It locks the items when the statement is called, but releases them only after all the child statements have been called. The interpreter obviously can't do this in the same function call.
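One way to cope with this, sketched here in Python under my own assumptions, is to let the lock itself remember who requested it and from where, since the requesting function is long gone from the execution stack by the time the deadlock shows up. `TracedLock`, `enter_lock_statement` and `leave_lock_statement` are hypothetical names, not part of the network-core.

```python
import threading
import traceback


class TracedLock:
    """A lock that remembers which thread and which function acquired it."""

    def __init__(self, label):
        self.label = label
        self._lock = threading.Lock()
        self.owner = None   # name of the thread currently holding the lock
        self.origin = None  # function (and line) that requested the lock

    def acquire(self, timeout=-1):
        got = self._lock.acquire(timeout=timeout)
        if got:
            self.owner = threading.current_thread().name
            # extract_stack() lists frames oldest first:
            # [-1] is this method, [-2] is whoever called it.
            caller = traceback.extract_stack()[-2]
            self.origin = f"{caller.name} (line {caller.lineno})"
        return got

    def release(self):
        self.owner = None
        self.origin = None
        self._lock.release()


def enter_lock_statement(lock):
    # The request happens here...
    lock.acquire()


def leave_lock_statement(lock):
    # ...and the release happens in an unrelated call, much later.
    lock.release()
```

Between the two calls, `lock.origin` still names `enter_lock_statement` even though that function returned long ago, which is exactly the information the execution stack can no longer give you.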
So how do you trace this type of bug? Well, slowly, painfully and with lots of debug code to generate application dumps. Here's the smallest dump that I generated for the core (usually it was several hundred lines longer):
Links in
ID: 7461 ReadCount: 2 WriteCount: 0, Waiting: 0
Links out
ID: 7461 ReadCount: 2 WriteCount: 0, Waiting: 0
Values
ID: 7461 ReadCount: 2 WriteCount: 0, Waiting: 0
Processors
Parents
ID: 7461 ReadCount: 0 WriteCount: 0, Waiting: 2
Children
ID: 7432 ReadCount: 0 WriteCount: 1, Waiting: 0
ID: 7461 ReadCount: 2 WriteCount: 0, Waiting: 0
ID: 7288 ReadCount: 0 WriteCount: 1, Waiting: 0
LockExpressionCounter: 2
Single lock count: 0
Basically, it allows me to check all the locks that were still active (in order of age) when the deadlock occurred. From there it's simply a matter of figuring out who has the oldest lock and why it wasn't released. To do this, it's best to keep track somehow of the threads that do the locking and make certain that they are named, so you can find them again in the execution stack of the debugger.
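The bookkeeping behind such a dump can be pretty small. Here's a rough Python sketch, assuming a hypothetical `LockRegistry` that the locking code notifies on every acquire and release; it is not the dump code of the core, and the output format is only loosely modelled on the dump above.

```python
import itertools
import threading


class LockRegistry:
    """Tracks which named thread holds which lock, oldest lock first."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the bookkeeping itself
        self._seq = itertools.count()    # increasing number = acquisition order
        self._held = {}                  # lock id -> (sequence number, thread name)

    def note_acquired(self, lock_id):
        with self._guard:
            owner = threading.current_thread().name
            self._held[lock_id] = (next(self._seq), owner)

    def note_released(self, lock_id):
        with self._guard:
            self._held.pop(lock_id, None)

    def dump(self):
        """Return one line per lock that is still held, in order of age."""
        with self._guard:
            entries = sorted(self._held.items(), key=lambda item: item[1][0])
        return [f"ID: {lock_id} Owner: {owner}" for lock_id, (_, owner) in entries]
```

The first line of `dump()` is the oldest lock that is still held, and the owner name on it is what you go looking for among the debugger's threads. That only works if every thread gets a meaningful `name` when it is created.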