Sean Beatty explains what a deadlock is and why testing probably won't catch it.
Most software development projects rely upon some combination of inspection, structural testing, and functional testing to identify software defects. While testing is invaluable and does uncover the vast majority of software problems, sometimes testing fails to uncover certain errors—errors such as deadlocks
Before we can discuss deadlocks, we need to understand why they occur. A typical program contains separate threads of execution, or separate processes. For simplicity, we’ll call them tasks. In a multitasking system, these tasks operate concurrently and sometimes need to access the same resource at the same time. A resource that more than one task may need to access is a shared resource. A shared resource may be a certain data item in memory, or it could be a particular hardware resource. It could even be a specific file on the disk, or a single record in a database. When two tasks attempt to access one of these shared resources at the same time, serious problems usually result. One task may overwrite the data that the other task just wrote. Worse yet, if two tasks access the same resource at the same time, the result may be a data record that contains some data written by one task, and some data written by the other task, making the data record, as a whole, inconsistent.
To prevent these problems, programmers lock the shared resource to prevent any other task from interfering with the resource before it is safe to do so. Locked resources, while vitally important in avoiding access conflicts, can lead to deadlock. Testing for deadlock is generally ineffective, since only a particular order of resourcelocking may produce deadlock, and the most common tests may not produce that specific order. Deadlock is best avoided by design.
Figure 1 shows how resources are used by each of three tasks. Task 1 first allocates (locks) the resource A, and while holding A, it acquires B. Then later in the program, Task 1 also locks D, all the while still holding A and B. In similar manner, Task 2 allocates B, then C, and finally D. Task 3 allocates only two resources, first C, and then A.
What happens as the system runs and the tasks all lock and unlock their resources? Let's overlay the three tasks and their shared resources into one resource allocation graph, as shown in Figure 2. This makes it simple to see the effect of the interaction among the tasks. Suppose, at some instant, Task 1 holds A, Task 2 holds B, and Task 3 holds C. If a task is unable to acquire a needed resource, it will stop running (block) until the resource becomes available. In this scenario, Task 1 cannot acquire B since Task 2 has already locked it. Task 2 cannot acquire C since Task 3 has C. Task 3 cannot acquire A because it has been locked by Task 1. The system would be deadlocked—none of the tasks could run. Analysis of Figure 2 would reveal the closed loop among the resources (indicated by the red arrow).
Deadlock Avoidance by Design
The most effective way to deal with deadlocks is to avoid them by design. Adopting any of the following design constraints eliminates the possibility of deadlock:
- Tasks have no more than one resource locked at any time. (There are no arrows at all in the graph, indicating that there is no possibility of a circular wait.)
- A task completely allocates all its needed resources before it begins execution. This prevents any other task from locking