Analyzing a Linux Kernel Deadlock with the crash Tool: A Practical Walkthrough

Introduction

Kernel deadlocks are commonly caused by read-write semaphores (rw_semaphore) and mutexes (mutex). This article explains how to analyze such deadlocks using a ramdump and the crash tool.

0. Background Knowledge

Ramdump is a memory dump mechanism that captures the system's memory at a specific point in time. The dump is then loaded, together with symbol information (vmlinux), into an analysis tool such as TRACE32 or crash for offline debugging. It is an important method for diagnosing kernel issues such as crashes, deadlocks, and memory leaks.

crash is an open-source tool for analyzing ramdump files (http://people.redhat.com/anderson/). It runs as an interactive command-line session and provides powerful debugging commands, which makes it well suited to complex kernel problems.

A deadlock occurs when multiple threads of execution block one another while competing for resources: each holds a resource that another needs, so none of them can make progress.

1. Problem Description

On an Android 7.1 device, the UI became unresponsive during monkey testing:

1) The screen does not refresh, and no input events are handled, including the power button.

2) The watchdog does not restart the system_server process.

3) adb can connect, but commands such as ps hang.

2. Initial Analysis

Since debugging directly over adb is not possible, we long-press the power button to enter dump mode and export the ramdump file, then load it into the crash tool for offline analysis.

The hung threads are probably stuck in the UNINTERRUPTIBLE (UN) state, so we first run the 'ps' command in crash to find threads in that state, using the '-u' option to filter out kernel threads.

The 'bt' command shows the call stack of a thread. We start with the watchdog thread, which is in the UN state.

The call stack shows that the thread is blocked in 'proc_pid_cmdline_read()'. That function acquires the 'mmap_sem' lock in the mm structure of the thread being read, and here the lock is already held by another thread.

3. Deriving the Read-Write Lock

To determine which thread holds the lock, we first need to recover the lock's address from the assembly. Using the 'dis' command, we look at the disassembly of 'proc_pid_cmdline_read()'.

The instruction at address 0xffffff99a680aaa0 calls 'down_read()', whose first parameter, passed in x0, is the sem lock. We therefore need the values of x0 and x28, from which the sem address follows. The offset of 'mmap_sem' in 'mm_struct' is 104 (0x68), which we can confirm using the 'whatis' command.

Registers that a called function needs to reuse are saved in its stack frame. By finding the saved copies of these registers in the stack frames of the functions called after 'down_read()', we can recover the addresses of the mm structure and its 'mmap_sem' lock.
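The relationship between the two addresses is plain pointer arithmetic. As a minimal sketch, assuming the 0x68 offset stated above and using the lock address 0xffffffd76e349a68 that appears in section 4, the following Python converts between an mm_struct address and the address of its mmap_sem. The function names are illustrative and are not part of crash.

```python
# Offset of mmap_sem inside mm_struct, as given in the article (104 bytes).
# This value depends on the kernel build; it is not a universal constant.
MMAP_SEM_OFFSET = 0x68

MASK_64 = 0xFFFFFFFFFFFFFFFF  # keep results within 64-bit address range


def mmap_sem_address(mm_addr: int) -> int:
    """Address of mm->mmap_sem, given the address of the mm_struct."""
    return (mm_addr + MMAP_SEM_OFFSET) & MASK_64


def mm_address(sem_addr: int) -> int:
    """Address of the enclosing mm_struct, given the address of its mmap_sem."""
    return (sem_addr - MMAP_SEM_OFFSET) & MASK_64


if __name__ == "__main__":
    sem = 0xffffffd76e349a68       # lock address quoted in section 4
    mm = mm_address(sem)
    print(hex(mm))                 # address of the owning mm_struct
    print(hex(mmap_sem_address(mm)))
```

Running the conversion in both directions is a cheap sanity check that a value recovered from a stack frame really is an mm pointer and not some unrelated word.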

4. Thread Holding the Read-Write Lock

Using the 'list' command, we walk the lock's wait_list to see how many threads are waiting for the read-write lock.

There are 2 writers and 17 readers waiting, a total of 19 threads in the UNINTERRUPTIBLE state. Reviewing all UNINTERRUPTIBLE threads confirms that most of them are waiting for this lock. Since the owner field is 0, the lock is not held by a writer, so it must be held by a reader. The lock does not record which reader holds it, so we search the stack space of all threads for the lock address (0xffffffd76e349a68).

After identifying the thread, we look at its call stack and find that it is holding the 'mmap_sem' lock while inside 'handle_mm_fault()'.
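Conceptually, the stack search is a scan of each thread's kernel stack for an 8-byte word equal to the lock address (crash's 'search' command offers a '-t' option that restricts the search to task stacks). The sketch below shows the idea in Python on a synthetic buffer. Only the lock address comes from the article; the stack contents and the base address are made up for illustration, and the code assumes little-endian 64-bit words.

```python
import struct
from typing import List

LOCK_ADDR = 0xffffffd76e349a68  # mmap_sem address from the article


def find_word(stack: bytes, base_addr: int, target: int) -> List[int]:
    """Return the addresses of all 8-byte aligned words in `stack` equal to `target`.

    `stack` is a raw copy of a stack region and `base_addr` is the address
    of its first byte. Words are read as little-endian unsigned 64-bit values.
    """
    hits = []
    for offset in range(0, len(stack) - 7, 8):
        (word,) = struct.unpack_from("<Q", stack, offset)
        if word == target:
            hits.append(base_addr + offset)
    return hits


if __name__ == "__main__":
    # Hypothetical 32-byte stack fragment: four 64-bit slots, two holding the lock address.
    fake_stack = struct.pack("<4Q", 0x0, LOCK_ADDR, 0x1234, LOCK_ADDR)
    fake_base = 0xffffffc000100000  # hypothetical stack base, not from the dump
    for addr in find_word(fake_stack, fake_base, LOCK_ADDR):
        print(hex(addr))
```

A hit only shows that the thread's stack references the lock. Waiters reference it too, so each candidate's call stack still has to be checked to tell the holder apart from the threads that are blocked on it.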

5. Other Blocked Threads (Derivation of Mutex Locks)

Next, we look at the ActivityManager thread, which is blocked in 'binder_alloc_new_buf()'. It is waiting on a mutex, so we follow the call into 'mutex_lock()' to find the mutex address. By examining the stack frame of '__mutex_lock_slowpath()', we find the mutex value and the task struct of the thread that holds it.

The holding thread is one of the 19 threads blocked on the read-write lock. Using the same approach, we find that other threads, such as those in the audio server and system_server, are blocked on the same mutex.
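The reasoning in this section amounts to following a chain: a blocked thread waits on a lock, the lock has a holder, and the holder may itself be waiting on another lock. A minimal Python sketch of that walk is shown below. The thread and lock labels are descriptive placeholders for the entities discussed above, not identifiers taken from the dump.

```python
from typing import Dict, List


def blocking_chain(start: str,
                   waits_on: Dict[str, str],
                   held_by: Dict[str, str]) -> List[str]:
    """Follow thread -> lock -> holder links starting from `start`.

    `waits_on` maps a thread to the lock it is blocked on.
    `held_by` maps a lock to the thread that holds it.
    The walk stops at a thread that is not known to wait on any lock,
    at a lock with no known holder, or when a thread repeats.
    """
    chain = [start]
    current = start
    while current in waits_on:
        holder = held_by.get(waits_on[current])
        if holder is None:
            break
        chain.append(holder)
        if chain.count(holder) > 1:  # same thread seen twice: stop the walk
            break
        current = holder
    return chain


if __name__ == "__main__":
    # Placeholder labels for the relationships described in sections 4 and 5.
    waits_on = {
        "ActivityManager": "mutex",
        "mutex holder": "mmap_sem",
    }
    held_by = {
        "mutex": "mutex holder",
        "mmap_sem": "mmap_sem reader",
    }
    print(" -> ".join(blocking_chain("ActivityManager", waits_on, held_by)))
```

The last element of the chain is the thread worth examining next, since every thread before it is blocked only indirectly.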

6. Deadlock

The last UNINTERRUPTIBLE thread is thread 2767 (sdcard), which is waiting for the result of a process. Thread 2124 is trying to read, while thread 2767 is processing an open request. This creates a mutual dependency: the sdcard thread cannot proceed without thread 2124, and thread 2124 is waiting for the sdcard thread. Neither can proceed, which is the deadlock.
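A mutual dependency like this is a cycle in the wait-for graph, where an edge from A to B means "A is blocked until B makes progress". The Python sketch below detects such a cycle. It models only the two threads named above (2124 and 2767) and simplifies each thread to waiting on exactly one other thread; it illustrates the concept and is not something crash produces.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return the threads forming a wait-for cycle, or None if there is none.

    `waits_for` maps a thread ID to the single thread ID it is blocked on.
    """
    for start in waits_for:
        path: List[int] = []
        node = start
        # Walk the chain until it ends or revisits a thread already on the path.
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            return path[path.index(node):]
    return None


if __name__ == "__main__":
    # Thread 2124 waits for the sdcard thread 2767, which in turn waits for 2124.
    waits_for = {2124: 2767, 2767: 2124}
    print(find_cycle(waits_for))
```

Once the owner of each lock has been identified, filling in a table like this makes it easy to see whether the blocked threads form a closed loop or merely queue up behind one slow thread.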

This article has outlined how to locate a deadlock using a ramdump and crash. The actual fix depends on the specific modules involved and how they interact, which requires further investigation.
