E10K: Debugging Drain Failures
Standard disclaimer: use the information that follows at your own risk. If you screw up a system, don't blame it on me...
- Enable the kernel variable dr_mem_debug by setting its value to -1, either with adb on the running kernel or by setting the value in /etc/system and rebooting.
# adb -kw
dr_mem_debug: 0x0 = 0x1
- Capture the console output from a failed DR drain session. The failed address will be readily apparent; the message will be something to the effect of:
hold_pfns: page not held: <some address>
- In an adb session, enter the following command:
<page address from step 2>$<page
- Look for the field p_selock. If the value in this field is 1, the problem is possibly related to swap.
- If there is a value in the p_vnode field, then enter the following:
Look for the vop field. This tells us which virtual operation is in progress.
- If the value in p_selock is an address, then we need to adjust it by subtracting 8 from the high-order hex digit (in other words, clearing the most significant bit). For example, if the p_selock field = c0000000, then the value we need for the next step is 40000000. This is a thread address. To check this, enter the following command:
- Look for the field called procp and get that address. Enter the following:
- Search through the screen output for the psargs field. This will indicate the process that is holding the lock.
If the process belongs to a third-party vendor's product, we need to know which vendor. In any case, mail the screen output to us and we will look at it further.
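
The p_selock adjustment in the steps above is easy to get wrong by hand, so here is a minimal sketch of the arithmetic. It assumes a 32-bit value, as in the c0000000 example, and a shell with POSIX arithmetic expansion (bash or a recent ksh; the old Solaris /bin/sh will not do, so run it on whatever workstation is handy). The function name selock_to_thread is made up for illustration; it is not an adb macro or a Solaris command.

```shell
# Hypothetical helper: clear the most significant bit of a 32-bit
# p_selock value to recover the thread address used in the next step.
selock_to_thread() {
    val=${1#0x}    # accept the value with or without a leading 0x
    # Mask off bit 31 and print the result as 8 hex digits.
    printf '%08x\n' $(( 0x$val & 0x7fffffff ))
}

selock_to_thread c0000000    # prints 40000000
```

The printed value is what goes into the thread lookup. If the high-order bit is not set in the value you pass in, the function returns it unchanged.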
Notes on adb: Be careful of the columns. Things don't always align.