E10K: Debugging Drain Failures


Standard disclaimer:  use the information that follows at your own risk.  If you screw up a system, don't blame it on me...


  1. Enable the kernel variable dr_mem_debug by setting its value to 1, either on the running kernel with adb (as in the session that follows) or by setting the value in /etc/system and rebooting.

# adb -kw
physmem 13af5d
dr_mem_debug/W0x1
dr_mem_debug: 0x0 = 0x1
$q

  2. Capture the console output from a failed DR drain session. The failed address will be readily apparent; the message will be something to the effect of:

hold_pfns: page not held: <some address>

  3. In an adb session, enter the following command:

<page address from step 2>$<page
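
When the console capture is long, picking the failed addresses out by hand is error-prone. The following is a minimal Python sketch, not part of the original procedure, that scans captured console text for the message quoted above and builds the matching adb command for each address. It assumes the address is printed as a hexadecimal number directly after the message text, with or without a leading 0x. The function names are hypothetical, and the pattern should be adjusted to whatever the console actually prints.

```python
import re
from typing import List

# Assumed line shape, based on the message quoted above. The real console
# text may differ, so treat this pattern as a starting point only.
PAGE_NOT_HELD = re.compile(r"hold_pfns: page not held:\s*(?:0x)?([0-9a-fA-F]+)")


def failed_page_addresses(console_text: str) -> List[str]:
    """Return each distinct failed page address, in order of first appearance."""
    seen: List[str] = []
    for match in PAGE_NOT_HELD.finditer(console_text):
        addr = match.group(1).lower()
        if addr not in seen:
            seen.append(addr)
    return seen


def adb_page_command(addr: str) -> str:
    """Build the adb command that dumps the page structure at addr."""
    # adb reads numbers as hexadecimal by default, so no 0x prefix is added.
    return addr + "$<page"
```

To use it, read the saved console capture into a string, pass it to failed_page_addresses, and paste the result of adb_page_command for each address into the adb session.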

  4. Look for the field p_selock. If the value in this field is 1, the problem is possibly related to swap.
  5. If there is a value in the p_vnode field, then enter the following:

<vnode address>$<vnode

Look for the vop field. It identifies the set of vnode operations that applies to the page, and so the type of file system behind it.

  6. If the value in p_selock is an address, adjust it by subtracting 8 from the high-order hex digit (that is, clear the top bit). For example, if the p_selock field is c0000000, the value needed for the next step is 40000000. This is a thread address. To check it, enter the following command:

<thread address>$<thread
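
The p_selock adjustment is easy to get wrong by hand, so here is a small Python sketch of the arithmetic. It is an illustration only: the function name is hypothetical, and the 32-bit width is an assumption taken from the c0000000 example above. Subtracting 8 from the high-order hex digit is the same as clearing the top bit of the word.

```python
def thread_from_selock(p_selock: int, width_bits: int = 32) -> int:
    """Clear the high-order bit of p_selock to recover the thread address.

    width_bits is an assumption (32 matches the c0000000 example); pass a
    different width if the kernel uses wider words.
    """
    top_bit = 1 << (width_bits - 1)
    if not p_selock & top_bit:
        # The procedure only describes values whose top bit is set.
        raise ValueError("high-order bit is not set in p_selock")
    return p_selock & ~top_bit


# Worked example from the text: c0000000 becomes 40000000.
print(format(thread_from_selock(0xC0000000), "x"))  # prints 40000000
```

The printed value is what goes in front of $<thread in the adb session.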

  7. Look for the field called procp and get that address. Enter the following:

<proc address>$<proc2u

  8. Search through the screen output for the psargs field. This indicates the process that is holding the lock.

If the process belongs to a third-party vendor, we need to know which one. In any case, mail the screen output to us and we will look at it further.

Notes on adb: Be careful of the columns. Things don't always align.