
E10K: Debugging Drain Failures

Standard disclaimer: use the information that follows at your own risk. If you screw up a system, don't blame it on me...

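Enable the kernel variable dr_mem_debug by setting it to a nonzero value, such as -1 (all bits set) or 0x1 as in the example below. You can do this on the running kernel with adb, or by setting the value in /etc/system and rebooting.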

# adb -kw
physmem 13af5d
dr_mem_debug/W0x1
dr_mem_debug: 0x0 = 0x1
$q

Capture the console output from a failed DR drain session. The failing page address is easy to spot; the message will look something like:

hold_pfns: page not held: <some address>
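
If the console log is long, a small script can pull the addresses out for you. The Python below is a minimal sketch that assumes the message looks like the one above, with a hexadecimal address (optionally prefixed with 0x) after the colon. The function name find_unheld_pages is made up for illustration, and you may need to adjust the pattern for your DR release. For each unique address it prints the adb command used in the next step.

```python
import re
import sys

# Assumption: the console line reads "hold_pfns: page not held: <hex address>",
# with an optional 0x prefix. Adjust the pattern if your DR release words it differently.
HOLD_PFNS_RE = re.compile(r"hold_pfns: page not held:\s*(?:0x)?([0-9a-fA-F]+)")


def find_unheld_pages(console_log):
    """Return the page addresses reported by hold_pfns, lowercased, first occurrence only."""
    addresses = []
    for match in HOLD_PFNS_RE.finditer(console_log):
        addr = match.group(1).lower()
        if addr not in addresses:
            addresses.append(addr)
    return addresses


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_unheld_pages.py console.log
    with open(sys.argv[1]) as log:
        for addr in find_unheld_pages(log.read()):
            # adb command to display the page structure at this address
            print(f"0x{addr}$<page")
```

Duplicates are dropped because a retried drain can report the same page more than once; only the first occurrence matters for the adb session.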

Run adb -k and display the page structure at the address from the hold_pfns message:

<page address>$<page

Look for the field p_selock. If the value in this field is 1, the problem is possibly related to swap.
If the p_vnode field contains an address, display that vnode:

<vnode address>$<vnode

Look for the vop field. It points to the vnode's operations vector, which tells us which file system (for example, ufs or swapfs) the page belongs to.

If the value in p_selock is an address, we need to adjust it by subtracting 8 from the high-order hex digit, which clears the most significant bit. For example, if the p_selock field is c0000000, the value we need for the next step is 40000000. This is the address of the thread holding the lock. To check this, display it with the thread macro:

<thread address>$<thread
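
The p_selock checks above are simple enough to write down. This Python sketch assumes a 32-bit p_selock value like the one in the example; interpret_selock is a hypothetical helper, not part of any Sun tool. Subtracting 8 from the high-order hex digit is the same as clearing bit 31 (0x80000000), which is what the code does.

```python
from typing import Optional, Tuple

HIGH_BIT = 0x80000000  # bit 31: the "8" in the high-order hex digit of a 32-bit value


def interpret_selock(selock: int) -> Tuple[str, Optional[int]]:
    """Apply the p_selock rules from the text to a 32-bit value.

    Returns (finding, thread_address); thread_address is None unless the
    value looks like an encoded thread address.
    """
    if selock == 1:
        return ("possibly swap-related", None)
    if selock & HIGH_BIT:
        # Subtracting 8 from the high-order hex digit == clearing bit 31.
        return ("held by thread", selock & ~HIGH_BIT & 0xFFFFFFFF)
    return ("unrecognized; send the adb output for review", None)


finding, thread = interpret_selock(0xC0000000)
print(finding)                 # held by thread
print(f"{thread:x}$<thread")   # 40000000$<thread
```

Checking the bit before clearing it matters: a digit below 8 has nothing to subtract, so such a value is not treated as a thread address.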

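From the thread output, take the address of the owning process (the t_procp field) and display it with the proc2u macro:

<proc address>$<proc2u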

Search the output for the psargs field, which shows the command line of the process holding the lock.

If the process belongs to a third-party vendor, we need to know which one. In any case, mail the screen output to us and we will look into it further.

Note on adb: be careful reading the columns; field names and their values don't always line up.