E10K: Control board review


Standard disclaimer: Use the information that follows at your own risk. If you screw up a system, don't blame it on me...


The control board is a cornerstone of E10K operations; without it, as I understand it, the various E10K domains come crashing down. However, since you so rarely even look at the things, their importance doesn't become apparent until they stop working. So, this review:

The control boards are responsible, among other things, for the following:

  1. Distributing timing signals between the SSP and domains.
  2. Supporting communications between the SSP and the E10K.
  3. Controlling the support subsystems, such as fans and power.

Before a particular domain is active, you talk to it (at the OBP prompt, for instance) via JTAG and the control boards. The control board, a full computer system in its own right, runs a control board executive (cbe) daemon. The SSP runs a control board server (cbs) that talks to the cbe. Once a particular domain is up and running, netcon instead talks to a cvcd daemon running on that domain.

The E10K should have two control boards. If you have only one and it breaks, you're down until it's replaced. Bad juju. If you have two and the primary breaks, you can switch to the alternate and restart your domains. Maximum downtime is 1-2 hours, depending on how fast you type and how fast the domains boot.

The SSP has a control board configuration file called ${SSPVAR}/.ssp_private/cb_config with the following format:

${E10K_name}:Ultra-Enterprise-10000:${cb0_name}:${cb0_status}:${cb1_name}:${cb1_status}

For example:

ra:Ultra-Enterprise-10000:sol:P:aten:
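
If you want to read that file from a script, the colon-separated format above can be picked apart along these lines. This is only a sketch: the names parse_cb_config_line, primary_board, ControlBoard and CbConfigEntry are made up for illustration, and I'm assuming that a status of "P" marks the primary control board and that an empty status marks the other one, which is what the example suggests but not something I've seen documented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ControlBoard:
    name: str
    status: str  # raw status field; "P" or empty in the example above


@dataclass
class CbConfigEntry:
    platform_name: str
    platform_type: str
    boards: List[ControlBoard]


def parse_cb_config_line(line: str) -> CbConfigEntry:
    """Split one cb_config line into its six colon-separated fields."""
    fields = line.strip().split(":")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    name, ptype, cb0_name, cb0_status, cb1_name, cb1_status = fields
    return CbConfigEntry(
        platform_name=name,
        platform_type=ptype,
        boards=[
            ControlBoard(cb0_name, cb0_status),
            ControlBoard(cb1_name, cb1_status),
        ],
    )


def primary_board(entry: CbConfigEntry) -> Optional[str]:
    """Return the name of the board flagged "P" (assumed to mean primary)."""
    for board in entry.boards:
        if board.status == "P":
            return board.name
    return None


if __name__ == "__main__":
    entry = parse_cb_config_line("ra:Ultra-Enterprise-10000:sol:P:aten:")
    print(entry.platform_name, primary_board(entry))
```

Treat this as read-only tooling; the file lives under .ssp_private for a reason, and nothing here is meant for editing it.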

The control boards get their boot information from the SSP via tftp. So, if you're trying to harden the SSP, don't turn off tftp; you'll wonder why your E10K flat out won't start up...
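
A quick sanity check after a hardening pass might look like the sketch below. It assumes tftp on the SSP is started from inetd and that an enabled service shows up as an uncommented line beginning with "tftp" in the inetd configuration (typically /etc/inet/inetd.conf on Solaris); neither detail comes from the notes above, so check your own SSP before trusting it. The function name tftp_enabled is hypothetical.

```python
def tftp_enabled(inetd_conf_text: str) -> bool:
    """Return True if the text contains an uncommented tftp service line."""
    for raw in inetd_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out entries such as "#tftp ...".
        if not line or line.startswith("#"):
            continue
        # The first whitespace-separated field of an entry is the service name.
        if line.split()[0] == "tftp":
            return True
    return False


if __name__ == "__main__":
    # Assumed location of the inetd configuration; adjust for your system.
    with open("/etc/inet/inetd.conf") as conf:
        if not tftp_enabled(conf.read()):
            print("WARNING: tftp looks disabled; control boards may not boot")
```

It only looks at the config file's contents, so it won't notice if inetd itself hasn't been told to reread the file.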