We’ve recently added features using IPMI to our ReAssure testbed, for example to support reimaging our Sun experimental PCs and rebooting them into a Live CD, so that researchers can run any OS they want on the testbed. IPMI stands for the “Intelligent Platform Management Interface”. We have a dedicated, isolated network on which commands are sent to IPMI cards in the experimental PCs. An OS running on these cards can respond to status queries and perform power management actions, such as a power cycle that reboots the computer. This is supposed to be useful if, for example, the OS running on the computer locks up. So we were hoping for fewer trips to the facility where the experimental PCs are hosted, greater reliability, and more convenient management capabilities.
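To make the kind of commands involved concrete, here is a minimal sketch of how a status query, a power cycle, and a boot-from-CD request could be sent over the management network. It assumes the widely used `ipmitool` CLI is installed and that the cards speak IPMI 2.0 over LAN (`-I lanplus`); the address `10.0.0.5`, the user `admin`, and the helper names are hypothetical, and this is not necessarily how ReAssure does it.

```python
import subprocess
from typing import List, Optional


def ipmi_command(host: str, user: str, *args: str,
                 interface: str = "lanplus") -> List[str]:
    """Build the argv for one ipmitool call over the management LAN.

    -E makes ipmitool read the password from the IPMI_PASSWORD
    environment variable, so it never appears on the command line.
    """
    return ["ipmitool", "-I", interface, "-H", host, "-U", user, "-E", *args]


def run_ipmi(argv: List[str], timeout: float = 10.0) -> Optional[str]:
    """Run the command; return its output, or None if the card did not answer.

    A timeout matters here: a locked-up card may never reply at all.
    """
    try:
        result = subprocess.run(argv, capture_output=True, text=True,
                                timeout=timeout, check=True)
    except (OSError, subprocess.SubprocessError):
        # Covers a missing binary, a non-zero exit status, and a timeout.
        return None
    return result.stdout.strip()


# Hypothetical card address and user name, for illustration only.
HOST, USER = "10.0.0.5", "admin"

status_cmd = ipmi_command(HOST, USER, "chassis", "power", "status")
cycle_cmd = ipmi_command(HOST, USER, "chassis", "power", "cycle")
# Ask for the next boot to come from the optical drive (e.g. a Live CD).
cdrom_cmd = ipmi_command(HOST, USER, "chassis", "bootdev", "cdrom")
```

Calling `run_ipmi(status_cmd)` would then return the card's reply, or `None` when the card is unreachable or hung, which is the case that ends up requiring a trip to the facility.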
However, what we got was more headaches. Some IPMI cards failed entirely; as we had daisy-chained them, the IPMI cards of the other PCs became inaccessible too. Others simply locked up, requiring a trip to the facility even though the OS on the computer was fine… One of them responds to status commands sometimes and not at all at other times, seemingly at random. The result is that using the IPMI cards actually made ReAssure less reliable and more maintenance-hungry, because the reliability-enhancing component was itself so unreliable! The irony. I don’t know if we’ve just been unlucky, but I’m now keeping an eye out for a way to make the IPMI setup more reliable, or for an alternative that doesn’t introduce even more problems. Finding an alternative is rather unlikely, as I’ve discovered that even though the LAN interface is standard, the physical side of those cards isn’t; AFAIK you can’t take a generic IPMI card and install it, it needs to be a proprietary solution from the hardware vendor (e.g., a Tyan card for a Tyan motherboard, a Sun IPMI card for a Sun computer). So if the IPMI solution provided by your hardware vendor has flaws, you’re stuck with it; it’s not like a NIC, which you can replace with one from any vendor. I don’t know of any way to replace the software on the IPMI cards either, in the way you can replace the bad firmware of consumer routers with better open source software. I suppose that the lessons from this story are that:
Comments
Never been a big fan, but I like to use them when no other application works.