5-Jan-2003 (Sun)
Wherein are presented three-fisted tales of sysadmin horror!

DANGER: hard core geekery ahead! I don't make fun of hippies or drunk people at all in this entry.

So, while we were testing some things, trying to figure out how to recover the motherboard with the scorched BIOS, I accidentally typed ``shutdown'' at the wrong machine: I shut down the machine in colo (the MP3 server). I lost all of my sysadmin cred with that one (if I ever had any to begin with) and we had to get Tom to let us in to colo to try and fix it.

There are three machines involved, all of which are VA Linux FullOn 2230s, with Intel L440GX+ motherboards. Two of the machines are at the club (the MP3 encoder, and the RealVideo encoder) and the third is in colo (the MP3 server that the outside world connects to.) Here's what we've learned:

  • Linux will run fine on these mobos with any processor available (500MHz, 700MHz, or 1GHz), alone or in pairs.

  • BIOS 9.something and 10.something work fine with 500MHz and 700MHz CPUs, alone or in pairs.

  • BIOS 9.something and 10.something freak slightly when confronted with 1GHz CPUs: the BIOS doesn't know what to make of it, and pauses waiting for you to type F1 before continuing. (But after that, it will boot, and Linux works fine.)

  • BIOS 14.3 (build 133, the latest release) freaks hard when confronted with either 700MHz or 1GHz CPUs: it says something to the effect of, ``CPU not compatible with mainboard revision.'' After that, it won't boot at all. That's right, upgrading the BIOS makes it no longer able to boot the 700MHz CPUs (which it previously had no complaints about whatsoever), and likewise unable to boot the 1GHz CPUs, which it would previously grudgingly tolerate. This despite the fact that the release notes say that 1GHz CPUs have been supported since 13.0. (Our mainboard version numbers are recent enough, according to the relnotes.)

  • Intel's BIOS upgrader software will not let you make a backup of the previous BIOS version so that you can back out the change: apparently the BIOS is locked as write-only. (Incidentally, the BIOS NVRAM is surface-mounted, not in a socket.)

  • BIOSes older than 14.x are not available for download anywhere.

We could not find any way to get a 1GHz chip to boot unattended; we could not find any way to get any chip but a 500MHz to work on a mobo whose BIOS had been upgraded; nor could we find any way to downgrade the BIOS. Fuck you very much, Intel.
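For anyone trying to keep the matrix straight, here is a rough sketch of the observations above as a lookup table. The labels ``9/10'' and ``14.3'', the constant names, and both helper functions are made up for illustration; the 14.3/500MHz entry is inferred from the remark that only a 500MHz chip works on an upgraded board.

```python
# Observed boot behaviour on the L440GX+, keyed by (BIOS family, CPU speed in MHz).
# The keys and result strings are illustrative labels, not anything Intel uses.
BOOTS = "boots unattended"
NEEDS_F1 = "boots after pressing F1"
NO_BOOT = "does not boot"

OBSERVED = {
    ("9/10", 500): BOOTS,
    ("9/10", 700): BOOTS,
    ("9/10", 1000): NEEDS_F1,
    ("14.3", 500): BOOTS,      # inferred: the only chip that works after the upgrade
    ("14.3", 700): NO_BOOT,
    ("14.3", 1000): NO_BOOT,
}


def boot_behaviour(bios, mhz):
    """Return what was observed for this BIOS family and CPU speed."""
    return OBSERVED[(bios, mhz)]


def unattended_speeds(bios):
    """CPU speeds (MHz) that reboot without anyone at the keyboard."""
    return sorted(
        mhz for (family, mhz), result in OBSERVED.items()
        if family == bios and result == BOOTS
    )


if __name__ == "__main__":
    for bios in ("9/10", "14.3"):
        print(bios, unattended_speeds(bios))
```

Nothing in either column reboots a 1GHz chip unattended, which is the whole problem.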

So, we had three machines, five motherboards with various BIOSes on them, nine CPUs of various flavors, and a jigsaw puzzle of an evening ahead of us. One of the motherboards had a corrupted BIOS, and another had a dead SCSI controller.

We spent a long time trying to un-scorch the dead BIOS to no avail; and we considered switching to IDE disks for the one without a SCSI controller. But to make a long and tortuous story short, here's what we ended up with:

Machine:             Last week:   Desired:     Actual:
MP3 Encoder, club    2 x 500MHz   2 x 700MHz   2 x 500MHz
Real Encoder, club   1 x 700MHz   2 x 500MHz   2 x 1GHz (must type F1 to reboot)
MP3 Server, colo     1 x 700MHz   2 x 1GHz     2 x 500MHz

We could have gotten the 2x1GHz CPUs onto the MP3 Server as originally intended, except then I would have had to go to colo to reboot it, so we made the ``have to type F1'' machine be at the club.

It turns out that RealEncoder will work on a 2x700MHz machine, or on a 1x700MHz machine, but not on a 2x500MHz machine. In hindsight, this makes a kind of sick sense, if you assume that the single RealEncoder process requires (say) a dedicated 600MHz to work properly. When we had 2x500MHz in there, I imagine that what was going on was that one of the CPUs was overloaded, and the other was mostly idle.
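The arithmetic behind that guess, as a minimal sketch: it assumes the encoder is a single process that can only use one CPU at a time, and the 600MHz figure is just the ``say'' number from above, not a measurement. The function names are hypothetical.

```python
def single_process_fits(cpu_mhz, demand_mhz=600):
    """A single-threaded process runs on one CPU at a time,
    so only the speed of one CPU matters, not how many there are."""
    return cpu_mhz >= demand_mhz


def aggregate_mhz(cpu_count, cpu_mhz):
    """Total clock across all CPUs; misleading for a single process."""
    return cpu_count * cpu_mhz


if __name__ == "__main__":
    for count, mhz in [(2, 500), (1, 700), (2, 700)]:
        print(
            f"{count} x {mhz}MHz: aggregate {aggregate_mhz(count, mhz)}MHz, "
            f"encoder fits: {single_process_fits(mhz)}"
        )
```

Under that assumption, 2x500MHz has more total clock than 1x700MHz but still comes up short, because the second CPU does nothing for the one process that needs the cycles.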

It remains to be seen whether 2x500 is better than 1x700 for the MP3 Server machine. I suspect it will be, since there are many load-ful processes running on it. But I still don't know whether load was the cause of the ``chipmunk'' problem at all.

If any of you have any suggestions as to how to get an L440GX+ to work with a 1GHz Pentium III (Slot 1, 100MHz bus, 256K cache) please let me know! At this point, all I can think of is replacing both the motherboards and CPUs at the same time with some totally unrelated model, and I'm pretty tired of throwing money down this rathole. We just spent $330 on these new 1GHz CPUs because I thought the MP3 Server machine might have had a load problem! I don't even know if it'll fix anything...

2 Responses:

  1. moof says:

    The release notes at build 121 have some interesting comments about cpu speed, and clock-locked-ness. Build 115 also talks about other board revisions.

    The history also has an awful lot of things along the lines of 'fix for P3 rev a3 detection speed with other processor'; you may wanna complain to Intel. (ha ha ha!)

    • jwz says:

      Yeah, I read all that, uh, "interesting" text. But the fact remains that builds 133 and 132 are the only ones I can actually find anywhere.