28-Jul-2004 (Wed)
Wherein RAID blows and Nigerians scam.

We lost the Friday night webcast because the server in question lost its mind and corrupted the file system on its RAID-1 disk pair; it's done this several times now, but this was the worst. At this point, we're inclined to point the finger at the RAID controller card (a Promise FastTrak SX4000.) We went with hardware RAID because it's purported to be faster and possibly more reliable than software RAID, but here's something that I didn't know: disks used in a RAID array are married to the card that wrote them. Even though both disks are supposed to be exact copies of each other, if you plug one of them in and try to use it as a normal non-RAID IDE drive, it doesn't work! The disks are in some proprietary format. So RAID protects you from bad drives, but locks you into the exact model of card that you used to format the drives the first time. Nice, very nice. This is apparently not the case with software RAID under Linux, so we're going to try that next. The sucky part there is that first we have to buy a third drive to back up the two RAID drives before reformatting them.
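
For the curious, here's roughly what the Linux software RAID version will look like. This is a sketch, not gospel: mdadm is assumed, and the device names (/dev/hde1, /dev/hdg1) and mount points are made up.

    # Back the data up to the new third drive first (assumed mounted
    # at /mnt/backup), since --create will clobber both disks:
    tar cf - -C /raid . | tar xf - -C /mnt/backup

    # Build the mirror from the two (repartitioned) drives:
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hde1 /dev/hdg1

    # Watch the initial sync, then make a filesystem and mount it:
    cat /proc/mdstat
    mke2fs -j /dev/md0
    mount /dev/md0 /raid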

Also, here is why we only accept credit cards with billing/shipping addresses in the US and Canada:

Subject: INTERNATIONAL ORDER
Date: Wed, 28 Jul 2004 23:48:49 +0800
From: "emma qaz" <emma43love@fastermail.com>
To: <orders@dnalounge.com>

HELLO,
  I AM dan horst FROM USA I WILL LIKE TO PURCHASE SOME ITEMS IN YOUR STORE THEN I WANT THE ITEMS TO BE SHIP DOWN TO MY COMPANY IN NIGERIA,SO I WANT YOU TO MAIL ME BACK IF YU CAN SHIP INTERNATIONAL ORDER,THEN MY METHOD OF PAYMENT WILL BE CREDIT CARD, DO MAIL ME BACK ASAP.
  THANKS
dan horst


50 Responses:

  1. injector says:

    From what I've heard hanging out on the linux-kernel list, Linux software RAID is actually faster than most hardware products in most cases.

    The only problem that comes up is that no level but RAID-1 can be booted from without some tricks. (The driver that handles the striping would itself be striped across the disks somewhere.) But since your target is RAID-1, it shouldn't be a problem.

    • jwz says:

      Also, the RAID drives aren't the boot disk. But that's good to know...

      • c0nsumer says:

        May I suggest vinum under FreeBSD doing software mirroring? It's possible to do software RAID-1 on the boot disks, with hot swap and all. Works rather nicely...

        • jwz says:

          No, dumbass, you may not suggest that I change distros just to get a drive to work.

          • c0nsumer says:

            Ahh, sorry. Didn't know if you were redoing the whole box after that...

          • jeramey says:

            I've always been curious about why you're so indignant and refuse to try other OSes when they might possibly work better. Not that it really matters. Reading your frustrated rants when technology fails is highly entertaining. And maybe that's your point: technology always does fail at some point.

            Hey look, I answered my own question.

            • jwz says:

              Because whenever I say "how do I get Foo to work", some dipshit fanboy comes out of the woodwork to say, "I don't have any idea, but I like hearing my lips flap, so since I notice that you are using unrelated-software-Bar instead of unrelated-software-Baz, I'm going to waste your time by advocating that!"

              "Change distros to the one that I use" is the way useless fanboys spell "I don't have any idea."

              It's not an answer at all. It's a wild-assed guess. I can make wild-assed guesses myself, thanks, and surely already have.

              So please, people -- before you suggest to me that I switch distros / upgrade my kernel / bleed a chicken, just pretend that you already have, and that I have already insulted you and hurt your feelings, and that you went away sad.

              • The only reason we suggest you change distros is that we love to watch the steam come out of your ears. Pretending we already went through the suggest/insult/depart cycle deprives us of that pleasure. So.

              • jeramey says:

                I suspect that if the fanboys ceased doing that, your blog would suffer an appreciable loss of entertainment value. That said, now that you mention it, I can't recall ever reading any of your blog entries where someone provided useful information that included the suggestion to switch to an alternative distro or OS, so I understand why you get ticked whenever someone brings it up.

              • zztzed says:

                Perhaps you should bleed a chicken, you know, in effigy of people who suggest switching distros to solve problems.

          • violentbloom says:

            maybe you should use amiga!

    • I'd like to see some reference on the suggestion that Linux software RAID is faster than "most" hardware RAID setups. I find that grossly difficult to believe.

      • ydna says:

        Well, this is not exactly what you're looking for, but from my personal experience, the Promise RAID cards suck. I did a lot of testing a couple years ago. They're slower than 3ware IDE RAID cards, and they cause the kernel (or ata/ide driver) to lock up when a drive is removed from a so-called hot-swap bay. On top of this, my tests with software RAID, even on drives hung off the 3ware card configured as JBOD, were faster than any RAID configuration I could come up with in the 3ware card. I'm just certain I must have been doing something wrong, but I went through a lot of configurations and kept getting the same results: the Linux software RAID just screamed in comparison (with respect to arrays of ATA/IDE drives; no testing with proper SCSI arrays).

        • What testing did you do? bonnie++? Something else?

          It's intuitively unlikely that any software RAID would outperform any hardware RAID under heavy I/O conditions, but perhaps Linux software RAID is playing some we-know-about-the-file-system tricks to speed things up.

          If nothing else, hardware RAID should be safer, since write caches and so forth don't rely on a functioning OS (precisely the time you want those caches flushed gracefully is when the OS is getting itself hosed).

          • ydna says:

            I don't recall what all tests I used. I don't think bonnie was involved. But I do remember that simply shoving bits at the raw "device" was faster on soft RAID (no filesystem at all).
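
            (If you want to reproduce the raw-device comparison, it was nothing fancier than shoving bits at the block devices, along these lines -- device names are placeholders, and writing to a raw device destroys whatever is on it:)

                # Sequential write, then read, straight to the block
                # device, no filesystem involved:
                dd if=/dev/zero of=/dev/md0 bs=1M count=1024
                dd if=/dev/md0 of=/dev/null bs=1M count=1024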

            I still don't personally trust software RAID. So even if the hardware might be a little slower, I trust my data on cheap arrays to 3ware cards. All Promise controllers have been placed under the wheels of the accounting staff's personal automobiles and subsequently rendered harmless.

            I haven't built any SCSI arrays in a while, but I used to swear by Mylex DAC960 cards back when they were their own company and Leonard Zubkoff was alive and writing his beautiful drivers.

          • freiheit says:

            AFAIK, "ATA RAID" cards don't actually do hardware RAID.

            They do software RAID, with just barely enough BIOS trickery to get the OS to boot off of it, dishonestly marketed as hardware RAID. The drivers handle all the RAID work, though at least some models offload the XOR computation onto a chip on the card.

            This technique and variations on it are quite common with ATA and SATA "RAID" cards; specific 3ware, Adaptec and LSI MegaRAID models are the exception.

            So, it doesn't surprise me the slightest bit that a proprietary driver developed in a back room over a long weekend doesn't perform as well as an open source driver that's been meticulously tweaked for years.

            Besides, the main bottleneck is the drives; whether the OS has to send the data once or twice is a negligible factor.

      • injector says:

        How about the `dmesg` from any machine running RAID-5? Most RAID controllers have CPUs that run at only a few hundred MHz. Even if they do have optimized XOR routines, so does the host CPU, via MMX; even a modest CPU can compute a gigabyte of parity per second. So it just comes down to the actual I/O speed of the drives holding everything back. The kernel also has knowledge of the filesystem and cache that a hardware RAID controller has no access to, so some things never even have to be pushed down to the disk subsystem. The only place you gain overhead is that the parity data goes over the PCI bus, where on a hardware card it is generated on the other side. But the bottleneck is the drive bus here anyway, and the parity ends up there no matter what.
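
        (Incidentally, the kernel benchmarks its XOR routines when the raid5 module loads, and you can see it in the boot log. A rough sketch -- the exact wording and numbers vary by kernel version and CPU:)

            # The md driver times its parity routines at module load;
            # the log looks something like this:
            dmesg | grep raid5
            #   raid5: measuring checksumming speed
            #   raid5: using function: pIII_sse (2172.400 MB/sec)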

        Most machines fit into one of two categories: CPU-bound (read a little data, do a lot of crunching, write a little back) or disk-bound (streaming as much data to or from the drives as they can take, with the program always in I/O wait). Neither has a problem with software RAID. The only case where it doesn't work out is when the data coming from the maxed-out drives is exactly enough to max out the CPU processing it. Then hardware RAID can help, but so can just buying the next faster CPU.

        I think just about every kernel developer agreed that Linux's software RAID was faster, cheaper, and safer.

        • Could you please provide a reference to some actual testing of this, rather than your own theories?

          (It sounds to me like you're talking about the craptastic IA32 RAID cards that are commonly available from people like Promise and 3Ware. It's definitely possible to get far better, purpose-built hardware RAID setups, even on an IA32 machine, and I don't believe that your arguments hold true in those cases.)

          • injector says:

            Hot off the bonnie++...

            All machines use Seagate Cheetah 10k RPM drives in RAID5; only the number of drives and the configuration vary, as noted with each result. bonnie++ was run with the default options, along the lines of the sketch below:
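
            (The directory and user here are placeholders; bonnie++ refuses to run as root unless you hand it a user with -u:)

                bonnie++ -d /mnt/array -u nobody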

            Dual 1.4 GHz Athlon MPs, 1 GB of RAM, Adaptec U320 RAID, 3 drives on 1 channel, load average 0.00, 0.00, 0.05:

            Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
            Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
            infini.hereintow 2G 22345  83 27694  21 11575   6 22565  74 52605  18 352.9   1
                                ------Sequential Create------ --------Random Create--------
                                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                             16  1168  99 +++++ +++ +++++ +++  1138  99 +++++ +++  2671  99

            Dual 1.4 GHz Opteron 240s, 1 GB RAM, LSI Logic MegaRAID 320, 6 drives on 2 channels, load average 0.39, 0.31, 0.68:

            Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
            Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
            prime.hereintown 2G  8060  20  9030   2  6346   2 16579  41 44195   8 257.2   0
                                ------Sequential Create------ --------Random Create--------
                                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                             16   372   3 +++++ +++   345   2   397   4 +++++ +++   398   2

            Dual 667 MHz Pentium IIIs, 1 GB RAM, Infortrend external SCSI-to-SCSI RAID, 4 drives on 1 channel, load average 0.00, 0.00, 0.24:

            Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
            Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
            lightning.herein 2G  7053  90  6610   7  3487   3  6841  82 14173   6 337.2   2
                                ------Sequential Create------ --------Random Create--------
                                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                             16   528  98 +++++ +++ 25267 100   575  99 +++++ +++  2106  98

            Single 700 MHz Athlon, 256 MB RAM, Symbios 53C896 SCSI w/Linux MD RAID, 5 drives on 1 channel, load average 0.06, 0.21, 0.25:

            Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
            Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
            clubneon.clubn 496M 10688  79 28010  31 12723  14 12574  89 48764  33 469.8   4
                                ------Sequential Create------ --------Random Create--------
                                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                             16   605  98 +++++ +++ 29543  97   591  97 +++++ +++  2220  96

            So the software RAID didn't top the bunch outright; I wasn't expecting it to against this competition. But looking at the actual machines, an old single-CPU box can give a much newer machine with 4 times the CPU power and RAM a run for its money.

            • notabouthim says:

              wow. awesomely useful information/insight. thanks for posting that.

            • I'm surprised about the MegaRAID performance being that poor (though I've seen MegaRAID cards do some really wonky things under Linux, so I question the controller device driver).

              Just so we know, could you say a bit more about the Infortrend setup you were using? (I've used their SATA-to-2 Gb FC-AL external arrays, and they were way faster than that.)

              • injector says:

                I've heard recently that there are some performance problems with the MegaRAID driver also.

                The Infortrend is their older IFT-3101 3 channel model. It was originally going to be part of a 2 machine cluster using GFS. But GFS went proprietary and Infortrend never released a firmware upgrade supporting D-locks for this controller. So only two channels are in use, one to the onboard SCSI of the motherboard and one down to the drive array. The motherboard uses a Symbios 53C896 chip, and the IFT has three more of the same. It has 64 MB of cache (the max for that unit), and is based around a PowerPC chip.

                It is an older unit, but I too am surprised at how poorly it did. The machine is being retired from its current service; I'll see if I can tune it a bit before it takes on its next role.

    • fo0bar says:

      Linux RAID1 is great (and yes JWZ, the RAID1 disks in lihnucks are the same as their non-RAID equivalents; you just have to set the partitions' type flag to "fd" instead of "83" for the kernel to recognize them as a RAID device). However, the limit is the PCI bus as you get large:
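
      (Concretely -- a sketch, with hypothetical device names; use fdisk's "t" command interactively, or sfdisk non-interactively:)

          # Flag partition 1 on /dev/hdc as type fd (Linux raid
          # autodetect) instead of 83 (plain Linux):
          sfdisk --change-id /dev/hdc 1 fd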

      My experience on how doing 6-disc RAID5 with large hard drives ain't that possible

  2. colubra says:

    On the selling-posters front, gad, that sounds like a piss-cutter.
    I haven't been to DNA in yearsandyears. Is there somewhere near the entrance where you could put up a frame mounted on the wall, with something like 'BUY THIS POSTER OVER THERE AT THE SWAG TABLE FOR XTEEN $' on the wall beneath it -- with the frame set up such that it's an easy job to swap in the poster of whichever band's currently on the stage? That might help you not have to rely on merchants who might display their own wares to the detriment of yours -- and let people get a better look at the poster than they could over the merchandise table, anyways.
    hope that's a useful thought.

  3. travisd says:

    We have in the past done a warm-standby arrangement for disks in systems that aren't being updated a lot: basically, use dd or somesuch to duplicate the disk to another one on a regular basis. Probably not the right thing for a data drive, but it worked reasonably well for boot disks in the Slowaris boxen. Since there was no RAID hardware or software involved, the disks were pretty portable, even between systems.
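
    (Something along these lines -- disk names hypothetical, the spare must be at least as big as the original, and the filesystems should be quiesced first or the copy won't be consistent:)

        # Clone the live disk wholesale onto the warm standby:
        dd if=/dev/hda of=/dev/hdc bs=1M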

  4. spendocrat says:

    They don't have a good reputation anywhere that I know of.

    You'd have the same disk incompatibility problem with other vendors, but probably not the data corruption. We (movie effects company with a few TB of data) use a lot of 3ware stuff.

  5. freiheit says:

    Do you really need to get a third drive?

    Can't you pull one of the drives out of the RAID1 (operating in "degraded", AKA "non-redundant", mode for the duration), hook it up to a non-RAID IDE channel in the box, reformat it, and then copy the data over? I haven't dealt with Promise RAID before, but other than some possible incessant beeping this should work.

    Then on whatever new RAID1 setup you move to, configure a RAID1 with only one drive ("degraded"), copy your data into it and then, finally, add the drive into the pool. If you're doing Linux software RAID1 you can skip this second copy operation and configure the drive with the unmunged data as the only drive in the RAID1 and then add in the "bad" drive.

    It's the same number of copies (2) as you'd be doing with a 3rd drive, but with the added hassle of 2 extra steps. A degraded RAID1 will be about half as fast on reads, but writing is your bottleneck here, anyways.
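
    (The second half of that, sketched with Linux md -- device names are made up; the literal word "missing" stands in for the absent mirror half:)

        # Build a one-disk "degraded" mirror, put a filesystem on it,
        # and copy the data over:
        mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hde1 missing
        mke2fs -j /dev/md0
        mount /dev/md0 /mnt/new && cp -a /mnt/old/. /mnt/new/

        # Then hand over the other disk and let the mirror resync:
        mdadm --add /dev/md0 /dev/hdg1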

    • jwz says:

      Yeah, that might work -- or not. Doing that would require me to trust everything in the path to fail to do something Catastrophically Stupid. No thanks, I'll just burn cash on another disk instead.

      • kw34hd1 says:

        it does work, and quite well. linux software raid is one of the only experimental parts of the linux kernel that i've used that actually works as advertised. neil brown rules.

        if you're worried about the data, just copy it over the network to one or more other machines with extra space.... chances are you'll just delete those temp files after you're done. if something _does_ go bad, then you've got another copy.

        -j

  6. ex_sjc says:

    I say you just get a new Acer.

  7. Most "hardware" IDE RAID chipsets (especially those that come integrated with motherboards, and definitely all the Promise and HighPoint kit) foist all of the computational heavy lifting onto the OS through shitty drivers, which commonly aren't well supported (if at all) in Linux. Software RAID is preferable to one of those cards.

    3ware cards, OTOH, are real hardware RAID cards. You probably already know this, but the kernel has had 3ware drivers for a while.

    Just out of curiosity, what linux distro are you using? (No, I'm not trying to convert you.)

    • jwz says:

      RH9 on just about everything; OpenBSD 2.8 on the firewall (because I haven't yet found the stomach to convert ipf to whatever-the-new-thing-is, and there exist no auto-conversion scripts.)

      • Interesting (re: RHL9). Are you doing anything for updates, such as Fedora Legacy, or are you just not caring?

        • jwz says:

          I've been just not caring, since the RH9 updates stopped coming in via Red Carpet. I suppose I'll try to care again the next time a big exploit makes the rounds. I'm really not looking forward to doing a full reinstall of every machine, that's for sure; and I'm told that the Fedora installers don't like to "upgrade" from RH9.

          • dzm6 says:

            For whatever it's worth, Fedora Core 2 RC 2 fucked my system, but Fedora Core 2 actually nicely upgraded a RH9.

            Mileage may vary, etc.

          • transiit says:

            I look forward to the future jwz livejournal entry asking which distribution to use post RH9.

            It'll be a hoot.

            -transiit

    • fzou says:

      Of course, there are other ways which can induce similar crappiness in 3Ware cards. (16 WD-120GB/8MB's behind two 32-bit PCI 3Ware cards. I forget the exact model.)

      You'd think jwz would be rich enough, or wise enough, to get someone else to deal with hardware. But no.

  8. malokai says:

    the new linux raid stuff in 2.6 might be able to decode the raid format present on the raid controllers..

    I read some posts on l-k about adaptec doing the same with their controllers (adopting linux's ondisk format).

    It could also be a spurious memory fault, cache, etc. Basically, if it's just shitting all over itself, anything in the path between the hard drive and the CPU could be at fault.

    You're not using some sort of SiS chipset right?

    • chromal says:

      Ugh, this is bringing back bad memories of the Promise FastTrak TX2 "RAID" card. Pain to use; I recall needing to feed the kernel a bunch of command-line arguments that essentially told it to pay no attention to the RAID card (which is apparently indistinguishable from their non-RAID ATA controller cards to certain Linux drivers). I recall it requiring a triple-stage boot. I recall having to do all sorts of smoke and mirrors to install onto the disks under one device name and then switch everything over later from a boot disk before the system would actually boot. Pain.

      I've heard much happier things about Adaptec ATA RAID implementations, which basically look like SCSI volumes to Linux. Simple 'nuff. 3ware's isn't too difficult to work with, either. Neither may be the best, but they're certainly an order of magnitude more turnkey than Promise's low-cost products. And they actually degrade gracefully when a drive does fail. Promise locked the system.

      The thing I especially liked about 3ware's implementation was the way I could still query SMART ATA drives with smartctl and get stuff like remapped bad sectors, operating temperature, and other predictions of drive wear right off of the drives. Now if my drives exceed safe operating temperature, they page me. Uhm. Yay.
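
      (For reference, something like this -- the controller device node and disk number depend on the card and which port the drive is on:)

          # Pull full SMART data for the first drive behind a 3ware
          # controller, via smartmontools' 3ware passthrough:
          smartctl -a -d 3ware,0 /dev/twe0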

      Anyway, your time and business are worth more than the minute savings Promise controllers offer over real ATA RAID implementations. Lose the Promise.

  9. violentbloom says:

    maybe you can trade the guy in nigeria tshirts for drugs! or mail order brides!
    and just forget the credit card bit completely.

    • ciphergoth says:

      Or maybe a DNA Lounge T-shirt is all he needs to unlock the $10,000,000 in a Swiss bank account belonging to his dead uncle...

  10. edm says:

    Allow me to second the observation that Promise RAID cards are basically software RAID -- in either a proprietary driver, or a scary, "experimental" open source driver (deprecated in Linux 2.6, I think); read the driver source before you consider trusting data you value to it. (Several other IDE RAID cards are also essentially software RAID, but the Promise ones seem to be the most widely used.)

    3Ware RAID cards are actually closer to hardware RAID; they have good Linux support (Open Source driver that's been in the Linux kernel since early 2.4 days, if not earlier). As far as I'm aware most distros will support installing onto 3Ware cards out of the box. (I'm not certain all the 3Ware support is in hardware, but it certainly seems like as much of it is in hardware as on SCSI RAID cards.)

    I had a couple of servers with Promise RAID cards; I'm very pleased I got rid of them. I now have three servers with 3Ware RAID cards, and am quite pleased with them -- there's even some reasonable Linux monitoring tools (and I can send you the shell script wrapper I use to hook them up to nagios if you want).

    I also support one server with Linux Software RAID. It's reasonably good these days, but be very wary of using Linux Software RAID and LVM together -- especially in LVM 1.0, some rather dodgy things happen with device scanning which can lead to it finding only half your RAID device. (This is also true with the Promise RAID cards if you use the open source driver.) (Ironically, the Linux Software RAID is on SATA drives connected to a Promise SATA RAID card. After my earlier experiences I wasn't going to use the Promise RAID support, so I'm using it as an expensive SATA controller. It wasn't what I specced for the machine -- I'd specced a 3Ware SATA RAID card -- but they chose to buy this instead, so I don't feel bad about it.)
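
    (The half-a-device failure mode is easy to check for after boot -- a sanity check, not a fix:)

        # Both mirror halves should show up, e.g. "[UU]" rather
        # than "[U_]", and the device count should match:
        cat /proc/mdstat
        mdadm --detail /dev/md0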

    Ewen

  11. partylemon says:

    Will you ship international orders to people who have graduated some form of high-school?

    • jwz says:

      When we first started taking credit cards in the store, the LJ peanut gallery here was in an uproar of "waaah, I want to buy a t-shirt but I'm not in the US or Canada", so I added this page.

      You may now die of non-shock to learn that we haven't had a single person take advantage of that, ever.

      • gths says:

        Yeth. Going to the post office and queuing up, getting a money order (paying the usual crappy commission and a less-than-optimal exchange rate), and then sending it off par avion so that it takes a week for the order to even arrive, is arse. Though I have done it in the past, filthy looks from the postal staff and all... Still might go through all that shit anyway, since I've been on a wanky t-shirt binge and green on black is natty as.

        I understand fully why you do this. It doesn't mean I have to like it, eh?

      • partylemon says:

        You may now die of non-shock to learn that we haven't had a single person take advantage of that, ever.

        Given that the cost of getting a money order, the fees for currency conversion, and then making sure it doesn't wash up on the shores of some Pacific island would triple the price of the order -- I'm not that desperate when bootleg shirts just require a decent colour printer.

  12. rjhatl says:

    Ask him for the address in Nigeria where he wants the stuff shipped. If it's in Lagos, I can have some friends stop in there and possibly get a photo of him for posterity.