21 January 2011

All about the Benjamins (originally posted to StorageMonkeys June 9, 2009)

The choice of the unit of measure for storage is interesting to me, because it's otherwise tough to measure price for performance.

I remain agape at the price tag on high-end, supposedly high-performance, storage systems. Connected by 4Gb/s FibreChannel or gigabit Ethernet, that's a limit of 400 and 110 MB/s, respectively. (Yes, I know of 8Gb/s FC and 10GE, but these are prohibitively expensive, if supported at all. Even link-aggregated GigE practically tops out at 880 MB/s.) I figure that writes across 40 7200RPM disks could saturate an FC link, and it would take fewer than 20 15k disks. Neither of these strikes me as an impractical or unusual size for a storage array, even doubling those numbers for RAID 1. More importantly, such arrays don't strike me as high performance.
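For anyone who wants to check that arithmetic, here's the back-of-the-envelope version. The per-disk random-write rates are rough assumptions on my part (roughly 10 MB/s for a 7.2k spindle, low 20s for a 15k spindle), not benchmark results:

    # How many spindles' worth of random writes fill each interconnect?
    # Per-disk rates below are assumed ballparks, not measurements.
    LINKS_MBPS = {"4Gb FibreChannel": 400, "Gigabit Ethernet": 110}
    DISK_RANDOM_MBPS = {"7200RPM SATA": 10, "15k spindle": 22}  # assumed

    for link, link_mbps in LINKS_MBPS.items():
        for disk, disk_mbps in DISK_RANDOM_MBPS.items():
            spindles = link_mbps / disk_mbps
            print(f"{link} ({link_mbps} MB/s): saturated by ~{spindles:.0f} x {disk}")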

Particularly shocking is that a brand-name "SAN" solution of that size would cost in the neighborhood of a quarter million dollars and already be at its performance limit. Granted, it might be half that price without the fancy management and replication software, whereas the less fancy alternative, at one tenth to one fifth the cost, would still have room to grow from a performance standpoint. How much does the Veritas database suite cost these days?


The cheaper alternative, which I have implemented and benchmarked, is to use Serial-Attached SCSI (SAS) instead of FibreChannel and commodity SATA disks instead of 10k or 15k spindles. Although it's not necessarily a "SAN" in the marketing sense, SAS readily supports multiple hosts per bus. It's also typically implemented as 4x 300 MB/s channels on one connector for interfacing to expanders (a rough equivalent to FC switches). An x4 PCIe slot is actually the limiting throughput factor for one of these, as each x1 lane is only 250 MB/s. Even with RAID 1, rolling my own array would cost $25k (including labor), maybe double that for Dell-brand MD1000s. One could then spend twice that amount again to get triple the throughput on the same server(s) before running up against the limit. Additional fanciness can be had from third-party storage software vendors, especially in this economy, for under six figures.
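The bandwidth arithmetic behind that PCIe claim, as a quick sketch using the numbers above (PCIe 1.x lane rates assumed):

    # SAS 4x wide port vs. the PCIe 1.x x4 slot feeding the HBA.
    SAS_LANES, SAS_MBPS_PER_LANE = 4, 300    # one 4x SAS connector
    PCIE_LANES, PCIE_MBPS_PER_LANE = 4, 250  # PCIe 1.x: ~250 MB/s per x1 lane

    sas_port = SAS_LANES * SAS_MBPS_PER_LANE      # 1200 MB/s on the wire
    pcie_slot = PCIE_LANES * PCIE_MBPS_PER_LANE   # 1000 MB/s into the host

    print(f"SAS 4x wide port: {sas_port} MB/s, PCIe x4 slot: {pcie_slot} MB/s")
    print("bottleneck:", "PCIe slot" if pcie_slot < sas_port else "SAS port")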

That's for truly random I/O. For sequential I/O, such as logs, the situation is even more egregious: only four 7.2k spindles would saturate a (dedicated) FC link. If that link is paired for redundancy, one would need a second pair of links for the non-sequential traffic, perhaps introducing some management complexity, unless FC link aggregation becomes common enough to be standardized.
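That figure assumes something like 100 MB/s of streaming writes per 7.2k spindle, which is a ballpark assumption rather than a measurement:

    # Sequential writes: how few 7.2k spindles fill a dedicated 4Gb FC link?
    FC_LINK_MBPS = 400
    SEQ_MBPS_PER_72K_DISK = 100  # assumed streaming rate per spindle

    spindles = FC_LINK_MBPS / SEQ_MBPS_PER_72K_DISK
    print(f"~{spindles:.0f} spindles of sequential writes saturate a {FC_LINK_MBPS} MB/s FC link")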

Another issue that keeps coming up in conversation is reliability and/or maintenance. This Usenix paper belies the notion that SATA disks are any less reliable than others. At a 3-6% annualized replacement rate, an 80-disk array would lose 2-5 disks per year, or about 12 disks (15%) over 2.5 years. I've actually already included this (4 spares per 20 non-spares) in the $25k above.
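Spelled out, the replacement math looks like this:

    # Expected disk replacements for an 80-disk array at a 3-6% annualized rate.
    N_DISKS, YEARS = 80, 2.5

    for arr in (0.03, 0.06):
        per_year = N_DISKS * arr
        total = per_year * YEARS
        print(f"{arr:.0%} ARR: ~{per_year:.0f} disks/year, "
              f"~{total:.0f} disks ({total / N_DISKS:.0%}) over {YEARS} years")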

Somewhere between two and three years in, you're going to have to bite the bullet, spend another $25k for twice as much space, and migrate the old data, assuming you're not already upgrading for other reasons. Woe is you. You'll just have to resort to drowning your sorrows in the hundreds of grand you saved, never mind the headache of shipping disks back and forth.

The Storage Emperor's new clothes are looking mighty skimpy, indeed.
