SCIENTIFIC-LINUX-USERS Archives

September 2012

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From: Vladimir Mosgalin <[log in to unmask]>
Reply-To: Vladimir Mosgalin <[log in to unmask]>
Date: Mon, 3 Sep 2012 22:37:32 +0400
Content-Type: text/plain

Hi Todd And Margo Chester!

 On 2012.09.02 at 17:33:24 -0700, Todd And Margo Chester wrote:

> On several Windows machines lately, I have been using
> Intel's Cherryville enterprise SSD drives.  They work
> very, very well.
> 
> Cherryville drives have a 1.2 million hour MTBF (mean time
> between failure) and a 5 year warranty.
> 
> I have been thinking, for small business servers
> with a low data requirement, what would be the
> risk of dropping RAID in favor of just one of these
> drives?

Personally, I wouldn't recommend dropping RAID on *any* server whose
loss of functionality would cause you problems. That is, if its
functionality is duplicated and a load balancer will switch to another
server automatically, then sure, deploy it without RAID; but if losing
it can cause you business trouble, it's a bad idea.

We do use SSDs for some kinds of small business servers, but we prefer
to buy at the very least two of them and use software RAID; it's fine
to use cheaper SSDs, since a RAID1 of two of them is still a better
idea than a single, more expensive one (unless we are talking about
something ultra-reliable like an SLC drive, but those are roughly 10
times more expensive).
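
For what it's worth, a minimal sketch of such a setup, assuming two
SSD partitions at /dev/sda1 and /dev/sdb1 (device names are only
examples, adjust them to your hardware):

  # create a RAID1 array from the two SSD partitions
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  # watch the initial resync
  cat /proc/mdstat
  # record the array so it is assembled at boot (path as on SL/RHEL)
  mdadm --detail --scan >> /etc/mdadm.conf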

You should understand that MTBF figures and warranty periods are
absolutely useless when you are thinking about the chance of a single
failure or estimating how soon a problem is likely to happen; they are
only worth taking into account when you calculate cost of ownership or
replacement rates for a big park of computers (say, thousands).
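
A rough back-of-the-envelope illustration of why:

  8760 hours/year / 1,200,000 hours MTBF = ~0.7% expected failures/year

which is only meaningful across a large fleet (roughly 7 drives out of
1000 per year); it tells you next to nothing about when your one drive
will die.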

If you want to estimate SSD reliability for a server task, the best
indicator is the amount of writes the drive is rated for. Some
workloads, like database journals (redo logs in Oracle, WAL in
PostgreSQL, etc.), generate a really large volume of writes, and if
you run the numbers it's easy to see that a consumer SSD cannot last
for years under such a load (you need an SLC drive, or at least an
enterprise-grade MLC drive like the Intel 710). There are other usage
scenarios where SLC SSDs are a must, like a ZFS ZIL.
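
A rough illustration with made-up but plausible numbers (the 36 TB
rating and the 100 GB/day write volume below are assumptions; check
your drive's spec sheet and your actual workload):

  36 TB rated lifetime writes / 100 GB of journal writes per day
    = ~360 days before the drive is past its rating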

SSD reliability is fine for mostly-read usage scenarios, but the
problem is that SSDs fail in a completely different way than HDDs.
I've actually never seen an SSD that ran out of write cycles, but I've
seen quite a few SSDs that died from a flash controller failure or
something similar. That is, the most likely problem with an SSD is
"oops, it died". While that happens with HDDs too, on an HDD you are
far more likely to just get bad blocks. So MTBF for an SSD is not the
same thing as MTBF for an HDD; we are talking about completely
different kinds of common failures, and you can't compare the numbers
directly.

> 
> Seems to me the RAID controller would have a worse
> MTBF than a Cherryville SSD drive?
> 
> And, does SL 6 have trim stuff built into it?

Yes; just make sure to add the "discard" option to your ext4
filesystem mounts.
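
For example, an /etc/fstab entry would look something like this (the
UUID and the /data mount point are only placeholders):

  # ext4 filesystem with TRIM (discard) enabled
  UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /data ext4 defaults,discard 1 2

or, for an already-mounted filesystem:

  mount -o remount,discard /data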

However, it won't work through a hardware RAID controller and probably
won't work with software RAID either, though I'm not 100% sure about
the latter. This means that if you are going to put SSDs in a RAID,
SandForce-based SSDs aren't recommended, as you will lose a lot of
performance over time; use Marvell-based drives like the Crucial M4,
Plextor M3/M3P, Intel 510, OCZ Vertex 4 and a few others. Don't use
the Marvell-based Corsair Performance Pro with hardware RAID
controllers, however, as they are often incompatible.
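
If you want to verify whether a particular drive advertises TRIM at
all, something like this should do (run it against the raw disk, not
the md or RAID device):

  # look for "Data Set Management TRIM supported" in the identify data
  hdparm -I /dev/sda | grep -i trim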

-- 

Vladimir
