SCIENTIFIC-LINUX-USERS Archives

March 2011

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Lamar Owen <[log in to unmask]>
Reply To:
Lamar Owen <[log in to unmask]>
Date:
Fri, 11 Mar 2011 08:37:01 -0500
On Thursday, March 10, 2011 10:39:29 pm Stephen John Smoogen wrote:
> Ok this sounds familiar with another problem set I heard last week.
> You need to make sure the drives on the array are "raid compatible"
> these days. 

I'll confirm this with certain WD drives in particular.  You're supposed to use the 'RE' drives with RAID.  Anyone with these symptoms should google 'WDTLER' to get the full scoop.  And, yes, the 'RE' drives are more expensive.  The non-'RE' drives will retry much longer on a sector read error.  My guess is that the long retries allow media with lower Eb/No to be used on the non-'RE' drives, while the better (more expensive) media with higher Eb/No goes into the 'higher end' RE drives.  But that's speculation.

I had an F13 machine with a pair of 1.5TB drives in RAID1; one drive was (and is) a Seagate, and the other was a WD15EADS (identical LBA capacity to the Seagate), since replaced with another Seagate.  This is a client's machine, so it's at the client's house now, not here.

The WD15EADS drive would cause iowait percentages off the scale; sysstat showed awaits of 20,000ms or more at times on this drive.  Oh, and the partitions were (and are) 4k aligned, before anyone asks.
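
For anyone who wants to watch for this without sysstat installed, here is a rough Python sketch of how an await figure can be derived from two samples of /proc/diskstats: the change in milliseconds spent on reads and writes, divided by the change in completed reads and writes.  The function names parse_diskstats and await_ms are mine, made up for illustration, and the field positions assume the standard Linux diskstats layout; treat it as an approximation of what iostat reports, not a replacement for it.

```python
def parse_diskstats(text):
    """Map device name -> (I/Os completed, ms spent on I/O).

    `text` is the content of /proc/diskstats. Assumed field layout:
    major, minor, name, reads, reads merged, sectors read, ms reading,
    writes, writes merged, sectors written, ms writing, ...
    """
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip blank or unexpectedly short lines
        reads, read_ms = int(f[3]), int(f[6])
        writes, write_ms = int(f[7]), int(f[10])
        stats[f[2]] = (reads + writes, read_ms + write_ms)
    return stats


def await_ms(before, after, dev):
    """Average ms per completed I/O on `dev` between two samples."""
    ios = after[dev][0] - before[dev][0]
    ms = after[dev][1] - before[dev][1]
    return ms / ios if ios else 0.0
```

To use it, read /proc/diskstats twice a few seconds apart, run each reading through parse_diskstats, and call await_ms(first, second, "sdb") with whatever the suspect device is called on your box.  A drive in the state described above would return values in the tens of thousands.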

System load average when the drive 'acted up' shot up through the roof, hitting 15 to 20 during the drive's 'conniption fits.'

Replacing with another Seagate 1.5TB unit resolved the issue; this particular EADS drive could not have TLER set to prevent the issue.  And at times the MD layer would drop that drive out of the array; at that point the machine would run well....
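
One way to reason about the drop-outs, which I did not verify on this particular machine: if a drive's internal error recovery can run longer than the kernel's command timeout, the kernel gives up on the command first and MD may then kick the drive.  The Python sketch below compares the two numbers.  It assumes the usual Linux sysfs file /sys/block/<dev>/device/timeout (in seconds) and an error recovery limit expressed in units of 100 ms, as 'smartctl -l scterc' reports it; the function names are made up for illustration, and None stands for a drive with no limit set.

```python
from pathlib import Path


def kernel_timeout_s(dev, sysfs_root="/sys/block"):
    """Read the kernel's command timeout (seconds) for a disk such as 'sda'."""
    return int(Path(sysfs_root, dev, "device", "timeout").read_text().strip())


def recovery_is_bounded(erc_deciseconds, timeout_s):
    """True if the drive gives up on a bad sector before the kernel times out.

    erc_deciseconds: the drive's error recovery limit in units of 100 ms,
    or None when no limit is set (the non-TLER case).
    """
    if erc_deciseconds is None:
        return False  # the drive may keep retrying past any kernel timeout
    return erc_deciseconds / 10.0 < timeout_s
```

With a 7 second limit (70) and a 30 second kernel timeout, recovery_is_bounded(70, 30) is True; for a drive like this EADS, where no limit could be set, it is False.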
