SCIENTIFIC-LINUX-USERS Archives

March 2011

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From: Stephen John Smoogen <[log in to unmask]>
Reply-To: Stephen John Smoogen <[log in to unmask]>
Date: Thu, 10 Mar 2011 20:39:29 -0700

On Thu, Mar 10, 2011 at 20:24, Chuck Munro <[log in to unmask]> wrote:
> This is a bit long-winded, but I wanted to share some info ....
>
> Regarding my earlier message about a possible race condition with mdadm, I
> have been doing all sorts of poking around with the boot process. Thanks to
> a tip from Steven Yellin at Stanford, I found where to add a delay in the
> rc.sysinit script, which invokes mdadm to assemble the arrays.
>
> Unfortunately, it didn't help, so it likely wasn't a race condition after
> all.
>
> However, on close examination of dmesg, I found something very interesting.
>  There were missing 'bind<sd??>' statements for one or the other hot spare
> drive (or sometimes both).  These drives are connected to the last PHYs in
> each SATA controller ... in other words they are the last devices probed by
> the driver for a particular controller.  It would appear that the drivers
> are bailing out before managing to enumerate all of the partitions on the
> last drive in a group, and missing partitions occur quite randomly.

OK, this sounds similar to another set of problems I heard about last
week. You need to make sure the drives in the array are "RAID compatible"
these days. Various green drives can take far too long to spin up, or go
to sleep too quickly, which gets them marked as bad by dmraid before
they are ready. However, if it's not that, then the next three issues
tend to be cable or BIOS related:

1) The cable isn't rated for its length. Sure, you can buy a 2-foot SATA
cable, but the controller's timing may assume something much shorter.
2) The cable isn't rated for the drive capacities.
3) Other BIOS issues that require updates and playing around (oh wait,
the default is to spin everything down, but I need it up).

> So it may or may not be a timing issue between the WD Caviar Black drives
> and both the LSI and Marvell SAS/SATA controller chips.
>
> So, I replaced the two drives (SATA-300) with two faster drives (SATA-600)
> on the off chance they might respond fast enough before the drivers move on
> to other duties.  That didn't help either.
>
> Each group of arrays uses unrelated drivers (mptsas and sata_mv) but both
> exhibit the same problem, so I'm mystified as to where the real issue lies.
>  Anyone care to offer suggestions?
>
> Chuck
>
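
For anyone chasing the same symptom, the dmesg check Chuck describes
(looking for member partitions that never get a 'bind<sd??>' line) can be
scripted. This is only a rough sketch, not something from the thread: it
assumes the kernel logs lines of the form "md: bind<sdb1>", and the
function names and the list of expected partitions are made up for
illustration.

```python
import re

# Assumption: md logs one "md: bind<NAME>" line per member it picks up.
BIND_RE = re.compile(r"md: bind<(\w+)>")


def bound_members(dmesg_text):
    """Return the set of device names seen in 'md: bind<...>' lines."""
    return set(BIND_RE.findall(dmesg_text))


def missing_members(dmesg_text, expected):
    """Return the expected member partitions that were never bound, sorted."""
    return sorted(set(expected) - bound_members(dmesg_text))


# Hypothetical dmesg excerpt: sdc is detected, but sdc1 is never bound.
sample = (
    "md: bind<sda1>\n"
    "md: bind<sdb1>\n"
    "sd 0:0:0:0: [sdc] Attached SCSI disk\n"
)
print(missing_members(sample, ["sda1", "sdb1", "sdc1"]))
```

Feeding it the saved dmesg output from several boots, together with the
real list of array members and hot spares, would show whether it is always
the last drive on a controller that goes missing.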



-- 
Stephen J Smoogen.
"The core skill of innovators is error recovery, not failure avoidance."
Randy Nelson, President of Pixar University.
"Let us be kind, one to another, for most of us are fighting a hard
battle." -- Ian MacLaren
