SCIENTIFIC-LINUX-USERS Archives

March 2011

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

From: Chuck Munro
Date: Thu, 10 Mar 2011 15:21:31 -0800

Well, I tried adding a 5-second sleep to the mdadm startup in the
sysinit script, and 10 seconds in the mdmonitor script, but it made no
difference.  The spare partitions were still left out of two of the
arrays.  What I find curious is that it's always the hot spares, never
the active components.
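
A fixed sleep only helps if the partitions happen to appear within that window. A minimal sketch of a bounded wait instead, assuming the race-condition theory is right: it polls a partition list until an expected number of sd partitions is present or a timeout expires. The function name wait_for_partitions, the count of 48 (12 drives x 4 partitions, from the original post), and the 30-second timeout are illustrative assumptions and have not been tested on this system.

```shell
# Hypothetical helper, not part of mdadm or the SL-6 init scripts.
# Usage: wait_for_partitions FILE EXPECTED TIMEOUT_SECONDS
# Returns 0 once FILE lists at least EXPECTED sd partitions, 1 on timeout.
wait_for_partitions() {
    file=$1
    expected=$2
    timeout=$3
    elapsed=0
    while :; do
        # Count names such as sda1 or sdab12 in /proc/partitions format.
        found=$(grep -Ec '[[:space:]]sd[a-z]+[0-9]+$' "$file" 2>/dev/null || true)
        [ "${found:-0}" -ge "$expected" ] && return 0
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Assumed placement: just before "/sbin/mdadm -A -s" in /etc/rc.d/rc.sysinit.
# wait_for_partitions /proc/partitions 48 30
```

"udevadm settle --timeout=30" is the stock way to wait for the udev event queue to drain and could be tried at the same spot. Whether either approach closes this particular race is unverified.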

The "No suitable drives" thing is a mystery, since all drives work for 
the other arrays, but I get things like:

   # mdadm -A -s
   mdadm: No suitable drives found for /dev/md/md_d23
   mdadm: No suitable drives found for /dev/md/md_d27

when I issue the command manually after the system is up.

This time around, d23 and d27 are the arrays that happened to lose their
spares.  Next time I boot it'll be different arrays.
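
Since the affected arrays change on every boot, it may help to check for them automatically. A rough sketch, assuming the usual /proc/mdstat layout in which spare members carry an "(S)" suffix: the hypothetical helper count_spares prints each active array with its spare count, so any array showing 0 instead of the expected 1 stands out.

```shell
# Hypothetical helper: reads /proc/mdstat-format text on stdin and prints
# "<array> <number of spares>" for each active array.
count_spares() {
    awk '$2 == ":" && $3 == "active" {
        n = 0
        # Scan the fields after "active"; members look like "sdb4[0]",
        # and spares like "sdg4[5](S)" (device names here are made up).
        for (i = 4; i <= NF; i++)
            if ($i ~ /\(S\)$/) n++
        print $1, n
    }'
}

# Example (run after boot):
# count_spares < /proc/mdstat
```

The output could be compared against the spares=1 entries in mdadm.conf to decide which arrays need their spare re-added by hand.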

Time to put on my thinking cap  :-)

Chuck


On 03/10/2011 10:53 AM, Steven J. Yellin wrote:
> Maybe I missed it, but I didn't see any response to your request
>
> "Does anyone know of a way to have mdadm delay its assembly until all
> partitions are enumerated? Even if it's simply to insert a
> several-second wait time, that would probably work. My knowledge of the
> internal workings of the boot process isn't good enough to know where to
> look."
>
> I thought partition enumeration was done before init was started, but I
> don't know much about such matters. That's why I was waiting for someone
> else to reply. Anyway, here's what information I can contribute: In
> /etc/rc.d/rc.sysinit is a line with "/sbin/mdadm -A -s", before which
> you can insert a delay. And /etc/rc.d/init.d/mdmonitor contains
> "#chkconfig: 2345 15 85", where the relatively low number "15" means
> when chkconfig sets mdmonitor to start during boot, chkconfig will make
> a symbolic link to mdmonitor named "S15mdmonitor", causing mdmonitor to
> start relatively early. If you 'chkconfig mdmonitor off', it will not be
> started at all during the boot, and you can do it later by hand with
> "service mdmonitor start". That would let you see if an arbitrarily long
> delay of it helps. Surely you can recover if mdmonitor is needed for
> later parts of the boot, if only by 'chkconfig mdmonitor on' and reboot.
>
>
> Steven Yellin
>
> On Tue, 8 Mar 2011, Chuck Munro wrote:
>
>> Hello folks,
>>
>> This is my first adventure with SL after many years of using CentOS.
>> I'm using SL-6 on a large-ish VM server, and have been quite happy
>> with it.
>>
>> I am experiencing a weird problem at bootup with large RAID-6 arrays.
>> After Googling around (a lot) I find that others are having the same
>> issues with CentOS/RHEL/Ubuntu/whatever. In my case it's Scientific
>> Linux-6 which should behave the same way as CentOS-6. I had the same
>> problem with the RHEL-6 evaluation version. I'm posting this question
>> to the CentOS mailing list as well.
>>
>> For some reason, each time I boot the server a random number of RAID
>> arrays will come up with the hot-spare missing. This occurs with
>> hot-spare components only, never with the active components. Once in a
>> while I'm lucky enough to have all components come up correctly when
>> the system boots. Which hot spares fail to be configured is completely
>> random.
>>
>> I have 12 2TB drives, each divided into 4 primary partitions, and
>> configured as 8 partitionable MD arrays. All drives are partitioned
>> exactly the same way. Each R6 array consists of 5 components
>> (partitions) plus a hot-spare. The small RAID-1 host OS array never
>> has a problem with its hot spare.
>>
>> The predominant theory via Google is that there's a race condition at
>> boot time between full enumeration of all disk partitions and mdadm
>> assembling the arrays.
>>
>> Does anyone know of a way to have mdadm delay its assembly until all
>> partitions are enumerated? Even if it's simply to insert a
>> several-second wait time, that would probably work. My knowledge of
>> the internal workings of the boot process isn't good enough to know
>> where to look.
>>
>> I tried to issue 'mdadm -A -s /dev/md/md_dXX' after booting, but all
>> it does is complain about "No suitable drives found for /dev....."
>>
>> Here is the mdadm.conf file:
>> -------------------------------------
>>
>> MAILADDR root
>> PROGRAM /root/bin/record_md_events.sh
>>
>> DEVICE partitions
>> ##DEVICE /dev/sd* <<---- this didn't help.
>> AUTO +imsm +1.x -all
>>
>> ## Host OS root arrays:
>> ARRAY /dev/md0
>>    metadata=1.0 num-devices=2 spares=1
>>    UUID=75941adb:33e8fa6a:095a70fd:6fe72c69
>> ARRAY /dev/md1
>>    metadata=1.1 num-devices=2 spares=1
>>    UUID=7a96d82d:bd6480a2:7433f1c2:947b84e9
>> ARRAY /dev/md2
>>    metadata=1.1 num-devices=2 spares=1
>>    UUID=ffc6070d:e57a675e:a1624e53:b88479d0
>>
>> ## Partitionable arrays on LSI controller:
>> ARRAY /dev/md/md_d10
>>    metadata=1.2 num-devices=5 spares=1
>>    UUID=135f0072:90551266:5d9a126a:011e3471
>> ARRAY /dev/md/md_d11
>>    metadata=1.2 num-devices=5 spares=1
>>    UUID=59e05755:5b3ec51e:e3002cfd:f0720c38
>> ARRAY /dev/md/md_d12
>>    metadata=1.2 num-devices=5 spares=1
>>    UUID=7916eb13:cd5063ba:a1404cd7:3b65a438
>> ARRAY /dev/md/md_d13
>>    metadata=1.2 num-devices=5 spares=1
>>    UUID=9a767e04:e4e56a9d:c369d25c:9d333760
>>
>> ## Partitionable arrays on Tempo controllers:
>> ARRAY /dev/md/md_d20
>>    metadata=1.2 num-devices=5 spares=1
>>    UUID=1d5a3c32:eb9374ac:eff41754:f8a176c1
>> ARRAY /dev/md/md_d21
>>    metadata=1.2 num-devices=5 spares=1
>>    UUID=38ffe8c9:f3922db9:60bb1522:80fea016
>> ARRAY /dev/md/md_d22
>>    metadata=1.2 num-devices=5 spares=1
>>    UUID=ebb4ea67:b31b2105:498d81af:9b4f45d3
>> ARRAY /dev/md/md_d23
>>    metadata=1.2 num-devices=5 spares=1
>>    UUID=da07407f:deeb8906:7a70ae82:6b1d8c4a
>>
>> -------------------------------------
>>
>> Your suggestions are most welcome ... thanks.
>>
>> Chuck
>>
