On Wed, Feb 22, 2012 at 4:22 PM, Bill Maidment <[log in to unmask]> wrote:
> -----Original message-----
> From: Tom H <[log in to unmask]>
> Sent: Thu 23-02-2012 01:12
> Subject: Re: Degraded array issues with SL 6.1 and SL 6.2
> To: SL Users <[log in to unmask]>;
>> On Wed, Feb 22, 2012 at 7:58 AM, Bill Maidment <[log in to unmask]> wrote:
>> > -----Original message-----
>> > From: Bill Maidment <[log in to unmask]>
>> > Sent: Mon 20-02-2012 17:43
>> > Subject: Degraded array issues with SL 6.1 and SL 6.2
>> > To: [log in to unmask] <[log in to unmask]>;
>> >> I have had some issues with the last two kernel releases. When a degraded array
>> >> event occurs, I am unable to add a new disk back into the array. This has been
>> >> reported on CentOS 6.1/6.2 and also RHEL 6.2 (see Bug 772926 - dracut unable to
>> >> boot from a degraded raid1 array). I have found that I need to revert to kernel
>> >> 2.6.32-131.21.1.el6.x86_64 in order to be able to add the new drive.
>> >
>> > The response from RH is as follows:
>> > 1) If you try to re-add a disk to a running raid1 after having failed it,
>> > mdadm correctly rejects it as it has no way of knowing which of the disks
>> > are authoritative. It clearly tells you that in the error message you
>> > pasted into the bug.
>> >
>> > 2) You reported a Scientific Linux bug against Red Hat Enterprise Linux.
>> > Red Hat does not support Scientific Linux, please report bugs against
>> > Scientific Linux to the people behind Scientific Linux.
>> >
>> > My response is:
>> > 1) a) It used to work it out. b) No, it does not clearly spell it out.
>> > c) Why was it not a problem in earlier kernels?
>> > 2) Is this an SL bug? I think not!
>>
>> Bug 772926 doesn't have anything about SL. Are you referring to another bug?
>>
>> In (1) above, are they saying that you can't "--fail", "--remove",
>> and then "--add" the same disk, or that you can't "--fail" and
>> "--remove" a disk, replace it, and then "--add" the replacement
>> because it has the same "sdX"/"sdXY" device name as the previous,
>> failed disk?
>>
>
> Bug 772926 was reported by someone from CentOS, but it would affect SL too, and it seemed to be related:
> http://bugs.centos.org/view.php?id=5400
>
> I think they are saying that you NOW can't re-add the same disk without first zeroing out its superblock.
> I just find the wording of the error message a bit confusing:
>
> [root@ferguson ~]# mdadm /dev/md3 -a /dev/sdc1
> mdadm: /dev/sdc1 reports being an active member for /dev/md3, but a --re-add fails.
> mdadm: not performing --add as that would convert /dev/sdc1 in to a spare.
> mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdc1" first.
> [root@ferguson ~]#
772926 doesn't have "You reported a Scientific Linux bug against Red
Hat Enterprise Linux".
The wording about a spare in the third line seems wrong. Anyway, I'd
never re-add a failed and removed disk without zeroing the superblock;
if you could do it previously, it was an oversight/bug that's now been
fixed.
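
For anyone else who hits this, the full workflow that the error message
points at would be something like the following. This is only a sketch
using the same /dev/md3 and /dev/sdc1 device names as in Bill's transcript
above; adjust for your own array and partition, run as root, and note that
--zero-superblock destroys the md metadata on that partition:

    mdadm /dev/md3 --fail /dev/sdc1        # mark the member failed (if not already)
    mdadm /dev/md3 --remove /dev/sdc1      # detach it from the array
    mdadm --zero-superblock /dev/sdc1      # wipe the stale md superblock
    mdadm /dev/md3 --add /dev/sdc1         # now accepted; rebuild starts
    cat /proc/mdstat                       # watch the resync progress

The zeroing step is exactly what the newer mdadm insists on: without it,
the disk still claims to be an active member of /dev/md3, so mdadm refuses
to silently demote it to a spare.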