SCIENTIFIC-LINUX-USERS Archives

August 2017

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Konstantin Olchanski <[log in to unmask]>
Date:
Wed, 23 Aug 2017 17:57:09 -0700
On Fri, Aug 18, 2017 at 10:46:32AM -0700, ToddAndMargo wrote:
> 
> Is there a way to create software raid 1 after the fact?
> Meaning, you already installed SL on a stand alone drive.
> 

Yes, in a nutshell:

- you must have two disks of identical size (/dev/sda, /dev/sdb); HDD, SSD, etc. does not matter
- assuming you have a non-raid system already installed on /dev/sda, as /dev/sda1, sda2, etc.
- partition the 2nd disk the same as the first disk, mark the partition type as Linux raid (mdadm)
- create the raid devices using just /dev/sdb, i.e. mdadm --create /dev/md1 --level=1 --raid-devices=2 missing /dev/sdb1
- "missing" is the placeholder for /dev/sda1, which is in use and cannot be touched yet
- mkfs and mount /dev/md1
- rsync the contents of /dev/sda1 to /dev/md1
- doctor /etc/fstab to mount /dev/md1 instead of /dev/sda1
- reboot
- /dev/md1 should be in use now
- change the partition type of /dev/sda1 to Linux raid (mdadm)
- add /dev/sda1 to /dev/md1: mdadm --add /dev/md1 /dev/sda1, and wait for the raid1 rebuild to finish
- the first partition is now "raided"; repeat for all partitions.
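The steps above can be sketched as shell commands. This is a sketch only: the device names (/dev/sda1, /dev/sdb1, /dev/md1), the mount point, the ext4 filesystem and the rsync source are assumptions, and every command here is destructive, so adapt before running as root.

```shell
# Clone sda's partition table onto sdb, then mark sdb1 as Linux raid
# (type fd; newer sfdisk spells this "sfdisk --part-type /dev/sdb 1 fd").
sfdisk -d /dev/sda | sfdisk /dev/sdb
sfdisk --change-id /dev/sdb 1 fd

# Create a degraded raid1, with "missing" standing in for the in-use /dev/sda1.
mdadm --create /dev/md1 --level=1 --raid-devices=2 missing /dev/sdb1

# Make a filesystem, mount it, copy the live data over.
mkfs.ext4 /dev/md1
mkdir -p /mnt/md1
mount /dev/md1 /mnt/md1
rsync -aHAX /home/ /mnt/md1/      # example: if /dev/sda1 is /home

# Edit /etc/fstab to mount /dev/md1 where /dev/sda1 was, then reboot.
# After the reboot, fold the old partition into the array:
mdadm --add /dev/md1 /dev/sda1
cat /proc/mdstat                  # watch the rebuild; wait until it finishes
```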

In el6 and el7 dracut creates special complications for automatically starting
raid arrays on reboot: all md-raid partitions have to be listed in /etc/mdadm.conf
*inside* the initramfs file. So after the usual "mdadm -Ds >> /etc/mdadm.conf"
you need to rebuild all initramfs files ("dracut -fv").
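In command form (a sketch; "-Ds" is shorthand for "--detail --scan"):

```shell
# Append the running arrays' definitions to mdadm.conf...
mdadm --detail --scan >> /etc/mdadm.conf

# ...then rebuild the initramfs so the copy of mdadm.conf *inside*
# the image is updated too. "--regenerate-all" (el7 dracut) rebuilds
# the image for every installed kernel; on el6, run "dracut -fv" per kernel.
dracut -fv --regenerate-all
```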

To boot from a raid1 array, you need to specify superblock format "1.0"
using "mdadm --create --metadata=1.0". Format 1.0 puts the superblock at the
end of the partition, so the boot loader sees a normal filesystem; the newer
default formats (1.1/1.2) put it at the start and are normally not bootable.
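For example (device names are assumptions):

```shell
# Format 1.0 keeps the md superblock at the *end* of the partition,
# so the boot loader sees what looks like a plain filesystem.
mdadm --create /dev/md0 --metadata=1.0 --level=1 --raid-devices=2 missing /dev/sdb1
```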

To boot from raid1 you also need to fix up the boot configuration: in
/boot/grub/grub.conf on el6, and on el7 (grub2) in /etc/default/grub,
regenerated into /boot/grub2/grub.cfg with grub2-mkconfig.

To boot from raid1, you need to have the boot loader installed on both disks (if the 2nd disk
has no boot loader and the 1st disk fails, you are dead in the water). The common bug is to
configure the 2nd disk's boot loader to boot from the 2nd disk: if the 1st disk fails,
the system will not boot because there is no 2nd disk anymore. The boot loader on both
disks should be configured to boot from the 1st disk. ("1st disk", "2nd disk", etc. refer
to the BIOS boot order, which can be different from the linux sda, sdb order.)
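A sketch of putting the boot loader on both disks (device names are assumptions). On el6 (grub legacy), the trick is to install on the 2nd disk while mapping it as (hd0), so the loader still works when the 1st disk is gone and the 2nd disk becomes the first BIOS disk:

```shell
# el6, grub legacy:
grub --batch <<'EOF'
device (hd0) /dev/sda
root (hd0,0)
setup (hd0)
device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)
EOF

# el7, grub2:
grub2-install /dev/sda
grub2-install /dev/sdb
```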

In my experience, it is straightforward to "raid1" an existing system, but it is a major chore
to get the raid arrays to correctly start on reboot, to get the system
to actually boot, *and* to get it to boot from either disk (a *simulated single-disk failure
must be tested*). At every step, dracut and grub actively get in the way of producing
a working system. Consider switching to syslinux for the boot loader; there is no trivial
replacement for dracut.

The el6.8+ installer seems to configure raid1 correctly (early el6 releases had bugs,
e.g. they did not install the boot loader on the 2nd disk), so you may find it quicker to
reinstall the system from scratch than to follow the "to-raid1" procedure above.

Of course it is much simpler if you have 3 disks: the disk in use now, plus two new disks "to raid1".
In that scenario, you will never have an unbootable system (unless you make a mistake, e.g. overwrite
the boot loader on the active disk).

You may be able to find "easy" scripts that do the conversion for you,
but under the hood all these scripts have to do the same work I just wrote down,
and the same things can go wrong, with no way to recover automatically.

Good luck.

P.S.

Instead of mdadm raid1, today, I would recommend ZFS raid1, *except* I do not
know how to boot from ZFS and I do not know how to setup a redundant boot loader
for ZFS. So the system partition would still have to be mdadm raid1 even if you
use ZFS for all the other partitions.

Why ZFS? Background fsck (zpool scrub), no lengthy raid1 rebuilds, no non-zero md/mismatch_cnt,
and no need for manual recovery after a hardware/software glitch kicks a disk out of the raid array.
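For the non-boot partitions, a ZFS mirror is a one-liner (a sketch; the pool name "tank" and the device names are assumptions):

```shell
zpool create tank mirror /dev/sdc /dev/sdd   # raid1-equivalent mirror
zpool scrub tank                             # background "fsck"
zpool status tank                            # shows scrub/resilver progress
```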

-- 
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
