On Thu, 5 Mar 2009, Miles O'Neal wrote:
> recap: new 64 bit Intel quadcore server with Adaptec SATA RAID
> controller, 16x1TB drives. 1 drive JBOD for OS. The rest are
> set up as RAID6 with 1 spare. We've tried EL5.1 + all yum updates,
> and EL5.2 stock. We can't get /dev/sdb1 (12TB) stable with ext2
> or xfs (ext3 blows up in the journal setup).
>
> So I decided to carve /dev/sdb up into a dozen partitions and
> use LVM. Initially I want to use one partition per LV and make
> each of those one xfs FS. Then as things grow I can add a PV
> (one partition per PV) into the appropriate VG and grow the LV/FS.
> Between typos and missteps, I've had to build up and tear down the
> LV pieces several times. And now I get messages such as
>
> Aborting - please provide new pathname for what used to be /dev/disk/by-path/pci-0000:01:00.0-scsi-0:0:1:0-part6
> or
> Device /dev/sdb6 not found (or ignored by filtering).
>
> I clean it all up, wipe out all the files in /etc/lvm/*/*
> (including cache/.cache), and try again, still broken.
>
> I tried rebooting. Still broken.
>
> How can I fix this short of a full reinstall?
>
> The whole LVM system feels really kludgy. I suppose there's
> not a better alternative at this time?
Just a little history for you to mull over... A while ago, block
devices over 2TB (1TB on some systems!) were not supported properly:
with the default block size (512 bytes), block numbers went over 2^32
(2^31 was the limit on some systems, including earlier Linux kernels,
because the type had been left signed by accident).
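For reference, the arithmetic behind those limits (just a sanity
check, not from any particular system):

$ echo $(( 2**32 * 512 ))   # 32-bit sector numbers * 512-byte sectors
2199023255552

i.e. 2 TiB, and with a signed 32-bit type you only get half of that,
1 TiB.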
While 64 bit block offsets have since been added, each device driver
needs to implement them properly, so even if people say 'it works for
me with device xxx', you can't be sure it will work for you unless you
also have device xxx and exactly the same (or newer, I guess) drivers.
To work around this, most RAID devices have long supported a 'hack'
where a larger volume is exported to the host as a number of slices -
typically each under 2TB - as new LUNs on the same bus,target (or
whatever). The host sees these as individual 'disks', but you can
fairly easily join things back together by putting the PVs into the
same VG and then slicing things up into the LVs of your choice, as
sketched below.
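The joining-up step is roughly this (a minimal sketch - the slice
names /dev/sdb1 and /dev/sdc1 and the names 'bigvg'/'data' are made
up for illustration):

$ pvcreate /dev/sdb1 /dev/sdc1        # label each <2TB slice as an LVM PV
$ vgcreate bigvg /dev/sdb1 /dev/sdc1  # pool the PVs into one VG
$ lvcreate -L 500G -n data bigvg      # carve out LVs of whatever size
$ mkfs.xfs /dev/bigvg/data            # and make a filesystem on each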
Note that splitting your large device into smaller partitions isn't the
same since the kernel will still need to use large block offsets to access
the parts more than 2TB into the device.
Some people think I'm really silly/old-fashioned for still doing
things this way (using large devices works for them, and it might work
for me with some hardware), but there is little extra overhead and my
sanity is better preserved by not tempting fate.
e.g. one box I'm using atm has a ~4TB RAID (a Dell PERC/6i in fact),
but it is configured to present that as three devices, which we then
partition and set up as PVs as if they were different disks. e.g.
$ grep sd /proc/partitions
   8     0 1843200000 sda
   8     1     104391 sda1
   8     2 1843089255 sda2
   8    16 1843200000 sdb
   8    17 1843193646 sdb1
   8    32  705822720 sdc
   8    33  705815743 sdc1
The actual numbers for the 'disk' slice sizes don't really matter but
happened to be based on what Dell suggested for a different box we have
with a PERC/6i controller (it wouldn't fit as 2 '<2TB devices' so we need
at least 3...)
$ pvscan
  PV /dev/sda2   VG TempLobeSys00   lvm2 [1.72 TB / 0 free]
  PV /dev/sdb1   VG TempLobeSys00   lvm2 [1.72 TB / 78.00 GB free]
  PV /dev/sdc1   VG TempLobeSys00   lvm2 [673.09 GB / 0 free]
  Total: 3 [4.09 TB] / in use: 3 [4.09 TB] / in no VG: 0 [0 ]
$ vgdisplay
  --- Volume group ---
  VG Name               TempLobeSys00
  System ID
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  11
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                7
  Open LV               7
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size               4.09 TB
  PE Size               32.00 MB
  Total PE              134034
  Alloc PE / Size       131538 / 4.01 TB
  Free PE / Size        2496 / 78.00 GB
  VG UUID               h393E0-mHmv-s4gf-Py2x-mJuU-OWUc-VoWtuP
$ df -hlP
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/TempLobeSys00-root     2.0G  644M  1.3G  35% /
/dev/mapper/TempLobeSys00-scratch   50G  9.2G   38G  20% /local
/dev/mapper/TempLobeSys00-var      3.9G  1.5G  2.2G  41% /var
/dev/mapper/TempLobeSys00-tmp      9.7G  152M  9.1G   2% /tmp
/dev/mapper/TempLobeSys00-usr       12G  5.3G  5.6G  49% /usr
/dev/sda1                           99M   25M   70M  26% /boot
tmpfs                               12G     0   12G   0% /dev/shm
/dev/mapper/TempLobeSys00-tardis   4.0T  9.3G  3.9T   1% /local/tardis
This happens to be using XFS for the larger fs, but we had it working
with ext3 too - though I realise this isn't as big as your example.
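If you then grow things the way you described (add a PV, grow the
LV/FS), the sequence is something like this sketch - the new slice
/dev/sdd1, the names 'bigvg'/'data' and the mount point are all
illustrative:

$ pvcreate /dev/sdd1                 # label the newly-exported slice
$ vgextend bigvg /dev/sdd1           # add it to the existing VG
$ lvextend -L +500G /dev/bigvg/data  # grow the LV into the new space
$ xfs_growfs /mnt/data               # XFS grows online, given its mount point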
Re-configuring your RAID controllers to export as <2TB slices isn't fun,
but it should be possible without a re-install (if a bit fiddly).
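And to clear out the half-built LVM state before starting again,
something like the following normally does it (a sketch with made-up
VG/LV names - double-check the devices, since this destroys the LVM
metadata on them):

$ lvremove /dev/oldvg/oldlv   # remove any LVs that still exist
$ vgremove oldvg              # then the (now empty) VG
$ pvremove /dev/sdb6          # then the PV label on each partition
$ vgscan                      # rescan so LVM forgets the old layout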
--
/--------------------------------------------------------------------\
| "Computers are different from telephones. Computers do not ring." |
| -- A. Tanenbaum, "Computer Networks", p. 32 |
|--------------------------------------------------------------------|
| Jon Peatfield, _Computer_ Officer, DAMTP, University of Cambridge |
| Mail: [log in to unmask] Web: http://www.damtp.cam.ac.uk/ |
\--------------------------------------------------------------------/