Hi All,

We have started seeing a rather worrisome problem with our large XFS fileservers. Basically, it appears that something overwrites the LVM metadata on a physical volume. In every case so far, after a reboot we have been able to repair the physical volume without losing data or corrupting the filesystem. We don't know exactly what triggers this, but it seems related to filling up a filesystem (or physical volume?).

Our RAID servers are running SL305 with the 2.4.21-37.EL.XFSsmp kernel. We use 3ware 9500S-12 RAID controllers with 750GB disks in a RAID5 configuration. The 3ware controller splits this 6.82TB RAID volume into 2TB volumes (sd[c-f]), which we combine into one volume group. We currently have nine identical fileservers (some in operation for over two years), and in the last month we've seen this problem on two of them. We now have a test setup on which we can (more or less) reliably trigger the problem.

Since we don't know exactly where the problem originates, I will describe our test setup and how everything is created.

Logical volumes are created as follows (for example):

[root@lnx113 root]# lvcreate --size 2000G --name test1 vg1
[root@lnx113 root]# lvcreate --size 128M --name test1log vg

And we create XFS filesystems as follows:

[root@lnx113 root]# mkfs.xfs -s size=4096 -l logdev=/dev/vg/test1log /dev/vg1/test1
meta-data=/dev/vg1/test1         isize=256    agcount=32, agsize=16384000 blks
         =                       sectsz=4096
data     =                       bsize=4096   blocks=524288000, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =/dev/vg/test1log       bsize=4096   blocks=32768, version=2
         =                       sectsz=4096  sunit=1 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0

The first logical volume (test) is 1.9TB and resides entirely on sdc. The second logical volume (test1) fills up the rest of sdc and uses part of sdd. Everything then looks like this:

[root@lnx113 root]# pvscan
pvscan -- reading all physical volumes (this may take a while...)
pvscan -- WARNING: physical volume "/dev/sda4" belongs to a meta device
pvscan -- WARNING: physical volume "/dev/sdb4" belongs to a meta device
pvscan -- ACTIVE PV "/dev/sdc" of VG "vg1" [2 TB / 0 free]
pvscan -- ACTIVE PV "/dev/sdd" of VG "vg1" [2 TB / 195.94 GB free]
pvscan -- ACTIVE PV "/dev/sde" of VG "vg1" [2 TB / 2 TB free]
pvscan -- ACTIVE PV "/dev/sdf" of VG "vg1" [840.78 GB / 840.78 GB free]
pvscan -- ACTIVE PV "/dev/md2" of VG "vg" [68.72 GB / 49.17 GB free]
pvscan -- total: 7 [6.89 TB] / in use: 7 [6.89 TB] / in no VG: 0 [0]

[root@lnx113 root]# lvscan
...
lvscan -- ACTIVE "/dev/vg/testlog" [128 MB]
lvscan -- ACTIVE "/dev/vg/test1log" [128 MB]
lvscan -- ACTIVE "/dev/vg1/test" [1.86 TB]
lvscan -- ACTIVE "/dev/vg1/test1" [1.95 TB]
lvscan -- 14 logical volumes with 1.83 TB total in 2 volume groups
lvscan -- 14 active logical volumes

We then fill up the filesystems using dd (for example, `dd if=/dev/zero of=/mnt/test1/zero bs=1M count=200000`). We can fill up /dev/vg1/test without a problem. However, while filling up test1, at some point syslog reports the following error:

Feb 3 15:36:28 lnx113 kernel: attempt to access beyond end of device
Feb 3 15:36:28 lnx113 kernel: 08:20: rw=1, want=0, limit=2147483647

Note that this happens before test1 fills up. For example:

[root@lnx113 root]# df -h |grep test
/dev/vg1/test         1.9T  1.9T   36K 100% /mnt/test
/dev/vg1/test1        2.0T  1.8T  193G  91% /mnt/test1
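For what it's worth, 08:20 is major 8, minor 32, which is /dev/sdc. If we read want and limit as 1KB blocks (which, as far as we can tell, is how the 2.4 block layer reports them), the limit sits right at the 2TB mark. A quick sanity check (the unit interpretation is our assumption):

[root@lnx113 root]# echo $(( 2147483647 / 1024 / 1024 ))  # limit in GB, assuming 1KB units
2047

Under that reading, want=0 would mean the block counter wrapped around to zero, so the write landed at the very start of sdc, which is exactly where the LVM metadata lives. That's only a guess on our part, but it would fit the symptoms below.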
A pvscan and vgscan then show the problem (what happened to sdc?):

[root@lnx113 root]# pvscan
pvscan -- reading all physical volumes (this may take a while...)
pvscan -- WARNING: physical volume "/dev/sda4" belongs to a meta device
pvscan -- WARNING: physical volume "/dev/sdb4" belongs to a meta device
pvscan -- ACTIVE PV "/dev/sdd" is associated to unknown VG "vg1" (run vgscan)
pvscan -- ACTIVE PV "/dev/sde" is associated to unknown VG "vg1" (run vgscan)
pvscan -- ACTIVE PV "/dev/sdf" is associated to unknown VG "vg1" (run vgscan)
pvscan -- ACTIVE PV "/dev/md2" of VG "vg" [68.72 GB / 49.17 GB free]
pvscan -- total: 6 [4.89 TB] / in use: 6 [4.89 TB] / in no VG: 0 [0]

[root@lnx113 root]# vgscan
vgscan -- reading all physical volumes (this may take a while...)
vgscan -- found active volume group "vg"
vgscan -- found active volume group "vg1"
vgscan -- ERROR "vg_read_with_pv_and_lv(): current PV" can't get data of volume group "vg1" from physical volume(s)
vgscan -- "/etc/lvmtab" and "/etc/lvmtab.d" successfully created
vgscan -- WARNING: This program does not do a VGDA backup of your volume groups

After a reboot, we are then able to recreate the physical volume on sdc and restore the LVM metadata:

pvcreate /dev/sdc
vgcfgrestore -n vg1 /dev/sdc
vgchange -a y vg1

After doing this, everything looks fine and fsck reports that the filesystems are clean.

Please let me know if there's any other information we can provide. Any suggestions or help would be greatly appreciated.

Many thanks,
Devin

------
Devin Bougie
Laboratory for Elementary-Particle Physics
Cornell University
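P.S. In case it's relevant, vg1 itself was put together from the four 3ware volumes in the usual LVM1 way, roughly as follows (the 32M extent size here is from memory and may not be exactly what we used):

pvcreate /dev/sdc
pvcreate /dev/sdd
pvcreate /dev/sde
pvcreate /dev/sdf
vgcreate -s 32M vg1 /dev/sd[c-f]

A large physical extent size is needed because LVM1 caps the number of extents per physical volume, and a 2TB PV won't fit under that cap with the default extent size.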