SCIENTIFIC-LINUX-USERS Archives

March 2008

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From: Michael Hannon <[log in to unmask]>
Reply-To: Michael Hannon <[log in to unmask]>
Date: Wed, 26 Mar 2008 18:34:38 -0700
Content-Type: text/plain

Greetings.  We have lately had a lot of trouble with relatively large
(on the order of 1 TB) file systems mounted on RAID 5 or RAID 6
volumes.  The file systems in question are ext3.

In a typical scenario, a drive goes bad in a RAID array.  We then
remove it from the array (if the kernel hasn't already done so),
install a new hard drive by hand (i.e., not from a hot spare), and add
the replacement back to the array.  The RAID operations are all done
using mdadm.
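
For reference, the sequence looks roughly like this (a sketch; the
device names are illustrative):

    mdadm /dev/md0 --fail /dev/sdb1     # mark the bad drive faulty, if the kernel hasn't already
    mdadm /dev/md0 --remove /dev/sdb1   # pull it out of the array
    # (physically swap in the replacement and partition it to match)
    mdadm /dev/md0 --add /dev/sdc1      # add the new drive; the rebuild starts automatically
    cat /proc/mdstat                    # watch rebuild progress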

After the RAID array has completed its rebuild, we run fsck on the RAID
device.  When we do, fsck seems to run forever, i.e., for days at a
time, occasionally spitting out messages about files with recognizable
names, but it never completes satisfactorily.
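
For concreteness, the invocation is along the lines of (again, the
device name is illustrative):

    fsck.ext3 -f -C 0 /dev/md0    # force a full check, with a progress bar

(-y can be added to keep it from stopping at every prompt.)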

The systems in question are typically running SL 4.x.  We've read that
the version of fsck that ships with SL 4 has some known bugs,
especially with respect to large file systems.

Hence, we've attempted to repeat the exercise with fsck.ext3 taken from
the Fedora 8 distribution.  This gives us improved, but still not
satisfactory, results.
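
In case it helps to be concrete: one way to use the newer binary
without installing it is to unpack the Fedora e2fsprogs package by
hand, roughly as follows (the package name and device are
illustrative):

    rpm2cpio e2fsprogs-*.fc8.*.rpm | cpio -idmv   # unpack into the current directory
    ./sbin/e2fsck -f -C 0 /dev/md0                # run the newer e2fsck directly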

We usually end up just punting on the fsck: we make a new file system
and restore from backups.
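
That is, roughly (device and mount point illustrative):

    mkfs.ext3 /dev/md0       # give up on the repair; recreate the file system
    mount /dev/md0 /data
    # ... restore from the most recent backups ...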

Maybe I'm just missing something obvious here.  I'd like to know if
you've had similar experiences and/or if you have a better way to do all
this.

Thanks.

					- Mike
-- 
Michael Hannon            mailto:[log in to unmask]
Dept. of Physics          530.752.4966
University of California  530.752.4717 FAX
Davis, CA 95616-8677
