On Thu, 27 Mar 2008, Stephen John Smoogen wrote:
> On Wed, Mar 26, 2008 at 7:34 PM, Michael Hannon <[log in to unmask]> wrote:
>> Greetings. We have lately had a lot of trouble with relatively large
>> (order of 1TB) file systems mounted on RAID 5 or RAID 6 volumes. The
>> file systems in question are based on ext3.
>>
>> In a typical scenario, we have a drive go bad in a RAID array. We then
>> remove it from the array, if it isn't already, add a new hard drive
>> (i.e., by hand, not from a hot spare), and add it back to the RAID
>> array. The RAID operations are all done using mdadm.
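[For anyone following along, the sequence described above is roughly the one below. Device names (/dev/md0, /dev/sdc1, /dev/sdd1) are made up for illustration; substitute your real array and member partitions. The commands are echoed rather than executed, since they require actual hardware:]

```shell
MD=/dev/md0        # the RAID array (hypothetical name)
FAILED=/dev/sdc1   # the bad member (hypothetical name)
NEW=/dev/sdd1      # partition on the replacement drive (hypothetical name)

# Echo each command instead of running it -- this is a dry-run sketch.
run() { echo "would run: $*"; }

run mdadm --manage "$MD" --fail "$FAILED"    # mark it failed, if md hasn't already
run mdadm --manage "$MD" --remove "$FAILED"  # drop it from the array
run mdadm --manage "$MD" --add "$NEW"        # add the new drive; resync starts
```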
>>
>> After the RAID array has completed its rebuild, we run fsck on the RAID
>> device. When we do that, fsck seems to run forever, i.e., for days at a
>> time, occasionally spitting out messages about files with recognizable
>> names, but never completing satisfactorily.
>>
>
> fsck of 1 TB is going to take days due to the linear nature of it.
Hmm, we successfully fsck'd ext3 filesystems 1.4 TB in size frequently a
couple of years ago, under 2.4 (back then, it was SuSE 8.2 + a vanilla
kernel). This took no more than a few hours (maybe 2, 3, or 4). It was
hardware RAID, not too reliable (hence "frequently"), and not too fast (<
100 MB/s). A contemporary Linux server with software RAID should complete
an fsck *much* faster, or something is wrong.
And I still wonder why fsck at all just because a broken disk was
replaced in a redundant array?
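After the resync completes, the array's own state tells you whether redundancy is restored; the filesystem on top never (logically) saw the member swap. Something like the following is enough to check (array name hypothetical; again echoed as a dry run, since the commands need a real md device):

```shell
MD=/dev/md0   # hypothetical array name

# Echo each command instead of running it -- this is a dry-run sketch.
run() { echo "would run: $*"; }

run cat /proc/mdstat       # resync/recovery progress for all md arrays
run mdadm --detail "$MD"   # per-array state: clean, degraded, recovering
```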
--
Stephan Wiesand
DESY - DV -
Platanenallee 6
15738 Zeuthen, Germany