SCIENTIFIC-LINUX-USERS Archives

March 2008

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From: Jon Peatfield <[log in to unmask]>
Reply-To: Jon Peatfield <[log in to unmask]>
Date: Fri, 28 Mar 2008 00:39:42 +0000
Content-Type: TEXT/PLAIN
Parts/Attachments: TEXT/PLAIN (57 lines)
On Thu, 27 Mar 2008, Bly, MJ (Martin) wrote:

<snip>
> On our hardware RAID arrays (3ware, Areca, Infortrend) with many (12/14)
> SATA disks, 500/750GB each, we fsck 2TB+ ext3 filesystems (as
> infrequently as possible!) and it takes ~2 hours each.  We have some
> 5.5TB arrays that take less than three hours.  Note that these are
> created with '-T largefile4 -O dir_index' among other options.

At least one pass of the ext3 fsck involves checking every inode table 
entry, so '-T largefile4' will help you since it leaves far fewer inodes 
to check.  As the inode tables are spread through the disk it will read 
the various chunks then seek off somewhere and read more...
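
As a rough illustration of how much that ratio matters, here's a quick
Python back-of-envelope.  The bytes-per-inode figures are just the usual
mke2fs.conf defaults and the 256-byte inode size is an assumption too, so
check your own /etc/mke2fs.conf before trusting the exact numbers:

  # Rough estimate of how much inode table fsck has to walk for a 2TiB fs,
  # for different mke2fs '-T' profiles.  Ratios/inode size are assumptions.
  FS_BYTES = 2 * 1024**4

  RATIOS = {
      "default":    16 * 1024,     # one inode per 16 KiB
      "largefile":  1024**2,       # one inode per 1 MiB
      "largefile4": 4 * 1024**2,   # one inode per 4 MiB
  }

  for profile, bytes_per_inode in RATIOS.items():
      inodes = FS_BYTES // bytes_per_inode
      table_mib = inodes * 256 / 1024**2       # assuming 256-byte inodes
      print(f"{profile:11s} ~{inodes:>11,} inodes, "
            f"~{table_mib:,.0f} MiB of inode tables to scan")

That's the difference between roughly 32GiB of inode tables to read (and
seek between) and roughly 128MiB of them.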

[ one of the planned features for ext4 is a way to safely mark that an 
entire inode-table lump is unused, to save things like fsck from having to 
scan all those unused blocks.  Of course doing so safely isn't quite 
trivial, and it causes problems with the current model of how to choose 
the location of a new file's inode... ]
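
For what it's worth, the shape of that idea (as I understand the plans - 
treat this as my reading of it, not the real ext4 code) is a per-block-group
high-water mark: record how many inodes at the tail of each group's inode
table have never been touched, let fsck skip that tail, and make the
allocator lower the mark before handing out anything past it.  A toy model
in Python:

  # Toy model of the 'skip the unused tail of an inode table' idea above.
  # Only a sketch of the bookkeeping, not ext4's actual on-disk layout.
  class BlockGroup:
      def __init__(self, inodes_per_group):
          self.inodes_per_group = inodes_per_group
          self.itable_unused = inodes_per_group    # nothing initialised yet

      def allocate_inode(self, index):
          # Allocating past the high-water mark has to move the mark first;
          # doing that safely (and verifiably) is the non-trivial bit.
          used_limit = self.inodes_per_group - self.itable_unused
          if index >= used_limit:
              self.itable_unused = self.inodes_per_group - index - 1

      def inodes_fsck_must_scan(self):
          return self.inodes_per_group - self.itable_unused

The clash with inode placement is presumably that 'put the new file's
inode near its directory' now sometimes wants an inode from a group whose
table is still marked untouched, so the mark has to be updated safely on
the allocation path.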

> I'd be very suspicious of a HW RAID controller that took 'days' to fsck
> a file system unless the filesystem was already in serious trouble, and
> from bitter experience, fsck on a filesystem with holes in it caused by
> a bad raid controller interconnect (SCSI!) can do more damage than good.

To give you one example, at least one fsck pass needs to check that every 
inode in use has the right (link-count) number of directory entries 
pointing at it.  The current ext3 fsck seems to do a good impersonation of 
a linear search through its in-memory inode-state table for each directory 
entry - at least for files with non-trivial link counts.

A trivial analysis shows that such a set of checks would be O(n^2) in the 
number of files needing to be checked, not counting the performance 
problems when the 'in-memory' tables get too big for RAM...

[ In case I'm slandering the ext3 fsck people - I've not actually checked 
that the ext3 fsck code really does anything as simple as a linear search, 
but anything more complex will need to use more memory and ... ]
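
To make the difference concrete, here's a toy sketch (in Python) of the two
ways a checker could reconcile directory entries against link counts.  It
is not what e2fsck actually does - as above, I haven't read the code - just
an illustration of why a linear scan per directory entry goes quadratic
while a keyed lookup stays roughly linear at the cost of holding a big
table in memory:

  # 'dir_entries' is every (name, inode) link found while walking the
  # directories; 'inode_table' is an in-memory list of in-use inode numbers.

  def count_links_linear(dir_entries, inode_table):
      # Linear scan of the inode-state list for every directory entry: O(n*m)
      counts = [0] * len(inode_table)
      for _name, ino in dir_entries:
          for idx, entry_ino in enumerate(inode_table):   # the linear search
              if entry_ino == ino:
                  counts[idx] += 1
                  break
      return counts

  def count_links_keyed(dir_entries):
      # One dict lookup per directory entry: roughly O(n), but the dict has
      # to fit in RAM - one entry per inode referenced from any directory.
      counts = {}
      for _name, ino in dir_entries:
          counts[ino] = counts.get(ino, 0) + 1
      return counts

The keyed version avoids the quadratic blow-up, but that dict is exactly
the sort of in-memory table which stops fitting once you have hundreds of
millions of inodes, which is the other half of the problem below.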

Last year we were trying to fsck a ~6.8TB ext3 fs which was about 70% 
filled with hundreds of hard-link trees of home directories.  So huge 
numbers of inode entries (many/most files are small), each with a 
link-count of, say, 150.  Our poor server had only 8G of RAM and the ext3 
fsck wanted rather a lot more.  Obviously in such a case it will be *slow*.

Of course that was after we built a version which didn't simply go into an 
infinite loop somewhere between 3 and 4TB into scanning through the 
inode-table.

Now, as you can guess, any dump needs to do much the same kind of work wrt 
scanning the inodes and looking for hard-links, so you may not be shocked 
to discover that attempting a backup was rather slow too...
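
For anyone wondering what that bookkeeping looks like from user space
(dump itself reads the inode tables directly, so treat this purely as an
illustration of the hard-link tracking, not of how dump does it), a backup
that wants to preserve hard links has to remember every multiply-linked
inode it has already archived, something like:

  import os, sys

  def walk_with_hardlinks(root):
      # Report which paths are extra links to an inode we've already seen.
      # The 'seen' table grows with every multiply-linked inode, which is
      # exactly what hurts on a server full of hard-link snapshot trees.
      seen = {}                               # (st_dev, st_ino) -> first path
      for dirpath, _dirnames, filenames in os.walk(root):
          for name in filenames:
              path = os.path.join(dirpath, name)
              st = os.lstat(path)
              if st.st_nlink > 1:
                  key = (st.st_dev, st.st_ino)
                  if key in seen:
                      print(f"hard link: {path} -> {seen[key]}")
                      continue                # store as a link, not file data
                  seen[key] = path
              # ...archive the file contents here...

  if __name__ == "__main__":
      walk_with_hardlinks(sys.argv[1] if len(sys.argv) > 1 else ".")

With hundreds of snapshot trees all linking to the same files, that table
ends up holding an entry for more or less every inode on the filesystem.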

-- 
Jon Peatfield,  Computer Officer,  DAMTP,  University of Cambridge
Mail:  [log in to unmask]     Web:  http://www.damtp.cam.ac.uk/
