Subject: RE: fsck.ext3 on large file systems?
From: Bly, MJ (Martin)
Date: Fri, 28 Mar 2008 08:22:43 +0000
Content-Type: text/plain
> -----Original Message-----
> From: Jon Peatfield [mailto:[log in to unmask]]
> Sent: 28 March 2008 00:40
> To: Bly, MJ (Martin)
> Cc: [log in to unmask]
> Subject: RE: fsck.ext3 on large file systems?
>
> ...
> At least one pass of the ext3 fsck involves checking every inode table
> entry, so '-T largefile4' would help you since you will get smaller
> numbers of inodes. As the inode tables are spread through the disk it
> will read the various chunks then seek off somewhere and read more...
We specify the number of inodes too - roughly one per MB. I guess it
helps us in that most of our data is HEP data with large (of the order
of a GB) file sizes.
> [ one of the planned features for ext4 is a way to safely mark that an
> entire inode-table lump is unused, to save things like fsck from
> having to scan all those unused blocks. Of course doing so safely
> isn't quite trivial, and it causes problems with the current model of
> how to choose the locations for an inode for a new file... ]
>
> > I'd be very suspicious of a HW RAID controller that took 'days' to
> > fsck a file system unless the filesystem was already in serious
> > trouble, and from bitter experience, fsck on a filesystem with holes
> > in it caused by a bad RAID controller interconnect (SCSI!) can do
> > more damage than good.
>
> To give you one example: at least one fsck pass needs to check that
> every inode in use has the right (link-count) number of directory
> entries pointing at it. The current ext3 fsck seems to do a good
> impersonation of a linear search through its in-memory
> inode-state-table for each directory entry - at least for files with
> non-trivial link-counts.
>
> A trivial analysis shows that such a set of checks would be O(n^2) in
> the number of files needing to be checked, not counting the
> performance problems when the 'in-memory' tables get too big for
> RAM...
>
> [ In case I'm slandering the ext3 fsck people - I've not actually
> checked that the ext3 fsck code really does anything as simple as a
> linear search, but anything more complex will need to use more memory
> and ... ]
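To make the quadratic behaviour concrete, here is a hypothetical Python sketch (not the real e2fsck code) contrasting a linear scan of the inode table against a dict keyed by inode number when counting directory-entry references:

```python
# Hypothetical sketch of the link-count check described above - not the
# real e2fsck code. Every directory entry must bump a reference count
# for the inode it points at; a linear scan of the inode table costs
# O(n) per entry (O(n^2) overall), while a dict keyed by inode number
# makes each lookup O(1).

def count_links_linear(inode_table, dir_entries):
    """inode_table: list of inode numbers; linear search per entry."""
    counts = [0] * len(inode_table)
    for ino in dir_entries:
        idx = inode_table.index(ino)   # O(n) scan -> O(n^2) in total
        counts[idx] += 1
    return dict(zip(inode_table, counts))

def count_links_hashed(inode_table, dir_entries):
    """Same check with an O(1) hash lookup per directory entry."""
    counts = {ino: 0 for ino in inode_table}
    for ino in dir_entries:
        counts[ino] += 1
    return counts

# Tiny example: inode 7 is hard-linked three times.
table = [3, 7, 11]
entries = [7, 3, 7, 7]
assert count_links_linear(table, entries) == count_links_hashed(table, entries)
```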
>
> Last year we were trying to fsck a ~6.8TB ext3 fs which was about 70%
> filled with hundreds of hard-link trees of home directories. So huge
> numbers of inode entries (many/most files are small), each with a
> link-count of, say, 150. Our poor server had only 8G of RAM and the
> ext3 fsck wanted rather a lot more. Obviously in such a case it will
> be *slow*.
Indeed, I recall some years ago needing to fsck some filesystems that
were quite small by modern standards, and being put out by the time it
took - down entirely to users with massively linked structures. Of
course RAM counts were lower then, too.
> Of course that was after we built a version which didn't simply go
> into an infinite loop somewhere between 3 and 4TB into scanning
> through the inode-table.
>
> Now as you can guess, any dump needs to do much the same kind of work
> wrt scanning the inodes, looking for hard-links, so you may not be
> shocked to discover that attempting a backup was rather slow too...
Been there, know what you mean.
> --
> Jon Peatfield, Computer Officer, DAMTP, University of Cambridge
> Mail: [log in to unmask] Web: http://www.damtp.cam.ac.uk/
Martin.
--
Martin Bly
RAL Tier1 Fabric Team