Subject: RE: fsck.ext3 on large file systems?
From: Bly, MJ (Martin)
Date: Fri, 28 Mar 2008 08:22:43 +0000
Content-Type: text/plain
> -----Original Message-----
> From: Jon Peatfield [mailto:[log in to unmask]]
> Sent: 28 March 2008 00:40
> To: Bly, MJ (Martin)
> Cc: [log in to unmask]
> Subject: RE: fsck.ext3 on large file systems?
>
> ...
> At least one pass of the ext3 fsck involves checking every inode table
> entry, so '-T largefile4' would help you since you will get smaller
> numbers of inodes. As the inode tables are spread through the disk it
> will read the various chunks then seek off somewhere and read more...
We specify the number of inodes too - roughly one per MB. I guess it
helps us in that most of our data is HEP data with large (of the order
of a GB) file sizes.
> [ one of the planned features for ext4 is a way to safely mark that an
> entire inode-table lump is unused, to save things like fsck from
> having to scan all those unused blocks. Of course doing so safely
> isn't quite trivial, and it causes problems with the current model of
> how to choose the locations for an inode for a new file... ]
>
> > I'd be very suspicious of a HW RAID controller that took 'days' to
> > fsck a file system unless the filesystem was already in serious
> > trouble, and from bitter experience, fsck on a filesystem with holes
> > in it caused by a bad RAID controller interconnect (SCSI!) can do
> > more damage than good.
>
> To give you one example: at least one fsck pass needs to check that
> every inode in use has the right (link-count) number of directory
> entries pointing at it. The current ext3 fsck seems to do a good
> impersonation of a linear search through its in-memory
> inode-state-table for each directory entry - at least for files with
> non-trivial link-counts.
>
> A trivial analysis shows that such a set of checks would be O(n^2) in
> the number of files needing to be checked, not counting the
> performance problems when the 'in-memory' tables get too big for
> RAM...
>
> [ In case I'm slandering the ext3 fsck people - I've not actually
> checked that the ext3 fsck code really does anything as simple as a
> linear search, but anything more complex will need to use more memory
> and ... ]
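To make the quadratic behaviour concrete, here is a hypothetical Python sketch (not the real e2fsck code) contrasting a linear scan of the inode table against a dict keyed by inode number when counting directory-entry references:

```python
# Hypothetical sketch of the link-count check described above - not the
# real e2fsck code. Every directory entry must bump a reference count
# for the inode it points at; a linear scan of the inode table costs
# O(n) per entry (O(n^2) overall), while a dict keyed by inode number
# makes each lookup O(1).

def count_links_linear(inode_table, dir_entries):
    """inode_table: list of inode numbers; linear search per entry."""
    counts = [0] * len(inode_table)
    for ino in dir_entries:
        idx = inode_table.index(ino)   # O(n) scan -> O(n^2) in total
        counts[idx] += 1
    return dict(zip(inode_table, counts))

def count_links_hashed(inode_table, dir_entries):
    """Same check with an O(1) hash lookup per directory entry."""
    counts = {ino: 0 for ino in inode_table}
    for ino in dir_entries:
        counts[ino] += 1
    return counts

# Tiny example: inode 7 is hard-linked three times.
table = [3, 7, 11]
entries = [7, 3, 7, 7]
assert count_links_linear(table, entries) == count_links_hashed(table, entries)
```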
>
> Last year we were trying to fsck a ~6.8TB ext3 fs which was about 70%
> filled with hundreds of hard-link trees of home directories. So huge
> numbers of inode entries (many/most files are small), each with a
> link-count of, say, 150. Our poor server had only 8G of RAM and the
> ext3 fsck wanted rather a lot more. Obviously in such a case it will
> be *slow*.
Indeed, I recall some years ago needing to fsck some filesystems that
were quite small by modern standards, and being put out by the time it
took - down entirely to users with massively linked structures. Of
course RAM counts were lower then, too.
> Of course that was after we built a version which didn't simply go
> into an infinite loop somewhere between 3 and 4TB into scanning
> through the inode-table.
>
> Now as you can guess, any dump needs to do much the same kind of work
> wrt scanning the inodes, looking for hard-links, so you may not be
> shocked to discover that attempting a backup was rather slow too...
Been there, know what you mean.
> --
> Jon Peatfield, Computer Officer, DAMTP, University of Cambridge
> Mail: [log in to unmask] Web: http://www.damtp.cam.ac.uk/
Martin.
--
Martin Bly
RAL Tier1 Fabric Team