SCIENTIFIC-LINUX-USERS Archives

November 2009

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Jon Peatfield <[log in to unmask]>
Reply To:
Jon Peatfield <[log in to unmask]>
Date:
Fri, 27 Nov 2009 17:44:15 +0000
On Fri, 27 Nov 2009, Michael Bontenackels wrote:

> Hi Jon,
>
> we encountered the same problem on four of our 64-bit machines in Aachen. They
> are set up as homedir servers and quite loaded. On one machine we had to do the
> xfs_repair after access to the filesystem resulted in Input/Output errors.
>
> The XFS is on top of a software RAID-5 consisting of 4 HDDs. The filesystem is
> exported via NFS3 to our desktop cluster. Before the kernel update no problems
> occurred. We decided to step back to the old kernel version with the XFS
> modules not included in the kernel rpms. Until now everything seems to be
> quiet again.
>
> We hope to find some time next week to test a similar machine with NFS4 and
> software RAID-5 with XFS on the newest 64-bit SL5 kernel.

You may want to try the test kernel mentioned near the end of 
https://bugzilla.redhat.com/show_bug.cgi?id=512552 since that apparently 
'fixes' the raid-5 code to report the inability to do a stripe read-ahead 
in a way which the rh xfs module is happy with.  At least that will 
have the current security fixes as well...

I don't know if this problem is visible because rh are using an older base 
of the xfs code or if there was a workaround in the version that SL were 
building.  Ideally the checking for the bio pages being valid should 
probably be done in both places...

Anyway it seems like a plausible fix and has been in the mainline kernels 
since some time in 2006...

So far I've updated one of our machines which had the problems, and seen 
no problems yet, but the load may have gone down enough not to trigger it 
(I only did the update at 4pm local time).

I'm about to update the other one if the guy running code on it doesn't 
object too strongly...

  -- Jon
