Date: Fri, 27 Nov 2009 17:44:15 +0000

On Fri, 27 Nov 2009, Michael Bontenackels wrote:
> Hi Jon,
>
> we encountered the same problem on four of our 64-bit machines in Aachen. They
> are set up as homedir servers and quite loaded. On one machine we had to do the
> xfs_repair after access to the filesystem resulted in Input/Output errors.
>
> The XFS is on top of a software RAID-5 consisting of 4 HDDs. The filesystem is
> exported via NFS3 to our desktop cluster. Before the kernel update no problems
> occurred. We decided to step back to the old kernel version with the XFS
> modules not included in the kernel rpms. Until now everything seems to be
> quiet again.
>
> We hope to find some time next week to test a similar machine with NFS4 and
> software RAID-5 with XFS on the newest 64-bit SL5 kernel.

You may want to try the test kernel mentioned near the end of
https://bugzilla.redhat.com/show_bug.cgi?id=512552 since that apparently
'fixes' the raid-5 code to report the inability to do a stripe read-ahead
in a way which the rh xfs module is happy with. At least that kernel will
also have the current security fixes...
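
As a rough way to see which machines have a raid-5 md array under them (a
sketch only, nothing from the bug report): the Python below picks out md
devices whose level is raid5 from /proc/mdstat-style text. The function name
raid5_arrays and the sample text are made up for illustration, and it only
looks for the "raid5" token on each "mdX : ..." line.

```python
def raid5_arrays(mdstat_text):
    """Return the names of md arrays listed as raid5 in mdstat-style text."""
    arrays = []
    for line in mdstat_text.splitlines():
        # Array lines look like: "md0 : active raid5 sdd1[3] sdc1[2] ..."
        name, sep, rest = line.partition(" : ")
        if sep and name.startswith("md") and "raid5" in rest.split():
            arrays.append(name)
    return arrays


# Hypothetical sample in the usual /proc/mdstat layout (not from a real host).
SAMPLE = """Personalities : [raid1] [raid5]
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      2930279808 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
md1 : active raid1 sdf1[1] sde1[0]
      104320 blocks [2/2] [UU]
unused devices: <none>
"""

if __name__ == "__main__":
    # On a real machine, read the live file instead of SAMPLE.
    try:
        with open("/proc/mdstat") as f:
            text = f.read()
    except OSError:
        text = SAMPLE
    print(raid5_arrays(text))
```

This says nothing about whether XFS sits on top of the array; that still needs
a look at the mount table.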

I don't know if this problem is visible because rh are using an older base
of the xfs code or if there was a workaround in the version that SL were
building. Ideally the checking for the bio pages being valid should
probably be done in both places...

Anyway, it seems like a plausible fix and has been in the mainline kernels
since some time in 2006...

So far I've updated one of our machines which had the problems, and seen
no problems yet, but the load may have gone down enough not to trigger it
(I only did the update at 4pm local time).

I'm about to update the other one if the guy running code on it doesn't
object too strongly...

-- Jon