SCIENTIFIC-LINUX-USERS Archives

November 2009

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From: Troy Dawson <[log in to unmask]>
Reply To: Troy Dawson <[log in to unmask]>
Date: Mon, 30 Nov 2009 15:22:25 -0600
I have compiled the fixed kernel and put it into SL5 x86_64 testing.

I haven't compiled any kernel modules for it, just the kernel.

Troy

Jon Peatfield wrote:
> On Fri, 27 Nov 2009, Michael Bontenackels wrote:
> 
>> Hi Jon,
>>
>> We encountered the same problem on four of our 64-bit machines in Aachen. They
>> are set up as homedir servers and are quite loaded. On one machine we had to run
>> xfs_repair after access to the filesystem resulted in Input/Output errors.
>>
>> The XFS is on top of a software RAID-5 consisting of 4 HDDs. The filesystem is
>> exported via NFS3 to our desktop cluster. Before the kernel update no problems
>> occurred. We decided to step back to the old kernel version with the XFS
>> modules not included in the kernel rpms. So far everything seems to be
>> quiet again.
>>
>> We hope to find some time next week to test a similar machine with NFS4 and
>> software RAID-5 with XFS on the newest 64-bit SL5 kernel.
> 
> You may want to try the test kernel mentioned near the end of 
> https://bugzilla.redhat.com/show_bug.cgi?id=512552 since that apparently 
> 'fixes' the raid-5 code to report the inability to do a stripe read-ahead 
> in a way which the rh xfs module is happy with.  At least that will also 
> have the current security fixes...
> 
> I don't know if this problem is visible because rh are using an older base 
> of the xfs code or if there was a workaround in the version that SL were 
> building.  Ideally the checking for the bio pages being valid should 
> probably be done in both places...
> 
> Anyway it seems like a plausible fix and has been in the mainline kernels 
> since some time in 2006...
> 
> So far I've updated one of our machines which had the problems, and seen 
> no problems yet, but the load may have gone down enough not to trigger it 
> (I only did the update at 4pm local time).
> 
> I'm about to update the other one if the guy running code on it doesn't 
> object too strongly...
> 
>   -- Jon


-- 
__________________________________________________
Troy Dawson  [log in to unmask]  (630)840-6468
Fermilab  ComputingDivision/LSCS/CSI/USS Group
__________________________________________________
