SCIENTIFIC-LINUX-USERS Archives

December 2005

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
"Bly, MJ (Martin)" <[log in to unmask]>
Reply To:
Bly, MJ (Martin)
Date:
Thu, 15 Dec 2005 00:00:35 +0000
Content-Type:
text/plain
Parts/Attachments:
text/plain (52 lines)
John et al, 

The symptoms you describe sound a bit like the NFS client hang problems with some SL (RHEL) kernels in the range before 2.4.21-32.x - possibly as early as 2.4.21-18 or so.  The client processes doing the NFS access hang solid due to a 'lost' interaction between server and client - the only solution is a client reboot.  All variants of SL/RHEL show this for the range of kernels above, and we also saw it for RH 7.3 clients for kernels in a certain range.  

In fact, the more I read your post, the more it sounds like this problem.  We banged our heads against this for months on and off as loads changed and the problem came and went.  It is definitely load related.

There was a contrib kernel we put out that backed out the patch causing the problem.  I think it had a version number of 2.4.21-27.0.something.ELSDR.  

We believe from anecdotal evidence that RH tried the patch that causes the problem in at least two kernel ranges.  We think they may have given up on the RH 7.3 tests but tried again with RHEL 3 - it's always possible the same 'patch' was used on the RH 8 kernel series. 

Anyway, they fixed it by taking out the patch at 2.4.21-32.EL.
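
For anyone wanting to sweep a farm for clients that might be affected, here is a minimal sketch of a check against the range described above. The bounds are assumptions taken from this message (trouble possibly starting around 2.4.21-18, fixed at 2.4.21-32.EL), and the names parse_release and in_suspect_range are made up for illustration. It only looks at the numeric part of the release string, so a contrib kernel with the patch backed out would still be flagged.

```python
import re

# Assumed bounds, taken from the rough range mentioned in this thread:
# the problem may start around 2.4.21-18 and was fixed at 2.4.21-32.EL.
FIRST_SUSPECT = (2, 4, 21, 18)
FIRST_FIXED = (2, 4, 21, 32)


def parse_release(release):
    """Return (major, minor, patch, build) from e.g. '2.4.21-27.EL'."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(g) for g in m.groups())


def in_suspect_range(release):
    """True if the release falls in the assumed problem range.

    Only the numeric prefix is compared, so suffixes such as a
    contrib-kernel tag are ignored and need checking by hand.
    """
    return FIRST_SUSPECT <= parse_release(release) < FIRST_FIXED
```

On a client this could be called as in_suspect_range(platform.release()) after importing platform; anything it flags still needs a look at which kernel package is actually installed.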

And there's a gotcha in the stock autofs for 3.0.5 if you use the & substitution syntax in your maps:

* < mount options removed > &.stage.rl.ac.uk:/stage/&

If you try to mount a non-existent file system such as /stage/fred, the machine panics and dies.  The autofs for 3.0.4 works (I think) as do the ones for 3.0.3, 3.0.6.
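
To make the substitution concrete: with a wildcard key (*), autofs replaces each & in the location with the key being looked up. A rough illustration of that expansion for the map line above, with the mount options left out as in the original; the function name expand_wildcard_entry is hypothetical and the sketch ignores any escaping rules autofs itself applies.

```python
def expand_wildcard_entry(key, location_template):
    """Substitute the lookup key for every '&' in the location field.

    Illustrative only: real autofs map parsing also handles mount
    options and escaping, which this sketch does not attempt.
    """
    return location_template.replace("&", key)


# Looking up /stage/fred means the key is "fred".
location = expand_wildcard_entry("fred", "&.stage.rl.ac.uk:/stage/&")
print(location)  # fred.stage.rl.ac.uk:/stage/fred
```

So a lookup of any name under the mount point, existing or not, produces a host and path to try, which is how a typo can end up triggering the panic on the affected autofs.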

Martin
RAL Tier1 Systems Team.




> -----Original Message-----
> From: [log in to unmask] 
> [mailto:[log in to unmask]] On 
> Behalf Of Miles O'Neal
> Sent: Wednesday, December 14, 2005 7:42 PM
> To: Scientific Linux Users
> Subject: Re: NFS... problems? or the perfect distributed file system?
> 
> John Haggerty said...
> |
> |The discussion of distributed filesystems inspired me to start a new 
> |thread on NFS.  Like probably everyone on this list, we use NFS to 
> |share files (home directories, whatever) among machines.  It pretty 
> |much works ok... except for the occasional problem which appears to be 
> |related to NFS.  Does anyone else have low level NFS problems, or am I 
> |the only one?
> 
> We see the same problem.  Sometimes a bunch of systems will 
> see it, other times it's just one or two.  Some of the 
> compute farm systems that have been up for close to a year 
> have screens full of the messages when you plug a console in.
> 
> We saw it with RH8, we see it with SL304.
> 
> -Miles
> 
