SCIENTIFIC-LINUX-USERS Archives

December 2005

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
John Haggerty <[log in to unmask]>
Date:
Wed, 14 Dec 2005 14:20:33 -0500
The discussion of distributed filesystems inspired me to start a new 
thread on NFS.  Like probably everyone on this list, we use NFS to share 
files (home directories, whatever) among machines.  It pretty much works
OK, except for the occasional problem that appears to be related to
NFS.  Does anyone else have low-level NFS problems, or am I the only one?

In more detail, here's the kind of thing I've seen.  We use NIS to 
distribute autofs configuration files, so everyone gets the same 
configuration, and then the automounter mounts the necessary disks from 
a fileserver.  Our machines are mostly SL 3.0.2; we're about to upgrade
to 3.0.5 (I'd rather go to 4.2, but we want to be consistent with the much
larger computer center down the street).

For quite a few years, the NFS fileserver was just another one of our 
machines; last summer, I thought it would be nice to upgrade to 
something more appliance-like, and so I bought quite a nice Linux-based
RAID NAS device that runs Linux from flash memory, has redundant power
supplies, etc.

Everything works reasonably well together, in a cluster of around 100 
machines.  However, some machines, when they are running jobs and under 
heavy load, become unresponsive.  It can take a few days or a week, but 
you basically can't prevent it.  We think this is because of an
NFS-mounted volume in the PATH.  If you get in as root, you can ping the
server, rpcinfo the server, even mount other volumes on the server.
Everything seems OK until you cd to the wrong NFS-mounted directory;
then you're hung until you push the button.
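
One way to check a suspect mount without hanging the shell you are working in is to touch the directory from a child process and give up after a timeout. The sketch below is a hypothetical helper, not part of the setup described above: the name `probe_dir` and the 5-second default are assumptions. A child blocked on a hard NFS mount may stay stuck in uninterruptible sleep even after being killed, so the parent deliberately does not wait for it.

```python
import subprocess
import sys
import time

# One-line child program: stat the path given as its first argument.
_CHILD_CODE = "import os, sys; os.stat(sys.argv[1])"

def probe_dir(path, timeout=5.0):
    """Stat `path` in a child process; return 'ok', 'error', or 'timeout'."""
    child = subprocess.Popen(
        [sys.executable, "-c", _CHILD_CODE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        rc = child.poll()  # non-blocking check on the child
        if rc is not None:
            return "ok" if rc == 0 else "error"
        time.sleep(0.05)
    # Best effort only: a process blocked in the kernel may ignore the signal,
    # so we do not call wait() here and risk hanging the parent as well.
    child.kill()
    return "timeout"
```

Run from cron or by hand against each automounted path, a "timeout" result would point at the mount that hangs while other volumes on the same server still answer.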

Getting back to the server, part of the reason we went to a commercial
server was that we saw similar, or in some cases worse, behavior with
our own server.  I used to think it was a network problem because of
messages like this scattered around various consoles:

> messages.3:Nov 26 14:28:22 phnxbox0 kernel: nfs: server phnxsb0.phenix.bnl.gov not responding, still trying
> messages.3:Nov 26 14:28:22 phnxbox0 kernel: nfs: server phnxsb0.phenix.bnl.gov OK

but now I'm not so sure.  Through it all, the server (ours or the 
store-bought one) thinks everything is going well.
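
To see whether these messages cluster on particular clients or servers, the syslog lines can be tallied. This is a rough sketch, assuming the log format matches the two lines quoted above; the `tally` helper and the regular expression are illustrative and not an existing tool.

```python
import re
from collections import Counter

# Matches: "<Mon> <day> <hh:mm:ss> <client> kernel: nfs: server <server> <state>"
NFS_EVENT = re.compile(
    r"(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+"
    r"kernel: nfs: server (\S+) (not responding|OK)"
)

def tally(lines):
    """Count NFS state messages per (client, server, state)."""
    counts = Counter()
    for line in lines:
        match = NFS_EVENT.search(line)
        if match:
            _timestamp, client, server, state = match.groups()
            counts[(client, server, state)] += 1
    return counts

# The two lines quoted in the message, including the grep filename prefix.
sample = [
    "messages.3:Nov 26 14:28:22 phnxbox0 kernel: nfs: server "
    "phnxsb0.phenix.bnl.gov not responding, still trying",
    "messages.3:Nov 26 14:28:22 phnxbox0 kernel: nfs: server "
    "phnxsb0.phenix.bnl.gov OK",
]

for key, count in sorted(tally(sample).items()):
    print(key, count)
```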

So I think there is some problem with the NFS clients, or maybe there is 
some incompatibility with the server.  Or is there a magic configuration 
that works better?  The arguments to the automounter look like this:

> -nfsvers=3,hard,wsize=32768,rsize=32768  phnxsb0.phenix.bnl.gov:/share/software
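
When comparing map entries across machines, it can help to split an entry like the one above into its mount options and its server location. This is a small sketch, assuming the two-field form shown (options, then host:path); `parse_map_entry` is a hypothetical helper and does not cover the full autofs map syntax.

```python
def parse_map_entry(entry):
    """Split '-opt1,opt2=val  host:/path' into (options, host, path)."""
    opts_field, location = entry.split()  # assumes exactly two fields
    options = {}
    for item in opts_field.lstrip("-").split(","):
        key, sep, value = item.partition("=")
        # Flags without a value (such as 'hard') are stored as True.
        options[key] = value if sep else True
    host, _, path = location.partition(":")
    return options, host, path

entry = ("-nfsvers=3,hard,wsize=32768,rsize=32768  "
         "phnxsb0.phenix.bnl.gov:/share/software")
options, host, path = parse_map_entry(entry)
print(options, host, path)
```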

-- 
John Haggerty
email: [log in to unmask]
voice/fax: 631 344 2286/4592
http://www.phenix.bnl.gov/~haggerty
