The discussion of distributed filesystems inspired me to start a new
thread on NFS. Like probably everyone on this list, we use NFS to share
files (home directories, whatever) among machines. It mostly works,
except for the occasional problem that appears to be related to NFS.
Does anyone else have low-level NFS problems, or am I the only one?

In more detail, here's the kind of thing I've seen. We use NIS to
distribute autofs configuration files, so everyone gets the same
configuration, and then the automounter mounts the necessary disks from
a fileserver. Our machines are mostly SL 3.0.2; we're about to upgrade
to 3.0.5 (I'd rather go to 4.2, but we want to be consistent with the
much larger computer center down the street).

For quite a few years, the NFS fileserver was just another one of our
machines; last summer, I thought it would be nice to upgrade to
something more appliance-like, so I bought quite a nice RAID NAS device
that runs Linux from flash memory, has redundant power supplies, etc.

Everything works reasonably well together in a cluster of around 100
machines. However, some machines become unresponsive when they are
running jobs under heavy load. It can take a few days or a week, but
we have found no way to prevent it. We suspect the cause is an
NFS-mounted volume in the PATH. If you get in as root, you can ping the
server, rpcinfo the server, even mount other volumes from the server.
Everything seems ok until you cd to the wrong NFS-mounted directory;
then you're hung until you push the reset button.
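One way to find the bad directory without hanging your own shell is to
touch each candidate from a child process and give up after a timeout.
The following is only a minimal sketch: the function names and the
5-second default are assumptions, not something from our setup, and on
a hard mount the blocked child may never exit, so the parent
deliberately does not wait for it.

```python
import subprocess
import sys


def probe(cmd, timeout_s=5.0):
    """Run cmd in a child process; return True if it exits within timeout_s.

    A False result means the child was still blocked when the timeout
    expired, which is how a hung NFS mount looks from the outside.
    An error exit (e.g. missing directory) still counts as a response.
    """
    child = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        child.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked on a hard mount may ignore
        # the signal, so we do not wait for it to go away.
        child.kill()
        return False
    return True


def probe_directory(path, timeout_s=5.0):
    """Probe one directory by listing it in a separate Python process."""
    code = "import os, sys; os.listdir(sys.argv[1])"
    return probe([sys.executable, "-c", code, path], timeout_s)
```

Calling probe_directory() on each directory in PATH (or on each
automounted volume) should show which one never answers, while the
calling shell stays usable.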

Getting back to the server: part of the reason we went to a commercial
server was that we saw similar, or in some cases worse, behavior with
our own server. I used to think it was a network problem because of
messages like this scattered around various consoles:
> messages.3:Nov 26 14:28:22 phnxbox0 kernel: nfs: server phnxsb0.phenix.bnl.gov not responding, still trying
> messages.3:Nov 26 14:28:22 phnxbox0 kernel: nfs: server phnxsb0.phenix.bnl.gov OK
but now I'm not so sure. Through it all, the server (ours or the
store-bought one) thinks everything is going well.
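To see whether these messages cluster on particular clients, they can
be tallied across the syslog files. This is a rough sketch that assumes
the lines look like the two quoted above; the regular expression and
the function name are illustrative and may need adjusting for other
kernels or syslog formats.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   Nov 26 14:28:22 CLIENT kernel: nfs: server SERVER not responding, still trying
#   Nov 26 14:28:22 CLIENT kernel: nfs: server SERVER OK
# search() is used so that prefixes like "messages.3:" are ignored.
NFS_LINE = re.compile(
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d)\s+"
    r"(?P<client>\S+)\s+kernel:\s+nfs:\s+server\s+(?P<server>\S+)\s+"
    r"(?P<state>not responding|OK)"
)


def count_nfs_events(lines):
    """Count (client, server, state) triples found in syslog lines."""
    counts = Counter()
    for line in lines:
        match = NFS_LINE.search(line)
        if match:
            key = (match.group("client"), match.group("server"),
                   match.group("state"))
            counts[key] += 1
    return counts
```

Feeding it the output of a grep over /var/log/messages* from every
client gives a per-client count of "not responding" and "OK" events,
which is a quick way to check whether the hung machines are also the
ones logging the most timeouts.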

So I think there is some problem with the NFS clients, or maybe some
incompatibility between the clients and the server. Or is there a magic
configuration that works better? The arguments to the automounter look
like this:
> -nfsvers=3,hard,wsize=32768,rsize=32768 phnxsb0.phenix.bnl.gov:/share/software
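Since rsize and wsize are negotiated with the server, it may be worth
checking what a client actually ended up with rather than what the map
asks for. Below is a small sketch that parses text in the /proc/mounts
format; the function name is made up for illustration, and the exact
option names the kernel reports can differ between kernel versions.

```python
def nfs_mounts(mounts_text):
    """Return {mount_point: options} for NFS entries in /proc/mounts text.

    Each line has the form: device mount_point fstype options dump pass.
    Options without a value (e.g. "hard") are stored as True.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            options[key] = value if value else True
        result[fields[1]] = options
    return result
```

On a client, something like nfs_mounts(open("/proc/mounts").read())
would list every NFS mount with its effective options, so the rsize,
wsize and hard/soft settings can be compared against the map entry
above.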
--
John Haggerty
email: [log in to unmask]
voice/fax: 631 344 2286/4592
http://www.phenix.bnl.gov/~haggerty