LISTSERV - SCIENTIFIC-LINUX-USERS Archives

March 2006

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:	Re: When a cluster mounts an NFS server
From:	"Steven J. Yellin" <[log in to unmask]>
Reply To:	Steven J. Yellin
Date:	Sun, 26 Mar 2006 10:12:16 -0800

    Regarding the question of making the NFS clients recover more
gracefully from a failure of their server:
    As an automounter I use amd with its default parameters, and have
found that stale mounts can be made to disappear with
  service netfs restart
after which it is necessary to do
  service amd restart
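
A minimal sketch of that two-step sequence as a shell function, for anyone who wants to push it out to many clients at once. The function name `recover_stale_mounts` and the `RUN` dry-run variable are hypothetical, not part of netfs or amd; only the two `service` commands come from the message above.

```shell
#!/bin/sh
# Sketch: clear stale NFS mounts on an amd client, in the order given above.
# RUN is a hypothetical dry-run hook: set RUN=echo to print the commands
# instead of executing them.
recover_stale_mounts() {
    # Step 1: restart netfs, which made the stale mounts disappear.
    ${RUN:-} service netfs restart || return 1
    # Step 2: restart amd, which is necessary after the netfs restart.
    ${RUN:-} service amd restart
}
```

Calling `RUN=echo recover_stale_mounts` prints the two commands without running them; calling `recover_stale_mounts` as root on a client runs them for real and stops early if the netfs restart fails.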

Steven Yellin

On Sun, 26 Mar 2006, John Haggerty wrote:

> What is the best way for a moderately large (I count about 180 machines)
> cluster of SL3.0.5 machines to NFS mount a fileserver which provides
> home directories, essential executables, common configuration files,
> etc. (i.e., not large amounts of data)?
>
> The NFS server in question has been reasonably reliable (it is now a
> commercial NAS which internally is running a Debian variant,
> http://www.open-e.com/, but we have had homemade SL and Gentoo NFS
> servers in that position, and the same question comes up), but still,
> there are failures at what I crudely estimate as an MTBF of about 50-100
> days in which the NFS server fails so badly that it has to be rebooted
> or power cycled.  That may be an acceptable rate of failures, but when
> it happens, we end up rebooting most of the 180 machines in the
> cluster.  Again, we have tools that help with that, but it's certainly
> not a one-button operation.
>
> So the question is, do we try to make the NFS server an order of
> magnitude more reliable (how do we do that?) or do we try to make the
> NFS clients recover more gracefully from a failure of their server (how
> do we do that?)?
>
> The clients mount filesystems on the server with the automounter with
> configuration files supplied by a (Sun) NIS server--the mount directives
> are like:
>
> -nfsvers=3,hard,wsize=32768,rsize=32768
> phnxsb0.phenix.bnl.gov:/share/software
>
> so the mounts are all hard and nointr (the default).  The NFS FAQ scares
> one away from soft mounts; maybe intr would be better, but I have a
> feeling we'd end up rebooting anyway rather than search for processes
> that can be killed.  What would really be best would be if everything
> hangs while the NFS server is down, then the stale mounts disappear on
> the reboot of the server, the processes die, we restart them, and we're
> up and running again.  Is there any hope of accomplishing that?
>
> --
> John Haggerty
> email: [log in to unmask]
> voice/fax: 631 344 2286/4592
> http://www.phenix.bnl.gov/~haggerty
>
