Miles,

at FZK we run a cluster of 1000 machines with SL 3.05 clients and 20 RH 4.2
servers with:

  transport:      tcp
  timeo:          600
  retrans:        2
  nfsd:           250
  autofs timeout: 1800

and are pretty happy with it. On average there are 4 to 5 mounts on a client.

Are you losing packets on the server side? Is the reassembly counter
increasing? (netstat -s)

J

Miles O'Neal wrote:
> Ever since I've gotten here (RH 7.1 days)
> we've had NFS issues: failures to mount,
> failures to unmount, etc.
>
> We use NIS to distribute group, passwd,
> netgroup and automount files, and automount
> almost everything. Tier one storage is on
> NetApp filers, tier 2 is a variety of rackmount
> PCs using RAID 5. These run a variety of
> Linux OSes, including RH7.1, RH9, SL304 and
> SL40.
>
> Clients are running 304 with and without the
> SDR kernel, 305 and 307. We've tried the stock
> 304 nfs-utils and the previous rev. All client
> desktops and compute servers see occasional NFS
> problems. The infrastructure boxes don't seem
> to have these problems, but they are lightly
> loaded.
>
> For a while we saw steady improvements. Then,
> based on a paper all over the web on using
> Linux with NetApp, I modified the following
> NFS mount options:
>
>   OPTION   OLD   NEW
>   timeo      7   600
>   retrans    3     2
>
> Things got much, much worse. We had many,
> many more failures to mount, and more whines
> about unmount problems and locks.
>
> I then changed these to
>
>   timeo=10,retrans=5
>
> and at the same time bumped up the automounter
> unmount timeout from 60 seconds to 300 seconds.
> Things got better, but we still have mount failures.
> Some of these have rather severe impacts on users.
> (Failures in portions of distributed jobs can be
> expensive.)
>
> We have changed from having all filers multihomed
> with links to each subnet to all filers on their
> own subnet, with one legacy link from one NetApp
> to one subnet until some ancient processes can be
> revamped and restarted.
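
J's question about the reassembly counter can be checked without reading `netstat -s` by eye. With NFS over UDP, an RPC larger than the path MTU (for example an 8 KB read or write on a 1500-byte Ethernet link) is split into IP fragments; if any one fragment is lost, the whole datagram is typically discarded and the kernel's `ReasmFails` counter grows. The sketch below assumes a Linux host, where (as far as we know) `netstat -s` takes its IP statistics from /proc/net/snmp; it diffs the `Ip:` reassembly counters between two snapshots. `parse_snmp`, `reasm_delta` and `sample` are hypothetical helper names, not part of any existing tool.

```python
"""Sketch: watch IPv4 reassembly counters on a Linux host (assumption).

Reads /proc/net/snmp directly instead of parsing netstat's free-form text.
"""
import time

# Reassembly-related fields in the "Ip:" section of /proc/net/snmp.
REASM_FIELDS = ("ReasmTimeout", "ReasmReqds", "ReasmOKs", "ReasmFails")


def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {field: int}}.

    Each protocol appears as a header line of field names followed by a
    line of values, both prefixed with the protocol name, e.g. "Ip:".
    """
    stats = {}
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    # Pair every header line with the value line that follows it.
    for header, values in zip(lines[0::2], lines[1::2]):
        proto = header[0].rstrip(":")
        stats[proto] = dict(zip(header[1:], (int(v) for v in values[1:])))
    return stats


def reasm_delta(before, after):
    """Return how much each IP reassembly counter grew between snapshots."""
    return {f: after["Ip"][f] - before["Ip"][f] for f in REASM_FIELDS}


def sample(path="/proc/net/snmp", interval=10.0):
    """Take two snapshots `interval` seconds apart and return the deltas."""
    with open(path) as fh:
        before = parse_snmp(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        after = parse_snmp(fh.read())
    return reasm_delta(before, after)
```

Running `sample()` on a filer-facing server while mount failures are happening, and seeing `ReasmFails` or `ReasmTimeout` climb, would support the packet-loss theory; a flat counter suggests looking elsewhere.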
>
> Any insights, recommendations, and/or experiences
> would be appreciated. (We *may* be able to move
> to SL4 on client systems, but it's not yet clear
> whether we would lose application vendor support.
> But if that proved helpful for others, I would
> like to know that as well.)
>
> Thanks,
> Miles
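
The effect of the timeo/retrans pairs Miles tried can be estimated with simple arithmetic. The sketch below is a simplified model, assuming the UDP behaviour described in the nfs(5) man page of that era: `timeo` is in tenths of a second, the timeout doubles after each retransmission up to a 60-second cap, and a major timeout is declared after `retrans` retransmissions. It ignores the kernel's adaptive round-trip estimation and the different backoff used over TCP, so it does not describe J's TCP setup. `timeout_schedule` and `major_timeout_seconds` are hypothetical names.

```python
"""Simplified model of when an NFS-over-UDP request hits a major timeout.

Assumptions (not kernel code): timeo is in tenths of a second, the
timeout doubles after every retransmission, it is capped at 60 seconds,
and a major timeout follows `retrans` retransmissions.
"""

MAX_TIMEOUT_DS = 600  # 60-second cap, in tenths of a second


def timeout_schedule(timeo, retrans):
    """Return each wait in tenths of a second: first send plus `retrans` retries."""
    waits = []
    current = timeo
    for _ in range(retrans + 1):
        waits.append(min(current, MAX_TIMEOUT_DS))
        current *= 2  # exponential backoff between retransmissions
    return waits


def major_timeout_seconds(timeo, retrans):
    """Total seconds before the client gives up and reports a major timeout."""
    # Sum in integer tenths first to avoid floating-point drift.
    return sum(timeout_schedule(timeo, retrans)) / 10


for label, timeo, retrans in [("original", 7, 3),
                              ("NetApp paper", 600, 2),
                              ("current", 10, 5)]:
    print(f"{label}: timeo={timeo} retrans={retrans} -> "
          f"{major_timeout_seconds(timeo, retrans)} s")
```

Under this model the three settings give major timeouts of about 10.5 s (timeo=7,retrans=3), 180 s (timeo=600,retrans=2) and 63 s (timeo=10,retrans=5). One plausible reading is that with timeo=600 over UDP a single lost fragment stalls a request for a full minute before the first retry, which would be consistent with, though not proof of, the worse behaviour after the change.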