My responses are down near the bottom of this email interspersed among
Jon's suggestions.
Steven Yellin
On Fri, 10 Apr 2009, Jon Peatfield wrote:
> On Thu, 9 Apr 2009, Steven J. Yellin wrote:
>
>> We have two SL5.1 x86_64 systems running kernel 2.6.18-128.1.1.el5. I'll
>> call them "A" and "B". Each exports two file systems, and each runs amd to
>> mount whatever filesystems are requested from elsewhere. Filesystems
>> requested from SL3.0.9 systems mount without problem, and filesystems
>> requested from the SL5.1 systems also mounted without problem until
>> recently. But recently attempts to access from "A" a filesystem exported by
>> "B" or access from "B" a filesystem exported by "A" started being met with
>> a message "Input/output error". Similar requests on an SL3.0.9 system to
>> view a SL5.1 exported one give "Permission denied". I'd appreciate advice.
>> I'll give some more information in the following, and will be glad to add
>> more depending on what others think might be useful.
>> The /etc/hosts.allow files allow portmap, mountd, rquotad, and statd to
>> a set of computers including "A" and "B".
>> Unless I've made a mistake, the firewall is open between "A" and "B".
>> In the following is what went into /var/log/messages on "A" and "B" at
>> the time of an attempt to look from "A" at a filesystem exported by "B",
>> with a perhaps ineffectual paranoid attempt to maintain a low profile by
>> replacing computer names and IP's with "A" and "B".
>> On "A" at the time of the "Input/output error", a set of lines went to
>> /var/log/messages all beginning with "Apr 9 12:04:34 "A" amd[12252]: " and
>> otherwise containing
>>
>> get_nfs_version: returning NFS(3,tcp) on host "B"
>> get_nfs_version: returning NFS(3,udp) on host "B"
>> Using NFS version 3, protocol tcp on host "B"
>> initializing "B"'s pinger to 30 sec
>> creating mountpoint directory '/.automount/"B"/root'
>> file server "B", type nfs, state starts up
>> Flushed /net/"B"; dependent on "B"
>> recompute_portmap: NFS version 3 on "B"
>> Using MOUNT version: 3
>> amfs_host_mount: NFS version 3
>> fetch_fhandle: NFS version 3
>> mountd rpc failed: RPC: Can't decode result
>> fetch_fhandle: NFS version 3
>> mountd rpc failed: RPC: Can't decode result
>> /net/"B": mount (amfs_cont): Input/output error
>>
>> On "B" at that time lines in messages.log began with "Apr 9 12:04:34 "B"
>> mountd[9831]: " and otherwise contained:
>>
>> authenticated mount request from "A":1023 for /data (/data)
>> authenticated mount request from "A":1023 for /scratch (/scratch)
>>
>> Steven Yellin
>
> To narrow the search I'd suggest seeing if a manual nfs mount from A to B
> (and vice versa) works.
>
On "B" the command 'mount "A":/scratch /mnt/tmp' failed with the response
mount: "A":/scratch failed, reason given by server: Permission denied.
Nothing was logged at that time in /var/log/messages on "B", but
/var/log/messages on "A" contained
Apr 9 19:01:58 "A" mountd[12500]: authenticated mount request from "B":777 for /scratch (/scratch)
Neither "A" nor "B" logged anything in /var/log/secure at the time of the
failed mount.
The same thing happened with the roles of "A" and "B" swapped.
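One thing that can be checked mechanically after a "Permission denied"
like this is whether the client's name really appears for that path in the
server's /etc/exports. The following is only a minimal sketch: it assumes
one export entry per line and plain host names (no wildcards, netgroups or
address ranges), and the function names are made up for illustration. It
looks only at the text of the file, not at the name mountd resolves the
client's address to, so a match here does not rule out a name lookup
problem.

```shell
# Hypothetical helpers (bash); both read exports-style text on stdin.

# Print the client names listed for one exported path, options stripped.
exports_clients() {
    local path=$1
    awk -v p="$path" '
        $1 == p {
            for (i = 2; i <= NF; i++) {
                c = $i
                sub(/\(.*$/, "", c)   # drop the "(rw,sync)" option part
                print c
            }
        }'
}

# Succeed if the given host is listed for the given path.
host_in_exports() {
    local path=$1 host=$2
    exports_clients "$path" | grep -qxF "$host"
}
```

On the server it could be run as
'host_in_exports /scratch clientname < /etc/exports && echo listed',
with "clientname" replaced by the real client. Running 'exportfs -v' on the
server should show what is exported at that moment, which can differ from
the file if it was edited without re-exporting.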
> If the manual mount works then we need to look more closely at how amd is
> differing from the manual mount, and if it doesn't we have excluded amd from
> the equation and should look at the nfs setup...
I haven't modified /etc/sysconfig/nfs, which has only comment lines.
>
> The next step (whether the manual mount works or not) may well be to check
> /var/log/secure for relevant (e.g. blocking) messages and run
>
> rpcinfo -p
>
> against A and B to see that all the expected sunrpc services are registered
> and what ports they are listening on (e.g. in case those are being blocked
> somewhere...)
On both "A" and "B" the command 'rpcinfo -p' showed portmapper,
status, ypbind, nlockmgr, rquotad, nfs, mountd and amd, all with proto tcp
and udp. I didn't see any port that looked familiar as one that could be
blocked somewhere, but maybe that's just because I don't know how to tell.
From "A" for all the tcp ports shown by 'rpcinfo -p "B"'
telnet "B" <port>
always made a connection.
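The two checks above (are the expected services registered, and does every
tcp port accept a connection) could be scripted roughly as follows. This is
a sketch, not a tested tool: the function names are made up, it assumes the
usual 'rpcinfo -p' layout of program, version, protocol, port and service
name after one header line, and the connection test assumes a bash built
with /dev/tcp support. Like the telnet test it covers tcp only, and a
filtered port may make it wait for the kernel's connect timeout.

```shell
# Hypothetical helpers (bash) for reading 'rpcinfo -p' output.

# Read 'rpcinfo -p' output on stdin; print each tcp port once, sorted.
tcp_ports_from_rpcinfo() {
    awk '$3 == "tcp" { print $4 }' | sort -un
}

# Read 'rpcinfo -p' output on stdin; print every service name given as
# an argument that is NOT registered.
missing_rpc_services() {
    local listing svc
    listing=$(awk 'NR > 1 { print $5 }' | sort -u)
    for svc in "$@"; do
        printf '%s\n' "$listing" | grep -qxF "$svc" || printf '%s\n' "$svc"
    done
}

# Try a tcp connection to every registered tcp port on a host.
# Assumption: bash supports the /dev/tcp/HOST/PORT redirection.
check_tcp_ports() {
    local host=$1 port
    rpcinfo -p "$host" | tcp_ports_from_rpcinfo | while read -r port; do
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            echo "$host:$port open"
        else
            echo "$host:$port unreachable"
        fi
    done
}
```

For example, 'rpcinfo -p "B" | missing_rpc_services portmapper nfs mountd
nlockmgr status' prints nothing when all five are registered, and
'check_tcp_ports "B"' replaces the port-by-port telnet.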
>
> btw from the error '...mountd...RPC: Can't decode result' it *sounds* like
> amd isn't liking (or can't understand) the reply it is getting from mountd -
> but that could be a problem with mountd or amd...
>
> BTW do you have a spare box to try as a 3rd sl5 machine 'C'?
>
Yes, I could install SL5 on an old Pentium III machine now running
SL3.0.9. I hope there's something simpler to do to diagnose the problem.
I remember some time ago having trouble exporting from "A" before "B"
was purchased, though instead of trying to diagnose the problem, I just
rebooted "A" and the problem went away for a while. The machines are more
heavily used now, so I don't feel quite as free to do that. If a 3rd SL5
machine is set up, I suspect it won't have any trouble exporting at first,
any more than "A" or "B" had.
Just in case you can use this information, here's what's in the
/etc/exports file of both "A" and "B", with other computer names also
replaced by something in quotes:
/data "X1"(rw,sync) "X2"(rw,sync) "X3"(rw,sync) "A"(rw,sync) "X4"(rw,sync) "X5"(rw,sync) "B"(rw,sync) "X6"(rw,sync) "X7"(rw,sync)
/scratch "X1"(rw,sync) "X2"(rw,sync) "X3"(rw,sync) "A"(rw,sync) "X4"(rw,sync) "X5"(rw,sync) "B"(rw,sync) "X6"(rw,sync) "X7"(rw,sync)
> --
> /--------------------------------------------------------------------\
> | "Computers are different from telephones. Computers do not ring." |
> | -- A. Tanenbaum, "Computer Networks", p. 32 |
> ---------------------------------------------------------------------|
> | Jon Peatfield, _Computer_ Officer, DAMTP, University of Cambridge |
> | Mail: [log in to unmask] Web: http://www.damtp.cam.ac.uk/ |
> \--------------------------------------------------------------------/
>