SCIENTIFIC-LINUX-USERS Archives

April 2009

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
"Steven J. Yellin" <[log in to unmask]>
Reply To:
Steven J. Yellin
Date:
Fri, 10 Apr 2009 17:51:41 -0700
     After what seemed like good ideas for diagnosing the failure of two 
SL5.1 systems to export filesystems, I was still unable to correct the 
problem, even by restarting nfs and related services.  Suggestions stopped 
coming in, and local users got impatient for the problem to be corrected, 
even if it meant interrupting their work. So I rebooted the machines, and 
that seemed to fix the problem.  But the refusal to export may well recur 
-- it has happened before.  Ideas are still welcome on how to diagnose or 
correct the problem in the future without rebooting.
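
One check that might help the next time this happens, before resorting to a reboot, is to compare what /etc/exports says should be exported with what the server actually advertises through 'showmount -e'. What follows is only a minimal sketch, not a known fix: the function names (configured_exports, advertised_exports, missing_exports) are made up for illustration, and it assumes a POSIX shell with awk, sort and grep.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch).

# Print the exported paths (first field) of an /etc/exports style file,
# skipping comments, blank lines, and backslash-continued lines.
configured_exports() {
    awk '
        cont       { cont = ($0 ~ /\\$/); next }  # continuation of previous entry
        /^[ \t]*#/ { next }                       # comment line
        NF == 0    { next }                       # blank line
                   { print $1; cont = ($0 ~ /\\$/) }
    ' "$1" | sort -u
}

# Print the paths from `showmount -e HOST` output read on stdin.
# The first line ("Export list for HOST:") is a header and is skipped.
advertised_exports() {
    awk 'NR > 1 && NF > 0 { print $1 }' | sort -u
}

# Print every path that is configured in the exports file ($1) but is
# absent from the `showmount -e` output supplied on stdin.
missing_exports() {
    advertised_exports | {
        adv=$(cat)
        configured_exports "$1" | while read -r path; do
            printf '%s\n' "$adv" | grep -Fxq -- "$path" || printf '%s\n' "$path"
        done
    }
}
```

For example, 'showmount -e "B" | missing_exports /etc/exports' run on "B" would list any path that is configured but not advertised. If something is listed, 'exportfs -ra' (re-export everything in /etc/exports) may be worth trying before restarting services, though there is no evidence in this thread that it would have helped here.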

Steven Yellin

On Thu, 9 Apr 2009, Steven J. Yellin wrote:

>    My responses are down near the bottom of this email interspersed among 
> Jon's suggestions.
>
> Steven Yellin
>
> On Fri, 10 Apr 2009, Jon Peatfield wrote:
>
>> On Thu, 9 Apr 2009, Steven J. Yellin wrote:
>>
>>>    We have two SL5.1 x86_64 systems running kernel 2.6.18-128.1.1.el5. 
>>> I'll call them "A" and "B".  Each exports two file systems, and each runs 
>>> amd to mount whatever filesystems are requested from elsewhere. 
>>> Filesystems requested from SL3.0.9 systems mount without problem, and 
>>> filesystems requested from the SL5.1 systems also mounted without problem 
>>> until recently. But recently attempts to access from "A" a filesystem 
>>> exported by "B" or access from "B" a filesystem exported by "A" started 
>>> being met with a message "Input/output error".  Similar requests on an 
>>> SL3.0.9 system to view a SL5.1 exported one give "Permission denied". I'd 
>>> appreciate advice. I'll give some more information in the following, and 
>>> will be glad to add more depending on what others think might be useful.
>>>    The /etc/hosts.allow files allow portmap, mountd, rquotad, and statd to 
>>> a set of computers including "A" and "B".
>>>     Unless I've made a mistake, the firewall is open between "A" and "B".
>>>     In the following is what went into /var/log/messages on "A" and "B" at 
>>> the time of an attempt to look from "A" at a filesystem exported by "B", 
>>> with a perhaps ineffectual paranoid attempt to maintain a low profile by 
>>> replacing computer names and IPs with "A" and "B".
>>>    On "A" at the time of the "Input/output error", a set of lines went to 
>>> /var/log/messages all beginning with "Apr 9 12:04:34 "A" amd[12252]: " and 
>>> otherwise containing
>>> 
>>> get_nfs_version: returning NFS(3,tcp) on host "B"
>>> get_nfs_version: returning NFS(3,udp) on host "B"
>>> Using NFS version 3, protocol tcp on host "B"
>>> initializing "B"'s pinger to 30 sec
>>> creating mountpoint directory '/.automount/"B"/root'
>>> file server "B", type nfs, state starts up
>>> Flushed /net/"B"; dependent on "B"
>>> recompute_portmap: NFS version 3 on "B"
>>> Using MOUNT version: 3
>>> amfs_host_mount: NFS version 3
>>> fetch_fhandle: NFS version 3
>>> mountd rpc failed: RPC: Can't decode result
>>> fetch_fhandle: NFS version 3
>>> mountd rpc failed: RPC: Can't decode result
>>> /net/"B": mount (amfs_cont): Input/output error
>>>
>>>   On "B" at that time lines in /var/log/messages began with "Apr 9 12:04:34 
>>> "B" mountd[9831]: " and otherwise contained:
>>> 
>>> authenticated mount request from "A":1023 for /data (/data)
>>> authenticated mount request from "A":1023 for /scratch (/scratch)
>>> 
>>> Steven Yellin
>> 
>> To narrow the search I'd suggest seeing if a manual nfs mount from A to B 
>> (and vice versa) works.
>> 
>
>    On "B" the command 'mount "A":/scratch /mnt/tmp' failed with response
>
> mount: "A":/scratch failed, reason given by server: Permission denied.
>
> There were no messages at that time in /var/log/messages of "B", but in 
> /var/log/messages of "A" was
>
> Apr 9 19:01:58 "A" mountd[12500]: authenticated mount request from "B":777 
> for /scratch (/scratch)
>
> There was nothing in /var/log/secure at the time of a failed mount for either 
> "A" or "B".
>    The same thing happened with the roles of "A" and "B" exchanged.
>
>> If the manual mount works then we need to look more closely at how amd is 
>> differing from the manual mount, and if it doesn't we have excluded amd 
>> from the equation and should look at the nfs setup...
>
>    I haven't modified /etc/sysconfig/nfs, which has only comment lines.
>
>> 
>> The next step (whether the manual mount works or not) may well be to check 
>> /var/log/secure for relevant (e.g. blocking) messages and run
>> 
>> rpcinfo -p
>> 
>> against A and B to see that all the expected sunrpc services are registered 
>> and what ports they are listening on (e.g. in case those are being blocked 
>> somewhere...)
>
>    On both "A" and "B" the command 'rpcinfo -p' showed portmapper, status, 
> ypbind, nlockmgr, rquotad, nfs, mountd and amd, all with proto tcp and udp. 
> I didn't see any port that looked familiar as one that could be blocked 
> somewhere, but maybe that's just because I don't know how to tell. From "A" 
> for all the tcp ports shown by 'rpcinfo -p "B"'
>
> telnet "B" <port>
>
> always made a connection.
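
The port check described just above can be scripted instead of running telnet by hand. This is a rough sketch under assumptions: rpc_tcp_ports and check_tcp_ports are hypothetical helper names, and it assumes an nc (netcat) that supports the -z and -w options. Like telnet, it exercises only the TCP registrations, so a blocked UDP port would go unnoticed.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch).

# Print the unique TCP port numbers from `rpcinfo -p` output read on stdin.
# Expected columns: program, vers, proto, port, service name.
rpc_tcp_ports() {
    awk '$3 == "tcp" { print $4 }' | sort -n -u
}

# Read port numbers on stdin and report whether a TCP connection to
# host $1 succeeds on each. Assumes nc accepts -z (connect only, send
# nothing) and -w (timeout in seconds).
check_tcp_ports() {
    host=$1
    while read -r port; do
        if nc -z -w 5 "$host" "$port" < /dev/null > /dev/null 2>&1; then
            echo "$host:$port open"
        else
            echo "$host:$port unreachable"
        fi
    done
}
```

Usage would be something like 'rpcinfo -p "B" | rpc_tcp_ports | check_tcp_ports "B"' run on "A".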
>
>
>> 
>> btw from the error '...mountd...RPC: Can't decode result' it *sounds* like 
>> amd isn't liking (or can't understand) the reply it is getting from mountd 
>> - but that could be a problem with mountd or amd...
>> 
>> BTW do you have a spare box to try as a 3rd sl5 machine 'C'?
>> 
>
>    Yes, I could install SL5 on an old Pentium III machine now running 
> SL3.0.9.  I hope there's something simpler to do to diagnose the problem.
>
>    I remember some time ago having trouble exporting from "A" before "B" was 
> purchased, though instead of trying to diagnose the problem, I just rebooted 
> "A" and the problem went away for a while.  The machines are more heavily used 
> now, so I don't feel quite as free to do that.  If a 3rd SL5 machine is 
> set up, I suspect it won't have any trouble exporting at first, any more than 
> "A" or "B" had.
>    Just in case you can use this information, here's what's in the 
> /etc/exports file of both "A" and "B", with other computer names also 
> replaced by something in quotes:
>
> /data "X1"(rw,sync) "X2"(rw,sync) "X3"(rw,sync) "A"(rw,sync) "X4"(rw,sync) 
> "X5"(rw,sync) "B"(rw,sync) "X6"(rw,sync) "X7"(rw,sync)
> /scratch "X1"(rw,sync) "X2"(rw,sync) "X3"(rw,sync) "A"(rw,sync) "X4"(rw,sync) 
> "X5"(rw,sync) "B"(rw,sync) "X6"(rw,sync) "X7"(rw,sync)
>
>
>
>> -- 
>> /--------------------------------------------------------------------\
>> | "Computers are different from telephones.  Computers do not ring." |
>> |       -- A. Tanenbaum, "Computer Networks", p. 32                  |
>> ---------------------------------------------------------------------|
>> | Jon Peatfield, _Computer_ Officer, DAMTP,  University of Cambridge |
>> | Mail:  [log in to unmask]     Web:  http://www.damtp.cam.ac.uk/ |
>> \--------------------------------------------------------------------/
>> 
>
