Date: Fri, 29 Aug 2008 15:50:39 -0700

On Fri, Aug 29, 2008 at 09:50:28AM +0100, Faye Gibbins wrote:
> We're running SL5 with this kernel:
>
> 2.6.18-92.1.10.el5 x86_64
>
> We're experiencing regular lockd failures on one of our nfs servers.
>
> One other, which doesn't have the problem, is still running 2.6.18-53.1.14.el5
>
> The problem is diagnosed by running:
>
> time flock ~/junk echo ok; rm ~/junk
>
> on an affected client of the server.
>
> It may be related to this report of a similar bug in the kernel discussed
> here:
>
> https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/181996
>
> Does anyone know if the latest SL5 kernels are affected by this bug?
> Can something be done? The only fix we can find is to restart the box
> the NFS server lives on.
Greetings. FWIW, I was about to send a similar note to the SL list when
I read your note. I think the discussion to this point has nailed the
issue, but, for the record, here's what we see:
    Aug 29 15:18:24 client-sys kernel: lockd: server xxxxxx not responding, still trying
While on the server we see:
root 3569 0.0 0.0 0 0 ? D Aug27 0:00 [lockd]
lockd is a kernel thread, not a user-land process, and the D in the ps
output means it is in uninterruptible sleep. It evidently cannot be
stopped except by killing the kernel, i.e., rebooting.
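For anyone who wants to check clients without tying up a shell, the
flock test quoted above could be sketched in Python roughly as follows.
This is only a sketch under some assumptions: the mount is NFSv3 (so
locks go through lockd), the name probe_lock and the 10-second default
timeout are mine, and the lock is taken in a child process so that a
hung request cannot hang the probe itself.

```python
import subprocess
import sys
import time

# Code run in the child: take and release an exclusive POSIX lock.
# On an NFSv3 mount this request is assumed to go through lockd.
CHILD_CODE = (
    "import fcntl, sys\n"
    "with open(sys.argv[1], 'a') as f:\n"
    "    fcntl.lockf(f, fcntl.LOCK_EX)\n"
    "    fcntl.lockf(f, fcntl.LOCK_UN)\n"
)


def probe_lock(path, timeout=10.0):
    """Return seconds needed to lock and unlock path, or None on timeout.

    probe_lock is a hypothetical helper, not part of any package.
    """
    start = time.monotonic()
    child = subprocess.Popen([sys.executable, "-c", CHILD_CODE, path])
    deadline = start + timeout
    while time.monotonic() < deadline:
        status = child.poll()
        if status is not None:
            if status != 0:
                raise RuntimeError("lock probe exited with status %d" % status)
            return time.monotonic() - start
        time.sleep(0.05)
    # Best effort only: a child stuck in uninterruptible sleep may not die,
    # so we deliberately do not wait for it here.
    child.kill()
    return None
```

Calling probe_lock on a file in the NFS-mounted home directory should
return a small number on a healthy client, and None would suggest the
same hang that the flock test shows.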
The user-level ramification is that some, but maybe not all (?),
communication with the NFS-mounted /home file system is blocked. The
first complaints that we typically receive are that logged-in users
cannot run firefox, and people who aren't logged in can't log in.
Needless to say, those are high-profile issues for our user
community.
We found the following bug report (as mentioned by others):
https://bugzilla.redhat.com/show_bug.cgi?id=453094
And, yes, it WOULD be sweet if somebody could patch this.
- Mike
--
Michael Hannon mailto:[log in to unmask]
Dept. of Physics 530.752.4966
University of California 530.752.4717 FAX
Davis, CA 95616-8677