SCIENTIFIC-LINUX-USERS Archives

February 2005

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From: Steve Traylen <[log in to unmask]>
Reply To: Steve Traylen <[log in to unmask]>
Date: Wed, 9 Feb 2005 19:59:07 +0000
Content-Type: text/plain
Parts/Attachments: text/plain (89 lines)

On Wed, Feb 09, 2005 at 07:02:46PM -0000 or thereabouts, Devin Bougie wrote:
> Hi All,
> 
> We have been struggling with what appears to be a bug in RH kernels.  
> This results in local disk I/O blocking all NFS I/O (and, as we show, 
> subsequent local I/O).  This appears easy to reproduce:
> ----
> 1. Test access from the NFS server to the exported disk:
> [root@server]# time touch /mnt/disk/testlocal
> 2. Test access from an NFS client to the NFS-mounted disk:
> [root@client]# time touch /nfs/server/disk/clienttest
> 3. Start local I/O on the NFS server:
> [root@server]# dd if=/dev/zero of=/mnt/disk/zero bs=1K count=10M
> 4. While the dd is running, test I/O on the NFS server to the exported 
> disk:
> [root@server]# time touch /mnt/disk/testlocal2
> 5. While the dd is running, access the exported disk from the NFS client:
> [root@client]# time touch /nfs/server/disk/clienttest2
> 6. One last time, while the dd is running, test access to the exported 
> disk from the NFS server:
> [root@server]# time touch /mnt/disk/testlocal3
> ----
> 
> After these steps, the last two 'touch' commands take anywhere from 30 
> seconds to 3 minutes to complete.
> 
> We have reproduced this using ext2, ext3, and JFS; with and without 
> LVM, SCSI, and RAID; and with various RH kernels on RH9, RHEL3, and 
> FC3.  However, RH7.3 does not have this same problem.
> 
> We opened a bugzilla bug 
> (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=139937), but have 
> not yet gotten any resolution.  For the time being (and on disk servers 
> where it is possible) we are restricting local logins to avoid running 
> into this.
> 
> Has anyone on this list experienced similar problems?  If so, what have 
> you done about it?
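
The server-side steps in the report above (steps 1, 3, 4 and 6) can be timed with a small script. This is a minimal sketch, not part of the original report: it assumes bash and GNU coreutils (for `date +%s.%N`), and the function names `time_touch` and `run_repro` are hypothetical. The default path `/mnt/disk` and the `dd` parameters come from the report.

```shell
#!/bin/bash
# Sketch of the server-side steps from the report (steps 1, 3, 4 and 6).
# Assumptions: bash plus GNU coreutils (date +%s.%N); the function names
# time_touch and run_repro are hypothetical, not from the original message.

# Print the wall-clock seconds a single touch of $1 takes, e.g. "0.004".
time_touch() {
    local target=$1 start end
    start=$(date +%s.%N)
    touch "$target" || return 1
    end=$(date +%s.%N)
    # bash arithmetic is integer-only, so let awk subtract the timestamps.
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# Usage: run_repro [disk] [dd count] [delay seconds]
# Defaults mirror the report: /mnt/disk and bs=1K count=10M (10 GiB).
run_repro() {
    local disk=${1:-/mnt/disk} count=${2:-10M} delay=${3:-5} ddpid
    echo "step 1 (idle):      $(time_touch "$disk/testlocal") s"
    # Step 3: large sequential write to the exported disk, in the background.
    dd if=/dev/zero of="$disk/zero" bs=1K count="$count" 2>/dev/null &
    ddpid=$!
    sleep "$delay"   # arbitrary pause so dd is well under way
    echo "step 4 (during dd): $(time_touch "$disk/testlocal2") s"
    sleep "$delay"   # window for running step 5 on an NFS client
    echo "step 6 (during dd): $(time_touch "$disk/testlocal3") s"
    wait "$ddpid"
    rm -f "$disk/zero"   # remove the (up to 10 GiB) dd output file
}
```

For example, sourcing the file and calling `run_repro /mnt/disk 10M 30` on the server leaves a 30-second window in which a client can run `time_touch /nfs/server/disk/clienttest2` (step 5) against the NFS mount. Unlike the original steps, the sketch deletes the `dd` output file once `dd` finishes.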

Hi Devin,

  The following info is from Martin Bly.

<quote>
All,
                                                                                
This problem affects SL3 and RHEL NFS clients accessing NFS-exported filesystems
hosted by most (possibly all) NFS servers (Linux, Solaris).
                                                                                
What happens is the client expects a reply from the server for some access
function: the server sends the response but the client doesn't receive it - the
kernel NFS layer loses it. The client hangs and only a reboot will free it.
Other clients and the same client accessing other areas of the same file system
are still able to do so until they hit the locked file/directory.
                                                                                
SLAC spotted this with RHEL, and both they and we suffer from it, as we both did
with some RH7.3 NFS clients. SLAC escalated to Red Hat, who eventually provided a
hot fix: a binary kernel distribution for which they don't release the source.
I can't get access to the fix (I haven't asked; we *might* be able to claim the
fix via our single RHEL installation, but it's doubtful). The fix is expected to
appear in RHEL3 Update 5 (not 4). I don't think RHEL4 will suffer (different
kernel), BUT:
                                                                                
This is actually a problem with a patch to the 2.7 kernel back-ported to 2.6 and
then 2.4. Red Hat passed their fix to the NFS developers, and it appears they are
going to back out the patch rather than implement the fix. I don't know which
patch it is (I'd back it out myself...).

So it is a client-side problem; I'd not expect a fix for RH7.3 soon, if ever.

Martin.
</quote>




> 
> Thanks in advance for any thoughts.
> 
> Devin
> 
> --------------------
> Devin Bougie
> Laboratory for Elementary-Particle Physics
> Computer Group
> [log in to unmask]

-- 
Steve Traylen
[log in to unmask]
http://www.gridpp.ac.uk/
