Steve et al,
Having now read through the description of the problem (which wasn't
available to me when I created the problem description quoted below!), I
don't think it's the same problem as reported by Devin.
The problem we see is absolutely fatal and there is no way out -
processes do not complete and there is no I/O blocking as described...
That said, I've seen LVM implicated in some NFS related fatal lockups of
a different variety.
Martin.
> -----Original Message-----
> From: [log in to unmask]
> [mailto:[log in to unmask]] On
> Behalf Of Steve Traylen
> Sent: 09 February 2005 19:59
> To: Devin Bougie
> Cc: [log in to unmask]
> Subject: Re: NFS server I/O blocking
>
>
> On Wed, Feb 09, 2005 at 07:02:46PM -0000 or thereabouts,
> Devin Bougie wrote:
> > Hi All,
> >
> > We have been struggling with what appears to be a bug in RH kernels.
> > This results in local disk I/O blocking all NFS I/O (and, as we show,
> > subsequent local I/O). This appears easy to reproduce:
> > ----
> > 1. test access from the nfs server to the exported disk:
> > [root@server]# time touch /mnt/disk/testlocal
> > 2. test access from an nfs client to the nfs mounted disk:
> > [root@client]# time touch /nfs/server/disk/clienttest
> > 3. start local I/O on the nfs server:
> > [root@server]# dd if=/dev/zero of=/mnt/disk/zero bs=1K count=10M
> > 4. while the dd is running, test I/O on the nfs server to the exported
> > disk:
> > [root@server]# time touch /mnt/disk/testlocal2
> > 5. while the dd is running, access the exported disk from the nfs client:
> > [root@client]# time touch /nfs/server/disk/clienttest2
> > 6. one last time, while the dd is running, test access to the exported
> > disk from the nfs server:
> > [root@server]# time touch /mnt/disk/testlocal3
> > ----
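A minimal sketch of the timing part of the steps above, in case it helps anyone repeat the test. The function name `time_touch` and the `TEST_DIR` variable are made up for illustration and are not from the original report. It does not start the dd; run that separately on the server as in step 3. `TEST_DIR` falls back to a temporary directory so the script is harmless if run as-is.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): touch one file and
# print how many whole seconds the touch took.
time_touch() {
    target=$1
    start=$(date +%s)
    touch "$target" || return 1
    end=$(date +%s)
    echo $((end - start))
}

# Assumption: set TEST_DIR to /mnt/disk on the server or /nfs/server/disk
# on the client; otherwise a temporary directory is used.
TEST_DIR=${TEST_DIR:-${TMPDIR:-/tmp}}
probe="$TEST_DIR/touchtest.$$"

elapsed=$(time_touch "$probe")
echo "touch of $probe took ${elapsed}s"
rm -f "$probe"
```

Run it once before starting the dd and again while the dd is running, on both server and client, and compare the reported seconds.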
> >
> > After these steps, the last two 'touch' commands take anywhere from 30
> > seconds to 3 minutes to complete.
> >
> > We have reproduced this using ext2, ext3, and jfs; with and without
> > lvm, scsi, and RAID; and with various RH kernels on RH9, RHEL3, and
> > FC3. However, RH7.3 does not have this same problem.
> >
> > We opened a bugzilla bug
> > (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=139937), but have
> > not yet gotten any resolution. For the time being (and on disk servers
> > where it is possible) we are restricting local logins to avoid running
> > into this.
> >
> > Has anyone on this list experienced similar problems? If so, what have
> > you done about it?
>
> Hi Devin,
>
> The following info is from Martin Bly.
>
> <quote>
> All,
>
>
> This problem affects SL3 and RHEL nfs clients accessing
> nfs-exported filesystems hosted by most (all) nfs servers
> (Linux, Solaris).
>
>
> What happens is the client expects a reply from the server for some access
> function: the server sends the response but the client doesn't receive it -
> the kernel nfs layer loses it. The client is hung and only a reboot will
> free it. Other clients, and the same client accessing other areas of the
> same file system, can still do so until they hit the locked file/directory.
>
>
> SLAC spotted this with RHEL - and we and they suffer, as we both did in some
> RH7.3 nfs clients. SLAC escalated to RedHat who eventually provided a hot fix
> - this is a binary kernel distribution for which they don't release the
> source. I can't get access to the fix (but haven't asked - we *might* be able
> to claim the fix via our single RHEL installation but it's doubtful). It is
> expected the fix will appear in RHEL3 Update 5 (not 4). I don't think RHEL4
> will suffer - different kernel - BUT:
>
>
> This is actually a problem with a patch to the 2.7 kernel back-ported to 2.6
> and then 2.4. Redhat passed their fix to the NFS developers and it appears
> they are going to back out the patch rather than implement the fix. I don't
> know which patch it is (I'd back it out myself...)
>
>
> So it is a client side problem - I'd not expect a fix for RH7.3 soon if ever.
>
> Martin.
> </quote>
>
>
>
>
> >
> > Thanks in advance for any thoughts.
> >
> > Devin
> >
> > --------------------
> > Devin Bougie
> > Laboratory for Elementary-Particle Physics
> > Computer Group
> > [log in to unmask]
>
> --
> Steve Traylen
> [log in to unmask]
> http://www.gridpp.ac.uk/
>