SCIENTIFIC-LINUX-DEVEL Archives

June 2016

SCIENTIFIC-LINUX-DEVEL@LISTSERV.FNAL.GOV

Subject:
From: "Patrick J. LoPresti" <[log in to unmask]>
Reply-To: Patrick J. LoPresti
Date: Fri, 3 Jun 2016 14:29:34 -0700
On Fri, Jun 3, 2016 at 9:38 AM, Alec T. Habig <[log in to unmask]>
wrote:

>
> Hmm.  I've (independently from the original poster) got the 8k sizes:
>

Try running "mount -o nfsvers=3 myserver:/mydir /mnt" by hand between two
SL6.7 systems. I just did, and it gives me proto=tcp and rsize/wsize of
1024k.

I vaguely recall seeing firewall oddities cause NFS mounts to fall back to
UDP with an 8k rsize/wsize. So check carefully that the mount really reports
"proto=tcp"; if you see something else, that fallback may well explain the
small sizes.
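
One quick way to confirm what was actually negotiated (the server name and
paths below are just the placeholders from the example above):

    mount -o nfsvers=3 myserver:/mydir /mnt
    nfsstat -m                # prints the negotiated options for each NFS mount
    grep /mnt /proc/mounts    # proto= and rsize=/wsize= also show up here

If proto comes back as udp, or rsize/wsize come back at 8k, that points at
the fallback described above.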


> with the network set up for jumbo frames (MTU=9000) the nfs server then
> chunks out network packets that get through the switches with minimal
> overhead.  Making this change from the default (1k, MTU=1500) made a
> huge throughput difference at the time we implemented it, which was a
> number of years ago, so certainly the world has changed since then.
>
> How do 1MB sized nfs chunks interact with the networking?


The same way any other application over TCP interacts with the
networking... TCP will break the (large) NFS requests and responses into
MTU-sized segments on send and then reassemble them on receive. The reason
to let TCP do this (instead of trying to match the application payloads to
the MTU) is that modern systems offload both the breakup and the reassembly
to the network hardware. Try searching for "TCP Segmentation Offload" and
"Large Receive Offload" for details. Note that modern Linux uses different
names for these things and I can't keep up, but the principle has not
changed.
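
If you want to see which of these your NIC driver has enabled (eth0 below is
just an example interface name), ethtool is one way to look:

    ethtool -k eth0 | grep -E 'segmentation|receive-offload'

On most reasonably modern drivers, tcp-segmentation-offload and
generic-receive-offload will show up as "on" by default.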

Last time I experimented, these offload engines largely obviated the
advantages of jumbo frames. Jumbo frames do make a huge difference with
older hardware/software and with UDP, though.
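
For completeness, jumbo frames are just a matter of the interface MTU (again,
eth0 is a placeholder, and every host and switch in the path has to agree on
the larger size):

    ip link show eth0               # the current MTU appears on the first line of output
    ip link set dev eth0 mtu 9000   # enable jumbo frames on this interface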

These offload engines are also why it is hard to increase NFS throughput via
channel bonding: bonded channels generally have to disable them.

Incidentally, I typically see 900MB/sec sustained NFS reads and writes
while moving half-terabyte files. (My application is somewhat specialized.)
Of course this requires a fast disk subsystem and a 10GbE interconnect.

 - Pat

