On Fri, Jun 3, 2016 at 9:38 AM, Alec T. Habig <[log in to unmask]> wrote:

Hmm.  I've (independently from the original poster) got the 8k sizes:

Try running "mount -o nfsvers=3 myserver:/mydir /mnt" by hand between two SL6.7 systems. I just did, and it gives me proto=tcp and rsize/wsize of 1024k.

I vaguely recall seeing firewall oddities cause NFS mounts to fall back to UDP with an 8k rsize/wsize, so if you see sizes like that, double-check whether the mount really negotiated proto=tcp.
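To see what actually got negotiated (the server name and path below are the same placeholders as above), inspect the live mount rather than trusting the options you asked for:

    mount -o nfsvers=3 myserver:/mydir /mnt
    nfsstat -m
    grep /mnt /proc/mounts

Both nfsstat -m and /proc/mounts report the effective proto= and rsize=/wsize= values, so a silent fallback to UDP or 8k sizes shows up right away.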
 
with the network set up for jumbo frames (MTU=9000) the NFS server then
chunks out network packets that get through the switches with minimal
overhead.  Making this change from the default (1k, MTU=1500) made a
huge throughput difference at the time we implemented it, which was a
number of years ago, so certainly the world has changed since then.

How do 1MB-sized NFS chunks interact with the networking?

The same way any other application over TCP interacts with the networking... TCP will break the (large) NFS requests and responses into MTU-sized segments on send and then reassemble them on receive. The reason to let TCP do this (instead of trying to match the application payloads to the MTU) is that modern systems offload both the breakup and the reassembly to the network hardware. Try searching for "TCP Segmentation Offload" and "Large Receive Offload" for details. Note that modern Linux uses different names for these things and I can't keep up, but the principle has not changed.
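If you want to see what your NICs are actually doing (eth0 below is just a placeholder for whatever interface you use), ethtool will list the offloads:

    ethtool -k eth0 | grep -E 'segmentation-offload|receive-offload'

The "generic" entries (GSO/GRO) are the newer Linux names for the same idea; ethtool -K (capital K) toggles individual features if you want to experiment.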

Last time I experimented, these offload engines largely obviated the advantages of jumbo frames. Jumbo frames do make a huge difference with older hardware/software and with UDP, though.
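For what it's worth, if you do want to try jumbo frames the MTU change itself is easy (again, eth0 is a placeholder, and every host and switch port in the path has to agree on the larger MTU):

    ip link show eth0          # current MTU is printed on the first line
    ip link set eth0 mtu 9000  # takes effect immediately, not persistent

On SL6 the persistent version is MTU=9000 in the interface's ifcfg file under /etc/sysconfig/network-scripts.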

These offload engines are also why it is hard to increase NFS throughput via channel bonding: bonded channels generally have to disable them.

Incidentally, I typically see 900 MB/sec sustained NFS reads and writes while moving half-terabyte files. (My application is somewhat specialized.) Of course, this requires a fast disk subsystem and a 10GbE interconnect.

 - Pat