We're getting "do_ypcall: clnt_call: RPC: Timed out" errors.

We're in the process of upgrading to 4.4, starting with some new 64 bit Supermicros, some with a single dual-core Xeon and some with a single Core 2 Duo. Both have Intel e1000 ethernet chipsets.

We use NIS for user passwd and group entries, as well as netgroups, services and automounts. This has worked for us on 32 bit systems from Red Hat 5.2 up through SL30{4,7} (including some 64 bit Athlons running a 32 bit OS). We can reproduce the problem on the 32 bit SL3 systems, but they're a lot slower, so it takes some effort.

We first saw problems with torque (we've used PBS Pro in the past), but narrowed it down to rsh (and even a bare bones program running rcmd()). A single, random rsh call is fairly safe, but if we do one every second or two, we quickly start getting hangs and the error:

    do_ypcall: clnt_call: RPC: Timed out

So it can happen at any time, but when we fire off lots of jobs in quick succession via torque, it's guaranteed to happen. We have also seen this, less frequently, in some home grown tools.

We've stripped NIS down to the bare essentials (using only netgroup for testing), we've tried adding a 3Com ethernet card to use instead of the built-in cards, we've upgraded to the latest EL4 ypbind, ypserv and glibc (which we found in a CERN repo after looking through TUV's bug list), we've tried adding more, faster NIS servers, we've tried isolating three machines on a 100Mb network (no spare 1Gb switches), and we've tried running the non-SMP kernel. No difference.

Bizarrely, we also get whining in the SL3 ypservers' message logs about failed NIS host lookups. We don't use NIS for host lookups; nsswitch.conf has

    hosts: files dns

We had only used Solaris servers in the past, and their ypservs were not logging these errors. Presumably they still got the requests, but we don't know that. We ran ypserv in debug mode for a while, and nothing jumped out at us.

We started running nscd for passwd and group on all the Linux systems after this started. No change.

The switches are Cisco Gb switches and HP ProCurve Gb switches (the isolated test network was a 3Com 100Mb switch).

Any ideas on either problem?

Thanks,
Miles

TEST SCRIPT (works every time, with a failure in less than 10 rsh calls on our faster boxes on the Gb network):

#!/bin/csh
# set LIST_OF_HOSTNAMES to a valid list of hosts
# to try, the more the merrier. We use a command
# to generate these from a file of valid names.
while ( 1 )
    foreach i ( $LIST_OF_HOSTNAMES )
        rsh $i uname -a   # or any command you like
    end
end
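
A timestamped variant of the test script may make it easier to line up individual hangs with the ypserv logs. What follows is only a rough sketch in plain sh rather than csh. The names RSH_CMD, ROUNDS and probe_host are invented for this example, and it assumes the "RPC: Timed out" text shows up on the stderr of the rsh call; if it is logged somewhere else, the elapsed seconds are still recorded for every call.

```sh
#!/bin/sh
# Sketch only: RSH_CMD, ROUNDS and probe_host are made-up names for this example.
# LIST_OF_HOSTNAMES is the same host list the csh test script expects.

RSH_CMD=${RSH_CMD:-rsh}   # remote shell command to exercise
ROUNDS=${ROUNDS:-10}      # passes over the host list (the csh script loops forever)

# Run one remote call and print one log line:
# wall-clock time, host, whether the NIS timeout text was seen, seconds taken.
probe_host() {
    host=$1
    start=$(date +%s)
    # Capture stderr only; stdout of the remote command is discarded.
    # The exit status is ignored; calls are classified by the stderr text.
    out=$($RSH_CMD "$host" uname -a 2>&1 >/dev/null) || :
    end=$(date +%s)
    case $out in
        *"RPC: Timed out"*) ypfail=1 ;;   # assumed to appear on the client's stderr
        *)                  ypfail=0 ;;
    esac
    echo "$(date '+%Y-%m-%d %H:%M:%S') host=$host ypfail=$ypfail secs=$((end - start))"
}

# Only loop when a host list has been supplied.
if [ -n "${LIST_OF_HOSTNAMES:-}" ]; then
    round=0
    while [ "$round" -lt "$ROUNDS" ]; do
        for h in $LIST_OF_HOSTNAMES; do
            probe_host "$h"
        done
        round=$((round + 1))
    done
fi
```

Redirecting the output to a file and running grep -c 'ypfail=1' on it gives a failure count per run, and the secs field shows hangs even when no error text is captured.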