Jon,

Thanks for your insights.  We're looking at these
things.  Here are some partial answers.

|If you manage to send sufficiently many requests to the server that *it*
|can't cope then you will see these messages.  Some ypserv implementations 
|cope better with load than others...

We added more servers.  Then we tried it with just two
clients and one system running the server.  Same thing.
Even with "sleep 1" between rsh calls.

|Now we have some servers with Intel mboards with braindead BMC chipsets 
|which eat all traffic to the IPMI ports.  When anything happened to pick 
|those ports it never gets an answer so will time out.  We saw *lots* of 
|this especially doing things which caused lots of yp requests -- until we 
|tracked it down and caused things to avoid the IPMI ports.

I discussed this in the answer to John Hearn's note,
which I just sent to the list.  We have 'em, but have
yet to configure them.  We took the IPMI out of one
system as a test.  No difference.
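
For anyone who wants to check whether a given source port could fall on
an IPMI port, here is a minimal sketch.  It assumes the BMC intercepts
the standard RMCP ports 623 and 664, and that rsh-style clients pick
their source port from the privileged range 512-1023.  Neither number
is stated in this thread, and the function names are hypothetical.

```python
# Assumption: ports a BMC may intercept (623 = asf-rmcp,
# 664 = asf-secure-rmcp). Not taken from the thread.
IPMI_PORTS = frozenset({623, 664})

# Assumption: privileged range from which rsh-style clients
# choose a source port.
RESERVED_RANGE = range(512, 1024)


def colliding_ports(candidates=RESERVED_RANGE, blocked=IPMI_PORTS):
    """Return the sorted candidate ports that are also in the blocked set."""
    return sorted(p for p in candidates if p in blocked)


def is_safe_source_port(port, blocked=IPMI_PORTS):
    """True if the port is not one of the blocked (IPMI) ports."""
    return port not in blocked


if __name__ == "__main__":
    print(colliding_ports())
```

This only shows which ports overlap; it does not probe the BMC or the
network.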

|I assume that you also checked for firewall issues at both ends...

iptables and ipchains are off completely.  No
firewalling on the switches.  SELinux is off.

|Do you also see it with ssh connections?  I ask 'cos rsh also picks a 
|privileged (tcp) port...

No.  Only rsh.  We thought about switching to ssh for
torque, but we have other apps that throw these errors
on these systems as well (albeit far less often).

Thanks,
Miles