LISTSERV - SCIENTIFIC-LINUX-USERS Archives

On several of my clusters I am seeing a problem with ssh and scp hanging
on exit. Here's an example:

root@uinta ~ # scp /etc/ssh/ssh_known_hosts node0000:/etc/ssh
ssh_known_hosts                          100%  633KB 632.5KB/s   00:00

[program just hangs here]

Meanwhile, on node0000 (with some editing for brevity):

-bash-3.00# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root      1613   976  0 10:12 ?        00:00:00 sshd: root@notty
root      1617  1613  0 10:12 ?        00:00:00 [scp] <defunct>

If I kill the sshd process (1613) then things continue normally on
uinta. If I CTRL-C the scp process on uinta instead, the defunct scp
process and its parent sshd process are left behind on node0000. These
scp attempts hang roughly 50% of the time, with no apparent pattern.
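
For anyone who wants to reproduce the numbers, a hang rate like this can
be measured with a loop along the following lines. This is only a rough
sketch: count_hangs is a name made up for illustration, the try count and
time limit in the usage comment are arbitrary, and it assumes the
coreutils timeout command is available on the machine running it.

```shell
#!/bin/bash
# count_hangs TRIES LIMIT CMD [ARGS...]   (hypothetical helper)
# Runs CMD TRIES times, kills any run that takes longer than LIMIT
# seconds, and prints how many runs had to be killed.
count_hangs() {
    local tries=$1 limit=$2 hung=0 i
    shift 2
    for ((i = 0; i < tries; i++)); do
        timeout "$limit" "$@" >/dev/null 2>&1
        # timeout exits with status 124 when it had to kill the command
        if [ $? -eq 124 ]; then
            hung=$((hung + 1))
        fi
    done
    echo "$hung"
}

# Usage (arbitrary numbers): 100 copies, 30 seconds allowed for each
# count_hangs 100 30 scp /etc/ssh/ssh_known_hosts node0000:/etc/ssh
```

Note that killing the local scp this way is much like the CTRL-C case
described above, so the leftover processes on the remote node would still
need to be cleaned up between runs.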

Looking for another way to skin the cat I have tried:

root@uinta /etc/ssh # cat /etc/ssh/ssh_known_hosts | ssh -f -n node0000 'cat > /etc/ssh/ssh_known_hosts'

This gives me a much lower failure rate, ~1%, but eventually it hangs and
on node0000 we see:

-bash-3.00# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root      1647   976  0 10:19 ?        00:00:00 sshd: root@notty
root      1651  1647  0 10:19 ?        00:00:00 [bash] <defunct>
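
To pick just the defunct entries and their parents out of a full ps -ef
listing, a small filter like the one below may help. It is a sketch only:
list_defunct is a made-up name, and it assumes the ps -ef column layout
shown above (PID in the second column, PPID in the third).

```shell
#!/bin/bash
# list_defunct (hypothetical helper): reads `ps -ef` output on stdin and
# prints "PID PPID" for every line marked <defunct>.
list_defunct() {
    awk '/<defunct>/ { print $2, $3 }'
}

# Usage:
# ps -ef | list_defunct
```

The second number on each output line is the parent to look at, which in
the listings above is the sshd: root@notty process.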

Following what advice I could google up, I've added

shopt -s huponexit

to .bashrc and .bash_profile. This seems to have increased the
successful connection/disconnection rate a little, up to the 50%
mentioned above. Prior to this it almost always hung on this particular
cluster.

Any suggestions or tips would be greatly appreciated.

Thanks,

jbh