Date: Tue, 14 Feb 2006 10:23:11 -0700
On several of my clusters I am seeing a problem with ssh and scp hanging
on exit. Here's an example:
root@uinta ~ # scp /etc/ssh/ssh_known_hosts node0000:/etc/ssh
ssh_known_hosts 100% 633KB 632.5KB/s 00:00
[program just hangs here]
Meanwhile, on node0000 (with some editing for brevity):
-bash-3.00# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1613 976 0 10:12 ? 00:00:00 sshd: root@notty
root 1617 1613 0 10:12 ? 00:00:00 [scp] <defunct>
If I kill the sshd process (1613) then things continue normally on
uinta. If I CTRL-C the scp process on uinta then I am left with the
defunct scp process and its parent sshd process on node0000. Roughly
50% of these scp attempts hang, with no apparent pattern.
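For anyone who wants to reproduce the numbers, here is a minimal sketch of how a hang rate like this could be counted. It assumes the timeout utility from GNU coreutils is installed on the sending host, which is not part of the setup described here. The function name hang_rate and the limits used are hypothetical.

```shell
# Hypothetical helper: run a command repeatedly under a time limit
# and report how many attempts had to be killed by the limit.
# Usage: hang_rate TRIES LIMIT_SECONDS COMMAND [ARGS...]
hang_rate() (
    tries=$1
    limit=$2
    shift 2
    hung=0
    i=0
    while [ "$i" -lt "$tries" ]; do
        status=0
        # timeout exits with status 124 when the time limit expires
        timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 124 ]; then
            hung=$((hung + 1))
        fi
        i=$((i + 1))
    done
    echo "$hung/$tries attempts hung"
)
```

Something like "hang_rate 20 30 scp /etc/ssh/ssh_known_hosts node0000:/etc/ssh" would then try the copy 20 times with a 30-second limit each. Note that this only kills the local scp; the leftover processes on node0000 would still need to be cleaned up by hand.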
Looking for another way to skin the cat I have tried:
root@uinta /etc/ssh # cat /etc/ssh/ssh_known_hosts | ssh -f -n node0000
'cat > /etc/ssh/ssh_known_hosts'
This gives me a much lower failure rate, ~1%, but eventually it hangs and
on node0000 we see:
-bash-3.00# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1647 976 0 10:19 ? 00:00:00 sshd: root@notty
root 1651 1647 0 10:19 ? 00:00:00 [bash] <defunct>
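To pick the defunct processes and their parents out of a longer ps listing, a filter along these lines may help. This is only a sketch: the function name find_defunct is hypothetical, and it assumes a procps-style ps where a process state beginning with Z marks a zombie.

```shell
# Hypothetical filter: read "PID PPID STAT COMMAND" lines on stdin,
# skip the header, and print the PID and PPID of each zombie.
find_defunct() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2 }'
}

# Intended use on the node (assumes procps ps):
#   ps -eo pid,ppid,stat,comm | find_defunct
```

The second column of the output is the parent sshd, which is the process that has to be killed to release the hung client.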
Following what advice I could google up I've added
shopt -s huponexit
to .bashrc and .bash_profile. This seems to have increased the
successful connection/disconnection rate a little, up to the 50%
mentioned above. Prior to this it almost always hung on this particular
cluster.
Any suggestions or tips would be greatly appreciated.
Thanks,
jbh