SCIENTIFIC-LINUX-USERS Archives

February 2006

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:	SSH/SCP hanging on exit.
From:	John Hanks <[log in to unmask]>
Reply To:	[log in to unmask]
Date:	Tue, 14 Feb 2006 10:23:11 -0700
Content-Type:	text/plain
Parts/Attachments:	text/plain (48 lines)

On several of my clusters I am seeing a problem with ssh and scp hanging
on exit. Here's an example:

root@uinta ~ # scp /etc/ssh/ssh_known_hosts node0000:/etc/ssh
ssh_known_hosts                          100%  633KB 632.5KB/s   00:00

[program just hangs here]

Meanwhile, on node0000 (with some editing for brevity):

-bash-3.00# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root      1613   976  0 10:12 ?        00:00:00 sshd: root@notty
root      1617  1613  0 10:12 ?        00:00:00 [scp] <defunct>
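The listing above shows the scp child in state <defunct> (a zombie) whose parent is still the sshd session process, i.e. sshd has not yet reaped it. As a minimal sketch for spotting such zombie/parent pairs on a node without eyeballing ps -ef, assuming Linux's /proc/<pid>/stat layout (find_zombies is a hypothetical helper name, not an existing tool):

```python
import os


def find_zombies(proc_root="/proc"):
    """Return (pid, ppid, comm, parent_comm) for every zombie process.

    Linux-specific: parses /proc/<pid>/stat. The comm field is wrapped in
    parentheses and may contain spaces, so we split on the last ')'.
    """
    def read_stat(pid):
        try:
            with open(os.path.join(proc_root, str(pid), "stat")) as f:
                data = f.read()
        except OSError:
            return None  # process vanished while we were scanning
        comm = data[data.index("(") + 1 : data.rindex(")")]
        fields = data[data.rindex(")") + 2 :].split()
        state, ppid = fields[0], int(fields[1])
        return comm, state, ppid

    zombies = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        info = read_stat(int(name))
        if info is None or info[1] != "Z":
            continue  # not a zombie (or already gone)
        comm, _, ppid = info
        parent = read_stat(ppid)
        parent_comm = parent[0] if parent else "?"
        zombies.append((int(name), ppid, comm, parent_comm))
    return zombies


if __name__ == "__main__":
    for pid, ppid, comm, pcomm in find_zombies():
        print(f"zombie {pid} ({comm}) <- parent {ppid} ({pcomm})")
```

On a hung node this would be expected to report the scp (or bash) zombie with sshd as its parent; note that the parent's comm is plain "sshd", since the "sshd: root@notty" title lives in the command line, not in comm.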

If I kill the sshd process (1613) then things continue normally on
uinta. If I CTRL-C the scp process on uinta then I am left with a
defunct process and the sshd process it is connected to on node0000. The
hang rate for these scp attempts is roughly 50%, and which attempts hang
appears to be random.
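To put a number like "roughly 50%" on a firmer footing, the copy can be repeated under a timeout and the hangs counted. A minimal sketch, assuming Python 3 is available on the head node; measure_hang_rate and the 30-second threshold are hypothetical choices, not part of any existing tool. Killing a hung local scp this way may, as described above, still leave the sshd and defunct child behind on the node.

```python
import subprocess


def measure_hang_rate(cmd, attempts=20, timeout=30.0):
    """Run `cmd` repeatedly and count runs that exceed `timeout` seconds.

    subprocess.run() kills the child when the timeout expires, so one hung
    attempt does not block the loop. Returns (hangs, failures, attempts),
    where `failures` counts runs that finished with a non-zero status.
    """
    hangs = failures = 0
    for _ in range(attempts):
        try:
            result = subprocess.run(
                cmd,
                stdin=subprocess.DEVNULL,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            hangs += 1  # treated as a hang; the child has been killed
            continue
        if result.returncode != 0:
            failures += 1
    return hangs, failures, attempts


if __name__ == "__main__":
    # Mirrors the copy above; host and paths are the ones from this message.
    cmd = ["scp", "/etc/ssh/ssh_known_hosts", "node0000:/etc/ssh"]
    hangs, failures, n = measure_hang_rate(cmd, attempts=20, timeout=30.0)
    print(f"{hangs}/{n} hung, {failures}/{n} exited non-zero")
```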

Looking for another way to skin the cat, I have tried:

root@uinta /etc/ssh # cat /etc/ssh/ssh_known_hosts | ssh -f -n node0000 'cat > /etc/ssh/ssh_known_hosts'

This gives me a much lower failure rate, ~1%, but eventually it hangs,
and on node0000 we see:

-bash-3.00# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root      1647   976  0 10:19 ?        00:00:00 sshd: root@notty
root      1651  1647  0 10:19 ?        00:00:00 [bash] <defunct>
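One thing worth checking with that pipeline: per the OpenSSH ssh(1) manual, -n redirects stdin from /dev/null and -f implies -n, so with those flags the piped file contents are not forwarded and the remote cat reads an empty stdin. A minimal sketch of the same stdin-based copy without -f/-n, run under a timeout so a hang is detected rather than waited on. build_push_argv and push_file are hypothetical names, and the remote command assumes a POSIX shell on the node. This only detects the hang; it does not address its cause.

```python
import shlex
import subprocess


def build_push_argv(host, remote_path, ssh="ssh"):
    """argv for: ssh HOST 'cat > REMOTE_PATH' (no -n/-f, so stdin is forwarded)."""
    return [ssh, host, "cat > " + shlex.quote(remote_path)]


def push_file(argv, local_path, timeout=30.0):
    """Feed local_path to argv's stdin; return the exit status, or None on a hang."""
    with open(local_path, "rb") as src:
        try:
            result = subprocess.run(argv, stdin=src, timeout=timeout)
        except subprocess.TimeoutExpired:
            return None  # child was killed after `timeout` seconds
    return result.returncode


if __name__ == "__main__":
    argv = build_push_argv("node0000", "/etc/ssh/ssh_known_hosts")
    status = push_file(argv, "/etc/ssh/ssh_known_hosts")
    print("hung" if status is None else f"exit status {status}")
```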

Following what advice I could google up, I've added

shopt -s huponexit

to .bashrc and .bash_profile. This seems to have increased the
successful connection/disconnection rate a little, up to the 50%
mentioned above. Prior to this it almost always hung on this particular
cluster.

Any suggestions or tips would be greatly appreciated.

Thanks,

jbh
