Date: Thu, 19 Jul 2007 07:57:39 +0800
Content-Type: text/plain

Pann McCuaig wrote:
> Greetings!
>
> I've just added a new node to our computing cluster and it's exhibiting
> some odd behavior. I'm afraid the background information is rather long,
> but I don't want to leave anything out that may be useful.
>
> Our cluster has a single node available on the public network (ssh only)
> and eight (now nine) additional nodes on a private network (accessible
> only via ssh from the login node).
>
> The original nodes are eight Sun V20z dual-Opteron boxes and one Sun
> V40z quad-Opteron box. RAM varies from 4G to 32G.
>
> We recently added a new node based on a Penguin Altus 600 box with two
> dual-core Opterons and 16G of RAM.
>
> One monitoring setup I use combines htop (from dag) with screen. I log
> into the login node, run screen, and start htop; then I press ^Ac to open
> a new window, ssh to one of the compute nodes, and run htop there. I
> repeat until htop is running on every node, and use ^An and ^Ap to move
> between windows. I can detach the screen session and re-attach it (from
> anywhere) whenever I want a quick look at what's going on.
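
The window-per-node layout described above can also be built from a config file passed to screen with -c, instead of typing ^Ac and ssh once per node. This is a minimal sketch, assuming placeholder hostnames (node1, node2, ...) and a helper function of my own invention, cluster_screenrc, which is not part of screen. It only prints the config, so you can inspect it before using it.

```shell
#!/bin/sh
# Hypothetical helper (not part of screen): print a screenrc that recreates
# the layout described above. The first window runs htop on the login node
# itself; each further window runs "ssh -t NODE htop" for one compute node.
# The node names passed in are placeholders for your real hostnames.
cluster_screenrc() {
    # htop on the local (login) node, in a window titled "login".
    printf 'screen -t login htop\n'
    for n in "$@"; do
        # ssh -t forces a pseudo-terminal, which htop needs for its display.
        printf 'screen -t %s ssh -t %s htop\n' "$n" "$n"
    done
}
```

Usage would be something like: cluster_screenrc node1 node2 node3 > ~/.screenrc-cluster, then screen -c ~/.screenrc-cluster. Detaching and re-attaching with screen -r works as before. Because -c replaces ~/.screenrc, add a "source" line for your usual file if you want its settings too.
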
>
> I do this as a normal user.
>
> When I follow this procedure and start htop on the new node, screen
> consumes 20-30% CPU and the htop window never updates. Plain top runs fine.
>
> Furthermore, if I just ssh from the login node to the new compute node
> (screen not involved), htop behaves normally.
>
> And further-furthermore, if I follow the original procedure but, in the
> screen window on the new node, su to root before running htop, htop
> behaves normally.
>
> Any ideas, list-dwellers?
Not really, but from within screen, try:

    for n in node1 node2 node3 node4 ... ; do screen -t "$n" ssh -t "$n" htop; done

It's shorter, and since it starts the windows differently, it just might get
node9 to behave better.
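
The loop above only does what you want when it runs inside an existing screen session. Outside one, each "screen -t ..." call starts a separate new session and waits for it to end or be detached before the loop moves on. A minimal sketch of a guard, assuming a hypothetical function name (open_htop_windows), checks the STY variable that screen exports to the windows it starts.

```shell
#!/bin/sh
# Hypothetical wrapper around the loop above. screen sets STY in the
# environment of every window it starts, so an empty STY means we are not
# inside a screen session. Outside one, each "screen -t ..." would start a
# separate new session and wait for it to end or be detached.
open_htop_windows() {
    if [ -z "${STY:-}" ]; then
        echo "open_htop_windows: run this from inside screen" >&2
        return 1
    fi
    for n in "$@"; do
        # Open a new window in the current session, titled after the node;
        # ssh -t gives htop the terminal it needs.
        screen -t "$n" ssh -t "$n" htop
    done
}
```

From a shell inside screen, open_htop_windows node1 node2 node3 opens one titled window per node, the same as the one-liner.
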
--
Cheers
John
Please do not reply off-list