Date: Wed, 18 Jul 2007 15:30:36 -0400
Greetings!
I've just added a new node to our computing cluster and it's exhibiting
some odd behavior. I'm afraid the background information is rather long,
but I don't want to leave anything out that may be useful.
Our cluster has a single node available on the public network (ssh only)
and eight (now nine) additional nodes on a private network (accessible
only via ssh from the login node).
The original nodes are eight Sun V20z dual-Opteron boxes and one Sun
V40z quad-Opteron box. RAM varies from 4G to 32G.
We recently added a new node based on a Penguin Altus 600 box with two
dual-core Opterons and 16G of RAM.
For monitoring, I use htop (from dag) together with screen. I log into the
login node, run screen, run htop, and then ^Ac and ssh to one of the
compute nodes and run htop. I repeat until I have htop running on all
nodes and can ^An and ^Ap to move around. I can detach the screen and
re-attach it (from anywhere) whenever I want to have a quick look at
what's going on.
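For anyone who wants to reproduce this, the manual steps above
correspond roughly to the scripted sketch below. It is not what I
actually type (I create the windows by hand with ^Ac), and the session
name and node names are hypothetical placeholders. By default it only
prints the commands it would run.

```shell
#!/bin/sh
# Scripted sketch of the manual screen + htop setup described above.
# SESSION and NODES are hypothetical placeholders, not real host names.
SESSION=${SESSION:-mon}
NODES=${NODES:-"node1 node2"}
# RUN defaults to "echo" (dry run: only print the commands).
# Set RUN to the empty string to actually execute them.
RUN=${RUN-echo}

build_monitor() {
    # Window 0: a detached session running htop on the login node.
    $RUN screen -dmS "$SESSION" htop
    # One more window per compute node; ssh -t forces a tty for htop.
    for node in $NODES; do
        $RUN screen -S "$SESSION" -X screen -t "$node" ssh -t "$node" htop
    done
}

build_monitor
# Afterwards, attach from anywhere with: screen -r "$SESSION"
```

Run as-is it is a dry run; with RUN set to the empty string it builds
the session, after which ^An and ^Ap move between the windows as usual.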
I do this as a normal user.
When I follow this procedure and start htop on the new node, screen
consumes 20-30% CPU and the display never updates. top runs fine.
Furthermore, if I just ssh from the login node to the new compute node
(screen not involved), htop behaves normally.
And further-furthermore, if I use the original scenario, open a screen
on the new node, su to root and run htop, htop behaves normally.
Any ideas, list-dwellers?
Cheers,
Pann
--
Pann McCuaig 212-854-8689
Systems Coordinator, Economics Department, Columbia University
Department Computing Resources:
http://www.columbia.edu/cu/economics/computing/