SCIENTIFIC-LINUX-USERS Archives

February 2006

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

From: John Hearns
Date: Wed, 8 Feb 2006 13:50:02 +0000

On Wed, 2006-02-08 at 13:28 +0100, Bruce Becker wrote:
> Hi Scientific-Linux-users
> 
> We have a small cluster of Sun Fire v20z's running SL 4.2 x86_64 here
> in Cape Town. These machines have service processors with dedicated
> ethernet ports, and of course an array of sensors in them that can be
> accessed using IPMI. I have installed the drivers on the head node to
> be able to use IPMI over the LAN ports
We have many Sun v20z's in service (easily well over 1000), and have set
up the service processors as you describe.
We configure each service processor with a network user and an ssh key,
and enable IPMI over LAN and serial port redirection, using an in-house
expect script.
I've worked quite a lot with these processors, so I'll be happy to give
you any hints and tips.
We also use the function in the SP which lists the platform MAC
addresses, which is invaluable for setting up the DHCP information for
the main nodes.
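
As a rough illustration of that last step, here is a minimal Python
sketch that turns a list of (hostname, MAC, IP) entries into host
stanzas. It assumes ISC dhcpd syntax, which may not be what you run, and
the hostname, MAC address and IP below are made up for the example. How
you collect the MAC addresses from the SP is left out.

```python
def dhcpd_host_entry(hostname, mac, ip):
    """Return one ISC dhcpd 'host' stanza (assumed DHCP server) as text."""
    return (
        f"host {hostname} {{\n"
        f"    hardware ethernet {mac.lower()};\n"
        f"    fixed-address {ip};\n"
        f"}}\n"
    )


def dhcpd_host_entries(nodes):
    """Join stanzas for an iterable of (hostname, mac, ip) tuples."""
    return "\n".join(dhcpd_host_entry(h, m, i) for h, m, i in nodes)


if __name__ == "__main__":
    # Hypothetical node data, for illustration only.
    nodes = [("node01", "00:09:3D:00:AA:01", "10.0.0.1")]
    print(dhcpd_host_entries(nodes))
```

The output can be pasted into, or included from, dhcpd.conf.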


> I would like to set up a monitoring and reporting (of sensor data)
> system on these machines, apart from the usual remote management stuff
> (booting, changing BIOS settings, etc). I would like to know if anyone
> has done this before and of course what advice the community can give
> on this. What framework are people using ... I had BigBrother or PCP
> in mind but I'm only vaguely familiar with these and I'm not sure if
> they can be configured to monitor data with IPMI.
We have our own cluster monitoring appliance for this, which is a
dedicated server: http://www.streamline-computing.com/?page=73
There is a web interface to the IPMI data etc.
We also supply scripts to power nodes up or down in sequence, and to
read out the sensors.
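
Our scripts are not shown here, but the idea of powering nodes up in
sequence can be sketched in Python roughly as follows. This is not our
actual script. It assumes ipmitool is installed on the head node, that
IPMI over LAN is enabled on each service processor, and that the
password is supplied through the IPMI_PASSWORD environment variable
(ipmitool's -E option). The host names, user name and delay are
hypothetical.

```python
import subprocess
import time


def power_command(sp_host, user, action="on"):
    """Build the ipmitool argv for a chassis power action on one SP."""
    if action not in ("on", "off", "status"):
        raise ValueError(f"unsupported power action: {action}")
    # -E makes ipmitool read the password from IPMI_PASSWORD.
    return ["ipmitool", "-I", "lan", "-H", sp_host, "-U", user,
            "-E", "chassis", "power", action]


def power_sequence(sp_hosts, user, action="on", delay=5.0,
                   run=subprocess.run, sleep=time.sleep):
    """Apply the action to each SP in order, pausing between nodes."""
    for index, sp_host in enumerate(sp_hosts):
        if index:
            sleep(delay)  # stagger nodes to avoid a power surge
        run(power_command(sp_host, user, action), check=True)


if __name__ == "__main__":
    # Hypothetical SP host names and user.
    power_sequence(["node01-sp", "node02-sp"], user="ipmiuser")
```

The delay between nodes is the point of doing it in sequence: it keeps
a rack from drawing its full inrush current at once.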

If I were you, I would consider either Ganglia for the monitoring,
or, on a new install, maybe Nagios: www.nagios.org

You can also run a periodic cron job on the nodes themselves to read the
IPMI information locally.
If using Nagios, you can use NRPE (Nagios Remote Plugin Executor).
If using Ganglia, just run a cron job to report the variables.
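
To make the two reporting routes concrete, here is a small Python
sketch. The first function follows the standard Nagios plugin
convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN,
plus one line of status text), so it could be run through NRPE. The
second builds a command line for Ganglia's gmetric tool, which a cron
job could execute. The metric name and the thresholds are assumptions
for the example, not recommended values.

```python
import sys

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_temperature(value, warn, crit):
    """Map a temperature reading to a (Nagios exit code, message) pair."""
    if value is None:
        return UNKNOWN, "TEMP UNKNOWN - no reading available"
    if value >= crit:
        return CRITICAL, f"TEMP CRITICAL - {value:.1f} C"
    if value >= warn:
        return WARNING, f"TEMP WARNING - {value:.1f} C"
    return OK, f"TEMP OK - {value:.1f} C"


def gmetric_command(name, value, units):
    """Build a gmetric argv that publishes one float metric to Ganglia."""
    return ["gmetric", f"--name={name}", f"--value={value}",
            "--type=float", f"--units={units}"]


if __name__ == "__main__":
    reading = 45.0  # hypothetical value; in practice read it via IPMI
    code, message = check_temperature(reading, warn=60.0, crit=75.0)
    print(message)
    sys.exit(code)
```

For Ganglia, the cron job would pass the list from gmetric_command() to
subprocess.run() on a node where gmond is running.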

You can either use ipmitool and parse the output from sensors,
or you can use "ssh sphostname" to run the sensors command,
and parse the output of that.
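
For the ipmitool route, the parsing might look like the Python sketch
below. It relies on "ipmitool sensor" printing one sensor per line with
fields separated by "|" (name, reading, units, status, then
thresholds). The sensor names in the sample are invented and will
differ on a real v20z. The output of the SP's own sensors command has a
different layout and would need its own parser.

```python
import subprocess


def parse_ipmitool_sensors(text):
    """Parse `ipmitool sensor` output into {name: (value, units, status)}.

    Sensors without a numeric reading ('na' or discrete values such
    as '0x0') are skipped.
    """
    readings = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # not a sensor row
        name, value, units, status = fields[:4]
        try:
            readings[name] = (float(value), units, status)
        except ValueError:
            continue  # no numeric reading for this sensor
    return readings


def read_local_sensors():
    """Run ipmitool on this node (needs the IPMI drivers and root)."""
    result = subprocess.run(["ipmitool", "sensor"], capture_output=True,
                            text=True, check=True)
    return parse_ipmitool_sensors(result.stdout)


# Invented sample rows in the ipmitool layout, for illustration only.
SAMPLE = """\
CPU0 Temp    | 45.000   | degrees C | ok | na | na | na | 70.000 | 75.000 | na
Fan 1        | 8400.000 | RPM       | ok | na | 1000.000 | na | na | na | na
Ambient Temp | na       | degrees C | na | na | na | na | na | na | na
Intrusion    | 0x0      | discrete  | 0x0100 | na | na | na | na | na | na
"""

if __name__ == "__main__":
    for name, (value, units, status) in parse_ipmitool_sensors(SAMPLE).items():
        print(f"{name}: {value} {units} [{status}]")
```

The resulting dictionary can be fed straight into a Nagios check or a
Ganglia cron job.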
