LISTSERV - SCIENTIFIC-LINUX-USERS Archives

August 2007

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:	Re: torque vs sockets in 4.4
From:	Steve Traylen <[log in to unmask]>
Reply To:	Steve Traylen <[log in to unmask]>
Date:	Tue, 7 Aug 2007 09:59:06 +0200
Content-Type:	multipart/signed
Parts/Attachments:	text/plain (1268 bytes) , smime.p7s (1609 bytes)


On Aug 7, 2007, at 4:56 AM, Miles O'Neal wrote:

> We recently migrated from PBS to torque, and most of our
> systems are now running 4.4.  The torque server (a Core2
> Duo at 2.4GHz) is only handling about 3x the jobs our 300MHz
> Sun Ultra 5 could handle before bogging down horribly.  This
> seems a bit odd.
>

How many nodes and jobs?

> Watching the server logs, it seems there's a lot of time
> spent waiting for replies on sockets, though it's not clear
> whether it's on the same system between the scheduler and
> batch server, or between the batch server and client node
> processes (pbs_moms).
>

Do consider changing the values as described here:
http://www.clusterresources.com/torquedocs21/a.flargeclusters.shtml

In particular, for large farms you really need to have poll_jobs set
to true and to increase job_stat_rate.
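
As a rough sketch of what that change looks like, the snippet below only
builds and prints the qmgr commands for the two server attributes named
above; it does not run them. The helper name qmgr_commands and the value
300 for job_stat_rate are assumptions made for illustration, so pick a
rate that suits your farm from the page linked above.

```python
import shlex


def qmgr_commands(poll_jobs=True, job_stat_rate=300):
    """Build (but do not run) qmgr commands for two server attributes.

    The default of 300 seconds is an illustrative assumption, not a
    recommendation taken from this thread.
    """
    settings = {
        "poll_jobs": "True" if poll_jobs else "False",
        "job_stat_rate": str(int(job_stat_rate)),
    }
    # shlex.quote wraps each "set server ..." directive so that it is
    # passed to qmgr -c as a single shell argument.
    return [
        "qmgr -c " + shlex.quote(f"set server {name} = {value}")
        for name, value in settings.items()
    ]


if __name__ == "__main__":
    for command in qmgr_commands():
        print(command)
```

Review the printed lines, then run them as a torque manager on the
server host.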

> We're beginning to wonder if it's OS-related.  Torque uses
> a lot of sockets, and sets them up and tears them down at a
> hefty rate.  We have the number set to 16K for the scheduler
> and server processes via ulimit, but we aren't getting much
> above 1400 between the two processes.
>
> Is anyone aware of an issue in 4.4 that might affect this?
>
> Thanks,
> Miles
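
To compare the descriptor limit with what a process really holds open,
something like the sketch below may help. It assumes a Linux /proc
filesystem; the helper names are made up for this example, and reading
another user's /proc/<pid>/fd normally needs root. Note that fd_limits()
reports the limit of the Python process itself, so run it from the same
environment that starts the torque daemons.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count the descriptors a process holds, via /proc/<pid>/fd (Linux)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"limit: soft={soft} hard={hard}")
    print(f"open descriptors in this process: {open_fd_count()}")
```

Passing the pid of pbs_server or of the scheduler to open_fd_count()
shows how close each one gets to its limit.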

-- 
Steve Traylen
Work Calendar: http://tinyurl.com/22lw9o
[log in to unmask]
CERN, IT-GD-OPS.
