Steve Traylen said...
|How many nodes and jobs?
About 325 nodes. With just one layer of queues,
it slows down drastically at 1500 jobs or so. With
routing queues it can slow down at a few hundred
and gets unusable by 1200 jobs queued.
...
|Do consider changing the values as described here.
|http://www.clusterresources.com/torquedocs21/a.flargeclusters.shtml
|
|in particular for large farms you really need to have poll_jobs set
|to true and increase the job_stat_rate.
We've played with everything there quite a bit.
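For reference, the two settings mentioned above are server
attributes set through qmgr. A minimal sketch, fed to qmgr on
stdin; the job_stat_rate value is only an illustrative
assumption, not a figure from this thread:

```
set server poll_jobs = True
set server job_stat_rate = 300
```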
Today we're going to try pulling some nodes and
setting up a separate server on SL3 to see if the
OS is involved or not.
Thanks,
Miles