Date: Mon, 6 Aug 2007 21:56:32 -0500
We recently migrated from PBS to Torque, and most of our
systems are now running 4.4. The Torque server (a Core2
Duo at 2.4GHz) is only handling about 3x the jobs our 300MHz
Sun Ultra 5 could handle before bogging down horribly. This
seems a bit odd.
Watching the server logs, it seems there's a lot of time
spent waiting for replies on sockets, though it's not clear
whether it's on the same system between the scheduler and
batch server, or between the batch server and client node
processes (pbs_moms).
We're beginning to wonder if it's OS-related. Torque uses
a lot of sockets, and sets them up and tears them down at a
hefty rate. We have the open-file limit set to 16K for the
scheduler and server processes via ulimit, but we aren't
getting much above 1400 open sockets between the two processes.
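For anyone wanting to compare numbers, here is a rough sketch of how the limit and the actual descriptor count could be checked. It assumes a Linux-style /proc filesystem, which may not match the systems described here, and the helper names fd_limits and count_open_fds are made up for illustration. Run as is, it reports on the Python process itself; to inspect the server or scheduler you would pass that daemon's PID, with enough privilege to read its /proc entry.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid="self"):
    """Count open descriptors of a process by listing /proc/<pid>/fd.

    Assumption: Linux-style /proc. Sockets, files and pipes are all
    counted together, since they share one descriptor table.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("open descriptors:", count_open_fds())
```

The limit set by ulimit applies per process and is inherited at start-up, so a daemon launched from an init script may not see a value set in an interactive shell.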
Is anyone aware of an issue in 4.4 that might affect this?
Thanks,
Miles