Date: | Tue, 22 May 2007 13:10:20 -0400 |
Hello,
I am trying to understand some weird performance characteristics
on a newly purchased blade (see statistics below).
The hardware is an HP BL465 with 2 dual core AMD HE 2216 processors.
This is the first AMD and first 64-bit processor we have bought.
I installed SL44 x86_64 and we did some performance tests.
When running a single job (a compute-bound Monte Carlo with HBOOK output),
the job took about twice as long as on our
Intel-based blade. Although this difference
could be attributed to the difference in
processors, running several single
jobs in a row produced rather erratic results:
run times differed by 200-300 seconds on a 900-second job.
Some were comparable to the 32-bit processor, some were not.
Also, running 4 copies of the same job in parallel
produced results which were almost twice as fast!
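For anyone wanting to reproduce the run-to-run variation, here is a minimal sketch of how repeated sequential runs could be timed. It is not the procedure used for the numbers in this message: the command is a placeholder, the helper names `time_runs` and `spread` are made up for illustration, and it records wall-clock time rather than CPU time.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, repeats):
    """Run cmd sequentially `repeats` times; return wall-clock seconds per run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the job exits non-zero
        durations.append(time.perf_counter() - start)
    return durations


def spread(durations):
    """Summarize how much the run times vary."""
    lo, hi = min(durations), max(durations)
    return {
        "min": lo,
        "max": hi,
        "mean": statistics.mean(durations),
        "range": hi - lo,
    }


# Placeholder command (assumption): substitute the real job here.
cmd = [sys.executable, "-c", "sum(range(100000))"]
durations = time_runs(cmd, 3)
summary = spread(durations)
print("runs:", ["%.3f" % d for d in durations])
print("range: %.3f sec" % summary["range"])
```

A large "range" relative to "mean" across identical runs would correspond to the erratic behaviour described above.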
I then (for fun) installed SL43 x86_64. This produced results
quite different from those on SL44 and more in line with
our 32-bit blades.
Below is a sample of the CPU statistics.
We first ran the existing 32-bit executable.
We then recompiled and ran the 64-bit executable.
Many of our jobs cannot be recompiled (they won't compile on gcc 3.4 or have
missing libraries), so we would really like to understand this performance
discrepancy for 32-bit executables on SL44.
32-bit executable, single job
          SL44       SL43
          906 sec    556 sec

32-bit executable, 4 jobs in parallel
          SL44       SL43
job 1     452 sec    446 sec
job 2     446 sec    442 sec
job 3     445 sec    444 sec
job 4     448 sec    446 sec

64-bit executable, single job
          510 sec    497 sec
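To make the size of the discrepancy explicit, the 32-bit figures above can be turned into ratios. This is only a small illustrative calculation on the numbers in the tables; the variable and function names are made up for the example.

```python
# Timings in seconds, copied from the 32-bit tables above.
single = {"SL44": 906, "SL43": 556}
parallel = {
    "SL44": [452, 446, 445, 448],
    "SL43": [446, 442, 444, 446],
}


def mean(values):
    return sum(values) / len(values)


def summarize(single, parallel):
    """Per release: mean parallel job time and single-job / parallel-job ratio."""
    out = {}
    for release, t_single in single.items():
        t_par = mean(parallel[release])
        out[release] = {
            "parallel_mean": t_par,
            "single_over_parallel": t_single / t_par,
        }
    return out


summary = summarize(single, parallel)
sl44_over_sl43 = single["SL44"] / single["SL43"]

print("single job, SL44 / SL43: %.2f" % sl44_over_sl43)
for release, row in summary.items():
    print("%s: parallel mean %.2f sec, single / parallel %.2f"
          % (release, row["parallel_mean"], row["single_over_parallel"]))
```

On these numbers the single SL44 job takes about 1.63 times as long as on SL43, and about 2.02 times as long as each of the four parallel jobs on SL44 (about 1.25 times on SL43).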
The 64-bit executable seems to be a little more predictable.
So, does anyone have any idea:
1. Why is there such a difference in performance between SL44 and SL43?
(Why does SL44 produce much slower results on a single job?)
2. Why does running 4 jobs in parallel produce faster results than
a single job? One would think jobs running in parallel
would produce slightly slower performance.
3. Why does running 4 jobs in parallel on SL44 produce much
faster results than a single job (452 sec vs 900 sec)?
4. Should we not be running our 32-bit executables on an
SLxx x86_64 installation?
I have not yet tried installing SL44 (or SL43) x86 to check the
performance. Should I?
Thanks for any insight or help
Regards
Rochelle Lauer
Yale University Physics