Date: | Tue, 22 May 2007 13:20:43 -0500 |
Content-Type: | TEXT/PLAIN |
On Tue, 22 May 2007, rochelle lauer wrote:
> Hello,
>
> I am trying to understand some weird performance characteristics
> on a newly purchased blade (see statistics below).
>
> The hardware is an HP BL465 with 2 dual core AMD HE 2216 processors.
> This is the first AMD and first 64-bit processor we have bought.
Is this a NUMA-based system?
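
One quick way to check is to count the NUMA nodes the kernel exposes. The following is a minimal sketch, assuming the running kernel publishes nodes under /sys/devices/system/node (that path and the helper name numa_nodes are assumptions, not something from the original report). More than one node suggests that memory placement could affect single-job timings.

```python
import os
import re


def numa_nodes(base="/sys/devices/system/node"):
    """Return the sorted NUMA node ids found under `base`.

    Returns an empty list if the directory is missing, which is assumed
    here to mean the kernel does not expose NUMA topology this way.
    """
    try:
        entries = os.listdir(base)
    except OSError:
        return []
    ids = []
    for name in entries:
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    nodes = numa_nodes()
    if len(nodes) > 1:
        print("NUMA nodes visible:", nodes)
    else:
        print("No more than one NUMA node visible:", nodes)
```
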
>
> I installed SL44 x86_64 and we did some performance tests.
>
> When running a single job (compute bound monte-carlo with HBOOK output)
> the performance was about twice as slow as running on our
> Intel based blade.
How does the amount of memory compare with the Intel-based tests?
> Although this difference could be attributed to difference in
> processors, running several single jobs in a row produced rather
> erratic results...
> 200-300 seconds different on a 900 second job.
> Some were comparable to the 32 bit processor, some were not.
>
> Also, running 4 of the same jobs in parallel
> produced results which were almost twice as fast !
>
> I then (for fun) installed SL43 x86_64 . This produced results
> quite different than those on SL44 and more compatible with
> our 32 bit blades.
>
> Below is a sample of the CPU statistics
>
> We first ran the existing 32 bit executable.
>
> We then recompiled and ran the 64 bit executable.
>
> Many of our jobs cannot be recompiled (won't compile on gcc 3.4 or have
> missing libraries) so we would really like to understand this performance
> discrepancy on 32 bit executables and SL44.
>
> 32 bit executable single job
>
>          SL44      SL43
>          906 sec   556 sec
>
> 32 bit executable 4 jobs in parallel
>
>          SL44      SL43
>
> job 1    452 sec   446 sec
> job 2    446 sec   442 sec
> job 3    445 sec   444 sec
> job 4    448 sec   446 sec
>
>
> 64 bit executable single job
>
> 510 sec 497 sec
> The 64 bit executable seems to be a little more predictable
>
>
> So, does anyone have any idea
>
> 1. Why such a difference in performance between SL44 and SL43 (Why does
> SL44 produce much slower results on a single job)
There is not enough information to determine this.
The biggest difference between SL43 and SL44 is the set of kernel changes.
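
To put numbers on the gap, the ratios implied by the 32 bit timings quoted above can be worked out directly. This is only arithmetic on the reported figures; the variable names are illustrative.

```python
# Timings in seconds, copied from the 32 bit results quoted in this message.
single_32 = {"SL44": 906, "SL43": 556}
parallel_32 = {
    "SL44": [452, 446, 445, 448],
    "SL43": [446, 442, 444, 446],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# How much slower a single job is on SL44 than on SL43.
sl44_vs_sl43_single = single_32["SL44"] / single_32["SL43"]

# How much slower a single job is than the average of 4 parallel jobs.
single_vs_parallel = {
    release: single_32[release] / mean(parallel_32[release])
    for release in single_32
}

if __name__ == "__main__":
    print("single job, SL44 / SL43: %.2f" % sl44_vs_sl43_single)
    for release, ratio in sorted(single_vs_parallel.items()):
        print("%s single / parallel mean: %.2f" % (release, ratio))
```

The parallel timings are nearly identical on both releases, so the discrepancy is confined to the single-job case.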
>
> 2. Why running 4 jobs in parallel produces faster results than
> a single job ? One would think jobs running in parallel
> would produce slightly slower performance.
That depends on what the jobs are doing.
>
> 3. Why running 4 jobs in parallel on SL44 produces much
> faster results (900 sec vs 452 sec) .
>
I suggest you try some of the performance tools, such as oprofile and
vmstat, to help determine what is going on.
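
Because the single-job timings are erratic, it may also help to quantify the run-to-run spread while those tools are collecting data. The following is a minimal sketch of a repeat-timing harness; the function names time_runs and summarize are hypothetical, and the command to run is whatever job is under test (the default below is only a placeholder that starts and exits a Python interpreter).

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, repeats=5):
    """Run `cmd` (a list of arguments) `repeats` times, one after another.

    Returns the wall-clock duration of each run in seconds.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return min, max, mean, and spread (max - min) of the timings."""
    return {
        "min": min(timings),
        "max": max(timings),
        "mean": statistics.mean(timings),
        "spread": max(timings) - min(timings),
    }


if __name__ == "__main__":
    # Placeholder command; replace with the real job on the command line.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    stats = summarize(time_runs(cmd))
    for key in ("min", "max", "mean", "spread"):
        print("%-6s %.3f sec" % (key, stats[key]))
```

Running the same harness on both releases gives a like-for-like comparison of the spread, which a single timing cannot show.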
> 4. Should we not be running our 32 bit executables with an
> SLxx x86_64 installed ?
> I have not yet tried installing SL44(43) x86 to check the
> performance. Should I ?
>
Most people see a performance improvement running 32-bit executables on a
64-bit OS. This has been seen quite a bit with AMD 64-bit Opteron CPUs
because memory bandwidth is higher on them.
>
>
>
> Thanks for any insight or help
>
> Regards
> Rochelle Lauer
> Yale University Physics
>
>
-Connie Sieh