Date: | Tue, 22 May 2007 13:23:00 -0500 |
Content-Type: | TEXT/PLAIN |
On Tue, 22 May 2007, Connie Sieh wrote:
> On Tue, 22 May 2007, rochelle lauer wrote:
>
>> Hello,
>>
>> I am trying to understand some weird performance characteristics
>> on a newly purchased blade (see statistics below).
>>
>> The hardware is an HP BL465 with 2 dual core AMD HE 2216 processors.
>> This is the first AMD and the first 64-bit processor we have bought.
>
> Is this system a NUMA-based system?
NUMA can be involved in performance issues. If the system has NUMA and it is
enabled, try turning it off and rerunning your tests.
-Connie Sieh
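
One rough way to check whether the kernel sees more than one NUMA node is to
count the node directories under sysfs. This is only a sketch: it assumes the
kernel exposes /sys/devices/system/node (current Linux kernels do; whether the
SL4 kernel does has not been checked here), and the function name numa_nodes
is made up for illustration.

```python
import re
from pathlib import Path


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted NUMA node ids found under sysfs_root, or [] if none.

    Assumption: the kernel lists one directory per node, named node0, node1, ...
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    ids = []
    for entry in root.iterdir():
        match = re.fullmatch(r"node(\d+)", entry.name)
        if match and entry.is_dir():
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    nodes = numa_nodes()
    if len(nodes) > 1:
        print("NUMA appears to be active, nodes:", nodes)
    else:
        print("Single node or no NUMA information found:", nodes)
```

More than one node listed would suggest NUMA is enabled; an empty list or a
single node would suggest it is off or not reported.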
>
>>
>> I installed SL44 x86_64 and we did some performance tests.
>>
>> When running a single job (compute bound monte-carlo with HBOOK output)
>> the performance was about twice as slow as on our
>> Intel-based blade. Although this difference
>> could be attributed to the difference in
>> processors, running several single
>
> How does the amount of memory compare to that in the Intel-based tests?
>
>> jobs in a row produced rather erratic results...
>> a difference of 200-300 seconds on a 900-second job.
>> Some were comparable to the 32 bit processor, some were not.
>>
>> Also, running 4 of the same jobs in parallel
>> produced results which were almost twice as fast !
>>
>> I then (for fun) installed SL43 x86_64 . This produced results
>> quite different from those on SL44 and more comparable to those from
>> our 32-bit blades.
>>
>> Below is a sample of the CPU statistics
>>
>> We first ran the existing 32 bit executable.
>>
>> We then recompiled and ran the 64 bit executable.
>>
>> Many of our jobs cannot be recompiled (won't compile on gcc 3.4 or have
>> missing libraries) so we would really like to understand this performance
>> discrepancy on 32-bit executables and SL44.
>>
>> 32 bit executable single job
>>
>>            SL44       SL43
>>            906 sec    556 sec
>>
>> 32 bit executable 4 jobs in parallel
>>
>>            SL44       SL43
>>
>> job 1      452 sec    446 sec
>> job 2      446 sec    442 sec
>> job 3      445 sec    444 sec
>> job 4      448 sec    446 sec
>>
>>
>> 64 bit executable single job
>>
>> 510 sec 497 sec
>> The 64-bit executable seems to be a little more predictable.
>>
>>
>> So, does anyone have any idea
>>
>> 1. Why is there such a difference in performance between SL44 and SL43?
>> (Why does SL44 produce much slower results on a single job?)
>
> Not enough info to determine this.
> The biggest difference between SL43 and SL44 is that the kernel has
> changed.
>
>>
>> 2. Why does running 4 jobs in parallel produce faster results than
>> a single job? One would think jobs running in parallel
>> would produce slightly slower performance.
>
> That depends on what they are doing.
>
>>
>> 3. Why does running 4 jobs in parallel on SL44 produce much
>> faster results (900 sec vs 452 sec)?
>>
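
For reference, the ratios implied by the 32-bit timings quoted above can be
worked out directly. This is just arithmetic on the posted numbers; the
variable names are made up for illustration.

```python
# 32-bit executable timings in seconds, copied from the message above.
single_32 = {"SL44": 906, "SL43": 556}
parallel_32 = {
    "SL44": [452, 446, 445, 448],
    "SL43": [446, 442, 444, 446],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# How much slower a single 32-bit job is on SL44 than on SL43.
sl44_vs_sl43 = single_32["SL44"] / single_32["SL43"]

# How much slower a single job is than the average of 4 parallel jobs.
single_vs_parallel = {
    release: single_32[release] / mean(parallel_32[release])
    for release in single_32
}

if __name__ == "__main__":
    print(f"single job, SL44 / SL43: {sl44_vs_sl43:.2f}")
    for release, ratio in single_vs_parallel.items():
        print(f"{release}: single / mean parallel: {ratio:.2f}")
```

The parallel timings are nearly identical on both releases, so the anomaly is
confined to the single-job case, and it is much larger on SL44.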
>
> I suggest you try some of the performance tools, such as oprofile and
> vmstat, to help determine what is going on.
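
Before profiling, it may help to pin down how erratic the single-job timings
really are. The sketch below runs a command several times back to back and
reports the spread. It measures wall-clock time, not the CPU time quoted in
the original message, and the command in the example is a placeholder, not
the actual Monte Carlo job. The names time_runs and summarize are
hypothetical.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, repeats=5):
    """Run cmd `repeats` times in a row; return wall-clock seconds per run."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the job exits non-zero
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Return min, max, mean and spread (max - min) of the run times."""
    return {
        "min": min(times),
        "max": max(times),
        "mean": statistics.mean(times),
        "spread": max(times) - min(times),
    }


if __name__ == "__main__":
    # Placeholder workload; substitute the real job's command line here.
    cmd = [sys.executable, "-c", "sum(range(10**6))"]
    for name, value in summarize(time_runs(cmd, repeats=3)).items():
        print(f"{name}: {value:.3f} s")
```

Running `vmstat 1` in a second terminal while this loop runs would show
whether the slow runs coincide with swapping, I/O wait, or idle CPUs.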
>
>> 4. Should we not be running our 32-bit executables on an
>> SLxx x86_64 installation?
>> I have not yet tried installing SL44(43) x86 to check the
>> performance. Should I ?
>>
>
> Most see a performance improvement with 32-bit on a 64-bit OS. This has been
> seen quite a bit with AMD 64-bit Opteron CPUs because the memory bandwidth
> is higher on AMD 64-bit Opteron CPUs.
>
>>
>>
>> Thanks for any insight or help
>>
>> Regards
>> Rochelle Lauer
>> Yale University Physics
>>
>>
> -Connie Sieh
>