On Tue, 22 May 2007, rochelle lauer wrote:

> Hello,
>
> I am trying to understand some weird performance characteristics
> on a newly purchased blade (see statistics below).
>
> The hardware is an HP BL465 with 2 dual core AMD HE 2216 processors.
> This is the first AMD and first 64 bit processor we have bought.

Is this system a NUMA-based system?

> I installed SL44 x86_64 and we did some performance tests.
>
> When running a single job (compute bound monte-carlo with HBOOK output)
> the performance was about twice as slow as running on our
> Intel based blade. Although this difference
> could be attributed to difference in
> processors, running several single

How does the amount of memory compare to the Intel-based tests?

> jobs in a row produced rather erratic results...
> 200-300 seconds different on a 900 second job.
> Some were comparable to the 32 bit processor, some were not.
>
> Also, running 4 of the same jobs in parallel
> produced results which were almost twice as fast !
>
> I then (for fun) installed SL43 x86_64 . This produced results
> quite different than those on SL44 and more compatible with
> our 32 bit blades.
>
> Below is a sample of the CPU statistics
>
> We first ran the existing 32 bit executable.
>
> We then recompiled and ran the 64 bit executable.
>
> Many of our jobs cannot be recompiled (won't compile on gcc 3.4 or have
> missing libraries) so we would really like to understand this performance
> discrepancy on 32 bit executables and SL44.
>
> 32 bit executable single job
>
>                SL44       SL43
>                906 sec    556 sec
>
> 32 bit executable 4 jobs in parallel
>
>                SL44       SL43
>
>      job 1     452 sec    446 sec
>      job 2     446 sec    442 sec
>      job 3     445 sec    444 sec
>      job 4     448 sec    446 sec
>
>
> 64 bit executable single job
>
>                510 sec    497 sec
> The 64 bit executable seems to be a little more predictable
>
>
> So, does anyone have any idea
>
> 1. Why such a difference in performance between SL44 and SL43 (Why does
>    SL44 produce much slower results on a single job)

Not enough info to determine this. The biggest difference between SL43 and
SL44 is that the kernel has changed.

> 2. Why running 4 jobs in parallel produces faster results than
>    a single job ? One would think jobs running in parallel
>    would produce slightly slower performance.

Depends on what they are doing.

> 3. Why running 4 jobs in parallel on SL44 produces much
>    faster results (900 sec vs 452 sec) .

I suggest you try some of the performance tools, such as oprofile and
vmstat, to help determine what is going on.

> 4. Should we not be running our 32 bit executables with an
>    SLxx x86_64 installed ?
>    I have not yet tried installing SL44(43) x86 to check the
>    performance. Should I ?

Most see a performance improvement with 32-bit executables on a 64-bit OS.
This has been seen quite a bit with AMD 64-bit Opteron CPUs because their
memory bandwidth is higher.

> Thanks for any insight or help
>
> Regards
> Rochelle Lauer
> Yale University Physics

-Connie Sieh
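
As a footnote to the timing table quoted above, the ratios behind questions 1 to 3 can be worked out in a few lines. This is only a rough sketch: the numbers are copied from the quoted 32 bit results, while the variable names and the choice to compare the single job against the mean of the four parallel jobs are assumptions made for illustration, not part of the original measurements.

```python
from statistics import mean

# Elapsed seconds copied from the quoted 32 bit timing table.
# The dictionary layout and names are illustrative assumptions.
single_32bit = {"SL44": 906, "SL43": 556}
parallel_32bit = {
    "SL44": [452, 446, 445, 448],
    "SL43": [446, 442, 444, 446],
}

# Question 1: how much slower is a single job on SL44 than on SL43?
single_slowdown = single_32bit["SL44"] / single_32bit["SL43"]

# Questions 2 and 3: compare a single job against the mean per-job time
# when four copies run in parallel on the same release.
parallel_mean = {rel: mean(times) for rel, times in parallel_32bit.items()}
single_vs_parallel = {
    rel: single_32bit[rel] / parallel_mean[rel] for rel in single_32bit
}

print(f"single job, SL44 / SL43: {single_slowdown:.2f}x")
for rel in ("SL44", "SL43"):
    print(
        f"{rel}: parallel mean {parallel_mean[rel]:.2f} sec, "
        f"single / parallel {single_vs_parallel[rel]:.2f}x"
    )
```

On the quoted numbers the single job is about 1.63 times slower on SL44 than on SL43, while the parallel runs are nearly identical on both releases (means of 447.75 and 444.5 seconds). That suggests the anomaly is confined to the single-job case on SL44.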