SCIENTIFIC-LINUX-USERS Archives

September 2010

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Devin Bougie <[log in to unmask]>
Reply To:
Devin Bougie <[log in to unmask]>
Date:
Wed, 8 Sep 2010 16:42:43 -0400
Content-Type:
multipart/mixed
Parts/Attachments:
text/plain (12 kB) , iostat.log (45 kB)
Hi, All.  We are seeing periodic I/O delays on a new large compute node with two Xeon X5670 CPUs (hyper-threaded, for a total of 24 logical processors) and 48 GB of memory.  The system is running the latest kernel on a fully updated x86_64 SL5.5.  So far, my attempts at changing I/O schedulers and BIOS configurations have not helped.  This is a unique system for us, and I have not been able to reproduce the problem on any other x86_64 SL5.5 system.  Various benchmarks we've run (most extensively using Mathematica) show the system performing very well.  So far we've only seen the problem when performing disk I/O.

Please let me know if there is any more information I can provide, and any suggestions would be greatly appreciated.

Many thanks,
Devin

------

Here is an example of how we can reproduce the delays by repeatedly writing modest amounts of data.

------
[dab66@lnx4103 ~]% time dd if=/dev/zero of=/mnt/scratch/test bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.914825 seconds, 1.2 GB/s

real    0m2.037s
user    0m0.000s
sys     0m1.050s
[dab66@lnx4103 ~]% time dd if=/dev/zero of=/mnt/scratch/test bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.903858 seconds, 1.2 GB/s

real    0m1.129s
user    0m0.001s
sys     0m1.125s
[dab66@lnx4103 ~]% time dd if=/dev/zero of=/mnt/scratch/test bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.923554 seconds, 1.2 GB/s

real    0m1.141s
user    0m0.001s
sys     0m1.138s
[dab66@lnx4103 ~]% time dd if=/dev/zero of=/mnt/scratch/test bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.911595 seconds, 1.2 GB/s

real    1m27.200s
user    0m0.001s
sys     0m1.148s

------
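
For anyone who wants to repeat this test unattended, the manual runs above could be wrapped in a small loop. This is only a minimal sketch, assuming bash; the function name `time_runs` and the threshold value are made up for illustration, and wall-clock time is measured in whole seconds with `date +%s`, so it is much coarser than `time`.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original post): run a command N times,
# print the wall-clock seconds for each run, and flag runs over a threshold.
# Usage: time_runs RUNS THRESHOLD_SECONDS COMMAND [ARGS...]
time_runs() {
    local runs=$1 threshold=$2
    shift 2
    local i start end elapsed
    for ((i = 1; i <= runs; i++)); do
        start=$(date +%s)
        "$@" >/dev/null 2>&1          # discard the command's own output
        end=$(date +%s)
        elapsed=$((end - start))
        if ((elapsed > threshold)); then
            echo "run $i: ${elapsed}s SLOW"
        else
            echo "run $i: ${elapsed}s"
        fi
    done
}

# Example using the dd command from this post (writes 1 GiB per run):
# time_runs 10 10 dd if=/dev/zero of=/mnt/scratch/test bs=1M count=1K
```

With a threshold of 10 seconds, a run like the fourth one above (1m27s) would be marked SLOW while the one- and two-second runs would not.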

Here is a look at "vmstat -S M 5" before, during, and after the above tests.

------
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0  46937     35   1184    0    0     4    81   43    7  0  0 98  2  0
 0  0      0  46937     35   1184    0    0     1    18 1017   85  0  0 100  0  0
 0  0      0  46937     35   1184    0    0     0     5 1013   93  0  0 100  0  0
 0  0      0  46937     35   1184    0    0     0    42 1017   84  0  0 100  0  0
 0  1      0  46937     35   1184    0    0     1     2 1005   91  0  0 100  0  0
 0  0      0  46937     35   1184    0    0     0    17 1016   90  0  0 100  0  0
 0  0      0  46937     35   1184    0    0     0     5 1012   89  0  0 100  0  0
 0  0      0  46937     35   1184    0    0     0     5 1010   85  0  0 100  0  0
 0  1      0  46919     35   1186    0    0   596    12 1071  224  0  0 99  1  0
 0  0      0  46919     35   1186    0    0     0    70 1019   89  0  0 100  0  0
 0  0      0  46919     35   1186    0    0     0    11 1015   98  0  0 100  0  0
 0  0      0  46919     35   1186    0    0     0     7 1021  112  0  0 100  0  0
 1  0      0  47780     38    337    0    0     0    31 1035  760  0  2 98  0  0
 0  5      0  46929     38   1135    0    0     0 26431 1040  341  0  1 94  5  0
 0  6      0  46986     38   1081    0    0     2  9840 1032  158  0  0 84 16  0
 0  6      0  47054     39   1013    0    0    65 13113 1052  186  0  0 85 15  0
 0  5      0  47137     39    948    0    0     2 13006 1036  171  0  0 86 14  0
 0  7      0  47208     39    881    0    0     0 13135 1035  161  0  0 86 14  0
 0  6      0  47258     39    833    0    0     0  9736 1031  150  0  0 84 16  0
 0  6      0  47300     39    793    0    0     0  8656 1025  131  0  0 83 17  0
 0  5      0  47349     39    750    0    0     0     2 1027  569  0  0 89 11  0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  3      0  47437     39    660    0    0     0 26781 1042 1372  0  0 88 12  0
 0  5      0  47517     39    585    0    0     0 13008 1041  177  0  0 85 15  0
 0  5      0  47571     39    531    0    0     0 13107 1030  148  0  0 83 17  0
 0  5      0  47622     39    484    0    0     0  9734 1033  150  0  0 87 13  0
 0  5      0  47668     39    440    0    0     0  6554 1029  134  0  0 87 12  0
 0  5      0  47711     39    399    0    0     1  9832 1027  146  0  0 83 17  0
 0  4      0  47772     39    340    0    0     0 13006 1036  149  0  0 85 15  0
 0  5      0  47841     39    275    0    0     0 13110 1037  164  0  0 86 14  0
 0  5      0  47909     39    212    0    0     0  6754 1037  310  0  0 87 13  0
 0  1      0  46894     39   1186    0    0     0 11842 1031 1825  0  1 93  6  0
 0  4      0  46892     39   1186    0    0     0 27131 1055  104  0  0 95  5  0
 0  4      0  46893     39   1186    0    0     0  9834 1025  110  0  0 93  6  0
 0  5      0  46893     39   1186    0    0     0  6557 1031  105  0  0 91  9  0
 0  5      0  46893     39   1186    0    0     0  9732 1022  114  0  0 93  7  0
 0  5      0  46894     39   1186    0    0     0  6554 1021  104  0  0 96  4  0
 0  5      0  46893     39   1186    0    0     0  9837 1035  120  0  0 91  9  0
 0  4      0  46893     39   1186    0    0     0 13010 1039  103  0  0 96  4  0
 0  4      0  46893     39   1186    0    0     0 16283 1043  115  0  0 92  8  0
 0  4      0  46894     39   1186    0    0     0  9832 1033  103  0  0 92  8  0
------
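
To condense output like the above, the samples with blocked processes (the `b` column) can be summarized with awk. This is a rough sketch, not a definitive tool: the function name `summarize_vmstat` is hypothetical, and it assumes the 17-column layout shown here, where `b` is field 2 and `bo` is field 10. According to the vmstat man page, `bo` is in blocks per second and is not affected by `-S M`; if a block is 1024 bytes, as that page states for Linux, a `bo` of about 13000 corresponds to roughly 13 MB/s.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original post): read vmstat output on
# stdin and report how many samples had blocked processes, plus the average
# and maximum "bo" over those samples. Header lines are skipped because their
# first field is not a number.
summarize_vmstat() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 16 {
            n++
            if ($2 > 0) {
                blocked++
                sum += $10
                if ($10 > max) max = $10
            }
        }
        END {
            avg = blocked ? sum / blocked : 0
            printf "samples=%d blocked=%d avg_bo=%.0f max_bo=%d\n", n, blocked, avg, max
        }
    '
}

# Example: collect 24 samples at 5-second intervals, then summarize.
# vmstat -S M 5 24 | summarize_vmstat
```

Passing a sample count to vmstat matters here, since without it vmstat runs until interrupted and the summary in the END block would never be printed.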

And attached is a look at "iostat -m -x 5".

Here is some basic system information.

------
[root@lnx4103 ~]# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 44
model name	: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
stepping	: 2
cpu MHz		: 2933.540
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 0
cpu cores	: 6
apicid		: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips	: 5867.08
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: [8]

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 44
model name	: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
stepping	: 2
cpu MHz		: 2933.540
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 1
cpu cores	: 6
apicid		: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips	: 5866.57
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: [8]

...

processor	: 22
vendor_id	: GenuineIntel
cpu family	: 6
model		: 44
model name	: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
stepping	: 2
cpu MHz		: 2933.540
cache size	: 12288 KB
physical id	: 1
siblings	: 12
core id		: 9
cpu cores	: 6
apicid		: 51
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips	: 5866.79
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: [8]

processor	: 23
vendor_id	: GenuineIntel
cpu family	: 6
model		: 44
model name	: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
stepping	: 2
cpu MHz		: 2933.540
cache size	: 12288 KB
physical id	: 1
siblings	: 12
core id		: 10
cpu cores	: 6
apicid		: 53
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips	: 5866.95
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: [8]
------

[root@lnx4103 ~]# lspci
00:00.0 Host bridge: Intel Corporation 5500 I/O Hub to ESI Port (rev 22)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)
00:02.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 2 (rev 22)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 22)
00:08.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 8 (rev 22)
00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 22)
00:0a.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 10 (rev 22)
00:10.0 PIC: Intel Corporation 5520/5500/X58 Physical and Link Layer Registers Port 0 (rev 22)
00:10.1 PIC: Intel Corporation 5520/5500/X58 Routing and Protocol Layer Registers Port 0 (rev 22)
00:11.0 PIC: Intel Corporation 5520/5500 Physical and Link Layer Registers Port 1 (rev 22)
00:11.1 PIC: Intel Corporation 5520/5500 Routing & Protocol Layer Register Port 1 (rev 22)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 22)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 22)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 22)
00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 22)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 5
00:1c.5 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 6
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.3 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller #1
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller #2
01:01.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 10)
02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
------


