SCIENTIFIC-LINUX-USERS Archives

September 2010

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Steven Timm <[log in to unmask]>
Reply To:
Steven Timm <[log in to unmask]>
Date:
Wed, 8 Sep 2010 16:03:33 -0500
Devin, what is the output of the command "dmesg"?
Are there any kernel traces or bugs in that output?
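
If the output is long, something like the following might help narrow it down. This is only a rough sketch: the function name kernel_trouble is made up for illustration, and the pattern list is my assumption about what usually marks an oops, a call trace, or a stuck task, not an exhaustive set.

```shell
#!/bin/sh
# Filter kernel log text read from stdin, keeping only lines that commonly
# indicate an oops, a call trace, a lockup, or a failed I/O request.
# The pattern list is an assumption and can be extended as needed.
kernel_trouble() {
    grep -i -E 'call trace|oops|BUG:|soft lockup|blocked for more than|I/O error'
}

# Usage on the affected node (not run here):
#   dmesg | kernel_trouble
```

An empty result does not prove the kernel is healthy, so the full dmesg output is still worth sending.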

Steve


On Wed, 8 Sep 2010, Devin Bougie wrote:

> Hi, All.  We are seeing periodic I/O delays on a new large compute node using two Xeon X5670 CPUs (hyper-threaded for a total of 24 processors) with 48GB memory.  The system is running the latest kernel on a fully updated x86_64 SL5.5.  So far my attempts at changing schedulers and BIOS configurations have not helped.  This is a unique system for us, and I have not been able to reproduce this on any other x86_64 SL5.5 systems.  Various benchmarks we've run (most extensively using Mathematica) show the system performing very well.  So far we've only seen the problem when performing disk I/O.
>
> Please let me know if there is any more information I can provide, and any suggestions would be greatly appreciated.
>
> Many thanks,
> Devin
>
> ------
>
> Here is an example of how we can reproduce the delays by repeatedly writing modest amounts of data.
>
> ------
> [dab66@lnx4103 ~]% time dd if=/dev/zero of=/mnt/scratch/test bs=1M count=1K
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 0.914825 seconds, 1.2 GB/s
>
> real    0m2.037s
> user    0m0.000s
> sys     0m1.050s
> [dab66@lnx4103 ~]% time dd if=/dev/zero of=/mnt/scratch/test bs=1M count=1K
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 0.903858 seconds, 1.2 GB/s
>
> real    0m1.129s
> user    0m0.001s
> sys     0m1.125s
> [dab66@lnx4103 ~]% time dd if=/dev/zero of=/mnt/scratch/test bs=1M count=1K
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 0.923554 seconds, 1.2 GB/s
>
> real    0m1.141s
> user    0m0.001s
> sys     0m1.138s
> [dab66@lnx4103 ~]% time dd if=/dev/zero of=/mnt/scratch/test bs=1M count=1K
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 0.911595 seconds, 1.2 GB/s
>
> real    1m27.200s
> user    0m0.001s
> sys     0m1.148s
>
> ------
>
> Here is a look at "vmstat -S M 5" before, during, and after the above tests.
>
> ------
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> 0  0      0  46937     35   1184    0    0     4    81   43    7  0  0 98  2  0
> 0  0      0  46937     35   1184    0    0     1    18 1017   85  0  0 100  0  0
> 0  0      0  46937     35   1184    0    0     0     5 1013   93  0  0 100  0  0
> 0  0      0  46937     35   1184    0    0     0    42 1017   84  0  0 100  0  0
> 0  1      0  46937     35   1184    0    0     1     2 1005   91  0  0 100  0  0
> 0  0      0  46937     35   1184    0    0     0    17 1016   90  0  0 100  0  0
> 0  0      0  46937     35   1184    0    0     0     5 1012   89  0  0 100  0  0
> 0  0      0  46937     35   1184    0    0     0     5 1010   85  0  0 100  0  0
> 0  1      0  46919     35   1186    0    0   596    12 1071  224  0  0 99  1  0
> 0  0      0  46919     35   1186    0    0     0    70 1019   89  0  0 100  0  0
> 0  0      0  46919     35   1186    0    0     0    11 1015   98  0  0 100  0  0
> 0  0      0  46919     35   1186    0    0     0     7 1021  112  0  0 100  0  0
> 1  0      0  47780     38    337    0    0     0    31 1035  760  0  2 98  0  0
> 0  5      0  46929     38   1135    0    0     0 26431 1040  341  0  1 94  5  0
> 0  6      0  46986     38   1081    0    0     2  9840 1032  158  0  0 84 16  0
> 0  6      0  47054     39   1013    0    0    65 13113 1052  186  0  0 85 15  0
> 0  5      0  47137     39    948    0    0     2 13006 1036  171  0  0 86 14  0
> 0  7      0  47208     39    881    0    0     0 13135 1035  161  0  0 86 14  0
> 0  6      0  47258     39    833    0    0     0  9736 1031  150  0  0 84 16  0
> 0  6      0  47300     39    793    0    0     0  8656 1025  131  0  0 83 17  0
> 0  5      0  47349     39    750    0    0     0     2 1027  569  0  0 89 11  0
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> 0  3      0  47437     39    660    0    0     0 26781 1042 1372  0  0 88 12  0
> 0  5      0  47517     39    585    0    0     0 13008 1041  177  0  0 85 15  0
> 0  5      0  47571     39    531    0    0     0 13107 1030  148  0  0 83 17  0
> 0  5      0  47622     39    484    0    0     0  9734 1033  150  0  0 87 13  0
> 0  5      0  47668     39    440    0    0     0  6554 1029  134  0  0 87 12  0
> 0  5      0  47711     39    399    0    0     1  9832 1027  146  0  0 83 17  0
> 0  4      0  47772     39    340    0    0     0 13006 1036  149  0  0 85 15  0
> 0  5      0  47841     39    275    0    0     0 13110 1037  164  0  0 86 14  0
> 0  5      0  47909     39    212    0    0     0  6754 1037  310  0  0 87 13  0
> 0  1      0  46894     39   1186    0    0     0 11842 1031 1825  0  1 93  6  0
> 0  4      0  46892     39   1186    0    0     0 27131 1055  104  0  0 95  5  0
> 0  4      0  46893     39   1186    0    0     0  9834 1025  110  0  0 93  6  0
> 0  5      0  46893     39   1186    0    0     0  6557 1031  105  0  0 91  9  0
> 0  5      0  46893     39   1186    0    0     0  9732 1022  114  0  0 93  7  0
> 0  5      0  46894     39   1186    0    0     0  6554 1021  104  0  0 96  4  0
> 0  5      0  46893     39   1186    0    0     0  9837 1035  120  0  0 91  9  0
> 0  4      0  46893     39   1186    0    0     0 13010 1039  103  0  0 96  4  0
> 0  4      0  46893     39   1186    0    0     0 16283 1043  115  0  0 92  8  0
> 0  4      0  46894     39   1186    0    0     0  9832 1033  103  0  0 92  8  0
> ------
>
> And attached is a look at "iostat -m -x 5".
>
> Here is some basic system information.
>
> ------
> [root@lnx4103 ~]# cat /proc/cpuinfo
> processor	: 0
> vendor_id	: GenuineIntel
> cpu family	: 6
> model		: 44
> model name	: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
> stepping	: 2
> cpu MHz		: 2933.540
> cache size	: 12288 KB
> physical id	: 0
> siblings	: 12
> core id		: 0
> cpu cores	: 6
> apicid		: 0
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 11
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
> bogomips	: 5867.08
> clflush size	: 64
> cache_alignment	: 64
> address sizes	: 40 bits physical, 48 bits virtual
> power management: [8]
>
> processor	: 1
> vendor_id	: GenuineIntel
> cpu family	: 6
> model		: 44
> model name	: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
> stepping	: 2
> cpu MHz		: 2933.540
> cache size	: 12288 KB
> physical id	: 0
> siblings	: 12
> core id		: 1
> cpu cores	: 6
> apicid		: 2
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 11
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
> bogomips	: 5866.57
> clflush size	: 64
> cache_alignment	: 64
> address sizes	: 40 bits physical, 48 bits virtual
> power management: [8]
>
> ...
>
> processor	: 22
> vendor_id	: GenuineIntel
> cpu family	: 6
> model		: 44
> model name	: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
> stepping	: 2
> cpu MHz		: 2933.540
> cache size	: 12288 KB
> physical id	: 1
> siblings	: 12
> core id		: 9
> cpu cores	: 6
> apicid		: 51
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 11
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
> bogomips	: 5866.79
> clflush size	: 64
> cache_alignment	: 64
> address sizes	: 40 bits physical, 48 bits virtual
> power management: [8]
>
> processor	: 23
> vendor_id	: GenuineIntel
> cpu family	: 6
> model		: 44
> model name	: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
> stepping	: 2
> cpu MHz		: 2933.540
> cache size	: 12288 KB
> physical id	: 1
> siblings	: 12
> core id		: 10
> cpu cores	: 6
> apicid		: 53
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 11
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
> bogomips	: 5866.95
> clflush size	: 64
> cache_alignment	: 64
> address sizes	: 40 bits physical, 48 bits virtual
> power management: [8]
> ------
>
> [root@lnx4103 ~]# lspci
> 00:00.0 Host bridge: Intel Corporation 5500 I/O Hub to ESI Port (rev 22)
> 00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)
> 00:02.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 2 (rev 22)
> 00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22)
> 00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 22)
> 00:08.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 8 (rev 22)
> 00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 22)
> 00:0a.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 10 (rev 22)
> 00:10.0 PIC: Intel Corporation 5520/5500/X58 Physical and Link Layer Registers Port 0 (rev 22)
> 00:10.1 PIC: Intel Corporation 5520/5500/X58 Routing and Protocol Layer Registers Port 0 (rev 22)
> 00:11.0 PIC: Intel Corporation 5520/5500 Physical and Link Layer Registers Port 1 (rev 22)
> 00:11.1 PIC: Intel Corporation 5520/5500 Routing & Protocol Layer Register Port 1 (rev 22)
> 00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 22)
> 00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 22)
> 00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 22)
> 00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 22)
> 00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
> 00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
> 00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
> 00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
> 00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
> 00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 5
> 00:1c.5 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 6
> 00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
> 00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
> 00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
> 00:1d.3 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
> 00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
> 00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
> 00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller #1
> 00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
> 00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller #2
> 01:01.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 10)
> 02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
> 03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
> ------
>
>
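
To see how often the slow case in the dd test above shows up, the runs could be repeated and timed in a loop. The following is a minimal sketch, not a tested tool: the function name time_runs and its arguments (number of runs, a threshold in whole seconds, then the command) are hypothetical, and timing uses "date +%s", so it only resolves whole seconds.

```shell
#!/bin/sh
# Run a command repeatedly and print the wall-clock seconds for each run,
# marking any run that takes at least LIMIT seconds as SLOW.
# Arguments: RUNS LIMIT COMMAND [ARGS...]  (all names are hypothetical)
time_runs() {
    runs=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1      # discard the command's own output
        end=$(date +%s)
        elapsed=$((end - start))
        if [ "$elapsed" -ge "$limit" ]; then
            echo "run $i: ${elapsed}s SLOW"
        else
            echo "run $i: ${elapsed}s"
        fi
        i=$((i + 1))
    done
}

# Usage matching the quoted test (not run here):
#   time_runs 20 10 dd if=/dev/zero of=/mnt/scratch/test bs=1M count=1K
```

Running this alongside "vmstat -S M 5" would show whether the SLOW runs line up with the periods where the "b" column stays high.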

-- 
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
[log in to unmask]  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.
