On 18/07/12 18:35, Orion Poplawski wrote:
> On 07/17/2012 11:22 AM, Orion Poplawski wrote:
>> Our SL6.2 KVM and nfs/backup server has been crashing frequently recently
>> (starting around Fri 13th - yikes!) with Kernel panic - Out of memory
>> and no
>> killable processes. The server has 48GB ram, 2GB swap, only about 15GB
>> dedicated to VM guests. I've tried bumping up vm.min_free_kbytes to
>> 262144 to
>> no avail. Nothing strange is getting written to the logs before the
>> crash.
>>
>> Happening with both 2.6.32-220.23.1 and 2.6.32-279.1.1.
>>
>> Anyone else seeing this? Any other ideas? I've set a serial console
>> log to
>> try to catch more information the next time it happens.
>>
>
> here we go, see below. This makes no sense to me.
>
>
> lvm invoked oom-killer: gfp_mask=0x201d0, order=0, oom_adj=0,
> oom_score_adj=0
> lvm cpuset=/ mems_allowed=0
> Pid: 3400, comm: lvm Not tainted 2.6.32-279.1.1.el6.x86_64 #1
> Call Trace:
> [<ffffffff810c4981>] ? cpuset_print_task_mems_allowed+0x91/0xb0
> [<ffffffff811170f0>] ? dump_header+0x90/0x1b0
> [<ffffffff8121470c>] ? security_real_capable_noaudit+0x3c/0x70
> [<ffffffff81117572>] ? oom_kill_process+0x82/0x2a0
> [<ffffffff811174b1>] ? select_bad_process+0xe1/0x120
> [<ffffffff811179b0>] ? out_of_memory+0x220/0x3c0
> [<ffffffff811b3380>] ? blkdev_get_block+0x0/0x70
> [<ffffffff811276ce>] ? __alloc_pages_nodemask+0x89e/0x940
> [<ffffffff8115c1ea>] ? alloc_pages_current+0xaa/0x110
> [<ffffffff811144f7>] ? __page_cache_alloc+0x87/0x90
> [<ffffffff81113ede>] ? find_get_page+0x1e/0xa0
> [<ffffffff8111606b>] ? do_read_cache_page+0x4b/0x180
> [<ffffffff811b4330>] ? blkdev_readpage+0x0/0x20
> [<ffffffff811161e9>] ? read_cache_page_async+0x19/0x20
> [<ffffffff811161fe>] ? read_cache_page+0xe/0x20
> [<ffffffff811ecaa0>] ? read_dev_sector+0x30/0xa0
> [<ffffffff811edc5d>] ? amiga_partition+0x6d/0x460
                         ^^^^^^^^^^^^^^^
wtf!?! What kind of partition tables and file systems do you use? This
OOM kill seems to be triggered from the Amiga partition table code in
the kernel. It looks like some LVM command is what makes this happen,
though.

(More comments further down.)
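
To see at a glance which partition table probers appear in a trace like this one, something along these lines could be used. It is only a rough sketch: it assumes the frames keep the "[<addr>] ? func+0xoff/0xlen" shape shown here, and the helper name partition_probers is made up for illustration.

```python
import re

# Matches frames such as "> [<ffffffff811edc5d>] ? amiga_partition+0x6d/0x460".
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+\??\s*(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

def partition_probers(trace_text):
    """Return function names ending in '_partition', in trace order, without duplicates."""
    seen = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m and m.group(1).endswith("_partition") and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen

# A few frames copied from the trace in this mail.
sample = """\
> [<ffffffff811ecaa0>] ? read_dev_sector+0x30/0xa0
> [<ffffffff811edc5d>] ? amiga_partition+0x6d/0x460
> [<ffffffff811ef1ac>] ? osf_partition+0x6c/0x120
> [<ffffffff811ed7d7>] ? rescan_partitions+0x1a7/0x470
"""
print(partition_probers(sample))
```

Keep in mind that the "?" in front of each frame means the kernel is not sure the entry is a real caller, so treat the result as a hint only.
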
> [<ffffffff811161e9>] ? read_cache_page_async+0x19/0x20
> [<ffffffff811ecaa0>] ? read_dev_sector+0x30/0xa0
> [<ffffffff811ef1ac>] ? osf_partition+0x6c/0x120
> [<ffffffff811ed7d7>] ? rescan_partitions+0x1a7/0x470
> [<ffffffff811b4ab6>] ? __blkdev_get+0x1b6/0x3c0
> [<ffffffff811b4ce0>] ? blkdev_open+0x0/0xc0
> [<ffffffff811b4cd0>] ? blkdev_get+0x10/0x20
> [<ffffffff811b4d51>] ? blkdev_open+0x71/0xc0
> [<ffffffff8117889a>] ? __dentry_open+0x10a/0x360
> [<ffffffff8121c272>] ? selinux_inode_permission+0x72/0xb0
> [<ffffffff812142af>] ? security_inode_permission+0x1f/0x30
> [<ffffffff81178c04>] ? nameidata_to_filp+0x54/0x70
> [<ffffffff8118c110>] ? do_filp_open+0x6c0/0xd60
> [<ffffffff81198192>] ? alloc_fd+0x92/0x160
> [<ffffffff81178649>] ? do_sys_open+0x69/0x140
> [<ffffffff81178760>] ? sys_open+0x20/0x30
> [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
> Mem-Info:
> Node 0 DMA per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> Node 0 DMA32 per-cpu:
> CPU 0: hi: 42, btch: 7 usd: 23
> active_anon:49 inactive_anon:97 isolated_anon:0
> active_file:0 inactive_file:0 isolated_file:0
> unevictable:3846 dirty:0 writeback:0 unstable:0
> free:412 slab_reclaimable:1194 slab_unreclaimable:5681
> mapped:356 shmem:0 pagetables:31 bounce:0
> Node 0 DMA free:224kB min:0kB low:0kB high:0kB active_anon:0kB
> inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
> isolated(anon):0kB isolated(file):0kB present:328kB mlocked:0kB
> dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
> slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
> bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 125 125 125
> Node 0 DMA32 free:1424kB min:1428kB low:1784kB high:2140kB
> active_anon:196kB inactive_anon:388kB active_file:0kB inactive_file:0kB
> unevictable:15384kB isolated(anon):0kB isolated(file):0kB
> present:128256kB mlocked:0kB dirty:0kB writeback:0kB mapped:1424kB
> shmem:0kB slab_reclaimable:4776kB slab_unreclaimable:22724kB
> kernel_stack:600kB pagetables:124kB unstable:0kB bounce:0kB
> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> Node 0 DMA: 0*4kB 2*8kB 1*16kB 2*32kB 2*64kB 0*128kB 0*256kB 0*512kB
> 0*1024kB 0*2048kB 0*4096kB = 224kB
> Node 0 DMA32: 0*4kB 2*8kB 2*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB
> 1*1024kB 0*2048kB 0*4096kB = 1424kB
> 3846 total pagecache pages
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap = 0kB
> Total swap = 0kB
Now this is concerning ... you're out of swap, unless swap is disabled
(total swap is reported as 0kB as well).
> 45035 pages RAM
> 16585 pages reserved
> 359 pages shared
> 23771 pages non-shared
> [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
> [ 3400] 0 3400 5650 439 0 0 0 lvm
> Out of memory: Kill process 3400 (lvm) score 1 or sacrifice child
> Killed process 3400, UID 0, (lvm) total-vm:22600kB, anon-rss:528kB,
> file-rss:1228kB
> unknown partition table
> KILL
Now, here LVM got killed in the middle of doing some partition table
checks (found above) ... and it complains about some unknown partition
tables as well.
> Activating logicinit invoked oom-killer: gfp_mask=0x84d0, order=0,
> oom_adj=0, oom_score_adj=0
> al volumes
> init cpuset=/ mems_allowed=0
> Pid: 3401, comm: init Not tainted 2.6.32-279.1.1.el6.x86_64 #1
> Call Trace:
> [<ffffffff810c4981>] ? cpuset_print_task_mems_allowed+0x91/0xb0
> [<ffffffff811170f0>] ? dump_header+0x90/0x1b0
> [<ffffffff8121470c>] ? security_real_capable_noaudit+0x3c/0x70
> [<ffffffff81117572>] ? oom_kill_process+0x82/0x2a0
> [<ffffffff811174b1>] ? select_bad_process+0xe1/0x120
> [<ffffffff811179b0>] ? out_of_memory+0x220/0x3c0
> [<ffffffff811276ce>] ? __alloc_pages_nodemask+0x89e/0x940
> [<ffffffff8115c1ea>] ? alloc_pages_current+0xaa/0x110
> [<ffffffff81048aab>] ? pte_alloc_one+0x1b/0x50
> [<ffffffff8113af22>] ? __pte_alloc+0x32/0x160
> [<ffffffff8113fd79>] ? handle_mm_fault+0x149/0x2b0
> [<ffffffff81044479>] ? __do_page_fault+0x139/0x480
> [<ffffffff8150327e>] ? do_page_fault+0x3e/0xa0
> [<ffffffff81500635>] ? page_fault+0x25/0x30
Here /sbin/init fails to allocate memory too.
> Mem-Info:
> Node 0 DMA per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> Node 0 DMA32 per-cpu:
> CPU 0: hi: 42, btch: 7 usd: 11
> active_anon:1 inactive_anon:16 isolated_anon:0
> active_file:1 inactive_file:0 isolated_file:0
> unevictable:3846 dirty:0 writeback:0 unstable:0
> free:410 slab_reclaimable:1192 slab_unreclaimable:5685
> mapped:50 shmem:1 pagetables:6 bounce:0
> Node 0 DMA free:224kB min:0kB low:0kB high:0kB active_anon:0kB
> inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
> isolated(anon):0kB isolated(file):0kB present:328kB mlocked:0kB
> dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
> slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
> bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 125 125 125
> Node 0 DMA32 free:1416kB min:1428kB low:1784kB high:2140kB
> active_anon:4kB inactive_anon:64kB active_file:4kB inactive_file:0kB
> unevictable:15384kB isolated(anon):0kB isolated(file):0kB
> present:128256kB mlocked:0kB dirty:0kB writeback:0kB mapped:200kB
> shmem:4kB slab_reclaimable:4768kB slab_unreclaimable:22740kB
> kernel_stack:592kB pagetables:24kB unstable:0kB bounce:0kB
> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> Node 0 DMA: 0*4kB 2*8kB 1*16kB 2*32kB 2*64kB 0*128kB 0*256kB 0*512kB
> 0*1024kB 0*2048kB 0*4096kB = 224kB
> Node 0 DMA32: 0*4kB 2*8kB 0*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB
> 1*1024kB 0*2048kB 0*4096kB = 1424kB
> 3848 total pagecache pages
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap = 0kB
> Total swap = 0kB
> 45035 pages RAM
> 16585 pages reserved
> 65 pages shared
> 24077 pages non-shared
> [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
> [ 3401] 0 3401 288 13 0 0 0 init
> Out of memory: Kill process 3401 (init) score 1 or sacrifice child
> Killed process 3401, UID 0, (init) total-vm:1152kB, anon-rss:52kB,
> file-rss:0kB
> KILL
> init invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0,
> oom_score_adj=0
And the OOM killer kills /sbin/init ... that's bad, that's really bad.
> init cpuset=/ mems_allowed=0
> Pid: 1, comm: init Not tainted 2.6.32-279.1.1.el6.x86_64 #1
> Call Trace:
> [<ffffffff810c4981>] ? cpuset_print_task_mems_allowed+0x91/0xb0
> [<ffffffff811170f0>] ? dump_header+0x90/0x1b0
> [<ffffffff8121470c>] ? security_real_capable_noaudit+0x3c/0x70
> [<ffffffff81117572>] ? oom_kill_process+0x82/0x2a0
> [<ffffffff811174b1>] ? select_bad_process+0xe1/0x120
> [<ffffffff811179b0>] ? out_of_memory+0x220/0x3c0
> [<ffffffff811276ce>] ? __alloc_pages_nodemask+0x89e/0x940
> [<ffffffff8115c2ea>] ? alloc_pages_vma+0x9a/0x150
> [<ffffffff8113e3fd>] ? do_wp_page+0xfd/0x8d0
> [<ffffffff8113f3ad>] ? handle_pte_fault+0x2cd/0xb50
> [<ffffffff8113fe14>] ? handle_mm_fault+0x1e4/0x2b0
> [<ffffffff81054a04>] ? check_preempt_wakeup+0x1a4/0x260
> [<ffffffff810632c4>] ? enqueue_task_fair+0x64/0x100
> [<ffffffff81044479>] ? __do_page_fault+0x139/0x480
> [<ffffffff81060a83>] ? wake_up_new_task+0xd3/0x120
> [<ffffffff8106a873>] ? do_fork+0x133/0x460
> [<ffffffff81198192>] ? alloc_fd+0x92/0x160
> [<ffffffff81178407>] ? fd_install+0x47/0x90
> [<ffffffff8150327e>] ? do_page_fault+0x3e/0xa0
> [<ffffffff81500635>] ? page_fault+0x25/0x30
> Mem-Info:
> Node 0 DMA per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> Node 0 DMA32 per-cpu:
> CPU 0: hi: 42, btch: 7 usd: 12
> active_anon:4 inactive_anon:10 isolated_anon:0
> active_file:1 inactive_file:0 isolated_file:0
> unevictable:3846 dirty:0 writeback:0 unstable:0
> free:413 slab_reclaimable:1192 slab_unreclaimable:5685
> mapped:50 shmem:1 pagetables:6 bounce:0
> Node 0 DMA free:224kB min:0kB low:0kB high:0kB active_anon:0kB
> inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
> isolated(anon):0kB isolated(file):0kB present:328kB mlocked:0kB
> dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
> slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
> bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 125 125 125
> Node 0 DMA32 free:1428kB min:1428kB low:1784kB high:2140kB
> active_anon:16kB inactive_anon:40kB active_file:4kB inactive_file:0kB
> unevictable:15384kB isolated(anon):0kB isolated(file):0kB
> present:128256kB mlocked:0kB dirty:0kB writeback:0kB mapped:200kB
> shmem:4kB slab_reclaimable:4768kB slab_unreclaimable:22740kB
> kernel_stack:592kB pagetables:24kB unstable:0kB bounce:0kB
> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> Node 0 DMA: 0*4kB 2*8kB 1*16kB 2*32kB 2*64kB 0*128kB 0*256kB 0*512kB
> 0*1024kB 0*2048kB 0*4096kB = 224kB
> Node 0 DMA32: 3*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB
> 1*1024kB 0*2048kB 0*4096kB = 1428kB
> 3848 total pagecache pages
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap = 0kB
> Total swap = 0kB
> 45035 pages RAM
> 16585 pages reserved
> 66 pages shared
> 24074 pages non-shared
> [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
> [ 3402] 0 3402 288 13 0 0 0 init
> Out of memory: Kill process 3402 (init) score 1 or sacrifice child
And again ...
> Killed process 3402, UID 0, (init) total-vm:1152kB, anon-rss:52kB,
> file-rss:0kB
> init invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0,
> oom_score_adj=0
> init cpuset=/ mems_allowed=0
> Pid: 1, comm: init Not tainted 2.6.32-279.1.1.el6.x86_64 #1
> Call Trace:
> [<ffffffff810c4981>] ? cpuset_print_task_mems_allowed+0x91/0xb0
> [<ffffffff811170f0>] ? dump_header+0x90/0x1b0
> [<ffffffff8111746e>] ? select_bad_process+0x9e/0x120
> [<ffffffff81117b0a>] ? out_of_memory+0x37a/0x3c0
> [<ffffffff811276ce>] ? __alloc_pages_nodemask+0x89e/0x940
> [<ffffffff8115c2ea>] ? alloc_pages_vma+0x9a/0x150
> [<ffffffff8113e3fd>] ? do_wp_page+0xfd/0x8d0
> [<ffffffff8113f3ad>] ? handle_pte_fault+0x2cd/0xb50
> [<ffffffff8113fe14>] ? handle_mm_fault+0x1e4/0x2b0
> [<ffffffff81054a04>] ? check_preempt_wakeup+0x1a4/0x260
> [<ffffffff810632c4>] ? enqueue_task_fair+0x64/0x100
> [<ffffffff81044479>] ? __do_page_fault+0x139/0x480
> [<ffffffff81060a83>] ? wake_up_new_task+0xd3/0x120
> [<ffffffff8106a873>] ? do_fork+0x133/0x460
> [<ffffffff81198192>] ? alloc_fd+0x92/0x160
> [<ffffffff81178407>] ? fd_install+0x47/0x90
> [<ffffffff8150327e>] ? do_page_fault+0x3e/0xa0
> [<ffffffff81500635>] ? page_fault+0x25/0x30
> Mem-Info:
> Node 0 DMA per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> Node 0 DMA32 per-cpu:
> CPU 0: hi: 42, btch: 7 usd: 22
> active_anon:4 inactive_anon:10 isolated_anon:0
> active_file:1 inactive_file:0 isolated_file:0
> unevictable:3846 dirty:0 writeback:0 unstable:0
> free:413 slab_reclaimable:1192 slab_unreclaimable:5685
> mapped:50 shmem:1 pagetables:6 bounce:0
> Node 0 DMA free:224kB min:0kB low:0kB high:0kB active_anon:0kB
> inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
> isolated(anon):0kB isolated(file):0kB present:328kB mlocked:0kB
> dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
> slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
> bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 125 125 125
> Node 0 DMA32 free:1428kB min:1428kB low:1784kB high:2140kB
> active_anon:16kB inactive_anon:40kB active_file:4kB inactive_file:0kB
> unevictable:15384kB isolated(anon):0kB isolated(file):0kB
> present:128256kB mlocked:0kB dirty:0kB writeback:0kB mapped:200kB
> shmem:4kB slab_reclaimable:4768kB slab_unreclaimable:22740kB
> kernel_stack:592kB pagetables:24kB unstable:0kB bounce:0kB
> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> Node 0 DMA: 0*4kB 2*8kB 1*16kB 2*32kB 2*64kB 0*128kB 0*256kB 0*512kB
> 0*1024kB 0*2048kB 0*4096kB = 224kB
> Node 0 DMA32: 3*4kB 2*8kB 0*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB
> 1*1024kB 0*2048kB 0*4096kB = 1436kB
> 3848 total pagecache pages
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap = 0kB
> Total swap = 0kB
> 45035 pages RAM
> 16585 pages reserved
> 53 pages shared
> 24075 pages non-shared
> [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
> Kernel panic - not syncing: Out of memory and no killable processes...
>
> Pid: 1, comm: init Not tainted 2.6.32-279.1.1.el6.x86_64 #1
> Call Trace:
> [<ffffffff814fd12a>] ? panic+0xa0/0x168
> [<ffffffff8111716d>] ? dump_header+0x10d/0x1b0
> [<ffffffff81117b1f>] ? out_of_memory+0x38f/0x3c0
> [<ffffffff811276ce>] ? __alloc_pages_nodemask+0x89e/0x940
> [<ffffffff8115c2ea>] ? alloc_pages_vma+0x9a/0x150
> [<ffffffff8113e3fd>] ? do_wp_page+0xfd/0x8d0
> [<ffffffff8113f3ad>] ? handle_pte_fault+0x2cd/0xb50
> [<ffffffff8113fe14>] ? handle_mm_fault+0x1e4/0x2b0
> [<ffffffff81054a04>] ? check_preempt_wakeup+0x1a4/0x260
> [<ffffffff810632c4>] ? enqueue_task_fair+0x64/0x100
> [<ffffffff81044479>] ? __do_page_fault+0x139/0x480
> [<ffffffff81060a83>] ? wake_up_new_task+0xd3/0x120
> [<ffffffff8106a873>] ? do_fork+0x133/0x460
> [<ffffffff81198192>] ? alloc_fd+0x92/0x160
> [<ffffffff81178407>] ? fd_install+0x47/0x90
> [<ffffffff8150327e>] ? do_page_fault+0x3e/0xa0
> [<ffffffff81500635>] ? page_fault+0x25/0x30
And finally, the panic()
> Last monitor screens:
>
> ATOP - saga 2012/07/18 10:09:56 -----P
> 10s elapsed
> PRC | sys 2.86s | user 3.46s | #proc 889 | #tslpu 1 |
> #zombie 0 | #exit 98 |
> CPU | sys 25% | user 33% | irq 1% | idle 1535% | wait
> 2% | curscal 72% |
> cpu | sys 1% | user 16% | irq 0% | idle 82% |
> cpu009 w 0% | curscal 100% |
> cpu | sys 1% | user 8% | irq 0% | idle 89% |
> cpu001 w 0% | curscal 70% |
> cpu | sys 4% | user 2% | irq 0% | idle 94% |
> cpu000 w 0% | curscal 70% |
> cpu | sys 6% | user 1% | irq 0% | idle 92% |
> cpu003 w 1% | curscal 70% |
> cpu | sys 2% | user 1% | irq 0% | idle 96% |
> cpu011 w 1% | curscal 70% |
> cpu | sys 2% | user 1% | irq 0% | idle 96% |
> cpu004 w 0% | curscal 70% |
> cpu | sys 2% | user 1% | irq 0% | idle 97% |
> cpu012 w 0% | curscal 70% |
> cpu | sys 1% | user 1% | irq 0% | idle 97% |
> cpu008 w 0% | curscal 70% |
> cpu | sys 1% | user 1% | irq 0% | idle 98% |
> cpu006 w 0% | curscal 70% |
> cpu | sys 1% | user 1% | irq 0% | idle 98% |
> cpu014 w 0% | curscal 70% |
> cpu | sys 1% | user 0% | irq 0% | idle 98% |
> cpu005 w 0% | curscal 70% |
> cpu | sys 1% | user 0% | irq 0% | idle 99% |
> cpu013 w 0% | curscal 70% |
> cpu | sys 0% | user 0% | irq 0% | idle 99% |
> cpu002 w 0% | curscal 70% |
> cpu | sys 0% | user 0% | irq 0% | idle 99% |
> cpu007 w 0% | curscal 70% |
> cpu | sys 0% | user 0% | irq 0% | idle 100% |
> cpu010 w 0% | curscal 70% |
> cpu | sys 0% | user 0% | irq 0% | idle 100% |
> cpu015 w 0% | curscal 70% |
> CPL | avg1 1.02 | avg5 1.16 | avg15 1.53 | csw 67236 | intr
> 45803 | numcpu 16 |
> MEM | tot 47.1G | free 5.5G | cache 24.6G | dirty 0.2M | buff
> 4.2G | slab 6.1G |
> SWP | tot 8.0G | free 8.0G | | | vmcom
> 16.6G | vmlim 31.6G |
> LVM | abbix--disk0 | busy 13% | read 0 | write 299 | MBw/s
> 0.74 | avio 4.29 ms |
> LVM | vg_root-var | busy 11% | read 0 | write 349 | MBw/s
> 0.13 | avio 3.15 ms |
> LVM | pute1--disk0 | busy 0% | read 0 | write 4 | MBw/s
> 0.00 | avio 0.50 ms |
> MDD | md1 | busy 0% | read 0 | write 722 | MBw/s
> 0.87 | avio 0.00 ms |
> DSK | sdc | busy 34% | read 393 | write 271 | MBw/s
> 0.26 | avio 5.13 ms |
> DSK | sdf | busy 34% | read 396 | write 268 | MBw/s
> 0.22 | avio 5.10 ms |
> DSK | sde | busy 34% | read 390 | write 265 | MBw/s
> 0.22 | avio 5.14 ms |
> DSK | sdd | busy 33% | read 389 | write 273 | MBw/s
> 0.26 | avio 5.03 ms |
> DSK | sdh | busy 31% | read 389 | write 238 | MBw/s
> 0.19 | avio 4.97 ms |
> DSK | sdb | busy 30% | read 396 | write 215 | MBw/s
> 0.20 | avio 4.83 ms |
> DSK | sdg | busy 29% | read 392 | write 238 | MBw/s
> 0.19 | avio 4.67 ms |
> DSK | sda | busy 28% | read 397 | write 214 | MBw/s
> 0.20 | avio 4.54 ms |
> NET | transport | tcpi 1481 | tcpo 668 | udpi 147 | udpo
> 142 | tcpao 2 |
> NET | network | ipi 1639 | ipo 820 | ipfrw 0 | deliv
> 1628 | icmpo 6 |
> NET | vnet1 0% | pcki 672 | pcko 701 | si 38 Kbps | so
> 42 Kbps | erro 0 |
> NET | eth0 0% | pcki 1923 | pcko 2769 | si 177 Kbps | so
> 2403 Kbps | erro 0 |
> NET | vnet3 0% | pcki 3 | pcko 78 | si 0 Kbps | so
> 5 Kbps | erro 0 |
> NET | vnet0 0% | pcki 0 | pcko 75 | si 0 Kbps | so
> 5 Kbps | erro 0 |
> NET | vnet4 0% | pcki 0 | pcko 75 | si 0 Kbps | so
> 5 Kbps | erro 0 |
> NET | vnet2 0% | pcki 0 | pcko 75 | si 0 Kbps | so
> 5 Kbps | erro 0 |
> NET | eth1 0% | pcki 18 | pcko 12 | si 3 Kbps | so
> 0 Kbps | erro 0 |
> NET | br0 ---- | pcki 1681 | pcko 807 | si 140 Kbps | so
> 2293 Kbps | erro 0 |
> NET | lo ---- | pcki 6 | pcko 6 | si 0 Kbps | so
> 0 Kbps | erro 0 |
> Write failed: Broken pipe
> [root@orca ~]# SCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S
> CPUNR CPU CMD 1/7
> 4137 4 0.10s 0.40s 1588K 512K 0K 7556K -- - S
> 10 5% qemu-kvm
> 4548 2 0.14s 0.27s 0K 0K 0K 0K -- - S
> 12 4% qemu-kvm
> 4299 9 0.05s 0.03s 0K 0K 0K 16K -- - S
> 14 1% qemu-kvm
> 20921 5 0.01s 0.05s 0K 0K 0K 0K -- -
> S 9 1% qemu-kvm
> 4047 2 0.03s 0.02s 0K 0K 0K 0K -- - S
> 12 1% qemu-kvm
>
>
> slabtop:
> Active / Total Slabs (% used) : 1599178 / 1599216 (100.0%)
> Active / Total Caches (% used) : 132 / 204 (64.7%)
> Active / Total Size (% used) : 6195130.43K / 6274921.86K (98.7%)
> Minimum / Average / Maximum Object : 0.02K / 0.28K / 4096.00K
>
> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 7227321 6709793 92% 0.10K 195333 37 781332K buffer_head
> 4242650 4242336 99% 0.07K 80050 53 320200K
> selinux_inode_security
> 4224700 4224638 99% 1.00K 1056175 4 4224700K ext4_inode_cache
This smells a bit bad ... ext4_inode_cache is using a lot of memory:
about 4.2 million objects at 1.00K each, i.e. roughly 4 GB of the ~6 GB
of slab.
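
A quick way to rank the caches by size from saved slabtop output is sketched below. It assumes unwrapped rows with the eight columns shown in the header (rows that the mail client wrapped, like selinux_inode_security here, are skipped); the function name top_slab_caches is just an illustration.

```python
def top_slab_caches(slabtop_text, n=3):
    """Return the n largest caches as (name, cache_size_in_K) tuples."""
    rows = []
    for line in slabtop_text.splitlines():
        fields = line.lstrip("> ").split()
        # Data rows: OBJS ACTIVE USE OBJ_SIZE SLABS OBJ/SLAB CACHE_SIZE NAME
        if len(fields) == 8 and fields[0].isdigit() and fields[6].endswith("K"):
            rows.append((fields[7], int(fields[6][:-1])))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

# Rows copied from the slabtop output in this mail.
sample = """\
> 7227321 6709793 92% 0.10K 195333 37 781332K buffer_head
> 4224700 4224638 99% 1.00K 1056175 4 4224700K ext4_inode_cache
> 3257480 3257186 99% 0.19K 162874 20 651496K dentry
> 342580 324110 94% 0.55K 48940 7 195760K radix_tree_node
"""
for name, size_k in top_slab_caches(sample):
    print(f"{name:20s} {size_k / 1024 / 1024:.2f} GiB")
```
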
> 3257480 3257186 99% 0.19K 162874 20 651496K dentry
> 1324786 1250981 94% 0.06K 22454 59 89816K size-64
> 484128 484094 99% 0.02K 3362 144 13448K avtab_node
> 347088 342539 98% 0.03K 3099 112 12396K size-32
> 342580 324110 94% 0.55K 48940 7 195760K radix_tree_node
> 236059 235736 99% 0.06K 4001 59 16004K ksm_rmap_item
> 123980 123566 99% 0.19K 6199 20 24796K size-192
> 105630 47803 45% 0.12K 3521 30 14084K size-128
> 24300 24261 99% 0.14K 900 27 3600K sysfs_dir_cache
> 17402 15599 89% 0.05K 226 77 904K anon_vma_chain
> 16055 14874 92% 0.20K 845 19 3380K vm_area_struct
> 9844 8471 86% 0.04K 107 92 428K anon_vma
> 8952 8775 98% 0.58K 1492 6 5968K inode_cache
> 7518 5829 77% 0.62K 1253 6 5012K proc_inode_cache
> 6840 4692 68% 0.19K 342 20 1368K filp
> 5888 5532 93% 0.04K 64 92 256K dm_io
>
>
> top - 10:10:02 up 22:34, 4 users, load average: 1.02, 1.15, 1.53
> Tasks: 888 total, 1 running, 887 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.8%us, 1.2%sy, 0.0%ni, 97.9%id, 0.1%wa, 0.0%hi, 0.0%si,
> 0.0%st
> Mem: 49421492k total, 43619512k used, 5801980k free, 4409144k buffers
> Swap: 8388600k total, 16308k used, 8372292k free, 25837164k cached
Somehow, this doesn't reflect what the kernel complains about when the
OOM killer starts its mission.
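
To put numbers on that mismatch: the OOM report says "45035 pages RAM", while top shows 49421492k total. A small sketch of the arithmetic, assuming 4 kB pages (the usual size on x86_64, but not stated in the log):

```python
PAGE_KB = 4  # assumption: 4 kB pages

oom_pages_ram = 45035        # "45035 pages RAM" from the OOM report
top_mem_total_kb = 49421492  # "Mem: 49421492k total" from top

oom_ram_kb = oom_pages_ram * PAGE_KB
print(f"OOM report: {oom_ram_kb} kB (~{oom_ram_kb / 1024:.0f} MB)")
print(f"top:        {top_mem_total_kb} kB (~{top_mem_total_kb / 1024 / 1024:.1f} GB)")
print(f"ratio:      ~{top_mem_total_kb / oom_ram_kb:.0f}x")
```

So the kernel printing the OOM messages only sees a couple of hundred MB, not the 47 GB that top and atop report.
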

I see that you're using kernel-2.6.32-279.1.1.el6.x86_64 ... that
smells a bit like an SL 6.3 beta ... is that right? SL 6.2 is usually
around 2.6.32-220-something. I would recommend trying a 6.2 kernel if
you're running something much more bleeding edge.

It also seems to be related to some file system issue ... at least
from what I can see. It could be a buggy kernel which leaks memory
somewhere in either the partition table code or the ext4 code paths.

Not sure I'm able to provide any better clues right now.

kind regards,
David Sommerseth