Date: Tue, 17 Jul 2012 14:00:21 -0600
On 07/17/2012 11:46 AM, Stephan Wiesand wrote:
> On Jul 17, 2012, at 19:22 , Orion Poplawski wrote:
>
>> Our SL6.2 KVM and nfs/backup server has been crashing frequently recently (starting around Fri 13th - yikes!) with Kernel panic - Out of memory and no killable processes. The server has 48GB ram, 2GB swap, only about 15GB dedicated to VM guests. I've tried bumping up vm.min_free_kbytes to 262144 to no avail. Nothing strange is getting written to the logs before the crash.
Hmm, I suppose bumping up min_free_kbytes might be making things worse?
>> Happening with both 2.6.32-220.23.1 and 2.6.32-279.1.1.
>>
>> Anyone else seeing this?
>
> Not on our KVM servers (which don't have any other duties though), which have been running -220.23.1 for three weeks.
>
>> Any other ideas?
>
> Is swap space sufficient?
Swap was 2GB, but barely used. The system should have far more RAM than it needs.
I've upped swap to 8GB anyway.
>
> Have you modified vm.overcommit_* ? Doing so may help turning the panics into allocation failures that can be handled.
>
Haven't modified them:
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.nr_overcommit_hugepages = 0
I suppose:
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
would limit total committed memory to about 46.4GB (8GB swap + 80% of 48GB
RAM), which should be safe. I might try that next.
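A minimal sketch of the arithmetic behind that estimate, assuming the kernel's documented strict-overcommit rule (CommitLimit = swap + (RAM - huge pages) * overcommit_ratio / 100 when vm.overcommit_memory = 2). The function name and the GB units are illustrative only; the kernel itself works in pages and reports the value as CommitLimit in /proc/meminfo.

```python
def commit_limit_gb(ram_gb, swap_gb, overcommit_ratio, hugepages_gb=0.0):
    """Approximate CommitLimit in GB under vm.overcommit_memory = 2.

    Hypothetical helper for illustration: RAM reserved for huge pages is
    excluded, then overcommit_ratio percent of the rest is added to swap.
    """
    return swap_gb + (ram_gb - hugepages_gb) * overcommit_ratio / 100.0


if __name__ == "__main__":
    # Figures from this thread: 48GB RAM, ratio 80, swap before and after
    # the increase from 2GB to 8GB.
    for swap_gb in (2, 8):
        limit = commit_limit_gb(48, swap_gb, 80)
        print(f"swap={swap_gb}GB ratio=80 -> commit limit ~{limit:.1f}GB")
```

Note that the limit applies to committed address space rather than resident RAM, so processes that reserve much more memory than they touch (which can include qemu-kvm guests) may hit allocation failures well before physical memory is exhausted.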
> Do any slab pools keep growing, to an unusual size?
>
Here's the slabtop output shortly after reboot. I'll keep watching it.
Active / Total Objects (% used) : 1500116 / 1526912 (98.2%)
Active / Total Slabs (% used) : 37344 / 37481 (99.6%)
Active / Total Caches (% used) : 134 / 204 (65.7%)
Active / Total Size (% used) : 147024.04K / 152389.66K (96.5%)
Minimum / Average / Maximum Object : 0.02K / 0.10K / 4096.00K
   OBJS   ACTIVE   USE  OBJ SIZE   SLABS  OBJ/SLAB  CACHE SIZE  NAME
 123540   123283   99%     0.19K    6177        20      24708K  size-192
 236059   235736   99%     0.06K    4001        59      16004K  ksm_rmap_item
 484128   484094   99%     0.02K    3362       144      13448K  avtab_node
    203      203  100%    32.12K     203         1      12992K  kmem_cache
 341936   341360   99%     0.03K    3053       112      12212K  size-32
  14136    14109   99%     0.58K    2356         6       9424K  inode_cache
  71595    71595  100%     0.10K    1935        37       7740K  buffer_head
  31500    31078   98%     0.19K    1575        20       6300K  dentry
  10857    10857  100%     0.55K    1551         7       6204K  radix_tree_node
   5140     4839   94%     1.00K    1285         4       5140K  size-1024
   4480     4462   99%     1.00K    1120         4       4480K  ext4_inode_cache
   5772     5684   98%     0.62K     962         6       3848K  proc_inode_cache
  24300    24269   99%     0.14K     900        27       3600K  sysfs_dir_cache
   1558     1348   86%     2.00K     779         2       3116K  size-2048
  13794    13421   97%     0.20K     726        19       2904K  vm_area_struct
   1074     1050   97%     2.59K     358         3       2864K  task_struct
    699      699  100%     4.00K     699         1       2796K  size-4096
    975      951   97%     2.06K     325         3       2600K  sighand_cache
   4536     3262   71%     0.50K     567         8       2268K  size-512
     17       17  100%   128.00K      17         1       2176K  size-131072
  27401    27229   99%     0.07K     517        53       2068K  selinux_inode_security
   2255     2232   98%     0.78K     451         5       1804K  shmem_inode_cache
  22007    21365   97%     0.06K     373        59       1492K  size-64
  10950     9234   84%     0.12K     365        30       1460K  size-128
     22       22  100%    64.00K      22         1       1408K  size-65536
    326      326  100%     4.00K     326         1       1304K  biovec-256
   5720     4125   72%     0.19K     286        20       1144K  filp
   1020      985   96%     1.00K     255         4       1020K  signal_cache
After some disk activity I'm at:
Active / Total Objects (% used) : 4829537 / 4855308 (99.5%)
Active / Total Slabs (% used) : 163899 / 163988 (99.9%)
Active / Total Caches (% used) : 132 / 204 (64.7%)
Active / Total Size (% used) : 630344.49K / 634988.70K (99.3%)
Minimum / Average / Maximum Object : 0.02K / 0.13K / 4096.00K
   OBJS   ACTIVE   USE  OBJ SIZE   SLABS  OBJ/SLAB  CACHE SIZE  NAME
2918893  2918736   99%     0.10K   78889        37     315556K  buffer_head
 112616   112599   99%     1.00K   28154         4     112616K  ext4_inode_cache
  98637    98563   99%     0.55K   14091         7      56364K  radix_tree_node
 165540   165060   99%     0.19K    8277        20      33108K  dentry
 123520   123363   99%     0.19K    6176        20      24704K  size-192
 236059   235736   99%     0.06K    4001        59      16004K  ksm_rmap_item
 484128   484094   99%     0.02K    3362       144      13448K  avtab_node
    203      203  100%    32.12K     203         1      12992K  kmem_cache
 342384   341570   99%     0.03K    3057       112      12228K  size-32
 139019   138470   99%     0.07K    2623        53      10492K  selinux_inode_security
Still watching it...
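
For anyone wanting to automate the "is any slab pool still growing?" check, here is a rough sketch that diffs the CACHE SIZE column of two saved slabtop listings. It assumes each data row has the eight whitespace-separated fields shown above with the cache size given in K; parse_slabtop and slab_growth are made-up helper names, not part of any tool.

```python
def parse_slabtop(text):
    """Return {cache name: cache size in KB} from a slabtop listing.

    Assumes data rows look like the ones in this post:
    OBJS ACTIVE USE OBJ-SIZE SLABS OBJ/SLAB CACHE-SIZE NAME
    Summary and header lines do not match that shape and are skipped.
    """
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        cache_size = fields[6]
        if not (cache_size.endswith("K") and cache_size[:-1].isdigit()):
            continue
        sizes[fields[7]] = int(cache_size[:-1])
    return sizes


def slab_growth(before, after, min_growth_kb=1024):
    """List (name, growth in KB) for caches that grew, largest first.

    A cache missing from `before` is treated as having started at 0 KB,
    which overstates growth if the earlier listing was merely truncated.
    """
    grown = []
    for name, size_kb in after.items():
        delta = size_kb - before.get(name, 0)
        if delta >= min_growth_kb:
            grown.append((name, delta))
    return sorted(grown, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Three rows from each listing in this post, as a small demonstration.
    first = """\
71595 71595 100% 0.10K 1935 37 7740K buffer_head
4480 4462 99% 1.00K 1120 4 4480K ext4_inode_cache
203 203 100% 32.12K 203 1 12992K kmem_cache
"""
    second = """\
2918893 2918736 99% 0.10K 78889 37 315556K buffer_head
112616 112599 99% 1.00K 28154 4 112616K ext4_inode_cache
203 203 100% 32.12K 203 1 12992K kmem_cache
"""
    for name, delta_kb in slab_growth(parse_slabtop(first), parse_slabtop(second)):
        print(f"{name}: +{delta_kb}K")
```

Growth in buffer_head and ext4_inode_cache after disk activity is ordinary, reclaimable cache behaviour, so a comparison like this is only interesting for pools that keep climbing across many samples without ever shrinking.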
--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA, Boulder Office FAX: 303-415-9702
3380 Mitchell Lane
Boulder, CO 80301 http://www.nwra.com