SCIENTIFIC-LINUX-USERS Archives

September 2012

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Stephan Wiesand <[log in to unmask]>
Reply To:
Stephan Wiesand <[log in to unmask]>
Date:
Thu, 13 Sep 2012 16:32:00 +0200
Hello Winnie,

On Sep 13, 2012, at 16:01 , Winnie Lacesso wrote:

> Several times over past few years I've seen user processes "go mad" 
> (programming error) & use all RAM, then all swap (as ganglia so vividly 
> shows), then the box ends up at a kernel panic.
> (Server OS is SL5.x 64-bit BTW)

we rarely see panics in these cases. The box just becomes unusable, which in practice amounts to the same thing.

> What's puzzling is, shouldn't the OS by default not allow users to do 
> "something bad enough" to cause grief to the OS?
> 
> Possibly some sort of tuning can fix this, but one expects that, out of 
> the box, this should not be needed, users just can't bring OS down.

There are plenty of ways to bring a *x system down, or make it deny service to users, by making mistakes in userland. Just think of the classic fork bomb.

> In the past on SL4 I did see oom come into play when box too 
> loaded (killing the mysqld process for instance) & am wondering
> why this isn't happening on SL5 with badly behaved user processes.

By any chance, were your SL4 systems mostly 32-bit, and are your SL5 systems mostly 64-bit? As much as I do advocate using 64-bit, I have to admit that the x86-64 kernel seems to handle OOM situations much worse than x86 used to. And I think it started with SL3 already.

> Grateful for advice!


The one way I know about to reliably prevent these problems is to use sysctl to change the value of vm.overcommit_ratio, and possibly adapt vm.overcommit_memory. Both are documented in proc(5).
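To get a feeling for what these two settings do, here is a rough sketch in Python. As described in proc(5), vm.overcommit_ratio only takes effect when vm.overcommit_memory is set to 2 (strict accounting); the kernel then refuses allocations beyond a commit limit of roughly swap plus that percentage of RAM. The function name and the example machine (16 GiB RAM) are made up for illustration, and huge pages are ignored, so treat the numbers as an approximation rather than what your kernel will report in /proc/meminfo.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio):
    """Approximate CommitLimit in kB for vm.overcommit_memory = 2.

    Simplified: swap + RAM * ratio / 100, ignoring huge pages.
    """
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100


GIB_KB = 1024 * 1024  # one GiB expressed in kB

# Hypothetical box: 16 GiB RAM, ratio 50, with 2 GiB vs. 16 GiB of swap.
for swap_gib in (2, 16):
    limit = commit_limit_kb(16 * GIB_KB, swap_gib * GIB_KB, 50)
    print("swap %2d GiB -> commit limit about %d GiB" % (swap_gib, limit // GIB_KB))
```

With little swap, the limit on such a box is well below the installed RAM unless the ratio is raised, which is one reason the values need some thought before switching to mode 2.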

The problem with this approach is that there's more and more software making very generous use of virtual address space without ever using what was allocated. The current Maple and Oracle's Java come to mind.
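One way to spot such software before tightening the overcommit settings is to compare a process's virtual size with what it actually has resident. A minimal sketch, assuming a Linux /proc filesystem whose status files carry the VmSize and VmRSS fields; the helper names and the sample values below are invented for illustration and are not measurements of any real program.

```python
def vm_sizes_kb(status_text):
    """Extract VmSize and VmRSS (in kB) from the text of /proc/<pid>/status."""
    sizes = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            sizes[key] = int(rest.split()[0])  # value is followed by "kB"
    return sizes


def read_vm_sizes(pid="self"):
    """Read the same two fields for a live process (Linux only)."""
    with open("/proc/%s/status" % pid) as f:
        return vm_sizes_kb(f.read())


# Invented sample: a process that reserved 8 GiB but touches only 256 MiB.
SAMPLE = "Name:\texample\nVmSize:\t 8388608 kB\nVmRSS:\t  262144 kB\n"
sizes = vm_sizes_kb(SAMPLE)
print("reserved/resident ratio:", sizes["VmSize"] // sizes["VmRSS"])
```

A process with a large ratio here is the kind that may fail to start, or fail later on allocation, once strict overcommit accounting is enabled.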

Having sufficient swap space does help. We used to set aside only 2GB for swap even on systems with much more RAM, because they weren't supposed to swap/page much at all. But it turns out that having the recommended amount makes systems much more resilient to memory hogs.

Hope this helps,
	Stephan

-- 
Stephan Wiesand
DESY -DV-
Platanenenallee 6
15738 Zeuthen, Germany
