SCIENTIFIC-LINUX-USERS Archives

March 2015

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV



Subject:
From: Arnau Bria <[log in to unmask]>
Reply To: Arnau Bria <[log in to unmask]>
Date: Thu, 12 Mar 2015 09:31:02 +0100
Content-Type: text/plain
Parts/Attachments: text/plain (40 lines)

On Wed, 11 Mar 2015 09:46:37 -0500
Andy Wettstein wrote:

> Hi,
Hi Andy,
 
> I've seen a similar problem with Slurm on various kernels:
> http://bugs.schedmd.com/show_bug.cgi?id=1242

This is not the same issue we are seeing:
- In our case the system reboots.
- I see it when many jobs finish at the same time, not when jobs
finish one by one.

The cgroups setup was working fine until the last kernel upgrade.

> This is likely a kernel bug that has existed for a long time. I found
> a mailing list message from November of 2011 with similar problems:
> https://lists.linux-foundation.org/pipermail/containers/2011-November/028382.html

Well, in my case it works perfectly with the "old" kernel
2.6.32-431.29.2.el6.x86_64, so it seems that something was fixed after 2011.

> I finally decided to just disable cgroup enforcement in slurm and use
> an alternate slurm method for killing jobs that go over the memory
> limit.
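Andy does not say which alternate method he switched to. One common alternative to cgroup memory enforcement is to sample each job's resident memory periodically and kill the job once it exceeds its limit. A minimal Python sketch of that polling idea, assuming a Linux /proc filesystem and a single process per job (the function names are hypothetical and this is not Slurm's own code):

```python
import os
import signal
from typing import Optional


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Return VmRSS (in KiB) from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:\t   12345 kB"
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def enforce_memory_limit(pid: int, limit_kib: int) -> bool:
    """Send SIGKILL to pid if its resident set exceeds limit_kib.

    Returns True if the process was killed. A real scheduler would sum RSS
    over every process in the job and repeat this check on a timer; this
    sketch checks one process once.
    """
    try:
        with open(f"/proc/{pid}/status") as f:
            rss = parse_vmrss_kib(f.read())
    except FileNotFoundError:
        return False  # process already exited (or no /proc)
    if rss is not None and rss > limit_kib:
        os.kill(pid, signal.SIGKILL)
        return True
    return False
```

The trade-off compared with the cgroup memory controller is that a job can briefly overshoot its limit between samples, but no kernel cgroup teardown is involved when jobs end.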

I use cgroups not only for limiting memory usage; I also like the
resource isolation they give me (cpusets).
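To check which cpuset a job step actually landed in, and which CPUs it is allowed to run on, a process can read /proc/self/cgroup and its own CPU affinity. A minimal sketch, assuming cgroup v1 (as used by EL6 kernels) and a Slurm-style path such as /slurm/uid_1000/job_42/step_0, which is illustrative and not taken from this thread:

```python
import os
from typing import Optional, Set


def cgroup_v1_path(proc_cgroup_text: str, controller: str) -> Optional[str]:
    """Return the cgroup path for a controller from /proc/<pid>/cgroup text.

    Each v1 line is "hierarchy-ID:controller-list:path",
    e.g. "4:cpuset:/slurm/uid_1000/job_42/step_0".
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)  # the path itself may contain ':'
        if len(parts) == 3 and controller in parts[1].split(","):
            return parts[2]
    return None


def parse_cpu_list(text: str) -> Set[int]:
    """Expand a cpuset-style list such as "0-3,8,10-11" into CPU ids."""
    cpus: Set[int] = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        if "-" in chunk:
            lo, hi = chunk.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(chunk))
    return cpus


if __name__ == "__main__":
    # Linux-only inspection of the current process; skipped elsewhere.
    if os.path.exists("/proc/self/cgroup"):
        with open("/proc/self/cgroup") as f:
            print("cpuset cgroup:", cgroup_v1_path(f.read(), "cpuset"))
    if hasattr(os, "sched_getaffinity"):
        print("CPUs allowed:", sorted(os.sched_getaffinity(0)))
```

parse_cpu_list accepts the same list format found in a cpuset's cpuset.cpus file, so it can be used to compare what the cgroup grants with what the process reports.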

> I did not file a bug with redhat at the time.

It seems that Red Hat accepted Andrea's bug report, so there probably is
something wrong there.


> Andy
Cheers,
Arnau
