SCIENTIFIC-LINUX-USERS Archives

March 2013

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

From:
Paul Robert Marino <[log in to unmask]>
Reply To:
Paul Robert Marino <[log in to unmask]>
Date:
Fri, 29 Mar 2013 11:12:11 -0400
Well, openmpi is the app that executes it, so that's probably where the
limitation is coming from.
With a little time on Google you will find plenty of posts on the
subject of openmpi not being able to take advantage of all the
resources available to it.
The problem is I've never seen an answer as to why, not that I looked
all that long. Most of the suggestions point at the ulimit settings,
which on the surface makes some sense, but those numbers aren't right
for an issue caused by a ulimit. The other thing is that most of the
openmpi users who asked the question and were told it was ulimits said
later that adjusting the ulimits didn't fix their issue. So again it
sounds like a problem in the code of either openmpi or the application
you are trying to execute with it.
The only other possibility I can think of is that SELinux is blocking
something and capping the memory as a side effect, but I doubt it.
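
One way to test the ulimit theory directly, rather than guessing from
the shell's own settings, is to read the limits the kernel has actually
applied to a running rank from /proc/<pid>/limits. The following is a
minimal sketch, not something taken from the thread: the function names
soft_limit and report_limits are made up for illustration, and "abinit"
is only an assumed process name that may differ on a real install.

```shell
#!/bin/sh
# Print the soft limit for one named row of a /proc/<pid>/limits style file.
# Usage: soft_limit FILE "Max address space"
soft_limit() {
    # Columns are separated by runs of two or more spaces; the limit
    # names themselves contain only single spaces.
    awk -F '  +' -v name="$2" '$1 == name { print $2 }' "$1"
}

# Print the memory-related soft limits of every process named exactly $1.
# The process name passed in (e.g. "abinit") is an assumption.
report_limits() {
    for pid in $(pgrep -x "$1"); do
        for name in "Max address space" "Max data size" \
                    "Max resident set" "Max locked memory"; do
            printf '%s %s: %s\n' "$pid" "$name" \
                "$(soft_limit "/proc/$pid/limits" "$name")"
        done
    done
}

# Example (run while the job is active): report_limits abinit
```

If every rank reports "unlimited" for address space and data size, a
ulimit is unlikely to be what stops the job at about 512MB per process.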




On Thu, Mar 28, 2013 at 12:39 PM, Duke Nguyen <[log in to unmask]> wrote:
> On 3/28/13 9:00 PM, Paul Robert Marino wrote:
>>
>> kernel.shmmax does nothing if you don't bump up kernel.shmall
>> accordingly, but I can tell you the cap comes from something wrong
>> with your application, not the OS.
>> At one time I supported an application that in normal operation used
>> 64GB of resident memory per instance.
>> And currently my PostgreSQL servers often spike to as much as 2GB of
>> RAM per connection and would use more if I didn't cap it there in the
>> configuration.
>
>
> Interesting, I never knew of any server process that takes that much
> memory. Anyway, it is good to know :).
>
>
>>
>> I don't think the kernel settings are your problem. What language is
>> the application written in?
>> Is it executed by another process, like Apache or Tomcat for example?
>
>
> The app (a material simulation app) is just an input file that calls
> abinit (http://www.abinit.org/) using openmpi to run. So it is executed by
> abinit. At the time the app runs, we make sure that no other process
> (apache, tomcat etc...) is running, so basically the app should take all
> available memory.
>
> Thanks,
>
> D.
>
>
>>
>>
>> On Wed, Mar 27, 2013 at 11:09 PM, Duke Nguyen <[log in to unmask]> wrote:
>>>
>>> On 3/27/13 11:52 PM, Attilio De Falco wrote:
>>>>
>>>> Just a stab in the dark, but did you check the shared memory kernel
>>>> parameter (shmmax)? Type "cat /proc/sys/kernel/shmmax". We have it set
>>>> very high so that any process/thread can use as much memory as it
>>>> needs. You can set the limit to 1 GB without rebooting by typing
>>>> "echo 1073741824 > /proc/sys/kernel/shmmax", or modify
>>>> /etc/sysctl.conf and add the line "kernel.shmmax = 1073741824" so it
>>>> remains after a reboot. I'm not sure about abinit, but some Fortran
>>>> programs need the shmmax limit to be set high…
>>>
>>>
>>> Hi Attilio, we already had it at a very high value (not sure why, I
>>> never changed/edited this value before)
>>>
>>> [root@biobos:~]# sysctl -p
>>> net.ipv4.ip_forward = 1
>>> net.ipv4.conf.default.rp_filter = 1
>>> net.ipv4.conf.default.accept_source_route = 0
>>> kernel.sysrq = 0
>>> kernel.core_uses_pid = 1
>>> net.ipv4.tcp_syncookies = 1
>>> error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
>>> error: "net.bridge.bridge-nf-call-iptables" is an unknown key
>>> error: "net.bridge.bridge-nf-call-arptables" is an unknown key
>>> kernel.msgmnb = 65536
>>> kernel.msgmax = 65536
>>> kernel.shmmax = 68719476736
>>> kernel.shmall = 4294967296
>>> [root@biobos:~]# cat /proc/sys/kernel/shmmax
>>> 68719476736
>>>
>>> Any other suggestions?
>>>
>>>
>>>> On Mar 26, 2013, at 9:59 PM, Duke Nguyen <[log in to unmask]> wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> We have SL6.3 64bit installed on a box with two quad-core CPUs and
>>>>> 8GB RAM. We installed openmpi, Intel Studio XE and abinit to run some
>>>>> of our applications in parallel (8 cores/processes). To our surprise,
>>>>> the system usually takes only about half of the available memory
>>>>> (about 500MB per core) and then the job/task is killed with the
>>>>> low-resource error.
>>>>>
>>>>> We don't really understand why there is a cap of "512MB" (I guess it
>>>>> would be 512MB instead of 500MB) for each of our cores, whereas in
>>>>> theory each core should be able to use up to 1GB. Any
>>>>> suggestions/comments/experience about this issue?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> D.
>>>>>
>
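
On the shmmax/shmall point raised in the thread: kernel.shmmax is given
in bytes, while kernel.shmall is counted in pages, so the two values
cannot be compared directly. The sketch below checks whether shmall is
large enough to hold one segment of shmmax bytes. The function name
shmall_covers_shmmax is made up for illustration, and the check assumes
all values fit in the shell's signed 64-bit arithmetic.

```shell
#!/bin/sh
# Succeed if SHMALL pages of PAGE_SIZE bytes can hold one segment of
# SHMMAX bytes. Usage: shmall_covers_shmmax SHMMAX SHMALL PAGE_SIZE
shmall_covers_shmmax() {
    # Round SHMMAX up to whole pages, then compare page counts; dividing
    # avoids the overflow that multiplying SHMALL by PAGE_SIZE could cause.
    [ $(( ($1 + $3 - 1) / $3 )) -le "$2" ]
}

# Against the live system (left commented out so the sketch has no effect):
# shmall_covers_shmmax "$(cat /proc/sys/kernel/shmmax)" \
#     "$(cat /proc/sys/kernel/shmall)" "$(getconf PAGE_SIZE)" \
#     && echo "shmall is large enough for shmmax"
```

With the values posted above (shmmax = 68719476736, shmall = 4294967296)
and an assumed 4096-byte page size, the check passes: shmmax is 64 GiB
and shmall allows 16 TiB in total, so these two settings are consistent
with each other.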
