SCIENTIFIC-LINUX-USERS Archives

March 2013

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Duke Nguyen <[log in to unmask]>
Reply To:
Duke Nguyen <[log in to unmask]>
Date:
Sat, 30 Mar 2013 11:24:12 +0700
On 3/29/13 10:12 PM, Paul Robert Marino wrote:
> well, openmpi is the app that executes it, so that's where the limitation
> is probably coming from.
> With a little time on Google you will find plenty of posts on the
> subject of openmpi not being able to take advantage of all the
> resources available to it.
> The problem is I've never seen an answer as to why, not that I looked
> all that long. Most of the suggestions talk about the ulimit setting,
> which on the surface makes some sense, but those numbers aren't right
> for an issue caused by a ulimit. Also, most of the openmpi
> users who asked the question and were told it was ulimits said
> later that adjusting the ulimits didn't fix their issues. So again it
> sounds like a problem in the code for either openmpi or the code you
> are trying to execute with it.
> The only other possibility is that SELinux is preventing
> something and capping the memory somehow as a side effect, but I doubt
> it.
>

Very useful comments, Paul. I am going to ask on the openmpi forum to see
if they can help. Anyway, is there a way to test the total memory of the
system? Is there any simple bash program (not using openmpi) that I can run
on all the cores to confirm that my system can use the full 8GB of RAM?

Thanks,

D.
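
As a starting point for the kind of check asked about above, here is a
minimal sketch in plain shell, with no openmpi involved. It only reports
what the kernel says is installed and which per-process limits the current
shell would pass on; it does not allocate memory itself. The helper names
meminfo_kb and per_process_mb are made up for this example, and the figure
of 8 processes is taken from the original post.

```shell
#!/bin/bash
# Sketch: report total memory, an even per-process share, and shell limits.
# meminfo_kb and per_process_mb are hypothetical helper names, not standard tools.

# Print the value (in kB) of a field such as MemTotal from a meminfo-style file.
meminfo_kb() {
    local field="$1" file="${2:-/proc/meminfo}"
    awk -v f="$field:" '$1 == f { print $2 }' "$file"
}

# Split a kB total evenly across N processes; print the share in MB
# (integer division, so the result is rounded down).
per_process_mb() {
    local total_kb="$1" nproc="$2"
    echo $(( total_kb / nproc / 1024 ))
}

if [ -r /proc/meminfo ]; then
    total_kb=$(meminfo_kb MemTotal)
    echo "MemTotal: ${total_kb} kB"
    echo "Even share for 8 processes: $(per_process_mb "$total_kb" 8) MB each"
fi

# Limits inherited by anything started from this shell ("unlimited" means no cap).
echo "ulimit -v (virtual memory, kB): $(ulimit -v)"
echo "ulimit -l (locked memory, kB):  $(ulimit -l)"
```

If the reported MemTotal is close to 8GB and the limits print as
"unlimited", the cap is less likely to come from the shell environment,
which would be consistent with Paul's suspicion about the application side.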

>
>
> On Thu, Mar 28, 2013 at 12:39 PM, Duke Nguyen <[log in to unmask]> wrote:
>> On 3/28/13 9:00 PM, Paul Robert Marino wrote:
>>> kernel.shmmax does nothing if you don't bump up kernel.shmall
>>> accordingly, but I can tell you the cap is something wrong with your
>>> application, not the OS.
>>> At one time I supported an application that in normal operation used
>>> 64GB of resident memory per instance.
>>> And currently my PostgreSQL servers often spike to as much as 2GB of
>>> RAM per connection and would use more if I didn't cap it there in the
>>> configuration.
>>
>> Interesting, I never knew of any server process that takes that much
>> memory. Anyway, it is good to know :).
>>
>>
>>> I don't think the kernel settings are your problem. What language is
>>> the application written in?
>>> Is it executed by another process, like Apache or Tomcat for example?
>>
>> The app (a material simulation app) is just an input file which calls
>> abinit (http://www.abinit.org/), using openmpi to run. So it is executed by
>> abinit. At the time the app runs, we make sure that no other process
>> (apache, tomcat, etc.) is running, so basically the app should be able to
>> take all available memory.
>>
>> Thanks,
>>
>> D.
>>
>>
>>>
>>> On Wed, Mar 27, 2013 at 11:09 PM, Duke Nguyen <[log in to unmask]> wrote:
>>>> On 3/27/13 11:52 PM, Attilio De Falco wrote:
>>>>> Just a stab in the dark, but did you check the shared memory kernel
>>>>> parameter (shmmax)? Type "cat /proc/sys/kernel/shmmax". We have it set
>>>>> very high so that any process/thread can use as much memory as it needs.
>>>>> You can set the limit to 1 GB without rebooting by typing
>>>>> "echo 1073741824 > /proc/sys/kernel/shmmax", or modify /etc/sysctl.conf
>>>>> and add the line "kernel.shmmax = 1073741824" so it remains after a
>>>>> reboot. I'm not sure about abinit, but some Fortran programs need the
>>>>> shmmax limit to be set high…
>>>>
>>>> Hi Attilio, we already had it at a very high value (not sure why, I never
>>>> changed/edited this value before):
>>>>
>>>> [root@biobos:~]# sysctl -p
>>>> net.ipv4.ip_forward = 1
>>>> net.ipv4.conf.default.rp_filter = 1
>>>> net.ipv4.conf.default.accept_source_route = 0
>>>> kernel.sysrq = 0
>>>> kernel.core_uses_pid = 1
>>>> net.ipv4.tcp_syncookies = 1
>>>> error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
>>>> error: "net.bridge.bridge-nf-call-iptables" is an unknown key
>>>> error: "net.bridge.bridge-nf-call-arptables" is an unknown key
>>>> kernel.msgmnb = 65536
>>>> kernel.msgmax = 65536
>>>> kernel.shmmax = 68719476736
>>>> kernel.shmall = 4294967296
>>>> [root@biobos:~]# cat /proc/sys/kernel/shmmax
>>>> 68719476736
>>>>
>>>> Any other suggestions?
>>>>
>>>>
>>>>> On Mar 26, 2013, at 9:59 PM, Duke Nguyen <[log in to unmask]> wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> We have SL6.3 64bit installed on a box with two quad-core CPUs and 8GB
>>>>>> RAM. We installed openmpi, Intel Studio XE and abinit to run some of our
>>>>>> applications in parallel (8 cores/processes). To our surprise, the system
>>>>>> usually takes only about half of the available memory (about 500MB per
>>>>>> core) and then the job/task is killed with a low-resource error.
>>>>>>
>>>>>> We don't really understand why there is a cap of "512MB" (I guess it
>>>>>> would be 512MB rather than 500MB) for each of our cores, whereas in
>>>>>> theory each core should be able to use up to 1GB. Any
>>>>>> suggestions/comments/experience about this issue?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> D.
>>>>>>
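
The shmmax and shmall values quoted in the thread are in different units:
kernel.shmmax is in bytes, while kernel.shmall is counted in pages. The
sketch below, under that assumption, checks whether the shmall value from
the sysctl output is large enough to back one segment of shmmax bytes. The
helper name shm_pages_needed is made up for this example, and the two
numbers are copied from the output posted above rather than read from a
live system.

```shell
#!/bin/bash
# Sketch: compare kernel.shmmax (bytes) with kernel.shmall (pages).
# shm_pages_needed is a hypothetical helper name, not a standard tool.

# Pages required to back a single segment of shmmax bytes (rounded up).
shm_pages_needed() {
    local shmmax_bytes="$1" page_size="$2"
    echo $(( (shmmax_bytes + page_size - 1) / page_size ))
}

page_size=$(getconf PAGE_SIZE)   # page size of the machine running the script
shmmax=68719476736               # bytes, from the sysctl output in the thread
shmall=4294967296                # pages, from the sysctl output in the thread

needed=$(shm_pages_needed "$shmmax" "$page_size")
if [ "$shmall" -ge "$needed" ]; then
    echo "shmall ($shmall pages) covers shmmax ($needed pages needed)"
else
    echo "shmall ($shmall pages) is below the $needed pages needed for shmmax"
fi
```

To check a live system instead, the two hard-coded values could be replaced
with the contents of /proc/sys/kernel/shmmax and /proc/sys/kernel/shmall.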
