SCIENTIFIC-LINUX-USERS Archives

January 2012

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Keith Chadwick <[log in to unmask]>
Reply To:
Keith Chadwick <[log in to unmask]>
Date:
Wed, 11 Jan 2012 15:52:36 -0600
Content-Type:
text/plain
It appears that we can likely eliminate 32/64 bit issues, then.

Some more questions:

Is this 20K command job:
- a sequence of trivially parallel commands,
- an MPI job,
- a job "array",
- or is it a complicated DAG?

Can you capture the qsub command(s) associated with this job?

Are you sure that the number of systems and number of streams are 
correctly specified?

-Keith.
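
One possible way to capture the qsub command lines asked about above is a small wrapper placed ahead of the real qsub on the PATH. The following Python sketch only illustrates the idea: the REAL_QSUB and QSUB_LOG environment variables and the function names are hypothetical and are not part of SGE.

```python
import datetime
import os
import shlex
import sys


def log_invocation(argv, log_path, now=None):
    """Append one timestamped, shell-quoted command line to log_path."""
    now = now or datetime.datetime.now()
    quoted = " ".join(shlex.quote(arg) for arg in argv)
    line = "{} {}\n".format(now.isoformat(), quoted)
    with open(log_path, "a") as handle:
        handle.write(line)
    return line


def run_wrapper(args):
    """Log the arguments, then hand control to the real qsub binary.

    REAL_QSUB (assumed, must be set by the user) is the full path of the
    real qsub; QSUB_LOG (assumed, optional) is where the log is written.
    """
    real_qsub = os.environ["REAL_QSUB"]
    log_path = os.environ.get(
        "QSUB_LOG", os.path.expanduser("~/qsub_calls.log"))
    log_invocation(["qsub"] + list(args), log_path)
    # Replace this process with the real qsub; stdin is passed through.
    os.execv(real_qsub, [real_qsub] + list(args))

# To use as a wrapper script, call: run_wrapper(sys.argv[1:])
```

Saved as an executable named qsub in a directory that precedes the SGE bin directory on the PATH, such a wrapper would record every submission the application's script makes without changing what is submitted.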

At 1:39 PM -0800 1/11/12, Wil Irwin wrote:
>Hi-
>
>It is 64-bit on 64-bit. The exact version is from 
>'ge-6.2-bin-lx24-amd64.tar.gz' and 'ge-6.2-common.tar.gz'. So I can 
>rule out that issue.
>
>As for the problems, I can provide more detail, but in brief (sort of):
>
>1. The installation completes w/o incident and I have used all the 
>suggested defaults. Out of frustration, I've also reinstalled a 
>couple of dozen times, changing some of the more flexible defaults one 
>at a time.
>
>2. The "simple" job runs as it should.
>
>3. There are 3 nodes (with the master also serving as an executor). 
>All are talking to each other in terms of the SGE ports and NFS.
>
>4. My inquiry was intended to be general, about a possible 
>incompatibility between SGE and SL 6.1. The comments that follow 
>are, unfortunately, complicated by the fact that I submit jobs through 
>an analysis application. The script this application uses is a bit 
>convoluted, but I have studied it pretty well and, if there is a 
>problem, I don't see it. I have not received any negative feedback 
>from other users of this application. Unfortunately, it really isn't 
>possible to submit the job from this application w/o using the 
>accompanying script. So, of course, there is a bit of a black-box 
>factor.
>
>5. One particular job is very large (~20K commands). After the 
>commands are generated and submitted, SGE returns the rather 
>confusing error message of "Unable to run job: job rejected: You try 
>to submit a job with more than 75000 tasks. Exiting." 75000 is the 
>configured limit, but I can readily see the command lines being 
>generated and it is exactly 16900. I would say in general, this is 
>the most perplexing problem.
>
>6. #5 is accompanied by "failure" email messages, but not 16900 
>messages (I would say many hundreds). I can't explain this behavior 
>either. It could actually be an email server issue and not related 
>to SGE, per se.
>
>7. Another example is, or will appear to be, very specific to the 
>analysis application I am using as opposed to a general SGE issue. 
>For this application, there is an explicit user variable to set the 
>queue, and I have set it to 'verylong.q'. When I submit a much 
>smaller job (~200 commands) to try to figure out what is going 
>wrong, 'verylong.q' is ignored and 'short.q' is selected. But 
>more curious and more SGE-related is that the job does run, but it 
>runs the commands in series and uses only 1 processor on the master 
>node (each node has 6 x 2 cores).
>
>That's a flavor of what is causing my sanity to slowly drift away.
>
>Regards,
>Wil
>
>On Wed, Jan 11, 2012 at 1:00 PM, Keith Chadwick 
><[log in to unmask]> wrote:
>
>Are you trying to run either:
>
>        1. A 32 bit version of SGE 6.2 on a 64 bit SL 6.1 system?
>
>or
>
>        2. A 64 bit version of SGE 6.2 on a 32 bit SL 6.1 system?
>
>In case #1, you should be able to get SGE to run once you install
>the necessary 32 bit compatibility libraries, or (recommended) switch
>to a 64 bit version of SGE 6.2.
>
>In case #2, you are going to be out of luck...
>
>-Keith.
>
>
>At 12:43 PM -0800 1/11/12, Wil Irwin wrote:
>
>Hello-
>
>I am having unparalleled (no pun intended) problems getting SGE 6.2 
>to run under SL 6.1. I have consulted with others who have quite a 
>bit of experience using SGE on an earlier version of SL, and we 
>cannot determine why it won't run.
>
>Before I list the nature of the problems, I thought I would start by 
>asking if anyone has had a successful experience with SGE 6.2 on SL 
>6.1.
>
>I'm running kernel:  2.6.32-220.2.1.el6.x86_64 #1 SMP Thu Dec 22 
>11:15:52 CST 2011 x86_64
>
>Thanks for any help.
>
>-Wil
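
On the "more than 75000 tasks" rejection in item 5 of the quoted message: one possible explanation, which nothing in this thread confirms, is that the submitting script requests an array job whose task range (the qsub -t n[-m[:s]] option) is larger than the 16900 generated commands. The Python sketch below shows how a captured qsub command line could be checked against the limit quoted in the error message. The function names are hypothetical, and treating a bare number as a single task is an assumption.

```python
import re

# Limit quoted in the error message from the thread.
MAX_TASKS = 75000

# Matches "n", "n-m" or "n-m:s" as used by the qsub -t option.
_RANGE = re.compile(r"^(\d+)(?:-(\d+)(?::(\d+))?)?$")


def task_count(spec):
    """Return how many array tasks a -t range such as '1-16900:1' covers."""
    match = _RANGE.match(spec)
    if match is None:
        raise ValueError("not a task range: {!r}".format(spec))
    first = int(match.group(1))
    # Assumption: a bare number names a single task.
    last = int(match.group(2)) if match.group(2) else first
    step = int(match.group(3)) if match.group(3) else 1
    if step < 1 or last < first:
        raise ValueError("invalid task range: {!r}".format(spec))
    return (last - first) // step + 1


def find_task_spec(argv):
    """Return the argument following '-t' in a qsub command line, if any."""
    for index, arg in enumerate(argv[:-1]):
        if arg == "-t":
            return argv[index + 1]
    return None


def exceeds_limit(argv, limit=MAX_TASKS):
    """True if the command line requests more array tasks than the limit."""
    spec = find_task_spec(argv)
    return spec is not None and task_count(spec) > limit
```

This only inspects the command line. A task range can also be set by an embedded "#$ -t" directive in the job script, which this sketch does not look at.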
