Date: Fri, 6 Apr 2012 08:22:24 +0900
Content-Type: text/plain
On 04/06/2012 04:54 AM, Bluejay Adametz wrote:
>> 3. The fact that the tar extraction process is so slow as to be effectively useless suggests
>> something of a larger problem.
>
> I would expect tar to be more I/O bound.
>
>> If you submit multiple CPU-intensive tasks then you should see multiple
>> cores go to high percent used. But if your tasks are I/O bound then the
>> CPU % will not hit 100%, as the processes block on I/O.
>
> hmmm.... perhaps the CPU and CPU scheduling are not the issues.
>
> Can you try some other I/O bound processes, say, backups or just dd
> if=/dev/sdwhatever of=/dev/null and see what happens?
>
> How's memory look?
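The dd test suggested above can be sketched roughly like this (the device name is a placeholder; substitute your actual disk, and note that reading a raw device usually needs root):

```shell
# Measure raw sequential read throughput from a block device.
# /dev/sdX is a placeholder -- substitute the real device name.
# dd reports MB/s on completion.
dd if=/dev/sdX of=/dev/null bs=1M count=1024

# Baseline with a purely synthetic stream, to rule out the disk entirely:
dd if=/dev/zero of=/dev/null bs=1M count=1024

# Snapshot memory while a copy runs:
free -m
# Watch the 'wa' (I/O wait) column over five seconds:
vmstat 1 5
```

If the synthetic `dd` is fast but the device read is not, the storage path (disk, controller, or network filesystem) is the likely suspect rather than the CPU.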
You're distributing jobs across a networked cluster, correct? In that
case network I/O is likely the bottleneck, especially if you're
untarring something that lives on one NFS share to another NFS share --
and this can become a serious problem (like 20 min to untar something
tiny) if you've got an addressing collision somewhere on the network or
a device is connected via a slow physical link.
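One quick way to tell whether the NFS path is the slow part is to time the same work over the mount and on local disk (the mount point and file names below are hypothetical):

```shell
# Compare NFS-bound and local-only throughput.
# /mnt/nfs, some.tar, and /tmp paths are placeholders.
time cp /mnt/nfs/some.tar /tmp/some.tar      # crosses the network
time cp /tmp/some.tar /tmp/some2.tar         # purely local, for comparison

# Untar from NFS to local scratch instead of NFS-to-NFS,
# so only the read side touches the network:
mkdir -p /tmp/scratch
time tar -xf /mnt/nfs/some.tar -C /tmp/scratch
```

If the local copy is orders of magnitude faster than the NFS copy, the problem is on the wire (or in the network config), not in tar or the scheduler.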
The other day we had a major slowdown on one segment of a network, and
what caught the users' attention was "how slow OpenOffice is today".
OpenOffice wasn't being slow, but grabbing stuff from the NFS shares
sure was. It was simply because of a misconfigured Windows laptop
someone put on the network that was issuing erroneous responses to DHCP
requests -- which hadn't been an issue until a collision occurred.
In a clustered environment I can only imagine you'd be far more
sensitive to issues like this (and anything else network related).
-z