SCIENTIFIC-LINUX-USERS Archives

November 2012

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Stephen John Smoogen <[log in to unmask]>
Reply To:
Stephen John Smoogen <[log in to unmask]>
Date:
Wed, 14 Nov 2012 10:37:15 -0700
On 14 November 2012 10:20, Ken Teh <[log in to unmask]> wrote:
> The common thread is I/O to a MegaRAID RAID 5 device, which is cause for
> concern since the primary function of both machines where I've encountered
> this problem is file serving.
>
> Perhaps I am just unlucky and have 2 bad MegaRAID cards in a row.  I'm
> trying to understand this better and figure out if I am doing something
> wrong.

Well, there are a couple of things this could be:

1) You are asking more of the MegaRAID than it is meant to do; it may be
running out of cache or other resources.
2) The MegaRAID is still rebuilding its array underneath, and you are
hitting a locking problem because it hasn't finished what it needs to
do before you ask it to do something else (really a variant of #1).

Most of the time you will need to install the proprietary MegaRAID
tools to see what is going on beneath the disks and find out.

> My procedure is: create a RAID 5 volume on the MegaRAID and do a slow init.
> Reboot the system into Linux, write a single large partition with parted,
> then put one or more logical volumes on the drive.
>
> The "hung" problem has cropped up under the following situations:
>
> (1) pvcreate on the disk
>
> (2) mkfs.ext4 on the volumes created on the disk
>
> (3) writes to the filesystem on the disk
>
> It's happened on 2 fileservers each with a megaraid.
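
For reference, the procedure quoted above can be written down as a
dry-run script, so the same steps are repeated identically on both
fileservers and it is obvious which step stalls. This is only a sketch
in Python 3: the device name /dev/sdb, the volume group and logical
volume names, the size, the GPT label, and the vgcreate/lvcreate steps
are all my assumptions, not details from the report. By default it
prints the commands and runs nothing.

```python
import shlex
import subprocess
import time


def build_commands(disk, vg, lv, size):
    """Return the partition/LVM/mkfs steps as argument lists.

    All arguments are hypothetical examples; nothing here comes from the
    original report. The partition is assumed to appear as <disk>1,
    which holds for /dev/sdX style names only.
    """
    part = disk + "1"
    return [
        ["parted", "-s", disk, "mklabel", "gpt"],  # GPT label is an assumption
        ["parted", "-s", disk, "mkpart", "primary", "0%", "100%"],
        ["pvcreate", part],
        ["vgcreate", vg, part],
        ["lvcreate", "-L", size, "-n", lv, vg],
        ["mkfs.ext4", "/dev/{}/{}".format(vg, lv)],
    ]


def run_steps(commands, dry_run=True):
    """Print each step; when dry_run is False, also run it and time it."""
    timings = []
    for cmd in commands:
        line = " ".join(shlex.quote(arg) for arg in cmd)
        print(line)
        if dry_run:
            continue
        start = time.time()
        subprocess.run(cmd, check=True)  # stops at the first failing step
        timings.append((line, time.time() - start))
    return timings


# Dry run only: prints the six commands and touches no disk.
run_steps(build_commands("/dev/sdb", "vg_data", "lv_data", "100G"))
```

Running it with dry_run=False (as root, on a disk you can afford to
wipe) returns the wall-clock time per step, which should show whether
the stall is in pvcreate, mkfs.ext4, or somewhere else.
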
>
>
>
>
>
> On 11/14/2012 10:19 AM, Jamie Duncan wrote:
>>
>> is there a specific bug/bugs you're referring to?
>>
>> a hung task means that a process has been blocked (in uninterruptible
>> sleep) waiting on a specific bit of I/O for more than 120 seconds. That
>> is not the length of the entire process, mind you, which depends on
>> countless inputs and outputs to complete; it means something on the other
>> side isn't answering for a very long time.  It usually means an unhealthy
>> system at some level.
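
To pin down which tasks the kernel flagged, the hung-task reports can
be pulled out of the kernel log. A rough Python 3 sketch, assuming the
usual message wording "INFO: task NAME:PID blocked for more than N
seconds." and a log file such as /var/log/messages (that path is my
assumption, not something taken from this thread):

```python
import re

# Matches the kernel's hung-task report, for example:
#   INFO: task pvcreate:1234 blocked for more than 120 seconds.
# The command name may itself contain colons (kworker/0:1), so the
# name group is non-greedy and the PID must be followed by " blocked".
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (command, pid, seconds) for each hung-task report found."""
    hits = []
    for line in log_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            hits.append(
                (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
            )
    return hits


def read_timeout(path="/proc/sys/kernel/hung_task_timeout_secs"):
    """Return the hung-task threshold in seconds, or None if unreadable."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None
```

Something like find_hung_tasks(open("/var/log/messages").read()) should
then list each stalled command with its PID. read_timeout() shows the
threshold in effect, which defaults to 120 seconds and can be changed
through that /proc file.
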
>>
>>
>> On Wed, Nov 14, 2012 at 11:04 AM, Ken Teh <[log in to unmask]> wrote:
>>
>>     I've recently been encountering this problem trying to stand up a
>>     large RAID 5 disk server.  My first encounter was when I was doing
>>     write speed tests.  I thought I had solved this problem by letting
>>     the MegaRAID card complete a slow init of the volume before trying
>>     to create a Linux filesystem on it and re-doing my speed
>>     measurements.
>>
>>     But I have just now encountered it again on a new RAID 5 volume,
>>     which I also let complete a slow init over the weekend.  I was in
>>     fact trying to do a pvcreate on the volume when it hung.
>>
>>     Can anyone shed some light?  I see posts about it, but everything I
>>     read suggests it's been taken care of.
>>
>>
>>
>>
>> --
>> Thanks,
>>
>> Jamie Duncan
>> 804.571.0458



-- 
Stephen J Smoogen.
"Don't derail a useful feature for the 99% because you're not in it."
Linus Torvalds
"Years ago my mother used to say to me,... Elwood, you must be oh
so smart or oh so pleasant. Well, for years I was smart. I
recommend pleasant. You may quote me."  —James Stewart as Elwood P. Dowd
