SCIENTIFIC-LINUX-USERS Archives

October 2008

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

From: dave peck
Date: Fri, 3 Oct 2008 18:51:29 -0600

Hi Jon,
On Fri, 2008-10-03 at 01:55 +0100, Jon Peatfield wrote:
> On Wed, 1 Oct 2008, dave peck wrote:
> 
> <snip>
> >
> > Well, precisely the same thing as the package install/update did:
> >
> > $ rpm -q --scripts kernel | grep mkinit
> > /sbin/new-kernel-pkg --package kernel --mkinitrd --depmod --install 2.6.18-92.1.10.el5 || exit $?
> > /sbin/new-kernel-pkg --package kernel --mkinitrd --depmod --install 2.6.18-92.1.13.el5 || exit $?
> > $
> > $ sudo /sbin/new-kernel-pkg --package kernel --mkinitrd --depmod --install 2.6.18-92.1.13.el5
> > /sbin/mkinitrd: line 368: cd: slaves: No such file or directory
> 
> That line is trying to cd into a dm-*/slaves sub-directory.  I don't see 
> why it is failing (the dm- directory must exist to pass the test 3 lines 
> above the cd).
> 
> What does running:
> 
>    ls -ld /sys/block/dm-*/slaves
> 
> show on the systems which fail?
> 

The systems are all pretty much configured identically to make things
easy for us (me), but this is what I'm seeing from the running config on
our boxes:

$ ls -ld /sys/block/dm-*/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:27 /sys/block/dm-0/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:58 /sys/block/dm-10/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:58 /sys/block/dm-11/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:58 /sys/block/dm-12/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:58 /sys/block/dm-13/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:58 /sys/block/dm-1/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:58 /sys/block/dm-2/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:58 /sys/block/dm-3/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:58 /sys/block/dm-4/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:58 /sys/block/dm-5/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:58 /sys/block/dm-6/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:58 /sys/block/dm-7/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:58 /sys/block/dm-8/slaves
drwxr-xr-x 2 root root 0 Oct  2 17:58 /sys/block/dm-9/slaves

> The mkinitrd script should probably check that the cd worked, but to be 
> honest I've yet to see a /sys/block/dm-*/ directory without a slaves 
> sub-directory...
> 

Well, no, neither have I, so it seems a fair assumption that the cd to
'slaves' worked correctly. After a fair bit of poking at this, though,
that first cd doesn't appear to be the source of the problem.

> (line 368 is near the top of the definition of findstoragedriverinsys() 
> which gets called from findstoragedriver() at least some of the time).
> 

Yes, and the cd almost certainly worked. The pertinent chunk of code
appears to be:

        while [ ! -L device ]; do
            if [ -L subsystem ]; then
                cd slaves
                for x in *; do
                    if [ -L $x ]; then
                        cd $x;
                        break
                    fi;
                done
            fi
        done

So, we have done the cd to '/sys/block/dm-0/slaves/' and find:

        $ ls -l /sys/block/dm-0/slaves/
        total 0
        lrwxrwxrwx 1 root root 0 Oct  3 17:01 md1 -> ../../../block/md1

The only entry in the directory is md1, a symbolic link, so we follow
it: we 'cd' to '/sys/block/md1' and break. In this directory, we see:

        $ ls -l /sys/block/md1/
        total 0
        -r--r--r-- 1 root root 4096 Oct  3 17:09 dev
        drwxr-xr-x 2 root root    0 Oct  2 17:58 holders
        drwxr-xr-x 4 root root    0 Oct  3 00:24 md
        -r--r--r-- 1 root root 4096 Oct  3 17:09 range
        -r--r--r-- 1 root root 4096 Oct  3 17:09 removable
        -r--r--r-- 1 root root 4096 Oct  3 17:09 size
        drwxr-xr-x 2 root root    0 Oct  2 17:40 slaves
        -r--r--r-- 1 root root 4096 Oct  3 17:09 stat
        lrwxrwxrwx 1 root root    0 Oct  3 17:09 subsystem -> ../../block
        --w------- 1 root root 4096 Oct  3 17:09 uevent
        
So far so good. At this point we are still in the while loop, so we
again cd to 'slaves', which contains:

   $ ls -l /sys/block/md1/slaves
   total 0
   lrwxrwxrwx 1 root root 0 Oct  3 17:12 sda2 -> ../../../block/sda/sda2
   lrwxrwxrwx 1 root root 0 Oct  3 17:12 sdc2 -> ../../../block/sdc/sdc2

The code then does a cd to the first link, sda2, which points to
/sys/block/sda/sda2, and again we break. This directory contains:

   $ ls -l /sys/block/sda/sda2
   total 0
   -r--r--r-- 1 root root 4096 Oct  3 17:15 dev
   drwxr-xr-x 2 root root    0 Oct  2 17:58 holders
   -r--r--r-- 1 root root 4096 Oct  3 17:15 size
   -r--r--r-- 1 root root 4096 Oct  3 17:15 start
   -r--r--r-- 1 root root 4096 Oct  3 17:15 stat
   lrwxrwxrwx 1 root root    0 Oct  3 17:15 subsystem -> ../../../block
   --w------- 1 root root 4096 Oct  3 17:15 uevent
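
The walk traced by hand above (dm-0, then md1, then sda2) can be
reproduced without touching mkinitrd. The following is only a
diagnostic sketch in bash: the function name trace_slaves and the hop
cap of 16 are my own inventions, not anything taken from the mkinitrd
source. It reads sysfs and writes nothing.

```shell
# trace_slaves START_DIR   (hypothetical helper, not part of mkinitrd)
# Print the physical path of START_DIR, then keep following the first
# symlink found in each "slaves" sub-directory, printing every stop.
trace_slaves() {
    local dir link next hops=0
    dir=$(cd "$1" 2>/dev/null && pwd -P) || return 1
    while [ "$hops" -lt 16 ]; do            # arbitrary cap, guards against cycles
        hops=$((hops + 1))
        echo "$dir"
        [ -d "$dir/slaves" ] || return 0    # no slaves directory: end of the chain
        next=
        for link in "$dir"/slaves/*; do
            if [ -L "$link" ]; then
                # Resolve the link to the directory it points at.
                next=$(cd "$link" 2>/dev/null && pwd -P) && break
            fi
        done
        [ -n "$next" ] || return 0          # slaves exists but holds no usable link
        dir=$next
    done
    return 2                                # hop cap reached
}
```

Running "trace_slaves /sys/block/dm-0" on one of the affected boxes
should list each directory the loop visits, ending at the first one
that has no 'slaves' sub-directory.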

We are still in the while loop, so we again test whether 'subsystem'
exists and is a symbolic link. It is, so we 'blindly' attempt to cd to
'slaves', which doesn't exist here. The cd fails and emits the error
message, but we remain in the '/sys/block/sda/sda2' directory. The code
then looks at each file in that directory, believing the cd succeeded,
and the first symbolic link it finds is 'subsystem', pointing to
'/sys/block/'. We cd to that and then break again:

                for x in *; do
                    if [ -L $x ]; then
                        cd $x;
                        break
                    fi;
                done
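
The wrong turn comes from ignoring the exit status of 'cd slaves'. As a
minimal sketch of that one step with the check added (bash; the
function name enter_first_slave is hypothetical, and this is not a
tested patch for mkinitrd):

```shell
# enter_first_slave   (hypothetical helper, not part of mkinitrd)
# From the current directory, cd into slaves/ and then into the first
# symlink found there. Returns non-zero, and leaves the working
# directory where it started, if either step is impossible.
enter_first_slave() {
    local start=$PWD x
    cd slaves 2>/dev/null || return 1       # no slaves directory: stop here
    for x in *; do
        if [ -L "$x" ] && cd "$x" 2>/dev/null; then
            return 0
        fi
    done
    cd "$start"                             # no usable link: undo the cd into slaves
    return 1
}
```

With a check like this, the step that fails in '/sys/block/sda/sda2'
reports failure instead of falling through to the 'subsystem' link.
What the caller should do next is a separate question.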

The problem is that '/sys/block' still satisfies the while condition
(it has no symbolic link named 'device'), but it also has no symbolic
link named 'subsystem', so the body of the loop does nothing. We hang
in what is now an infinite loop, fully occupying one CPU (my workstation
has only two), running the expression:

        while [ ! -L device ]; do
            if [ -L subsystem ]; then
...
            fi
        done
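
The loop spins because it has no exit for a directory where it cannot
make progress. A sketch of the same walk with an explicit "no progress"
exit and a hop cap follows (bash). The name walk_to_device, the hop cap
and the return codes are all hypothetical. It only shows how to make
the loop terminate; it does not settle which directory mkinitrd needs
to end up in.

```shell
# walk_to_device [MAX_HOPS]   (hypothetical, illustrative only)
# Walk down slaves/ links from the current directory until a "device"
# symlink is present. Returns 0 on success, 1 if no further progress is
# possible, 2 if the hop cap is reached.
walk_to_device() {
    local max_hops=${1:-16} hops=0 x moved
    while [ ! -L device ]; do
        [ "$hops" -lt "$max_hops" ] || return 2
        hops=$((hops + 1))
        moved=
        if [ -L subsystem ] && [ -d slaves ]; then
            for x in slaves/*; do
                if [ -L "$x" ] && cd "$x" 2>/dev/null; then
                    moved=yes
                    break
                fi
            done
        fi
        [ -n "$moved" ] || return 1     # the original loop would spin here
    done
    return 0
}
```

Started from '/sys/block/dm-0' on a layout like the one shown above,
this would stop in '/sys/block/sda/sda2' with status 1 rather than
wandering into '/sys/block' and hanging.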

So, I can see what has gone awry in the mkinitrd script, specifically in
the findstoragedriverinsys function, but I'm not sure I know how to fix
it. I've built several unbootable initrd images trying to sort this out,
and it seems a bit of a tangled mess at this point.

Do you know which directory I should be in after this 'while' loop
completes, so that mkinitrd can run to completion? This small bit of
knowledge might help me make a proper patch for our systems.

>   -- Jon
> 

Jon,

Thank you for your interest in this (my) problem and my very best
regards,

    ==> dave
