Hi Jon,

On Fri, 2008-10-03 at 01:55 +0100, Jon Peatfield wrote:
> On Wed, 1 Oct 2008, dave peck wrote:
> > <snip>
> >
> > Well, precisely the same thing as the package install/update did:
> >
> > $ rpm -q --scripts kernel | grep mkinit
> > /sbin/new-kernel-pkg --package kernel --mkinitrd --depmod --install 2.6.18-92.1.10.el5 || exit $?
> > /sbin/new-kernel-pkg --package kernel --mkinitrd --depmod --install 2.6.18-92.1.13.el5 || exit $?
> > $
> > $ sudo /sbin/new-kernel-pkg --package kernel --mkinitrd --depmod --install 2.6.18-92.1.13.el5
> > /sbin/mkinitrd: line 368: cd: slaves: No such file or directory
>
> That line is trying to cd into a dm-*/slaves sub-directory. I don't see
> why it is failing (the dm- directory must exist to pass the test 3 lines
> above the cd).
>
> What does running:
>
> ls -ld /sys/block/dm-*/slaves
>
> show on the systems which fail?
>

The systems are all pretty much configured identically to make things
easy for us (me), but this is what I'm seeing from the running config on
our boxes:

$ ls -ld /sys/block/dm-*/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:27 /sys/block/dm-0/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:58 /sys/block/dm-10/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:58 /sys/block/dm-11/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:58 /sys/block/dm-12/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:58 /sys/block/dm-13/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:58 /sys/block/dm-1/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:58 /sys/block/dm-2/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:58 /sys/block/dm-3/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:58 /sys/block/dm-4/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:58 /sys/block/dm-5/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:58 /sys/block/dm-6/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:58 /sys/block/dm-7/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:58 /sys/block/dm-8/slaves
drwxr-xr-x 2 root root 0 Oct 2 17:58 /sys/block/dm-9/slaves

> The mkinitrd script should probably check that the cd worked, but to be
> honest I've yet to
> see a /sys/block/dm-*/ directory without a slaves
> sub-directory...
>

Well, no, neither have I, so it seems fair to assume that the cd to
'slaves' under dm-* worked correctly. After a fair bit of poking at
this, though, that first cd does not appear to be the source of the
problem.

> (line 368 is near the top of the definition of findstoragedriverinsys()
> which gets called from findstoragedriver() at least some of the time).
>

Yes, and the cd almost certainly worked. The pertinent chunk of code
appears to be:

    while [ ! -L device ]; do
        if [ -L subsystem ]; then
            cd slaves
            for x in *;do
                if [ -L $x ]; then
                    cd $x; break
                fi;
            done
        fi
    done

So, we have done the cd to '/sys/block/dm-0/slaves/' and find:

$ ls -l /sys/block/dm-0/slaves/
total 0
lrwxrwxrwx 1 root root 0 Oct 3 17:01 md1 -> ../../../block/md1

The only entry in the directory is md1, a symlink, so we follow it,
'cd' to '/sys/block/md1', and break. In this directory, we see:

$ ls -l /sys/block/md1/
total 0
-r--r--r-- 1 root root 4096 Oct 3 17:09 dev
drwxr-xr-x 2 root root    0 Oct 2 17:58 holders
drwxr-xr-x 4 root root    0 Oct 3 00:24 md
-r--r--r-- 1 root root 4096 Oct 3 17:09 range
-r--r--r-- 1 root root 4096 Oct 3 17:09 removable
-r--r--r-- 1 root root 4096 Oct 3 17:09 size
drwxr-xr-x 2 root root    0 Oct 2 17:40 slaves
-r--r--r-- 1 root root 4096 Oct 3 17:09 stat
lrwxrwxrwx 1 root root    0 Oct 3 17:09 subsystem -> ../../block
--w------- 1 root root 4096 Oct 3 17:09 uevent

So far so good. At this point we are still in the while loop, so we
again cd to slaves, which contains:

$ ls -l /sys/block/md1/slaves
total 0
lrwxrwxrwx 1 root root 0 Oct 3 17:12 sda2 -> ../../../block/sda/sda2
lrwxrwxrwx 1 root root 0 Oct 3 17:12 sdc2 -> ../../../block/sdc/sdc2

The code then does a cd to the first link, sda2, which points to
/sys/block/sda/sda2, and again we break.
This directory contains:

$ ls -l /sys/block/sda/sda2
total 0
-r--r--r-- 1 root root 4096 Oct 3 17:15 dev
drwxr-xr-x 2 root root    0 Oct 2 17:58 holders
-r--r--r-- 1 root root 4096 Oct 3 17:15 size
-r--r--r-- 1 root root 4096 Oct 3 17:15 start
-r--r--r-- 1 root root 4096 Oct 3 17:15 stat
lrwxrwxrwx 1 root root    0 Oct 3 17:15 subsystem -> ../../../block
--w------- 1 root root 4096 Oct 3 17:15 uevent

We are still in the while loop, so we again test whether 'subsystem'
exists and is a symbolic link. It is, so we 'blindly' attempt to cd to
slaves, which doesn't exist here. We emit the error message but remain
in the '/sys/block/sda/sda2' directory. The code then looks at each
entry in the directory, believing the cd completed properly, and the
first symbolic link it finds is 'subsystem', pointing to '/sys/block/',
which we then cd to and again break:

    for x in *;do
        if [ -L $x ]; then
            cd $x; break
        fi;
    done

The problem is that '/sys/block' satisfies the while condition (it has
no symlink named 'device') but also has no symlink named 'subsystem',
so the loop body does nothing and we hang in what is now an infinite
loop, fully occupying one CPU (my workstation has only two) evaluating
the expression:

    while [ ! -L device ]; do
        if [ -L subsystem ]; then
            ...
        fi
    done

So, I can see what has gone awry with the mkinitrd script, and
specifically with the findstoragedriverinsys function, but I'm not sure
I know how to fix it. I've built several unbootable initrd images trying
to sort this out, and it seems a bit of a tangled mess at this point.

Do you know what directory I should be in after this 'while' loop
completes so that mkinitrd can run to completion? This small bit of
knowledge might help me make a proper patch for our systems.

> -- Jon
>

Jon,

Thank you for your interest in this (my) problem and my very best
regards,

==> dave
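
P.S. To make the failure mode above concrete, here is a minimal sketch
of how such a loop could be guarded so that it cannot spin forever. It
is not the mkinitrd code and not a confirmed fix. The function name
walk_to_device and the hop limit of 32 are hypothetical, and stepping up
to the parent directory when there is no 'slaves' (the sda2 case) is
only an assumption about where the 'device' link lives. Whether the
directory it ends in is the one mkinitrd needs is exactly the open
question above.

```shell
# Hypothetical helper, NOT part of mkinitrd: starting from a sysfs-style
# block directory, walk towards a directory that has a 'device' symlink,
# and give up instead of looping forever.
walk_to_device() {
    cd "$1" || return 1
    local hops=0 x moved
    while [ ! -L device ]; do
        # Assumed safety limit so an unexpected layout cannot spin a CPU.
        hops=$((hops + 1))
        [ "$hops" -gt 32 ] && return 1
        if [ -d slaves ]; then
            # Follow the first symlink under slaves/, checking every cd.
            moved=0
            for x in slaves/*; do
                if [ -L "$x" ]; then
                    cd -P "$x" || return 1
                    moved=1
                    break
                fi
            done
            # An empty slaves/ means there is nowhere left to go.
            [ "$moved" -eq 1 ] || return 1
        else
            # ASSUMPTION: a partition such as sda2 has neither 'slaves'
            # nor 'device', so try its parent (the whole disk).
            [ "$PWD" = "/" ] && return 1
            cd .. || return 1
        fi
    done
    # Print the directory where the walk stopped.
    pwd -P
}
```

Called in a subshell, e.g. dir=$(walk_to_device /sys/block/dm-0), it
leaves the caller's working directory alone and returns non-zero rather
than hanging when no 'device' link can be reached.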