Date: | Fri, 20 Jan 2012 09:19:58 +0000 |
Content-Type: | TEXT/PLAIN |
> SL6 systems, we can't reboot reliably, particularly if someone is logged
> on the console or if the user initiates the shutdown from the desktop.
> Have not seen one hang if no automounts are mounted. The last messages
> on the console are:
>
> Unmounting file systems: [ OK ]
> /home: rcercrcrcrcrcrcrcrcrcrcrcrcrc[...]rcrce
> init: rc main process (19211) killed by KILL signal
>
> The "rc" is repeated for about 3.5 console lines (note "e" near
> beginning and at end is not a typo).
Something similar happened here on 27 Dec 2010 on a batch of SL5.x
64-bit cluster WN, in run level 3 (no Desktop), no automount or NIS (local
pool accounts only), but they did have an NFS mount for /software, and
they were multiclustered with a gpfs storage cluster to mount the /gpfs
storage.
The WN were running
kernel-2.6.18-194.11.4.el5.x86_64
After a yum update to kernel-2.6.18-194.26.1.el5.x86_64
(skipping the kernel-2.6.18-194.17.1 update)
and a shutdown -r now to boot them into the new kernel, they all hung
coming down, with this on the console:
sbin: rcercrcrcrcrcrcrcrcrcr
INIT: no more processes left in this runlevel
/bin: rcercrcrcrcrcrcrcrcrcrcr
Could not kill process 2050: no such process
Had never seen that before.
Poke reset button -> they all rebooted fine.
On a 2nd batch of WN (headless) they also did exactly the same,
updating from kernel-2.6.18-194.11.4 to kernel-2.6.18-194.32.1 in Feb'11.
(In both cases the in-between kernel updates were skipped; unsure if
relevant.) It has never happened since.
It was very curious, but there was no time to look into it.
Sympathies if you're seeing this repeatedly.