Hi Stephen,
Replies in-line below.
Thanks,
- Larry
On 3/3/15 11:49 AM, Stephen John Smoogen wrote:
>
> On Mar 3, 2015 8:49 AM, "P. Larry Nelson" <[log in to unmask]> wrote:
> >
> > I am seeing a bizarre bug where an SL6.x system hangs on either
> > shutdown or reboot at the point where it wants to shut down the
> > loopback interface.
> >
> > Let me start off by saying I'm running a mixed shop of SL5.x servers
> > (DNS, NIS, NTP, DHCP, NFS, etc.) along with a bunch of new cluster-esque
> > nodes running SL6.x. All new SL6 nodes are Dell R410, R510, R710, for
> > whatever that's worth, but I don't believe they have anything to do
> > with the bug, per se.
> >
> > Since building these new SL6 nodes many weeks back, they have all
> > exhibited this extremely annoying habit of hanging on shutdown or
> > reboot at the shutdown of the loopback interface.
> > Eventually (for the most part) they stop spinning whatever wheels
> > they're spinning and do manage to complete either the shutdown or
> > reboot, but it takes upwards of 15, 20, or 30 minutes! Usually
> > I can't wait that long and just do a power off/on of the node.
> >
> > No amount of trying to find out what they are doing has worked,
> > from trying to open another console window (Alt-F1, etc.) at
> > shutdown/reboot to having top running in one terminal window while
> > doing a 'service network restart' in another. Everything just freezes!
> >
> > I tried any number of things over the past several weeks, including
> > ripping out NetworkManager knowing that it has had a history of mucking
> > things up. No luck. They still hang.
> >
> > On another front, I was having some UID/GID problems with the mix of
> > NFS v3 from my SL5.x file servers and NFS v4 on the SL6 nodes, so
> > I forced all mounts to use NFS v3. I thought maybe that could be
> > the problem, but again, no luck - still hanging.
> >
> > Revisiting it again in earnest this weekend via Google, I came up
> > empty as all hits seemed to have something to do with scenarios that
> > just did not apply, including many hits about a problem with running
> > the iscsi daemon (and there was a patch for that). But I'm not running
> > the iscsi daemon. It's not even installed.
> >
> > One comment by someone who also had the same problem was that he, not
> > ever figuring out the cause, just commented out the line in
> > /etc/init.d/network that shuts down the loopback interface, saying it's
> > not a real device anyway, so what the hell.
> >
> > So yesterday I thought I'd try the tactic of commenting out the
> > loopback shutdown on a test system. Sure enough, the reboot was
> > normal with no hangs.
> >
> > Ok, at least now I have a workaround, though that seems pretty kludgy.
> >
> > I decided to try and nail the culprit down with a fresh rebuild of
> > a test system and see just where in the build process the bug appears.
> >
> > After the basic install of SL6, the system reboots just fine.
> > Then do a 'yum update' with all its hundreds of patches.
> > It reboots just fine, as I expected.
> >
> > So the first "local" change was to configure NIS.
> > Try the reboot. Reboots fine.
> >
> > [ok, here is where it becomes bizarre]
> > Modify /etc/nsswitch.conf to switch the order of "files nis" to
> > "nis files" for passwd, shadow, and group, as I've always done.
> > Reboot. Boom! It hangs at loopback interface shutdown!
> >
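For reference, the nsswitch.conf change described above can be checked mechanically. What follows is only a sketch, not something from my actual build procedure: the function name nis_before_files and the sample text are made up for illustration, and the "hosts" line in the sample is an assumption added to show a database that is left alone.

```python
def nis_before_files(text):
    """Return the databases whose source list names 'nis' ahead of 'files'."""
    flagged = []
    for line in text.splitlines():
        # Strip comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        database, sources = line.split(":", 1)
        # Keep plain source names; skip action items such as [NOTFOUND=return].
        names = [s for s in sources.split()
                 if "=" not in s and not s.startswith("[")]
        if ("nis" in names and "files" in names
                and names.index("nis") < names.index("files")):
            flagged.append(database.strip())
    return flagged


# Illustrative sample only: passwd, shadow, and group as described above,
# plus a hypothetical hosts line that keeps "files" first.
SAMPLE = """\
passwd:     nis files
shadow:     nis files
group:      nis files
hosts:      files dns
"""

print(nis_before_files(SAMPLE))
```

To run it against a real node, read /etc/nsswitch.conf and pass its contents in place of SAMPLE.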
>
> I want to thank you for giving all the details of your testing. I would
> like to use it as a future example of how to be constructive and helpful
> to other people needing help.
Thanks. Yep, feel free to use this as an example. I suppose it comes
from being in the biz for over 46 years and shaking my head at *SO* many
ill-conceived requests for help on listservs.
> So have you looked at nscd any? Does having nscd turned on or off alter
> this problem?
Nay, I have not, and frankly, it didn't occur to me till you asked.
I will explore that when I get a chance and see if it alters the problem.
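One way I might compare the two cases is to time a passwd lookup with nscd running and again with it stopped. This is a rough sketch under assumptions, not a diagnosis: it needs Python 3 (not the stock interpreter on SL6), the helper name timed_lookup is made up, and a slow lookup would only be a hint that name-service lookups are involved in the hang.

```python
import pwd
import time


def timed_lookup(username):
    """Time one passwd lookup through whatever sources nsswitch.conf lists."""
    start = time.monotonic()
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        entry = None  # user not found in any configured source
    return time.monotonic() - start, entry


elapsed, entry = timed_lookup("root")
print("lookup took %.3f s, found=%s" % (elapsed, entry is not None))
```

Running it once with "files nis" and once with "nis files" in /etc/nsswitch.conf, each with nscd on and off, would give four numbers to compare.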
> Also, what is in hosts, and is the NIS server listed? Thanks
I assume you're talking about /etc/hosts on the clients.
The SL6.x clients just have the following in hosts:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
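So the NIS server is not listed there. A check like that can be scripted; here is a minimal sketch that parses hosts-file text and tests whether a given name appears. The name nis1.example.org is a made-up placeholder, not my real NIS server, and hosts_names is a hypothetical helper.

```python
def hosts_names(text):
    """Return every hostname and alias found in hosts-file text."""
    names = set()
    for line in text.splitlines():
        # Strip comments and blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # First field is the address; the rest are names for it.
        names.update(fields[1:])
    return names


# The two entries from the SL6.x clients, as listed above.
SAMPLE = """\
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
"""

# Placeholder server name, for illustration only.
print("nis1.example.org" in hosts_names(SAMPLE))
```

On a real client, pass in the contents of /etc/hosts and the actual server name.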
> > I repeated this many times to be sure, and it happens the same on
> > every SL6.x node.
> >
> > Bug or feature? I can't imagine it to be a feature nor can I
> > fathom what the order of "files" and "nis" in /etc/nsswitch.conf
> > has to do with the hanging of the loopback interface shutdown.
> > It's possible that an SL6.x NIS server might correct the situation,
> > but I have no time right now to spend a week on that not knowing
> > it would even work.
> >
> > Comments and suggestions are welcome.
> >
> > - Larry
> >
> > --
> > P. Larry Nelson (217-244-9855) | IT Administrator
> > 461 Loomis Lab | High Energy Physics Group
> > 1110 W. Green St., Urbana, IL | Physics Dept., Univ. of Ill.
> > MailTo:[log in to unmask] | http://www.brf-llc.com/lnelson/
> > -------------------------------------------------------------------
> > "Information without accountability is just noise." - P.L. Nelson
>
--
P. Larry Nelson (217-244-9855) | IT Administrator
461 Loomis Lab | High Energy Physics Group
1110 W. Green St., Urbana, IL | Physics Dept., Univ. of Ill.
MailTo:[log in to unmask] | http://www.brf-llc.com/lnelson/
-------------------------------------------------------------------
"Information without accountability is just noise." - P.L. Nelson