SCIENTIFIC-LINUX-USERS Archives

March 2007

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Martin Flemming <[log in to unmask]>
Reply To:
Martin Flemming <[log in to unmask]>
Date:
Fri, 23 Mar 2007 11:04:08 +0100
Hi, Troy et al.!


Good news from the hardware front ...

I've found the solution at

http://www.sun.com/products-n-solutions/hardware/docs/html/819-4347-14/software.html#58439

RHEL4 NMI Watchdog Timer Must Be Disabled In Servers With BIOS 38 
(6486170)

The Non-Maskable Interrupt (NMI) Watchdog in RHEL4 is a mechanism used by 
software and hardware developers to detect system lockups during 
development. The NMI Watchdog periodically checks the CPU status to 
determine if a program is holding the CPU in an interrupted state for an 
extended period of time.

It has been observed in servers running BIOS 38 that the SMP kernel in 
RHEL4 will not boot without crashing when the NMI watchdog is enabled. If 
the watchdog timer is disabled, the server running RHEL4 will boot with no 
problems.

Workaround

Disable the watchdog timer on RHEL4 by performing the following steps:

1. Log in as superuser (root).

2. Edit the /boot/grub/menu.lst file.

3. At the end of each line that begins with kernel, append this text:

nmi_watchdog=0s

4. Save the changes to the file.

5. Reboot the system. 
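
Step 3 above can be sketched in Python. This is only an illustration,
not part of the Sun note: the function name append_kernel_param and the
sample stanza are hypothetical, and the parameter string is copied as
quoted above (the kernel's own parameter documentation writes the value
as a plain number, e.g. nmi_watchdog=0). The function works on the text
of menu.lst and returns the modified text, so nothing is written to
/boot unless you do that yourself, ideally after making a backup.

```python
def append_kernel_param(menu_text: str, param: str = "nmi_watchdog=0s") -> str:
    """Return menu_text with `param` appended to every `kernel` line.

    A line counts as a kernel line when its first whitespace-separated
    word is exactly "kernel" (menu.lst entries are usually indented).
    Lines that already carry the parameter are left unchanged.
    """
    out = []
    for line in menu_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")      # line content without its newline
        ending = line[len(body):]       # the original newline, if any
        fields = body.split()
        if fields and fields[0] == "kernel" and param not in fields[1:]:
            body = body.rstrip() + " " + param
        out.append(body + ending)
    return "".join(out)


# Hypothetical stanza, used only to show the effect.
sample = (
    "title Scientific Linux SL (2.6.9-42.0.3.ELsmp)\n"
    "\troot (hd0,0)\n"
    "\tkernel /boot/vmlinuz-2.6.9-42.0.3.ELsmp ro root=LABEL=/\n"
    "\tinitrd /boot/initrd-2.6.9-42.0.3.ELsmp.img\n"
)
print(append_kernel_param(sample), end="")
```

Running the function a second time on its own output changes nothing,
so it should be safe to apply more than once.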


After appending "nmi_watchdog=0s" to the kernel lines in /boot/grub/menu.lst,

all kernels 

(kernel-largesmp-2.6.9-42.0.3.EL.x86_64 and my own 
kernel-smp-2.6.9-42.0.4.EL.x86_64 with CONFIG_NR_CPUS=16)

work great with all CPUs ...
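
One way to confirm how many CPUs the running kernel sees is to count
the "processor" entries in /proc/cpuinfo. The following is a minimal
sketch, not something taken from this thread: the function name
count_processors is hypothetical, and it assumes the usual x86 layout
in which each logical CPU has a line of the form "processor : N".

```python
from pathlib import Path


def count_processors(cpuinfo_text: str) -> int:
    """Count the logical CPUs listed in /proc/cpuinfo-style text."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(count_processors(cpuinfo.read_text()), "CPUs visible")
```

On the machine discussed here the count would be expected to be 16
once the kernel boots with all cores available.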

Cheers & nice weekend

           Martin 


______________________________________________________
Martin Flemming
DESY / IT          office : Building 2b / 008a
Notkestr. 85       phone  : 040 - 8998 - 4667
22603 Hamburg      mail   : [log in to unmask]
______________________________________________________



On Fri, 16 Mar 2007, Martin Flemming wrote:

> Hi, Troy !
> 
> I will test the kernel-largesmp-2.6.9-42.0.10.EL.x86_64 as soon as
> possible, but unfortunately my machine is not really my machine ...
> 
> One of our scientific groups owns it, and it's still in
> production too .... :-)
> 
> I've contacted them about testing this new kernel and am still waiting
> for an answer ...
> 
> I will report back to you once I've tested the kernel ...
> 
> Cheers & nice weekend
> 
>          martin
> 
> On Fri, 16 Mar 2007, Troy Dawson wrote:
> 
> > Hi Martin and all,
> > I've just double checked with Sascha, the admin for the machine.
> > Remember, this is in production, so he can't do any tests.
> >
> > It is currently running kernel-largesmp-2.6.9-42.0.10.EL.x86_64.
> > It sees all 16 CPUs (8 dual-core Opterons) (I have the output of
> > cpuinfo if you need it).
> > Output of uname -a
> > Linux <hostname deleted> 2.6.9-42.0.10.ELlargesmp #1 SMP Tue Feb 27
> > 12:54:30 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
> >
> > I have the output of grub if you want, but the important part looks normal
> >
> > title Scientific Linux SL (2.6.9-42.0.10.ELlargesmp)
> >         root (hd0,0)
> >         kernel /boot/vmlinuz-2.6.9-42.0.10.ELlargesmp ro root=LABEL=/
> > message=/boot/boot.msg console=tty0 console=ttyS0,9600N8 rhgb quiet
> >         initrd /boot/initrd-2.6.9-42.0.10.ELlargesmp.img
> >
> > Maybe it's the original S.L. 4.4 x86_64 kernel (2.6.9-42.0.3) that is
> > having the problems.  Or maybe it's some setting in the bios.
> > Does the kernel crash go away when you update the kernel to
> > 2.6.9-42.0.10.ELlargesmp?
> >
> > Troy
> >
> > Troy Dawson wrote:
> > > Hi Martin,
> > > I'm double checking right now, but it might be a day or two.  The
> > > machine in question is in Germany, and is in production right now, so I
> > > have to contact the system administrator to get the information.
> > >
> > > I do know that for i386, I saw all 16 CPU's and had no problems at all
> > > (with SL 4.4).
> > > For x86_64 my data somehow got blanked.  You know you mean to save a
> > > file and push the wrong keys, and you don't notice until your test
> > > system is away in production.
> > >
> > > I will get that information and update the page if need be.
> > >
> > > Thanks
> > > Troy
> > >
> > > Martin Flemming wrote:
> > >> Hi, Stephen !
> > >>
> > >> Yep, this was also my thought,
> > >> but this kernel "kernel-largesmp-2.6.9-42.0.3.EL.x86_64"
> > >> crashes, as I mentioned ...
> > >>
> > >> Any other ideas ?
> > >>
> > >> cheers,
> > >>              Martin
> > >>
> > >> On Thu, 15 Mar 2007, Stephen J. Gowdy wrote:
> > >>
> > >>> It looks like you need largesmp (assuming you have the 8 dual-core
> > >>> CPU version; most options look to include only 4 dual-core CPUs):
> > >>>
> > >>> "Please note that limits for <USV> v4 are for Update 3 or later.
> > >>> Update 3 was released in March 2006. CPU counts over 8 (AMD64/EM64T)
> > >>> or 64 (other architectures) require use of the largesmp kernel.
> > >>> Certified limits reflect the current state of system testing by <USV>
> > >>> and its partners, and the limit of support provided by a <USV> Linux
> > >>> subscription."
> > >>>
> > >>> On Thu, 15 Mar 2007, Martin Flemming wrote:
> > >>>
> > >>>> Hi, Troy et al.!
> > >>>>
> > >>>> I noticed today
> > >>>> that on the hardware web page
> > >>>>
> > >>>> https://www.scientificlinux.org/documentation/hardware/
> > >>>>
> > >>>> you've published the successful
> > >>>> installation of a "Sun Fire x4600" machine ...
> > >>>>
> > >>>> We've got the same machine in our lab, but unfortunately
> > >>>> we see only 8 CPUs, not 16 ...
> > >>>>
> > >>>> So my question is, which kernel do you have installed ?
> > >>>>
> > >>>> I've installed the following one:
> > >>>>
> > >>>> kernel-smp-2.6.9-42.0.3.EL.x86_64
> > >>>>
> > >>>> which displays only 8 CPUs ...
> > >>>>
> > >>>> At first, I tried the largesmp kernel
> > >>>>
> > >>>> kernel-largesmp-2.6.9-42.0.3.EL.x86_64
> > >>>>
> > >>>> but this one generates a kernel panic ....
> > >>>>
> > >>>>
> > >>>> Cheers,
> > >>>>
> > >>>>       Martin
> > >>>>
> > >
> >
> >
> > --
> > __________________________________________________
> > Troy Dawson  [log in to unmask]  (630)840-6468
> > Fermilab  ComputingDivision/LCSI/CSI DSS Group
> > __________________________________________________
> >
> 
> Gruss
> 
>        Martin Flemming
> 
> 
> ______________________________________________________
> Martin Flemming
> DESY / IT          office : Building 2b / 008a
> Notkestr. 85       phone  : 040 - 8998 - 4667
> 22603 Hamburg      mail   : [log in to unmask]
> ______________________________________________________
> 
