Hi, Troy et al.!
Good news from the hardware front ...
I've found the solution at
http://www.sun.com/products-n-solutions/hardware/docs/html/819-4347-14/software.html#58439
RHEL4 NMI Watchdog Timer Must Be Disabled In Servers With BIOS 38
(6486170)
The Non-Maskable Interrupt (NMI) Watchdog in RHEL4 is a mechanism used by
software and hardware developers to detect system lockups during
development. The NMI Watchdog periodically checks the CPU status to
determine if a program is holding the CPU in an interrupted state for an
extended period of time.
It has been observed in servers running BIOS 38 that the SMP kernel in
RHEL4 will not boot without crashing when the NMI watchdog is enabled. If
the watchdog timer is disabled, the server running RHEL4 will boot with no
problems.
Workaround
Disable the watchdog timer on RHEL4 by performing the following steps:
1. Log in as superuser (root).
2. Edit the /boot/grub/menu.lst file.
3. At the end of each line that begins with kernel, append this text:
nmi_watchdog=0s
4. Save the changes to the file.
5. Reboot the system.
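For anyone who would rather script steps 2 to 4 than edit by hand, here is a minimal sketch in Python. It is only an illustration, not part of the Sun note: the function name append_kernel_param and the sample menu entry are made up for this example. The note above writes the value as "0s"; the kernel documentation gives nmi_watchdog=0 as the value that disables the watchdog, so the sketch defaults to that, and you can pass another string if you want to follow the note literally. It works on the text of menu.lst and leaves reading and writing the real file (as root, after a backup) to you.

```python
def append_kernel_param(menu_text, param="nmi_watchdog=0"):
    """Return menu_text with `param` appended to every GRUB 'kernel' line.

    Lines that already carry the parameter (with any value) are left alone,
    so running the function twice does not append it twice.
    """
    key = param.split("=", 1)[0]
    result = []
    for line in menu_text.splitlines():
        tokens = line.split()
        is_kernel_line = bool(tokens) and tokens[0] == "kernel"
        already_set = any(
            t == key or t.startswith(key + "=") for t in tokens[1:]
        )
        if is_kernel_line and not already_set:
            line = line.rstrip() + " " + param
        result.append(line)
    return "\n".join(result) + "\n"


# Illustrative menu.lst entry (hypothetical, modelled on a typical SL 4 entry).
sample = (
    "title Scientific Linux SL (2.6.9-42.0.3.ELlargesmp)\n"
    "\troot (hd0,0)\n"
    "\tkernel /boot/vmlinuz-2.6.9-42.0.3.ELlargesmp ro root=LABEL=/\n"
    "\tinitrd /boot/initrd-2.6.9-42.0.3.ELlargesmp.img\n"
)

print(append_kernel_param(sample), end="")
```

Only lines whose first word is exactly "kernel" are touched, so commented-out entries such as "#kernel ..." stay as they are.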
After appending "nmi_watchdog=0s" to the kernel lines in /boot/grub/menu.lst,
both kernels
(kernel-largesmp-2.6.9-42.0.3.EL.x86_64 and my own
kernel-smp-2.6.9-42.0.4.EL.x86_64 built with CONFIG_NR_CPUS=16)
work great with all CPUs ..
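If you want to check the result after the reboot, a rough sketch like the following should do. It assumes an x86-style /proc/cpuinfo (one "processor : N" line per logical CPU) and reads the boot parameters from /proc/cmdline; the helper names count_processors and cmdline_value are just for illustration.

```python
import os


def count_processors(cpuinfo_text):
    """Count logical CPUs: one 'processor : N' line per CPU in /proc/cpuinfo."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def cmdline_value(cmdline_text, key="nmi_watchdog"):
    """Return the value of key=value on the kernel command line, or None."""
    for token in cmdline_text.split():
        name, sep, value = token.partition("=")
        if name == key and sep:
            return value
    return None


# Only touch /proc when it is there, so the sketch also runs on other systems.
if os.path.exists("/proc/cpuinfo") and os.path.exists("/proc/cmdline"):
    with open("/proc/cpuinfo") as f:
        print("logical CPUs:", count_processors(f.read()))
    with open("/proc/cmdline") as f:
        print("nmi_watchdog on command line:", cmdline_value(f.read()))
```

On the x4600 discussed here the CPU count should come out as 16 once the right kernel boots, and the second line shows whatever value was put in menu.lst.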
Cheers & nice weekend
Martin
______________________________________________________
Martin Flemming
DESY / IT office : Building 2b / 008a
Notkestr. 85 phone : 040 - 8998 - 4667
22603 Hamburg mail : [log in to unmask]
______________________________________________________
On Fri, 16 Mar 2007, Martin Flemming wrote:
> Hi, Troy !
>
> I will test the kernel-largesmp-2.6.9-42.0.10.EL.x86_64 as soon as
> possible, but unfortunately my machine is not really my machine ...
>
> One of our scientific groups owns it, and it's still in
> production too .... :-)
>
> I've contacted them about testing this new kernel and am still waiting
> for an answer ...
>
> I will report back to you once I've tested the kernel ..
>
> Cheers & nice weekend
>
> martin
>
> On Fri, 16 Mar 2007, Troy Dawson wrote:
>
> > Hi Martin and all,
> > I've just double checked with Sascha, the admin for the machine.
> > Remember, this is in production, so he can't do any tests.
> >
> > It is currently running kernel kernel-largesmp-2.6.9-42.0.10.EL.x86_64
> > It sees all 16 CPUs (8 dual-core Opterons) (I have the output of
> > cpuinfo if you need it)
> > Output of uname -a
> > Linux <hostname deleted> 2.6.9-42.0.10.ELlargesmp #1 SMP Tue Feb 27
> > 12:54:30 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
> >
> > I have the output of grub if you want, but the important part looks normal
> >
> > title Scientific Linux SL (2.6.9-42.0.10.ELlargesmp)
> > root (hd0,0)
> > kernel /boot/vmlinuz-2.6.9-42.0.10.ELlargesmp ro root=LABEL=/
> > message=/boot/boot.msg console=tty0 console=ttyS0,9600N8 rhgb quiet
> > initrd /boot/initrd-2.6.9-42.0.10.ELlargesmp.img
> >
> > Maybe it's the original S.L. 4.4 x86_64 kernel (2.6.9-42.0.3) that is
> > having the problems. Or maybe it's some setting in the bios.
> > Does the kernel crash go away when you update the kernel to
> > 2.6.9-42.0.10.ELlargesmp?
> >
> > Troy
> >
> > Troy Dawson wrote:
> > > Hi Martin,
> > > I'm double checking right now, but it might be a day or two. The
> > > machine in question is in Germany, and is in production right now, so I
> > > have to contact the system administrator to get the information.
> > >
> > > I do know that for i386, I saw all 16 CPUs and had no problems at all
> > > (with SL 4.4).
> > > For x86_64 my data somehow got blanked. You know how it goes: you mean to
> > > save a file, push the wrong keys, and don't notice until your test
> > > system is away in production.
> > >
> > > I will get that information and update the page if need be.
> > >
> > > Thanks
> > > Troy
> > >
> > > Martin Flemming wrote:
> > >> Hi, Stephen !
> > >>
> > > >> Yep, this was also my thought,
> > > >> but this kernel "kernel-largesmp-2.6.9-42.0.3.EL.x86_64"
> > > >> crashes, as I mentioned ...
> > >>
> > >> Any other ideas ?
> > >>
> > >> cheers,
> > >> Martin
> > >>
> > >> On Thu, 15 Mar 2007, Stephen J. Gowdy wrote:
> > >>
> > > >>> It looks like you need largesmp (assuming you have the 8 dual-core
> > > >>> CPU version; most options appear to include only 4 dual-core CPUs):
> > >>>
> > >>> "Please note that limits for <USV> v4 are for Update 3 or later.
> > >>> Update 3 was released in March 2006. CPU counts over 8 (AMD64/EM64T)
> > >>> or 64 (other architectures) require use of the largesmp kernel.
> > >>> Certified limits reflect the current state of system testing by <USV>
> > >>> and its partners, and the limit of support provided by a <USV> Linux
> > >>> subscription."
> > >>>
> > >>> On Thu, 15 Mar 2007, Martin Flemming wrote:
> > >>>
> > > >>>> Hi, Troy et al.!
> > > >>>>
> > > >>>> I noticed today
> > > >>>> that on the hardware web page
> > > >>>>
> > > >>>> https://www.scientificlinux.org/documentation/hardware/
> > > >>>>
> > > >>>> you've published the successful
> > > >>>> installation of a "Sun Fire x4600" machine ...
> > > >>>>
> > > >>>> We've got the same machine in our lab, but unfortunately
> > > >>>> we see only 8 CPUs, not 16 ...
> > >>>>
> > >>>> So my question is, which kernel do you have installed ?
> > >>>>
> > > >>>> I've installed the following one:
> > > >>>>
> > > >>>> kernel-smp-2.6.9-42.0.3.EL.x86_64
> > > >>>>
> > > >>>> which displays only 8 CPUs ...
> > > >>>>
> > > >>>> At first I tried the largesmp kernel
> > > >>>>
> > > >>>> kernel-largesmp-2.6.9-42.0.3.EL.x86_64
> > > >>>>
> > > >>>> but this one generates a kernel panic ....
> > >>>>
> > >>>>
> > >>>> Cheers,
> > >>>>
> > >>>> Martin
> > >>>>
> > >
> >
> >
> > --
> > __________________________________________________
> > Troy Dawson [log in to unmask] (630)840-6468
> > Fermilab ComputingDivision/LCSI/CSI DSS Group
> > __________________________________________________
> >
>
> Regards
>
> Martin Flemming
>