Hi Martin,
Thank you for the feedback. I will update the web page, and also inform
the sys-admin I was working with. He was very curious why it wasn't
working on other machines.
Did you want me to put your name on the webpage with this information?
I think I should, but I don't want to put people's names where they
don't want them.
Troy
Martin Flemming wrote:
> Hi, Troy et al.!
>
>
> Good news from the hardware-front ...
>
> I've found the solution at
>
> http://www.sun.com/products-n-solutions/hardware/docs/html/819-4347-14/software.html#58439
>
> RHEL4 NMI Watchdog Timer Must Be Disabled In Servers With BIOS 38
> (6486170)
>
> The Non-Maskable Interrupt (NMI) Watchdog in RHEL4 is a mechanism used by
> software and hardware developers to detect system lockups during
> development. The NMI Watchdog periodically checks the CPU status to
> determine if a program is holding the CPU in an interrupted state for an
> extended period of time.
>
> It has been observed in servers running BIOS 38 that the SMP kernel in
> RHEL4 will not boot without crashing when the NMI watchdog is enabled. If
> the watchdog timer is disabled, the server running RHEL4 will boot with no
> problems.
> Workaround
>
> Disable the watchdog timer on RHEL4 by performing the following steps:
>
> 1. Log in as superuser (root).
>
> 2. Edit the /boot/grub/menu.lst file.
>
> 3. At the end of each line that begins with kernel, append this text:
>
> nmi_watchdog=0s
>
> 4. Save the changes to the file.
>
> 5. Reboot the system.
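The edit in steps 2 to 4 can also be scripted. The following is only a minimal sketch in Python, not part of Sun's documentation: the function name `append_kernel_param` and the sample menu entry are made up for illustration, and the code only transforms text in memory rather than writing /boot/grub/menu.lst itself. The parameter string is passed in exactly as quoted in step 3.

```python
def append_kernel_param(menu_text: str, param: str) -> str:
    """Append `param` to every line whose first word is 'kernel'.

    Lines that already carry the option (same name before '=') are left
    untouched, so running the function twice changes nothing further.
    """
    key = param.split("=", 1)[0]
    out = []
    for line in menu_text.splitlines():
        words = line.split()
        # Commented-out entries start with '#', so they never match here.
        if words and words[0] == "kernel":
            already_set = any(
                w == key or w.startswith(key + "=") for w in words[1:]
            )
            if not already_set:
                line = line.rstrip() + " " + param
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if menu_text.endswith("\n") else "")


# Hypothetical sample entry, modelled on a typical GRUB legacy stanza.
SAMPLE = (
    "title Scientific Linux SL (2.6.9-42.0.3.ELlargesmp)\n"
    "\troot (hd0,0)\n"
    "\tkernel /boot/vmlinuz-2.6.9-42.0.3.ELlargesmp ro root=LABEL=/\n"
    "\tinitrd /boot/initrd-2.6.9-42.0.3.ELlargesmp.img\n"
)

if __name__ == "__main__":
    # Parameter text as quoted in step 3 of the workaround.
    print(append_kernel_param(SAMPLE, "nmi_watchdog=0s"), end="")
```

To apply it for real, one would read /boot/grub/menu.lst as root, keep a backup copy, write the returned text back, and then reboot as in step 5.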
>
>
> After appending "nmi_watchdog=0s" to /boot/grub/menu.lst
>
> all kernels
>
> (kernel-largesmp-2.6.9-42.0.3.EL.x86_64 and my own
> kernel-smp-2.6.9-42.0.4.EL.x86_64 with CONFIG_NR_CPUS=16 )
>
> work great with all CPUs ..
>
> Cheers & nice weekend
>
> Martin
>
>
> ______________________________________________________
> Martin Flemming
> DESY / IT office : Building 2b / 008a
> Notkestr. 85 phone : 040 - 8998 - 4667
> 22603 Hamburg mail : [log in to unmask]
> ______________________________________________________
>
>
>
> On Fri, 16 Mar 2007, Martin Flemming wrote:
>
>> Hi, Troy !
>>
>> I will test the kernel-largesmp-2.6.9-42.0.10.EL.x86_64 as soon as
>> possible, but unfortunately my machine is not really my machine ...
>>
>> One of our scientific-groups has got the ownership and it's still in
>> production too .... :-)
>>
>> I've contacted them about testing this new kernel and am still waiting
>> for an answer ...
>>
>> I will report to you again once I've tested the kernel ..
>>
>> Cheers & nice weekend
>>
>> martin
>>
>> On Fri, 16 Mar 2007, Troy Dawson wrote:
>>
>>> Hi Martin and all,
>>> I've just double checked with Sascha, the admin for the machine.
>>> Remember, this is in production, so he can't do any tests.
>>>
>>> It is currently running kernel kernel-largesmp-2.6.9-42.0.10.EL.x86_64
>>> It sees all 16 CPUs (8 dual-core Opterons) (I have the output of
>>> cpuinfo if you need it)
>>> Output of uname -a
>>> Linux <hostname deleted> 2.6.9-42.0.10.ELlargesmp #1 SMP Tue Feb 27
>>> 12:54:30 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> I have the output of grub if you want, but the important part looks normal
>>>
>>> title Scientific Linux SL (2.6.9-42.0.10.ELlargesmp)
>>> root (hd0,0)
>>> kernel /boot/vmlinuz-2.6.9-42.0.10.ELlargesmp ro root=LABEL=/
>>> message=/boot/boot.msg console=tty0 console=ttyS0,9600N8 rhgb quiet
>>> initrd /boot/initrd-2.6.9-42.0.10.ELlargesmp.img
>>>
>>> Maybe it's the original S.L. 4.4 x86_64 kernel (2.6.9-42.0.3) that is
>>> having the problems. Or maybe it's some setting in the bios.
>>> Does the kernel crash go away when you update the kernel to
>>> 2.6.9-42.0.10.ELlargesmp?
>>>
>>> Troy
>>>
>>> Troy Dawson wrote:
>>>> Hi Martin,
>>>> I'm double checking right now, but it might be a day or two. The
>>>> machine in question is in Germany, and is in production right now, so I
>>>> have to contact the system administrator to get the information.
>>>>
>>>> I do know that for i386, I saw all 16 CPU's and had no problems at all
>>>> (with SL 4.4).
>>>> For x86_64 my data somehow got blanked. You know you mean to save a
>>>> file and push the wrong keys, and you don't notice until your test
>>>> system is away in production.
>>>>
>>>> I will get that information and update the page if need be.
>>>>
>>>> Thanks
>>>> Troy
>>>>
>>>> Martin Flemming wrote:
>>>>> Hi, Stephen !
>>>>>
>>>>> Yep, this was also my thought,
>>>>> but this kernel "kernel-largesmp-2.6.9-42.0.3.EL.x86_64"
>>>>> crashes as I mentioned ...
>>>>>
>>>>> Any other ideas ?
>>>>>
>>>>> cheers,
>>>>> Martin
>>>>>
>>>>> On Thu, 15 Mar 2007, Stephen J. Gowdy wrote:
>>>>>
>>>>>> It looks like you need largesmp (assuming you have the 8 dual-core
>>>>>> CPU version, most options look to only include 4 dual-core CPUs);
>>>>>>
>>>>>> "Please note that limits for <USV> v4 are for Update 3 or later.
>>>>>> Update 3 was released in March 2006. CPU counts over 8 (AMD64/EM64T)
>>>>>> or 64 (other architectures) require use of the largesmp kernel.
>>>>>> Certified limits reflect the current state of system testing by <USV>
>>>>>> and its partners, and the limit of support provided by a <USV> Linux
>>>>>> subscription."
>>>>>>
>>>>>> On Thu, 15 Mar 2007, Martin Flemming wrote:
>>>>>>
>>>>>>> Hi, Troy et al.!
>>>>>>>
>>>>>>> I noticed today
>>>>>>> that on the hardware web page
>>>>>>>
>>>>>>> https://www.scientificlinux.org/documentation/hardware/
>>>>>>>
>>>>>>> you've published the successful
>>>>>>> installation of a "Sun Fire x4600" machine ...
>>>>>>>
>>>>>>> We've got the same machine in our lab, but unfortunately
>>>>>>> we see only 8 CPUs, not 16 ...
>>>>>>>
>>>>>>> So my question is, which kernel do you have installed ?
>>>>>>>
>>>>>>> I've installed the following one:
>>>>>>>
>>>>>>> kernel-smp-2.6.9-42.0.3.EL.x86_64
>>>>>>>
>>>>>>> which displays only 8 CPUs ...
>>>>>>>
>>>>>>> At first, I had the largesmp kernel
>>>>>>>
>>>>>>> kernel-largesmp-2.6.9-42.0.3.EL.x86_64
>>>>>>>
>>>>>>> but this one generates a kernel panic ....
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Martin
>>>>>>>
--
__________________________________________________
Troy Dawson [log in to unmask] (630)840-6468
Fermilab ComputingDivision/LCSI/CSI DSS Group
__________________________________________________