SCIENTIFIC-LINUX-USERS Archives

March 2007

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

From: Troy Dawson <[log in to unmask]>
Reply To: Troy Dawson <[log in to unmask]>
Date: Fri, 23 Mar 2007 08:19:04 -0500

Hi Martin,

Thank you for the feedback.  I will update the web page, and also inform 
the sys-admin I was working with.  He was very curious why it wasn't 
working on other machines.

Did you want me to put your name on the webpage with this information? 
I think I should, but I don't want to put people's names where they 
don't want them.

Troy

Martin Flemming wrote:
> Hi, Troy et al.!
> 
> 
> Good news from the hardware front ...
> 
> I've found the solution at:
> 
> http://www.sun.com/products-n-solutions/hardware/docs/html/819-4347-14/software.html#58439
> 
> RHEL4 NMI Watchdog Timer Must Be Disabled In Servers With BIOS 38 (6486170)
> 
> The Non-Maskable Interrupt (NMI) Watchdog in RHEL4 is a mechanism used by 
> software and hardware developers to detect system lockups during 
> development. The NMI Watchdog periodically checks the CPU status to 
> determine if a program is holding the CPU in an interrupted state for an 
> extended period of time.
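The Sun note describes the watchdog only in outline. As a rough model of the idea (not the RHEL4 kernel code), the Python sketch below treats each watchdog check as a snapshot of a per-CPU progress counter, standing in for the timer-interrupt count that the 2.6-era x86_64 NMI handler compares from one tick to the next, and reports a CPU as locked up once its counter has failed to advance for a chosen number of consecutive checks. The function name, the threshold, and the sample data are hypothetical.

```python
from typing import Dict, List


def detect_lockups(samples: List[Dict[int, int]], threshold: int) -> List[int]:
    """Return the CPUs whose progress counter stalled for `threshold` checks.

    `samples` holds one dict per watchdog check, mapping CPU id -> a
    counter that should keep increasing while the CPU is healthy (a
    stand-in for the timer-interrupt count). A CPU is reported once its
    counter has stayed unchanged for `threshold` consecutive checks.
    """
    last_seen: Dict[int, int] = {}
    stalled_checks: Dict[int, int] = {}
    locked: List[int] = []
    for sample in samples:
        for cpu, count in sample.items():
            if cpu in last_seen and last_seen[cpu] == count:
                # No progress since the previous check.
                stalled_checks[cpu] = stalled_checks.get(cpu, 0) + 1
                if stalled_checks[cpu] == threshold:
                    locked.append(cpu)
            else:
                # First sample, or the counter moved: the CPU is making progress.
                stalled_checks[cpu] = 0
            last_seen[cpu] = count
    return locked


if __name__ == "__main__":
    # CPU 0 keeps taking interrupts; CPU 1 stops advancing after the 2nd check.
    checks = [{0: 10, 1: 10}, {0: 20, 1: 15}, {0: 30, 1: 15},
              {0: 40, 1: 15}, {0: 50, 1: 15}]
    print(detect_lockups(checks, threshold=3))  # prints [1]
```

The real watchdog reacts to a stall by raising an oops rather than returning a list; the sketch only shows the "no progress for N checks" test.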
> 
> It has been observed in servers running BIOS 38 that the SMP kernel in 
> RHEL4 will not boot without crashing when the NMI watchdog is enabled. If 
> the watchdog timer is disabled, the server running RHEL4 will boot with no 
> problems.
> 
> Workaround
> 
> Disable the watchdog timer on RHEL4 by performing the following steps:
> 
> 1. Log in as superuser (root).
> 
> 2. Edit the /boot/grub/menu.lst file.
> 
> 3. At the end of each line that begins with kernel, append this text:
> 
> nmi_watchdog=0s
> 
> 4. Save the changes to the file.
> 
> 5. Reboot the system. 
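Step 3 can also be scripted. The following is a minimal Python sketch, assuming a GRUB legacy menu.lst in which each boot entry has an indented `kernel` line like the one Troy quotes further down this thread; `add_kernel_param` and `patch_menu_lst` are hypothetical helper names, and the script writes a `.bak` copy before changing anything. The parameter string is passed in rather than hard-coded: the note above quotes `nmi_watchdog=0s`, while the kernel's own boot-option documentation describes the numeric form `nmi_watchdog=0`.

```python
import re
import shutil


def add_kernel_param(menu_lst: str, param: str) -> str:
    """Append `param` to every line that begins with the `kernel` keyword.

    Leading whitespace is allowed, as in indented GRUB legacy entries.
    Lines that already contain `param` as a separate word are left unchanged.
    """
    out = []
    for line in menu_lst.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line terminator
        if re.match(r"\s*kernel\s", body) and param not in body.split():
            body = body.rstrip() + " " + param
        out.append(body + ending)
    return "".join(out)


def patch_menu_lst(path: str, param: str) -> None:
    """Back up `path` to `path + '.bak'`, then rewrite it with `param` added."""
    shutil.copy2(path, path + ".bak")
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(add_kernel_param(text, param))
```

For example, `patch_menu_lst("/boot/grub/menu.lst", "nmi_watchdog=0s")` run as root, followed by the reboot in step 5. Lines that already carry the parameter are skipped, so running it twice does not duplicate it.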
> 
> 
> After appending "nmi_watchdog=0s" to /boot/grub/menu.lst, all kernels
> (kernel-largesmp-2.6.9-42.0.3.EL.x86_64 and my own
> kernel-smp-2.6.9-42.0.4.EL.x86_64 built with CONFIG_NR_CPUS=16)
> work great with all CPUs.
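One way to confirm after the reboot that the setting actually reached the running kernel is to look at /proc/cmdline, which shows the command line the current kernel was booted with. A small Python sketch; the helper names are hypothetical, and the choice to report the last occurrence when the parameter is repeated is this sketch's own convention:

```python
from typing import Optional


def nmi_watchdog_value(cmdline: str) -> Optional[str]:
    """Return the value given for nmi_watchdog on a kernel command line.

    Returns None if the parameter is absent. If it appears more than
    once, this sketch reports the last occurrence.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "nmi_watchdog" and sep:
            value = val
    return value


def running_nmi_watchdog(path: str = "/proc/cmdline") -> Optional[str]:
    """Read the running kernel's command line (Linux only) and extract the value."""
    with open(path) as f:
        return nmi_watchdog_value(f.read())
```

On a machine patched as described above, `running_nmi_watchdog()` should return `"0s"`; `None` means the boot entry that was actually used did not carry the parameter.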
> 
> Cheers & nice weekend
> 
>            Martin 
> 
> 
> ______________________________________________________
> Martin Flemming
> DESY / IT          office : Building 2b / 008a
> Notkestr. 85       phone  : 040 - 8998 - 4667
> 22603 Hamburg      mail   : [log in to unmask]
> ______________________________________________________
> 
> 
> 
> On Fri, 16 Mar 2007, Martin Flemming wrote:
> 
>> Hi, Troy !
>>
>> I will test kernel-largesmp-2.6.9-42.0.10.EL.x86_64 as soon as
>> possible, but unfortunately the machine is not really mine ...
>>
>> One of our scientific groups owns it, and it's still in
>> production too .... :-)
>>
>> I've contacted them about testing this new kernel and am still waiting
>> for an answer ...
>>
>> I will report back to you once I've tested the kernel.
>>
>> Cheers & nice weekend
>>
>>          martin
>>
>> On Fri, 16 Mar 2007, Troy Dawson wrote:
>>
>>> Hi Martin and all,
>>> I've just double checked with Sascha, the admin for the machine.
>>> Remember, this is in production, so he can't do any tests.
>>>
>>> It is currently running kernel-largesmp-2.6.9-42.0.10.EL.x86_64.
>>> It sees all 16 CPUs (8 dual-core Opterons); I have the output of
>>> /proc/cpuinfo if you need it.
>>> Output of uname -a:
>>> Linux <hostname deleted> 2.6.9-42.0.10.ELlargesmp #1 SMP Tue Feb 27 12:54:30 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
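Since the symptom in this thread is the kernel reporting 8 CPUs instead of 16, a quick cross-check is to count the `processor` entries in /proc/cpuinfo. The Python sketch below assumes the x86 cpuinfo layout of `key : value` lines; whether a `physical id` field (one value per socket) is present depends on the kernel version, so the socket count is optional. `count_cpus` and `count_cpus_on_this_host` are hypothetical helpers.

```python
from typing import Optional, Tuple


def count_cpus(cpuinfo: str) -> Tuple[int, Optional[int]]:
    """Return (logical CPUs, physical packages) from /proc/cpuinfo text.

    The package count is None when no "physical id" lines are present.
    """
    logical = 0
    packages = set()
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between per-CPU blocks
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            packages.add(value.strip())
    return logical, (len(packages) if packages else None)


def count_cpus_on_this_host(path: str = "/proc/cpuinfo") -> Tuple[int, Optional[int]]:
    """Read the local /proc/cpuinfo (Linux only) and count CPUs."""
    with open(path) as f:
        return count_cpus(f.read())
```

On a fully recognized machine with 8 dual-core Opterons, the expected result would be `(16, 8)` if the kernel reports physical IDs; the smp kernel Martin describes would presumably show 8 logical CPUs.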
>>>
>>> I have the grub configuration if you want it, but the important part looks normal:
>>>
>>> title Scientific Linux SL (2.6.9-42.0.10.ELlargesmp)
>>>         root (hd0,0)
>>>         kernel /boot/vmlinuz-2.6.9-42.0.10.ELlargesmp ro root=LABEL=/ message=/boot/boot.msg console=tty0 console=ttyS0,9600N8 rhgb quiet
>>>         initrd /boot/initrd-2.6.9-42.0.10.ELlargesmp.img
>>>
>>> Maybe it's the original S.L. 4.4 x86_64 kernel (2.6.9-42.0.3) that is
>>> having the problems.  Or maybe it's some setting in the BIOS.
>>> Does the kernel crash go away when you update the kernel to
>>> 2.6.9-42.0.10.ELlargesmp?
>>>
>>> Troy
>>>
>>> Troy Dawson wrote:
>>>> Hi Martin,
>>>> I'm double-checking right now, but it might take a day or two. The
>>>> machine in question is in Germany and is in production right now, so I
>>>> have to contact the system administrator to get the information.
>>>>
>>>> I do know that for i386, I saw all 16 CPUs and had no problems at all
>>>> (with SL 4.4).
>>>> For x86_64, my data somehow got blanked. You know how it is: you mean to
>>>> save a file, push the wrong keys, and don't notice until your test
>>>> system has gone into production.
>>>>
>>>> I will get that information and update the page if needed.
>>>>
>>>> Thanks
>>>> Troy
>>>>
>>>> Martin Flemming wrote:
>>>>> Hi, Stephen !
>>>>>
>>>>> Yep, that was also my thought,
>>>>> but this kernel, kernel-largesmp-2.6.9-42.0.3.EL.x86_64,
>>>>> crashes, as I mentioned ...
>>>>>
>>>>> Any other ideas?
>>>>>
>>>>> cheers,
>>>>>              Martin
>>>>>
>>>>> On Thu, 15 Mar 2007, Stephen J. Gowdy wrote:
>>>>>
>>>>>> It looks like you need largesmp (assuming you have the 8 dual-core
>>>>>> CPU version; most options seem to include only 4 dual-core CPUs):
>>>>>>
>>>>>> "Please note that limits for <USV> v4 are for Update 3 or later.
>>>>>> Update 3 was released in March 2006. CPU counts over 8 (AMD64/EM64T)
>>>>>> or 64 (other architectures) require use of the largesmp kernel.
>>>>>> Certified limits reflect the current state of system testing by <USV>
>>>>>> and its partners, and the limit of support provided by a <USV> Linux
>>>>>> subscription."
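The rule in the quoted note reduces to a simple threshold check. A minimal Python sketch, assuming that each core counts as one CPU (which matches the 16-versus-8 count discussed in this thread) and using hypothetical architecture labels; it encodes only the certified limits quoted above, not what a particular kernel build can actually handle (Martin's follow-up at the top of this thread reports a self-built smp kernel with CONFIG_NR_CPUS=16 working).

```python
# Hypothetical labels for the AMD64/EM64T family named in the quoted note.
X86_64_NAMES = {"x86_64", "amd64", "em64t"}


def needs_largesmp(arch: str, cpu_count: int) -> bool:
    """Return True if the quoted limits call for the largesmp kernel.

    Over 8 CPUs on AMD64/EM64T, or over 64 on other architectures.
    """
    limit = 8 if arch.lower() in X86_64_NAMES else 64
    return cpu_count > limit
```

For the Sun Fire X4600 in this thread, `needs_largesmp("x86_64", 16)` is `True`.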
>>>>>>
>>>>>> On Thu, 15 Mar 2007, Martin Flemming wrote:
>>>>>>
>>>>>>> Hi, Troy et al.!
>>>>>>>
>>>>>>> I noticed today
>>>>>>> that on the hardware web page
>>>>>>>
>>>>>>> https://www.scientificlinux.org/documentation/hardware/
>>>>>>>
>>>>>>> you've published the successful
>>>>>>> installation of a "Sun Fire X4600" machine ...
>>>>>>>
>>>>>>> We've got the same machine in our lab, but unfortunately
>>>>>>> we see only 8 CPUs, not 16 ...
>>>>>>>
>>>>>>> So my question is: which kernel do you have installed?
>>>>>>>
>>>>>>> I've installed the following one:
>>>>>>>
>>>>>>> kernel-smp-2.6.9-42.0.3.EL.x86_64
>>>>>>>
>>>>>>> which shows only 8 CPUs ...
>>>>>>>
>>>>>>> At first, I had the largesmp kernel
>>>>>>>
>>>>>>> kernel-largesmp-2.6.9-42.0.3.EL.x86_64
>>>>>>>
>>>>>>> but this one generates a kernel panic ...
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>>       Martin
>>>>>>>


-- 
__________________________________________________
Troy Dawson  [log in to unmask]  (630)840-6468
Fermilab  Computing Division/LCSI/CSI DSS Group
__________________________________________________
