SCIENTIFIC-LINUX-USERS Archives

March 2007

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From: Robert Ryans <[log in to unmask]>
Reply To: Robert Ryans <[log in to unmask]>
Date: Sat, 31 Mar 2007 12:03:53 +0100
Content-Type: text/plain
Parts/Attachments: text/plain (67 lines)

I recently had to deal with a brand new PowerEdge 1950 (dual Core2 
Xeons) which began to reboot itself at random, and eventually could not 
get all the way through the boot sequence before rebooting again.

It worked OK when booted from an SL boot CD, and it passed all the Dell
diags. However, no 'operational' kernel would work. It turned out that
one of the CPUs was bad and had to be replaced. The Dell diags clearly
do not test the CPU thoroughly in all cases; my guess is that some
specific sequence of instructions triggered the fault in this case.
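
Since vendor diagnostics can miss a faulty core, one complementary check
is to pin the same deterministic workload to each core in turn and
compare the results; a core that returns a different answer is suspect.
The sketch below is an illustration, not the procedure used above. It
assumes Linux and Python 3.9+ (`os.sched_setaffinity`), and the helper
names `workload`, `per_core_digests` and `suspect_cores` are
hypothetical. It must run under an SMP kernel (a uniprocessor kernel
exposes only one CPU), and it cannot report anything if the bad core
simply hangs or reboots the machine, which is a symptom in itself.

```python
import hashlib
import os
from collections import Counter


def workload(rounds: int = 200_000) -> str:
    """Deterministic integer + hashing workload; a healthy core always gives the same digest."""
    h = hashlib.sha256()
    x = 1
    for _ in range(rounds):
        # 64-bit linear congruential step, then feed the value into SHA-256.
        x = (x * 6364136223846793005 + 1442695040888963407) % (1 << 64)
        h.update(x.to_bytes(8, "little"))
    return h.hexdigest()


def per_core_digests(rounds: int = 200_000) -> dict[int, str]:
    """Run `workload` pinned to each CPU this process may use; return {cpu: digest}."""
    original = os.sched_getaffinity(0)
    results: dict[int, str] = {}
    try:
        for cpu in sorted(original):
            os.sched_setaffinity(0, {cpu})  # restrict this process to a single core
            results[cpu] = workload(rounds)
    finally:
        os.sched_setaffinity(0, original)  # restore the original CPU mask
    return results


def suspect_cores(digests: dict[int, str]) -> list[int]:
    """Return the cores whose digest differs from the most common digest."""
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return [cpu for cpu, d in digests.items() if d != majority]


if __name__ == "__main__":
    digests = per_core_digests()
    bad = suspect_cores(digests)
    for cpu, d in sorted(digests.items()):
        print(f"CPU {cpu}: {d[:16]}{'  <-- MISMATCH' if cpu in bad else ''}")
    print("all cores agree" if not bad else f"suspect cores: {bad}")
```

Comparing against the majority digest, rather than a hard-coded value,
avoids having to know the "right" answer in advance; on a two-core
machine a mismatch only tells you the cores disagree, not which one is
wrong, so repeat runs or a known-good machine are still needed.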

If you have a similar (ideally identical) machine and can swap the hard
drives, you should be able to tell whether this is a software problem
by booting the problem machine from the 'known good' disk. If it still
fails, then there is a hardware problem. The usual things to try would
be to swap the RAM and PSU, remove all cards but the graphics card and
try swapping that, etc. If none of that helps, then it's either the
mainboard or a CPU, and at that point it's time to call Dell and tell
them to send someone out with spare parts. I've found them perfectly
willing to do that sort of thing provided the simpler possibilities
have been eliminated.

Robert


> On Thu, Mar 29, 2007 at 06:14:24PM -0700, Michael Hannon wrote:
>> Greetings.  One of the profs here has got a Dell Optiplex 620 running
>> SL 4.4.  It has an Intel Pentium D chip that is dual-core-capable, and
>> it has the capability enabled.
>>
>> From time to time the owner has had problems with the system hanging.
>> He usually solves the problem with some stupid computer trick, such as
>> cycling the power, etc.  But yesterday he had one of the usual hangs,
>> except that it was one from which he could not recover.
>>
>> The problem is very similar to one that was reported on the SL-users
>> list not too long ago.  In more detail, the system either has a kernel
>> panic during the boot sequence, or it boots all the way and allows a
>> login, but almost immediately has a "hard freeze" that requires a
>> power cycle to thaw.
>>
>> We've run the Dell diagnostic utilities to test processor, memory, and
>> video, but we didn't find any problems.
>>
>> The system is running the latest kernel, but it will not boot reliably
>> with any of the four SMP kernels currently installed on it.
>>
>> We've tried all of the voodoo that I saw mentioned in the previous
>> discussion (run-level 3, no "rhgb quiet" on the command line), but the
>> only thing that seems to work reliably is to boot with the
>> uni-processor kernel.
>>
>> This is probably an acceptable work-around for the time being, and my
>> hope is that when we do a fresh install with SL 5, we'll all be happy
>> again.  But I wonder if any of y'all can provide any further insight
>> into this.
>
> Could the machine be overheating, by any chance?  I've had similar
> difficulties with a couple of Dell Optiplex 620 Small Form Factor
> machines running SL305.  They would get into a state where they would
> only boot with the uniprocessor kernel, but if you left them off
> overnight to cool down, they would boot the SMP kernel again.  Several
> other identical machines kept on working fine, though.
> Eva.
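
To test the overheating theory, it can help to log temperatures and see
whether hangs coincide with high readings. The sketch below is a
hypothetical helper (`read_thermal_zones` is not an existing tool). It
assumes a newer Linux kernel that exposes the generic thermal interface
under `/sys/class/thermal`, where each `thermal_zone*/temp` file holds
millidegrees Celsius; older kernels such as the one shipped with SL 4
may not provide these files, in which case it simply returns an empty
result and lm_sensors or the BIOS readings are the fallback.

```python
import glob
import os


def read_thermal_zones(base: str = "/sys/class/thermal") -> dict[str, float]:
    """Return {"thermal_zoneN:type": degrees_C} for every readable zone under `base`."""
    temps: dict[str, float] = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                kind = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                millideg = int(f.read().strip())  # sysfs reports millidegrees C
        except (OSError, ValueError):
            continue  # zone missing a file or unreadable on this kernel; skip it
        temps[f"{os.path.basename(zone)}:{kind}"] = millideg / 1000.0
    return temps


if __name__ == "__main__":
    readings = read_thermal_zones()
    if not readings:
        print("no thermal zones exposed by this kernel")
    for name, celsius in readings.items():
        print(f"{name}: {celsius:.1f} C")
```

Running this from cron every minute and appending the output to a log
would show whether the temperature climbs before a freeze.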
