SCIENTIFIC-LINUX-USERS Archives

April 2013

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Joseph Areeda <[log in to unmask]>
Reply To:
Joseph Areeda <[log in to unmask]>
Date:
Tue, 23 Apr 2013 13:26:09 -0700
Content-Type:
text/plain
Parts/Attachments:
text/plain (72 lines)
Thank you Steven and Todd,

atop is now installed and waiting for the next time it happens.

nvidia-smi reports fan at 40% and temp at 33°C but I do have a 550ti 
sitting around so I will replace it to see if it makes a difference.

liveCD is being downloaded.

Thank you again.  I was running out of things to try.

Joe

On 04/23/2013 12:43 PM, Steven J. Yellin wrote:
>     The atop service from epel logs processes into /var/log/atop 
> files. You can run 'atop -r ...' interactively on the file being 
> updated at the time the computer froze in order to see what was 
> happening just before it happened.
>
> Steven Yellin
>
> On Tue, 23 Apr 2013, Joseph Areeda wrote:
>
>>
>> On 04/23/2013 11:44 AM, Joseph Areeda wrote:
>>> Greetings,
>>>
>>> I'm having this strange behavior that I think is a hardware problem 
>>> I can't find.
>>>
>>> I can usually run for 4-8 hrs without a problem then all of a sudden 
>>> I get one of the following:
>>>
>>>   * System freezes, mouse and keyboard dead, sshd unresponsive 
>>> sometimes
>>>   * if the keyboard is alive going to an open terminal I get one of
>>>     the following errors about equally probable:
>>>       o input/output error
>>>       o too many files open
>>>       o bus error
>>>       o may be others that haven't happened for a while
>>>
>>> I've run memtest for 10 hrs, no problem.  Fsck shows no problem, 
>>> and Disk Utility shows that the disks with SMART are all fine.
>>>
>>> I have not found any particular program or operation that causes the 
>>> failure.
>>>
>>> Any suggestions on how to find the cause?
>>>
>>> I'm just about ready to sacrifice a small animal as soon as I find 
>>> the old gypsy woman who reads the entrails and tells me which part 
>>> to replace.
>>>
>>> Thanks,
>>> Joe
>>>
>> Sorry about the typos in my first message.  I wanted to add that 
>> Einstein at Home runs both CPU and GPU jobs and they validate, so 
>> those parts don't have any hard failures.
>>
>> And lm_sensors shows temperatures in the 30-50 °C range depending on 
>> what's running.
>>
>> And the system has been running well for over a year so I don't think 
>> it's a build problem.
>>
>> I'm looking for any way to test more.
>>
>> Joe
>>
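
For anyone following Steven's suggestion after a freeze, here is a minimal Python sketch that picks the most recently modified atop log and builds the 'atop -r' command to replay it. It is only an illustration under stated assumptions: the directory /var/log/atop and the "atop_" file-name prefix are guesses about how the package lays out its logs, and newest_file and replay_command are hypothetical helper names, not part of atop.

```python
import os


def newest_file(directory, prefix="atop_"):
    """Return the most recently modified regular file in `directory`
    whose name starts with `prefix`, or None if there is none.

    The default prefix is an assumption about how the log files are named.
    """
    paths = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.startswith(prefix)
    ]
    files = [p for p in paths if os.path.isfile(p)]
    if not files:
        return None
    return max(files, key=os.path.getmtime)


def replay_command(log_path):
    """Build the argument list for replaying a raw log with 'atop -r'."""
    return ["atop", "-r", log_path]


if __name__ == "__main__":
    log_dir = "/var/log/atop"  # assumption: adjust to where your logs live
    if os.path.isdir(log_dir):
        latest = newest_file(log_dir)
        if latest is None:
            print("No matching log files found in", log_dir)
        else:
            # Print the command rather than running it, since atop is interactive.
            print(" ".join(replay_command(latest)))
    else:
        print(log_dir, "does not exist on this machine")
```

After a reboot, the file written just before the freeze should normally be the newest one from the previous session, so check the printed path before replaying it.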
