SCIENTIFIC-LINUX-USERS Archives

April 2013

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV



Subject:
From:
"Steven J. Yellin" <[log in to unmask]>
Reply To:
Steven J. Yellin
Date:
Tue, 23 Apr 2013 12:43:26 -0700
Content-Type:
multipart/mixed
Parts/Attachments:
text/plain (1764 bytes)
     The atop service from EPEL logs process activity into files under 
/var/log/atop.  You can run 'atop -r ...' interactively on the file that 
was being updated when the computer froze to see what was happening just 
before the freeze.
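
As a rough illustration of picking the file to replay, here is a minimal Python sketch. It assumes the log directory is /var/log/atop as described above and that nothing has written a newer file there since the freeze (after a reboot the atop service may do so, in which case an older file is the one to look at). The helper names newest_log and replay_command are hypothetical, not part of atop.

```python
import os
import shlex


def newest_log(log_dir):
    """Return the most recently modified regular file in log_dir, or None."""
    paths = [os.path.join(log_dir, name) for name in os.listdir(log_dir)]
    files = [p for p in paths if os.path.isfile(p)]
    if not files:
        return None
    # The file with the latest modification time is the best guess for
    # "the file being updated" when the machine stopped responding.
    return max(files, key=os.path.getmtime)


def replay_command(log_path):
    """Build the 'atop -r <file>' command line as an argument list."""
    return ["atop", "-r", log_path]


if __name__ == "__main__":
    log_dir = "/var/log/atop"  # assumption: location named in the message
    if os.path.isdir(log_dir):
        path = newest_log(log_dir)
        if path is None:
            print("no log files found in", log_dir)
        else:
            # Print the command instead of running it; atop -r is interactive.
            print(shlex.join(replay_command(path)))
    else:
        print(log_dir, "does not exist; is the atop service installed?")
```

The script only prints the command so that it can be checked before running it by hand.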

Steven Yellin

On Tue, 23 Apr 2013, Joseph Areeda wrote:

>
> On 04/23/2013 11:44 AM, Joseph Areeda wrote:
>> Greetings,
>> 
>> I'm having this strange behavior that I think is a hardware problem I can't 
>> find.
>> 
>> I can usually run for 4-8 hrs without a problem then all of a sudden I get 
>> one of the following:
>>
>>   * System freezes, mouse and keyboard dead, sshd unresponsive sometimes
>>   * If the keyboard is alive, going to an open terminal gives one of
>>     the following errors, about equally probable:
>>       o input/output error
>>       o too many open files
>>       o bus error
>>       o may be others that haven't happened for a while
>> 
>> I've run memtest for 10 hrs, no problem.  Fsck shows no problem, and disk 
>> utility shows the drives with SMART are all fine.
>> 
>> I have not found any particular program or operation that causes the 
>> failure.
>> 
>> Any suggestions on how to find the cause?
>> 
>> I'm just about ready to sacrifice a small animal as soon as I find the old 
>> gypsy woman who reads the entrails and tells me which part to replace.
>> 
>> Thanks,
>> Joe
>> 
> Sorry about the typos in my first message.  I wanted to add that Einstein at 
> Home runs both CPU and GPU jobs and they validate, so those parts don't have 
> any hard failures.
>
> And lm sensors show temperatures in the 30-50 °C range depending on what's 
> running.
>
> And the system has been running well for over a year so I don't think it's a 
> build problem.
>
> I'm looking for any way to test more.
>
> Joe
>
