Date: Wed, 24 Apr 2013 13:20:12 -0400
Content-Type: text/plain
|
On 04/24/2013 11:03 AM, Joseph Areeda wrote:
> Thanks for the tips, Konstantin,
>
> I assume that your recommendation of 24 hours of memtest is cumulative,
> so I can probably get the same results by starting it each night when I
> quit for the day.
>
> When I mentioned SMART, I was talking about the self-tests, not the
> status that comes up. I've also copied large files around and checked
> their md5sums.
>
> I played with the LiveCD for 4 or 5 hours today, much of it spent trying
> to install it on a different spinning hard drive.
>
> I did see one instance where the SSD was shown in the disk utility but
> all the partitions were zero length. That's where my root directory used
> to be.
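
A minimal sketch of the copy-and-checksum test mentioned above, written in Python for illustration. The function names md5_of and copy_and_verify, and the demo file in a temporary directory, are hypothetical and not from this thread; to test a suspect disk, point src and dst at files on that disk instead.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks so large files are fine."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: Path, dst: Path) -> bool:
    """Copy src to dst and report whether their MD5 digests match."""
    shutil.copyfile(src, dst)
    return md5_of(src) == md5_of(dst)


if __name__ == "__main__":
    # Hypothetical demo in a temporary directory, not on the disk under test.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "big.bin"
        src.write_bytes(b"\x00\xff" * 5_000_000)  # roughly 10 MB of test data
        dst = Path(tmp) / "big.copy"
        print("match" if copy_and_verify(src, dst) else "MISMATCH")
```

One caveat: a file read back right after it was written may come from the page cache rather than the disk, so re-checking the copy after a reboot (or after dropping caches) makes a clean result more convincing.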

I recently discovered that a flaky disk can really mess a system up. I
had an old CentOS 5 machine that I reinstalled as SL6 because it was
hanging frequently and eventually, after a reboot from a frozen state,
had so many fsck errors that it would not boot.

The hangs continued after the upgrade to SL. There was nothing in the
logs, and whenever I went to the machine after a hang, the monitor was
asleep and the machine was otherwise entirely unresponsive.

I ran memtest for 24+ hours with no errors. Recently, though, it threw
these errors on the console while the monitor was _not_ asleep:
kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x90200 action 0xe frozen
kernel: ata4: irq_stat 0x00400000, PHY RDY changed
kernel: ata4: SError: { Persist PHYRdyChg 10B8B }
kernel: ata4: hard resetting link
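
The SErr value in that log is a bitmask, and the SError line lists the names of the bits that are set. A small Python sketch that decodes such a value; the bit-to-name table is an assumption based on the names the Linux libata error handler prints, so check it against drivers/ata/libata-eh.c for your kernel before relying on it.

```python
from typing import List

# Assumed SError bit names, as printed by Linux libata (verify against your kernel source).
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value: int) -> List[str]:
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # SErr value from the ata4 kernel message above.
    print(decode_serror(0x90200))  # ['Persist', 'PHYRdyChg', '10B8B']
```

Decoding 0x90200 (bits 9, 16 and 19) yields the same three names the kernel printed, which is a quick sanity check on the table.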

I swapped out the drive, and now everything runs smoothly.

When I ran pvmove with the disk installed in another machine, I found a
number of similar errors in that machine's logs. Because the disk did not
hold the root or swap partition on that machine, it could reset the link
and keep moving data.

Jeff
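
To check another machine's logs for the same symptom, a rough Python sketch that counts "hard resetting link" messages per ATA port. The regular expression, the script name count_resets.py and the log path are assumptions; on SL6 kernel messages usually land in /var/log/messages, but adjust for your syslog setup.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches lines such as "kernel: ata4: hard resetting link" and captures the port name.
RESET_RE = re.compile(r"\b(ata\d+(?:\.\d+)?): hard resetting link\b")


def count_link_resets(lines: Iterable[str]) -> Counter:
    """Count 'hard resetting link' messages per ATA port."""
    counts: Counter = Counter()
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_resets.py /var/log/messages
    if len(sys.argv) < 2:
        print("usage: count_resets.py LOGFILE [LOGFILE ...]")
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for port, n in count_link_resets(f).most_common():
                print(f"{path}: {port}: {n} link resets")
```

A port with repeated resets points at that port's drive or cable rather than at memory.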