On 04/25/2013 07:48 AM, Jeff Siddall wrote:
> On 04/25/2013 08:31 AM, Elias Persson wrote:
>> On 2013-04-24 19:34, Joseph Areeda wrote:
>>> Thanks Jeff,
>>>
>>> This does support my current hypothesis that the SSD I was mounting on /
>>> is the most likely culprit.
>>>
>>> What fun.
>>>
>>> Joe
>>>
>>> On 04/24/2013 10:27 AM, Jeff Siddall wrote:
>>>> On 04/23/2013 07:20 PM, Konstantin Olchanski wrote:
>>>>>> disk utility show ... SMART [is] fine.
>>>>> SMART "health report" is useless. I had dead disks report "SMART OK"
>>>>> and perfectly functional disks report "SMART Failure, replace your
>>>>> disk now".
>>>>
>>>> Agreed. SMART doesn't diagnose everything.
>>>>
>>>> On the flaky drive I recently replaced smart extended offline tests
>>>> all passed as did the smart health assessment check. Nothing else
>>>> wrong either (no pending/offline uncorrectable or CRC errors). But it
>>>> surely was not working well.
>>>>
>>>> Jeff
>>
>>
>> badblocks might be useful?
>>
>> http://en.wikipedia.org/wiki/Badblocks
>>
>> You'd presumably want the "non-destructive" tests...
>
> smartctl -t long is probably a better option. If a small number of bad
> blocks are detected they should be swapped out by the drive itself
> meaning they are transparent to the FS. You won't see any of that with
> badblocks.
>
> Jeff
Blocks swapped out by the controller built into the drive itself (the
controller that the host's interface controller -- e.g., the SATA
controller on a motherboard -- talks to) might or might not be
transparent. Some information is duplicated automatically by the file
system, and some is duplicated on disk by the drive's controller. For
data that is not duplicated, if "chunk M" of a file composed of N
chunks goes bad (where a chunk depends on the specifics of the drive,
typically a block), the information in that chunk is not recoverable;
the drive's controller substitutes another chunk it holds in reserve
for chunk M, and gives it the same logical location. I can explain
this mapping algorithm in greater detail if the reader is not familiar
with it.
Thus, the total size of the file is unchanged, but the contents of the
former chunk M are in fact destroyed. Depending upon the internal
error detection and correction methodology used by the file system
implementation, this condition might not even be reported as a bad
block.
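A minimal sketch of that substitution in Python; the in-memory
"platter" dictionary, the spare pool, and the sector size are
illustrative assumptions, not a model of any real drive firmware:

```python
# Hypothetical sketch: a drive-level remap of a defective block.
# The spare keeps the SAME logical address, so the host-visible
# geometry (and any file size) is unchanged, but the defective
# chunk's old contents are gone.

SECTOR_SIZE = 512  # assumed sector size for illustration

# Logical block address -> sector contents, as the host sees them.
platter = {lba: bytes([lba % 256]) * SECTOR_SIZE for lba in range(8)}

def remap_defective(lba, spare_pool):
    """Substitute a reserve sector for a defective one at the same LBA.

    The spare arrives initialized (here, zero-filled), not holding
    the data the defective sector contained.
    """
    spare = spare_pool.pop()
    platter[lba] = spare  # same logical location, new physical medium

file_lbas = [2, 3, 4]  # a "file" composed of N = 3 chunks
size_before = sum(len(platter[l]) for l in file_lbas)

remap_defective(3, spare_pool=[b"\x00" * SECTOR_SIZE])  # chunk M = 3 fails

size_after = sum(len(platter[l]) for l in file_lbas)
print(size_before == size_after)                # True: file size preserved
print(platter[3] == bytes([3]) * SECTOR_SIZE)   # False: chunk contents lost
```

The point of the sketch is only that the substitution is invisible at
the level of sizes and addresses, while the data itself is destroyed.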
However, if the information is in fact lost and an application
attempts to use that part of the file, an error will occur. In the
case of a non-critical video stream, this could be nothing more than
some "black" pixels in the image, perhaps not detectable to a casual
human viewer. In other cases, e.g., a text file, the loss might be
obvious.
Yasha Karant