On 04/25/2013 07:48 AM, Jeff Siddall wrote:
> On 04/25/2013 08:31 AM, Elias Persson wrote:
>> On 2013-04-24 19:34, Joseph Areeda wrote:
>>> Thanks Jeff,
>>>
>>> This does support my current hypothesis that the SSD I was mounting on /
>>> is the most likely culprit.
>>>
>>> What fun.
>>>
>>> Joe
>>>
>>> On 04/24/2013 10:27 AM, Jeff Siddall wrote:
>>>> On 04/23/2013 07:20 PM, Konstantin Olchanski wrote:
>>>>>> disk utility show ... SMART [is] fine.
>>>>> SMART "health report" is useless. I had dead disks report "SMART OK"
>>>>> and perfectly functional disks report "SMART Failure, replace your
>>>>> disk now".
>>>>
>>>> Agreed. SMART doesn't diagnose everything.
>>>>
>>>> On the flaky drive I recently replaced smart extended offline tests
>>>> all passed as did the smart health assessment check. Nothing else
>>>> wrong either (no pending/offline uncorrectable or CRC errors). But it
>>>> surely was not working well.
>>>>
>>>> Jeff
>>
>>
>> badblocks might be useful?
>>
>> http://en.wikipedia.org/wiki/Badblocks
>>
>> You'd presumably want the "non-destructive" tests...
>
> smartctl -t long is probably a better option. If a small number of bad
> blocks are detected they should be swapped out by the drive itself
> meaning they are transparent to the FS. You won't see any of that with
> badblocks.
>
> Jeff
Blocks swapped out by the controller built into the drive itself (the
controller that the host's interface controller -- e.g., the SATA
controller on a motherboard -- talks to) might or might not be
transparent. Some information is duplicated automatically by the file
system, and some is duplicated on disk by the drive's controller. For
data that is not duplicated, if "chunk M" of a file composed of N
chunks goes bad (where a chunk depends on the specifics of the drive,
typically a block), the information in that chunk is not recoverable;
the drive's controller substitutes another chunk it holds in reserve
for chunk M, and gives it the same logical location. I can explain
this mapping algorithm in greater detail if the reader is not familiar
with it.
Thus, the total size of the file is unchanged, but the contents of the
former chunk M are in fact destroyed. Depending upon the internal
error detection and correction methodology used by the file system
implementation, this condition might not even be reported as a bad
block.
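A minimal sketch of that substitution in Python; the in-memory
"platter" dictionary, the spare pool, and the sector size are
illustrative assumptions, not a model of any real drive firmware:

```python
# Hypothetical sketch: a drive-level remap of a defective block.
# The spare keeps the SAME logical address, so the host-visible
# geometry (and any file size) is unchanged, but the defective
# chunk's old contents are gone.

SECTOR_SIZE = 512  # assumed sector size for illustration

# Logical block address -> sector contents, as the host sees them.
platter = {lba: bytes([lba % 256]) * SECTOR_SIZE for lba in range(8)}

def remap_defective(lba, spare_pool):
    """Substitute a reserve sector for a defective one at the same LBA.

    The spare arrives initialized (here, zero-filled), not holding
    the data the defective sector contained.
    """
    spare = spare_pool.pop()
    platter[lba] = spare  # same logical location, new physical medium

file_lbas = [2, 3, 4]  # a "file" composed of N = 3 chunks
size_before = sum(len(platter[l]) for l in file_lbas)

remap_defective(3, spare_pool=[b"\x00" * SECTOR_SIZE])  # chunk M = 3 fails

size_after = sum(len(platter[l]) for l in file_lbas)
print(size_before == size_after)                # True: file size preserved
print(platter[3] == bytes([3]) * SECTOR_SIZE)   # False: chunk contents lost
```

The point of the sketch is only that the substitution is invisible at
the level of sizes and addresses, while the data itself is destroyed.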
However, if the information is in fact lost and an application
attempts to use that part of the file, an error will occur. In the
case of a non-critical video stream, this could be nothing more than
some "black" pixels in the image, perhaps not detectable to a casual
human viewer. In other cases, e.g., a text file, the loss might be
obvious.
Yasha Karant