On Fri, Oct 22, 2010 at 09:13, William Lutter wrote:
> I have a desktop PC at work that shows a bad block. It runs Scientific Linux 5.0 and has a 2 TB WD Caviar Green hard drive (WD20000CSRTL). The drive has worked fine out of the box for several months, with no problems.
>
> Yesterday, the SMART diagnostics program smartctl (version 5.36) showed a bad block. Deciding to waste some time on it, I followed the approach described at
> http://smartmontools.sourceforge.net/badblockhowto.html
>
> So I unmounted the file system, figured out the block, found that it had a file associated with it, and determined the ext3 file system inode. However, I could not identify the file, because the next file inode could not be read. I zeroed out the position using dd, and on rerunning smartctl it showed another bad block:
>
> # 3  Extended offline    Completed: read failure       90%      2151         3764125871
> # 4  Short offline       Completed without error       00%      2151         -
> # 5  Short offline       Completed without error       00%      2150         -
> # 6  Short offline       Completed: read failure       90%      2146         3764125865
> # 7  Extended offline    Completed without error       00%      2097
>
> The LBA lies within the only partition on the HD:
> Disk /dev/sdb: 2000.3 GB, 2000398934016 bytes
> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
> Units = sectors of 1 * 512 = 512 bytes
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdb1              63  3907024064  1953512001   83  Linux
>
> Since it's a new HD and I was not expecting catastrophic failure, I did not run ddrescue. Having a copy of SpinRite around, I ran that and the HD came out squeaky clean. I use SpinRite occasionally on Windows XP and Linux HDs where I expect only one bad block, and I have never had problems with it. SpinRite did not find any more bad blocks; of course, I had already zeroed out the original one. After rebooting and running e2fsck, the file system is clean.
>
> Running smartctl again, I still find a bad block at LBA 3764125871:
> # 1  Extended offline    Completed: read failure       90%      2169         3764125871
> # 2  Short offline       Completed without error       00%      2169         -
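To put numbers on where those two LBAs sit, here is a rough sketch of the
arithmetic from the bad block howto, using the partition start and sector
size from the fdisk output quoted above. The 4096-byte file system block
size is an assumption on my part (check it with tune2fs -l), and the
function name lba_to_fs_block is just made up for illustration.

```python
SECTOR_SIZE = 512        # bytes per sector, from the fdisk output
PART_START = 63          # first sector of /dev/sdb1
PART_END = 3907024064    # last sector of /dev/sdb1
FS_BLOCK_SIZE = 4096     # ASSUMPTION: ext3 block size, verify with tune2fs -l


def lba_to_fs_block(lba, part_start=PART_START, part_end=PART_END,
                    sector_size=SECTOR_SIZE, block_size=FS_BLOCK_SIZE):
    """Return (file system block number, sector index inside that block)."""
    if not part_start <= lba <= part_end:
        raise ValueError("LBA %d is outside the partition" % lba)
    # Byte offset of the failing sector, measured from the partition start.
    byte_offset = (lba - part_start) * sector_size
    block = byte_offset // block_size
    sector_in_block = (byte_offset % block_size) // sector_size
    return block, sector_in_block


# The two LBAs reported in the self-test log.
for lba in (3764125865, 3764125871):
    print(lba, lba_to_fs_block(lba))
```

Under that block-size assumption the two reported LBAs fall in adjacent
file system blocks, so zeroing a single block with dd would not have
touched the second one.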
My understanding of SMART is that once an event has been logged it
cannot be cleared, so smartctl is going to 'see' a bad block until the
disk drive is replaced. The bad block might have been remapped or may
simply no longer be used, but the onboard counters only go up, not
down. [This is because it could be indicative of other failures that
might occur soon.] Every time I have had this sort of issue with a
drive, I have just had to replace the drive.
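
If you want to keep an eye on those counters in the meantime, something
like the following could pull the sector-related attributes out of the
text printed by smartctl -A. This is only a sketch: it assumes the usual
ten-column attribute table, the function name sector_counters is
hypothetical, and the sample rows are invented for illustration, not
taken from the drive in this thread.

```python
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def sector_counters(smartctl_a_output):
    """Map each watched attribute name to its raw value as an int."""
    counters = {}
    for line in smartctl_a_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by isdigit().
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            counters[fields[1]] = int(fields[9])
    return counters


# HYPOTHETICAL sample rows, not real output from the drive discussed here.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
"""

print(sector_counters(SAMPLE))
```

Saving the result from each run and comparing it with the previous one
is a simple way to see whether any of the raw values are climbing.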
--
Stephen J Smoogen.
“The core skill of innovators is error recovery, not failure avoidance.”
Randy Nelson, President of Pixar University.
"We have a strategic plan. It's called doing things."
— Herb Kelleher, founder Southwest Airlines