LISTSERV - SCIENTIFIC-LINUX-USERS Archives

October 2010

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:	Re: smart statistics issue
From:	Stephen John Smoogen <[log in to unmask]>
Reply To:	Stephen John Smoogen <[log in to unmask]>
Date:	Fri, 22 Oct 2010 12:36:12 -0600
Content-Type:	text/plain
Parts/Attachments:	text/plain (45 lines)

On Fri, Oct 22, 2010 at 09:13, William Lutter <[log in to unmask]> wrote:
> I have a desktop PC at work that shows a bad block. The PC runs Scientific Linux 5.0 and has a 2 TB WD Caviar Green drive (WD20000CSRTL). It has worked fine out of the box for several months. No problems.
>
> Yesterday, the SMART diagnostics program smartctl (version 5.36) showed a bad block. Deciding to waste some time on it, I used the
> http://smartmontools.sourceforge.net/badblockhowto.html approach.
>
> So I unmounted the file system, figured out the block, found that it had a file associated with it, and determined its ext3 inode. But I could not deduce the file name, because the next file inode could not be read. I zeroed out the block using dd, and on rerunning smartctl it showed another bad block:
>
> # 3  Extended offline    Completed: read failure       90%      2151         3764125871
> # 4  Short offline       Completed without error       00%      2151         -
> # 5  Short offline       Completed without error       00%      2150         -
> # 6  Short offline       Completed: read failure       90%      2146         3764125865
> # 7  Extended offline    Completed without error       00%      2097
>
> The LBA is in the only partition on the HD:
> Disk /dev/sdb: 2000.3 GB, 2000398934016 bytes
> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
> Units = sectors of 1 * 512 = 512 bytes
>   Device Boot      Start         End      Blocks   Id  System
> /dev/sdb1              63  3907024064  1953512001   83  Linux
>
> Since it's a new HD and I was not expecting a catastrophic failure, I did not run ddrescue. Having a copy of SpinRite around, I ran that, and the HD came out squeaky clean. I use SpinRite occasionally on Windows XP and Linux HDs where I expect only one bad block, and have never had problems with it. SpinRite did not find any more bad blocks. Of course, I had zeroed out the original one. After rebooting and running e2fsck, the file system is clean.
>
> Running smartctl again, I again find a bad block at LBA 3764125871:
> # 1  Extended offline    Completed: read failure       90%      2169         3764125871
> # 2  Short offline       Completed without error       00%      2169         -
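The block arithmetic from the badblocks HOWTO can be checked against the numbers quoted above. The sketch below is an illustration, not the procedure William actually ran: it parses the self-test rows, keeps those that report an LBA_of_first_error, and maps each LBA L to an ext3 block with b = (L - S) * 512 / B (integer division), taking the partition start S = 63 and end sector from the fdisk listing. The ext3 block size B = 4096 is an assumption, since the post does not state it (`tune2fs -l /dev/sdb1` reports the real value), and the regex assumes the columns are separated by two or more spaces, as in this log.

```python
import re

# Self-test log rows quoted in the post, with the leading "> " stripped.
SELFTEST_LOG = """\
# 3  Extended offline    Completed: read failure       90%      2151         3764125871
# 4  Short offline       Completed without error       00%      2151         -
# 5  Short offline       Completed without error       00%      2150         -
# 6  Short offline       Completed: read failure       90%      2146         3764125865
# 7  Extended offline    Completed without error       00%      2097
"""

# Num, Test_Description, Status, Remaining%, LifeTime(hours), optional LBA.
# ASSUMPTION: multi-word fields are separated from each other by 2+ spaces.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<test>.+?)\s{2,}(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)(?:\s+(?P<lba>\S+))?\s*$"
)

PART_START = 63          # /dev/sdb1 start sector, from the fdisk listing
PART_END = 3907024064    # /dev/sdb1 end sector, from the fdisk listing
SECTOR_SIZE = 512        # "Units = sectors of 1 * 512 = 512 bytes"
FS_BLOCK_SIZE = 4096     # ASSUMPTION: ext3 block size, not stated in the post


def failing_lbas(log):
    """Return (hours, lba) for every self-test row that reports an LBA."""
    hits = []
    for line in log.splitlines():
        m = ROW.match(line)
        if m and m.group("lba") not in (None, "-"):
            hits.append((int(m.group("hours")), int(m.group("lba"))))
    return hits


def lba_to_fs_block(lba, start=PART_START, end=PART_END):
    """Map a whole-disk LBA to a file-system block number inside the partition."""
    if not start <= lba <= end:
        raise ValueError(f"LBA {lba} is outside the partition")
    return (lba - start) * SECTOR_SIZE // FS_BLOCK_SIZE


for hours, lba in failing_lbas(SELFTEST_LOG):
    print(f"{hours} h: LBA {lba} -> ext3 block {lba_to_fs_block(lba)}")
```

Under the 4096-byte assumption, the two failing LBAs, 3764125865 and 3764125871, land in adjacent blocks 470515725 and 470515726, so zeroing the 4 KiB block that holds one of them leaves the other untouched. A block number like this is what `debugfs -R "icheck 470515726" /dev/sdb1` takes to find the owning inode.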

My understanding of SMART is that once an event occurs it cannot be
cleaned up, so smartctl is going to 'see' a bad block until the disk
drive is replaced. The bad block might have been remapped or no longer
'used', but the onboard counters only go up, not down, because the
event could be indicative of other failures that might occur soon.
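To see which of the drive's sector-related counters have moved, one could pull them out of `smartctl -A` output and compare snapshots taken some days apart. This is a minimal sketch, assuming the common 10-column attribute table (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) with a plain integer raw value for the watched IDs 5, 197 and 198, which smartmontools labels Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable. The helper names are hypothetical; you capture the smartctl text yourself and pass it in.

```python
# Sector-related SMART attribute IDs to watch (smartmontools names).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def sector_counters(attr_table):
    """Map attribute name -> raw value for the watched IDs in `smartctl -A` text.

    ASSUMPTION: RAW_VALUE is the 10th whitespace-separated field and is a
    plain integer for these attributes.
    """
    counters = {}
    for line in attr_table.splitlines():
        fields = line.split()
        # Skip the header ("ID#"), blank lines and anything not in WATCHED.
        if len(fields) >= 10 and fields[0].isdigit() and int(fields[0]) in WATCHED:
            counters[fields[1]] = int(fields[9])
    return counters


def grown(before, after):
    """Return {name: (old, new)} for counters whose raw value increased."""
    return {
        name: (before.get(name, 0), value)
        for name, value in after.items()
        if value > before.get(name, 0)
    }
```

Keeping the comparison separate from the parsing lets you store each snapshot (for example as JSON) and diff them later without rerunning smartctl.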

Every time I have had this sort of issue with a drive, I have had to
replace the drive.



-- 
Stephen J Smoogen.
“The core skill of innovators is error recovery, not failure avoidance.”
Randy Nelson, President of Pixar University.
"We have a strategic plan. It's called doing things."
— Herb Kelleher, founder Southwest Airlines
