LISTSERV - SCIENTIFIC-LINUX-USERS Archives

October 2018

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:	Re: is the disk failing ?
From:	Radha Mohan <[log in to unmask]>
Reply To:	Radha Mohan <[log in to unmask]>
Date:	Thu, 18 Oct 2018 17:21:24 -0700
Content-Type:	text/plain
Parts/Attachments:	text/plain (94 lines)

On Thu, Oct 18, 2018 at 11:52 AM Paul Robert Marino <[log in to unmask]> wrote:
>
> Radha,
> Over the decades of dealing with hundreds of thousands of disks in data centers, my experience comes down to this:
> 1) If SMART says it's going to die, trust it; it's rarely wrong about failures.
> 2) If SMART says it's fine but you are getting I/O errors, use badblocks and/or fdisk to verify. 60% of the time it will be a file system problem, 39% of the time it will be something wrong that SMART didn't detect, and the rest will be something else, such as (in no particular order) a bad kernel version, bad BIOS revision, bad controller, or bad cable. By the way, this happened to me this year on one of my personal laptops with a Toshiba drive: SMART said it was fine, but badblocks revealed it had bad sectors, and more were going bad by the day.

Thank you all for the insights. I am glad that the disk hasn't gone bad.

>
> Lastly, anyone who would like to discuss how the RAID controllers for HP and Dell servers work, and related subjects like SMI-S, feel free to contact me off the list, but I won't engage any further in a flame war based on the uninformed opinions of people on an open list. Frankly, I can back up what I say with published, proven facts from reputable experts, but many people do not respond well when presented with real facts based on evidence.
>
>
> On Wed, Oct 17, 2018 at 8:11 PM Konstantin Olchanski <[log in to unmask]> wrote:
>>
>> On Wed, Oct 17, 2018 at 11:57:34PM +0000, Hinz, David (GE Healthcare) wrote:
>> > I'd like to submit an opposing viewpoint.
>> > If SMART disk analysis says it's going to break, replace it.
>> > Nothing is worth risking lost data.
>>
>>
>> I second this.
>>
>> My only case of false positive (SMART reports complete failure while
>> disk still seems to work) has been a worn out 2 TB "green" WD disk.
>> By "worn out" I mean that it was (a) heavily used and (b) all its mates
>> of same vintage, age and heavy use have already failed (with i/o errors, etc).
>>
>>
>> K.O.
>>
>>
>>
>>
>> >
>> >
>> > On 10/17/18, 4:50 PM, "[log in to unmask] on behalf of Konstantin Olchanski" <[log in to unmask] on behalf of [log in to unmask]> wrote:
>> >
>> >     >
>> >     > # smartctl -a /dev/sda
>> >     > ...
>> >     > Device Model:     TOSHIBA MG03ACA100
>> >     > ...
>> >
>> >     Thank you for posting your data, here is my reading of smartctl data:
>> >
>> >     >
>> >     > === START OF READ SMART DATA SECTION ===
>> >     > SMART overall-health self-assessment test result: PASSED
>> >     >
>> >
>> >     This you can ignore: I have held in my hands disks that reported "PASSED"
>> >     but were dead and could not read or write anything. I have also had
>> >     disks that worked perfectly but reported "FAILED" here.
>> >
>> >
>> >     Next goes the meat of the data:
>> >
>> >     > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE UPDATED  WHEN_FAILED RAW_VALUE
>> >     >   4 Start_Stop_Count        0x0032   100   100   000    Old_age Always       -       26
>> >
>> >     Your disk is brand new, only ever saw 26 power cycles.
>> >
>> >     >   9 Power_On_Hours          0x0032   051   051   000    Old_age Always       -       19725
>> >
>> >     Your disk is brand new, 19725 hours is 2.2 years.
>> >
>> >     > 194 Temperature_Celsius     0x0022   100   100   000    Old_age Always       -       32 (Min/Max 20/37)
>> >
>> >     You have good cooling: temperature is 32C. As high as 40C is usually okay; above 50C means the cooling fans are dead.
>> >
>> >     >   5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail Always       -       0
>> >     > 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age  Always       -       0
>> >     > 198 Offline_Uncorrectable   0x0030   100   100   000    Old_age  Offline      -       0
>> >     > 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age  Always       -       0
>> >
>> >     Your disk does not report any problems reading or writing data to the magnetic media.
>> >
>> >     Conclusion: healthy as a bull.
>> >
>> >     --
>> >     Konstantin Olchanski
>> >     Data Acquisition Systems: The Bytes Must Flow!
>> >     Email: olchansk-at-triumf-dot-ca
>> >     Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
>> >
>> >
>>
>> --
>> Konstantin Olchanski
>> Data Acquisition Systems: The Bytes Must Flow!
>> Email: olchansk-at-triumf-dot-ca
>> Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
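
For anyone who wants to script the reading of the attribute table quoted above, here is a minimal Python sketch. It only parses the attribute lines posted in this thread and applies the rules of thumb given there (40C usually okay, above 50C suspect the fans, zero reallocated/uncorrectable/CRC counts means no media problems reported). The function names, the "warm" label for 41-50C, and the assumption that every raw value starts with an integer are mine, not part of smartctl; real output from other drives may need more careful parsing.

```python
# Attribute lines copied from the smartctl output quoted in this thread.
SAMPLE = """\
  4 Start_Stop_Count        0x0032   100   100   000    Old_age Always       -       26
  9 Power_On_Hours          0x0032   051   051   000    Old_age Always       -       19725
194 Temperature_Celsius     0x0022   100   100   000    Old_age Always       -       32 (Min/Max 20/37)
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age  Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age  Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age  Always       -       0
"""

# Counters that should stay at zero on a disk with no read/write trouble.
ERROR_COUNTERS = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def parse_raw_values(text):
    """Map ATTRIBUTE_NAME -> leading integer of RAW_VALUE (hypothetical helper)."""
    raw = {}
    for line in text.splitlines():
        # Ten columns: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE,
        # UPDATED, WHEN_FAILED, RAW_VALUE (which may contain spaces).
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            raw[fields[1]] = int(fields[9].split()[0])
        except ValueError:
            continue  # assumption: skip raw values that do not start with an integer
    return raw


def summarize(raw):
    """Apply the thread's rules of thumb to the parsed raw values."""
    temp = raw["Temperature_Celsius"]
    if temp <= 40:
        temperature = "ok"
    elif temp <= 50:
        temperature = "warm"
    else:
        temperature = "check cooling fans"
    return {
        "years_powered_on": round(raw["Power_On_Hours"] / (24 * 365), 2),
        "power_cycles": raw["Start_Stop_Count"],
        "temperature": temperature,
        "nonzero_error_counters": [n for n in ERROR_COUNTERS if raw.get(n, 0) != 0],
    }


if __name__ == "__main__":
    for key, value in summarize(parse_raw_values(SAMPLE)).items():
        print(f"{key}: {value}")
```

To try it on a live system, the same functions could be fed the text of `smartctl -a /dev/sda` instead of SAMPLE. As noted earlier in the thread, a clean attribute table does not rule out a failing disk, so this is a first look rather than a verdict.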
