SCIENTIFIC-LINUX-USERS Archives

October 2018

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Konstantin Olchanski <[log in to unmask]>
Reply To:
Konstantin Olchanski <[log in to unmask]>
Date:
Wed, 17 Oct 2018 15:15:17 -0700
Content-Type:
text/plain
On Tue, Oct 16, 2018 at 09:09:36PM -0400, Paul Robert Marino wrote:
>
> to be clear, I wasn't saying SMART is useless, just that smartctl doesn't
> always tell you every thing, so you shouldn't rely on it as a definitive
> answer for all issues on all disks.
> 

I disagree with "smartctl doesn't always tell you every thing".

I maintain that smartctl reports all the data available regarding disk health.

But perhaps you can give us an example of disk health data that is available,
but is not reported by SMART.

I can give the following examples of disk-health data that cannot be reported
by SMART because the relevant sensors are not present on the disk devices.

You can gain additional insight by touching the disk with your fingers
to check for excessive vibration, excessive heat buildup,
or a lack of cooling airflow across the disk.

With ancient disks, you could also see a gradual degradation of read/write
speeds (as reported by the OS via "iostat -x 1", "vmstat 1", and co.).
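As a rough sketch of the throughput figure that "iostat -x 1" derives, assuming a Linux host: read /proc/diskstats twice, a known interval apart, and convert the sector counters (which the kernel always counts in 512-byte units) into MB/s. A slow downward drift of these numbers under the same workload is the kind of degradation meant above. The function names are hypothetical, written only for this illustration.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of device


def read_diskstats(path="/proc/diskstats"):
    """Return the raw text of /proc/diskstats (Linux only)."""
    with open(path) as fh:
        return fh.read()


def sectors(snapshot, device):
    """Return (sectors_read, sectors_written) for `device` from diskstats text.

    Columns: major, minor, name, reads completed, reads merged, sectors read,
    ms reading, writes completed, writes merged, sectors written, ...
    """
    for line in snapshot.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[5]), int(fields[9])
    raise KeyError(device)


def throughput_mb_s(before, after, device, interval_s):
    """Average (read, write) throughput in MB/s between two snapshots."""
    r0, w0 = sectors(before, device)
    r1, w1 = sectors(after, device)
    to_mb_s = SECTOR_BYTES / 1e6 / interval_s
    return (r1 - r0) * to_mb_s, (w1 - w0) * to_mb_s
```

Usage: take one snapshot with read_diskstats(), sleep one second (time.sleep(1)), take another, and pass both to throughput_mb_s(before, after, "sda", 1.0). Logging these numbers over months is what makes a gradual slowdown visible.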

>
> As for RAID controllers, well, that's a very long conversation. There are good
> reasons the enterprise ones do not, at least not directly in a way you can
> extract using the smartctl command; instead, they have more advanced checks
> available through the drivers and additional monitoring tools provided by
> the manufacturer of the RAID controller.
> 

"good reasons ... [to hide SMART data]", "advanced checks",
"additional monitoring tools".

I smell snake oil.

In my experience, the "advanced additional tools" amount to a plain-Jane RAID
scrub cycle ("dd if=/dev/raid-disk of=/dev/null") with complaints
about any I/O errors. There are no checks or complaints if per-disk SMART
counters increase, no check for an increase in the disk "reallocated sectors
count", no check for an increased "raw read error rate", and definitely no
complaint if the disk temperature (reported by SMART) goes above any
reasonable limit.
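For comparison, here is a minimal sketch of the kind of check described as missing: compare two saved "smartctl -A" outputs and complain if selected raw counters went up or if the temperature exceeds a limit. The attribute IDs (1, 5, 194, 199) follow the usual ATA SMART numbering, but which ones are meaningful is drive-model dependent; the 50 C limit and the helper names are assumptions for illustration, not part of smartctl.

```python
import re

# Counters that should never grow on a healthy disk (assumption: the usual
# ATA SMART IDs; adjust per drive model).
WATCHED_COUNTERS = {
    1: "Raw_Read_Error_Rate",
    5: "Reallocated_Sector_Ct",
    199: "UDMA_CRC_Error_Count",
}
TEMPERATURE_ID = 194
TEMP_LIMIT_C = 50  # hypothetical limit; choose one suited to your drives

# ID, name, FLAG, then VALUE WORST THRESH TYPE UPDATED WHEN_FAILED, then RAW_VALUE
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+(?:\s+\S+){6}\s+(\d+)")


def parse_attributes(text):
    """Map attribute ID -> leading integer of RAW_VALUE from `smartctl -A` text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = int(m.group(3))
    return attrs


def check(old_text, new_text, temp_limit=TEMP_LIMIT_C):
    """Return a list of complaints comparing an old and a new snapshot."""
    old, new = parse_attributes(old_text), parse_attributes(new_text)
    complaints = []
    for attr_id, name in WATCHED_COUNTERS.items():
        if attr_id in old and attr_id in new and new[attr_id] > old[attr_id]:
            complaints.append(f"{name} increased: {old[attr_id]} -> {new[attr_id]}")
    temp = new.get(TEMPERATURE_ID)
    if temp is not None and temp > temp_limit:
        complaints.append(f"temperature {temp} C above limit {temp_limit} C")
    return complaints
```

One way to feed it is to save the output of "smartctl -A /dev/sdX" periodically (from cron, or via subprocess.run) and compare each new snapshot with the previous one. On some drives the raw read error rate is a composite value that grows during normal operation, so it may need to be dropped from the watched list for those models.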

>
> As for the predictive nature of SMART, well, that's actually in its
> specification: it predicts errors based on indicators.
> 

Well, if Wikipedia says "SMART predicts disk failures", then it must be so.

Myself, I do not read Wikipedia; I read this:
https://www.backblaze.com/blog/what-smart-stats-indicate-hard-drive-failures/


K.O.



> On Tue, Oct 16, 2018 at 7:55 PM Konstantin Olchanski <[log in to unmask]>
> wrote:
> 
> > On Tue, Oct 16, 2018 at 04:20:03PM -0400, Paul Robert Marino wrote:
> > >
> > > SMART is predictive and doesn't catch all errors; it's also not compatible
> > > with all disks and controllers, especially RAID-capable controllers.
> > >
> >
> >
> > Do not reject SMART as useless; it correctly reports many actual disk
> > failures:
> >
> > a) overheating (actual disk temperature is reported in degrees Centigrade)
> > b) unreadable sectors (data on these sectors is already lost); disk-model
> > dependent
> > c) "hard to read" sectors (WD-specific: "raw read error rate")
> > d) SATA link communication errors ("CRC error count")
> >
> > Even more useful actual (*not* predictive) stuff is reported for SSDs
> > (again, model dependent).
> >
> > It is true that much of this information is disk-model dependent, and
> > one has to have some experience with SMART data to be able
> > to read it in a meaningful way.
> >
> > As for RAID controllers that prevent access to disk SMART data,
> > they are as safe to use as a car with a blank dashboard (no fuel level,
> > no engine temperature, no speedometer, etc.).
> >
> >
> > --
> > Konstantin Olchanski
> > Data Acquisition Systems: The Bytes Must Flow!
> > Email: olchansk-at-triumf-dot-ca
> > Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
> >

-- 
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
