SCIENTIFIC-LINUX-USERS Archives

April 2017

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Options: Use Monospaced Font
Show Text Part by Default
Show All Mail Headers

Message: [<< First] [< Prev] [Next >] [Last >>]
Topic: [<< First] [< Prev] [Next >] [Last >>]
Author: [<< First] [< Prev] [Next >] [Last >>]

Print Reply
From: David Sommerseth <[log in to unmask]>
Date: Wed, 5 Apr 2017 11:50:35 +0200

On 05/04/17 00:58, Steven Haigh wrote:
> On 05/04/17 05:44, Konstantin Olchanski wrote:
[...snip...]
>> ZFS is also scary, but seems to behave well, even in the presence
>> of "bad disks". ZFS scrub seems to overcome/rewrite bad sectors,
>> bad data on disk (if I poop on a disk using "dd"), all without corrupting
>> data files (I compute and check my own sha-512 checksums for each file).
> 
> Heh - another soon to be victim of ZFS on linux :)

ZFS looks great, and so does btrfs - on paper.  But until ZFS is native
in Linux or btrfs stabilizes to the same level as ext4 and XFS, I'm not
going down that path for production environments :)

Most of the nice features I need can be handled with md-raid, LVM and
XFS, and those have worked very well for me over the last decade.  I
have not experienced a single data loss despite failing drives, and I
have replaced drives and expanded the mount points dynamically with
close to no downtime (some hosts don't have hot-swap, which requires a
short downtime window).

[...snip...]
> 
> However, BTRFS is very stable if you use it as a simple filesystem. You
> will get more flexible results in using mdadm with btrfs on top of it.

This is my understanding as well, and it matches what several file
system gurus say too.
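
In practice that layering would look something like this (just a
sketch; the device names and mount point are made up):

  # let md handle the RAID 6, use btrfs as a plain single-device filesystem
  mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[abcd]1
  mkfs.btrfs /dev/md0
  mount /dev/md0 /srv/data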

> mdadm can be a pain to tweak - but almost all problems are well known
> and documented - and unless you really lose all your parity, you'll be
> able to recover with much less data loss than most other concoctions.

I believe I found out what my issue was.  It really seems to have been
caused by poor cable seating.  After taking the server down for
"physical maintenance" (vacuum cleaning, cable/drive re-seating, etc.)
no errors have occurred.  Before this I had failures happening
regularly, often at roughly 30 minute intervals.  Now the server has
been up for 14 hours without a single failure.  Before, it didn't even
manage to complete several of the longer smartd self-tests.  But after
this maintenance round, all tests completed without any issues or
error reports.
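
For reference, by the longer self-tests I mean the extended SMART
self-test, roughly like this (just a sketch; /dev/sdb is only an
example device):

  # kick off the extended (long) self-test, then check the results later
  smartctl -t long /dev/sdb
  smartctl -l selftest /dev/sdb

  # and the attribute table itself
  smartctl -A /dev/sdb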

Re-adding the failed drives (mdadm --re-add) also worked flawlessly.
The RAID 6 recovered incredibly quickly despite the 4TB drives - the
recovery took about 10 minutes.  So this time it was yet another
md-raid + LVM + XFS success.
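
The re-add itself was nothing more exotic than this (a sketch; the
device names here are made up):

  # put the kicked-out member back and watch the recovery
  mdadm --manage /dev/md0 --re-add /dev/sdb1
  cat /proc/mdstat
  mdadm --detail /dev/md0

If the array has a write-intent bitmap, --re-add only resyncs the
blocks that changed while the drive was out, which would explain why
the recovery was so quick despite the drive size.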

But what I really learnt is that the SMART pre-failure reports from
smartctl may not always be as bad as they sound.  I discovered that
Seagate encodes a lot of additional data into these fields, and the
numbers must be decoded properly.  It is also important to understand
how the values in these reports are calculated.  In particular, the
RAW_VALUE column can be very misleading.  One of the more interesting
reads I found was this one:
<http://www.users.on.net/~fzabkar/HDD/Seagate_SER_RRER_HEC.html>
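
As I read that page, the 48-bit RAW_VALUE of attributes like
Raw_Read_Error_Rate and Hardware_ECC_Recovered on these Seagate drives
packs an error count into the upper 16 bits and the total number of
operations into the lower 32 bits.  A quick-and-dirty way to split it
up (my own sketch, assuming that layout; /dev/sdb is just an example
device):

  RAW=$(smartctl -A /dev/sdb | awk '$2 == "Raw_Read_Error_Rate" {print $10}')
  echo "errors:     $(( RAW >> 32 ))"        # upper 16 bits
  echo "operations: $(( RAW & 0xFFFFFFFF ))" # lower 32 bits

Seen that way, a huge-looking RAW_VALUE is usually dominated by the
operation count, with few or no actual errors behind it.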


-- 
kind regards,

David Sommerseth



>> On Tue, Apr 04, 2017 at 04:17:22PM +0200, David Sommerseth wrote:
>>> Hi,
>>>
>>> I just need some help to understand what might be the issue on a SL7.3
>>> server which today decided to disconnect two drives from a RAID 6 setup.
>>>
>>> First some gory details
>>>
>>> - smartctl + mdadm output
>>> <https://paste.fedoraproject.org/paste/wLyz44nipkJ7FgKxWk-1mV5M1UNdIGYhyRLivL9gydE=>
>>>
>>> - kernel log messages
>>> https://paste.fedoraproject.org/paste/mkyjZINKnkD4SQcXTSxyt15M1UNdIGYhyRLivL9gydE=
>>>
>>>
>>> The server is set up with 2x WD RE4 hard drives and 2x Seagate
>>> Constellation ES.3 drives.  All 4TB, all bought brand new.  They're
>>> installed in a mixed pattern (sda: RE4, sdb: ES3, sdc: RE4, sdd: ES3)
>>> ... and the curious devil in the detail ... there is no /dev/sde
>>> installed on this system - there never has been, at least not on that
>>> controller.  (Later today, I attached a USB drive to make some backups -
>>> which got designated /dev/sde)
>>>
>>> This morning *both* ES.3 drives (sdb, sdd) got disconnected and removed
>>> from the mdraid setup.  With just minutes in between.  On drives which
>>> have been in production for less than 240 days or so.
>>>
>>> lspci details:
>>> 00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset
>>> Family SATA AHCI Controller (rev 05)
>>>
>>> Server: HP ProLiant MicroServer Gen8 (F9A40A)
>>>
>>> <https://www.hpe.com/us/en/product-catalog/servers/proliant-servers/pip.specifications.hpe-proliant-microserver-gen8.5379860.html>
>>>
>>>
>>> Has anyone else experienced such issues?  In several places on the net,
>>> these ata kernel error messages have been resolved by checking SATA
>>> cables and their seating.  It just sounds a bit too incredible that two
>>> hard drives of the same brand and type in different HDD slots would
>>> have the same issue, but not at exactly the same time (close, though).
>>> And I struggle to believe two identical drives would just fail so close
>>> in time.
>>>
>>> What am I missing? :)  I'm going to shut down the server soon (after a
>>> last backup round) and will double-check all the HDD seating and
>>> cabling.  But I'm not convinced that's the whole story just yet.
>>>
>>>
>>> -- 
>>> kind regards,
>>>
>>> David Sommerseth
