SCIENTIFIC-LINUX-USERS Archives

April 2017

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

From: David Sommerseth
Date: Tue, 4 Apr 2017 16:17:22 +0200

Hi,

I just need some help to understand what might be the issue on a SL7.3
server which today decided to disconnect two drives from a RAID 6 setup.

First, some gory details:

- smartctl + mdadm output
<https://paste.fedoraproject.org/paste/wLyz44nipkJ7FgKxWk-1mV5M1UNdIGYhyRLivL9gydE=>

- kernel log messages
https://paste.fedoraproject.org/paste/mkyjZINKnkD4SQcXTSxyt15M1UNdIGYhyRLivL9gydE=


The server is set up with 2x WD RE4 hard drives and 2x Seagate
Constellation ES.3 drives.  All are 4TB, and all were bought brand new.
They're installed in a mixed pattern (sda: RE4, sdb: ES3, sdc: RE4,
sdd: ES3) ... and the curious devil in the detail ... there is no
/dev/sde installed on this system, and there never has been, at least
not on that controller.  (Later today, I attached a USB drive to make
some backups, which got designated /dev/sde.)

This morning *both* ES.3 drives (sdb, sdd) got disconnected and removed
from the mdraid setup, just minutes apart.  These drives have been in
production for less than 240 days or so.
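
As a side note, the degraded state described above can also be read
from /proc/mdstat.  The following is a minimal Python sketch, assuming
the usual mdstat layout.  The sample text, the array name md0 and the
partition names (sda1 etc.) are illustrative assumptions and are not
output from this server; parse_mdstat is a hypothetical helper name.

```python
import re

# Illustrative /proc/mdstat content; NOT taken from the server in this post.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdd1[3](F) sdc1[2] sdb1[1](F) sda1[0]
      7813772288 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/2] [U_U_]

unused devices: <none>
"""

# "md0 : active raid6 sdd1[3](F) ..." -> array name, state, remainder
ARRAY_RE = re.compile(r"^(md\w+) : (\S+) (.*)$")
# "sdd1[3](F)" -> device name plus optional flags such as (F) or (S)
MEMBER_RE = re.compile(r"(\w+)\[\d+\]((?:\(\w\))*)")
# "[4/2] [U_U_]" -> expected members, working members, per-slot status
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_mdstat(text):
    """Return a dict mapping each md array name to a summary of its state."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            name, state, rest = header.groups()
            members = MEMBER_RE.findall(rest)
            current = {
                "state": state,
                "members": [dev for dev, _ in members],
                "failed": [dev for dev, flags in members if "(F)" in flags],
                "expected": None,
                "working": None,
                "slots": None,
            }
            arrays[name] = current
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            current["expected"] = int(status.group(1))
            current["working"] = int(status.group(2))
            current["slots"] = status.group(3)
    return arrays


if __name__ == "__main__":
    # On a live system, read the real file instead of the sample:
    #   with open("/proc/mdstat") as fh: text = fh.read()
    for name, info in parse_mdstat(SAMPLE_MDSTAT).items():
        print(name, info["state"], "failed:", info["failed"],
              "working:", info["working"], "of", info["expected"])
```

A RAID 6 array survives two missing members, so a status such as
[4/2] means the array is still running but has no redundancy left.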

lspci details:
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset
Family SATA AHCI Controller (rev 05)

Server: HP ProLiant MicroServer Gen8 (F9A40A)

<https://www.hpe.com/us/en/product-catalog/servers/proliant-servers/pip.specifications.hpe-proliant-microserver-gen8.5379860.html>


Has anyone else experienced such issues?  In several places on the net,
the same ata kernel error messages were resolved by checking the SATA
cables and their seating.  It just sounds a bit too incredible that two
hard drives of the same brand and type, in different HDD slots, would
have the same issue at nearly (but not exactly) the same time.  And I
struggle to believe that two identical drives would simply fail so
close in time.

What am I missing? :)  Going to shut down the server soon (after last
backup round) and will double check all the HDD seating and cabling.
But I'm not convinced that's all just yet.


-- 
kind regards,

David Sommerseth
