SCIENTIFIC-LINUX-USERS Archives

July 2014

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Andras Horvath <[log in to unmask]>
Reply To:
Andras Horvath <[log in to unmask]>
Date:
Tue, 1 Jul 2014 21:30:32 +0200
On Tue, 1 Jul 2014 14:24:48 -0500
Pat Riehecky <[log in to unmask]> wrote:

> On 07/01/2014 01:29 PM, Andras Horvath wrote:
> > On Mon, 30 Jun 2014 16:23:45 -0400
> > Lamar Owen <[log in to unmask]> wrote:
> >
> >> On 06/30/2014 03:52 PM, Andras Horvath wrote:
> >>> Actually the drive has its own power so it is not USB powered. I
> >>> cannot tell if the drive spins down (did not get the idea to check
> >>> it), but the CPU is in 100% I/O wait all the time after this happens.
> >>> I was told the disk is a WD RED, but I'll check the power mode later
> >>> with hdparm.
> >> The only time I've personally run into the 100% I/O wait issue with EL6
> >> was when I was trying to RAID a Seagate 1.5TB internal SATA drive with a
> >> WD GREEN 1.5TB SATA drive.  The system was basically unusable, with
> >> frequent and long forays into 100% iowait territory. Replacing the WD
> >> GREEN drive with another 1.5TB Seagate fixed that. It could be WD's
> >> TLER/non-TLER (Time-Limited Error Recovery) handling doing this.  More
> >> info on this at http://www.wdc.com/en/library/other/2579-001098.pdf and
> >> googling 'WD TLER' yields a lot of hits.
> >>
> >> Another possibility is that the idle timer is set up on the disk; I
> >> would think that it would hit you sooner, though, if it was that issue.
> >> I ran into that sort of issue with an eSATA Seagate a long time ago,
> >> where throughput was good but after a while it would error out.  For
> >> some reason the standard Linux write caching and the timeout interacted
> >> badly.  There's more about the WD RED and GREEN drives and this idle
> >> timer at
> >> http://forums.freenas.org/index.php?threads/hacking-wd-greens-and-reds-with-wdidle3-exe.18171/
> >> with some open source tool at http://idle3-tools.sourceforge.net/
> > A note:
> >
> > hdparm -I /dev/sda | grep -i pow
> >             *    Power Management feature set
> >                  Power-Up In Standby feature set
> >             *    SET_FEATURES required to spinup after power up
> >             *    Host-initiated interface power management
> >                  Device-initiated interface power management
> >
> > I cannot access the power levels through the USB interface. I'll check the eSATA connection tomorrow.
> >
> > I restarted the copy, and within a minute the CPU hung again at 100% I/O wait. The "iotop" output shows absolutely nothing, as if there were no load on the disks at all. Interrupts and context switches are around 20-50, so almost nothing (dstat output). Disk activity is zero. Load is at 5.01. The rsync processes that I'm using for the copy cannot be killed or force killed.
> >
> > Any idea? Thanks.
> >
> >
> > Andras
> 
> Circling back around to the "is it spinning" question, for externals in 
> a workable enclosure, I've found the "Jurassic Park" test to be rather 
> trustworthy.[1]
> 
> Does dmesg report anything interesting?
> 
> Pat
> 
> 
> [1] https://www.youtube.com/watch?v=1koa2xAxCAw

This part of the dmesg output repeats forever:

sd 0:0:0:0: [sda] Unhandled error code
sd 0:0:0:0: [sda] Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 16 41 19 e0 00 00 f0 00
__ratelimit: 20 callbacks suppressed
Buffer I/O error on device sda1, logical block 46670396
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670397
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670398
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670399
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670400
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670401
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670402
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670403
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670404
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670405
lost page write due to I/O error on sda1
usb 1-4: reset high speed USB device number 2 using ehci_hcd
usb 1-4: reset high speed USB device number 2 using ehci_hcd
usb 1-4: reset high speed USB device number 2 using ehci_hcd

I'll have physical access to the disk only tomorrow. Will report back.


Andras
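
For reference, the Write(10) CDB in the dmesg excerpt above can be decoded by hand to see which sectors the failed write was aimed at. The following is only a rough sketch, not part of the original report: the function name decode_write10 is made up for illustration, and the 512-byte sector size and 4096-byte buffer block size are assumptions, since the log does not state either. The field offsets (opcode 0x2a in byte 0, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) follow the standard WRITE(10) layout.

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2-5: starting sector
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8: sectors to write
    }


SECTOR = 512     # ASSUMPTION: logical sector size of the drive
FS_BLOCK = 4096  # ASSUMPTION: block size behind the "logical block" numbers

# CDB copied from the dmesg output above.
w = decode_write10("2a 00 16 41 19 e0 00 00 f0 00")

# First block reported in the "Buffer I/O error on device sda1" lines.
first_failed = 46670396

# If the write starts at that block, the difference is where sda1 begins.
implied_start = w["lba"] - first_failed * (FS_BLOCK // SECTOR)

print("LBA:", w["lba"])
print("sectors:", w["blocks"])
print("4K blocks covered:", w["blocks"] * SECTOR // FS_BLOCK)
print("implied start sector of sda1:", implied_start)
```

Under those assumptions the write targets sector 373365216 for 240 sectors (30 blocks of 4 KiB), and the implied partition start is sector 2048, a common default alignment. That suggests the aborted command and the buffer errors describe the same write, but it is an inference from the numbers, not something the log states.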
