On Tue, 1 Jul 2014 14:24:48 -0500
Pat Riehecky wrote:
> On 07/01/2014 01:29 PM, Andras Horvath wrote:
> > On Mon, 30 Jun 2014 16:23:45 -0400
> > Lamar Owen wrote:
> >
> >> On 06/30/2014 03:52 PM, Andras Horvath wrote:
> >>> Actually the drive has its own power supply, so it is not USB powered. I
> >>> cannot tell whether the drive spins down (I didn't think to check
> >>> it), but the CPU sits at 100% I/O wait all the time after this happens.
> >>> I was told the disk is a WD RED, but I'll check the power mode later
> >>> with hdparm.
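For the power-mode check, `hdparm -C /dev/sdX` reports the drive's current state (active/idle, standby, sleeping or unknown). Below is a minimal sketch that pulls the state out of that text output. The sample output format is an assumption, `drive_state` and `query_drive_state` are hypothetical helper names, and behind some USB bridges the query may fail or just return "unknown".

```python
import re
import subprocess
from typing import Optional


def drive_state(hdparm_output: str) -> Optional[str]:
    """Extract the state from `hdparm -C` output, e.g. 'active/idle' or 'standby'.

    Assumes a line like ' drive state is:  standby'; returns None if absent.
    """
    match = re.search(r"drive state is:\s*(\S+)", hdparm_output)
    return match.group(1) if match else None


def query_drive_state(device: str = "/dev/sda") -> Optional[str]:
    """Run `hdparm -C` on the device (needs root) and parse the result."""
    result = subprocess.run(["hdparm", "-C", device],
                            capture_output=True, text=True, check=False)
    return drive_state(result.stdout)


if __name__ == "__main__":
    sample = "\n/dev/sda:\n drive state is:  standby\n"
    print(drive_state(sample))  # standby
```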
> >> The only time I've personally run into the 100% I/O wait issue with EL6
> >> was when I was trying to RAID a Seagate 1.5TB internal SATA drive with a
> >> WD GREEN 1.5TB SATA drive. The system was basically unusable, with
> >> frequent and long forays into 100% iowait territory. Replacing the WD
> >> GREEN drive with another 1.5TB Seagate fixed that. It could be WD's
> >> TLER/non-TLER (Time-Limited Error Recovery) handling causing this. More
> >> info on this at http://www.wdc.com/en/library/other/2579-001098.pdf and
> >> googling 'WD TLER' yields a lot of hits.
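On drives that support SCT Error Recovery Control (the generic name for what WD calls TLER), smartmontools can read the setting with `smartctl -l scterc /dev/sdX` and set it with `smartctl -l scterc,READ,WRITE /dev/sdX`, where both timers are in units of 100 ms. Here is a small sketch that builds that command line from seconds. `scterc_command` is a hypothetical helper. Behind a USB bridge, smartctl may also need `-d sat`, which I've left out here.

```python
from typing import List


def scterc_command(device: str, read_s: float, write_s: float) -> List[str]:
    """Build a smartctl command that sets the SCT ERC read/write timeouts.

    smartctl takes the timers in tenths of a second, so 7.0 s becomes 70.
    """
    read_ds = round(read_s * 10)
    write_ds = round(write_s * 10)
    if read_ds < 0 or write_ds < 0:
        raise ValueError("timeouts must be non-negative")
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


if __name__ == "__main__":
    print(" ".join(scterc_command("/dev/sda", 7.0, 7.0)))
    # smartctl -l scterc,70,70 /dev/sda
```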
> >>
> >> Another possibility is that the idle timer is set up on the disk; I
> >> would think that it would hit you sooner, though, if it was that issue.
> >> I ran into that sort of issue with an eSATA Seagate a long time ago,
> >> where throughput was good but after a while it would error out. For
> >> some reason the standard Linux write caching and the timeout interacted
> >> badly. There's more about the WD RED and GREEN drives and this idle
> >> timer at
> >> http://forums.freenas.org/index.php?threads/hacking-wd-greens-and-reds-with-wdidle3-exe.18171/
> >> with some open source tool at http://idle3-tools.sourceforge.net/
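For reference, the idle3 timer that wdidle3 and idle3ctl (from idle3-tools) manipulate is a single raw byte. My understanding of the idle3-tools documentation is that values 1-128 count in 0.1 s steps, values 129-255 count in 30 s steps above 128, and 0 means the timer is disabled. Some firmware versions reportedly interpret the range differently, so treat the conversion below as an assumption. `idle3_seconds` is a hypothetical helper name.

```python
from typing import Optional


def idle3_seconds(raw: int) -> Optional[float]:
    """Convert a raw idle3 timer byte to seconds (None means disabled).

    Assumed encoding: 1..128 -> raw * 0.1 s, 129..255 -> (raw - 128) * 30 s.
    """
    if not 0 <= raw <= 255:
        raise ValueError("idle3 value must fit in one byte")
    if raw == 0:
        return None
    if raw <= 128:
        return raw / 10
    return float((raw - 128) * 30)


if __name__ == "__main__":
    for raw in (0, 80, 138):
        print(raw, idle3_seconds(raw))
```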
> > A note:
> >
> > hdparm -I /dev/sda | grep -i pow
> > * Power Management feature set
> > Power-Up In Standby feature set
> > * SET_FEATURES required to spinup after power up
> > * Host-initiated interface power management
> > Device-initiated interface power management
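In `hdparm -I` output, a leading `*` marks a feature that is enabled, and unmarked lines are supported but not enabled. In the listing above, that means Power Management, SET_FEATURES spin-up and host-initiated interface power management are on, while Power-Up In Standby and device-initiated interface power management are off. The sketch below splits such a listing into a feature-to-enabled map. `parse_hdparm_features` is a hypothetical name.

```python
from typing import Dict


def parse_hdparm_features(text: str) -> Dict[str, bool]:
    """Map each feature line of `hdparm -I` output to whether it is enabled.

    Lines starting with '*' are enabled; other non-empty lines are
    supported but not enabled.
    """
    features: Dict[str, bool] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("*"):
            features[stripped[1:].strip()] = True
        else:
            features[stripped] = False
    return features


if __name__ == "__main__":
    sample = """\
   *    Power Management feature set
        Power-Up In Standby feature set
   *    SET_FEATURES required to spinup after power up
   *    Host-initiated interface power management
        Device-initiated interface power management
"""
    for name, enabled in parse_hdparm_features(sample).items():
        print(f"{'on ' if enabled else 'off'}  {name}")
```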
> >
> > I cannot access the power levels through the USB interface. I'll check the eSATA connection tomorrow.
> >
> > I restarted the copy, and within a minute the CPU was stuck at 100% I/O wait again. The "iotop" output shows absolutely nothing, as if there were no load on the disks at all. Interrupts and context switches are around 20-50 (dstat output), so almost nothing. Disk I/O is zero. Load is at 5.01. The rsync processes that I'm using for the copy cannot be killed, not even with kill -9.
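Unkillable processes, a load average around 5 and no disk activity together point to tasks stuck in uninterruptible sleep (state `D`). Linux counts those tasks in the load average, and they don't act on signals, SIGKILL included, until the blocked I/O returns. Here is a sketch that lists such tasks by reading `/proc/<pid>/stat`. The helper names are hypothetical, and `ps -eo pid,stat,comm` shows the same information.

```python
import glob
from typing import List, Tuple


def parse_stat(stat_line: str) -> Tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def uninterruptible_tasks() -> List[Tuple[int, str]]:
    """List (pid, comm) of tasks currently in state 'D' (Linux only)."""
    tasks = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            tasks.append((pid, comm))
    return tasks


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```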
> >
> > Any idea? Thanks.
> >
> >
> > Andras
>
> Circling back around to the "is it spinning" question, for externals in
> a workable enclosure, I've found the "Jurassic Park" test to be rather
> trustworthy.[1]
>
> Does dmesg report anything interesting?
>
> Pat
>
>
> [1] https://www.youtube.com/watch?v=1koa2xAxCAw
This part of the dmesg output repeats forever:
sd 0:0:0:0: [sda] Unhandled error code
sd 0:0:0:0: [sda] Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 16 41 19 e0 00 00 f0 00
__ratelimit: 20 callbacks suppressed
Buffer I/O error on device sda1, logical block 46670396
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670397
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670398
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670399
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670400
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670401
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670402
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670403
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670404
lost page write due to I/O error on sda1
Buffer I/O error on device sda1, logical block 46670405
lost page write due to I/O error on sda1
usb 1-4: reset high speed USB device number 2 using ehci_hcd
usb 1-4: reset high speed USB device number 2 using ehci_hcd
usb 1-4: reset high speed USB device number 2 using ehci_hcd
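The CDB in that log can be decoded by hand. Opcode 0x2a is WRITE(10), bytes 2-5 hold the big-endian starting LBA, and bytes 7-8 hold the transfer length in logical blocks. So this is a write of 240 blocks starting at LBA 373365216. The "Buffer I/O error" lines count in filesystem blocks of sda1 instead. Suppose the disk uses 512-byte sectors and 4 KiB filesystem blocks, and the partition starts at sector 2048. All three are guesses; `fdisk -l /dev/sda` and, for ext2/3/4, `tune2fs -l /dev/sda1` would confirm them. Under those assumptions, block 46670396 maps to exactly that LBA, and 240 sectors is 30 filesystem blocks. That would fit the 10 errors shown plus the 20 suppressed by `__ratelimit`. The sketch below does both calculations. The function names are hypothetical.

```python
import re
from typing import List, Tuple


def decode_write10(cdb_hex: str) -> Tuple[int, int]:
    """Decode a SCSI WRITE(10) CDB into (starting LBA, number of blocks)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, length


def fs_block_to_lba(block: int, fs_block_size: int = 4096,
                    sector_size: int = 512, part_start: int = 2048) -> int:
    """Map a filesystem block on a partition to an absolute disk LBA.

    The defaults are assumptions about this particular disk, not known facts.
    """
    return part_start + block * (fs_block_size // sector_size)


def buffer_error_blocks(log: str) -> List[int]:
    """Collect N from 'Buffer I/O error ... logical block N' lines."""
    return [int(n) for n in re.findall(r"logical block (\d+)", log)]


if __name__ == "__main__":
    lba, count = decode_write10("2a 00 16 41 19 e0 00 00 f0 00")
    print(lba, count)                 # 373365216 240
    print(fs_block_to_lba(46670396))  # 373365216
```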
I'll have physical access to the disk only tomorrow. Will report back.
Andras