SCIENTIFIC-LINUX-USERS Archives

July 2014

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Andras Horvath <[log in to unmask]>
Reply To:
Andras Horvath <[log in to unmask]>
Date:
Tue, 1 Jul 2014 22:28:23 +0200
On Tue, 1 Jul 2014 13:19:10 -0700
"Patrick J. LoPresti" <[log in to unmask]> wrote:

> On Tue, Jul 1, 2014 at 11:29 AM, Andras Horvath <[log in to unmask]> wrote:
> >
> > I restarted the copy, and within a minute the CPU hung again at 100% I/O wait. The "iotop" output shows absolutely nothing, as if there were no load on the disks at all. Interrupts and context switches are around 20-50 each, so almost nothing (dstat output). Disk activity is zero. Load is at 5.01. The rsync processes that I'm using for the copy cannot be killed or force killed.
> 
> How much RAM does your system have? What does "/sbin/sysctl -a | grep
> vm.dirty" say?

3 GB.

# /sbin/sysctl -a | grep vm.dirty
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
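
For scale, the two ratio settings above can be turned into approximate byte thresholds. This is only a rough sketch: it assumes the percentages apply to the full 3 GB, whereas the kernel applies them to available (free plus reclaimable) memory, so the real thresholds are somewhat lower. The helper name ratio_to_bytes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a vm.dirty_*ratio percentage into bytes.
# Assumption: the percentage is taken of total RAM, which overstates the
# real threshold (the kernel uses available memory, not total).
ratio_to_bytes() {
    # $1 = memory size in bytes, $2 = ratio in percent
    echo $(( $1 * $2 / 100 ))
}

ram_bytes=$(( 3 * 1024 * 1024 * 1024 ))   # 3 GB, as reported above

echo "vm.dirty_background_ratio=10 -> $(ratio_to_bytes "$ram_bytes" 10) bytes"
echo "vm.dirty_ratio=20            -> $(ratio_to_bytes "$ram_bytes" 20) bytes"
```

Under that assumption, background writeback would start at about 307 MiB of dirty data and writers would be throttled at about 614 MiB.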

> On machines with lots of RAM, I have seen disk subsystems act a bit
> squirrelly when Linux decides to buffer (say) a few gigabytes of
> writes. The traditional "vm.dirty_background_ratio" and
> "vm.dirty_ratio" settings are a percentage of RAM, which gives
> ludicrous behavior on modern big-memory boxes.
> 
> Something like this is more sane:
> 
>   sysctl -w vm.dirty_background_bytes=67108864
>   sysctl -w vm.dirty_bytes=134217728
> 
> This is not terribly likely to help, but it is worth a shot. If the
> problem really is the disk spinning down, at least this will be more
> likely to keep it busy...
> 
>  - Pat
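
One way to check whether buffered writes are actually piling up is to look at the Dirty and Writeback counters in /proc/meminfo while the copy runs. The following is a minimal sketch, not something from the thread; the function name dirty_summary is hypothetical, and it reads from stdin so it can also be run against saved output.

```shell
#!/bin/sh
# Hypothetical helper: print only the Dirty and Writeback lines from
# /proc/meminfo-style input read on stdin.
dirty_summary() {
    # Exact match on the first field, so WritebackTmp is not included.
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }'
}

# Only attempt the live read where /proc/meminfo exists (Linux).
if [ -r /proc/meminfo ]; then
    dirty_summary < /proc/meminfo
fi
```

If Dirty stays near zero during the hang, a large backlog of buffered writes is probably not the cause.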

Tested it; unfortunately it didn't work.

Now my CPU is at 50% idle and 50% wait (dstat output). It has 2 cores, so that means 1 core is at 100% wait. Weird.
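
Processes that cannot be killed while I/O wait is high are often stuck in uninterruptible sleep (state D in ps). As a sketch, assuming the procps version of ps that ships with Linux, the following lists such processes together with the kernel function they are waiting in. The function name d_state_only is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep the header line plus any row whose STAT
# column (second field) starts with D, i.e. uninterruptible sleep.
d_state_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Assumption: procps ps. Other ps implementations may reject this format
# string, in which case nothing is printed.
if command -v ps >/dev/null 2>&1; then
    ps -eo pid,stat,wchan:32,comm 2>/dev/null | d_state_only
fi
```

If the rsync processes show up here, the WCHAN column may give a hint about where in the kernel they are blocked.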
