SCIENTIFIC-LINUX-USERS Archives

August 2010

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Steven Timm <[log in to unmask]>
Reply To:
Steven Timm <[log in to unmask]>
Date:
Mon, 30 Aug 2010 21:59:11 -0500
Content-Type:
TEXT/PLAIN
Parts/Attachments:
TEXT/PLAIN (92 lines)
Hi Doug -- I have seen the same message on some of our machines, but
so far it hasn't caused any real performance problems. It doesn't seem
to matter whether you are running SL5.5 specifically; it shows up as
long as you are running one of the latest errata kernels. We only saw
it on SL5.3, but with the latest errata kernel installed.
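
In case it helps while you are debugging, the warning itself is driven by the
kernel's hung-task check, which you can inspect and tune through /proc. This is
just a sketch of the knobs (silencing the message does not fix the underlying
NFS stall, it only stops the kernel from reporting tasks stuck in D state):

```shell
# Show the current threshold; tasks blocked in uninterruptible (D) sleep
# longer than this many seconds trigger the INFO message (120 by default).
cat /proc/sys/kernel/hung_task_timeout_secs

# Raise the threshold if the NFS stalls are long but transient, e.g.:
echo 300 > /proc/sys/kernel/hung_task_timeout_secs

# Or disable the check entirely, as the log message itself suggests:
echo 0 > /proc/sys/kernel/hung_task_timeout_secs
```

Run these as root; to make a change persistent across reboots you would set
kernel.hung_task_timeout_secs in /etc/sysctl.conf.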

Steve


On Mon, 30 Aug 2010, Doug Johnson wrote:

> Greetings,
>
> I am seeing the following message when an SL5.5 machine (with all of the
> most recent updates installed) is under load writing data to an NFS disk:
>
> NOTE: It occurs for other processes than kswapd0, so I don't think that
> has anything to do with the issue.
>
> Aug 30 18:25:21 se kernel: INFO: task kswapd0:220 blocked for more than 120 seconds.
> Aug 30 18:25:21 se kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 30 18:25:21 se kernel: kswapd0       D ffff810003336420     0   220     36           221   219 (L-TLB)
> Aug 30 18:25:21 se kernel:  ffff810003be19e0 0000000000000046 ffff810037c9c200 ffff8100ae3c4000
> Aug 30 18:25:21 se kernel:  0000000000000003 000000000000000a ffff810037f2a860 ffffffff80308b60
> Aug 30 18:25:21 se kernel:  00000a919f5c3fe1 00000000002d7d53 ffff810037f2aa48 00000000c770f5f8
> Aug 30 18:25:21 se kernel: Call Trace:
> Aug 30 18:25:21 se kernel:  [<ffffffff8006e1db>] do_gettimeofday+0x40/0x90
> Aug 30 18:25:21 se kernel:  [<ffffffff886646e5>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
> Aug 30 18:25:21 se kernel:  [<ffffffff800637ea>] io_schedule+0x3f/0x67
> Aug 30 18:25:21 se kernel:  [<ffffffff886646ee>] :nfs:nfs_wait_bit_uninterruptible+0x9/0xd
> Aug 30 18:25:21 se kernel:  [<ffffffff80063a16>] __wait_on_bit+0x40/0x6e
> Aug 30 18:25:21 se kernel:  [<ffffffff886646e5>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
> Aug 30 18:25:21 se kernel:  [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
> Aug 30 18:25:21 se kernel:  [<ffffffff800a0a06>] wake_bit_function+0x0/0x23
> Aug 30 18:25:21 se kernel:  [<ffffffff88668106>] :nfs:nfs_wait_on_requests_locked+0x70/0xca
> Aug 30 18:25:21 se kernel:  [<ffffffff88669146>] :nfs:nfs_sync_inode_wait+0x60/0x1db
> Aug 30 18:25:21 se kernel:  [<ffffffff8865f234>] :nfs:nfs_release_page+0x2c/0x4d
> Aug 30 18:25:21 se kernel:  [<ffffffff800caea8>] shrink_inactive_list+0x511/0x8d8
> Aug 30 18:25:21 se kernel:  [<ffffffff800ca39b>] isolate_lru_pages+0x98/0xbf
> Aug 30 18:25:21 se kernel:  [<ffffffff80047e98>] __pagevec_release+0x19/0x22
> Aug 30 18:25:21 se kernel:  [<ffffffff800ca876>] shrink_active_list+0x4b4/0x4c4
> Aug 30 18:25:21 se kernel:  [<ffffffff800130f5>] shrink_zone+0x127/0x18d
> Aug 30 18:25:21 se kernel:  [<ffffffff80057b94>] kswapd+0x323/0x46c
> Aug 30 18:25:21 se kernel:  [<ffffffff800a09d8>] autoremove_wake_function+0x0/0x2e
> Aug 30 18:25:21 se kernel:  [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
> Aug 30 18:25:21 se kernel:  [<ffffffff80057871>] kswapd+0x0/0x46c
> Aug 30 18:25:21 se kernel:  [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
> Aug 30 18:25:21 se kernel:  [<ffffffff8003287b>] kthread+0xfe/0x132
> Aug 30 18:25:21 se kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
> Aug 30 18:25:21 se kernel:  [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
> Aug 30 18:25:21 se kernel:  [<ffffffff8003277d>] kthread+0x0/0x132
> Aug 30 18:25:21 se kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
>
> I have seen this error with both an Intel Pro1000 and a Realtek Ethernet
> card.
>
> I am doing work with two other universities (on completely different
> hardware) and they have all seen this message. Prior to 5.5,
> this would result in the machine locking up. Now with 5.5 it appears
> that the load level on the machine slowly rises (I assume due to D wait
> state blocked processes), but the machine is somewhat responsive. Also
> once these messages occur, ps will hang and that session becomes
> unusable.
>
> I don't know what this means, but a similarly configured machine with
> identical hardware running SL4.7 does not produce these errors and the
> NFS throughput is pretty darn good.
>
> 	Any help or pointers in some direction will be appreciated,
> 	Thanks,
> 	doug
>
> ----------------------------------------------------------------------------
>   Doug Johnson                    email: [log in to unmask]
>   B390, Duane Physics             (303)-492-4506 Office
>   Boulder, CO 80309               (303)-492-5119 FAX
>                                   http://www.aaccchildren.org
>   Tully, baby. Look around. It's a cage with golden bars.
> ----------------------------------------------------------------------------
>

-- 
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
[log in to unmask]  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.
