SCIENTIFIC-LINUX-USERS Archives

August 2010

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From: Yannick Perret <[log in to unmask]>
Reply To: Yannick Perret <[log in to unmask]>
Date: Tue, 31 Aug 2010 12:41:16 +0200
Content-Type: text/plain
Parts/Attachments: text/plain (125 lines)

Steven Timm wrote:
> Hi Doug, I have seen the same message on some of our machines, but
> so far it hasn't caused any real performance problems.
> It's not so much whether you are running SL5.5 as whether you
> are running one of the latest errata kernels. We only
> saw it show up on SL5.3, but with the latest errata kernel.
>
Hello,

here at CC-IN2P3 we have run into this problem. It appeared 3 or 4
releases back and has not been fixed in the latest updates.
The problem seems to come from mainline kernel code (probably backported
along with other features), as I have seen similar bug reports on the
mainline kernel lists.

We also saw the "task blocked" messages, and/or a steadily increasing
system load, and deeply locked tasks that cause 'ps' and similar
commands to hang.
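
To track how often the "task blocked" messages appear, here is a minimal Python 3 sketch that extracts the task name, PID, and timeout from syslog lines in the format of the kernel log quoted below ("INFO: task kswapd0:220 blocked for more than 120 seconds."). The names HUNG_TASK_RE, HungTask, and find_hung_tasks are hypothetical, and the sketch assumes each message sits on a single line in the log file (the archive wrapped the quoted lines).

```python
import re
from typing import Iterable, List, NamedTuple

# Matches the hung-task warning printed by the kernel, e.g.
#   Aug 30 18:25:21 se kernel: INFO: task kswapd0:220 blocked for more than 120 seconds.
# (hypothetical helper; assumes one message per line, as in /var/log/messages)
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


class HungTask(NamedTuple):
    comm: str     # task (command) name, e.g. "kswapd0"
    pid: int      # process ID reported in the warning
    seconds: int  # the hung_task_timeout_secs value that was exceeded


def find_hung_tasks(lines: Iterable[str]) -> List[HungTask]:
    """Return one HungTask for every hung-task warning in the given log lines."""
    found = []
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            found.append(
                HungTask(match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return found
```

For example, `find_hung_tasks(open("/var/log/messages"))` would list every warning in that file; grouping the results by `comm` shows whether kernel threads such as kswapd0 or ordinary user processes are the ones getting blocked.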

The only way we found to reduce the problem was to tune kernel
parameters (in particular those related to the VM subsystem), but this
only reduced the problem; it did not eliminate it.
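
The message does not say which VM parameters were tuned. As an illustration only, assuming the relevant knobs are vm.dirty_background_ratio and vm.dirty_ratio (percentages that control when background writeback starts and when writing processes are throttled), this Python 3 sketch estimates the byte thresholds they imply. The function name dirty_thresholds is hypothetical, and the model is simplified: the kernel applies the ratios to "dirtyable" memory (roughly free memory plus reclaimable page cache), not to total RAM.

```python
from typing import Tuple


def dirty_thresholds(dirtyable_bytes: int, background_ratio: int, dirty_ratio: int) -> Tuple[int, int]:
    """Estimate (background_writeback_bytes, throttle_bytes).

    background_ratio stands in for vm.dirty_background_ratio: once this much
    dirty page cache accumulates, background writeback starts.
    dirty_ratio stands in for vm.dirty_ratio: above this, processes that
    write are throttled and made to write back data themselves.
    This is an approximation; the real kernel calculation has more inputs.
    """
    if not (0 <= background_ratio <= 100 and 0 <= dirty_ratio <= 100):
        raise ValueError("ratios are percentages in the range 0..100")
    background = dirtyable_bytes * background_ratio // 100
    throttle = dirtyable_bytes * dirty_ratio // 100
    return background, throttle
```

With 16 GiB of dirtyable memory and ratios of 10 and 40, `dirty_thresholds(16 * 2**30, 10, 40)` returns about 1.6 GiB and 6.4 GiB, so several gigabytes of dirty NFS data could pile up before writers are throttled. Lowering the ratios shrinks that backlog, which plausibly shortens waits like the nfs_sync_inode_wait call in the quoted trace; that connection is a hypothesis, not something measured in this thread.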

--
Y.

>
>
> On Mon, 30 Aug 2010, Doug Johnson wrote:
>
>> Greetings,
>>
>> I am seeing the following message when an SL5.5 machine (with all of
>> the most recent updates installed) is under load writing data to an
>> NFS disk:
>>
>> NOTE: It also occurs for processes other than kswapd0, so I don't think
>> kswapd0 itself has anything to do with the issue.
>>
>> Aug 30 18:25:21 se kernel: INFO: task kswapd0:220 blocked for more than 120 seconds.
>> Aug 30 18:25:21 se kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Aug 30 18:25:21 se kernel: kswapd0       D ffff810003336420     0   220     36           221   219 (L-TLB)
>> Aug 30 18:25:21 se kernel:  ffff810003be19e0 0000000000000046 ffff810037c9c200 ffff8100ae3c4000
>> Aug 30 18:25:21 se kernel:  0000000000000003 000000000000000a ffff810037f2a860 ffffffff80308b60
>> Aug 30 18:25:21 se kernel:  00000a919f5c3fe1 00000000002d7d53 ffff810037f2aa48 00000000c770f5f8
>> Aug 30 18:25:21 se kernel: Call Trace:
>> Aug 30 18:25:21 se kernel:  [<ffffffff8006e1db>] do_gettimeofday+0x40/0x90
>> Aug 30 18:25:21 se kernel:  [<ffffffff886646e5>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
>> Aug 30 18:25:21 se kernel:  [<ffffffff800637ea>] io_schedule+0x3f/0x67
>> Aug 30 18:25:21 se kernel:  [<ffffffff886646ee>] :nfs:nfs_wait_bit_uninterruptible+0x9/0xd
>> Aug 30 18:25:21 se kernel:  [<ffffffff80063a16>] __wait_on_bit+0x40/0x6e
>> Aug 30 18:25:21 se kernel:  [<ffffffff886646e5>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
>> Aug 30 18:25:21 se kernel:  [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
>> Aug 30 18:25:21 se kernel:  [<ffffffff800a0a06>] wake_bit_function+0x0/0x23
>> Aug 30 18:25:21 se kernel:  [<ffffffff88668106>] :nfs:nfs_wait_on_requests_locked+0x70/0xca
>> Aug 30 18:25:21 se kernel:  [<ffffffff88669146>] :nfs:nfs_sync_inode_wait+0x60/0x1db
>> Aug 30 18:25:21 se kernel:  [<ffffffff8865f234>] :nfs:nfs_release_page+0x2c/0x4d
>> Aug 30 18:25:21 se kernel:  [<ffffffff800caea8>] shrink_inactive_list+0x511/0x8d8
>> Aug 30 18:25:21 se kernel:  [<ffffffff800ca39b>] isolate_lru_pages+0x98/0xbf
>> Aug 30 18:25:21 se kernel:  [<ffffffff80047e98>] __pagevec_release+0x19/0x22
>> Aug 30 18:25:21 se kernel:  [<ffffffff800ca876>] shrink_active_list+0x4b4/0x4c4
>> Aug 30 18:25:21 se kernel:  [<ffffffff800130f5>] shrink_zone+0x127/0x18d
>> Aug 30 18:25:21 se kernel:  [<ffffffff80057b94>] kswapd+0x323/0x46c
>> Aug 30 18:25:21 se kernel:  [<ffffffff800a09d8>] autoremove_wake_function+0x0/0x2e
>> Aug 30 18:25:21 se kernel:  [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
>> Aug 30 18:25:21 se kernel:  [<ffffffff80057871>] kswapd+0x0/0x46c
>> Aug 30 18:25:21 se kernel:  [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
>> Aug 30 18:25:21 se kernel:  [<ffffffff8003287b>] kthread+0xfe/0x132
>> Aug 30 18:25:21 se kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
>> Aug 30 18:25:21 se kernel:  [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
>> Aug 30 18:25:21 se kernel:  [<ffffffff8003277d>] kthread+0x0/0x132
>> Aug 30 18:25:21 se kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
>>
>> I have seen this error with both an Intel Pro1000 and a Realtek Ethernet
>> card.
>>
>> I am working with two other universities (on completely different
>> hardware), and they have all seen this message. Prior to 5.5, this
>> would result in the machine locking up. Now with 5.5 the load on the
>> machine slowly rises (I assume because of processes blocked in the D,
>> uninterruptible sleep, state), but the machine remains somewhat
>> responsive. Also, once these messages occur, ps hangs and that session
>> becomes unusable.
>>
>> I don't know what this means, but a similarly configured machine with
>> identical hardware running SL4.7 does not produce these errors, and its
>> NFS throughput is pretty darn good.
>>
>>     Any help or pointers in some direction will be appreciated,
>>     Thanks,
>>     doug
>>
>> ---------------------------------------------------------------------------- 
>>
>>   Doug Johnson                    email: [log in to unmask]
>>   B390, Duane Physics             (303)-492-4506 Office
>>   Boulder, CO 80309               (303)-492-5119 FAX
>>                                   http://www.aaccchildren.org
>>   Tully, baby. Look around. It's a cage with golden bars.
>> ---------------------------------------------------------------------------- 
>>
>>
>
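
Doug's report notes that ps hangs once the messages start. As a hedged alternative for spotting processes stuck in the D (uninterruptible sleep) state, this Python 3 sketch reads only /proc/<pid>/stat for each process instead of calling ps. The helper names parse_stat and d_state_tasks are hypothetical, and it is an assumption, not something anyone in this thread tested, that this avoids the hang on an affected kernel.

```python
import os
from typing import List, Tuple


def parse_stat(stat_line: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the *last* ')' rather than on whitespace.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren].strip())
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]  # single-letter state, e.g. "D"
    return pid, comm, state


def d_state_tasks(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return (pid, comm) for every process currently in state 'D'."""
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            # The process exited between listdir() and open(), or the line was malformed.
            continue
        if state == "D":
            tasks.append((pid, comm))
    return tasks
```

Calling `d_state_tasks()` periodically on a loaded machine and comparing the count with the rising load average would show whether the load really comes from D-state processes, as Doug assumes; Linux counts uninterruptible tasks in the load average.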
