SCIENTIFIC-LINUX-USERS Archives

April 2012

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From: Stuart Anderson <[log in to unmask]>
Reply To: Stuart Anderson <[log in to unmask]>
Date: Sun, 1 Apr 2012 22:45:28 -0500
Content-Type: text/plain
Parts/Attachments: text/plain (67 lines)

A compiled Matlab process has been observed stuck in the D state
(uninterruptible sleep), waiting in sync_page:

[root@ldas-pcdev1 ~]# cat /proc/15478/wchan
sync_page
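
The check above can be generalized to every process on the host. The following is a minimal sketch, assuming the standard Linux /proc layout (`/proc/<pid>/stat` and `/proc/<pid>/wchan`); the function name `find_dstate_processes` and its `proc_root` parameter are hypothetical and only serve the illustration.

```python
import os


def find_dstate_processes(proc_root="/proc"):
    """Return sorted (pid, command, wchan) tuples for processes in state D."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
            # The command name is wrapped in parentheses and may itself
            # contain spaces or parentheses, so split on the last ')'.
            comm = stat[stat.index("(") + 1:stat.rindex(")")]
            state = stat[stat.rindex(")") + 1:].split()[0]
        except (OSError, ValueError, IndexError):
            continue  # process exited while scanning, or stat is unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"  # wchan may be unreadable without sufficient privileges
        results.append((int(entry), comm, wchan))
    return sorted(results)


if __name__ == "__main__":
    for pid, comm, wchan in find_dstate_processes():
        print(f"{pid}\t{comm}\t{wchan}")
```

Run as root, this prints one line per D-state process, which makes it easier to see whether several processes are blocked in the same kernel function.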

This is on an SL6.1 system running:
[root@ldas-pcdev1 ~]# uname -a
Linux ldas-pcdev1 2.6.32-220.7.1.el6.x86_64 #1 SMP Tue Mar 6 15:45:33 CST 2012 x86_64 x86_64 x86_64 GNU/Linux


At this point the process is unkillable, although there are no problems
accessing any of the 150 files the process currently has open (all mounted
over NFS v3).

This has happened a few times over the last few months with different 2.6.32
kernels from SL6.1, and each time a reboot is required to clear the process
table. It also seems very similar to the following RHEL report, and any help
in fixing this problem would be appreciated:
https://access.redhat.com/knowledge/solutions/60557


Here is the kernel stack trace of the most recently hung process:

[root@ldas-pcdev1 ~]# cat /proc/15478/stack
[<ffffffff81110afd>] sync_page+0x3d/0x50
[<ffffffff81110a97>] __lock_page+0x67/0x70
[<ffffffff81128a4c>] truncate_inode_pages_range+0x44c/0x460
[<ffffffff81128a75>] truncate_inode_pages+0x15/0x20
[<ffffffff8119169e>] generic_delete_inode+0x18e/0x1d0
[<ffffffff81191745>] generic_drop_inode+0x65/0x80
[<ffffffff811905c2>] iput+0x62/0x70
[<ffffffffa03be1ce>] nfs_dentry_iput+0x3e/0x60 [nfs]
[<ffffffff8118d10c>] dentry_iput+0x7c/0x100
[<ffffffff8118d281>] d_kill+0x31/0x60
[<ffffffff8118ecac>] dput+0x7c/0x150
[<ffffffff811836fa>] path_put+0x1a/0x40
[<ffffffffa03c5552>] __put_nfs_open_context+0xc2/0xf0 [nfs]
[<ffffffffa03c5630>] put_nfs_open_context+0x10/0x20 [nfs]
[<ffffffffa03cebfc>] nfs_clear_request+0x5c/0x80 [nfs]
[<ffffffffa03cec3a>] nfs_free_request+0x1a/0x30 [nfs]
[<ffffffff8126e257>] kref_put+0x37/0x70
[<ffffffffa03ceb99>] nfs_release_request+0x19/0x20 [nfs]
[<ffffffffa03d32d3>] nfs_find_and_lock_request+0xb3/0xd0 [nfs]
[<ffffffffa03d33a1>] nfs_migrate_page+0x41/0xf0 [nfs]
[<ffffffff81163e1a>] move_to_new_page+0xaa/0x1a0
[<ffffffff8116434f>] migrate_pages+0x43f/0x4b0
[<ffffffff81159e84>] compact_zone+0x4f4/0x770
[<ffffffff8115a3a1>] compact_zone_order+0xa1/0xe0
[<ffffffff8115a4fc>] try_to_compact_pages+0x11c/0x190
[<ffffffff81123d35>] __alloc_pages_nodemask+0x5f5/0x940
[<ffffffff81158c2a>] alloc_pages_vma+0x9a/0x150
[<ffffffff81171b65>] do_huge_pmd_anonymous_page+0x145/0x370
[<ffffffff8113c4da>] handle_mm_fault+0x25a/0x2b0
[<ffffffff81042b79>] __do_page_fault+0x139/0x480
[<ffffffff814f253e>] do_page_fault+0x3e/0xa0
[<ffffffff814ef8f5>] page_fault+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
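
Read from the bottom up, the trace shows a page fault allocating a transparent huge page, which enters memory compaction, which migrates an NFS page, which in turn ends up truncating inode pages and waiting on a page lock. To compare later hangs against this one, a trace can be reduced to its symbol names. This is a minimal sketch, assuming the `/proc/<pid>/stack` line format shown above; `parse_stack`, `matches_reported_path`, and `SUSPECT_PATH` are hypothetical names, and a match is only a heuristic for "same call path", not a diagnosis.

```python
import re

# One frame looks like: [<ffffffffa03d33a1>] nfs_migrate_page+0x41/0xf0 [nfs]
FRAME_RE = re.compile(
    r"^\[<[0-9a-f]+>\]\s+(?P<symbol>[^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>[^\]]+)\])?\s*$"
)

# Innermost frame first, matching the order printed by /proc/<pid>/stack.
SUSPECT_PATH = [
    "__lock_page",
    "truncate_inode_pages_range",
    "nfs_migrate_page",
    "migrate_pages",
    "compact_zone",
    "do_huge_pmd_anonymous_page",
]


def parse_stack(text):
    """Return a list of (symbol, module_or_None) tuples, innermost first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:  # lines without symbol+offset (the end marker) are skipped
            frames.append((match.group("symbol"), match.group("module")))
    return frames


def matches_reported_path(text, path=SUSPECT_PATH):
    """True if every symbol in `path` appears in the trace, in that order."""
    symbols = iter(symbol for symbol, _ in parse_stack(text))
    # `name in symbols` consumes the iterator, which enforces the ordering.
    return all(name in symbols for name in path)
```

For example, `matches_reported_path(open("/proc/15478/stack").read())` would report whether a hung process went through the same compaction and NFS page-migration path as the trace above.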

This is running on a 32-core Opteron Sun Fire X4600 M2 system with 128 GB
of memory, and the NFS files being accessed were mounted with:

[root@ldas-pcdev1 ~]# mount | grep omega
thumper1:/home/omega on /mnt/zfs1/omega type nfs (rw,noacl,timeo=150,retrans=5,vers=3,sloppy,addr=10.14.0.22)
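
When comparing affected and unaffected hosts, it can help to turn the mount output into structured data. This is a minimal sketch, assuming the `source on target type fstype (options)` format printed by `mount` above; `parse_mount_line` is a hypothetical helper name.

```python
import re

MOUNT_RE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) "
    r"\((?P<opts>[^)]*)\)$"
)


def parse_mount_line(line):
    """Parse one line of `mount` output into a dict with an options dict."""
    # Collapse whitespace so that a line wrapped by a mail client still parses.
    match = MOUNT_RE.match(" ".join(line.split()))
    if match is None:
        raise ValueError(f"unrecognized mount line: {line!r}")
    options = {}
    for item in match.group("opts").split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Flags such as "rw" carry no value; store them as True.
        options[key] = value if sep else True
    return {
        "source": match.group("source"),
        "target": match.group("target"),
        "fstype": match.group("fstype"),
        "options": options,
    }


if __name__ == "__main__":
    info = parse_mount_line(
        "thumper1:/home/omega on /mnt/zfs1/omega type nfs "
        "(rw,noacl,timeo=150,retrans=5,vers=3,sloppy,addr=10.14.0.22)"
    )
    # timeo is expressed in tenths of a second, so 150 means 15 seconds.
    print(info["options"]["vers"], int(info["options"]["timeo"]) / 10)
```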
