Date: Sun, 1 Apr 2012 22:45:28 -0500
A compiled MATLAB process has been observed stuck in the D state (uninterruptible sleep) in sync_page:
[root@ldas-pcdev1 ~]# cat /proc/15478/wchan
sync_page
This is on an SL6.1 system running:
[root@ldas-pcdev1 ~]# uname -a
Linux ldas-pcdev1 2.6.32-220.7.1.el6.x86_64 #1 SMP Tue Mar 6 15:45:33 CST 2012 x86_64 x86_64 x86_64 GNU/Linux
At this point the process cannot be killed, even though all 150 files it currently has open (all on NFSv3 mounts) remain accessible without any problems.
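The wchan check above was done for a single PID. As a minimal sketch, assuming procps `ps` and a kernel that exposes /proc/<pid>/stack (as this 2.6.32 kernel does), the following does the same for every process currently in the D state, so all hung processes can be captured in one pass before rebooting. The function names list_dstate and dump_dstate_stacks are hypothetical, and reading the stacks normally requires root.

```shell
#!/bin/sh
# Hypothetical helper: list processes in uninterruptible sleep (state D)
# with their wait channel (wchan). Skips the ps header line.
list_dstate() {
    ps -eo pid,stat,wchan:32,comm | awk 'NR > 1 && $2 ~ /^D/'
}

# Hypothetical helper: for each D-state process, print a summary line and
# try to dump its kernel stack (/proc/<pid>/stack is normally root-only).
dump_dstate_stacks() {
    list_dstate | while read -r pid stat wchan comm; do
        echo "== pid $pid ($comm) state $stat wchan $wchan"
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack unreadable; run as root)"
    done
}

dump_dstate_stacks
```

On a healthy machine this prints nothing; on a machine like the one described it should show the sync_page wait channel for the stuck process.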
This has happened several times over the last few months with different SL6.1 2.6.32 kernels, and each time a reboot was required to clear the hung process. It appears to be very similar to the following RHEL report; any help fixing this problem would be appreciated:
https://access.redhat.com/knowledge/solutions/60557
Here is the kernel stack trace of the most recent hung process:
[root@ldas-pcdev1 ~]# cat /proc/15478/stack
[<ffffffff81110afd>] sync_page+0x3d/0x50
[<ffffffff81110a97>] __lock_page+0x67/0x70
[<ffffffff81128a4c>] truncate_inode_pages_range+0x44c/0x460
[<ffffffff81128a75>] truncate_inode_pages+0x15/0x20
[<ffffffff8119169e>] generic_delete_inode+0x18e/0x1d0
[<ffffffff81191745>] generic_drop_inode+0x65/0x80
[<ffffffff811905c2>] iput+0x62/0x70
[<ffffffffa03be1ce>] nfs_dentry_iput+0x3e/0x60 [nfs]
[<ffffffff8118d10c>] dentry_iput+0x7c/0x100
[<ffffffff8118d281>] d_kill+0x31/0x60
[<ffffffff8118ecac>] dput+0x7c/0x150
[<ffffffff811836fa>] path_put+0x1a/0x40
[<ffffffffa03c5552>] __put_nfs_open_context+0xc2/0xf0 [nfs]
[<ffffffffa03c5630>] put_nfs_open_context+0x10/0x20 [nfs]
[<ffffffffa03cebfc>] nfs_clear_request+0x5c/0x80 [nfs]
[<ffffffffa03cec3a>] nfs_free_request+0x1a/0x30 [nfs]
[<ffffffff8126e257>] kref_put+0x37/0x70
[<ffffffffa03ceb99>] nfs_release_request+0x19/0x20 [nfs]
[<ffffffffa03d32d3>] nfs_find_and_lock_request+0xb3/0xd0 [nfs]
[<ffffffffa03d33a1>] nfs_migrate_page+0x41/0xf0 [nfs]
[<ffffffff81163e1a>] move_to_new_page+0xaa/0x1a0
[<ffffffff8116434f>] migrate_pages+0x43f/0x4b0
[<ffffffff81159e84>] compact_zone+0x4f4/0x770
[<ffffffff8115a3a1>] compact_zone_order+0xa1/0xe0
[<ffffffff8115a4fc>] try_to_compact_pages+0x11c/0x190
[<ffffffff81123d35>] __alloc_pages_nodemask+0x5f5/0x940
[<ffffffff81158c2a>] alloc_pages_vma+0x9a/0x150
[<ffffffff81171b65>] do_huge_pmd_anonymous_page+0x145/0x370
[<ffffffff8113c4da>] handle_mm_fault+0x25a/0x2b0
[<ffffffff81042b79>] __do_page_fault+0x139/0x480
[<ffffffff814f253e>] do_page_fault+0x3e/0xa0
[<ffffffff814ef8f5>] page_fault+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
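Read from the bottom up, the trace shows a transparent hugepage (THP) fault (do_huge_pmd_anonymous_page) triggering memory compaction (try_to_compact_pages, compact_zone, migrate_pages), which then tried to migrate an NFS page (nfs_migrate_page) and blocked in __lock_page/sync_page. Because THP and its defrag setting are on that path, it may be useful to record them alongside the report. A minimal sketch, assuming the RHEL 6 / SL6 sysfs location redhat_transparent_hugepage and falling back to the upstream transparent_hugepage directory; the function name show_thp is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the THP "enabled" and "defrag" settings from
# whichever sysfs directory this kernel provides (RHEL 6 uses the
# redhat_transparent_hugepage name; upstream uses transparent_hugepage).
show_thp() {
    for dir in /sys/kernel/mm/redhat_transparent_hugepage \
               /sys/kernel/mm/transparent_hugepage; do
        [ -d "$dir" ] || continue
        for knob in enabled defrag; do
            if [ -r "$dir/$knob" ]; then
                printf '%s/%s: %s\n' "$dir" "$knob" "$(cat "$dir/$knob")"
            fi
        done
    done
    return 0
}

show_thp
```

Each setting is printed with the active value in brackets, e.g. `[always] madvise never`. Whether changing these settings avoids this particular hang is not established by this report.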
This is running on a 32-core Opteron Sun Fire X4600 M2 system with 128 GB of memory, and the NFS files being accessed were mounted with:
[root@ldas-pcdev1 ~]# mount | grep omega
thumper1:/home/omega on /mnt/zfs1/omega type nfs (rw,noacl,timeo=150,retrans=5,vers=3,sloppy,addr=10.14.0.22)
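On SL6, `mount` with no arguments reads /etc/mtab, which largely records the options supplied at mount time, while /proc/mounts shows the options the kernel actually applied (including defaults such as rsize/wsize and hard/soft). A minimal sketch for listing the effective NFS options; the function name nfs_mount_options and its optional file argument are hypothetical, added so the parsing can be tried on a sample file.

```shell
#!/bin/sh
# Hypothetical helper: print "<export> on <mountpoint>: <options>" for every
# NFS mount. Reads /proc/mounts unless another file is given as $1.
# /proc/mounts fields: device, mountpoint, fstype, options, dump, pass.
nfs_mount_options() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $1 " on " $2 ": " $4 }' "${1:-/proc/mounts}"
}

nfs_mount_options
```
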