SCIENTIFIC-LINUX-USERS Archives

August 2011

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Kelsey Cummings <[log in to unmask]>
Reply To:
Kelsey Cummings <[log in to unmask]>
Date:
Fri, 19 Aug 2011 14:01:00 -0700
I'm taking a total stab in the dark in the hope that someone else has seen
similar issues before and can save me a bunch of time.

I have a few SL6 web servers running MediaWiki off a shared NetApp NFS
export.  One of the hosts has been exhibiting periodic delays while
loading pages.  I finally tracked this down to lock contention on the
session files.  The NetApp shows the lock requests stacking up, but
strace shows some pretty suspect numbers.

======== NLM host wiki.a.foo
39583 0x00030c66:0x56082409 0:0 1 GWAITING (0x43316bd8)
39581 0x00030c66:0x56082409 0:0 1 GWAITING (0x245aa408)
39579 0x00030c66:0x56082409 0:0 1 GWAITING (0x64ea6ef8)
39578 0x00030c66:0x56082409 0:0 1 GRANTED (0x7e719a48)

...
http-trace.30965:13:27:01.709708 flock(11, LOCK_EX) = 0 <0.000185>
http-trace.30970:13:27:01.732142 flock(11, LOCK_EX) = 0 <0.016265>
http-trace.30963:13:27:01.747946 flock(11, LOCK_EX) = 0 <30.041287>
http-trace.30962:13:27:01.754564 flock(11, LOCK_EX) = 0 <60.085116>
http-trace.30961:13:28:17.877626 flock(11, LOCK_EX) = 0 <60.040963>
http-trace.30963:13:28:17.872813 flock(11, LOCK_EX) = 0 <0.044700>
http-trace.30962:13:28:17.873601 flock(11, LOCK_EX) = 0 <90.047198>
http-trace.30967:13:28:17.708213 flock(11, LOCK_EX) = 0 <0.000467>
http-trace.30967:13:28:17.849123 flock(11, LOCK_EX) = 0 <0.000251>
http-trace.30968:13:28:17.863273 flock(11, LOCK_EX) = 0 <30.056047>
...
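
One way to summarise timings like the ones above is to pull the elapsed
time out of each line and check how close it sits to a whole number of
30-second periods. The following is only a sketch: the function names are
made up for illustration, and the 30-second period and 1-second tolerance
are assumptions read off the pasted numbers, not values taken from any
NFS or lockd documentation.

```python
import re

# Matches lines shaped like the strace excerpt above, e.g.
# "... flock(11, LOCK_EX) = 0 <30.041287>"
FLOCK_RE = re.compile(r"flock\(\d+, LOCK_EX\)\s*=\s*-?\d+\s*<(\d+\.\d+)>")


def flock_durations(lines):
    """Return the elapsed seconds of every flock(LOCK_EX) call found in lines."""
    durations = []
    for line in lines:
        match = FLOCK_RE.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


def periods_waited(duration, period=30.0, tolerance=1.0):
    """Classify one duration.

    Returns 0 for a call that completed within `tolerance` seconds, n >= 1
    for a call that took roughly n * period seconds, and None otherwise.
    Both `period` and `tolerance` are assumed values (see the note above).
    """
    n = round(duration / period)
    if n >= 1 and abs(duration - n * period) <= tolerance:
        return n
    if duration < tolerance:
        return 0
    return None
```

Applied to the excerpt above, every call falls into one of two groups:
effectively instant, or within a fraction of a second of 30, 60 or 90
seconds.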

So, while some lock requests get through just fine, others hang for no
apparent reason.  As best I can tell, the locks are all being released
promptly, so this looks more like an issue where, if one LOCK_EX is held
when another LOCK_EX is queued, the queued request isn't actually granted
until some timer expires and the request is tried again.
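
To check whether the delay reproduces outside of the web server, a small
timing helper along these lines could be run from two shells on the
affected host against a scratch file on the same NFS export. This is a
minimal sketch: `timed_flock` and its `hold_seconds` parameter are
hypothetical names, and the path to test is whatever file you choose to
pass in. Note that on Linux, flock() on an NFS mount is emulated with
whole-file byte-range locks, so these requests go through the same NLM
path as fcntl() locks.

```python
import fcntl
import os
import time


def timed_flock(path, hold_seconds=0.0):
    """Take an exclusive flock on `path`, hold it briefly, then release it.

    Returns the number of seconds spent waiting for the lock to be granted.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        start = time.monotonic()
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until the lock is granted
        waited = time.monotonic() - start
        try:
            time.sleep(hold_seconds)     # simulate work done under the lock
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
        return waited
    finally:
        os.close(fd)
```

If one process holds the lock for a second or two while another calls
`timed_flock` on the same file, a wait far longer than the hold time
would be consistent with the retry-on-timer behaviour described above.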

Any ideas where to look?

-- 
Kelsey Cummings - [log in to unmask]      sonic.net, inc.
System Architect                          2260 Apollo Way
707.522.1000                              Santa Rosa, CA 95407
