SCIENTIFIC-LINUX-USERS Archives

December 2012

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

From: Joseph Areeda <[log in to unmask]>
Reply To: Joseph Areeda <[log in to unmask]>
Date: Sat, 1 Dec 2012 06:18:43 -0800

Hi David,

I am certainly no expert, but this looks to me like the classic symptoms 
of NFS when the server gets overloaded, or a disk or the network gets 
flaky.

If it were me, I'd try to get the class to do more local i/o (if 
possible).  Perhaps a scratch area on the local disk would solve the 
problem.

I think you could reproduce the problem by writing a test script that 
does heavy i/o to the network folders, then running it on more and more 
machines and watching the i/o throughput approach zero while the machines 
hang waiting for NFS.
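
A minimal sketch of such a test script, in Python. The function name, file size, and block size are made up for illustration, and the directory you pass in is assumed to be one of the NFS-mounted folders; nothing here comes from David's actual setup.

```python
import os
import tempfile
import time


def write_throughput(directory, total_mb=64, block_kb=1024):
    """Write about total_mb of zeros into `directory` and return MB/s.

    fsync() pushes the data out to storage before the clock stops, so on
    an NFS mount the figure reflects the server and network rather than
    the local page cache.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = max(1, (total_mb * 1024) // block_kb)
    written_mb = blocks * block_kb / 1024.0
    fd, path = tempfile.mkstemp(prefix="nfs-io-test-", dir=directory)
    try:
        start = time.time()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                handle.write(block)
            handle.flush()
            os.fsync(handle.fileno())
        elapsed = time.time() - start
    finally:
        # Always clean up the scratch file, even if the write failed.
        os.remove(path)
    return written_mb / max(elapsed, 1e-9)


# Small demonstration against the local temp directory; point it at an
# NFS-mounted directory to exercise the server instead.
print("%.1f MB/s" % write_throughput(tempfile.gettempdir(), total_mb=8))
```

Running this in a loop on one machine, then on several at once, should show whether throughput per client collapses as the number of clients grows.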

Again, I'm no expert, so feel free to ignore me.

Joe

On 11/29/2012 10:49 AM, David Fitzgerald wrote:
> Last night during class time I had a chance to check some of the machines with the frozen displays, and I am not sure what to make of what I found.  Running 'lsof -p $PID' (with PID being 5044) on one of the affected machines gave this, which doesn't tell me much:
>
> 10.10.10 5044 root  cwd       DIR    8,7     4096    2 /
> 10.10.10 5044 root  rtd       DIR    8,7     4096    2 /
> 10.10.10 5044 root  txt   unknown                      /proc/5044/exe
>
>
> I also ran pstree and I will put that output below, but I think I may be barking up the wrong tree.  While some of my clients were freezing up, I saw that my NFS server was showing very high load in 'top'.  Fortunately I have sysstat running on the server, and after class 'sar -u' showed that %iowait went from less than 1 before class to a high of 53 after class began, and stayed high until class ended.  Here is the relevant 'chunk' of the 'sar -u' output:
>
>                 CPU     %user     %nice   %system   %iowait    %steal     %idle
> 05:20:01 PM     all      0.03      0.00      0.07      0.17      0.00     99.73
> 05:30:01 PM     all      0.03      0.00      0.03      0.11      0.00     99.83
> 05:40:01 PM     all      0.18      0.00      0.50      1.88      0.00     97.44
> 05:50:01 PM     all      0.16      0.00      1.12      6.93      0.00     91.78
> 06:00:01 PM     all      0.73      0.00      5.23     32.61      0.00     61.43
> 06:10:01 PM     all      0.77      0.00      6.55     53.67      0.00     39.01
> 06:20:01 PM     all      0.13      0.00      4.81     27.81      0.00     67.25
> 06:30:01 PM     all      0.13      0.00      6.69     21.71      0.00     71.47
> 06:40:01 PM     all      0.11      0.00      3.47     33.34      0.00     63.08
> 06:50:01 PM     all      0.11      0.00      3.20     31.02      0.00     65.67
> 07:00:01 PM     all      0.24      0.00      3.93     30.79      0.00     65.05
> 07:10:01 PM     all      0.16      0.00      3.63     20.51      0.00     75.71
> 07:20:01 PM     all      0.18      0.00      5.23      1.45      0.00     93.13
> 07:30:01 PM     all      0.10      0.00      5.72      0.70      0.00     93.48
> Average:        all      0.06      0.01      0.46      2.13      0.00     97.34
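
One way to pull the peak out of output like this, sketched in Python. It assumes the 12-hour `sar -u` layout used in the post (timestamp, AM/PM, CPU, then %user, %nice, %system, %iowait, %steal, %idle); the function name parse_sar_cpu and the 20% threshold are invented for illustration.

```python
def parse_sar_cpu(text):
    """Return (timestamp, %iowait) pairs from 12-hour `sar -u` data rows."""
    samples = []
    for line in text.splitlines():
        # Drop any mail quoting ("> ") before splitting on whitespace.
        fields = line.lstrip("> ").split()
        # Data rows have 9 fields: time, AM/PM, CPU, then six percentages.
        # The "Average:" row has only 8 fields and is skipped.
        if len(fields) != 9 or fields[2] != "all":
            continue
        try:
            iowait = float(fields[6])
        except ValueError:
            continue
        samples.append((fields[0] + " " + fields[1], iowait))
    return samples


# Four rows copied from the sar output quoted above.
SAMPLE = """\
05:50:01 PM     all      0.16      0.00      1.12      6.93      0.00     91.78
06:00:01 PM     all      0.73      0.00      5.23     32.61      0.00     61.43
06:10:01 PM     all      0.77      0.00      6.55     53.67      0.00     39.01
07:20:01 PM     all      0.18      0.00      5.23      1.45      0.00     93.13
"""

samples = parse_sar_cpu(SAMPLE)
peak = max(samples, key=lambda s: s[1])
busy = [stamp for stamp, iowait in samples if iowait > 20.0]
print("peak %%iowait: %.2f at %s" % (peak[1], peak[0]))
print("intervals above 20%%: %s" % ", ".join(busy))
```

On the sample rows the peak is the 06:10:01 PM interval, which is the "high of 53" mentioned in the message.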
>
>
>   The NFS server is a virtual machine running in ESXi 4.1, and VMware Tools IS installed.  Could this be slow disk access, and thus a VMware misconfiguration?  I hate to admit it, but I am at a loss.
>
> I can run other sar reports on yesterday's (Wednesday's) data if anyone thinks there may be something in there to help.
>
> For what it's worth, here is the output from pstree from one of the affected clients, and I do NOT see the PID that I was looking for:
>
> init(1)-+-NetworkManager(1782)-+-dhclient(1808)
>          |                      `-{NetworkManager}(1809)
>          |-abrtd(2341)
>          |-acpid(2039)
>          |-anacron(3615)
>          |-atd(2413)
>          |-atieventsd(2421)---authatieventsd.(4134)
>          |-auditd(1547)-+-audispd(1549)-+-sedispatch(1550)
>          |              |               `-{audispd}(1551)
>          |              `-{auditd}(1548)
>          |-automount(2134)-+-{automount}(2135)
>          |                 |-{automount}(2136)
>          |                 |-{automount}(2139)
>          |                 |-{automount}(2142)
>          |                 |-{automount}(2143)
>          |                 `-{automount}(2144)
>          |-avahi-daemon(1794)---avahi-daemon(1795)
>          |-bonobo-activati(4549)---{bonobo-activat}(4550)
>          |-cachefilesd(1597)
>          |-certmonger(2435)
>          |-clock-applet(4644)
>          |-console-kit-dae(2521)-+-{console-kit-da}(2522)
>          |                       |-{console-kit-da}(2523)
>          |                       |-{console-kit-da}(2524)
>          |                       |-{console-kit-da}(2525)
>          |                       |-{console-kit-da}(2526)
>          |                       |-{console-kit-da}(2527)
>          |                       |-{console-kit-da}(2528)
>          |                       |-{console-kit-da}(2529)
>          |                       |-{console-kit-da}(2530)
>          |                       |-{console-kit-da}(2531)
>          |                       |-{console-kit-da}(2532)
>          |                       |-{console-kit-da}(2533)
>          |                       |-{console-kit-da}(2534)
>          |                       |-{console-kit-da}(2535)
>          |                       |-{console-kit-da}(2536)
>          |                       |-{console-kit-da}(2537)
>          |                       |-{console-kit-da}(2538)
>          |                       |-{console-kit-da}(2539)
>          |                       |-{console-kit-da}(2540)
>          |                       |-{console-kit-da}(2541)
>          |                       |-{console-kit-da}(2542)
>          |                       |-{console-kit-da}(2543)
>          |                       |-{console-kit-da}(2544)
>          |                       |-{console-kit-da}(2545)
>          |                       |-{console-kit-da}(2546)
>          |                       |-{console-kit-da}(2547)
>          |                       |-{console-kit-da}(2548)
>          |                       |-{console-kit-da}(2549)
>          |                       |-{console-kit-da}(2550)
>          |                       |-{console-kit-da}(2551)
>          |                       |-{console-kit-da}(2552)
>          |                       |-{console-kit-da}(2553)
>          |                       |-{console-kit-da}(2554)
>          |                       |-{console-kit-da}(2555)
>          |                       |-{console-kit-da}(2556)
>          |                       |-{console-kit-da}(2557)
>          |                       |-{console-kit-da}(2558)
>          |                       |-{console-kit-da}(2559)
>          |                       |-{console-kit-da}(2560)
>          |                       |-{console-kit-da}(2561)
>          |                       |-{console-kit-da}(2562)
>          |                       |-{console-kit-da}(2563)
>          |                       |-{console-kit-da}(2564)
>          |                       |-{console-kit-da}(2565)
>          |                       |-{console-kit-da}(2566)
>          |                       |-{console-kit-da}(2567)
>          |                       |-{console-kit-da}(2568)
>          |                       |-{console-kit-da}(2569)
>          |                       |-{console-kit-da}(2570)
>          |                       |-{console-kit-da}(2571)
>          |                       |-{console-kit-da}(2572)
>          |                       |-{console-kit-da}(2573)
>          |                       |-{console-kit-da}(2574)
>          |                       |-{console-kit-da}(2575)
>          |                       |-{console-kit-da}(2576)
>          |                       |-{console-kit-da}(2577)
>          |                       |-{console-kit-da}(2578)
>          |                       |-{console-kit-da}(2579)
>          |                       |-{console-kit-da}(2580)
>          |                       |-{console-kit-da}(2581)
>          |                       |-{console-kit-da}(2582)
>          |                       |-{console-kit-da}(2583)
>          |                       `-{console-kit-da}(2585)
>          |-crond(2402)
>          |-cupsd(1955)
>          |-dbus-daemon(1772)
>          |-dbus-daemon(2883)
>          |-dbus-launch(2591)
>          |-dbus-launch(2882)
>          |-devkit-power-da(2602)
>          |-fcoemon(1760)
>          |-firefox(4968)
>          |-gconf-im-settin(4534)
>          |-gconfd-2(3175)
>          |-gdm-binary(2449)---gdm-simple-slav(2490)-+-Xorg(2492)
>          |                                          `-gdm-session-wor(2671)---tcsh(2849)---gnome-session(4148)-+-bluetooth-apple(436+
>          |                                                                                                     |-gdu-notificatio(432+
>          |                                                                                                     |-gnome-panel(4253)
>          |                                                                                                     |-gnome-power-man(434+
>          |                                                                                                     |-gnome-volume-co(432+
>          |                                                                                                     |-gpk-update-icon(430+
>          |                                                                                                     |-krb5-auth-dialo(435+
>          |                                                                                                     |-metacity(4244)
>          |                                                                                                     |-nautilus(4276)
>          |                                                                                                     |-nm-applet(4342)
>          |                                                                                                     |-polkit-gnome-au(432+
>          |                                                                                                     |-python(4294)
>          |                                                                                                     `-{gnome-session}(422+
>          |-gdm-user-switch(4640)
>          |-gedit(4779)-+-{gedit}(4894)
>          |             |-{gedit}(5037)
>          |             |-{gedit}(5038)
>          |             `-{gedit}(5039)
>          |-gnome-keyring-d(2831)-+-{gnome-keyring-}(2832)
>          |                       `-{gnome-keyring-}(4237)
>          |-gnome-screensav(4665)
>          |-gnome-settings-(4235)---{gnome-settings}(4248)
>          |-gnote(4635)
>          |-gvfs-afc-volume(4573)---{gvfs-afc-volum}(4574)
>          |-gvfs-gdu-volume(4569)
>          |-gvfs-gphoto2-vo(4571)
>          |-gvfsd(3168)
>          |-gvfsd-burn(4754)
>          |-gvfsd-metadata(4794)
>          |-gvfsd-trash(4656)
>          |-hald(2048)---hald-runner(2049)-+-hald-addon-acpi(2096)
>          |                                |-hald-addon-inpu(2088)
>          |                                `-hald-addon-stor(2097)
>          |-im-settings-dae(4371)
>          |-lldpad(1734)
>          |-master(2332)-+-pickup(2347)
>          |              `-qmgr(2348)
>          |-mingetty(2454)
>          |-mingetty(2456)
>          |-mingetty(2458)
>          |-mingetty(2460)
>          |-mingetty(2462)
>          |-modem-manager(1789)
>          |-notification-ar(4642)
>          |-ntpd(2249)
>          |-pcscd(2114)---{pcscd}(2129)
>          |-polkitd(2647)
>          |-pulseaudio(4331)-+-gconf-helper(4563)
>          |                  |-{pulseaudio}(4535)
>          |                  `-{pulseaudio}(4539)
>          |-qpidd(2356)-+-{qpidd}(2357)
>          |             |-{qpidd}(2358)
>          |             `-{qpidd}(2359)
>          |-rpc.idmapd(1864)
>          |-rpc.mountd(2190)
>          |-rpc.rquotad(2175)
>          |-rpc.statd(1818)
>          |-rpcbind(1648)
>          |-rsyslogd(1574)-+-{rsyslogd}(1575)
>          |                |-{rsyslogd}(1576)
>          |                `-{rsyslogd}(1578)
>          |-rtkit-daemon(2661)-+-{rtkit-daemon}(2662)
>          |                    `-{rtkit-daemon}(2663)
>          |-seahorse-agent(3155)
>          |-seahorse-daemon(4243)
>          |-sshd(2233)---sshd(5003)---bash(5005)---pstree(5057)
>          |-sssd(2216)-+-sssd_be(2281)
>          |            |-sssd_nss(2286)
>          |            `-sssd_pam(2287)
>          |-stap-serverd(1927)---{stap-serverd}(1932)
>          |-udevd(542)-+-udevd(1166)
>          |            `-udevd(1745)
>          |-udisks-daemon(4373)---udisks-daemon(4374)
>          |-wpa_supplicant(1813)
>          `-xinetd(2241)
>
>
>
>
> ________________________________________
> From: Christopher Tooley [[log in to unmask]]
> Sent: Wednesday, November 28, 2012 1:00 PM
> To: David Fitzgerald
> Cc: [log in to unmask]
> Subject: Re: clients slow down due to unknown process
>
> If/when you find out what it is, would you kindly report back to the list what you find? This has got me really curious now. :D
>
> -Chris
>
> On 2012-11-28, at 5:51 AM, David Fitzgerald<[log in to unmask]>  wrote:
>
>> Thank you everyone for all the good ideas.  I have class this evening and will be able to use your suggestions.  I'll let you know what I find.
>>
>> Dave
>>
>> -----Original Message-----
>> From: Robert Blair [mailto:[log in to unmask]]
>> Sent: Tuesday, November 27, 2012 11:56 AM
>> To: Sergio Ballestrero
>> Cc: David Fitzgerald; [log in to unmask]
>> Subject: Re: clients slow down due to unknown process
>>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> "/usr/sbin/lsof -p $PID" will also list all of the resources it uses which is often a big help in figuring out wtf it is all about.
>>
>> On 11/27/2012 10:52 AM, Sergio Ballestrero wrote:
>>> Hello David,
>>> I'm not familiar with freeIPA, but anyway you can start by better
>>> identifying the process.
>>> In top, get the PID and look under /proc/$PID - in particular exe
>>> will be a link to the binary, like
>>>   lrwxrwxrwx 1 root root 0 Nov 27 01:41 /proc/1/exe -> /sbin/init
>>>
>>> pstree -p -H $PID
>>> will help you identify the parent process, if there's one.
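
The /proc lookup described here can also be scripted. The following is only a sketch: describe_process is a made-up helper, and it relies on the standard Linux /proc files status, exe, and cmdline. When the exe link cannot be read (as happens for kernel threads, or for processes you lack permission to inspect) it reports None rather than failing.

```python
import os


def describe_process(pid, proc_root="/proc"):
    """Collect the name, executable path, and command line for a PID."""
    base = os.path.join(proc_root, str(pid))
    info = {"pid": pid, "name": None, "exe": None, "cmdline": []}
    try:
        # The "Name:" line in status holds the short name that top shows.
        with open(os.path.join(base, "status")) as handle:
            for line in handle:
                if line.startswith("Name:"):
                    info["name"] = line.split(":", 1)[1].strip()
                    break
    except (IOError, OSError):
        pass
    try:
        # exe is a symlink to the binary; it may be missing or unreadable.
        info["exe"] = os.readlink(os.path.join(base, "exe"))
    except OSError:
        pass
    try:
        # cmdline is NUL-separated; it is empty for kernel threads.
        with open(os.path.join(base, "cmdline"), "rb") as handle:
            raw = handle.read()
        info["cmdline"] = [arg.decode("utf-8", "replace")
                           for arg in raw.split(b"\0") if arg]
    except (IOError, OSError):
        pass
    return info


# Demonstration on the current process; pass the PID from top instead.
print(describe_process(os.getpid()))
```

A result with a name but no exe and an empty cmdline suggests the PID is not an ordinary user program, which narrows down where to look next.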
>>>
>>> Cheers,
>>>   Sergio
>>>
>>> On 27 Nov 2012, at 16:21, David Fitzgerald wrote:
>>>
>>>> Hello,
>>>>
>>>> Sorry for the length of this post, but I want to make sure I give all
>>>> the information needed for someone to help.
>>>>
>>>> I have a lab of 25 workstations running Scientific Linux 6.2.  User
>>>> accounts are authenticated via freeIPA, home directories are
>>>> automounted from an NFS server, and the users run GNOME 2.28.  The
>>>> NFS and freeIPA services are located on the same server (IP
>>>> 10.10.10.10), which is also running Scientific Linux 6.2 and is a
>>>> virtual guest in VMware ESXi 4.1.
>>>>
>>>> During class when the workstations are most heavily in use, the
>>>> students are writing Fortran programs with gedit and usually have
>>>> firefox up as well.  Here is my predicament.  During class some of
>>>> the workstation screens will freeze with no mouse or keyboard input.
>>>> This can last for varying lengths of time, sometimes a few minutes,
>>>> some other times for the full length of the class.  I can ssh in to
>>>> the frozen machines, and top will show load averages of 4 or more.
>>>> The process taking up the most CPU is one I don't recognize, named
>>>> 10.10.10.10-ma, where 10.10.10.10 is the IP address of my server.
>>>> I have no idea what that process is related to, whether it's freeIPA,
>>>> NFS, Gnome or something else.  Killing the process doesn't help as it
>>>> simply restarts with a new PID.   Note that the freezing does NOT
>>>> happen when only a few people are using the lab, so reproducing the
>>>> problem outside of class time is difficult.
>>>>
>>>> Can anyone help me track down this problem and fix it?
>>>>
>>>> I appreciate any help you can give.
>>>>
>>>> Thanks!
>>>>
>>>> Dave
>>>>
>>>>
>>>> +++++++++++++++++++++++
>>>> David Fitzgerald
>>>> Department of Earth Sciences
>>>> Millersville University
>>>> Millersville, PA 17551
>>>>
>>>> Phone: 717-871-2394
>>>>
>>> --
>>> Sergio Ballestrero  - http://physics.uj.ac.za/psiwiki/Ballestrero
>>> University of Johannesburg, Physics Department  ATLAS TDAQ sysadmin
>>> team - Office:75282 OnCall:164851
>>>
>>>
>>>
>>>
>>>
>>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.5 (GNU/Linux)
>>
>> iQEUAwUBULTwmfQM1KNWz8QaAQLU0Qf2JXa29RVDhJALq2TD72Nis4wAmxlqFIYP
>> rIo5sHBUI+o/bebsDit9qoC+hWuCK3+xDai9fzF2jUQqXfhRZiPHjdQRpCViMurY
>> Wp+aVZWCD1U3KusuWMSWlv6Xdx0QmaMQr8Nh8JRRWUi8cNEgAO2Th1txwdu3auJb
>> LssTFmwUjLUEC0mKhgx6520hisirfOHNTnF3rQCN5ilZGEYEZ2vMm/lcm5yI0Sqc
>> wdqWUXVYGNsBepFf4bRWaWPX0Hbf6sbLgoJNUHJOJ2pGpc3MUp3SiGsIIUGkZwPW
>> xT6kS523J+nItY/odmvdl+ibHRVa7TgDx0xhuqISarr39g00yvvx
>> =RQky
>> -----END PGP SIGNATURE-----
