SCIENTIFIC-LINUX-USERS Archives

February 2010

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Larry Linder <[log in to unmask]>
Reply To:
Larry Linder <[log in to unmask]>
Date:
Sat, 20 Feb 2010 09:12:36 -0500
One thing to try is to run an installation under VMware so that you can
examine it from outside the OS.
On a Linux box, install VMware, then install the test copy of Linux inside
VMware. You can run the test copy as usual and examine its hardware usage
from the outside.
Larry Linder

On Saturday 20 February 2010 07:57, Sergio Ballestrero wrote:
>  Dear Linuxers,
> we are using SL CERN 5.4 for the ATLAS Control Room at CERN, and we are
> experiencing a problem with the Xorg server that is proving very hard to
> track down. I'm hoping someone in the SL community will have the patience
> to read all this and offer some suggestion...
>
>  The desktop systems show a very slow (but not uniformly slow) memory leak
> in Xorg, growing up to 3GB, sometimes even 5GB, and finally bringing the
> systems to some kind of crash - usually just the GUI freezes, but sometimes
> OOM Killer gets badly in the way and the whole system is left in a bad
> state and needs to be rebooted. Sometimes we can see the problem before it
> becomes critical and request the users to restart X11, but this is not a
> welcome procedure. Simply closing the applications (either gently or by
> killing) does not let Xorg release the occupied memory. Even logging out
> (without restarting X11) does not free the memory allocated by Xorg.
>
>  It takes anything between one week and more than 4 weeks for this to
> happen (depending on how heavily the specific desk is used and which
> applications are run on it), so it's very hard to correlate to a specific
> application or usage pattern, and we are not finding a way to reproduce it
> in a shorter time, to be able to debug it.
>
>  xrestop only shows <20 entries with 10~20 MB pixmaps allocated, nowhere
> near to justifying the 3GB or more. The memory map from /proc/<Xorg
> pid>/smaps does show a heap of >650MB (not very different from a freshly
> started Xorg) and many allocated memory blocks, some as large as 800MB, but
> these are unlabeled and I don't see a way to correlate them with something
> useful. As you can imagine, running Xorg under Valgrind on a production
> system is basically out of the question, and doing it on a test system without
> knowing what to try and test seems quite pointless.
>
>  The systems are dual quad-core Xeon systems, with 8 to 12GB RAM, 4GB swap,
> dual nVidia cards (NVS285 or FX370), quad screens, from 4 to 12 virtual
> desktops, and they now run KDE 3.5.10 (from kde-redhat.sourceforge.net) on
> SLC 5.4, x86_64, kernel 2.6.18-164.11.1, with nVidia drivers packaged by
> CERN IT (kernel-module-nvidia-2.6.18-164.11.1.el5-185.18.36-1.slc5.x86_64).
> We had been seeing the same behavior with SLC 5.3 and standard KDE 3.5.6.
> The most used applications are Konqueror, PVSS (detector control system),
> plus a variety of CERN or ATLAS specific applications, mostly Java or
> Python. The issue appears also on desks where no 3D/OpenGL app is used.
>
>  While this must be, at the bottom, a bug in Xorg, we could already be
> happy with identifying one or more specific applications which trigger
> this, and try to add workarounds / mitigations in the applications, if the
> Xorg bug can't be pinned down or is untreatable.
>
>  Any help or suggestion of tools or procedures that may help us debug this
> issue would be most welcome.
>
>  Thanks, and cheers,
>    Sergio
