Troy, when will this kernel be available for SL testing?
The errata notes describe time going backwards or drifting, which is not
exactly what we are seeing (forward jumps), but it could be worth a try.
That makes three errata kernels released by the upstream vendor in
less than a month.
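
In the meantime, a rough way to catch the forward jumps on a domU is to
compare wall-clock time against a monotonic clock and log any large
disagreement. The sketch below is only an illustration in Python 3: the
names find_jumps and watch and the 60-second threshold are made up for
this example, and it assumes the monotonic clock is not disturbed by the
same jump, which may not hold on a Xen guest. Comparing against the dom0
clock would be a more reliable check.

```python
import time


def find_jumps(samples, threshold=60.0):
    """Find steps where the wall clock disagrees with the monotonic clock.

    samples is an ordered list of (monotonic_seconds, wall_seconds) pairs.
    Returns a list of (index, drift_seconds) for every consecutive pair
    whose wall-clock advance differs from the monotonic advance by more
    than threshold seconds. A positive drift means the wall clock jumped
    forward.
    """
    jumps = []
    for i in range(1, len(samples)):
        mono_delta = samples[i][0] - samples[i - 1][0]
        wall_delta = samples[i][1] - samples[i - 1][1]
        drift = wall_delta - mono_delta
        if abs(drift) > threshold:
            jumps.append((i, drift))
    return jumps


def watch(count, interval=10.0, threshold=60.0):
    """Take count samples, interval seconds apart, and print any jumps."""
    samples = [(time.monotonic(), time.time())]
    for _ in range(count):
        time.sleep(interval)
        samples.append((time.monotonic(), time.time()))
    for index, drift in find_jumps(samples, threshold):
        stamp = time.ctime(samples[index][1])
        print("wall clock moved %+.0f s relative to monotonic at %s"
              % (drift, stamp))
    return samples
```

Something like watch(count=360, interval=10.0) would sample for about an
hour. While the clock "sits there" after a jump, each sample shows only a
small negative drift, so only the jump itself crosses the threshold.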

Steve


On Thu, 21 Jan 2010, Troy Dawson wrote:

> Steven Timm wrote:
>> Since the errata kernel release 2.6.18-164.6.1 we have been
>> seeing Xen domU's that will occasionally jump forward in time by
>> 40-80 minutes.  The behavior is such that the clock will jump
>> forward and then just sit there until the clock of the underlying
>> dom0 catches up to it again.
>> 
>> At first we were running ntpd on our domU's but then disabled it
>> in response to suggestions in several howtos.  So now we know
>> that the problem has nothing to do with rogue ntp packets but
>> could very well be something in xen or kernel-xen that is causing it.
>> There's a report of something very similar in the CentOS forum
>> to which I've appended more details of this bug.
>> 
>> https://www.centos.org/modules/newbb/viewtopic.php?topic_id=23402
>> 
>> I can find nothing about this in the upstream vendor bugzilla,
>> and nothing obvious in the Xen mailing lists.
>> 
>> Any help is appreciated.
>> 
>> Steve Timm
>> 
>> 
>
> Hi Steve,
> With the new kernel (2.6.18-164.11.1.el5) that was just released, there were
> lots of time-related bug fixes.
>
> http://www.redhat.com/docs/en-US/errata/RHSA-2010-0046/Kernel_Security_Update/index.html
>
> Here are the time-related ones:
>
> * Scientific Linux 5.4 SMP guests running on a Scientific Linux Hypervisor 
> may have experienced inconsistent time, for example, the time going 
> backwards. This could have caused some applications to hang.
>
> * In rare cases, a system management interrupt (SMI) could occur during
> CPU frequency calibration (during boot), resulting in the frequency
> being calculated to a value larger than the CPU's specification. This
> could have resulted in timer values being miscalculated and firing at
> incorrect times. Note: This fix is optional. To enable the fix, the
> system must be booted with the avoid_smi kernel parameter.
>
> * A KVM pvclock fix in the kernel-2.6.18-164.2.1.el5 update introduced a
> bug: Some SMP guest operating systems experienced time drift. This could
> cause problems for time-sensitive applications.
>
> * Scientific Linux 5.4 guests using KVM pvclock, calling the
> clock_gettime(CLOCK_REALTIME) and gettimeofday() functions in sequence
> could have, in rare cases, caused clock_gettime() to return a smaller
> value than gettimeofday(). If the sequence was reversed, gettimeofday()
> could return a smaller value than clock_gettime(CLOCK_REALTIME). This
> could cause applications to hang and use large amounts of CPU (up to
> 100%), or cause problems for applications that depend on timestamps to
> order events. Note: This update only resolves this issue for Intel 64
> and AMD64 systems. The issue can still present on i386 systems.
>
> I am not positive that it will fix your problem, but it sure looks like they
> did a lot of work on time and virtualization in this kernel.
>
> Troy
>
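
For the clock_gettime()/gettimeofday() ordering item in the list above, a
quick check is to call the two functions alternately and count how often
a reading is smaller than the previous one. This is a sketch only,
assuming Python 3.7+ on Linux with glibc; Timeval, gettimeofday_us,
sample_alternating and count_backward_steps are names invented here.
Readings are truncated to whole microseconds so that the coarser
resolution of gettimeofday() does not show up as a false step backwards.
A clock step made by ntpd or an administrator would also be counted.

```python
import ctypes
import time


class Timeval(ctypes.Structure):
    # struct timeval as laid out on Linux. Assumption: time_t and
    # suseconds_t are both C long, as on x86_64 and classic i386 glibc.
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_usec", ctypes.c_long)]


# Handle on the C library already loaded into this process.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.gettimeofday.argtypes = [ctypes.POINTER(Timeval), ctypes.c_void_p]
_libc.gettimeofday.restype = ctypes.c_int


def gettimeofday_us():
    """Return gettimeofday() as integer microseconds since the epoch."""
    tv = Timeval()
    if _libc.gettimeofday(ctypes.byref(tv), None) != 0:
        raise OSError(ctypes.get_errno(), "gettimeofday failed")
    return tv.tv_sec * 1000000 + tv.tv_usec


def sample_alternating(n):
    """Call clock_gettime(CLOCK_REALTIME) and gettimeofday() n times each,
    alternating, so both call orders appear in the sequence."""
    readings = []
    for _ in range(n):
        # Truncate nanoseconds to microseconds to match gettimeofday().
        readings.append(time.clock_gettime_ns(time.CLOCK_REALTIME) // 1000)
        readings.append(gettimeofday_us())
    return readings


def count_backward_steps(readings):
    """Count readings that are strictly smaller than the one before."""
    return sum(1 for a, b in zip(readings, readings[1:]) if b < a)
```

Running count_backward_steps(sample_alternating(1000000)) on a guest
should give 0 if the two calls are consistent with each other; a
non-zero count would be worth a closer look before and after the new
kernel.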

-- 
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.