LISTSERV - SCIENTIFIC-LINUX-USERS Archives

January 2012

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV


Subject:	Re: An upstream kernel bug that causes a system crash in SL-6
From:	Akemi Yagi <[log in to unmask]>
Reply To:	Akemi Yagi <[log in to unmask]>
Date:	Fri, 6 Jan 2012 10:25:47 -0800
Content-Type:	text/plain
Parts/Attachments:	text/plain (48 lines)

On Fri, Jan 6, 2012 at 8:00 AM, Akemi Yagi <[log in to unmask]> wrote:
> Hi,
>
> Is there anyone who has/had SL-6 machines running for more than 200 days?
>
> There is a kernel bug that causes a system crash when the uptime goes
> over 208.5 days. This was noted by a Scientific Linux user on the SL
> Japanese mailing list [1].
>
> According to available info, the patch [2] is now in kernel 3.1.5.
> RHEL/SL 6 is affected in the sense that the buggy code is there. SL 6
> has been out long enough to see this bug in action and so I wondered
> if someone has already encountered a crash. I searched TUV's bugzilla
> but have not been able to find one that looks related.

No wonder I did not see it; it is private. :(

Here's a copy of the reply from a RH guy to my post on the RHEL-6 mailing list:

From: Robin Price II <rprice redhat com>
Date: Fri, 06 Jan 2012 11:55:08 -0500

Bugzilla:  https://bugzilla.redhat.com/show_bug.cgi?id=765720

This is private due to private information from customer use cases. If
you need further details, I would highly encourage you to contact Red
Hat support or your TAM.

Here is the initial information opened in the BZ:

"The following patch is in urgent fix for Linus branch, which avoid the
unnecessary overflow in sched_clock otherwise kernel will crash after
209~250 days.

http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=patch;h=4cecf6d401a01d054afc1e5f605bcbfe553cb9b9

In hundreds of days, the __cycles_2_ns calculation in sched_clock
has an overflow. cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
the final value to become zero. We can solve this without losing any
precision. We can decompose TSC into quotient and remainder of
division by the scale factor, and then use this to convert TSC into
nanoseconds."

~rp
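
To make the arithmetic in the quoted description concrete, here is a rough
sketch in Python that emulates 64-bit unsigned math with a mask. It is not
the kernel code or the actual patch. The 10-bit scale shift (SCALE_SHIFT),
the way the scale is derived from a CPU frequency in kHz, the 2 GHz example
clock, and all function names are assumptions made for illustration. The
10-bit shift was picked because 2^54 ns is roughly 208.5 days, which matches
the uptime mentioned above.

```python
MASK64 = (1 << 64) - 1   # emulate wraparound of a 64-bit unsigned integer
SCALE_SHIFT = 10         # assumed fixed-point shift (hypothetical name)


def cyc2ns_scale(cpu_khz):
    """Assumed fixed-point factor: nanoseconds per cycle, scaled by 2**SCALE_SHIFT."""
    return (1_000_000 << SCALE_SHIFT) // cpu_khz


def cycles_to_ns_naive(cyc, scale):
    """Multiply first, then shift: the 64-bit product wraps for large cyc."""
    return ((cyc * scale) & MASK64) >> SCALE_SHIFT


def cycles_to_ns_split(cyc, scale):
    """Split cyc into quotient and remainder of division by 2**SCALE_SHIFT.

    cyc = quot * 2**SCALE_SHIFT + rem, so
    (cyc * scale) >> SCALE_SHIFT == quot * scale + ((rem * scale) >> SCALE_SHIFT)
    exactly, and neither partial product comes close to 64 bits here.
    """
    quot = cyc >> SCALE_SHIFT
    rem = cyc & ((1 << SCALE_SHIFT) - 1)
    return (quot * scale + ((rem * scale) >> SCALE_SHIFT)) & MASK64


def cycles_after_days(days, cpu_khz):
    """TSC cycles elapsed after the given number of days at cpu_khz."""
    return days * 86_400 * cpu_khz * 1_000


# Uptime (in days) at which the scaled product no longer fits in 64 bits.
OVERFLOW_DAYS = (1 << (64 - SCALE_SHIFT)) / 1e9 / 86_400

if __name__ == "__main__":
    khz = 2_000_000  # hypothetical 2 GHz TSC
    scale = cyc2ns_scale(khz)
    for days in (100, 209):
        cyc = cycles_after_days(days, khz)
        exact = (cyc * scale) >> SCALE_SHIFT  # unbounded Python integer
        print(days,
              cycles_to_ns_naive(cyc, scale) == exact,
              cycles_to_ns_split(cyc, scale) == exact)
    print("overflow after about %.1f days" % OVERFLOW_DAYS)
```

Under these assumptions the naive form agrees with the exact value at 100
days but wraps around to a much smaller value at 209 days, while the split
form stays exact in both cases.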
