SCIENTIFIC-LINUX-DEVEL Archives

November 2007

SCIENTIFIC-LINUX-DEVEL@LISTSERV.FNAL.GOV

From: Stephan Wiesand <[log in to unmask]>
Date: Tue, 20 Nov 2007 19:46:09 +0100

Hi Troy,

On Mon, 19 Nov 2007, Troy Dawson wrote:

> [log in to unmask] wrote:
>> Hallo Thomas,
>> 
>> On Thu, 15 Nov 2007, Thomas Mueller wrote:
>> 
>>> Hi Stephan,
>>> 
>>> On Thu, 15 Nov 2007, [log in to unmask] wrote:
>>> 
>>>> How about updating openafs to the new 1.4.5 release? I put up an SRPM in
>>>> http://www-zeuthen.desy.de/~wiesand/SL5/
>>> There is an open issue with the fileserver.
>>> Jeff Altman is about to track this down - see
>>> http://rt.central.org/rt/Ticket/Display.html?id=74708
>>> 
>>> It seems the problem is really strange and will not often occur -
>>> but anyway ...
>> 
>> thanks for the heads-up! Yes, I noticed that issue, and the fileserver
>> patches in this SRPM are in there because it scares me. I had some hope
>> that it was sorted out already, but that was before I just read your last
>> reply in RT...
>> 
>> If this problem still persists when SL5 is about to be released, I
>> can provide a 1.4.4 package with most of the bug fixes that went into
>> 1.4.5 (see the changelog). This one had serious testing in our cell,
>> on many clients and a dozen production fileservers, with not a single
>> fileserver problem on record - but then we don't have 200
>> "XtendedProblems" clients rebooting at the same time...
>> 
>> Cheers,
>>         Stephan
>> 
>
> So ... we're waiting to see if the bug gets fixed?  If it does, we update, if
> it doesn't, we get the more patched version of 1.4.4?
> Is this correct?  Just working on what goes in and what doesn't.  Hopefully
> we can get this release out fairly quickly.

Correct ... but maybe a bit more complicated.

First, I'm not convinced the bug isn't present in 1.4.4 or earlier 
releases - after all it hasn't been identified yet, so we don't know. 
Thomas, do you have evidence (and if you have a feeling only, that would 
be sufficient to me) that the bug is 1.4.5-only?

Then, there are clearly bugs fixed in 1.4.5. And probably (I haven't had 
time to check yet), some of the post-1.4.5 fixes in the srpm I put up 
apply to 1.4.4 as well.

And then, unfortunately, 1.4.5 is not just a bug fix release. My 
comprehension of the changes w.r.t. 1.4.4 is pretty incomplete, but I 
think they basically fall into three categories:

  1) relatively minor fixes all over the place, adjustment to current
     linux kernels, compilers etc., enhancements extremely unlikely to
     break anything
  2) changes that help the fileserver survive (inadvertent) DoS attacks
     from misbehaving clients, typically from not-so-well-administered
     Windows clients - I'm not sure those should be called bug fixes
     in the SL context
  3) performance enhancements, mainly for those running file servers
     on zfs, by making [the formerly synchronous] fsyncs [the fileserver
     executes abundantly] asynchronous [by moving them into a separate
     thread]
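
To make category 3 concrete, here is a minimal sketch of the general
technique (handing fsync calls to a separate thread so the caller does
not wait for the disk). It is not OpenAFS code and says nothing about
how the fileserver actually implements this; the names AsyncSyncer and
write_record are made up for illustration.

```python
import os
import queue
import tempfile
import threading


class AsyncSyncer:
    """Hypothetical helper: run fsync calls on one background thread."""

    def __init__(self):
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            fd = self._pending.get()
            try:
                if fd is None:          # sentinel: stop the worker
                    return
                try:
                    os.fsync(fd)        # the slow call, off the request path
                finally:
                    os.close(fd)        # release our duplicate descriptor
            finally:
                self._pending.task_done()

    def schedule(self, fd):
        """Queue a duplicate of fd for syncing and return immediately."""
        self._pending.put(os.dup(fd))

    def close(self):
        """Process everything still queued, then stop the worker."""
        self._pending.put(None)
        self._worker.join()


def write_record(syncer, path, data):
    """Append data to path without waiting for it to reach the disk."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()                       # push Python's buffer to the OS
        syncer.schedule(f.fileno())     # fsync happens later, elsewhere
```

The trade-off in any scheme like this is that a write can return
before its data is on stable storage, which is one reason to treat
such a change as an enhancement rather than a plain bug fix.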

Maybe we should agree on and write down a policy for what we put into SL 
releases. But all the SRPMS I offered to the project so far followed 
this rule: "Provide the latest 'stable' openafs release, with all 
post-release patches from the project's cvs added that belong in 
category 1 to the very best of my knowledge. Try hard to give the
client and the fileserver some serious testing before the SL release."

In this case, we're in a hurry to get 5.1 out a few days after 
openafs-1.4.5 was released, and there's a serious bug reported against the 
fileserver. And while that bug is very actively being chased right now,
and it's not too likely to bite the typical SL site, it may be better to 
roll out 1.4.4 with "category 1" type patches applied in SL5.1.

Hence my proposal: Let's put the 1.4.5 build I offered into the 5.1 beta.
If additional fixes come up in the course of tracking Thomas' bug, let's 
add them. If SL5.1 has to be released and we're not convinced that the 
then current openafs build is good enough, let's fall back to a 
"1.4.4+cat1" build that had serious real life testing.

To that end, I put up my last 1.4.4 build (openafs.SLx-1.4.4-52.src.rpm) in 
http://www-zeuthen.desy.de/~wiesand/SL5/ . This is what I currently 
deploy on production fileservers if I have to touch them. It differs from 
1.4.4-51 (found in the same directory) only by a rather tiny patch (by 
Rainer Toebbicke from CERN, and pulled into the 1.4.5 release without any 
discussion). But just to be completely honest, it's the 1.4.4-51 build 
that has run on many, and the most critical, fileservers here (at DESY's 
Zeuthen Site) for months.

Pick your poison ;-)

Cheers,
 	Stephan

-- 
Stephan Wiesand
   DESY - DV -
   Platanenallee 6
   15738 Zeuthen, Germany
