SCIENTIFIC-LINUX-USERS Archives

January 2012

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
Konstantin Olchanski <[log in to unmask]>
Reply To:
Konstantin Olchanski <[log in to unmask]>
Date:
Sat, 21 Jan 2012 08:36:45 -0800
Content-Type:
text/plain
On Sat, Jan 21, 2012 at 03:11:36PM +1100, Steven Haigh wrote:
> Just a few comments to be taken as constructive feedback...
> 
> On 21/01/2012 1:51 PM, Konstantin Olchanski wrote:
> > So every SL6.1 machine has to be repaired manually (yum reinstall nfs-utils).
> 
> This should be easily scriptable with remote ssh commands...
> ... no longer than 10 minutes to do this to an entire subnet of machines.

Correct. Time consumed by this bug:
30 minutes to discover /core* files, figure out where they come from
30 minutes to answer email and phone calls from users asking about /core* files
30 minutes to search SL mailing lists, read and understand discussion of problem
30 minutes to develop solution (solution given in retraction notice is wrong as it modifies non-affected machines)
30 minutes to run the script that pushes the solution to all machines
- repeat above twice - once in December, second time now.
30 minutes to travel to each affected dead machine, revive it, and run the script manually
- repeat above for each machine killed by /core* files
15 minutes per machine in the future to fix machines that were offline today
- repeat above for each machine found in the future

So yes, no longer than 10 minutes. No sweat.
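
For illustration, the "scriptable with remote ssh commands" idea, restricted
to affected machines only, might look roughly like the sketch below. This is
not the script actually used on any site. It rests on two assumptions that do
not come from this thread: that stray /core* files in the root directory are
a good enough marker of an affected host, and that passwordless root ssh is
available. The names CHECK_COMMAND, build_ssh_command and repair_hosts are
hypothetical.

```python
import subprocess

# The repair quoted in the thread; -y answers yum's confirmation prompt.
REPAIR_COMMAND = "yum -y reinstall nfs-utils"
# Assumption: /core* files in the root directory mark an affected host.
CHECK_COMMAND = "ls /core* >/dev/null 2>&1"


def build_ssh_command(host, remote_command):
    """Return the argv list that runs remote_command on host over ssh."""
    return [
        "ssh",
        "-o", "BatchMode=yes",       # never stop to ask for a password
        "-o", "ConnectTimeout=10",   # give up quickly on dead machines
        f"root@{host}",
        remote_command,
    ]


def repair_hosts(hosts, run=subprocess.call):
    """Reinstall nfs-utils only on hosts where the check succeeds.

    `run` takes an argv list and returns an exit status, so it can be
    replaced by a stub for a dry run. Returns a dict of host -> outcome.
    """
    results = {}
    for host in hosts:
        status = run(build_ssh_command(host, CHECK_COMMAND))
        if status == 255:
            # ssh itself exits with 255 when it cannot connect.
            results[host] = "unreachable"
        elif status != 0:
            # The check ran but found no /core* files: leave the host alone.
            results[host] = "not affected"
        else:
            fixed = run(build_ssh_command(host, REPAIR_COMMAND))
            results[host] = "repaired" if fixed == 0 else "repair failed"
    return results
```

Hosts reported as "unreachable" are the ones that still need the later
per-machine visit, so the returned dict doubles as the to-do list for
machines that were offline when the script ran.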


> >- remember to post a notice to sl-errata when you push something into
> >   every SL machine out there.
> 
> I get where you are coming from here. I think the correct course of
> action should have been:
> 	1) Post link to test RPMs on mailing list and collect feedback.
> 	2) If issue is resolved, push the RPMs to sl-fastbugs
> 	3) If no issues reported in 7-14 days, push the package to sl-security.


I believe this procedure was followed - a test package was issued, people who
tested it tested it and found no problems, the package was issued to the world,
and the world is filled with core dumps.


> You are always welcome to subscribe all your machines to RHEL
> licenses. This will give you an avenue to support and a direct feed
> to techs at RH to fix these issues for you. You benefit in this
> situation, and indirectly, all SL and CentOS users benefit too.


Been with Red Hat since not quite day one. Then came RHEL, and many
discussions between the HEP community and Red Hat about bulk
licensing (neither side wants to deal with managing licenses for
every compute node on every supercomputer). SL/SLC is the result,
and here we are.


-- 
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
