SCIENTIFIC-LINUX-USERS Archives

June 2009

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

From: Jon Peatfield <[log in to unmask]>
Date: Wed, 17 Jun 2009 21:27:45 +0100
On Mon, 15 Jun 2009, Dr Andrew C Aitchison wrote:

> What do other groups do about updating applications and machines
> with long running processes ?
>
> My users run two sorts of long running processes, with different
> problems when it comes to updates.
>
> First, I have users who never log off. Thus applications like
> firefox and pdf viewers will be running when they are updated.
> Some time later these applications may try to load and run plugins
> which have been removed/updated.
>
> Second, I have users with long running calculations (often weeks
> or more) which would be interrupted if the machine were rebooted into an 
> updated kernel. User-written code often checkpoints, so the actual 
> calculation time lost is not significant, but calculations in
> commercial packages such as Mathematica and Maple are often less good at 
> checkpointing.
>
> How do people balance the disruption of killing user processes
> against the need to update to the latest versions of software ?
>
> Thanks,

For security updates of things like firefox, where the user account might 
be compromised by viewing an evil page, I tend to err on the side of doing 
the updates ASAP and sorting out the complaints later.  This probably 
applies to most things where the erratum is marked 'critical' or where 
there is a risk of arbitrary code execution or similar.

For updates which I think won't cause problems for users, I will also 
typically apply them fairly quickly (assuming I judge that correctly, of 
course).

Updates which would be disruptive but which I don't think affect us (e.g. a 
security fix for a feature we don't use) I tend to accumulate until 
something more important turns up, and then we apply them all together.
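Taken together, these rules amount to a small decision procedure. The following is only an illustrative Python sketch, not an existing tool: the Erratum fields (severity, code_execution, disruptive, affects_us) are hypothetical stand-ins for judgements an admin would make from the errata text, and the order of the checks is one reading of the rules (a disruptive fix for something unused is batched even if it is a security fix).

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    APPLY_NOW = "apply as soon as possible"
    APPLY_SOON = "apply fairly quickly"
    BATCH = "hold until something more important turns up"
    SCHEDULE = "announce in advance and schedule"


@dataclass
class Erratum:
    # Hypothetical fields; real errata metadata is not structured like this.
    name: str
    severity: str          # e.g. "critical", "important", "moderate", "low"
    code_execution: bool   # advisory mentions arbitrary code execution?
    disruptive: bool       # needs a reboot or replaces important parts?
    affects_us: bool       # do we actually use the affected feature?


def triage(e: Erratum) -> Action:
    # Disruptive, but for a feature we don't use: accumulate and batch.
    if e.disruptive and not e.affects_us:
        return Action.BATCH
    # Critical or code-execution fixes: do it now, sort out complaints later.
    if e.severity == "critical" or e.code_execution:
        return Action.APPLY_NOW
    # Updates unlikely to bother users go out fairly quickly.
    if not e.disruptive:
        return Action.APPLY_SOON
    # Disruptive and relevant: needs an announced maintenance slot.
    return Action.SCHEDULE
```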

For updates which are disruptive (replacing important parts or requiring 
reboots etc), we generally announce them to users about a week in advance, 
and give them a chance to have the updates applied *sooner* if the day we 
have picked would be bad for them.  Typically we do reboots only on 
Wednesday mornings unless we think it is sufficiently urgent to justify 
doing it sooner.
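Choosing the reboot day is simple date arithmetic: the first Wednesday at least a week after the announcement. A minimal Python sketch, assuming a fixed 7-day notice period (the text only says "about a week"); next_reboot_day is a hypothetical helper name.

```python
import datetime as dt

WEDNESDAY = 2  # date.weekday(): Monday == 0 ... Sunday == 6


def next_reboot_day(announced: dt.date, notice_days: int = 7) -> dt.date:
    """Return the first Wednesday at least `notice_days` after `announced`."""
    earliest = announced + dt.timedelta(days=notice_days)
    # Days to move forward from `earliest` to reach a Wednesday (0 if it is one).
    offset = (WEDNESDAY - earliest.weekday()) % 7
    return earliest + dt.timedelta(days=offset)
```

For example, an announcement made on Thursday 18 June 2009 gives Wednesday 1 July 2009, while one made on Monday 15 June 2009 gives Wednesday 24 June 2009.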

In recent years the number of people who are upset by the announced 
reboots has gone down, though a few people clearly don't read our news 
items (and are hence surprised/upset), so we plan to also have an opt-in 
mailing list for 'important' items.

BTW the default reboot/shutdown procedures in EL5/SL5 don't give user 
processes very long to checkpoint themselves, and I *think* that 
networking may have been turned off by the time they get signalled.  We 
ended up adding an extra shutdown script which runs fairly early, sends 
SIGTERM to all user processes and gives them a short time to save state 
before carrying on with the shutdown/reboot.
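A rough sketch of what such an early shutdown step does, written in Python 3 for readability rather than as a real init script (the EL5/SL5 init scripts are shell, and the system Python there is much older). Assumptions, all hypothetical: "user processes" means processes whose real UID is at least 500 (the EL5 default UID_MIN) and below 65534 ("nobody"), the 30-second grace period is arbitrary, and user_pids and term_then_wait are illustrative names.

```python
import os
import signal
import time

UID_MIN = 500     # assumed first ordinary-user UID (EL5 default); adjust locally
UID_NOBODY = 65534


def user_pids(proc_root="/proc", uid_min=UID_MIN):
    """Return PIDs whose real UID is in [uid_min, UID_NOBODY), via /proc/<pid>/status."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                for line in f:
                    if line.startswith("Uid:"):
                        real_uid = int(line.split()[1])  # real, effective, saved, fs
                        if uid_min <= real_uid < UID_NOBODY:
                            pids.append(int(entry))
                        break
        except (FileNotFoundError, ProcessLookupError):
            continue  # process exited while we were looking
    return pids


def is_alive(pid):
    """Signal 0 checks existence without delivering anything."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists but owned by someone else
    return True


def term_then_wait(pids, grace=30.0, poll=1.0, send=os.kill, alive=is_alive):
    """SIGTERM every pid, then wait up to `grace` seconds for them to exit.

    Returns the PIDs still running afterwards; the normal shutdown
    sequence deals with those."""
    for pid in pids:
        try:
            send(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # already gone
    deadline = time.monotonic() + grace
    remaining = [p for p in pids if alive(p)]
    while remaining and time.monotonic() < deadline:
        time.sleep(poll)
        remaining = [p for p in remaining if alive(p)]
    return remaining
```

Run as root early in the shutdown sequence (before networking is stopped), this would be called as something like term_then_wait(user_pids(), grace=60). The send and alive parameters exist only so the logic can be exercised without signalling real processes.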

I'm not sure if it was any different in earlier versions, but we got more 
complaints after the update to SL5...
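An early SIGTERM only helps programs that react to it. For user-written code, a minimal Python sketch of the usual pattern: the signal handler only sets a flag, and the main loop writes a checkpoint and exits cleanly when it sees the flag. The JSON checkpoint format, the loop body and the names run, save_checkpoint and load_checkpoint are placeholders, not anything from a real package.

```python
import json
import os
import signal
import tempfile

stop_requested = False


def request_stop(signum, frame):
    # Only set a flag: doing real I/O inside a signal handler is fragile.
    global stop_requested
    stop_requested = True


def save_checkpoint(path, state):
    # Write to a temp file in the same directory, then rename over the old
    # checkpoint, so a half-written file never replaces a good one.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"step": 0, "total": 0}  # fresh start


def run(path, n_steps, checkpoint_every=1000):
    signal.signal(signal.SIGTERM, request_stop)
    state = load_checkpoint(path)  # resume where we left off, if possible
    while state["step"] < n_steps:
        state["total"] += state["step"]  # stand-in for one unit of real work
        state["step"] += 1
        if stop_requested or state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
            if stop_requested:
                return state  # exit cleanly; the next run resumes from here
    save_checkpoint(path, state)
    return state
```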

-- 
/--------------------------------------------------------------------\
| "Computers are different from telephones.  Computers do not ring." |
|       -- A. Tanenbaum, "Computer Networks", p. 32                  |
---------------------------------------------------------------------|
| Jon Peatfield, _Computer_ Officer, DAMTP,  University of Cambridge |
| Mail:  [log in to unmask]     Web:  http://www.damtp.cam.ac.uk/ |
\--------------------------------------------------------------------/
