SCIENTIFIC-LINUX-USERS Archives

August 2013

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
From:
John Lauro <[log in to unmask]>
Reply To:
John Lauro <[log in to unmask]>
Date:
Sat, 3 Aug 2013 21:28:20 -0400
----- Original Message -----
> From: "Nico Kadel-Garcia" <[log in to unmask]>
>
> It's exceedingly dangerous in a production environment. I've helped
> run, and done OS specifications and installers for a system over
> 10,000 hosts. and you *never*, *never*, *never* auto-update them
> without warning or outside the maintenance windows. *Never*. If I
> caught someone else on the team doing that as a matter of policy, I
> would have campaigned to have them fired ASAP.


If you have to manage 10,000 hosts, then you are lucky you never had to learn to deal with having no maintenance window and zero downtime, where most of your maintenance has to be possible outside of a maintenance window.  That is how many IT shops with thousands of machines have to operate these days.  You might even want to read up on Netflix's thoughts on Chaos Monkey.  Autoupgrades are just another form of random outage you might have to deal with.  As long as you have different hosts upgrading on different days and times, and you have automated routines that test servers and take them out of service automatically if things fail, then autoupgrades are perfectly fine.  If things break from the autoupgrades, the update history makes it obvious which machines broke because of them.
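
A rough sketch of the staggering idea, not any particular site's tooling: each host gets a stable update slot (weekday and hour) derived from a hash of its hostname, so upgrades are spread across the week, and failed hosts can be grouped by slot to see whether a breakage lines up with an update. The names update_slot and suspect_slots and the hostnames are hypothetical, and the health checks that would actually pull a server out of service are assumed to exist elsewhere.

```python
import hashlib
from collections import Counter
from typing import Iterable, List, Tuple

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def update_slot(hostname: str) -> Tuple[str, int]:
    """Map a hostname to a stable (weekday, hour) update slot.

    Hashing the name keeps the slot the same on every run while
    spreading hosts across the 7 * 24 hourly slots of a week.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:4], "big") % (7 * 24)
    return DAYS[slot // 24], slot % 24


def suspect_slots(failed_hosts: Iterable[str]) -> List[Tuple[Tuple[str, int], int]]:
    """Count failed hosts per update slot, most affected slot first.

    A slot with many failures suggests the breakage came from
    whatever was installed during that slot.
    """
    return Counter(update_slot(h) for h in failed_hosts).most_common()


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    for host in ("web001.example.org", "web002.example.org", "db001.example.org"):
        day, hour = update_slot(host)
        print(f"{host}: autoupgrade on {day} at {hour:02d}:00")
```

With a scheme like this only a small fraction of the fleet upgrades in any given hour, so a bad package shows up as a few failures in one slot rather than as a site-wide outage.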

Campaigning to have someone fired without even hearing their reason for upgrading, or even warning them first that standard practice at your location is never to autoupgrade because you have a separate QA process that even critical security patches must go through, is a very bad practice on your part.

I am not going to state what patch policy I use, only that different policies work for different environments.  Based on your statement, it sounds like you could be losing some valuable co-workers by lobbying to get people fired who have a different opinion from yours, instead of trying to educate and/or learn from each other.  If you feel you cannot learn from your peers, you have already proven yourself correct in that respect, but you have also shown there is much you don't know by being incapable of learning new things.


(Personally, I would hate to use Nagios for 10,000 hosts.  It didn't really scale that well IMHO, but to be honest I haven't bothered looking at it in over 4 years, and maybe it's improved.  I'm not familiar with Icinga, but I have had good luck with Zabbix at large scale.)
