CMS_UAF_USERS Archives

November 2019, Week 3

CMS_UAF_USERS@LISTSERV.FNAL.GOV

Subject:
From: David A Mason <[log in to unmask]>
Reply To: David A Mason <[log in to unmask]>
Date: Wed, 20 Nov 2019 19:54:07 +0000
Content-Type: text/plain
Parts/Attachments: text/plain (37 lines)
OK -- we think we're back up.  NFS mounts on the SL7 nodes in particular were challenging after this downtime.  Sorry for the delay -- let us know if you see any remaining issues!

--Dave

> On Nov 18, 2019, at 5:34 PM, David A Mason <[log in to unmask]> wrote:
> 
> Greetings,
> 
> We have a couple of updates to this -- most importantly, the downtime will need to start an hour earlier due to the currently observed duration of filesystem syncs, so we now expect it to start at 7 AM.  It is probably safe to assume we'll be chasing down issues related to this throughout the morning.
> 
> ALSO -- the /publicweb mount, because it is hosted by a different server, will be out of action starting at 3 AM, and probably not back until systems are rebooted following the home area switch.
> 
> Reminder: this downtime affects not just the LPC interactive nodes, but also desktop nodes that mount these filesystems.
> 
> Thanks for your patience!
> 
> --Dave
> 
>> On Nov 15, 2019, at 9:59 AM, David A Mason <[log in to unmask]> wrote:
>> 
>> Good Morning!
>> 
>> The LPC home areas are being migrated to new hardware and need a final sync for the migration to be complete.  This is scheduled for 8 AM-10 AM FNAL time on Nov 20.  During that window your home area should still be available, but read-only.  Once the sync is complete, we expect to reboot the interactive nodes to make sure they mount the new storage properly.  After the change, home areas should behave the same as before, with the same quotas (but will be shinier and newer looking).
>> 
>> In addition to this we plan a minor EOS version upgrade.  We began upgrading the EOS storage nodes behind the scenes in the last few days; the last step of that upgrade will happen during this downtime on the 20th.  Expect a (hopefully brief) period of EOS unavailability when that happens.
>> 
>> Also at this time we'll be implementing a limit on the size of the user sandbox transferred to LPC HTCondor jobs (which will still be about 10x the limit CRAB imposes on CRAB jobs).  Once the limit is in place, condor_submit will fail for jobs with an input sandbox that is too large.  If you need a large sandbox, we recommend bringing it in from EOS instead -- see the docs here:  https://uscms.org/uscms_at_work/computing/setup/batch_systems.shtml#condor_3
>> 
>> 
>> Thanks for your patience -- more noise as the downtime gets closer...
>> 
>> 
>> --Dave
>> 
>> 
> 
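For anyone preparing for the sandbox limit mentioned in the quoted message, a quick pre-submit size check can be sketched as below. This is an illustrative sketch only: the `my_sandbox` directory and the limit value are hypothetical placeholders (the actual cap is whatever the LPC admins configure), and staging large inputs from EOS per the linked docs remains the recommended approach.

```python
import os

# Hypothetical cap for illustration only -- the real LPC limit is set by the
# admins (roughly 10x the CRAB sandbox limit, per the announcement above).
SANDBOX_LIMIT_BYTES = 1000 * 1024 * 1024

def sandbox_size_bytes(paths):
    """Sum the on-disk sizes of the files/directories HTCondor would transfer."""
    total = 0
    for p in paths:
        if os.path.isfile(p):
            total += os.path.getsize(p)
        else:
            # Walk directories recursively, adding up every regular file.
            for root, _dirs, files in os.walk(p):
                for name in files:
                    total += os.path.getsize(os.path.join(root, name))
    return total

if __name__ == "__main__":
    # "my_sandbox" is a hypothetical directory holding the files you would
    # list in transfer_input_files of your submit description.
    size = sandbox_size_bytes(["my_sandbox"])
    if size > SANDBOX_LIMIT_BYTES:
        print("Sandbox too large -- stage big inputs from EOS instead.")
```

Running this before condor_submit gives an early warning locally instead of a failed submission at the schedd.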
