SCIENTIFIC-LINUX-USERS Archives

December 2020

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

From: ~Stack~
Date: Sat, 12 Dec 2020 08:42:02 -0600

On 12/11/20 10:09 AM, Brett Viren wrote:
> My hope is they (we) take this current situation as a lesson and make a
> radical change that puts all of our computing on more sustainable
> footing as we go into the next decades.

I'm curious about your thoughts on what it means to have that 
sustainable footing going forward.

We have been pushing our users to Singularity images for the last two 
years (we jumped on pretty early). A LOT of our application/code base is 
already Singularity behind the scenes. The users don't know and don't 
care because their applications still run the same on the same HPC 
equipment. However, getting our users to purposefully think in terms of 
Singularity images has been a long hard road and we still have so much 
further to go.

We are on the edge of shifting a few very critical and heavy
computations to Kubernetes. I'm not yet convinced that it will replace a
lot of the hard-core traditional HPC workloads anytime soon, but there
is a surprising number of workloads that can move. Plus, it allows us to
automate from Code->GitLab->CI/CD->Kubernetes->results delightfully well.

But one of the absolute greatest things about it from the perspective of 
what CentOS just pulled is that my dev Kubernetes cluster has three OSes:
SL7, Ubuntu 20.04, and CentOS 8 (I JUST spun this up the Monday before the
announcement). As an admin, I _don't_ care about the OS at this point of 
the Kubernetes process. I kill a node and rebuild it to anything that 
supports the docker requirements (plus a few other things I need for 
company audit/security) and join it to the cluster. Done! When I killed 
that CentOS 8 node I suffered no loss in the slightest in terms of 
functionality and only about an hour of time where I had to move the
workload and rebuild the node as Ubuntu.
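
For anyone who has not done this before, the "move the workload and kill
the node" half of that can be sketched roughly as below. This is only an
illustration under assumptions, not the exact procedure described above:
the node name and timeout are hypothetical, and the rebuild and re-join
step is left out because it depends on how the cluster was installed.

```python
import shlex
from typing import List


def node_retirement_commands(node: str, drain_timeout: str = "10m") -> List[List[str]]:
    """Return, in order, the kubectl commands that retire one worker node.

    Assumes kubectl is already configured for the cluster. The rebuild and
    re-join steps are not covered because they are installer-specific.
    """
    return [
        # Drain marks the node unschedulable and evicts its pods so the
        # scheduler moves the workload to the remaining nodes.
        ["kubectl", "drain", node,
         "--ignore-daemonsets",
         "--delete-emptydir-data",
         f"--timeout={drain_timeout}"],
        # Remove the node object so the rebuilt host can register cleanly.
        ["kubectl", "delete", "node", node],
    ]


if __name__ == "__main__":
    # Hypothetical node name; print the commands rather than running them.
    for cmd in node_retirement_commands("centos8-worker-01"):
        print(shlex.join(cmd))
```

The sketch only prints the commands so they can be reviewed first. Note
that older kubectl releases spell the emptyDir flag --delete-local-data.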

In bigger shops with decent-sized teams, these transitions can be done over
time. But the vast majority of my career I've supported hundreds of 
compute nodes where the entire HPC team was just me plus my manager and 
we had to support the clusters for 5-8 years (especially when I was in 
the university world). I sympathize with the small HPC teams that just
have neither the time nor the flexibility to migrate. Although I would
HEAVILY suggest that they make the time to learn Singularity, I don't
expect them to make the transition to Kubernetes without some drastic
changes.

I'm just curious what you are thinking about what it means to have a 
more sustainable footing within these clusters and what we as a 
community can do to lead the way such that in the next decades it 
matters less what OS is running on the hardware of these long-term
science HPC clusters.

~Stack~
