Subject: | |
From: | |
Reply To: | ~Stack~ |
Date: | Sat, 12 Dec 2020 08:42:02 -0600 |
Content-Type: | text/plain |
On 12/11/20 10:09 AM, Brett Viren wrote:
> My hope is they (we) take this current situation as a lesson and make a
> radical change that puts all of our computing on more sustainable
> footing as we go into the next decades.
I'm curious about your thoughts on what it means to have that
sustainable footing going forward.
We have been pushing our users to Singularity images for the last two
years (we jumped on pretty early). A LOT of our application/code base is
already Singularity behind the scenes. The users don't know and don't
care because their applications still run the same on the same HPC
equipment. However, getting our users to purposefully think in terms of
Singularity images has been a long hard road and we still have so much
further to go.
We are on the edge of shifting a few very critical and heavy
computations to Kubernetes. I'm not yet convinced that it will replace a
lot of the hard-core traditional HPC workloads anytime soon, but there
are a surprising number of workloads that can move to it. Plus, it allows us to
automate from Code->GitLab->CI/CD->Kubernetes->results delightfully well.
But one of the absolute greatest things about it from the perspective of
what CentOS just pulled is that my dev Kubernetes has three OSes: SL7,
Ubuntu 20.04, and CentOS 8 (I JUST spun this up the Monday before the
announcement). As an admin, I _don't_ care about the OS at this point of
the Kubernetes process. I kill a node and rebuild it to anything that
supports the Docker requirements (plus a few other things I need for
company audit/security) and join it to the cluster. Done! When I killed
that CentOS 8 node I suffered no loss in the slightest in terms of
functionality and only about an hour of time where I had to move the
workload and rebuild the node as Ubuntu.
In bigger shops with decent-sized teams, these transitions can be done over
time. But for the vast majority of my career I've supported hundreds of
compute nodes where the entire HPC team was just me plus my manager and
we had to support the clusters for 5-8 years (especially when I was in
the university world). I sympathize with the small HPC teams that just
don't have the time or flexibility to migrate. Although I would
HEAVILY suggest that they make the time to learn Singularity, I don't
expect them to make the transition to Kubernetes without some drastic
changes.
I'm just curious what you are thinking about what it means to have a
more sustainable footing within these clusters and what we as a
community can do to lead the way such that in the next decades it
matters less what OS is running on the hardware of these long-term
science HPC clusters.
~Stack~