OpenStack as an HPC Provider

  • By Jamie Duncan
  • Thu 01 December 2016
  • Updated on Sat 10 December 2016

At Supercomputing '16 (SC16) I spent a lot of time talking about OpenStack.

OpenStack is the leading open source IaaS platform out there right now. It was started in 2010 by NASA Ames and Rackspace Hosting.

OpenStack (and virtualization in general) has matured a lot over the last few years. Performance inside a virtual machine is now high enough that it is a legitimate landing zone for an increasing number and variety of HPC workloads.

In the Red Hat booth on the floor at SC16, I had a demo where I deployed multiple Slurm Workload Manager clusters inside an OpenStack instance and had them ready to do work. The demo highlighted a few key features of OpenStack, along with a little bit of Ansible to show how they can work together.

Here’s a great interview with Dan McGuan talking about exactly this thing.

Demo Platform

I built out the demo in a Red Hat OpenStack 8 installation we maintain in a lab environment in our headquarters in Raleigh. It is a pretty standard install that uses Nuage Networks as the SDN solution.

With that said, this demo should run on any OpenStack platform with minimal, if any, changes.

Slurm

The workload manager I picked for the initial demo was Slurm. It’s widespread and open source. I am also currently working on a version of the demo using Univa Grid Engine.

The Slurm source is easy enough to build, and I found a solid how-to that outlines how to build Slurm RPMs and configure them.

I started with the publicly available RHEL 7.2 qcow2 image.

I ended up with two images that I used in my deployment.

Master Node Image

The master node image served as the template for both images. Following the walkthrough, I enabled slurmctld along with slurmd via systemctl. A few other highlights (a quick shell sketch of these steps follows the list):

  • I set up the hosts file with pre-assigned static IPs for each node

  • I set up ssh keys on the master node so it could ssh to itself and to each worker node

  • I enabled ntpd so time stays synchronized across the nodes (massively important for HPC workloads)
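
Condensed into shell, that image prep looks roughly like this. It is a sketch only: the package names, hostnames, and addresses are illustrative, not taken from the actual image or the how-to.

# install munge and the Slurm RPMs built per the how-to (package names are illustrative)
yum install -y munge slurm slurm-munge

# the master image runs the controller, the compute daemon, and ntpd
systemctl enable munge slurmctld slurmd ntpd

# static /etc/hosts entries for every node (names and addresses are illustrative)
cat >> /etc/hosts <<'EOF'
192.168.100.10 node0.example.com node0
192.168.100.11 node1.example.com node1
EOF

# passwordless ssh from the master to itself and to every worker
ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys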

Worker Node Image

The worker node image is exactly the same as the master image, with the only difference being that slurmctld is disabled via systemctl.
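
In image-prep terms, that difference roughly amounts to:

# worker image: keep slurmd, but never start the controller
systemctl disable slurmctld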

Code Repository

I created a GitHub repository to hold the other configuration components I needed.

Creating the Slurm Cluster

To deploy a cluster, I launch the Heat stack via the command line, the API, or the GUI.
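
For the command-line path, the launch looks roughly like this. The template file name, parameter name, and stack name are placeholders, not the actual files from the repository, and the exact client syntax depends on your OpenStack client version.

# launch a Slurm cluster from the Heat Orchestration Template
# (file, parameter, and stack names are illustrative)
openstack stack create -t slurm_cluster.yaml \
    --parameter key_name=demo-key \
    slurm-cluster-01

# watch the stack come up
openstack stack list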

Figure 1. Specifying the HOT
Figure 2. Executing the template

The deployment typically takes about 90 seconds, depending on the OpenStack instance you are using.

Figure 3. Heat Template Map
Figure 4. Network Topology

Highlights

Each Slurm cluster is five nodes, and every cluster has the same hostnames and IP addresses. This is possible because each cluster also creates its own SDN network and router. So you don't get IP/hostname collisions, and your jobs are easier to copy and paste between clusters.

Because each cluster gets its own network and router, you can also deploy the same HOT multiple times without conflicts.
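
A sketch of what those per-cluster network resources look like in a HOT follows. The resource names, CIDR, and external network name are assumptions; the actual template in the repository may differ.

heat_template_version: 2015-10-15

resources:
  # private network and subnet created fresh for every cluster,
  # so identical addresses never collide between clusters
  cluster_net:
    type: OS::Neutron::Net

  cluster_subnet:
    type: OS::Neutron::Subnet
    properties:
      network: { get_resource: cluster_net }
      cidr: 192.168.100.0/24

  # dedicated router uplinks this cluster to the external network
  # ("public" is an illustrative external network name)
  cluster_router:
    type: OS::Neutron::Router
    properties:
      external_gateway_info:
        network: public

  router_interface:
    type: OS::Neutron::RouterInterface
    properties:
      router: { get_resource: cluster_router }
      subnet: { get_resource: cluster_subnet }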

Security Basics

The only system with a floating IP is the control node, which lets you access the cluster via ssh. You could restrict this access further, depending on your needs; since this is a demo, I wanted to be able to get into the system easily.
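
If you did want to lock that down, one option is to scope the ssh rule in the master's security group to a trusted source range. The group name and CIDR below are illustrative.

# allow ssh only from a trusted range (group name and CIDR are illustrative)
openstack security group rule create \
    --protocol tcp --dst-port 22 \
    --remote-ip 203.0.113.0/24 \
    slurm-master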

Figure 5. Instance List for a cluster

Preparing the Slurm Cluster

Initial login

You can log into the master node as cloud-user (remember, it's the standard RHEL 7.2 image). Inside the Heat stack, you can specify the public ssh key you want supplied to each of the hosts.
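
The login itself is plain ssh against the master node's floating IP; the address and key path here are placeholders.

# floating IP and key path are illustrative
ssh -i ~/.ssh/demo-key cloud-user@10.11.12.13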

Figure 6. Initial login to the master node and running the prep playbook

Lab Prep with Ansible

The GitHub repository is checked out into /root/rhissr on the master node image. From that directory you can run the lab prep playbook:

[root@master ~]# cd rhissr/
[root@master rhissr]# ansible-playbook -i inventory reset_lab.yaml

This does a few things:

  • stop all slurm services

  • clear out the slurm spool directories

  • restart the slurm services

This ensures the systems are functioning cleanly and able to communicate with one another.
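
A minimal sketch of what a reset playbook like that might contain follows. The task names, spool paths, and the "master" host group are assumptions; the actual reset_lab.yaml in the repository may differ.

---
# stop slurmd everywhere and clear its spool directory
- hosts: all
  become: true
  tasks:
    - name: stop slurmd
      service: { name: slurmd, state: stopped }

    # path depends on SlurmdSpoolDir in slurm.conf; illustrative here
    - name: clear the slurmd spool directory
      file: { path: /var/spool/slurmd, state: absent }

    - name: recreate the slurmd spool directory
      file: { path: /var/spool/slurmd, state: directory }

    - name: start slurmd again
      service: { name: slurmd, state: started }

# clean and restart the controller last so it comes up with fresh node state
- hosts: master
  become: true
  tasks:
    # path depends on StateSaveLocation in slurm.conf; illustrative here
    - name: clear the controller state directory
      file: { path: /var/spool/slurmctld, state: absent }

    - name: recreate the controller state directory
      file: { path: /var/spool/slurmctld, state: directory }

    - name: restart slurmctld
      service: { name: slurmctld, state: restarted }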

Running a job using sbatch

Now that you have a fully functional Slurm cluster, let's run a simple job to confirm everything is working correctly.

[root@master rhissr]# cat sbatch.sh
#!/usr/bin/env bash

# Usage: sbatch -N5 $this_file

#SBATCH -o slurm.out
#SBATCH -p sc16
#SBATCH -D /tmp

srun hostname | sort
[root@master rhissr]# sbatch -N5 sbatch.sh
Submitted batch job 3
[root@master rhissr]# squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[root@master rhissr]# cat /tmp/slurm.out
node0.example.com
node1.example.com
node2.example.com
node3.example.com
node4.example.com

This script is incredibly simple. It just goes out to each Slurm node and grabs its hostname to confirm everything is working properly.

Scaling from 5 to 50 to 500

Right now, to add additional nodes, you copy and paste a few stanzas in the HOT.
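
For reference, a worker stanza in a HOT looks roughly like this. The resource, image, flavor, and network names are illustrative, not taken from the actual template.

  node5:
    type: OS::Nova::Server
    properties:
      name: node5
      # the worker image built above; name is illustrative
      image: rhel-7.2-slurm-worker
      flavor: m1.medium
      key_name: { get_param: key_name }
      networks:
        - port: { get_resource: node5_port }

  node5_port:
    type: OS::Neutron::Port
    properties:
      network: { get_resource: cluster_net }
      fixed_ips:
        - ip_address: 192.168.100.15

Wrapping the worker definition in an OS::Heat::ResourceGroup with a count parameter is one way to turn that copy-and-paste step into a single-number change, which ties into the next steps below.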

Next Steps

  • Make a more science-y demo

  • Finish the Univa Grid Engine variant

  • Make scaling easier

Summary

OpenStack is an effective solution for an increasing number of HPC workloads. This demo shows how you can combine OpenStack and Ansible to quickly create a viable HPC platform. The inherently multi-tenant nature of OpenStack means that multiple scientists can run their jobs simultaneously without stepping on each other. At the end of the day, it's all about "time-to-science", and this approach can help make that time shorter.

openstack-hpc

tags: hpc, sc16