Introduction

Outline

  • Day 1
    • Introduction to Pronghorn
    • SLURM
    • Connecting to Pronghorn

Day 1

Introduction to Pronghorn

Pronghorn is the University of Nevada, Reno’s High-Performance Computing (HPC) cluster. The GPU-accelerated system is designed, built, and maintained by the Office of Information Technology’s HPC Team. Pronghorn and the HPC Team support general research across the Nevada System of Higher Education (NSHE).

Pronghorn is composed of CPU, GPU, Visualization, and Storage subsystems interconnected by a 100Gb/s non-blocking Intel Omni-Path fabric. The CPU partition features 108 nodes, 3,456 CPU cores, and 24.8TiB of memory. The GPU partition features 44 NVIDIA Tesla P100 GPUs, 352 CPU cores, and 2.75TiB of memory. The Visualization partition is composed of three NVIDIA Tesla V100 GPUs, 48 CPU cores, and 1.1TiB of memory. The storage system uses the IBM SpectrumScale file system to provide 2PiB of high-performance storage. The computational and storage capabilities of Pronghorn will regularly expand to meet NSHE computing demands.

Pronghorn is colocated at the Switch Citadel Campus, located 25 miles east of the University of Nevada, Reno. Switch is a leader in sustainable data center design and operation. The Switch Citadel is rated Tier 5 Platinum and is planned to be the largest, most advanced data center campus on the planet.

Pronghorn is available to all University of Nevada, Reno faculty, staff, students, and sponsored affiliates. Priority access to the system is available for purchase.

First up, let’s talk about what a high-performance computing (HPC) cluster is: really, it is a bunch of individual computers (“nodes”), just like the ones you are using, strung together with networking cables, with the ability to easily deploy “jobs” (some computational task you are trying to accomplish) across multiple nodes. As such, we can determine how many cores we have access to by counting the number of cores on each individual node and summing them all up. Pronghorn has 3,456 CPU cores that (in theory) we have access to! In a perfect world (more on that later), you COULD divide the amount of time it takes to do a job by the number of cores you throw at it. With Pronghorn, you could theoretically do 10 YEARS of sequential calculations in less than one day! Put another way, Pronghorn’s capabilities are 864 times faster than my four-core Windows machine.

Your desktop or laptop is all yours, generally, so you aren’t sharing its resources with anyone else. You’ve effectively pre-paid for ~5 years of computational time (the warranty period!) times the number of cores you have, so I’ve bought about 20 years of CPU-time on my Windows desktop and 40 years of CPU-time on my Mac laptop. Pronghorn, assuming a 5-year lifespan, has 17,280 years (!) of CPU-time, all of which was purchased in advance. While you are probably OK with your laptop/desktop just sitting there idle not doing much, a research computer like Pronghorn is designed to be used at near-capacity! Also, this is a SHARED MACHINE, and as such much of the process of getting your programs to run on it requires some understanding of how the system shares its resources amongst all the users! Enter SLURM.

SLURM

SLURM is what is known as a workload manager. SLURM’s job is to take the vast number of different jobs submitted by all users of the system, reserve “resources” (number of nodes per job, number of cores per node, memory per job), and then execute the jobs based on each user’s or association’s priority.

A “job” is basically the top level of what you are trying to accomplish: a workflow, a set of commands/programs to run, etc. Typically we define a single job at a time and submit it to the SLURM system. Within the job are “steps”, which can run sequentially or in parallel depending on the particulars of your workflow. A step consists of one or more “tasks”. Each task runs on one or more “cpus” (a “cpu” is the same as a logical core in SLURM parlance). Parallelization can occur at multiple levels: job, step, and task.

SLURM uses a “job script” written in any interpreted language that uses “#” as the comment character; typically we’ll use the “bash” language to create a job. This job script follows a very specific format that you will get familiar with. Your job script 1) tells SLURM what resources you need, and 2) once the resources are allocated, tells it what programs to execute and how to allocate the resources to those programs.
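To make that format concrete, here is a minimal sketch of a bash job script. The resource values here are illustrative placeholders, not Pronghorn-specific recommendations; the partitions, accounts, and limits valid on Pronghorn should come from the HPC Team’s documentation.

```shell
#!/bin/bash
#SBATCH --job-name=hello        # a name for the job in the queue
#SBATCH --ntasks=1              # number of tasks (processes)
#SBATCH --cpus-per-task=1       # logical cores ("cpus") per task
#SBATCH --mem=1g                # memory for the whole job
#SBATCH --time=00:05:00         # wall-clock limit (HH:MM:SS)
#SBATCH --output=hello-%j.out   # %j expands to the job ID

# Everything below runs once the requested resources are allocated.
echo "Hello from Pronghorn!"
```

You would submit this with "sbatch hello.sh" and watch it in the queue with "squeue -u $USER". Note that the #SBATCH lines are ordinary bash comments: SLURM reads them, but bash ignores them, which is why the same file serves as both a resource request and a runnable script.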

As a general rule, Pronghorn is a BATCH system, which means you will focus on jobs that do not require user interaction and will often be deferred (run at some time in the future). While you CAN run “interactive jobs” on Pronghorn, these should be minimized wherever possible, since interactive jobs typically leave resources sitting idle.
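When you genuinely do need a shell on a compute node (say, to debug a failing step), SLURM’s srun can allocate resources and attach your terminal to them; keep the time limit short so the allocation is released quickly. A sketch, assuming the cluster’s default partition settings:

```shell
# request one task with one core for 30 minutes, attached to a terminal
srun --ntasks=1 --cpus-per-task=1 --mem=1g --time=00:30:00 --pty bash -i
```

When you are done, type "exit" to end the session and return the resources to the pool.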

Introduction to Unix/Linux for Bioinformatics

Pronghorn uses Linux as its underlying operating system. Understanding how to use a Unix environment and terminal to interact with files and folders is very important in bioinformatics, because a lot of bioinformatic software is meant to be run on the command line. This training will help you feel confident running these command-line tools and moving, copying, and viewing files.
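To give a flavor of what that looks like, here are a handful of the core commands we will use throughout; the file and directory names are just examples.

```shell
pwd                      # print the directory you are currently in
mkdir analysis           # make a new directory
cd analysis              # move into it
touch reads.txt          # create an empty file
cp reads.txt backup.txt  # copy it
ls                       # list the directory contents
less backup.txt          # page through a file (press q to quit)
```

Every command here works the same way on your laptop’s terminal as it does on Pronghorn, so you can practice anywhere.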

The Unix operating system has been around since 1969. Back then there was no such thing as a graphical user interface. You typed everything. It may seem archaic to use a keyboard to issue commands today, but it’s much easier to automate keyboard tasks than mouse tasks. There are several variants of Unix (including Linux), though the differences do not matter much for most basic functions.

Increasingly, the raw output of biological research exists as in silico data, usually in the form of large text files. Unix is particularly suited to working with such files and has several powerful (and flexible) commands that can process your data for you. The real strength of learning Unix is that most of these commands can be combined in an almost unlimited fashion. So if you can learn just five Unix commands, you will be able to do a lot more than just five things.
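As a tiny made-up example, suppose a text file holds one chromosome label per line. Three short commands joined by the pipe character “|” will count how often each label occurs, most frequent first:

```shell
# create a small example file, one label per line
printf 'chr1\nchr2\nchr1\nchr3\nchr1\nchr2\n' > labels.txt

# sort the labels, collapse duplicates with a count, then rank by count
sort labels.txt | uniq -c | sort -rn
# prints: 3 chr1, 2 chr2, 1 chr3 (each count on its own line)
```

None of these commands knows anything about biology; they are general-purpose text tools, and that is exactly the point: each new command you learn multiplies what your existing pipelines can do.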

Connecting to Pronghorn

In order to connect to Pronghorn, we will use ssh to reach the remote server. Below is how you would connect from a Linux or macOS computer using a terminal program; both of these operating systems ship with a terminal installed by default.


ssh yournetidhere@pronghorn.rc.unr.edu 

If this is your first time connecting to the server, you may need to accept the remote server’s key by typing “yes”.


The authenticity of host 'pronghorn.rc.unr.edu (134.197.76.4)' can't be established.
ED25519 key fingerprint is SHA256:QAFX5eUaSvFi3/+IRuP6Zm8RM6OcGRZb5vySBgq/yZ4.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes

You will then be prompted to type in your NetID password. As you type, the cursor will not move or display any text, in order to keep your password secure.

An alternative terminal that I like on macOS is iTerm2: https://iterm2.com/

Recent versions of Windows 10 and 11 ship with a built-in OpenSSH client (available from PowerShell or the Command Prompt), so the ssh command above may work as-is. On older systems, or if the command is not found, you will need to set up a program in order to remotely connect to Pronghorn. Visit one of the websites below and download the appropriate installation file for your computer.

PuTTY (installation): https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html

WSL (tutorial to enable WSL) https://www.omgubuntu.co.uk/how-to-install-wsl2-on-windows-10

Once installed, you will configure the connection information similar to above.