Using the TACC cluster

Useful information

  • TACC getting started guide
  • TACC conda and advanced guide
  • Run Jupyter in browser via TACC Vis portal.
  • Notes on getting started with Python and conda here
  • Notes on using SSH and configuring Lonestar6 here

Log in to your account

ssh USERNAME@ls6.tacc.utexas.edu 

You will be prompted to enter your password and complete two-factor authentication. Currently, you need to type in a six-digit code from your phone; you can’t approve a push notification (as far as I’m aware).
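If you log in often, an alias in ~/.ssh/config saves some typing. A minimal sketch (the host alias ls6 is just an example; substitute your own username):

Host ls6
    HostName ls6.tacc.utexas.edu
    User USERNAME

After this, ssh ls6 is equivalent to the full command above; you will still be prompted for your password and token code.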

General guidelines and quotas

  • $HOME is your home directory on TACC. It has a strict storage limit of 10 GB, but it is backed up. This is not a suitable place for long-term storage of large files; I mainly just keep basic scripts there. Check your disk usage with du -sh ./*/ (see the example after this list).
    • This directory receives a full backup every few months and an incremental backup every few days. You can open a support ticket to retrieve files if needed.
  • $WORK does not have the same storage limit, but it is also not backed up. For machine learning, I have found it necessary to install conda within $WORK, since environments can get pretty large.
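For example, to check usage of both directories, plus a per-subdirectory breakdown of $WORK:

du -sh $HOME
du -sh $WORK
cd $WORK && du -sh ./*/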

Most of the time you will want to work in the $WORK folder. It is also the best place to install large programs, given the storage limits on the other partitions.

cd $WORK

Submitting a job

Write a batch script (see the SLURM template at the bottom of this page), then submit it

sbatch FILENAME.sbatch

Check on your job status

squeue -u USERNAME
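If you need to kill a job before it finishes, pass the job ID (printed by sbatch and shown in the squeue output) to scancel:

scancel JOBID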

Using conda on Lonestar

Since conda is not installed by default, you cannot currently use module load conda as you would on Stampede.

In your user folder, make a directory called src, then use wget to download the 64-bit Linux Miniconda installer into it (go to the Miniconda page, find the first Linux installer, and right-click to copy the link address).

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

Follow the Miniconda instructions to run the installer bash script. When asked for an install location, specify a directory inside $WORK (for example, $WORK/miniconda3) rather than the default location in $HOME.
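One way to run the installer non-interactively is shown below; -b accepts the license in batch mode and -p sets the install prefix ($WORK/miniconda3 here, matching the paths used in the rest of these notes):

bash Miniconda3-latest-Linux-x86_64.sh -b -p $WORK/miniconda3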

Add your conda install to your path

export PATH="$WORK/miniconda3/bin:$PATH"
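Optionally, append the same line to your ~/.bashrc so it persists across logins (skip this if you prefer to rely on conda init, which is used further below):

echo 'export PATH="$WORK/miniconda3/bin:$PATH"' >> ~/.bashrc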

Test that everything is working

conda info --envs

You can create environments, activate them, install packages, etc. as you would with a local conda install.
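For example, a throwaway environment (the name and Python version are arbitrary; if conda activate complains that your shell is not set up, run conda init bash and log in again first):

conda create -n test-env python=3.11
conda activate test-env
conda install numpy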

As of writing, automatically loading conda environments within batch scripts can be a bit challenging. Currently, the following lines work within a .sbatch file; see the full sbatch template at the bottom of this page.

module load gcc
# make the conda command available in the non-interactive batch shell
source ~/work/miniconda3/etc/profile.d/conda.sh
conda init bash
# activate the environment by its full path
conda activate ~/work/mambaforge/envs/dedalus
python kolmo.py

Using mamba on Lonestar6

Install it from the command line. During installation, manually install mambaforge to $WORK instead of the default location in $HOME.
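Mambaforge installers are distributed by the conda-forge miniforge project. Assuming you have downloaded an installer script named Mambaforge-Linux-x86_64.sh from its releases page, a non-interactive install into $WORK looks like this (-b accepts the license, -p sets the prefix; adjust the prefix if you keep it elsewhere, e.g. the snippets below refer to it as ~/work/mambaforge):

bash Mambaforge-Linux-x86_64.sh -b -p $WORK/mambaforge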

You will need to manually add the mambaforge bin directory to your PATH

export PATH="$HOME/work/mambaforge/bin:$PATH"

You may also need to manually run init

~/work/mambaforge/condabin/mamba init

Using PyTorch on Lonestar6

In a clean conda environment, install pytorch, torchvision, and torchaudio (the example below pins cudatoolkit 11.1; use the command for the CUDA version you want from the PyTorch install page)

conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia

To use a GPU, make sure that all of your .sbatch scripts load a CUDA module matching the CUDA version of your PyTorch build, for example

module load cuda/12.0
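On a GPU node, a quick sanity check that PyTorch can see the device:

python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"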

Using Jupyter notebooks on TACC

In a web browser, visit the TACC Vis portal.

Create a job with your desired resources and partition. Leave the “reservation” field blank.

Using JAX with GPU on TACC

Make sure that the JAX build you install in your environment matches the version of CUDA that you are running. The available builds are listed in the JAX installation guide.

For example, to install a CUDA 12 build

pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Next, make sure that all of your .sbatch scripts load the matching CUDA version

module load cuda/12.0

You can check that everything is working by opening a Python prompt and running

import jax
jax.devices()

At least one GPU device should appear in the list, rather than only CPU devices.

Using zsh shell on TACC

Instructions by Jeffrey Lai

Instructions for downloading zsh + oh-my-zsh on LS6:

First, download the zsh source tarball:

wget -O zsh.tar.xz https://sourceforge.net/projects/zsh/files/latest/download
mkdir zsh && unxz zsh.tar.xz && tar -xvf zsh.tar -C zsh --strip-components 1
cd zsh

Compile and install it into your home directory:

./configure --prefix=$HOME
make
make install

Place the following lines at the bottom of $HOME/.bashrc

rm -rf $HOME/.zcompdump*
exec $HOME/bin/zsh -l

Restart the shell to run zsh (reset or zsh). Then install oh-my-zsh:

sh -c "$(curl -fsSL https://raw.githubusercontent.com/robbyrussell/oh-my-zsh/master/tools/install.sh)"

Then place the following line at the top of the $HOME/.zshrc file

export FPATH=$HOME/share/zsh/5.9/functions:$FPATH

Finally, restart the shell and enjoy the features of oh-my-zsh!

Template for SLURM on TACC {template}

#!/bin/bash
# Job name:
#SBATCH --job-name=myjob
#
# Account to charge:
#SBATCH --account=[put lab account name here]
#
# Pick partition to run on:
#SBATCH --partition=gpu-a100
#
# File where job progress and standard output are written
#SBATCH --output=myjob.out
#
# File where job errors are written
#SBATCH --error=myjob.err      
#
# Request only one node:
#SBATCH --nodes=1
#
# memory per node: (uses full node memory if set to zero)
#SBATCH --mem=0  
#
# number of tasks
#SBATCH --ntasks=1
#
# Processors per task:
#SBATCH --cpus-per-task=2
#
# Wall clock limit: HH:MM:SS. Max is 48 hours on most nodes
#SBATCH --time=05:30:00
#
## Command(s) to run
python scripts/my_program.py 

For this example, you can check standard output and errors while the job is in progress by running

cat myjob.err
cat myjob.out
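
To follow the output live as the job writes it, tail works too:

tail -f myjob.out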