Cheat Sheet
Version 0.1 • Best Before 2025-12-31
Getting help 🛟
Email | Description |
---|---|
help@sharcnet.ca | For SHARCNET-specific issues |
accounts@tech | Questions about accounts |
renewals@tech | Questions about account renewals |
globus@tech | Questions about Globus file transfer services |
cloud@tech | Questions about using cloud resources |
allocations@tech | Questions about the Resource Allocation Competition |
support@tech | For any other questions or issues |
Training courses 🏫
Connecting to nibi 🔗
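A minimal connection sketch, assuming you log in over SSH with your Alliance username; the hostname below is an assumption, so check the official nibi documentation for the exact login address:
ssh your_username@nibi.sharcnet.ca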
Using cluster nibi 💦
Using modules
Command | Description |
---|---|
module avail | To list all available modules (see also https://) |
module list | To list currently loaded modules |
module spider keyword | To search for a module by keyword |
module load foo[/ver] | To load module foo [version ver] |
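For example, a typical session might look like this (the module name and version are illustrative; run module avail or module spider to see what is actually installed):
module spider gcc        # search for available GCC versions
module load gcc/12.3     # load a specific version (illustrative)
module list              # confirm which modules are now loaded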
Most commonly used Linux commands
Command | Description |
---|---|
ls | List files and directories in the current directory |
cd | Change directory, e.g. |
cd DIR | Go to directory DIR |
cd | Go back to your home directory |
cd .. | Go up to the parent directory |
pwd | Show the current directory |
mkdir | Make directories, e.g. |
mkdir dir1[ dir2[ … ]] | Make one or more directories |
mkdir -p path/to/dir | Make a directory path recursively, creating parent directories as needed |
cp source dest | Copy files |
mv source dest | Move or rename files and directories |
find | Find files or directories matching given criteria |
du -sh | Show the total disk usage of a directory |
man command | See the manual page of command |
quota | Show your disk quota and usage |
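For example, a short session combining several of these commands (the paths are illustrative):
mkdir -p ~/projects/demo    # create a nested directory under your home
cd ~/projects/demo          # move into it
pwd                         # confirm the current directory
du -sh ~/projects           # show the total size of the projects directory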
Slurm commands
The Slurm scheduler has a rich set of commands; refer to the Slurm documentation for details. The following is a list of commonly used Slurm commands:
Note
A job submission script is needed for each job. It is a shell script; see the sample scripts in the sections below (the examples use the name submit_ajob.sh).
- To submit a job using the submission script submit_ajob.sh:
sbatch submit_ajob.sh
- To see the history of your jobs, use the command sacct with options (check the Slurm documentation or the man page of sacct):
sacct -j jobid
sacct -u $USER --starttime t1 --endtime t2
sacct -u $USER -o ReqCPUs,ReqMem,NNodes,Starttime,Endtime
- To cancel a job:
scancel jobid
- To see the system information:
sinfo
- To see your queued jobs:
squeue -u $USER
- To see the fairshare:
sshare
- To see the resource-usage summary (efficiency) of a completed job:
seff jobid
- To allocate cores and/or nodes and use them interactively:
salloc --account=def-my_group_account --ntasks=32 --time=1:00
salloc --account=def-my_group_account --mem=0 --nodes=1
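Putting these together, a typical workflow might look like the following; the job ID 123456 is a placeholder for the number that sbatch prints:
sbatch submit_ajob.sh    # submit the job; prints e.g. "Submitted batch job 123456"
squeue -u $USER          # watch the job while it is queued or running
sacct -j 123456          # check its accounting record
seff 123456              # after it finishes, check CPU and memory efficiency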
Sample script for submitting a serial job
#!/bin/bash
#SBATCH --time=00-01:00:00 # DD-HH:MM:SS
#SBATCH --account=my_group_account
module load python/3.6
python simple_job.py 7 output
To submit the job, run the following command:
sbatch submit_ajob.sh
Sample script for submitting multiple jobs
#!/bin/bash
#SBATCH --time=01:00
#SBATCH --account=my_group_account
#SBATCH --array=1-200
module load python/3.12
python simple_job.py $SLURM_ARRAY_TASK_ID output
Sample script for submitting multicore threaded jobs
#!/bin/bash
#SBATCH --account=my_group_account
#SBATCH --time=0-03:00
#SBATCH --cpus-per-task=32
#SBATCH --ntasks=1
#SBATCH --mem=20G
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./myprog.exe
Sample script for submitting multiprocess parallel jobs
#!/bin/bash
#SBATCH --account=my_group_account
#SBATCH --time=5-00:00
#SBATCH --ntasks=100
#SBATCH --mem-per-cpu=4G
srun ./mympiprog.exe
Sample script for submitting a GPU job
#!/bin/bash
#SBATCH --account=my_group_account
#SBATCH --time=0-03:00
#SBATCH --gpus-per-node=h100:2
#SBATCH --mem=20G
./myprog.exe
Sample script for submitting a hybrid MPI-threaded job
#!/bin/bash
#SBATCH --account=my_group_account
#SBATCH --time=0-03:00
#SBATCH --ntasks=16 # MPI ranks
#SBATCH --cpus-per-task=4 # threads
#SBATCH --mem=20G
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun --cpus-per-task=$SLURM_CPUS_PER_TASK ./myprog
Requesting jobs scheduled by node
A sample submission script:
#!/bin/bash
#SBATCH --account=my_group_account
#SBATCH --time=0-03:00
#SBATCH --ntasks=16 # MPI ranks
XXXXXXXXXXXXXXXXXXXX
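The placeholder above marks where the node-level directives go. A minimal sketch, assuming you want one whole 192-core nibi CPU node; the core count and the --mem=0 convention (request all of the node's memory) are assumptions to verify against the nibi documentation:
#!/bin/bash
#SBATCH --account=my_group_account
#SBATCH --time=0-03:00
#SBATCH --nodes=1              # request one whole node
#SBATCH --ntasks-per-node=192  # one task per core (assumed 192-core node)
#SBATCH --mem=0                # request all of the node's memory
srun ./mympiprog.exe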
Using Python 🐍
To create a virtual environment, with NumPy as an example Python package:
module load python/3.12
virtualenv --no-download ~/ENV
source ~/ENV/bin/activate
pip install --no-index --upgrade pip
pip install --no-index numpy
If you need a specific version of a package, install it like this:
pip install --no-index numpy==1.26.4
The flag --no-index installs packages from our wheelhouse. These are always preferable to packages installed from the internet, as they are tuned to run on our systems.
To see the available wheels for a particular version of Python, use
avail_wheels numpy --all_versions -p 3.12
or see https://
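To use such an environment inside a batch job, one approach is to activate it in the submission script. A minimal sketch, assuming the environment was created at ~/ENV as above and that simple_job.py only needs NumPy:
#!/bin/bash
#SBATCH --account=my_group_account
#SBATCH --time=0-01:00
module load python/3.12       # same Python version used to build the environment
source ~/ENV/bin/activate     # activate the virtual environment
python simple_job.py 7 output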
Using Apptainer 🚢
Some packages are difficult to install in our Linux environment. The alternative is to install them in a container. Here is an example with micromamba (a conda-compatible package manager) and NumPy.
Create a file image.def with:
Bootstrap: docker
From: mambaorg/micromamba:latest
%post
micromamba install -c conda-forge numpy
Build image with
module load apptainer
apptainer build image.sif image.def
Run python in image with
apptainer run image.sif python
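To check that NumPy is actually available inside the container, one way is to pass a one-liner to the same run command used above; the version printed depends on what micromamba installed:
apptainer run image.sif python -c "import numpy; print(numpy.__version__)"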
DRAC clusters across Canada 🌐
Cluster | Cores | GPUs | Max memory | Storage |
---|---|---|---|---|
fir | 165,120 | 640 | 49TB | |
nibi | 25TB | |||
trillium | ||||
rorqual | ||||
narval |
nibi specs 📈
Nodes | Cores | Memory | CPU | GPU |
---|---|---|---|---|
700 | 192 | 768GB DDR5 | 2 x Intel 6972P @ 2.4 GHz, 384MB L3 cache | - |
10 | 192 | 6TB DDR5 | 2 x Intel 6972P @ 2.4 GHz, 384MB L3 cache | - |
36 | 192 | 1.5TB | 1 x Intel 8570 @ 2.1 GHz, 300MB L3 cache | 8 x Nvidia H100 SXM (80 GB memory) |
6 | 96 | 512GB | 4 x AMD MI300A @ 2.1GHz | 4 x AMD CDNA 3 (128 GB HBM3 memory - unified memory model) |