Determining Resources
One of the challenging parts of using a computing cluster is identifying the resources you'll need to request when you submit a job, such as memory and CPU cores. The goal of this page is to provide guidance on the various resources, along with tips you can use to find the sweet spot between performance and wait time that works best for you.
"Between performance and time" (Click to open)
When determining resources, it can be tempting to request as much as possible so your program runs in the least amount of time.
However, not all software can take advantage of additional resources, and most programs only scale well up to a point before additional resources have little impact on run time.
Additionally, if you request more resources than are currently available on the cluster (such as requesting eight nodes when only four are free), your job may wait a long time in the queue before it can start.
All of these resource requests are controlled through the #SBATCH --field=value options that you specify in your Slurm script. You can also append them to your Slurm commands to take effect for a single job only, such as sbatch --field=value my-script.sh or sinteract --field=value. Options given on the command line override any defaults and anything specified in your sbatch script.
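As a minimal sketch, a job script might look like the following. The values shown are placeholders to adjust for your own job, and my_program is a hypothetical executable:

```bash
#!/bin/bash
#SBATCH --job-name=example       # name shown in the queue
#SBATCH --partition=week         # partition to run on (the default on our clusters)
#SBATCH --time=01-00:00:00       # walltime: 1 day
#SBATCH --ntasks=4               # number of CPU cores
#SBATCH --mem=8GB                # memory for the whole job

# Replace with the actual commands for your workload
./my_program input.dat
```

Submitting this with sbatch --time=02-00:00:00 my-script.sh would override the one-day walltime above for that submission only.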
Partition
Clusters are broken up into what are known as 'partitions'. This is a feature of job scheduling systems like Slurm that groups machines together based on aspects such as hardware support, time limits, or dedication to a certain group.
Determining A Partition
The majority of jobs that use our computing infrastructure fall under the week and GPU partitions, but some may need to run longer or have access to more memory.
| Partition | Usage |
|---|---|
| week | For most CPU-based jobs that can run up to a week - this is the default if not specified |
| GPU | For jobs that want to use a graphics card - see GPU (Graphics Processing Unit) |
| month | For jobs that need to run up to a month ('batch' on BGSC) |
| highmemory | (BOSE only) For jobs that need more than 250 GB of memory, up to a maximum of 1 TB per node |
Each cluster has its own set of partitions, which you can view in more detail by clicking the button below.
Setting A Partition
Partitions are set through the --partition=X option in Slurm.
Batch (sbatch) Mode:
In your slurm script:
#SBATCH --partition=GPU
Temporary - only for a single job:
sbatch --partition=GPU my-script.sh
Interactive Mode:
sinteract --partition=GPU
Time (Walltime)
Walltime, which can be thought of as a time limit, is the amount of time a job is allowed to run. Once a job runs past its set walltime, it is automatically terminated (or requeued, if enabled). When determining when a pending/queued job is going to start, Slurm looks at the walltime of all of the other submitted jobs to provide a best estimate. To assist with scheduling, it's important to have a job's walltime be as accurate as possible.
Most of our partitions are set to a default maximum walltime of 7 days.
Determining A Walltime
If you are unsure how long your script will run for, feel free to let it use the default walltime of 7 days on the 'week' or 'GPU' partition. This is done by not specifying a time in your Slurm script.
Setting A Walltime
Setting your job's walltime is done using the --time option in Slurm. Its typical format is as follows:
#SBATCH --time=DD-HH:MM:SS
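For example, the following values illustrate the format (days-hours:minutes:seconds); the amounts themselves are just placeholders:

```bash
#SBATCH --time=00-02:30:00   # 2 hours, 30 minutes
#SBATCH --time=01-12:00:00   # 1 day, 12 hours
#SBATCH --time=07-00:00:00   # 7 days (the default maximum on most of our partitions)
```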
Tracking Walltime
Once your job completes, you can either view the email notification you received (if enabled) or run myjobreport <jobid> to view the Elapsed Time and compare it to the initially requested Time Limit.
Use this number to better inform your next run of similar jobs.
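If you prefer standard Slurm tooling, sacct can show a similar comparison. This is a sketch that assumes job accounting is enabled on the cluster; replace <jobid> with your job's ID:

```bash
# Compare how long the job actually ran (Elapsed) against the requested walltime (Timelimit)
sacct -j <jobid> --format=JobID,JobName,Elapsed,Timelimit,State
```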
Need An Extension?
If you have a running job that is nearing its time limit (run myjobs to check) and it cannot be restarted without significant loss of work, you can contact us to request an extension.
Please note that requests are subject to approval of the HPC Team and may be limited to accommodate other users on the cluster. This is especially true for limited resources such as GPUs and high-memory nodes.
CPUs and Cores
Determining Number of Cores
Some keywords to look out for in your program's documentation are "parallel", "multiprocessing", "multicore", "number of processes", and "MPI".
If you see any of these terms, it's an indication that your program may benefit from an increased number of CPU cores.
Setting Number of Cores
You can set the number of cores available to your job with #SBATCH --ntasks=x (where x is the number of cores you want) or with sinteract --ntasks=x.
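As a sketch, a CPU-parallel job might look like the following. The program name and its --threads flag are hypothetical; check your software's documentation for how it accepts a core count:

```bash
#!/bin/bash
#SBATCH --partition=week
#SBATCH --ntasks=8               # request 8 CPU cores
#SBATCH --time=02-00:00:00
#SBATCH --mem=16GB

# Pass the core count Slurm allocated to the program (hypothetical flag)
./my_parallel_program --threads "$SLURM_NTASKS" input.dat
```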
Tracking CPU Usage
You can track how much of the available CPUs are used by your job with myjobreport.
Memory
Determining Memory Usage
Memory usage depends less on which software you are running, since a small, simple program can use as much memory as a very large and complex one. Instead, it usually depends on the kind and amount of data you're processing.
When a program loads data into memory, the data usually takes up more space in memory than it does on disk (because of how the program formats it for use). If you're processing a 10 GB dataset, you will likely need to allocate 10 GB plus the memory used by the program itself, so try starting with 12 GB and adjust from there.
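If you don't know how large your dataset is on disk, a quick check before submitting can give you a starting point. This is only a rough rule of thumb, and the path below is a placeholder; the actual in-memory footprint depends on your program:

```bash
# Show the on-disk size of the dataset (human-readable)
du -sh /path/to/dataset

# Example rule of thumb: request the dataset size plus some headroom,
# e.g. a 10 GB dataset -> start with --mem=12GB and adjust after a test run
```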
Setting Memory
You can set the amount of memory (RAM) available to your job with #SBATCH --mem=x (where x is the amount of memory you want) or with sinteract --mem=x.
Memory is formatted as the two-letter representation of the unit, so:
- --mem=1MB = 1 megabyte
- --mem=1GB = 1 gigabyte
- --mem=1TB = 1 terabyte (unavailable except on high-memory nodes)
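As a quick example of these units in practice, a job processing the roughly 10 GB dataset described above might request:

```bash
#SBATCH --mem=12GB    # dataset size plus headroom for the program itself
```

or, for an interactive session, sinteract --mem=12GB.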
Tracking Memory Usage
You can track how much of the available memory is used by your job with myjobreport.
GPU (Graphics Processing Unit)
Limited Resource
This is a limited resource and may result in some wait time before your job begins. Use scontrol show node gpu[01-04] and look for AllocTRES (Used) and CfgTRES (Total) to see current available resources per node.
savail will also show you available CPU cores and memory, but it does not currently take into account the number of GPU cards available.
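As a quick sketch (assuming the GPU nodes are named gpu01 through gpu04 as above), you can filter the scontrol output down to just those fields:

```bash
# Show allocated vs. configured resources (CPU, memory, GPUs) for each GPU node
scontrol show node gpu[01-04] | grep -E 'NodeName|AllocTRES|CfgTRES'
```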
Determining GPU Usage
To take advantage of a GPU, programs need to be specifically built to support them. Keywords to look out for are "CUDA" or "GPU accelerated".
Setting GPU Usage
To request a GPU, use the GPU partition #SBATCH --partition=GPU, and request how many GPUs you want to use #SBATCH --gpus=1.
By default, choose one GPU unless you know your program supports and benefits from distributing work across multiple devices. Users are able to request up to three GPUs on a single machine.
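Putting those options together, a single-GPU job script might look like the following sketch; the program name and the time and memory values are placeholders:

```bash
#!/bin/bash
#SBATCH --partition=GPU          # run on the GPU partition
#SBATCH --gpus=1                 # request a single GPU card
#SBATCH --time=01-00:00:00
#SBATCH --mem=32GB

# Replace with your GPU-accelerated program (hypothetical name)
./my_gpu_program input.dat
```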
Tracking GPU Usage
You can track how heavily the requested GPUs are used by your job with myjobreport.
Still Not Sure?
Figuring out what you'll need is not always easy, and it may take several attempts to see what does or does not work. If you get stuck, we encourage you to reach out to us. We'd be glad to work with you to see what we can do to make your research or project a success.
Just let us know:
- Your plans for the project -- What are you doing?
- What software you are using -- Every program is different, so this will give us a starting point
- What you have and have not tried -- Run into any errors? Something not working the way you expected?