
Determining Resources

In Progress

This guide is still under active development and will be completed before the official launch of the website.

One of the challenging parts of using a computing cluster is identifying the resources your job needs when you submit it, such as memory and CPU cores. The goal of this page is to provide guidance on each resource, along with tips to help you 'find the sweet spot' that works best for you.

All of these resource requests are controlled through the #SBATCH --field=value options that you specify in your Slurm script. You can also append these options to your Slurm commands so they take effect for a single job only, such as sbatch --field=value my-script.sh or sinteract --field=value. Options given on the command line override both the defaults and anything specified in your sbatch script.
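As a sketch, these resource requests sit at the top of a Slurm script as #SBATCH comments (the values below are placeholders, not recommendations):

```shell
#!/bin/bash
#SBATCH --partition=week      # which partition to run in
#SBATCH --time=02-00:00:00    # walltime: 2 days
#SBATCH --mem=8G              # memory for the whole job

# The #SBATCH lines above are comments to bash, but directives to Slurm.
# Your actual work goes below them.
echo "job body starts here"
```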

Partition

Clusters are broken up into what are known as 'partitions'. Partitions are a feature of job scheduling systems like Slurm that group machines together based on aspects such as hardware support, time limits, or dedication to a certain group.

Determining A Partition

The majority of jobs that use our computing infrastructure fall under the "week" and "GPU" partitions, but some may need more memory or time.

Partition     Usage
week          For most CPU-based jobs that can run up to a week (the default if not specified)
GPU           For jobs that need a graphics card
month         For jobs that need to run up to a month ('batch' on BGSC)
highmemory    (BOSE only) For jobs that need more than 250GB of memory

Each cluster has its own set of partitions, which you can view in more detail by clicking the button below.

View Full List

Setting A Partition

Partitions are set through the --partition=X option in Slurm.

Batch (sbatch) Mode:

In your slurm script:

#SBATCH --partition=GPU

Temporary - only for a single job:

sbatch --partition=GPU my-script.sh

Interactive Mode:

sinteract --partition=GPU


Time (Walltime)

Walltime is the time limit a job is allowed to run for. Once a job runs past its walltime, it is automatically terminated, or requeued if enabled. When estimating the start time of a pending/queued job, Slurm looks at the walltime of all the other submitted jobs to provide a best estimate. To assist with scheduling, it's important to keep a job's walltime as accurate as possible.

Most of our partitions are set to a default maximum walltime of 7 days.

Determining A Walltime

If you are unsure how long your script will run, feel free to let it use the default walltime of 7 days on the 'week' or 'GPU' partition. This is done by not specifying a time in your Slurm script.

Setting A Walltime

Setting your job's walltime is done using the --time option in Slurm. Its typical format is as follows:

#SBATCH --time=DD-HH:MM:SS
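For example (the values here are illustrative):

```shell
#SBATCH --time=03:00:00      # 3 hours
#SBATCH --time=01-12:00:00   # 1 day, 12 hours
```

Slurm also accepts shorter forms such as minutes (MM), HH:MM:SS, and DD-HH.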

Tracking Walltime

Once your job completes, you can either check the email notification for the elapsed time, if notifications are enabled, or run the seff <jobid> command, which reports the "Job Wall-clock time".

Use this number to better inform your next run of similar jobs.
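The seff check is a one-liner once you have the job ID, and sacct offers a per-step view as an alternative (123456 below is a placeholder ID):

```shell
seff 123456                                   # summary, including "Job Wall-clock time"
sacct -j 123456 --format=JobID,Elapsed,State  # elapsed time for each job step
```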


CPUs and Cores

Determining Number of Cores

Some keywords to look out for are "parallel", "multiprocessing", "multicore", "number of processes", and "MPI".

If you see any of those terms, your program may benefit from increasing the number of CPU cores (#SBATCH --ntasks) or, if it supports MPI, the number of nodes (#SBATCH --nodes).

Setting Number of Cores
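Core counts follow the same #SBATCH pattern as the other resources. A sketch with placeholder values, covering the two common cases:

```shell
# Multithreaded (single-process) program: one task, several cores
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# MPI program spread across machines: several tasks over several nodes
#SBATCH --nodes=2
#SBATCH --ntasks=8
```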

Tracking CPU Usage
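As with walltime, seff gives an after-the-fact summary; its "CPU Efficiency" line shows how much of the CPU time you requested was actually used (123456 is a placeholder job ID):

```shell
seff 123456                                        # see the "CPU Efficiency" line
sacct -j 123456 --format=JobID,AllocCPUS,TotalCPU  # allocated cores vs. CPU time used
```

If efficiency is consistently low, request fewer cores on your next run of similar jobs.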


Memory

Determining Memory Usage

Setting Memory

Memory is requested with the #SBATCH --mem option. While it differs for each program, a good starting point is to request at least as much memory as your biggest input file.

Tracking Memory Usage
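Once a job finishes, seff also summarizes memory usage, and sacct can show the peak usage per step (123456 is a placeholder job ID):

```shell
seff 123456                                   # "Memory Utilized" and "Memory Efficiency"
sacct -j 123456 --format=JobID,ReqMem,MaxRSS  # requested memory vs. peak used per step
```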


GPU (Graphics Processing Unit)

Determining GPU Usage

To take advantage of a GPU, programs need to be specifically built to use them. Keywords to look out for are "CUDA" or "GPU accelerated".

Setting GPU Usage

To request a GPU, use the GPU partition (#SBATCH --partition=GPU) and request how many GPUs you want to use (#SBATCH --gpus=1).

By default, choose one GPU unless you know your program supports, and benefits from, distributing work across multiple devices. Users can request up to three GPUs on a single machine.
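Putting the pieces together, a minimal GPU job script might look like this (the program name and walltime are placeholders):

```shell
#!/bin/bash
#SBATCH --partition=GPU
#SBATCH --gpus=1              # one GPU; increase only if your program scales
#SBATCH --time=02-00:00:00    # adjust to your job

python my-gpu-script.py       # placeholder for your GPU-enabled program
```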

Tracking GPU Usage
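One common approach, assuming NVIDIA GPUs, is to sample utilization with nvidia-smi from inside the job while your program runs. A sketch, with a placeholder program name:

```shell
#!/bin/bash
#SBATCH --partition=GPU
#SBATCH --gpus=1

my-gpu-program &    # placeholder for your GPU-enabled program
prog=$!

# Sample GPU utilization and memory every 60 seconds while the program runs
while kill -0 "$prog" 2>/dev/null; do
    nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader
    sleep 60
done
wait "$prog"
```

Low utilization across the run suggests the job would do just as well with fewer (or no) GPUs.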


Still Not Sure?

Figuring out what you'll need is not always easy, and may take many attempts to see what does or does not work. If you get stuck, we encourage you to reach out to us. We'd be glad to work with you to see what we can do to make your research or project a success.

Just let us know:

  • Your plans for the project -- What are you doing?
  • What software you are using -- Every program is different, so this will give us a starting point
  • What you have and have not tried -- Run into any errors? Something not working the way you expected?

Contact the HPC Team