H100 Usage Guide
Limited Access
The H100 GPU is currently only available to select groups.
Our H100 node, h1gpu01, contains two H100 cards that can each be partitioned in multiple ways, allowing a mix of performance versus capacity. This is done using the Multi-Instance GPU (MIG) feature built into certain NVIDIA cards.
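Once you have a session on the node, nvidia-smi can list the MIG instances currently carved out of each card; the exact output depends on the configuration in effect at the time:

```bash
# List physical GPUs and their current MIG instances (run on h1gpu01)
nvidia-smi -L
```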
Specifications
Node Specs
| Component | Specification |
|---|---|
| CPU | AMD EPYC 9354 32-Core (x2) |
| CPU Cores (Total) | 64 |
| CPU Clock (Base/Boost) | 3.25 GHz / 3.75 GHz |
| Memory | 384 GB |
| Local Scratch | 800 GB |
GPU + MIG Setup
| Quantity | VRAM | Slurm Setting |
|---|---|---|
| 7 | 11GB | --gres=gpu:h100_11gb |
| 1 | 94GB | --gres=gpu:h100_94gb |
Last Updated: 9/08/2025
Capacity vs Performance - Changes Often
The MIG configuration on this server changes regularly based on performance versus capacity needs. Depending on what faculty need for classes and projects, the H100 may be reconfigured at any time to match those needs.
Before you use the H100, we recommend checking the table above for the current configuration and adjusting your scripts accordingly.
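To check the live configuration from a login node, you can also ask Slurm what GRES the node currently advertises; this is a minimal sketch using standard Slurm commands:

```bash
# Show the GRES (MIG slices) currently configured on the h1gpu partition
sinfo -p h1gpu -o "%N %G"

# Or inspect the node directly
scontrol show node h1gpu01 | grep -i gres
```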
Limited Access
The following accounts are currently approved to use the H100 server. Use myaccounts to see which accounts your user belongs to. If you need access to this server for your research, please contact us to discuss options.
- gomesr_reu
- gomesr_pdac_scans
- gomesr_RAG_2
If you are in one of these accounts but it is not your default account, you may need to add --account=XXX to your sbatch or sinteract command to use the H100s, as shown in the examples below.
sinteract Example
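A minimal sketch of an interactive request under one of the approved accounts listed above; this assumes sinteract passes standard Slurm options such as --account and --gres through:

```bash
# Interactive session on the H100 node under an approved account
sinteract --account=gomesr_reu --partition=h1gpu --gres=gpu:h100_11gb
```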
sbatch Example
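For example, assuming an existing job script named my_job.slurm (a hypothetical name); options passed on the command line override the matching #SBATCH directives in the script:

```bash
# Submit a job script under an approved account
sbatch --account=gomesr_reu my_job.slurm
```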
Or add it to your Slurm script:
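For example, using one of the approved accounts listed above:

```bash
#SBATCH --account=gomesr_reu
```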
Slurm Settings
Running jobs on the H100 GPU requires a few settings that differ from our other GPU nodes: a dedicated partition (h1gpu) and a gres option specifying which VRAM configuration you'd like to use.
When specifying --gres=gpu:h100_XXX, review the GPU + MIG Setup section above for the exact memory options, as they can change at any time.
sinteract
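A minimal sketch of a full interactive request, again assuming sinteract passes standard Slurm options through; adjust the account, gres, and time to your needs:

```bash
# Interactive session with an 11GB MIG slice for four hours (max is 2 days)
sinteract --account=my-research-group --partition=h1gpu --gres=gpu:h100_11gb --time=04:00:00
```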
sbatch
- Account: #SBATCH --account=ABC
- Partition: #SBATCH --partition=h1gpu
- GRES: #SBATCH --gres=gpu:h100_11gb (instead of --gpus=)
- Time: Max 2 days (#SBATCH --time=2-00:00:00)
```bash
#!/bin/bash
#SBATCH --account=my-research-group
#SBATCH --partition=h1gpu
#SBATCH --gres=gpu:h100_94gb
#SBATCH --time=2-00:00:00

# Set up the software environment and run the workload
module load python-libs
conda activate my-env
python script.py
```
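Assuming the script above is saved as h100_job.slurm (a hypothetical name), you would submit and monitor it with:

```bash
sbatch h100_job.slurm
squeue -u $USER    # confirm the job is queued on the h1gpu partition
```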
Known Issues
Full support for the H100 is still in development and requires several changes that are being worked out. Some of these are specific to the use of MIG on the H100, and some relate to other components in this compute node that are newer than the rest of our cluster.
- The 11GB option is not currently available to other groups automatically. (The 94GB option will remain restricted.)
- The metrics dashboard does not currently show GPU usage metrics due to how NVIDIA reports stats for MIG-based cards.
- Slurm does not track GPU utilization or memory for job statistics.
- The high-speed Slingshot network connected to the storage servers is currently not available (this impacts heavy read/write operations).