H100 Usage Guide
Limited Access
The H100 GPU is currently only available to select groups.
Our H100 node, h1gpu01, contains two H100 cards that can each be partitioned in multiple ways, allowing a mix of performance versus capacity. This is done using the Multi-Instance GPU (MIG) feature built into certain NVIDIA cards.
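Once you have a session on the node, nvidia-smi can list the MIG instances currently carved out of each card; the exact output depends on the configuration in effect at the time:

```bash
# List physical GPUs and their current MIG instances (run on h1gpu01)
nvidia-smi -L
```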
Specifications
Node Specs
| Component | Specification |
|---|---|
| CPU | AMD EPYC 9354 32-Core (x2) |
| CPU Cores (Total) | 64 |
| CPU Clock (Base/Boost) | 3.25 GHz / 3.75 GHz |
| Memory | 384 GB |
| Local Scratch | 800 GB |
GPU + MIG Setup
| Quantity | VRAM | Slurm Setting |
|---|---|---|
| 7 | 11GB | --gres=gpu:h100_11gb |
| 1 | 94GB | --gres=gpu:h100_94gb |
Last Updated: 9/08/2025
Capacity vs Performance - Changes Often
The MIG configuration on this server changes regularly based on performance versus capacity needs. Depending on what faculty need for classes and projects, the H100 may be reconfigured at any time to match those needs.
Before you use the H100, we recommend checking the table above for the current configuration and adjusting your scripts accordingly.
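To check the live configuration from a login node, you can also ask Slurm what GRES the node currently advertises; this is a minimal sketch using standard Slurm commands:

```bash
# Show the GRES (MIG slices) currently configured on the h1gpu partition
sinfo -p h1gpu -o "%N %G"

# Or inspect the node directly
scontrol show node h1gpu01 | grep -i gres
```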
Limited Access
The following accounts are currently approved to use the H100 server. Use myaccounts to see which accounts your user belongs to. If you need access to this server for your research, please contact us to discuss options.
- gomesr_reu
- gomesr_pdac_scans
- gomesr_RAG_2
If you are in one of these accounts but it is not your default account, you may need to add --account=XXX to your sbatch or sinteract command to use the H100s, as shown in the examples below.
sinteract Example
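A minimal sketch of an interactive request under one of the approved accounts listed above; this assumes sinteract passes standard Slurm options such as --account and --gres through:

```bash
# Interactive session on the H100 node under an approved account
sinteract --account=gomesr_reu --partition=h1gpu --gres=gpu:h100_11gb
```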
sbatch Example
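For example, assuming an existing job script named my_job.slurm (a hypothetical name); options passed on the command line override the matching #SBATCH directives in the script:

```bash
# Submit a job script under an approved account
sbatch --account=gomesr_reu my_job.slurm
```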
Or add it to your Slurm script:
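For example, using one of the approved accounts listed above:

```bash
#SBATCH --account=gomesr_reu
```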
Slurm Settings
Running jobs on the H100 GPU requires a few settings that differ from our other GPU nodes: a dedicated partition (h1gpu) and a gres option specifying which VRAM configuration you'd like to use.
When specifying --gres=gpu:h100_XXX, review the GPU + MIG Setup section above for the exact memory options, as they can change at any time.
sinteract
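A minimal sketch of a full interactive request, again assuming sinteract passes standard Slurm options through; adjust the account, gres, and time to your needs:

```bash
# Interactive session with an 11GB MIG slice for four hours (max is 2 days)
sinteract --account=my-research-group --partition=h1gpu --gres=gpu:h100_11gb --time=04:00:00
```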
sbatch
- Account: #SBATCH --account=ABC
- Partition: #SBATCH --partition=h1gpu
- GRES: #SBATCH --gres=gpu:h100_11gb (instead of --gpus=)
- Time: Max 2 days (#SBATCH --time=2-00:00:00)
```bash
#!/bin/bash
#SBATCH --account=my-research-group
#SBATCH --partition=h1gpu
#SBATCH --gres=gpu:h100_94gb
#SBATCH --time=2-00:00:00

# Set up the software environment and run the workload
module load python-libs
conda activate my-env
python script.py
```
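Assuming the script above is saved as h100_job.slurm (a hypothetical name), you would submit and monitor it with:

```bash
sbatch h100_job.slurm
squeue -u $USER    # confirm the job is queued on the h1gpu partition
```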
Known Issues
Full support for the H100 is still in development and requires several changes that are being worked out. Some of these are specific to the use of MIG on the H100, and some relate to other components in this compute node that are newer than the rest of our cluster.
- The 11GB option is not currently available to other groups automatically. (The 94GB option will remain restricted.)
- The metrics dashboard does not currently show GPU usage metrics due to how NVIDIA reports stats for MIG-based cards.
- Slurm does not track GPU utilization or memory for job statistics.
- The high-speed Slingshot network connected to the storage servers is currently not available (this impacts heavy read/write operations).