H100 Usage
Work-In-Progress
This guide, and the overall setup of the server, is temporary while the usage and configuration of the H100 node are being fleshed out. Any major changes in usage will be announced in advance.
Current Set Up
Our new H100 node, known as h1gpu01, has two H100 cards that are split in multiple ways to offer a mix of performance versus capacity. This is done using the Multi-Instance GPU (MIG) feature available on certain NVIDIA GPUs.
Node Specs
| Component | Specification |
|---|---|
| CPU | AMD EPYC 9354 32-Core (x2) |
| CPU Cores (Total) | 64 |
| CPU Clock (Base/Boost) | 3.25 GHz / 3.75 GHz |
| Memory | 384 GB |
| Local Scratch | 800 GB |
GPU+MIG Set Up
| Quantity | VRAM | Slurm Setting |
|---|---|---|
| 7 | 11 GB | --gres=gpu:h100_11gb |
| 1 | 94 GB | --gres=gpu:h100_94gb |
Last Updated: 9/08/2025
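To confirm which device your job actually received, you can list the GPUs Slurm exposed to it. A minimal sketch using a short interactive run; the account name is a placeholder for one of your approved accounts:

```bash
# List the GPU devices visible to this job. On a MIG-sliced card,
# nvidia-smi -L prints the MIG instance assigned to you rather than
# the full H100.
srun --partition=h1gpu --account=my-research-group \
     --gres=gpu:h100_11gb --time=00:05:00 \
     nvidia-smi -L
```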
Capacity vs Performance - Changes Often
The MIG configuration of this server changes regularly based on performance versus capacity needs. Depending on what faculty need for classes and projects, the H100 may be adjusted at any time to match those needs.
Before you use the H100, check the table above for the current configuration and adjust your scripts accordingly.
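Because the layout changes, you can also query the live configuration from Slurm instead of relying on this page. A quick sketch using standard Slurm commands, with the node and partition names listed above:

```bash
# Show the GRES types currently advertised by the H100 partition.
sinfo -p h1gpu -o "%N %G"

# Or inspect the node record directly for its Gres= line.
scontrol show node h1gpu01 | grep -i gres
```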
Limited Access
The following accounts are currently approved to use the H100 server. Use myaccounts to see which accounts your user account belongs to. If you need access to this server for your research, please contact us to discuss options.
- gomesr_reu
- gomesr_pdac_scans
- 2261.cs.426.001 (CS 426 - Fall 2025)
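If you prefer a command-line check, a standard Slurm alternative to myaccounts is to query your account associations directly. A sketch, assuming sacctmgr is available to regular users on this cluster:

```bash
# List the Slurm accounts your user is associated with.
sacctmgr show associations where user=$USER format=Account%30,Partition%12
```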
Slurm Settings
Using the H100 GPU requires a few Slurm settings that differ from our other GPU nodes:
- Partition: #SBATCH --partition=h1gpu
- Account: #SBATCH --account=ABC (only approved accounts are accepted)
- GRES: #SBATCH --gres=gpu:h100_11gb (note: use --gres=, not --gpus=)
- Time: 2 days maximum

A minimal example job script:
```bash
#!/bin/bash
#SBATCH --account=my-research-group
#SBATCH --partition=h1gpu
#SBATCH --gres=gpu:h100_94gb
#SBATCH --time=2-00:00:00

module load python-libs
conda activate my-env

python script.py
```
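Save the script under any name (h100_job.sh below is just an example) and submit it as usual:

```bash
sbatch h100_job.sh
squeue -u $USER   # confirm the job is queued or running on h1gpu01
```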
Known Issues
Full usage of the H100 is still in development, and several changes are being worked out. Some are specific to the use of MIG on the H100, and some relate to other components in this compute node that are newer than the rest of our cluster.
- The 11GB instances are not currently available to other groups automatically. (The 94GB instance will remain restricted.)
- Jupyter support is not yet available, but is being worked on.
- The metrics dashboard does not currently show GPU usage because of how NVIDIA reports statistics for MIG-based devices.
- Slurm does not track GPU utilization or memory for job statistics (a logging stopgap is sketched after this list).
- The high-speed Slingshot network connection to the storage servers is not yet available, which impacts heavy read/write operations.
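Until the dashboard and Slurm accounting catch up, one stopgap for the last two items is to log what nvidia-smi still reports from inside your job script. This is only a sketch, not an official workaround; on MIG instances the utilization fields may read N/A, so it is mainly useful for the device list and memory readings:

```bash
# Background loop: append a timestamped nvidia-smi snapshot to a log
# every 60 seconds while the real workload runs in the foreground.
(
  while true; do
    date >> gpu_usage.log
    nvidia-smi >> gpu_usage.log 2>&1
    sleep 60
  done
) &
MONITOR_PID=$!

python script.py   # your actual workload

kill "$MONITOR_PID"
```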