Transitioning from BGSC to BOSE
In the summer of 2021, the new NSF- and HPE-funded BOSE cluster came online; it is three times the size of BGSC. For the most part, running a job on BOSE is very similar to running one on BGSC, but there are a few differences, so this guide will help you with the transition. If you run into any additional questions, please contact us.
Hardware
Below is an overview of the main differences in hardware between the two clusters for nodes that support running jobs via Slurm.
Cluster | # Compute Nodes (Non-GPU) | # GPU Nodes | CPUs | Memory | Interconnect |
---|---|---|---|---|---|
BGSC | 17 nodes | 3 nodes w/ 4 cards each | Intel Xeon: 12 cores x 4, 20 cores x 13, 32 cores x 3 | 24 GB x 4, 60 GB x 3, 80 GB x 13 | InfiniBand (40 Gbps) |
BOSE | 64 nodes | 4 nodes w/ 3 cards each | AMD EPYC: 64 cores x 60 | 250 GB x 59, 1 TB x 2 | Slingshot (100 Gbps) |
Connecting to BOSE
Unlike BGSC, you must be given special access before you can connect to BOSE. You can request access by filling out this form and choosing "Request Access to BOSE".
Once you have access, BOSE requires connecting over port 50022 (instead of the default port 22) and through the UWEC VPN for security. Previously, the VPN was only required for off-campus users, but it is now a requirement for all users.
Don't have the VPN yet? Download it here!
 | Old Connection Info (BGSC/BGSC2) | New Connection Info (BOSE) |
---|---|---|
Host | bgsc.hpc.uwec.edu or bgsc2.uwec.edu or bgsc2.cs.uwec.edu | bose.hpc.uwec.edu |
Port | 22 (default - not usually specified) | 50022 |
SSH | ssh user@bgsc.hpc.uwec.edu | ssh user@bose.hpc.uwec.edu -p 50022 |
VPN | Required off-campus | Required off-campus |
Okta | Not required | Required when not on VPN |
Special Access? | No - open to all UWEC accounts | Yes - access must be requested |
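If you connect over SSH frequently, an entry in your SSH client configuration saves retyping the hostname and port each time. A minimal sketch, with myuser as a placeholder username:

```
# ~/.ssh/config
Host bose
    HostName bose.hpc.uwec.edu
    Port 50022
    User myuser
```

With this in place, ssh bose connects with the correct host and port automatically.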
Using a hardware token or code with Okta?
By default, when you connect to our BOSE cluster over SSH, a push notification will be sent to the Okta Verify app on your phone for you to approve.
If you are unable to use your phone and instead use a hardware token or six-digit code, enter the code from the device directly after your password, separated by a comma, before you can log in.
Example:
Username: myuser
Password: mypass,123456
Transferring Files (using SCP)
To transfer files between BGSC and BOSE, you must initiate the transfer from BGSC. This is because BGSC is physically on campus and BOSE cannot connect directly to it.
BGSC → BOSE or Computer → BOSE
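A minimal sketch using scp, run from BGSC or from your own computer (the filename and username are placeholders):

```bash
# Push a file to your home directory on BOSE; note that scp uses a capital -P for the port.
scp -P 50022 myfile.txt username@bose.hpc.uwec.edu:
```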
BOSE → BGSC or BOSE → Computer
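Likewise, to pull results back, still run the command from BGSC or from your computer (placeholders again):

```bash
# Pull a file from your BOSE home directory into the current local directory.
scp -P 50022 username@bose.hpc.uwec.edu:results.tar.gz .
```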
Software
Please view the list of available software to determine whether the program (and version) you need is installed on BOSE. If the software or version you need is not available, please fill out this form and select "Software Request".
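If BOSE exposes installed software through an environment-module system, which is common on HPC clusters but an assumption here, you would typically discover and load programs like this (the module name is hypothetical):

```bash
module avail              # list the software modules installed on the cluster
module load python        # hypothetical module name; use the exact name/version from the software list
module list               # confirm which modules are currently loaded
```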
Note: WebMO is currently only available on BGSC due to licensing restrictions.
Running Jobs
For the most part, running jobs on BOSE is very similar to running jobs on BGSC. The primary difference between the two is knowing what partitions are available for use.
Partition Changes:
Old Name | New Name | Time Limit | Default | Notes |
---|---|---|---|---|
week | week | 7 Days | Yes | Primary partition, available on the majority of the nodes. Recommended for most jobs unless they cannot finish within a week without checkpoints or restart files. |
batch | month | 30 Days | No | (Rename) Limited-availability partition on a few nodes. Meant for jobs that must run for more than a week without stopping. |
GPU | GPU | 7 Days | No | For any job that requires using a GPU. See the "GPU Use" section below. |
scavenge | debug | 4 Hours | No | (Rename) Low priority partition available on all nodes that is meant for testing and debugging software. |
extended | - | 104 Days | - | (Removed) This partition is not available on BOSE (please email bgsc.admins@uwec.edu with concerns about long-running jobs) |
- | highmemory | 7 Days | No | (New Partition) For jobs that require using over 250 GB of memory. Only available on a single node that contains 1 TB of RAM. |
- | magma | 7 Days | No | (New Partition) Limited availability partition that is used for jobs that need to run the software "Magma". |
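Selecting a partition works the usual Slurm way, via an #SBATCH directive in your job script. The sketch below assumes standard Slurm options; the job name, time, and resource values are placeholders:

```bash
#!/bin/bash
#SBATCH --partition=week        # choose a partition from the table above
#SBATCH --job-name=my_job       # placeholder job name
#SBATCH --time=2-00:00:00       # 2 days, within the 7-day limit of the week partition
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8       # placeholder CPU count

srun ./my_program               # placeholder program
```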
GPU Use
Because GPUs are a limited resource, please use only a single GPU card unless more are absolutely needed for your work.
New to BOSE is the ability to request a specific number of GPUs for your job, up to a maximum of 3. Unlike BGSC, any GPU card you use is exclusive to you and will not be shared with other jobs. Another change is that one of the GPU nodes (gpu04) has 1 TB of memory for GPU-based jobs that require more than the standard 250 GB.
By default, if you do not specify a number of cards, your job will only use a single one.
To request a certain number of cards, add the following line to your sbatch.sh script:
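(The exact directive depends on BOSE's Slurm configuration; this sketch assumes the standard Slurm GPU options.)

```bash
#SBATCH --gpus=#            # standard Slurm option: request # GPU cards
#SBATCH --gres=gpu:#        # older but equivalent form on many clusters
```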
Replace # with the number of GPUs you'd like to request.
To see which GPU cards are assigned to your job, check $CUDA_VISIBLE_DEVICES, which lists the cards your software can use.
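For example, inside a job script:

```bash
# Print the GPU indices Slurm has made visible to this job (e.g. "0" or "0,1,2").
echo "GPUs assigned to this job: $CUDA_VISIBLE_DEVICES"
```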
Scratch Space
On both BGSC and BOSE we have local and network scratch space available for jobs. With the upgrade to BOSE, we have adjusted the paths to better reflect their purpose and to make it easier to switch between the two. There is also a difference in the amount of space available, which is shared among all active jobs on the node. In general, we recommend using network scratch to prevent issues with running out of space.
Type | BGSC Path | BOSE Path | BGSC Space | BOSE Space |
---|---|---|---|---|
Local | /node/local/scratch | /local/scratch | 900 GB | 800 GB |
Network | /data/scratch | /network/scratch | 11 TB | 64 TB |
Automatic Scratch Space
All jobs on BOSE automatically get a dedicated directory in scratch space, accessible through the environment variable $USE_SCRATCH_DIR. By default it points to the local scratch space on the node; you can switch to network scratch by prefixing your 'sbatch' or 'sinteract' commands with USE_NETWORK_SCRATCH=1.
Examples:
- Local scratch: $USE_SCRATCH_DIR points to "/local/scratch/job_12345"
- Network scratch: $USE_SCRATCH_DIR points to "/network/scratch/job_12345"
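A short sketch of how this might be used in an sbatch.sh script (the program name and files are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=week
#SBATCH --job-name=scratch_demo      # placeholder job name

# Work inside the per-job scratch directory, then copy results back home.
cp ~/input_data.csv "$USE_SCRATCH_DIR/"
cd "$USE_SCRATCH_DIR"
./my_program input_data.csv > results.txt    # placeholder program
cp results.txt ~/
```

To have $USE_SCRATCH_DIR point at network scratch instead, submit with the prefix described above: USE_NETWORK_SCRATCH=1 sbatch sbatch.sh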
Open OnDemand
New to the Blugold Center for High-Performance Computing and implemented on BOSE is our Open OnDemand system. This is a web-based platform that lets anyone work on the cluster without having to go entirely through the terminal. It lowers the barrier to entry and makes it much easier to access your files. All you need is access to the VPN (if off campus) and a web browser.
It features:
- File Access (Upload/Download/Editing)
- Shell Access (In the browser)
- Jupyter Notebook (For all your interactive Python needs)
- And more
Website URL: https://ondemand.hpc.uwec.edu