
Python Libraries (Conda)

Anaconda Update - Please Read - September 2024

The HPC industry has been facing many questions about licensing changes for the official Anaconda repositories, which are currently the default channel in our conda environment. Because of this, the HPC Team is reviewing our setup and looking for ways to stop using the anaconda channel and rely more on channels such as 'conda-forge'.

While we work out a transition plan, we highly recommend running the following two commands before you install any new packages. They make the community-driven 'conda-forge' channel your default instead.

conda config --add channels conda-forge
conda config --set channel_priority strict

You can also change your channel when installing packages by specifying -c conda-forge (or other necessary channels).

mamba install numpy -c conda-forge    # Mamba is a faster package installer for conda
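
To confirm that the two conda config commands took effect, you can inspect the channel list they write to your user configuration file. The following is a minimal sketch, assuming the settings were saved to ~/.condarc in conda's usual simple layout (a top-level channels: key followed by a "- name" list). The read_channels helper is hypothetical and is not a full YAML parser.

```python
from pathlib import Path


def read_channels(condarc_text):
    """Return the channels listed under the top-level 'channels:' key, in order."""
    channels = []
    in_channels = False
    for raw_line in condarc_text.splitlines():
        line = raw_line.rstrip()
        if not line or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not raw_line.startswith((" ", "-")):
            # A new top-level key starts here; note whether it is 'channels'.
            in_channels = line.split(":", 1)[0] == "channels"
            continue
        if in_channels and line.lstrip().startswith("- "):
            channels.append(line.lstrip()[2:].strip())
    return channels


if __name__ == "__main__":
    condarc = Path.home() / ".condarc"  # assumed location of the user config
    if condarc.exists():
        channels = read_channels(condarc.read_text())
        print("Channels, highest priority first:", channels)
        if channels[:1] != ["conda-forge"]:
            print("conda-forge is not the first channel.")
    else:
        print("No ~/.condarc found; conda is using its built-in defaults.")
```

Channels are listed highest priority first, so conda-forge should appear at the top of the list after running the commands above.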

Overview

Conda is a package manager best known for its use with Python, but it can be used for a variety of other languages and programs. At the Blugold Center for High-Performance Computing we use the "Miniconda" variant, primarily to manage the Python packages that our researchers request for their programs, along with each package's dependencies. We recommend using this module any time you want to run Python code on our clusters.

Using virtualenv or pip?

We highly recommend that you do not use virtualenv or pip to install packages unless absolutely necessary. Packages installed using pip have been known to cause compatibility issues with packages installed within our conda system, so we recommend going through the conda system for everything if possible. Some packages are only available on PyPI through pip, so take extra care when installing them.

Availability

Cluster | Module/Version
BOSE | python-libs/3.0
BGSC | python-libs/3.0

Note: You can simply use module load python-libs to activate the most recently installed version of this software. We do not anticipate installing any other version of this module.

Environments

To handle the massive variety of Python libraries, dependencies, and use cases, we take advantage of the Conda environment system to group things together. We install most libraries in the default base/root environment, but we can set up a custom environment for your group that contains only the packages you need.

Using Environments

Whenever a user wants to use one of the environments (in terminal or their sbatch file), they simply use the following commands:

module load python-libs
conda activate <environment-name>
python script-here.py
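
If you are unsure which environment your script actually ran in, a few lines at the top of the script can record it in the job output. This is a minimal sketch: CONDA_DEFAULT_ENV is the variable conda sets on activation, and the describe_environment helper is a hypothetical name used only for illustration.

```python
import os
import sys


def describe_environment():
    """Collect basic facts about the interpreter and the active conda environment."""
    return {
        # Set by "conda activate"; None if no environment has been activated.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Directory the running interpreter was installed into.
        "prefix": sys.prefix,
        # Python version string, e.g. "3.10.8".
        "python": sys.version.split()[0],
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the reported prefix does not point inside the environment you expected, the conda activate line was probably skipped or ran after the python command.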

Current Environments

Below is a select list of environments that the admin team has provided for everyone to use, each with their own set of libraries installed. Use conda env list to view the full list, along with custom environments set up for specific research groups.

Environment | Conda Command | Purpose
base | Default | Default environment suitable for most projects
rapids-23.02 | conda activate rapids-23.02 | RAPIDS (rapids.ai) environment for GPU-accelerated versions of various data science libraries
tensorflow-2.11-gpu | conda activate tensorflow-2.11-gpu | TensorFlow 2.11 + GPU/CUDA support - Recommended for programs that need GPU support
tensorflow-2.9-gpu | conda activate tensorflow-2.9-gpu | TensorFlow 2.9 + GPU/CUDA support
sage | conda activate sage | Environment for SageMath, a mathematics software system

Note: Any environment that requires GPU acceleration or CUDA will need to be run on a node that contains a GPU, which is not available by default. View this page for more instructions.
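
A quick way to tell whether a job actually landed on a GPU is to check from inside Python. The sketch below assumes the scheduler exposes allocated GPUs through the CUDA_VISIBLE_DEVICES variable, which Slurm commonly does but which we have not confirmed for every configuration. The visible_gpu_ids helper is hypothetical. If TensorFlow is installed in the active environment, its own device list is printed as well.

```python
import importlib.util
import os


def visible_gpu_ids(environ=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES (empty list if unset)."""
    environ = os.environ if environ is None else environ
    value = environ.get("CUDA_VISIBLE_DEVICES", "")
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES:", visible_gpu_ids())
    # Only query TensorFlow when the active environment provides it.
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf
        print("TensorFlow GPUs:", tf.config.list_physical_devices("GPU"))
```

An empty list from both checks usually means the job ran on a CPU-only node.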

Personal Environments

User-Installed vs Admin-Installed

If a user sets up their own environment, it will be available only to their account. Packages can be installed whenever the user needs them.

If an HPC admin installs the environment, it can be made available either globally to all users on the cluster or only to your research group. We'll also take care of any dependencies and make sure the right versions are installed. Packages will need to be requested and installed by the admin team.

module load python-libs
conda create -n <environment-name>
conda activate <environment-name>    # Always activate your environment before installing a package
mamba install <package> -c conda-forge    # Check anaconda.org for what channels "-c" a package is available on

Example:

module load python-libs
conda create -n test-env-numpy
conda activate test-env-numpy
mamba install numpy -c conda-forge  # mamba is similar to conda and can be significantly faster when installing new packages.

Now, when you activate the "test-env-numpy" environment in your sbatch script, you'll have access to the numpy package you installed.

Using Jupyter?

Depending on the packages you install, your environment may not be set up to be accessible in Jupyter as a kernel. You may also need to install the ipykernel package.

Sharing Your Environment

You can share your current environment with someone else by creating an environment.yml file listing all of your packages and their versions. This helps with reproducibility by ensuring everyone has the same setup.

module load python-libs
conda activate name-of-environment
conda env export > environment.yml

Environment.yml / YAML file

If you were provided with an environment.yml file to build a new Conda environment, you can create the environment and install all of the listed libraries using the conda env create command.

module load python-libs
conda env create -f environment.yml  # You can also override the default name with "-n name-of-environment"
conda activate name-of-environment      # This is usually specified at top of the environment.yml file.
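
If you are not sure which name to pass to conda activate, you can read it from the file itself. This is a small sketch that assumes the file has the usual top-level "name:" line produced by conda env export. The environment_name helper is hypothetical and only looks at that one line rather than parsing the whole YAML file.

```python
from pathlib import Path


def environment_name(yaml_text):
    """Return the value of the top-level 'name:' line, or None if it is absent."""
    for line in yaml_text.splitlines():
        if line.startswith("name:"):
            return line.split(":", 1)[1].strip() or None
    return None


if __name__ == "__main__":
    path = Path("environment.yml")  # assumed to be in the current directory
    if path.exists():
        print("Environment name:", environment_name(path.read_text()))
    else:
        print("No environment.yml found in the current directory.")
```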

Available Python Packages

Once in a conda environment, you can use the conda list command to view all of the currently installed Python packages in that environment.
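
The same information can be gathered from inside a running script, which is handy for logging package versions alongside your results. The sketch below uses the standard library's importlib.metadata (Python 3.8 or newer). Unlike conda list, it only sees Python distributions and not non-Python packages managed by conda. The installed_distributions helper is a hypothetical name.

```python
from importlib import metadata


def installed_distributions():
    """Map each installed Python distribution name to its version, sorted by name."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with damaged or missing metadata
            found[name] = dist.version
    return dict(sorted(found.items(), key=lambda item: item[0].lower()))


if __name__ == "__main__":
    for name, version in installed_distributions().items():
        print(f"{name}=={version}")
```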

Missing Packages

Have you received the ModuleNotFoundError: No module named 'abc123' message in your script output? That means the library you are trying to import isn't available in the current Conda environment you are using.

You can check out one of the other environments listed in conda env list or contact the HPC Admin Team and we'll install the package for you.
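
Before submitting a long job, it can help to check all of your imports up front rather than discovering a missing one partway through a run. Here is a minimal sketch using the standard library. The missing_modules helper and the example module names are placeholders, and the check is meant for top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["numpy", "pandas"]  # placeholder list; use your script's imports
    missing = missing_modules(required)
    if missing:
        print("Not available in this environment:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Running this in each candidate environment shows which one already has everything your script needs.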

Sample Slurm Script

submit.sh
#!/bin/bash
# -- SLURM SETTINGS -- #
# [..] other settings here [..]

# The following settings are for the overall request to Slurm
#SBATCH --ntasks-per-node=32     # How many CPU cores do you want to request
#SBATCH --nodes=1                # How many nodes do you want to request

# -- SCRIPT COMMANDS -- #

# Load the needed modules
module load python-libs    # Load Conda system
conda activate tensorflow-2.11-gpu  # Load the TensorFlow 2.11 - GPU Conda environment
python my-script.py        # Run Python script

Because GPU nodes are in limited supply, work submitted to the cluster will not run on a GPU-supported node unless you request one. The following example includes the additional changes you have to make to your Slurm script. Click here for more information.

submit.sh
#!/bin/bash
# -- SLURM SETTINGS -- #
# [..] other settings here [..]

# The following settings are for the overall request to Slurm
#SBATCH --partition=GPU          # Run this script on a GPU node
#SBATCH --ntasks-per-node=32     # How many CPU cores do you want to request
#SBATCH --nodes=1                # How many nodes do you want to request
#SBATCH --gpus=1                 # Request one GPU card (max of three)

# -- SCRIPT COMMANDS -- #

# Load the needed modules
module load python-libs    # Load Conda system
conda activate tensorflow-2.11-gpu  # Load the TensorFlow 2.11 - GPU Conda environment
python my-script.py        # Run Python script

To run your script in the default base environment, leave out the conda activate line:

submit.sh
#!/bin/bash
# -- SLURM SETTINGS -- #
# [..] other settings here [..]

# The following settings are for the overall request to Slurm
#SBATCH --ntasks-per-node=32     # How many CPU cores do you want to request
#SBATCH --nodes=1                # How many nodes do you want to request

# -- SCRIPT COMMANDS -- #

# Load the needed modules
module load python-libs    # Load Conda system
python my-script.py        # Run Python script
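
Inside my-script.py you can read back what Slurm actually granted, which is useful for sizing worker pools to the number of requested cores. The sketch below reads standard Slurm output environment variables. SLURM_NTASKS_PER_NODE is only set when --ntasks-per-node is given, as in the scripts above. The slurm_settings helper is a hypothetical name.

```python
import os

# Environment variables Slurm sets inside a running job.
SLURM_VARIABLES = (
    "SLURM_JOB_ID",
    "SLURM_JOB_PARTITION",
    "SLURM_JOB_NUM_NODES",
    "SLURM_NTASKS_PER_NODE",
)


def slurm_settings(environ=None):
    """Return the selected Slurm variables; a value is None when it is not set."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in SLURM_VARIABLES}


if __name__ == "__main__":
    settings = slurm_settings()
    if settings["SLURM_JOB_ID"] is None:
        print("Not running inside a Slurm job.")
    for name, value in settings.items():
        print(f"{name}={value}")
```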

Real Example

Has your research group used Python in a project and/or used the Conda system for package management? Contact the HPC Team and we'd be glad to feature your work.

Citation

Please include the following citation in your papers to support continued development of Conda and other associated tools by its parent company Anaconda, Inc.

Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. https://anaconda.com.

Resources