Python Libraries (Conda)
Anaconda Update - Please Read - September 2024
The HPC industry has been facing many questions about licensing changes for the official Anaconda repositories, which are currently the default channel in our conda environment. Because of this, the HPC Team is reviewing our setup and identifying ways to move away from the anaconda channel and rely more on channels such as 'conda-forge'.
While we work out a transition plan, we highly recommend you run the following two commands before you install any new packages, which will default you to the community-driven 'conda-forge' channel instead.
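The exact syntax can vary between conda releases, but the standard way to make conda-forge your default channel (a sketch, assuming a reasonably recent conda) is:

```shell
# Put conda-forge at the top of your channel list
conda config --add channels conda-forge

# Always resolve packages from the highest-priority channel that provides them
conda config --set channel_priority strict
```

These settings are written to your `~/.condarc` file, so you only need to run them once per account.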
You can also change your channel when installing packages by specifying `-c conda-forge` (or other necessary channels).
Overview
Conda is a package manager most commonly known for its use in Python scripting; however, it can be used for a variety of different languages and programs. At the Blugold Center for High-Performance Computing we specifically use the "Miniconda" variant and primarily use it to manage the variety of Python packages our researchers want to use in their programs, along with their individual dependencies. We recommend using this module anytime you want to run Python code on our clusters.
Using virtualenv or pip?
We highly recommend that you do not use virtualenv or pip to install any packages unless absolutely necessary. Packages installed with pip have been known to cause compatibility issues with packages installed through our conda system, so we recommend going through conda for everything when possible. Some packages are only available on PyPI through pip; in those cases, take extra care.
Availability
Cluster | Module/Version |
---|---|
BOSE | python-libs/3.0 |
BGSC | python-libs/3.0 |
Note: You can simply use `module load python-libs` to activate the most recently installed version of this software. We do not anticipate installing any other version of this module.
Environments
To handle the massive variety of Python libraries, dependencies, and use cases, we take advantage of the Conda environment system to group things together. We install most libraries in the default base/root environment, but we can set up a custom environment for your group that contains only the packages you need.
Using Environments
Whenever you want to use one of the environments (in the terminal or in your sbatch file), simply run the following commands:
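In a terminal or sbatch script, that looks like this (using the tensorflow-2.11-gpu environment from the list below as the example):

```shell
module load python-libs              # Load the Conda system
conda activate tensorflow-2.11-gpu   # Activate the environment you need
```

Swap in whichever environment name `conda env list` shows for your group.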
Current Environments
Below is a select list of environments the admin team provides for everyone to use, each with its own set of libraries installed. Use `conda env list` to view the full list, along with custom environments set up for specific research groups.
Environment | Conda Command | Purpose |
---|---|---|
base | Default | Default environment suitable for most projects |
rapids-23.02 | `conda activate rapids-23.02` | RAPIDS (rapids.ai) environment for GPU-accelerated versions of various data science libraries |
tensorflow-2.11-gpu | `conda activate tensorflow-2.11-gpu` | TensorFlow 2.11 + GPU/CUDA support - Recommended for programs that need GPU support |
tensorflow-2.9-gpu | `conda activate tensorflow-2.9-gpu` | TensorFlow 2.9 + GPU/CUDA support |
sage | `conda activate sage` | Environment for SageMath, a mathematics software system |
Note: Any environment that requires GPU acceleration or CUDA will need to be run on a node that contains a GPU, which is not available by default. View this page for more instructions.
Personal Environments
User-Installed vs Admin-Installed
If a user sets up their own environment, it will be available only to their account. Packages can be installed whenever the user needs them.
If an HPC admin installs the environment, it can be made available globally to all users on the cluster, or restricted to just your research group. We'll also take care of any dependencies and make sure the right versions are installed. Packages will need to be requested and installed by the admin team.
```bash
module load python-libs
conda create -n <environment-name>
conda activate <environment-name>        # Always activate your environment before installing a package
mamba install <package> -c conda-forge   # Check anaconda.org for which channels ("-c") a package is available on
```
Example:
```bash
module load python-libs
conda create -n test-env-numpy
conda activate test-env-numpy
mamba install numpy -c conda-forge
```
Now, when you activate the "test-env-numpy" environment in your sbatch script, you'll have access to the numpy package you installed.
Using Jupyter?
Depending on the packages you install, your environment may not be set up to be accessible in Jupyter as a kernel. You may also need to install the `ipykernel` package.
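A common pattern (a sketch; the kernel name below reuses the test-env-numpy example from above and is your choice) is to install ipykernel into the environment and then register it with Jupyter:

```shell
conda activate test-env-numpy                                # The environment you want to expose to Jupyter
mamba install ipykernel -c conda-forge                       # Install the kernel package into it
python -m ipykernel install --user --name test-env-numpy     # Register it as a selectable Jupyter kernel
```

After this, the environment should appear in Jupyter's kernel picker the next time you start a session.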
Have an environment.yml / YAML file?
If you were provided with an `environment.yml` file to build a new Conda environment, you can create the environment and install all of the listed libraries using the `conda env create` command.
```bash
module load python-libs
conda env create -f environment.yml
conda activate name-of-environment   # The name is usually specified at the top of the environment.yml file.
```
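For reference, a minimal `environment.yml` might look like the following (the name and package list here are only an illustration):

```yaml
name: test-env-numpy      # Used by "conda activate"
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy
```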
Available Python Packages
Once in a conda environment, you can use the `conda list` command to view all of the Python packages currently installed in that environment.
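`conda list` also accepts a pattern to narrow the output, which is handy in environments with hundreds of packages:

```shell
conda list         # All packages in the active environment
conda list numpy   # Only packages whose names match "numpy"
```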
Missing Packages
Have you received a `ModuleNotFoundError: No module named 'abc123'` message in your script output? That means the library you are trying to import isn't available in the Conda environment you are currently using.
You can check out one of the other environments listed in `conda env list`, or contact the HPC Admin Team and we'll install the package for you.
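A quick way to confirm whether a given environment has the package is to try the import directly from the command line (numpy shown here as an example):

```shell
conda activate test-env-numpy
python -c "import numpy; print(numpy.__version__)"   # Exits with ModuleNotFoundError if the package is missing
```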
Sample Slurm Script
```bash
#!/bin/bash

# -- SLURM SETTINGS -- #
# [..] other settings here [..]

# The following settings are for the overall request to Slurm
#SBATCH --ntasks-per-node=32   # How many CPU cores do you want to request
#SBATCH --nodes=1              # How many nodes do you want to request

# -- SCRIPT COMMANDS -- #

# Load the needed modules
module load python-libs              # Load Conda system
conda activate tensorflow-2.11-gpu   # Load the TensorFlow 2.11 - GPU Conda environment

python my-script.py                  # Run Python script
```
Because GPU nodes are limited, work submitted to the cluster will not run on a GPU-equipped node unless you specifically request one. The following example includes the additional changes you have to make to your Slurm script. Click here for more information.
```bash
#!/bin/bash

# -- SLURM SETTINGS -- #
# [..] other settings here [..]

# The following settings are for the overall request to Slurm
#SBATCH --partition=GPU        # Run this script on a GPU node
#SBATCH --ntasks-per-node=32   # How many CPU cores do you want to request
#SBATCH --nodes=1              # How many nodes do you want to request
#SBATCH --gpus=1               # Request one GPU card (max of three)

# -- SCRIPT COMMANDS -- #

# Load the needed modules
module load python-libs              # Load Conda system
conda activate tensorflow-2.11-gpu   # Load the TensorFlow 2.11 - GPU Conda environment

python my-script.py                  # Run Python script
```
If your program only needs the packages in the default base environment, you can skip the `conda activate` step entirely:

```bash
#!/bin/bash

# -- SLURM SETTINGS -- #
# [..] other settings here [..]

# The following settings are for the overall request to Slurm
#SBATCH --ntasks-per-node=32   # How many CPU cores do you want to request
#SBATCH --nodes=1              # How many nodes do you want to request

# -- SCRIPT COMMANDS -- #

# Load the needed modules
module load python-libs   # Load Conda system

python my-script.py       # Run Python script
```
Real Example
Has your research group used Python in a project and/or used the Conda system for package management? Contact the HPC Team and we'd be glad to feature your work.
Citation
Please include the following citation in your papers to support continued development of Conda and other associated tools by its parent company Anaconda, Inc.
Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. https://anaconda.com.