Python Libraries

Anaconda Update - Please Read - September 2024

The HPC industry has been facing many questions about licensing changes to the official Anaconda repositories, which are currently the default channel in our conda environment. Because of this, the HPC Team is reviewing our setup and identifying ways to move away from the anaconda channel and rely on community channels such as 'conda-forge'.

While we work out a transition plan, we highly recommend running the following two commands before you install any new packages; they will make the community-driven 'conda-forge' channel your default instead.

conda config --add channels conda-forge
conda config --set channel_priority strict
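
To confirm the change took effect, you can inspect your channel configuration:

conda config --show channels    # conda-forge should now appear first in the list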

You can also choose a channel at install time by specifying -c conda-forge (or another channel as needed).

mamba install numpy -c conda-forge    # Mamba is a faster package installer for conda

Overview

Software like Python comes with package managers: tools that can install, update, remove, and otherwise manage the additional software you layer on top of the base installation.

There are two main Python package managers, each of which provides external libraries for use by Python programs. Both are available on the cluster, and they work in fundamentally different ways. It is common to use them in tandem and leverage the strengths of both.

Conda is a package manager best known for its use with Python, though it can manage software for a variety of languages and programs. At the Blugold Center for High-Performance Computing we specifically use the "Miniconda" variant, primarily to manage the variety of Python packages our researchers want to use in their programs, along with their individual dependencies. We recommend using this module any time you want to run Python code on our clusters.

Using virtualenv or pip?

We highly recommend that you do not use virtualenv or pip to install packages unless absolutely necessary. Packages installed with pip have been known to cause compatibility issues with packages installed through our conda system, so go through conda for everything if possible. Some packages are only available on PyPI through pip, so take extra care with those.

Pip is popular for many tasks, and there are close to 600,000 packages available on its website, PyPI. Pip is the standard way to install Python packages, but it does not cope well when many complex packages each need different versions of their dependencies. For that reason, we do not recommend using pip alone to manage your packages.
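
If you do need a pip-only package, a common convention (general conda practice, not specific to our clusters) is to install everything you can with conda/mamba first, then add the pip packages last inside the activated environment. The package name below is a placeholder:

module load python-libs
conda activate <environment-name>
mamba install numpy -c conda-forge    # install conda-available dependencies first
pip install some-pypi-only-package    # hypothetical PyPI-only package, installed last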

Both pip and conda are available in the python-libs module.

Availability

Cluster | Module/Version
BOSE    | python-libs/3.0
BGSC    | python-libs/3.0

Note: You can simply use module load python-libs to activate the most recently installed version of this software. We do not anticipate installing any other version of this module.

Conda

Environments

To handle the massive variety of Python libraries, dependencies, and use cases, we take advantage of the Conda environment system to group things together. We install most libraries in the default base/root environment, but we can set up a custom environment for your group that contains only the packages you need.

Using Environments

Whenever you want to use one of the environments (in the terminal or in your sbatch file), use the following commands:

module load python-libs
conda activate <environment-name>
python script-here.py

Current Environments

Below is a select list of environments that the admin team has provided for everyone to use, each with their own set of libraries installed. Use conda env list to view the full list, along with custom environments set up for specific research groups.

Environment         | Conda Command                      | Purpose
base                | (default)                          | Default environment suitable for most projects
rapids-23.02        | conda activate rapids-23.02        | RAPIDS (rapids.ai) environment for GPU-accelerated versions of various data science libraries
tensorflow-2.11-gpu | conda activate tensorflow-2.11-gpu | TensorFlow 2.11 + GPU/CUDA support - recommended for programs that need GPU support
tensorflow-2.9-gpu  | conda activate tensorflow-2.9-gpu  | TensorFlow 2.9 + GPU/CUDA support
sage                | conda activate sage                | Environment for SageMath, a mathematics software system

Note: Any environment that requires GPU acceleration or CUDA will need to be run on a node that contains a GPU, which is not part of the default request. View this page for more instructions.

Personal Environments

User-Installed vs Admin-Installed

If a user sets up their own environment, it is only available to their account. Packages can be installed whenever the user needs them.

If an HPC admin installs the environment, it can be made available globally to all users on the cluster or restricted to just your research group. We will also take care of dependencies and make sure the right versions are installed. Packages will need to be requested and installed by the admin team.

module load python-libs
conda create -n <environment-name>
conda activate <environment-name>    # Always activate your environment before installing a package
mamba install <package> -c conda-forge    # Check anaconda.org for what channels "-c" a package is available on

Example:

module load python-libs
conda create -n test-env-numpy
conda activate test-env-numpy
mamba install numpy -c conda-forge  # mamba is similar to conda and can be significantly faster when installing new packages.

Now, when you activate the "test-env-numpy" environment in your sbatch script, you'll have access to the numpy package you installed.

Using Jupyter?

Depending on the packages you install, your environment may not be set up to be accessible in Jupyter as a kernel. You may also need to install the ipykernel package.
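
For example, to make a hypothetical my-test-env environment visible to Jupyter:

module load python-libs
conda activate my-test-env
mamba install ipykernel -c conda-forge    # lets Jupyter discover this environment as a kernel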

Note: Environments you create may show up with the prefix ".conda-" in the list of available kernels. For example, the Conda environment my-test-env may be listed as Python [conda env:.conda-my-test-env] in Jupyter.

Sharing Your Environment

You can share your current environment with someone else by creating an environment.yml file that lists all of your packages and their versions. This helps with reproducibility by ensuring everyone has the same setup.

module load python-libs
conda activate name-of-environment
conda env export > environment.yml
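
A full export pins exact builds that may be specific to our cluster. If you only want the packages you explicitly asked for (often more portable across machines), recent versions of conda also support a history-based export:

conda env export --from-history > environment.yml    # lists only explicitly requested packages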

Environment.yml / YAML file

If you were provided with an environment.yml file to build a new Conda environment, you can create the environment and install all of the listed libraries using the conda env create command.

module load python-libs
conda env create -f environment.yml  # You can also override the default name with "-n name-of-environment"
conda activate name-of-environment      # This is usually specified at top of the environment.yml file.
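
For reference, a minimal environment.yml might look like the following; the name and package list here are placeholders:

name: name-of-environment
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy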

Available Python Packages

Once in a conda environment, you can use the conda list command to view all of the currently installed Python packages in that environment.
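
You can also pass a package name (or regular expression) to narrow the listing:

conda list numpy    # show only packages whose name matches "numpy"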

Missing Packages

Have you received the ModuleNotFoundError: No module named 'your_module' message in your script output? That means the library you are trying to import isn't available in the current Conda environment you are using.

You can check out one of the other environments listed in conda env list or contact the HPC Admin Team and we'll install the package for you.
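
A quick way to check whether another environment already provides the package (using my-test-env as a placeholder name):

conda activate my-test-env
conda list | grep -i your_module    # search that environment's package list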

Known issue: sshing into other machines

We are aware of an issue with conda where connecting to another node via ssh from a terminal with python-libs loaded can result in an error when you try to use conda. To fix this, run module purge on the new machine to unload python-libs, then run module load python-libs to load it again.
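
In other words, once you are on the new node:

module purge               # unload python-libs and any other loaded modules
module load python-libs    # re-load the conda system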

Pip

While we do not recommend pip for managing your Python packages on the clusters, it may be useful when reproducing an experiment or when the package you want isn't available through conda.

Virtual Environments

Like conda, pip can work with multiple environments in which different packages are installed, isolated from other environments and their packages. This is handled by Python's venv ("virtual environments") module. The syntax is similar to conda's:

module load python-libs
python3 -m venv <environment-name>  # Creates an environment in a directory named <environment-name>
source <environment-name>/bin/activate  # Activates the environment
pip install <package-name>

Unlike with conda, we do not manage any Python virtual environments for shared use.

dry-run

New versions of pip include a --dry-run flag, which tells pip to perform every step of installing a package except the installation itself. It shows you exactly what would be updated and installed before you commit to the change.

For example: pip install --dry-run package_name

Known issue: Pip version

The base version of pip installed on the cluster and/or in newly created virtual environments can be out of date. If a package fails to install, upgrade pip with pip install --upgrade pip, then try installing the package again.

Sharing Environments

Virtual environments can be exported to a text file, usually called requirements.txt.

To export an environment, use pip freeze:

pip freeze > requirements.txt

To install an exported environment, use pip install -r with the exported file (requirements.txt):

pip install -r requirements.txt

Sample Slurm Script

submit.sh
#!/bin/bash
# -- SLURM SETTINGS -- #
# [..] other settings here [..]

# The following settings are for the overall request to Slurm
#SBATCH --ntasks-per-node=32     # How many CPU cores do you want to request
#SBATCH --nodes=1                # How many nodes do you want to request

# -- SCRIPT COMMANDS -- #

# Load the needed modules
module load python-libs    # Load Conda system
conda activate tensorflow-2.11-gpu  # Load the TensorFlow 2.11 - GPU Conda environment
python my-script.py        # Run Python script

Because GPU nodes are limited, work submitted to the cluster will not run on a GPU node unless you request one. The following example shows the additional changes you need to make to your Slurm script. Click here for more information.

submit.sh
#!/bin/bash
# -- SLURM SETTINGS -- #
# [..] other settings here [..]

# The following settings are for the overall request to Slurm
#SBATCH --partition=GPU          # Run this script on a GPU node
#SBATCH --ntasks-per-node=32     # How many CPU cores do you want to request
#SBATCH --nodes=1                # How many nodes do you want to request
#SBATCH --gpus=1                 # Request one GPU card (max of three)

# -- SCRIPT COMMANDS -- #

# Load the needed modules
module load python-libs    # Load Conda system
conda activate tensorflow-2.11-gpu  # Load the TensorFlow 2.11 - GPU Conda environment
python my-script.py        # Run Python script

If your job only needs the default base environment, you can skip the conda activate step entirely:

submit.sh
#!/bin/bash
# -- SLURM SETTINGS -- #
# [..] other settings here [..]

# The following settings are for the overall request to Slurm
#SBATCH --ntasks-per-node=32     # How many CPU cores do you want to request
#SBATCH --nodes=1                # How many nodes do you want to request

# -- SCRIPT COMMANDS -- #

# Load the needed modules
module load python-libs    # Load Conda system
python my-script.py        # Run Python script

Real Example

Has your research group used Python in a project and/or used the Conda system for package management? Contact the HPC Team and we'd be glad to feature your work.

Citation

Please include the following citation in your papers to support continued development of Conda and other associated tools by its parent company Anaconda, Inc.

Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. https://anaconda.com.

Resources