Python Libraries
Anaconda Update - Please Read - September 2024
The HPC industry has been facing a lot of questions surrounding licensing changes to the official Anaconda repositories, which are currently the default channel in our conda environment. Because of this, the HPC Team is reviewing our setup and identifying ways to stop using the anaconda channel and rely more on channels such as 'conda-forge'.
While we work out a transition plan, we highly recommend you run the following two commands before you install any new packages, which will default you to the community-driven 'conda-forge' channel instead.
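conda config --add channels conda-forge   # Make the community conda-forge channel the top-priority channel
conda config --remove channels defaults   # Stop pulling packages from the Anaconda "defaults" channel
These are the standard commands for making conda-forge the default; contact the HPC Team if you run into channel issues.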
You can also change your channel when installing packages by specifying -c conda-forge (or other necessary channels).
Overview
Software like Python comes with package managers: tools that install, remove, update, and otherwise manage the additional software used to supplement it.
There are two main Python package managers, which provide external libraries for use by Python programs. Both are available on the cluster, and they work in fundamentally different ways. It is common to use the two in tandem and leverage the strengths of each.
Conda is a package manager most commonly known for its use in Python scripting, but it can also be used for a variety of other languages and programs. At the Blugold Center for High-Performance Computing we specifically use the "Miniconda" variant, primarily for managing the variety of Python packages that our researchers want to use in their programs, along with their individual dependencies. We recommend using this module anytime you want to run Python code on our clusters.
Using virtualenv or pip?
We highly recommend you do not use virtualenv or pip to install any packages unless absolutely necessary. Packages installed using pip have been known to cause compatibility issues with packages installed within our conda system, so we recommend going through the conda system for everything if possible. Some packages are only available on PyPI through pip, so extra care should be taken when installing them.
Pip is popular for many tasks, and there are close to 600,000 packages available on its website, PyPI. Pip is the standard way to install Python packages, but it does not cope well when many complex packages each need different versions of shared dependencies. For that reason, we do not recommend using pip alone to install packages.
Both pip and conda are available in the python-libs module.
Availability
| Cluster | Module/Version |
|---|---|
| BOSE | python-libs/3.0 |
| BGSC | python-libs/3.0 |
Note: You can simply use module load python-libs to activate the most recently installed version of this software. We do not anticipate installing any other version of this module.
Conda
Environments
To handle the massive variety of Python libraries, dependencies, and use cases, we take advantage of the Conda environment system to group things together. Most libraries are installed in the default base/root environment, but we can set up a custom environment for your group that contains only the packages you need.
Using Environments
Whenever a user wants to use one of the environments (in terminal or their sbatch file), they simply use the following commands:
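module load python-libs             # Load the Conda system
conda activate <environment-name>   # Activate the environment you want to use (see the table below)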
Current Environments
Below is a select list of environments that the admin team has provided for everyone to use, each with their own set of libraries installed. Use conda env list to view the full list, along with custom environments set up for specific research groups.
| Environment | Conda Command | Purpose |
|---|---|---|
| base | Default | Default environment suitable for most projects |
| rapids-23.02 | conda activate rapids-23.02 | RAPIDS (rapids.ai) environment for GPU-accelerated versions of various data science libraries |
| tensorflow-2.11-gpu | conda activate tensorflow-2.11-gpu | TensorFlow 2.11 + GPU/CUDA support - Recommended for programs that need GPU support |
| tensorflow-2.9-gpu | conda activate tensorflow-2.9-gpu | TensorFlow 2.9 + GPU/CUDA support |
| sage | conda activate sage | Environment for SageMath, a mathematics software system |
Note: Any environment that requires GPU acceleration or CUDA will need to be run on a node that contains a GPU, which is not available by default. View this page for more instructions.
Personal Environments
User-Installed vs Admin-Installed
If a user sets up their own environment, it will be available only to their account. Packages can be installed at any time the user needs them.
If an HPC admin installs the environment, it can be made available either globally to all users on the cluster or restricted to a specific research group. We will also take care of any dependencies and make sure the right versions are installed. Packages will need to be requested and installed by the admin team.
module load python-libs
conda create -n <environment-name>       # Create a new, empty environment
conda activate <environment-name>        # Always activate your environment before installing a package
mamba install <package> -c conda-forge   # Check anaconda.org to see which channels ("-c") a package is available on
Example:
module load python-libs
conda create -n test-env-numpy
conda activate test-env-numpy
mamba install numpy -c conda-forge # mamba is similar to conda and can be significantly faster when installing new packages.
Now, when you activate the "test-env-numpy" environment in your sbatch script, you'll have access to the numpy package you installed.
Using Jupyter?
Depending on the packages you install, your environment may not be set up to be accessible in Jupyter as a kernel. You may also need to install the ipykernel package.
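For example, following the same pattern as the install steps above:
conda activate <environment-name>
mamba install ipykernel -c conda-forge   # Lets Jupyter discover the environment as a kernel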
Note: Environments you create may show up with the prefix ".conda-" in the list of available kernels. For example, the Conda environment my-test-env may be listed as Python [conda env:.conda-my-test-env] in Jupyter.
Sharing Your Environment
You can share your current environment with someone else by creating an environment.yml file listing all of your packages and their versions. This helps with reproducibility, ensuring everyone has the same setup.
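For example, to export your currently active environment:
module load python-libs
conda activate <environment-name>
conda env export > environment.yml   # Writes every installed package and its version to environment.yml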
Environment.yml / YAML file
If you were provided with an environment.yml file to build a new Conda environment, you can create the environment and install all of the listed libraries using the conda env create command.
module load python-libs
conda env create -f environment.yml # You can also override the default name with "-n name-of-environment"
conda activate name-of-environment # This is usually specified at top of the environment.yml file.
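For reference, a minimal environment.yml looks something like this (the name, version, and packages below are placeholders, not a real environment from our clusters):
name: name-of-environment
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy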
Available Python Packages
Once in a conda environment, you can use the conda list command to view all of the currently installed Python packages in that environment.
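For example, to check for a specific package:
conda list          # All packages in the active environment
conda list numpy    # Only packages whose names match "numpy"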
Missing Packages
Have you received the ModuleNotFoundError: No module named 'your_module' message in your script output? That means the library you are trying to import isn't available in the current Conda environment you are using.
You can check out one of the other environments listed in conda env list or contact the HPC Admin Team and we'll install the package for you.
Known issue: SSHing into other machines
We are aware of a conda issue where connecting to another node via ssh from a terminal with python-libs loaded can result in an error when trying to use conda. To fix this, run module purge on the new machine to unload python-libs, then run module load python-libs to re-load it:
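module purge              # Unload all modules, including the stale python-libs
module load python-libs   # Re-load python-libs on the new node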
Pip
While we do not recommend pip for managing your Python packages on the clusters, it may be useful when reproducing an experiment or when the package you want to install isn't available through conda.
Virtual Environments
Like conda, pip is capable of using multiple environments where different packages can be installed, isolated from other environments and the packages within. This is handled by Python's venv ("virtual environment") module. The syntax is similar to conda:
module load python-libs
python3 -m venv <environment-name> # Creates an environment in the <environment-name> directory
source <environment-name>/bin/activate # Activates the environment
pip install <package-name>
Unlike with conda, we do not manage any python virtual environments for sharing.
dry-run
Newer versions of Pip include a --dry-run option, which tells Pip to perform every step of installing a package except the installation itself. It shows you exactly what would be updated and installed before you actually install the package.
For example: pip install --dry-run <package-name>
Known issue: Pip version
The base version of Pip installed on the cluster and/or in newly created virtual environments can be out of date. If you try to install a package and it fails, try upgrading your Pip install with pip install --upgrade pip and try installing the package again.
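For example, inside your activated virtual environment:
pip install --upgrade pip   # Upgrade pip itself before retrying the package install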
Sharing Environments
Virtual environments can be exported with a .txt file, usually called requirements.txt.
To export an environment, you can use pip freeze.
To install an exported environment, you can use pip install -r with the exported file (requirements.txt).
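For example:
# Export from an existing environment
source <environment-name>/bin/activate
pip freeze > requirements.txt            # Export installed packages with pinned versions
# Recreate it elsewhere
python3 -m venv <new-environment-name>
source <new-environment-name>/bin/activate
pip install -r requirements.txt          # Install everything listed in the exported file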
Sample Slurm Script
#!/bin/bash
# -- SLURM SETTINGS -- #
# [..] other settings here [..]
# The following settings are for the overall request to Slurm
#SBATCH --ntasks-per-node=32 # How many CPU cores do you want to request
#SBATCH --nodes=1 # How many nodes do you want to request
# -- SCRIPT COMMANDS -- #
# Load the needed modules
module load python-libs # Load Conda system
conda activate tensorflow-2.11-gpu # Load the TensorFlow 2.11 - GPU Conda environment
python my-script.py # Run Python script
Because GPU nodes are limited, work submitted to the cluster will not run on a GPU-supported node unless you request one. The following example includes the additional changes you have to make to your Slurm script. Click here for more information.
#!/bin/bash
# -- SLURM SETTINGS -- #
# [..] other settings here [..]
# The following settings are for the overall request to Slurm
#SBATCH --partition=GPU # Run this script on a GPU node
#SBATCH --ntasks-per-node=32 # How many CPU cores do you want to request
#SBATCH --nodes=1 # How many nodes do you want to request
#SBATCH --gpus=1 # Request one GPU card (max of three)
# -- SCRIPT COMMANDS -- #
# Load the needed modules
module load python-libs # Load Conda system
conda activate tensorflow-2.11-gpu # Load the TensorFlow 2.11 - GPU Conda environment
python my-script.py # Run Python script
If your script only needs packages from the default base environment, no conda activate step is required:
#!/bin/bash
# -- SLURM SETTINGS -- #
# [..] other settings here [..]
# The following settings are for the overall request to Slurm
#SBATCH --ntasks-per-node=32 # How many CPU cores do you want to request
#SBATCH --nodes=1 # How many nodes do you want to request
# -- SCRIPT COMMANDS -- #
# Load the needed modules
module load python-libs # Load Conda system
python my-script.py # Run Python script
Real Example
Has your research group used Python in a project and/or used the Conda system for package management? Contact the HPC Team and we'd be glad to feature your work.
Citation
Please include the following citation in your papers to support continued development of Conda and other associated tools by its parent company Anaconda, Inc.
Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. https://anaconda.com.