Skip to content

Qiime 2 Workflow

Note

These commands are for Qiime2-2024.5. See Qiime2 Docs or use qiime2 --help for most up-to-date flags.

The latest version of 2024.10 includes recent releases of SILVA/GTDB databases. It's available on the cluster as a test version and can be loaded through module load test-qiime2 instead of module load qiime2/2024.5.

1. Set up read directory

Make a specific directory in a project folder to contain reads and barcode files. Copy raw reads into this directory.

mkdir reads
cp Undetermined_S0_L001_R1_001.fastq.gz ./reads
cp Undetermined_S0_L001_R2_001.fastq.gz ./reads
cp Undetermined_S0_L001_I1_001.fastq.gz ./reads

2. Rename forward, reverse, and barcode fastq.gz files

Sometimes sequencing centers give us weirdly named files. This makes it easier to remember which file is which.

cd reads
mv Undetermined_S0_L001_I1_001.fastq.gz barcodes.fastq.gz
mv Undetermined_S0_L001_R1_001.fastq.gz forward.fastq.gz
mv Undetermined_S0_L001_R2_001.fastq.gz reverse.fastq.gz
cd ..

3. Import Sequence files into Qiime 2

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=0-00:30:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mem=300
#SBATCH --partition=week

module load qiime2/2024.5

qiime tools import \
--type EMPPairedEndSequences \
--input-path reads/ \
--output-path emp-paired-end-sequences.qza

4. Demultiplex the sequences

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=0-00:30:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mem=300
#SBATCH --partition=week

module load qiime2/2024.5

qiime demux emp-paired \
--m-barcodes-file metadata.txt \
--m-barcodes-column barcode-sequence \ --p-no-rev-comp-mapping-barcodes \
--p-no-golay-error-correction \
--i-seqs emp-paired-end-sequences.qza \ --o-per-sample-sequences demux.qza \ --o-error-correction-details demux-details.qza

Breakdown:

Flag Description
--m-barcodes-file a text file that contains barcodes for the samples we want in the feature table. It can also include other important metadata. Make sure the barcode column is named barcode-sequence (or matches whatever is indicted in next --m-barcodes-column)
--m-barcodes-column barcode-sequence indicates that in the mapping file, barcodes are in a column labeled barcode-sequence
--p-no-rev-comp-mapping-barcodes specifies barcodes don't need to be reverse complemented. This option will vary by sequencing center. If unsure, process reads with the flag and if sequence counts are very low, run again without and compare. This flag will be used for reads from CU Boulder sequencing center.
--p-no-golay-error-correction doesn't perform 12nt Golay error correction on barcode reads. This option will vary by sequencing center. This flag will be used for reads from CU Boulder sequencing center.
--i-seqs .qza file generated in step 3
--o-per-sample-sequences output file containing reads separated by sample
--o-error-correction-details provides stats about the error correction process on your samples

5. Visualize sequence reads and sequence quality

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=0-00:30:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mem=1G
#SBATCH --partition=week

module load qiime2/2024.5

qiime demux summarize \
--i-data demux.qza \
--o-visualization demux.qzv

You can visualize demux.qzv in the QIIME2 view tool interface or

through qiime tools view demux.qzv on Desktop mode - Open OnDemand.

  • The overview tab allows you to see the number of forward and reverse reads for each sample.
  • Interactive Quality Plot allows you to see the quality scores of the forward and reverse reads based on sequence base pairs. This is useful for deciding where to trim in the following step. Frequently, quality scores will drop before the end of the sequencing run. You will want to choose trim lengths to remove low-quality reads but still allow enough overlap that forward and reverse reads can be merged.

If you are unsure of your sequencing length, this visualization allows you to see how long the forward and reverse reads are as well as look at the average quality of each base pair based on its position in the sequence. We generally see poorer read quality at the beginning and end of reads so we trim that portion away.

6. Run DADA2 to denoise and merge sequence reads

DADA2 is a pipeline that detects and removes Illumina sequencing errors when possible and merges forward and reverse reads. The output is called a feature table (table.qza) consisting of amplicon sequence variants (ASVs) and their count per sample. Additionally, the rep-seqs.qza file contains each ASV and its entire sequence (this file does not have any count or sample info, only sequences matched with ASV id)

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=0-00:30:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mem=1G
#SBATCH --partition=week

module load qiime2/2024.5

mkdir sample_stats

qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux.qza \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 250 \
--p-trunc-len-r 250 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats sample_stats/denoising-stats.qza

Breakdown:

Flag Description
--p-trim-left-f the trim length at the beginning of the forward read
--p-trim-left-r the trim length at the beginning of the reverse read
--p-trunc-len-f the truncation length at the end of the forward read
--p-trunc-len-r the truncation length at the end of the reverse read
--o-table output table that has the number of each ASV per sample
--o-representative-sequences a file that contains every unique ASV

Optional: Run the following code to get a qzv table of the output. This will allow you to see how many reads were retained during the dada2 filtering step.

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=0-00:30:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mem=1G
#SBATCH --partition=week

module load qiime2/2024.5

qiime metadata tabulate \
--m-input-file sample_stats/denoising-stats.qza \
--o-visualization sample_stats/denoising-stats.qzv

You can visualize denoising-stats.qzv in the QIIME2 view tool interface or

through qiime tools view denoising-stats.qzv on Desktop mode - Open OnDemand.

7. Assign taxonomy to reads

Note

This requires a trained classifier. See below for how to train a classifier using a database and the Earth Microbiome Primers (EMP)(515f/806R).

Classifying with Silva 138
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=0-00:30:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mem=1G
#SBATCH --partition=week

module load qiime2/2024.5

qiime feature-classifier classify-sklearn \
--i-classifier SILVA_138.1_SSURef_NR99_tax_515F806R_CLASSIFIER.qza \
--i-reads rep-seqs.qza \
--o-classification taxonomy_silva138_16S.qza

8. Export taxonomy to a table

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=0-00:10:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mem=125
#SBATCH --partition=week

module load qiime2/2024.5

qiime tools export \
--input-path taxonomy_silva138_16S.qza \
--output-path feature_taxonomy_out
There will now be a tsv table in the output directory of ASV/taxonomy

9. Export ASV table to a table

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=0-00:30:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mem=125
#SBATCH --partition=week

module load qiime2/2024.5

qiime tools export \
--input-path table.qza \
--output-path feature_table_out
There will now be a tsv table in the output directory of ASV/sample1/sample2/sample3... with read counts as the values

Training a classifier with EMP primers

1. Get Database Files

The default version/target is SILVA 138.1 SSUREF_NR99 on v2024.5, SILVA 138.2 SSUREF_NR99 for v2024.10.

Note

In case you're using a Slurm script, it is recommended to input 2G for memory allocation.

qiime rescript get-silva-data \
--o-silva-sequences SILVA_138.1_SSURef_NR99_RNAseq.qza \
--o-silva-taxonomy taxmap_slv_ssu_ref_nr_138.1.qza

--o-silva-sequences gets FeatureData[RNASequence], which needs to be transcribed into FeatureData[Sequence]

Note

In case you're using a Slurm script, it is recommended to input 1G for memory allocation.

Reverse transcribe RNA to DNA sequences
qiime rescript reverse-transcribe \
--i-rna-sequences SILVA_138.1_SSURef_NR99_RNAseq.qza \
--o-dna-sequences SILVA_138.1_SSURef_NR99_tax_silva.qza

The default version/target is GTDB 214.1 Bacteria / Archaea SpeciesReps for v2024.5, GTDB 220.0 Bacteria / Archaea SpeciesReps for v2024.10

Note

In case you're using a Slurm script, it is recommended to input 1G for memory allocation.

qiime rescript get-gtdb-data \
--o-gtdb-taxonomy gtdb_r214.1_bac_arc_taxonomy.qza \
--o-gtdb-sequences gtdb_r214.1_bac_arc_ssu_reps.qza

2. Extract Primer regions from the sequences

Note

In case you're using a Slurm script, it is recommended to input 1G for memory allocation.

qiime feature-classifier extract-reads \
--i-sequences SILVA_138.1_SSURef_NR99_tax_silva.qza \
--p-f-primer GTGYCAGCMGCCGCGGTAA \
--p-r-primer GGACTACNVGGGTWTCTAAT \
--o-reads SILVA_138.1_SSURef_NR99_tax_515F806R.qza

Note

In case you're using a Slurm script, it is recommended to input 1G for memory allocation.

qiime feature-classifier extract-reads \
--i-sequences gtdb_r214.1_bac_arc_ssu_reps.qza \
--p-f-primer GTGYCAGCMGCCGCGGTAA \
--p-r-primer GGACTACNVGGGTWTCTAAT \
--o-reads gtdb_r214.1_bac_arc_ssu_reps_515F806R.qza

3. Train the classifier

Note

In case you're using a Slurm script, it is recommended to input 10G for memory allocation.

qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads SILVA_138.1_SSURef_NR99_tax_515F806R.qza \
--i-reference-taxonomy taxmap_slv_ssu_ref_nr_138.1.qza \
--o-classifier SILVA_138.1_SSURef_NR99_tax_515F806R_CLASSIFIER.qza

Note

In case you're using a Slurm script, it is recommended to input 30G for memory allocation.

qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads gtdb_r214.1_bac_arc_ssu_reps_515F806R.qza \
--i-reference-taxonomy gtdb_r214.1_bac_arc_taxonomy.qza \
--o-classifier gtdb_r214.1_bac_arc_ssu_reps_515F806R_CLASSIFIER.qza