View on GitHub

Getting Started with Qiime2

introductory tutorial for qiime2

Example Workflow

This will follow the steps from importing the files, setting up metadata, and running initial analyses.

The example dataset we will be using has been taken from the Qiime2 Fecal microbiota transplant tutorial (FMT). We will be following some of the steps there, but here we will be combining from a few different tutorials to present a complete example.

Importing data

We will start with sequence files that have already been demultiplexed and merged. The Qiime2 Moving Pictures Tutorial starts with the initial file. In many cases, such as dual-index barcodes, it is not possible to use Qiime for demultiplexing. In this example we will use sequence files that have already been demultiplexed.

To import a group of files, we need to create a manifest file. This is a simple tab-delimited file with the sample ID and path of each file:

sample-id     absolute-filepath
sample-1      $PWD/some/filepath/sample1_R1.fastq
sample-2      $PWD/some/filepath/sample2_R1.fastq

See the Qiime2 Importing Data Tutorial for more details, including importing paired end sequences.

Once you have a manifest file, you can import the sequence files into a Qiime2 Artifact. We will start with the FMT run 2 sequences, as it is smaller and will go a little quicker. There is already a manifest file in your main folder called run_2_manifest.txt. Change the –input-path parameter in the following command, as well as the name of the output file (–output-path)

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path {MANIFEST_FILE} \
  --output-path {QIIME_DEMUX}.qza \
  --input-format SingleEndFastqManifestPhred33V2

Evaluate data quality

Before we go on, we need to examine the sequence quality to determine how we should trim the sequences. If you have run FastQC on your raw sequence files, you can examine the per base sequence quality category. Qiime also provides a plugin for graphing sequence quality. Add the input and output names to the following commands and run:

qiime demux summarize \
  --i-data {QIIME_DEMUX.qza} \
  --o-visualization {DEMUX_SEQ_SUMMARY_VIZ.qzv}

You can download the visualisation file to your computer and open it using Qiime2View, or run qiime tools view {DEMUX_SEQ_SUMMARY_VIZ}.qzv on the file if you have Qiime2 installed. For the sake of time, click here to view the visualisation:

VISUALISATION: Run 2 demux summary

Denoise sample sequences

Now we are ready to denoise the samples. In Qiime2, you can use either Dada2 or Deblur, with most options. (Later we will show you how to import samples that have been denoised using other programs–or with additional parameters not found in the plugins)

From evaluating the sequence quality, we can observe that the quality is lower in the first ten or so bases, and stays relatively high right to the end. Therefore, we will trim the first 13 bases using the –p-trim-left argument, and truncate the sequences at 150 bp using the –p-trun-len argument. Change the names of input and output files as before.

qiime dada2 denoise-single \
  --p-trim-left 13 \
  --p-trunc-len 150 \
  --i-demultiplexed-seqs {QIIME_DEMUX}.qza \
  --o-representative-sequences {REP-SEQS}.qza \
  --o-table {TABLE}.qza \
  --o-denoising-stats {STATS}.qza

You can then create visualisations from the output files to examine the results of denoising:

qiime metadata tabulate \
  --m-input-file {STATS}.qza \
  --o-visualization {STATS_VIZ}.qzv

qiime feature-table summarize \
  --i-table {TABLE}.qza \
  --o-visualization {TABLE_VIZ}.qzv \
  --m-sample-metadata-file sample-metadata.tsv

qiime feature-table tabulate-seqs \
  --i-data {REP-SEQS}.qza \
  --o-visualization {REP-SEQS_VIZ}.qzv

Again, download the visualisations and compare them to here:

VISUALISATION: denoising stats

VISUALISATION: feature table summary

VISUALISATION: rep seq tabulation

Taxonomy assignment

Now that we have representative sequences from the denoising process (e.g. ZOTUs, ASVs, ESVs), we can make a taxonomic assignment of them. There are several methods to do this. See the Qiime2 Overview for a discussion of them (Also see this paper). We will start with the machine-learning based classification method, as that is generally favoured by the Qiime group.

Use Naive Bayes (machine learning) to classify

In order to use the Naive Bayes (NB) method to assign taxonomy, it is necessary to train the sequence database first. Because this can take a great deal of time, a pre-trained classifier has been made available for you. The Qiime2 Data Resources page provides some pre-trained classifiers for common primer combinations, as well as links to the Greengenes and Silva databases for 16S and 18S gene studies. For additional primer combinations, or other gene references, there is a tutorial for training feature classifiers.

Use the command below, changing the name of the rep-seqs artifact that you have created:

qiime feature-classifier classify-sklearn \
  --i-classifier /var/DB/greengenes/gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads {REP-SEQS}.qza \
  --o-classification {TAXONOMY}.qza

You can then create a visualisation of the classification:

qiime metadata tabulate \
  --m-input-file {TAXONOMY}.qza \
  --o-visualization {TAXONOMY_VIZ}.qzv

VISUALISATION: taxonomy table

Visualise the taxonomy assignment

You can now use the frequency table created from the denoising, along with the taxonomy output, to create a barplot graph for the taxonomy assignment.

qiime taxa barplot \
  --i-table {TABLE}.qza \
  --i-taxonomy {TAXONOMY}.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization {TAXA-BAR-PLOTS_VIZ}.qzv

VISUALISATION: taxonomy bar plots

Run through steps again with Run 1 data

Now that you have seen the process through to taxonomic classification, you can repeat these steps for run 1. Every separate sequencing run has to be run through denoising separately, so this is not a redundant step. Do all the steps up to Taxonomy assignment, and then you will merge the tables and rep-seqs from both runs into combined files. Then run the NB taxonomy assignment with the combined files.

merge tables from both runs

qiime feature-table merge \
  --i-tables {TABLE-1}.qza \
  --i-tables {TABLE-1}.qza\
  --o-merged-table {COMBINED-TABLE}.qza

qiime feature-table merge-seqs \
  --i-data {REP-SEQS-1}.qza \
  --i-data {REP-SEQS-2}.qza \
  --o-merged-data {COMBINED-REP-SEQS}.qza

qiime feature-table summarize \
  --i-table {COMBINED-TABLE}.qza \
  --o-visualization {COMBINED-TABLE_VIZ}.qzv \
  --m-sample-metadata-file sample-metadata.tsv