Graphing metabarcoding data with R

This lesson presents several examples of plotting the results of your analysis. As mentioned in the previous lesson, here are some links with additional graphing examples:

Plotting trees

We will start by plotting a tree with the plot_tree function
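As a minimal sketch of this first step, the example below uses the GlobalPatterns dataset that ships with phyloseq as a stand-in for the lesson's own data. The object name physeq and the choice of the Chlamydiae subset are assumptions made only to keep the tree small.

```r
library(phyloseq)

# Stand-in data: GlobalPatterns is bundled with phyloseq
data(GlobalPatterns)

# Keep a single phylum so the tree has only a few tips (illustrative choice)
physeq <- subset_taxa(GlobalPatterns, Phylum == "Chlamydiae")

# plot_tree() returns a ggplot object; printing it draws the tree
tree_plot <- plot_tree(physeq)
tree_plot
```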

It is easy to customise the tree plot to add color and change the shape, etc.
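One possible customisation is sketched below, again on the bundled GlobalPatterns data rather than the lesson dataset. The metadata column SampleType belongs to that stand-in dataset; with your own data you would substitute one of your sample variables.

```r
library(phyloseq)

data(GlobalPatterns)
physeq <- subset_taxa(GlobalPatterns, Phylum == "Chlamydiae")

# Colour points by a sample variable, scale point size by read abundance,
# and ladderize the tree so that it is easier to read
tree_custom <- plot_tree(
  physeq,
  color = "SampleType",
  size = "abundance",
  ladderize = "left",
  title = "Chlamydiae OTUs by sample type"
)
tree_custom
```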

You can also remove information to focus on a certain aspect. In the two examples below, we are looking at which OTUs are in which samples.
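A sketch of this idea, assuming the GlobalPatterns stand-in data: node labels are blanked with nodeplotblank and the tips are labelled with OTU names, so the plot shows only which OTU occurs in which sample.

```r
library(phyloseq)

data(GlobalPatterns)
physeq <- subset_taxa(GlobalPatterns, Phylum == "Chlamydiae")

# nodeplotblank suppresses node labels; "taxa_names" labels tips by OTU name
tree_otus <- plot_tree(
  physeq,
  nodelabf = nodeplotblank,
  label.tips = "taxa_names",
  color = "SampleType",
  ladderize = "left"
)
tree_otus
```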

You can change the coloring as well

You can also add taxonomic information to the tips (note: it won't perfectly match relationships among the taxa, as the phylogeny was constructed with just the OTUs, which are short fragments).
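Tip labels can be taken from any rank in the taxonomy table. The sketch below assumes the stand-in GlobalPatterns data, whose taxonomy table has a rank named Genus; the rank names in your own dataset may be spelled differently.

```r
library(phyloseq)

data(GlobalPatterns)
physeq <- subset_taxa(GlobalPatterns, Phylum == "Chlamydiae")

# Check which ranks are available before choosing one for the labels
rank_names(physeq)

# Label each tip with its genus assignment
tree_genus <- plot_tree(physeq, color = "SampleType", label.tips = "Genus")
tree_genus
```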

Plotting taxonomy

Using the plot_bar function, we can plot the relative abundance of taxa across samples
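The following is a sketch of a relative abundance bar plot. It assumes the GlobalPatterns stand-in data, reduced to its 50 most abundant OTUs so that the plot stays readable. Counts are converted to proportions first, which is one common way to obtain relative abundance.

```r
library(phyloseq)

data(GlobalPatterns)

# Stand-in data: keep the 50 most abundant OTUs (illustrative cut-off)
top_otus <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE))[1:50]
physeq <- prune_taxa(top_otus, GlobalPatterns)

# Drop any sample left without reads, then convert counts to proportions
physeq <- prune_samples(sample_sums(physeq) > 0, physeq)
physeq_rel <- transform_sample_counts(physeq, function(x) x / sum(x))

# One stacked bar per sample, filled by phylum
bar_rel <- plot_bar(physeq_rel, fill = "Phylum")
bar_rel
```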

Plotting the taxonomy with the rarefied dataset can help to compare abundance across samples
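If a rarefied object is not already available from an earlier lesson, one way to make it is rarefy_even_depth. The sketch below assumes the GlobalPatterns stand-in data; the seed value is arbitrary and serves only to make the subsampling repeatable.

```r
library(phyloseq)

data(GlobalPatterns)
top_otus <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE))[1:50]
physeq <- prune_taxa(top_otus, GlobalPatterns)

# Subsample every sample down to the depth of the smallest sample
physeq_rare <- rarefy_even_depth(
  physeq,
  sample.size = min(sample_sums(physeq)),
  rngseed = 1
)

# All bars now have the same height, so abundances are directly comparable
bar_rare <- plot_bar(physeq_rare, fill = "Phylum")
bar_rare
```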

You can collapse the frequency counts across metadata categories, such as location
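One simple way to do this is to put a metadata column on the x axis, so that the counts from all samples in a category are stacked in a single bar. The sketch uses SampleType from the stand-in GlobalPatterns data in place of a column such as location.

```r
library(phyloseq)

data(GlobalPatterns)
top_otus <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE))[1:50]
physeq <- prune_taxa(top_otus, GlobalPatterns)

# x names a sample variable instead of the default "Sample"
bar_by_group <- plot_bar(physeq, x = "SampleType", fill = "Phylum")
bar_by_group
```

Note that categories with more samples produce taller bars with this approach, because counts are summed rather than averaged.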

You can assign the output of the plot_bar function to a variable, and then add additional customisation with ggplot
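Because plot_bar returns a ggplot object, ordinary ggplot2 layers can be added to it. The theme, labels and text angle below are illustrative choices applied to the GlobalPatterns stand-in data.

```r
library(phyloseq)
library(ggplot2)

data(GlobalPatterns)
top_otus <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE))[1:50]
physeq <- prune_taxa(top_otus, GlobalPatterns)

# Store the plot in a variable instead of printing it straight away
bar_plot <- plot_bar(physeq, fill = "Phylum")

# Add ggplot2 layers to change the theme, axis labels and tick label angle
bar_styled <- bar_plot +
  theme_bw() +
  labs(x = "Sample", y = "Read count", title = "Phylum composition") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
bar_styled
```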

Often there are so many taxonomic groups represented that it is useful to create a subset of the data containing only the OTUs that belong to a particular taxonomic group. This is done with the subset_taxa function. In the example dataset provided, there are not too many taxa included, but for larger datasets this can be very useful.
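A short sketch of subset_taxa, using the GlobalPatterns stand-in data. The rank name Phylum and the value Chlamydiae come from that dataset and would be replaced by a rank and group from your own taxonomy table.

```r
library(phyloseq)

data(GlobalPatterns)

# Keep only the OTUs assigned to one phylum
physeq_sub <- subset_taxa(GlobalPatterns, Phylum == "Chlamydiae")

# Compare the number of OTUs before and after subsetting
ntaxa(GlobalPatterns)
ntaxa(physeq_sub)

# The smaller object can be plotted at a finer taxonomic rank
bar_sub <- plot_bar(physeq_sub, fill = "Genus")
bar_sub
```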

Community ecology

The Phyloseq package has many functions for exploring diversity within and between samples. This includes multiple methods of calculating distance matrices between samples.

An ongoing debate with metabarcoding data is whether relative frequency of OTUs should be taken into account, or whether they should be scored as presence or absence. Phyloseq allows you to calculate distance using both qualitative (i.e. presence/absence) and quantitative (abundance included) measures, taken from community ecology diversity statistics.
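To make the distinction concrete, the sketch below computes one distance of each kind on the GlobalPatterns stand-in data: binary Jaccard (presence/absence) and Bray-Curtis (abundance). The choice of these two measures is an assumption; phyloseq offers many others.

```r
library(phyloseq)

data(GlobalPatterns)
top_otus <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE))[1:100]
physeq <- prune_taxa(top_otus, GlobalPatterns)
physeq <- prune_samples(sample_sums(physeq) > 0, physeq)

# Qualitative: Jaccard on presence/absence (binary = TRUE is passed to vegan)
dist_pa <- phyloseq::distance(physeq, method = "jaccard", binary = TRUE)

# Quantitative: Bray-Curtis takes read abundance into account
dist_ab <- phyloseq::distance(physeq, method = "bray")

# Both are ordinary dist objects with one entry per pair of samples
summary(as.vector(dist_pa))
summary(as.vector(dist_ab))
```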

The gridExtra package has a function called grid.arrange that allows us to view the two graphs next to each other (note: this is different from the ggplot facet_wrap function, which presents multiple plots from the same dataset).
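As an illustration, the sketch below builds two PCoA ordinations on the GlobalPatterns stand-in data, one from a presence/absence distance and one from an abundance-based distance, and places them side by side. PCoA is an assumed choice of ordination method.

```r
library(phyloseq)
library(ggplot2)
library(gridExtra)

data(GlobalPatterns)
top_otus <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE))[1:100]
physeq <- prune_taxa(top_otus, GlobalPatterns)
physeq <- prune_samples(sample_sums(physeq) > 0, physeq)

# Pre-compute the two distance matrices
dist_pa <- phyloseq::distance(physeq, method = "jaccard", binary = TRUE)
dist_ab <- phyloseq::distance(physeq, method = "bray")

# ordinate() accepts a pre-computed dist object as its distance argument
ord_pa <- ordinate(physeq, method = "PCoA", distance = dist_pa)
ord_ab <- ordinate(physeq, method = "PCoA", distance = dist_ab)

p_pa <- plot_ordination(physeq, ord_pa, color = "SampleType") +
  ggtitle("Jaccard (presence/absence)")
p_ab <- plot_ordination(physeq, ord_ab, color = "SampleType") +
  ggtitle("Bray-Curtis (abundance)")

# Draw the two independent plots in one row
combined <- grid.arrange(p_pa, p_ab, ncol = 2)
```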

Incorporate phylogeny into diversity measure

It is also possible to incorporate the phylogeny of the OTUs into the distance methods, using the UniFrac distance measure.

You can use UniFrac distances with either qualitative (unweighted UniFrac) or quantitative (weighted UniFrac) measures.

Below, we will calculate both unweighted and weighted UniFrac distances, and then use grid.arrange to compare them.
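A sketch of this comparison on the GlobalPatterns stand-in data, which includes a phylogenetic tree. In phyloseq the method names "unifrac" and "wunifrac" select the unweighted and weighted versions; the PCoA ordination is an assumed choice.

```r
library(phyloseq)
library(ggplot2)
library(gridExtra)

data(GlobalPatterns)
top_otus <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE))[1:100]
physeq <- prune_taxa(top_otus, GlobalPatterns)
physeq <- prune_samples(sample_sums(physeq) > 0, physeq)

# UniFrac needs the tree stored in the phyloseq object
dist_uu <- phyloseq::distance(physeq, method = "unifrac")   # unweighted
dist_wu <- phyloseq::distance(physeq, method = "wunifrac")  # weighted

ord_uu <- ordinate(physeq, method = "PCoA", distance = dist_uu)
ord_wu <- ordinate(physeq, method = "PCoA", distance = dist_wu)

p_uu <- plot_ordination(physeq, ord_uu, color = "SampleType") +
  ggtitle("Unweighted UniFrac")
p_wu <- plot_ordination(physeq, ord_wu, color = "SampleType") +
  ggtitle("Weighted UniFrac")

unifrac_panels <- grid.arrange(p_uu, p_wu, ncol = 2)
```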

Phyloseq allows you to view multiple characteristics on a plot
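For example, colour and point shape can each be mapped to a different sample variable. In the sketch below the logical column human is a hypothetical variable added to the GlobalPatterns stand-in data purely for illustration.

```r
library(phyloseq)
library(ggplot2)

data(GlobalPatterns)
top_otus <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE))[1:100]
physeq <- prune_taxa(top_otus, GlobalPatterns)
physeq <- prune_samples(sample_sums(physeq) > 0, physeq)

# Hypothetical second variable: is the sample human-associated?
human_types <- c("Feces", "Mock", "Skin", "Tongue")
sample_data(physeq)$human <- factor(
  get_variable(physeq, "SampleType") %in% human_types
)

ord <- ordinate(physeq, method = "PCoA", distance = "bray")

# Colour shows the sample type, shape shows the added variable
p_multi <- plot_ordination(physeq, ord, color = "SampleType", shape = "human") +
  geom_point(size = 3)
p_multi
```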

Permutational ANOVA

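The adonis function, which comes from the vegan package rather than from Phyloseq itself (current vegan versions provide it as adonis2), runs a PERMANOVA to test whether community composition differs significantly between groups of samples defined by a metadata field.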
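A minimal sketch of such a test, assuming the GlobalPatterns stand-in data, a Bray-Curtis distance and vegan's adonis2. The grouping variable SampleType stands in for whichever metadata field you want to test.

```r
library(phyloseq)
library(vegan)

data(GlobalPatterns)
top_otus <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE))[1:100]
physeq <- prune_taxa(top_otus, GlobalPatterns)
physeq <- prune_samples(sample_sums(physeq) > 0, physeq)

# Distance matrix between samples and the matching metadata table
dist_bc <- phyloseq::distance(physeq, method = "bray")
meta <- as(sample_data(physeq), "data.frame")

# Fix the seed so that the permutation p-value is repeatable
set.seed(42)
permanova <- adonis2(dist_bc ~ SampleType, data = meta, permutations = 999)
permanova
```

The R2 column of the result gives the proportion of variation explained by the grouping, and the Pr(>F) column gives the permutation p-value.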

Saving plots

Just as you can save a plot to a variable for later visualisation, you can write a plot variable to a file. This allows you to modify the parameters (e.g. size, colors), before writing to a file
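One way to do this is with ggsave from ggplot2, sketched below on the GlobalPatterns stand-in data. The file name and the dimensions are illustrative assumptions.

```r
library(phyloseq)
library(ggplot2)

data(GlobalPatterns)
top_otus <- names(sort(taxa_sums(GlobalPatterns), decreasing = TRUE))[1:50]
physeq <- prune_taxa(top_otus, GlobalPatterns)

# Build and style the plot, keeping it in a variable
bar_plot <- plot_bar(physeq, fill = "Phylum") + theme_bw()

# Write the stored plot to a file; width and height are in inches by default
out_file <- "phylum_barplot.pdf"
ggsave(out_file, plot = bar_plot, width = 10, height = 6)
```

The output format is taken from the file extension, so changing the name to end in .png would write a PNG instead, provided that device is available on your system.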