In this lesson there are several examples for plotting the results of your analysis. As mentioned in the previous lesson, here are some links with additional graphing examples:
The Phyloseq web page has many good tutorials
The Micca pipeline has a good intro tutorial for Phyloseq
# load the libraries you will need
library('phyloseq')
library('ggplot2')
library('ape')
library("vegan")
library("gridExtra")
Loading required package: permute Loading required package: lattice This is vegan 2.5-7
# go to the plots directory
setwd('../plots')
# first the original
physeq <- readRDS('fish_phyloseq.rds')
print(physeq)
phyloseq-class experiment-level object otu_table() OTU Table: [ 33 taxa and 11 samples ] sample_data() Sample Data: [ 11 samples by 8 sample variables ] tax_table() Taxonomy Table: [ 33 taxa by 8 taxonomic ranks ] phy_tree() Phylogenetic Tree: [ 33 tips and 32 internal nodes ]
# Now the rarefied version
physeq.rarefied <- readRDS('fish_phyloseq_rarefied.rds')
print(physeq.rarefied)
phyloseq-class experiment-level object otu_table() OTU Table: [ 33 taxa and 11 samples ] sample_data() Sample Data: [ 11 samples by 8 sample variables ] tax_table() Taxonomy Table: [ 33 taxa by 8 taxonomic ranks ] phy_tree() Phylogenetic Tree: [ 33 tips and 32 internal nodes ]
We will start by plotting a tree with the plot_tree
function
plot_tree(physeq)
It is easy to customise the tree plot to add color and change the shape, etc.
plot_tree(physeq, color="sample", ladderize="left", plot.margin=0.3)
You can also remove information to focus on a certain aspect. In the two examples below, we are looking at what OTUs are in which samples
plot_tree(physeq, color="sample", ladderize="left", plot.margin=0.3, nodelabf=nodeplotblank)
plot_tree(physeq, color="sample",label.tips="taxa_names", ladderize="left", plot.margin=0.3, nodelabf=nodeplotblank)
You can change the coloring as well
plot_tree(physeq, color="temperature",label.tips="taxa_names", ladderize="left", plot.margin=0.3, nodelabf=nodeplotblank)
plot_tree(physeq, color="salinity",label.tips="taxa_names", ladderize="left", plot.margin=0.3, nodelabf=nodeplotblank)
You can also add taxonomic information to the tips (note: it won't perfectly match relationships among the taxa, as the phylogeny was constructed with just the OTUs, which are short fragments.
plot_tree(physeq, color="family",shape="location", label.tips="taxa_names", ladderize="left", plot.margin=0.3, nodelabf=nodeplotblank)
Using the plot_bar
function, we can plot the relative abundance of taxa across samples
plot_bar(physeq, fill='genus')
Plotting the taxonomy with the rarefied dataset can help to compare abundance across samples
plot_bar(physeq.rarefied, fill='genus')
You can collapse the frequency counts across metadata categories, such as location
plot_bar(physeq.rarefied, x='location', fill='genus')
You can assign the output of the plot_bar
function to a variable, and then add additional customisation with ggplot
barplot1 <- plot_bar(physeq.rarefied, fill='order')
barplot1 + facet_wrap(~location, scales="free_x",nrow=1)
barplot1 + facet_wrap(~temperature, scales="free_x",nrow=1)
barplot2 <- plot_bar(physeq.rarefied, x="temperature", fill='order') +
facet_wrap(~location, scales="free_x", nrow=1)
barplot2
Often there are so many taxonomic groups represented, that it is useful to create a subset of the data that contains only OTUs belonging to a taxonomic rank. This is done with the subset_taxa
function. In the example dataset provided, there are not too many taxa included, but for larger datasets this can be very useful
scomb <- subset_taxa(physeq, order=='Scombriformes')
plot_bar(scomb, fill='species')
# the subsetted database is still a valid Phyloseq object
scomb
phyloseq-class experiment-level object otu_table() OTU Table: [ 3 taxa and 11 samples ] sample_data() Sample Data: [ 11 samples by 8 sample variables ] tax_table() Taxonomy Table: [ 3 taxa by 8 taxonomic ranks ] phy_tree() Phylogenetic Tree: [ 3 tips and 2 internal nodes ]
plot_tree(scomb)
The Phyloseq package has many functions for exploring diversity within and between samples. This includes multiple methods of calculating distance matrices between samples.
An ongoing debate with metabarcoding data is whether relative frequency of OTUs should be taken into account, or whether they should be scored as presence or absence. Phyloseq allows you to calculate distance using both qualitative (i.e. presence/absence) and quantitative (abundance included) measures, taken from community ecology diversity statistics.
# qualitative distance: Jaccard
jac_dist <- distance(physeq.rarefied, method = "jaccard", binary = TRUE)
jac_dist
AM1 AM2 AM3 AM4 AM5 AM6 AS2 AM2 0.7692308 AM3 0.6923077 0.8181818 AM4 0.5714286 0.6666667 0.7857143 AM5 0.6666667 0.8235294 0.7647059 0.6666667 AM6 0.4285714 0.8000000 0.7333333 0.6250000 0.5555556 AS2 0.8461538 0.7777778 0.9090909 0.7500000 0.8823529 0.8666667 AS3 0.8666667 0.8181818 0.9230769 0.8666667 0.7647059 0.8823529 0.6666667 AS4 0.8333333 0.7500000 0.9000000 0.7272727 0.8750000 0.8571429 0.2000000 AS5 0.8571429 0.8000000 0.9166667 0.7692308 0.8888889 0.8750000 0.1666667 AS6 0.8750000 0.8333333 0.9285714 0.8000000 0.9000000 0.8888889 0.5555556 AS3 AS4 AS5 AM2 AM3 AM4 AM5 AM6 AS2 AS3 AS4 0.6250000 AS5 0.7000000 0.3333333 AS6 0.7500000 0.5000000 0.6000000
# quantitative distance: Bray-Curtis
bc_dist <- distance(physeq.rarefied, method = "bray", binary = FALSE)
bc_dist
AM1 AM2 AM3 AM4 AM5 AM6 AS2 AM2 0.8106776 AM3 0.7399042 0.7127995 AM4 0.5237509 0.5537303 0.7055441 AM5 0.6639288 0.5456537 0.7561944 0.5842574 AM6 0.6090349 0.3268994 0.6695414 0.3704312 0.4313484 AS2 0.7552361 0.6821355 0.9570157 0.6276523 0.8640657 0.6939083 AS3 0.6442163 0.8403833 0.9813826 0.7712526 0.8572211 0.7885010 0.6600958 AS4 0.7519507 0.6108145 0.9705681 0.6276523 0.8640657 0.6939083 0.2369610 AS5 0.6511978 0.6245038 0.9130732 0.6276523 0.8640657 0.6939083 0.3292266 AS6 0.8518823 0.5939767 0.8039699 0.6317591 0.8995209 0.6939083 0.3935661 AS3 AS4 AS5 AM2 AM3 AM4 AM5 AM6 AS2 AS3 AS4 0.6568104 AS5 0.5560575 0.3741273 AS6 0.7567420 0.3141684 0.4165640
qual_ord <- ordinate(physeq.rarefied, method="PCoA", distance=jac_dist)
plot_jac <- plot_ordination(physeq.rarefied, qual_ord, color="location", title='Jaccard') + theme(aspect.ratio=1) + geom_point(size=4)
plot_jac
quant_ord <- ordinate(physeq.rarefied, method="PCoA", distance=bc_dist)
plot_bc <- plot_ordination(physeq.rarefied, quant_ord, color="location", title='Bray-Curtis') + theme(aspect.ratio=1) + geom_point(size=4)
plot_bc
The gridExtra package has a function called grid.arrance
that allows us to view the two graphs next to each other (Note: this is different than the ggplot facet_wrap
function, which presents multiple plots from the same dataset.
grid.arrange(plot_jac, plot_bc, nrow=2)
It is also possible to incorporate the phylogeny of the OTUs into the distance methods, using Unifrac distance measure.
You can use Unifrac distances with either qualitative (Unweighted Unifrac) or quantitative (Weighted Unifrac) measures.
Below, we will calculate both unweighted and weighted unifrace distances, and then use grid.arrange
to compare them.
# unweighted
uni_dist <- distance(physeq.rarefied, method= "uunifrac")
uni_dist
AM1 AM2 AM3 AM4 AM5 AM6 AM2 0.71039237 AM3 0.56507028 0.68628784 AM4 0.57723563 0.38691212 0.56274604 AM5 0.60416790 0.72216107 0.62751127 0.59368140 AM6 0.50451941 0.60516091 0.50638993 0.43935884 0.33941190 AS2 0.77896540 0.55382422 0.74684537 0.52339990 0.79460180 0.68475093 AS3 0.76643229 0.68320554 0.73557657 0.64906412 0.58627532 0.63929444 AS4 0.77102083 0.51988750 0.73354418 0.49088070 0.78684199 0.67062487 AS5 0.78281265 0.56936519 0.75313701 0.53845140 0.79834972 0.69150583 AS6 0.75544085 0.63792088 0.71701801 0.60607619 0.71718622 0.70195679 AS2 AS3 AS4 AS5 AM2 AM3 AM4 AM5 AM6 AS2 AS3 0.52367651 AS4 0.11761433 0.49067179 AS5 0.05664695 0.53893062 0.16759878 AS6 0.41333783 0.40420255 0.36357102 0.43586043
# weighted unifrac
wuni_dist <- distance(physeq.rarefied, method= "wunifrac")
uni_ord <- ordinate(physeq.rarefied, method="PCoA", distance = uni_dist)
wuni_ord <- ordinate(physeq.rarefied, method="PCoA", distance = wuni_dist)
plot_uni <- plot_ordination(physeq.rarefied, uni_ord, color="location", title="Unweighted Unifrac") +
theme(aspect.ratio=1) +
geom_point(size=4)
plot_uni
plot_wuni <- plot_ordination(physeq.rarefied, wuni_ord, color="location", title="Weighted Unifrac") +
theme(aspect.ratio=1) +
geom_point(size=4)
plot_wuni
grid.arrange(plot_jac, plot_bc, plot_uni,plot_wuni, ncol=2, nrow=2)
Phyloseq allows you to view multiple characteristics on a plot
plot_wuniT <- plot_ordination(physeq.rarefied, wuni_ord, color="location", shape='temperature', title="Weighted Unifrac") +
theme(aspect.ratio=1) +
geom_point(size=5)
plot_wuniT
Phyloseq includes the adonis
function to run a PERMANOVA to determine if OTUs from specific metadata fields are significantly different from each other.
adonis(wuni_dist ~ sample_data(physeq.rarefied)$location)
Call: adonis(formula = wuni_dist ~ sample_data(physeq.rarefied)$location) Permutation: free Number of permutations: 999 Terms added sequentially (first to last) Df SumsOfSqs MeanSqs F.Model R2 sample_data(physeq.rarefied)$location 1 0.13144 0.131438 5.938 0.39751 Residuals 9 0.19921 0.022135 0.60249 Total 10 0.33065 1.00000 Pr(>F) sample_data(physeq.rarefied)$location 0.002 ** Residuals Total --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
adonis(wuni_dist ~ sample_data(physeq.rarefied)$temperature)
Call: adonis(formula = wuni_dist ~ sample_data(physeq.rarefied)$temperature) Permutation: free Number of permutations: 999 Terms added sequentially (first to last) Df SumsOfSqs MeanSqs F.Model R2 sample_data(physeq.rarefied)$temperature 2 0.06606 0.033031 0.99869 0.19979 Residuals 8 0.26459 0.033074 0.80021 Total 10 0.33065 1.00000 Pr(>F) sample_data(physeq.rarefied)$temperature 0.441 Residuals Total
Just as you can save a plot to a variable for later visualisation, you can write a plot variable to a file. This allows you to modify the parameters (e.g. size, colors), before writing to a file
plot_bc <- plot_ordination(physeq.rarefied, quant_ord, color="location", title='Bray-Curtis') + theme(aspect.ratio=1)+ geom_point(size=4)
plot_bc
pdf('ordination_plot.pdf')
## enter saved plot
plot_bc
## close the file
dev.off()