In this lesson there are several examples for plotting the results of your analysis. As mentioned in the previous lesson, here are some links with additional graphing examples:
The Phyloseq web page has many good tutorials
The Micca pipeline has a good intro tutorial for Phyloseq
# load the libraries you will need
library('phyloseq')
library('ggplot2')
library('ape')
library("vegan")
library("gridExtra")
Loading required package: permute Loading required package: lattice This is vegan 2.5-7
# go to the plots directory
setwd('../plots')
# first the original
physeq <- readRDS('fish_phyloseq.rds')
print(physeq)
phyloseq-class experiment-level object otu_table() OTU Table: [ 33 taxa and 11 samples ] sample_data() Sample Data: [ 11 samples by 8 sample variables ] tax_table() Taxonomy Table: [ 33 taxa by 8 taxonomic ranks ] phy_tree() Phylogenetic Tree: [ 33 tips and 32 internal nodes ]
# Now the rarefied version
physeq.rarefied <- readRDS('fish_phyloseq_rarefied.rds')
print(physeq.rarefied)
phyloseq-class experiment-level object otu_table() OTU Table: [ 33 taxa and 11 samples ] sample_data() Sample Data: [ 11 samples by 8 sample variables ] tax_table() Taxonomy Table: [ 33 taxa by 8 taxonomic ranks ] phy_tree() Phylogenetic Tree: [ 33 tips and 32 internal nodes ]
We will start by plotting a tree with the plot_tree function
plot_tree(physeq)
It is easy to customise the tree plot to add color and change the shape, etc.
plot_tree(physeq, color="sample", ladderize="left", plot.margin=0.3)
You can also remove information to focus on a certain aspect. In the two examples below, we are looking at what OTUs are in which samples
plot_tree(physeq, color="sample", ladderize="left", plot.margin=0.3, nodelabf=nodeplotblank)
plot_tree(physeq, color="sample",label.tips="taxa_names", ladderize="left", plot.margin=0.3, nodelabf=nodeplotblank)
You can change the coloring as well
plot_tree(physeq, color="temperature",label.tips="taxa_names", ladderize="left", plot.margin=0.3, nodelabf=nodeplotblank)
plot_tree(physeq, color="salinity",label.tips="taxa_names", ladderize="left", plot.margin=0.3, nodelabf=nodeplotblank)
You can also add taxonomic information to the tips (note: it won't perfectly match relationships among the taxa, as the phylogeny was constructed with just the OTUs, which are short fragments.
plot_tree(physeq, color="family",shape="location", label.tips="taxa_names", ladderize="left", plot.margin=0.3, nodelabf=nodeplotblank)
Using the plot_bar function, we can plot the relative abundance of taxa across samples
plot_bar(physeq, fill='genus')
Plotting the taxonomy with the rarefied dataset can help to compare abundance across samples
plot_bar(physeq.rarefied, fill='genus')
You can collapse the frequency counts across metadata categories, such as location
plot_bar(physeq.rarefied, x='location', fill='genus')
You can assign the output of the plot_bar function to a variable, and then add additional customisation with ggplot
barplot1 <- plot_bar(physeq.rarefied, fill='order')
barplot1 + facet_wrap(~location, scales="free_x",nrow=1)
barplot1 + facet_wrap(~temperature, scales="free_x",nrow=1)
barplot2 <- plot_bar(physeq.rarefied, x="temperature", fill='order') +
facet_wrap(~location, scales="free_x", nrow=1)
barplot2
Often there are so many taxonomic groups represented, that it is useful to create a subset of the data that contains only OTUs belonging to a taxonomic rank. This is done with the subset_taxa function. In the example dataset provided, there are not too many taxa included, but for larger datasets this can be very useful
scomb <- subset_taxa(physeq, order=='Scombriformes')
plot_bar(scomb, fill='species')
# the subsetted database is still a valid Phyloseq object
scomb
phyloseq-class experiment-level object otu_table() OTU Table: [ 3 taxa and 11 samples ] sample_data() Sample Data: [ 11 samples by 8 sample variables ] tax_table() Taxonomy Table: [ 3 taxa by 8 taxonomic ranks ] phy_tree() Phylogenetic Tree: [ 3 tips and 2 internal nodes ]
plot_tree(scomb)
The Phyloseq package has many functions for exploring diversity within and between samples. This includes multiple methods of calculating distance matrices between samples.
An ongoing debate with metabarcoding data is whether relative frequency of OTUs should be taken into account, or whether they should be scored as presence or absence. Phyloseq allows you to calculate distance using both qualitative (i.e. presence/absence) and quantitative (abundance included) measures, taken from community ecology diversity statistics.
# qualitative distance: Jaccard
jac_dist <- distance(physeq.rarefied, method = "jaccard", binary = TRUE)
jac_dist
AM1 AM2 AM3 AM4 AM5 AM6 AS2
AM2 0.7692308
AM3 0.6923077 0.8181818
AM4 0.5714286 0.6666667 0.7857143
AM5 0.6666667 0.8235294 0.7647059 0.6666667
AM6 0.4285714 0.8000000 0.7333333 0.6250000 0.5555556
AS2 0.8461538 0.7777778 0.9090909 0.7500000 0.8823529 0.8666667
AS3 0.8666667 0.8181818 0.9230769 0.8666667 0.7647059 0.8823529 0.6666667
AS4 0.8333333 0.7500000 0.9000000 0.7272727 0.8750000 0.8571429 0.2000000
AS5 0.8571429 0.8000000 0.9166667 0.7692308 0.8888889 0.8750000 0.1666667
AS6 0.8750000 0.8333333 0.9285714 0.8000000 0.9000000 0.8888889 0.5555556
AS3 AS4 AS5
AM2
AM3
AM4
AM5
AM6
AS2
AS3
AS4 0.6250000
AS5 0.7000000 0.3333333
AS6 0.7500000 0.5000000 0.6000000
# quantitative distance: Bray-Curtis
bc_dist <- distance(physeq.rarefied, method = "bray", binary = FALSE)
bc_dist
AM1 AM2 AM3 AM4 AM5 AM6 AS2
AM2 0.8106776
AM3 0.7399042 0.7127995
AM4 0.5237509 0.5537303 0.7055441
AM5 0.6639288 0.5456537 0.7561944 0.5842574
AM6 0.6090349 0.3268994 0.6695414 0.3704312 0.4313484
AS2 0.7552361 0.6821355 0.9570157 0.6276523 0.8640657 0.6939083
AS3 0.6442163 0.8403833 0.9813826 0.7712526 0.8572211 0.7885010 0.6600958
AS4 0.7519507 0.6108145 0.9705681 0.6276523 0.8640657 0.6939083 0.2369610
AS5 0.6511978 0.6245038 0.9130732 0.6276523 0.8640657 0.6939083 0.3292266
AS6 0.8518823 0.5939767 0.8039699 0.6317591 0.8995209 0.6939083 0.3935661
AS3 AS4 AS5
AM2
AM3
AM4
AM5
AM6
AS2
AS3
AS4 0.6568104
AS5 0.5560575 0.3741273
AS6 0.7567420 0.3141684 0.4165640
qual_ord <- ordinate(physeq.rarefied, method="PCoA", distance=jac_dist)
plot_jac <- plot_ordination(physeq.rarefied, qual_ord, color="location", title='Jaccard') + theme(aspect.ratio=1) + geom_point(size=4)
plot_jac
quant_ord <- ordinate(physeq.rarefied, method="PCoA", distance=bc_dist)
plot_bc <- plot_ordination(physeq.rarefied, quant_ord, color="location", title='Bray-Curtis') + theme(aspect.ratio=1) + geom_point(size=4)
plot_bc
The gridExtra package has a function called grid.arrance that allows us to view the two graphs next to each other (Note: this is different than the ggplot facet_wrap function, which presents multiple plots from the same dataset.
grid.arrange(plot_jac, plot_bc, nrow=2)
It is also possible to incorporate the phylogeny of the OTUs into the distance methods, using Unifrac distance measure.
You can use Unifrac distances with either qualitative (Unweighted Unifrac) or quantitative (Weighted Unifrac) measures.
Below, we will calculate both unweighted and weighted unifrace distances, and then use grid.arrange to compare them.
# unweighted
uni_dist <- distance(physeq.rarefied, method= "uunifrac")
uni_dist
AM1 AM2 AM3 AM4 AM5 AM6
AM2 0.71039237
AM3 0.56507028 0.68628784
AM4 0.57723563 0.38691212 0.56274604
AM5 0.60416790 0.72216107 0.62751127 0.59368140
AM6 0.50451941 0.60516091 0.50638993 0.43935884 0.33941190
AS2 0.77896540 0.55382422 0.74684537 0.52339990 0.79460180 0.68475093
AS3 0.76643229 0.68320554 0.73557657 0.64906412 0.58627532 0.63929444
AS4 0.77102083 0.51988750 0.73354418 0.49088070 0.78684199 0.67062487
AS5 0.78281265 0.56936519 0.75313701 0.53845140 0.79834972 0.69150583
AS6 0.75544085 0.63792088 0.71701801 0.60607619 0.71718622 0.70195679
AS2 AS3 AS4 AS5
AM2
AM3
AM4
AM5
AM6
AS2
AS3 0.52367651
AS4 0.11761433 0.49067179
AS5 0.05664695 0.53893062 0.16759878
AS6 0.41333783 0.40420255 0.36357102 0.43586043
# weighted unifrac
wuni_dist <- distance(physeq.rarefied, method= "wunifrac")
uni_ord <- ordinate(physeq.rarefied, method="PCoA", distance = uni_dist)
wuni_ord <- ordinate(physeq.rarefied, method="PCoA", distance = wuni_dist)
plot_uni <- plot_ordination(physeq.rarefied, uni_ord, color="location", title="Unweighted Unifrac") +
theme(aspect.ratio=1) +
geom_point(size=4)
plot_uni
plot_wuni <- plot_ordination(physeq.rarefied, wuni_ord, color="location", title="Weighted Unifrac") +
theme(aspect.ratio=1) +
geom_point(size=4)
plot_wuni
grid.arrange(plot_jac, plot_bc, plot_uni,plot_wuni, ncol=2, nrow=2)
Phyloseq allows you to view multiple characteristics on a plot
plot_wuniT <- plot_ordination(physeq.rarefied, wuni_ord, color="location", shape='temperature', title="Weighted Unifrac") +
theme(aspect.ratio=1) +
geom_point(size=5)
plot_wuniT
Phyloseq includes the adonis function to run a PERMANOVA to determine if OTUs from specific metadata fields are significantly different from each other.
adonis(wuni_dist ~ sample_data(physeq.rarefied)$location)
Call:
adonis(formula = wuni_dist ~ sample_data(physeq.rarefied)$location)
Permutation: free
Number of permutations: 999
Terms added sequentially (first to last)
Df SumsOfSqs MeanSqs F.Model R2
sample_data(physeq.rarefied)$location 1 0.13144 0.131438 5.938 0.39751
Residuals 9 0.19921 0.022135 0.60249
Total 10 0.33065 1.00000
Pr(>F)
sample_data(physeq.rarefied)$location 0.002 **
Residuals
Total
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
adonis(wuni_dist ~ sample_data(physeq.rarefied)$temperature)
Call:
adonis(formula = wuni_dist ~ sample_data(physeq.rarefied)$temperature)
Permutation: free
Number of permutations: 999
Terms added sequentially (first to last)
Df SumsOfSqs MeanSqs F.Model R2
sample_data(physeq.rarefied)$temperature 2 0.06606 0.033031 0.99869 0.19979
Residuals 8 0.26459 0.033074 0.80021
Total 10 0.33065 1.00000
Pr(>F)
sample_data(physeq.rarefied)$temperature 0.441
Residuals
Total
Just as you can save a plot to a variable for later visualisation, you can write a plot variable to a file. This allows you to modify the parameters (e.g. size, colors), before writing to a file
plot_bc <- plot_ordination(physeq.rarefied, quant_ord, color="location", title='Bray-Curtis') + theme(aspect.ratio=1)+ geom_point(size=4)
plot_bc
pdf('ordination_plot.pdf')
## enter saved plot
plot_bc
## close the file
dev.off()