<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id><journal-title-group>
<journal-title>PLoS ONE</journal-title></journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc></publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">PONE-D-12-31789</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0061217</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Biology</subject><subj-group><subject>Computational biology</subject><subj-group><subject>Genomics</subject><subj-group><subject>Metagenomics</subject></subj-group></subj-group><subj-group><subject>Biological data management</subject><subject>Sequence analysis</subject></subj-group></subj-group><subj-group><subject>Genomics</subject><subj-group><subject>Metagenomics</subject></subj-group></subj-group><subj-group><subject>Microbiology</subject><subj-group><subject>Applied microbiology</subject><subject>Microbial ecology</subject></subj-group></subj-group><subj-group><subject>Population biology</subject><subj-group><subject>Population ecology</subject><subject>Population genetics</subject></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer science</subject><subj-group><subject>Programming languages</subject><subj-group><subject>High level languages</subject></subj-group></subj-group></subj-group></article-categories>
<title-group>
<article-title>phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data</article-title>
<alt-title alt-title-type="running-head">An R Package for Microbiome Census Data</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>McMurdie</surname><given-names>Paul J.</given-names></name><xref ref-type="aff" rid="aff1"/></contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Holmes</surname><given-names>Susan</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib>
</contrib-group>
<aff id="aff1"><addr-line>Department of Statistics, Stanford University, Stanford, California, United States of America</addr-line></aff>
<contrib-group>
<contrib contrib-type="editor" xlink:type="simple"><name name-style="western"><surname>Watson</surname><given-names>Michael</given-names></name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"/></contrib>
</contrib-group>
<aff id="edit1"><addr-line>The Roslin Institute, University of Edinburgh, United Kingdom</addr-line></aff>
<author-notes>
<corresp id="cor1">* E-mail: <email xlink:type="simple">susan@stat.stanford.edu</email></corresp>
<fn fn-type="conflict"><p>The authors have declared that no competing interests exist.</p></fn>
<fn fn-type="con"><p>Designed and wrote the software described: PJM. Conceived and designed the experiments: PJM SH. Performed the experiments: PJM SH. Analyzed the data: PJM SH. Contributed reagents/materials/analysis tools: PJM SH. Wrote the paper: PJM SH.</p></fn>
</author-notes>
<pub-date pub-type="collection"><year>2013</year></pub-date>
<pub-date pub-type="epub"><day>22</day><month>4</month><year>2013</year></pub-date>
<volume>8</volume>
<issue>4</issue>
<elocation-id>e61217</elocation-id>
<history>
<date date-type="received"><day>17</day><month>10</month><year>2012</year></date>
<date date-type="accepted"><day>6</day><month>3</month><year>2013</year></date>
</history>
<permissions>
<copyright-year>2013</copyright-year>
<copyright-holder>McMurdie, Holmes</copyright-holder><license xlink:type="simple"><license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p></license></permissions>
<abstract><sec>
<title>Background</title>
<p>The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization, and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high-throughput microbiome census data.</p>
</sec><sec>
<title>Results</title>
<p>Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics, all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research.</p>
</sec><sec>
<title>Conclusions</title>
<p>The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.</p>
</sec></abstract>
<funding-group><funding-statement>This work was supported by grant NIH-R01GM086884. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement></funding-group><counts><page-count count="11"/></counts></article-meta>
</front>
<body><sec id="s1">
<title>Introduction</title>
<sec id="s1a">
<title>Phylogenetic Sequencing</title>
<p>High-throughput (HT) DNA sequencing <xref ref-type="bibr" rid="pone.0061217-Metzker1">[1]</xref> is enabling major advances in microbial ecology studies <xref ref-type="bibr" rid="pone.0061217-Hamady1">[2]</xref>, where our understanding of the presence and abundance of microbial species relies heavily on the observation of their nucleic acids in a “culture-independent” manner <xref ref-type="bibr" rid="pone.0061217-Pace1">[3]</xref>. This sequencing-based census of the inhabitants of microbiome samples is now very often accompanied by other experimental observations (e.g., clinical, environmental, or metabolomic), in addition to phylogenetic tree reconstruction and/or taxonomic classification of the sequences. Here we refer to this as “phylogenetic sequencing” data if it can be usefully represented as a contingency table of taxonomic units and samples, and integrated with the other aforementioned data types. Importantly, this term (also the namesake of the software described here) is deliberately not specific to the method by which the phylogenetically relevant microbial census data was obtained, reflecting the intended level of data abstraction in the software. The following are two examples of common methods for producing phylogenetic sequencing data.</p>
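<p>To make the contingency-table view concrete, the short Python sketch below (a language-neutral illustration, not phyloseq's R code) tallies per-read (sample, OTU) assignments into a complete OTU-by-sample count table. The helper build_otu_table and the read assignments are hypothetical; in practice such assignments come from the barcode decoding and OTU clustering steps described next.</p>

```python
from collections import Counter

# Hypothetical helper for illustration only; not part of phyloseq.
def build_otu_table(assignments):
    """Tally (sample, otu) pairs into table[otu][sample] -> read count.

    Every OTU row lists every observed sample, with 0 for absent pairs, so
    the result is a complete OTU-by-sample contingency table.
    """
    counts = Counter(assignments)
    samples = sorted({sample for sample, _ in assignments})
    otus = sorted({otu for _, otu in assignments})
    return {otu: {s: counts[(s, otu)] for s in samples} for otu in otus}

# Hypothetical reads, each already assigned to a sample and an OTU.
reads = [
    ("soil1", "OTU_1"), ("soil1", "OTU_1"), ("soil1", "OTU_2"),
    ("gut1", "OTU_2"), ("gut1", "OTU_3"), ("gut1", "OTU_3"),
]
table = build_otu_table(reads)
for otu, row in table.items():
    print(otu, row)
```

<p>Zeros are filled in explicitly, so every OTU row has the same sample columns, which is the shape a contingency table requires.</p>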
<p>Barcoded <xref ref-type="bibr" rid="pone.0061217-Hamady1">[2]</xref> amplicon sequencing of dozens to hundreds of samples <xref ref-type="bibr" rid="pone.0061217-Liu1">[4]</xref> is a method of phylogenetic sequencing of microbiomes, often targeting the small subunit ribosomal RNA (16S rRNA) gene <xref ref-type="bibr" rid="pone.0061217-Pace1">[3]</xref>, for which there are also convenient tools <xref ref-type="bibr" rid="pone.0061217-DeSantis1">[5]</xref> and large reference databases <xref ref-type="bibr" rid="pone.0061217-DeSantis2">[6]</xref>–<xref ref-type="bibr" rid="pone.0061217-Pruesse1">[8]</xref>. The task of decoding the sample source of each sequence read by its barcode, followed by similarity clustering to define <italic>operational taxonomic units</italic> (OTUs, sometimes referred to as <italic>taxa</italic>) <xref ref-type="bibr" rid="pone.0061217-Li1">[9]</xref>, <xref ref-type="bibr" rid="pone.0061217-Huang1">[10]</xref> can be performed by publicly available packages/pipelines, including QIIME <xref ref-type="bibr" rid="pone.0061217-Caporaso1">[11]</xref>, mothur <xref ref-type="bibr" rid="pone.0061217-Schloss1">[12]</xref>, and PANGEA <xref ref-type="bibr" rid="pone.0061217-Giongo1">[13]</xref>; as well as virtual machine (VM) and cloud-based solutions such as the RDP pipeline <xref ref-type="bibr" rid="pone.0061217-Cole1">[7]</xref>, Pyrotagger <xref ref-type="bibr" rid="pone.0061217-Kunin1">[14]</xref>, CLoVR-16S <xref ref-type="bibr" rid="pone.0061217-Angiuoli1">[15]</xref>, Genboree <xref ref-type="bibr" rid="pone.0061217-8th1">[16]</xref>, QIIME EC2 image <xref ref-type="bibr" rid="pone.0061217-QIIME1">[17]</xref>, n3phele <xref ref-type="bibr" rid="pone.0061217-University1">[18]</xref>, and MG-RAST <xref ref-type="bibr" rid="pone.0061217-Meyer1">[19]</xref>.</p>
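<p>The barcode-decoding step mentioned above can be sketched in a few lines of Python. This minimal example assumes fixed-length barcodes that sit at the very start of each read and must match exactly; the pipelines cited above typically also handle sequencing errors, primers, and quality filtering. The helper demultiplex, the barcode-to-sample mapping, and the reads are hypothetical.</p>

```python
# Hypothetical helper for illustration only; assumes fixed-length, exact-match
# barcodes at the 5' end of each read (no error correction).
def demultiplex(reads, barcodes):
    """Assign each read to a sample by an exact barcode match at its start.

    barcodes maps barcode sequence -> sample name. Returns (assigned, unassigned),
    where assigned maps sample -> list of reads with the barcode trimmed off.
    """
    assigned = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        for bc, sample in barcodes.items():
            if read.startswith(bc):
                assigned[sample].append(read[len(bc):])  # trim the barcode
                break
        else:
            unassigned.append(read)  # no barcode matched
    return assigned, unassigned

barcodes = {"ACGT": "soil1", "TGCA": "gut1"}  # hypothetical 4-nt barcodes
reads = ["ACGTGGATTAC", "TGCAGGATTAG", "ACGTCCATTAC", "NNNNGGATTAC"]
assigned, unassigned = demultiplex(reads, barcodes)
print(assigned)    # {'soil1': ['GGATTAC', 'CCATTAC'], 'gut1': ['GGATTAG']}
print(unassigned)  # ['NNNNGGATTAC']
```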
<p>An alternative experimental method is random “shotgun” sequencing <xref ref-type="bibr" rid="pone.0061217-Venter1">[20]</xref>, <xref ref-type="bibr" rid="pone.0061217-Fleischmann1">[21]</xref> of un-amplified metagenomic DNA <xref ref-type="bibr" rid="pone.0061217-Venter2">[22]</xref>, in which case OTU clustering and counting are based upon one or more detectable phylogenetic markers in the metagenomic sequence fragments, using tools such as phylOTU <xref ref-type="bibr" rid="pone.0061217-Sharpton1">[23]</xref>. Because it omits PCR amplification, this latter approach avoids PCR amplification bias, at the expense of per-sequence efficiency <xref ref-type="bibr" rid="pone.0061217-Sharpton1">[23]</xref>. Both methods are now commonly used for phylogenetic sequencing (<xref ref-type="fig" rid="pone-0061217-g001">Figure 1</xref>).</p>
<fig id="pone-0061217-g001" position="float"><object-id pub-id-type="doi">10.1371/journal.pone.0061217.g001</object-id><label>Figure 1</label><caption>
<title>Example of a phylogenetic sequencing workflow.</title>
<p>A diagram of an experimental and analysis workflow for amplicon or shotgun phylogenetic sequencing. The intended role for phyloseq is indicated.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pone.0061217.g001" position="float" xlink:type="simple"/></fig></sec><sec id="s1b">
<title>The phyloseq Project</title>
<p>Many of the previously mentioned OTU-clustering applications also perform additional downstream analyses (File S1). However, an investigator typically must port the output data files, which are often not human-readable, to other software for additional processing and for statistical analysis specific to the goals of the investigation. The powerful statistical, ecological, and graphics tools available in R <xref ref-type="bibr" rid="pone.0061217-R1">[24]</xref> make it an attractive option for this post-clustering stage of analysis. While the computational efficiency of compiled languages like <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pone.0061217.e001" xlink:type="simple"/></inline-formula> <xref ref-type="bibr" rid="pone.0061217-Stroustrup1">[25]</xref> makes them appropriate for the expensive but well-defined requirements of the initial sequence processing, the subsequent analysis is loosely defined and project-specific, and instead requires a broad set of interactive calculations that are often less computationally expensive and for which R is well suited <xref ref-type="bibr" rid="pone.0061217-Chambers1">[26]</xref>. The public repositories of open-source R extensions (“packages” or “libraries”) include many dedicated ecology and phylogenetics packages. For instance, there are several dozen packages listed in the CRAN Ecology Task View <xref ref-type="bibr" rid="pone.0061217-Simpson1">[27]</xref>, as well as distory <xref ref-type="bibr" rid="pone.0061217-Chakerian1">[28]</xref>, phangorn <xref ref-type="bibr" rid="pone.0061217-Schliep1">[29]</xref>, picante <xref ref-type="bibr" rid="pone.0061217-Kembel1">[30]</xref>, and now phyloseq <xref ref-type="bibr" rid="pone.0061217-McMurdie1">[31]</xref>. Furthermore, R includes infrastructure for documenting an analysis in such a way that it can be easily reproduced and modified by peers <xref ref-type="bibr" rid="pone.0061217-Hardle1">[32]</xref>, <xref ref-type="bibr" rid="pone.0061217-Xie1">[33]</xref>.</p>
<p>In spite of all of these highly relevant tools, we recently described the lack of a satisfactory standard within <italic>Bioconductor</italic> <xref ref-type="bibr" rid="pone.0061217-Gentleman1">[34]</xref> (or R generally) for importing the data files from the most popular OTU-clustering applications, or representing this data in a complete, integrated class <xref ref-type="bibr" rid="pone.0061217-McMurdie1">[31]</xref>. One Bioconductor package, OTUbase <xref ref-type="bibr" rid="pone.0061217-Beck1">[35]</xref>, pursues some of these goals, but has no support for phylogenetic trees in its data class, nor support for importing data from popular/recent OTU-clustering output formats <xref ref-type="bibr" rid="pone.0061217-Beck1">[35]</xref>, <xref ref-type="bibr" rid="pone.0061217-OTUbase1">[36]</xref> (File S1). We have proposed a new Bioconductor package, phyloseq (from “<underline>phylo</underline>genetic <underline>seq</underline>uencing”), dedicated to the object-oriented representation and analysis of phylogenetic sequencing data in R <xref ref-type="bibr" rid="pone.0061217-McMurdie1">[31]</xref>, and supporting common OTU-clustering output formats like QIIME <xref ref-type="bibr" rid="pone.0061217-Caporaso1">[11]</xref>, mothur <xref ref-type="bibr" rid="pone.0061217-Schloss1">[12]</xref>, the RDP-pipeline <xref ref-type="bibr" rid="pone.0061217-Cole1">[7]</xref>, Pyrotagger <xref ref-type="bibr" rid="pone.0061217-Kunin1">[14]</xref>, and the biom-format <xref ref-type="bibr" rid="pone.0061217-McDonald1">[37]</xref>.</p>
<p>In this article we describe the conceptual framework and toolbox of a substantially enhanced phyloseq codebase, including especially some advanced ordination and graphics capabilities. We further note that data imported by phyloseq is also accessible to analyses encoded by a large number of freely available R packages, in addition to the capabilities directly supported by phyloseq itself. We will end by discussing the notion of “reproducible research” in the context of phylogenetic sequencing data, and how phyloseq and R can be used in analyses that are more open and reproducible than those found in recent common practice.</p>
</sec></sec><sec id="s2" sec-type="methods">
<title>Methods</title>
<sec id="s2a">
<title>phyloseq Project Key Features</title>
<p>The phyloseq package provides an object-oriented programming infrastructure that simplifies many of the common data management and preprocessing tasks required during analysis of phylogenetic sequencing data. This simplified syntax helps mitigate inconsistency errors and encourages interaction with the data during preprocessing. The phyloseq package also provides a set of powerful analysis and graphics functions, building upon related packages available in R and Bioconductor. It includes or supports some of the most commonly-needed ecology and phylogenetic tools, including a consistent interface for calculating ecological distances and performing dimensional reduction (ordination). The graphics functions allow users to interactively produce annotated publication-quality graphics in just one or two lines of code. The phyloseq package includes extensive documentation in the form of function- and package-level manuals embedded in the package's documentation interface and in a PDF version on Bioconductor <xref ref-type="bibr" rid="pone.0061217-McMurdie2">[38]</xref>, as well as extended reproducible examples on the phyloseq homepage <xref ref-type="bibr" rid="pone.0061217-The1">[39]</xref>, and open collaborative development on GitHub <xref ref-type="bibr" rid="pone.0061217-R2">[40]</xref>.</p>
</sec><sec id="s2b">
<title>Implementation</title>
<p>The phyloseq package adheres to the requirements for standard R packages set forth in the official “Writing R Extensions” manual <xref ref-type="bibr" rid="pone.0061217-R2">[40]</xref>. It also satisfies additional requirements of the Bioconductor Repository <xref ref-type="bibr" rid="pone.0061217-Gentleman1">[34]</xref>, and uses a literate-programming framework based on structured in-source comments, called roxygen2 <xref ref-type="bibr" rid="pone.0061217-Wickham1">[41]</xref>, for (re)building the R documentation (.Rd) files and the namespace specifications. The phyloseq package can be installed on any system on which R is supported, including Mac OS X, Windows, and most Linux distributions.</p>
</sec><sec id="s2c">
<title>Data Availability</title>
<p>R packages can include example data that is documented with the same help system as other package objects <xref ref-type="bibr" rid="pone.0061217-Wilkinson1">[58]</xref>. This data becomes available in the R session by invoking the data function after the package has been loaded. Unless otherwise noted, the examples provided in this manuscript use example data that is included in the phyloseq package.</p>
</sec><sec id="s2d">
<title>Data Infrastructure and Design</title>
<p>The phyloseq project includes an object-oriented class that integrates the heterogeneous components of OTU-clustered phylogenetic sequencing data. Although Bioconductor provides many utilities for efficient manipulation of DNA sequences, phyloseq does not currently re-implement any methods for DNA sequence decoding, processing, or OTU clustering (<xref ref-type="fig" rid="pone-0061217-g001">Figure 1</xref>, File S1). Instead, phyloseq provides tools to read the output files of the most common OTU-clustering applications <xref ref-type="bibr" rid="pone.0061217-Cole1">[7]</xref>, <xref ref-type="bibr" rid="pone.0061217-Caporaso1">[11]</xref>, <xref ref-type="bibr" rid="pone.0061217-Schloss1">[12]</xref>, <xref ref-type="bibr" rid="pone.0061217-Kunin1">[14]</xref>, and represents this data in R as an instance of the main data class. This multi-component, “experiment-level” class, named “phyloseq” and referred to here as “the phyloseq-class”, is a key design feature of the phyloseq project: subsequent user-accessible functions expect an instance of this class as their sole or primary input data. These functions are described in detail in the phyloseq manual <xref ref-type="bibr" rid="pone.0061217-McMurdie2">[38]</xref>, and are part of a modular workflow summarized in <xref ref-type="fig" rid="pone-0061217-g002">Figure 2</xref>.</p>
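<p>As an illustration of what reading such output involves, the following Python sketch parses a simplified tab-delimited OTU table. The layout is an assumption loosely modeled on QIIME's “classic” text OTU table: a header line beginning with #OTU ID, one column per sample, and an optional final taxonomy column (assumed here to be named Consensus Lineage) holding semicolon-separated ranks. The helper parse_otu_table is hypothetical and is not phyloseq's importer, which handles several real formats in R.</p>

```python
import io

# Hypothetical parser for an assumed, simplified OTU-table layout; not phyloseq's importer.
def parse_otu_table(handle, taxonomy_column="Consensus Lineage"):
    """Parse a simplified tab-delimited OTU table.

    Returns (samples, counts, taxonomy) where counts[otu][sample] is an int and
    taxonomy[otu] is a list of ranks (empty when there is no taxonomy column).
    """
    samples, counts, taxonomy = [], {}, {}
    has_tax = False
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("#OTU ID"):            # header row names the samples
            has_tax = fields[-1] == taxonomy_column
            samples = fields[1:-1] if has_tax else fields[1:]
            continue
        if line.startswith("#"):                  # any other comment line
            continue
        otu = fields[0]
        values = fields[1:len(samples) + 1]
        counts[otu] = {s: int(float(v)) for s, v in zip(samples, values)}
        has_lineage = has_tax and len(fields) > len(samples) + 1
        lineage = fields[len(samples) + 1] if has_lineage else ""
        taxonomy[otu] = [rank.strip() for rank in lineage.split(";") if rank.strip()]
    return samples, counts, taxonomy

example = io.StringIO(
    "# comment line (ignored)\n"
    "#OTU ID\tsoil1\tgut1\tConsensus Lineage\n"
    "OTU_1\t2.0\t0.0\tk__Bacteria; p__Bacteroidetes\n"
    "OTU_2\t1.0\t3.0\tk__Bacteria; p__Firmicutes\n"
)
samples, counts, taxonomy = parse_otu_table(example)
print(samples)            # ['soil1', 'gut1']
print(counts["OTU_2"])    # {'soil1': 1, 'gut1': 3}
print(taxonomy["OTU_1"])  # ['k__Bacteria', 'p__Bacteroidetes']
```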
<fig id="pone-0061217-g002" position="float"><object-id pub-id-type="doi">10.1371/journal.pone.0061217.g002</object-id><label>Figure 2</label><caption>
<title>Analysis workflow using phyloseq.</title>
<p>The workflow starts with the results of OTU clustering and independently-measured sample data (Input, top left), and ends at various analytic procedures available in R for inference and validation. In between are key functions for preprocessing and graphics. Rounded rectangles and diamond shapes represent functions and data objects, respectively, further described in <xref ref-type="fig" rid="pone-0061217-g003">Figure 3</xref>.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pone.0061217.g002" position="float" xlink:type="simple"/></fig>
<p><xref ref-type="fig" rid="pone-0061217-g003">Figure 3</xref> summarizes the structure of the phyloseq-class and its components. Each slot is empty (NULL) by default, although an instance missing an otu_table component is invalid. Tools in phyloseq that truncate dimensions of one component (that is, remove samples or OTUs) automatically propagate the change across all relevant components. In general, researchers only need to manipulate their “experiment-level” object, which makes data (pre)processing less prone to mistakes and often simplifies analysis commands to a single data argument.</p>
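<p>The design principle that pruning one component propagates to the others can be sketched in Python as below. The class ExperimentData and its keep_samples and keep_otus methods are hypothetical stand-ins, not phyloseq's R classes or functions; only an OTU table, sample data, and a taxonomy table are modeled, and the phylogenetic tree is omitted for brevity.</p>

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

@dataclass
class ExperimentData:
    """Hypothetical experiment-level container (illustrative; not phyloseq's class)."""
    otu_table: Dict[str, Dict[str, int]]                     # otu -> sample -> count
    sample_data: Optional[Dict[str, Dict[str, str]]] = None  # sample -> variables
    tax_table: Optional[Dict[str, List[str]]] = None         # otu -> taxonomic ranks

    def __post_init__(self):
        # Optional components default to None, but the OTU table is mandatory.
        if not self.otu_table:
            raise ValueError("an OTU table is required")

    def keep_samples(self, keep: Iterable[str]) -> "ExperimentData":
        """Remove samples not in `keep` from the OTU table and the sample data."""
        keep = set(keep)
        otus = {o: {s: n for s, n in row.items() if s in keep}
                for o, row in self.otu_table.items()}
        sd = None if self.sample_data is None else {
            s: v for s, v in self.sample_data.items() if s in keep}
        return ExperimentData(otus, sd, self.tax_table)

    def keep_otus(self, keep: Iterable[str]) -> "ExperimentData":
        """Remove OTUs not in `keep` from the OTU table and the taxonomy table."""
        keep = set(keep)
        otus = {o: row for o, row in self.otu_table.items() if o in keep}
        tax = None if self.tax_table is None else {
            o: r for o, r in self.tax_table.items() if o in keep}
        return ExperimentData(otus, self.sample_data, tax)

exp = ExperimentData(
    otu_table={"OTU_1": {"soil1": 2, "gut1": 0}, "OTU_2": {"soil1": 1, "gut1": 3}},
    sample_data={"soil1": {"SampleType": "Soil"}, "gut1": {"SampleType": "Feces"}},
    tax_table={"OTU_1": ["Bacteria", "Bacteroidetes"], "OTU_2": ["Bacteria", "Firmicutes"]},
)
gut_only = exp.keep_samples(["gut1"])
print(gut_only.otu_table)    # {'OTU_1': {'gut1': 0}, 'OTU_2': {'gut1': 3}}
print(gut_only.sample_data)  # {'gut1': {'SampleType': 'Feces'}}
```

<p>Because each pruning method returns a new, internally consistent object, downstream code only ever handles one argument, mirroring the “experiment-level” convenience described above.</p>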
<fig id="pone-0061217-g003" position="float"><object-id pub-id-type="doi">10.1371/journal.pone.0061217.g003</object-id><label>Figure 3</label><caption>
<title>The “phyloseq” class.</title>
<p>The phyloseq class is an experiment-level data storage class defined by the phyloseq package for representing phylogenetic sequencing data. Most functions in the phyloseq package expect an instance of this class as their primary argument. See the phyloseq manual <xref ref-type="bibr" rid="pone.0061217-McMurdie2">[38]</xref> for a complete list of functions.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pone.0061217.g003" position="float" xlink:type="simple"/></fig></sec><sec id="s2e">
<title>Analysis Functions</title>
<p>Complementing the data infrastructure, the phyloseq package provides a set of functions that take a phyloseq object as their primary data argument and perform an analysis and/or graphics task. <xref ref-type="fig" rid="pone-0061217-g002">Figure 2</xref> summarizes the general workflow within phyloseq and lists some of the main functions/tools.</p>
<p>Comparisons of the type and quantity of OTUs observed between microbiome samples (“beta diversity”) are often approached through the calculation of pairwise ecological distances <xref ref-type="bibr" rid="pone.0061217-Faith1">[42]</xref>, <xref ref-type="bibr" rid="pone.0061217-Anderson1">[43]</xref> and through dimensional reduction (ordination) methods. The phyloseq package provides a consistent interface for the most common approaches to distance calculations and ordination. This interface is also the foundation for the custom ordination and heatmap graphics functions described in the next subsection.</p>
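<p>As a concrete example of a pairwise ecological distance, the following Python sketch computes the Bray-Curtis dissimilarity (the sum of absolute abundance differences divided by the combined totals of both samples) for every pair of samples, and keeps only the lower triangle, in the same column-by-column order used by R's “dist” objects. Bray-Curtis is chosen purely as an illustration; the helper names and abundance vectors are hypothetical.</p>

```python
from itertools import combinations

# Hypothetical helpers for illustration only; not phyloseq's distance function.
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0                     # two empty samples: treat as identical
    return sum(abs(a - b) for a, b in zip(x, y)) / total

def lower_triangle(samples, vectors, metric=bray_curtis):
    """Pairwise distances keyed by (row, column), column by column like R's dist."""
    return {(samples[j], samples[i]): metric(vectors[i], vectors[j])
            for i, j in combinations(range(len(samples)), 2)}

samples = ["A", "B", "C"]
vectors = [[10, 0, 5], [5, 5, 5], [0, 10, 0]]   # OTU counts per sample (hypothetical)
for pair, d in lower_triangle(samples, vectors).items():
    print(pair, round(d, 3))
# ('B', 'A') 0.333
# ('C', 'A') 1.0
# ('C', 'B') 0.6
```

<p>Storing only the lower triangle halves the memory needed, since the matrix is symmetric with a zero diagonal.</p>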
<p>In phyloseq, the interface for ecological distance calculations is a single function, distance, that takes a phyloseq object as its data argument, as well as a character string indicating the distance method, with explicit support for more than 40 ecological distance methods. This includes an R-native, optionally parallel implementation of Fast UniFrac <xref ref-type="bibr" rid="pone.0061217-Hamady2">[44]</xref> (both weighted <xref ref-type="bibr" rid="pone.0061217-Lozupone1">[45]</xref> and unweighted <xref ref-type="bibr" rid="pone.0061217-Lozupone2">[46]</xref>). The output is a “dist”-class distance matrix (lower triangle) appropriate for standard clustering analysis in core R (e.g., hclust), as well as for certain dimensional reduction (ordination) methods.</p>
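<p>The definition behind unweighted UniFrac can be sketched directly: it is the fraction of observed branch length (branches leading to OTUs present in either sample) that leads to OTUs present in only one of the two samples. The following Python example computes it by brute force on a small rooted tree encoded as a hypothetical child-to-(parent, branch length) mapping; it does not reproduce Fast UniFrac's optimized algorithm, its weighted variant, or phyloseq's parallel implementation.</p>

```python
# Hypothetical brute-force sketch of unweighted UniFrac; not Fast UniFrac.
def leaves_below(parent):
    """Map every node in `parent` to the set of leaf OTUs beneath it."""
    children = {}
    for child, (par, _) in parent.items():
        children.setdefault(par, []).append(child)
    memo = {}

    def leaves(node):
        if node not in memo:
            kids = children.get(node, [])
            memo[node] = {node} if not kids else set().union(*(leaves(k) for k in kids))
        return memo[node]

    return {node: leaves(node) for node in parent}

def unweighted_unifrac(parent, present_a, present_b):
    """Unique branch length divided by branch length observed in either sample."""
    below = leaves_below(parent)
    unique = observed = 0.0
    for node, (_, length) in parent.items():     # one branch above each non-root node
        in_a = not below[node].isdisjoint(present_a)
        in_b = not below[node].isdisjoint(present_b)
        if in_a or in_b:
            observed += length
            if in_a != in_b:
                unique += length
    return unique / observed if observed else 0.0

# Rooted tree ((OTU_1:1, OTU_2:1)n1:2, OTU_3:3)root, encoded as child -> (parent, length).
tree = {
    "OTU_1": ("n1", 1.0),
    "OTU_2": ("n1", 1.0),
    "n1": ("root", 2.0),
    "OTU_3": ("root", 3.0),
}
print(unweighted_unifrac(tree, {"OTU_1", "OTU_2"}, {"OTU_1", "OTU_3"}))  # 0.5714285714285714
```

<p>Here the two samples share the branches above OTU_1 and n1, so only the OTU_2 and OTU_3 branches (4 of the 7 observed length units) are unique, giving 4/7.</p>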
<p>The interface for performing ordination methods is also a single function, called ordinate, that takes a phyloseq object as its primary data argument and a character string indicating the desired ordination method. For example, the following would perform (unconstrained) correspondence analysis on the included “Global Patterns” dataset <xref ref-type="bibr" rid="pone.0061217-Caporaso2">[47]</xref>.<disp-formula id="pone.0061217.e002"><graphic position="anchor" xlink:href="info:doi/10.1371/journal.pone.0061217.e002" xlink:type="simple"/></disp-formula></p>
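<p>For readers unfamiliar with correspondence analysis, the following Python sketch computes the first CA axis using the textbook formulation: the leading singular value and vector of the matrix of standardized residuals, obtained here by power iteration. It is an illustration under simplifying assumptions (no all-zero rows or columns, a distinct leading singular value), not the code path used by ordinate; the helper ca_first_axis and the example table are hypothetical.</p>

```python
import math

# Hypothetical sketch of the first correspondence-analysis axis; not phyloseq's code.
def ca_first_axis(table, iterations=200):
    """Leading axis of a correspondence analysis (CA) of a contingency table.

    table: rows (e.g. samples) x columns (e.g. OTUs) of non-negative counts,
    with no all-zero rows or columns. Returns (eigenvalue, row_coordinates),
    where row_coordinates are principal coordinates on axis 1.
    """
    nr, nc = len(table), len(table[0])
    n = float(sum(map(sum, table)))
    r = [sum(row) / n for row in table]            # row masses
    c = [sum(col) / n for col in zip(*table)]      # column masses
    # Standardized residuals: S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j)
    S = [[(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(nc)] for i in range(nr)]

    def s_t(u):  # S^T u
        return [sum(S[i][j] * u[i] for i in range(nr)) for j in range(nc)]

    def s(v):    # S v
        return [sum(S[i][j] * v[j] for j in range(nc)) for i in range(nr)]

    # Power iteration on S S^T for the leading left singular vector.
    u = [1.0 + 0.1 * i for i in range(nr)]
    for _ in range(iterations):
        w = s(s_t(u))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                            # no row-column association
            return 0.0, [0.0] * nr
        u = [x / norm for x in w]
    sigma = math.sqrt(sum(x * x for x in s_t(u)))  # leading singular value
    sign = math.copysign(1.0, u[0])                # axis sign is arbitrary
    coords = [sign * sigma * u[i] / math.sqrt(r[i]) for i in range(nr)]
    return sigma ** 2, coords

# A perfectly associated 2 x 2 table: two samples with disjoint OTUs.
eig, coords = ca_first_axis([[10, 0], [0, 10]])
print(round(eig, 6), [round(x, 6) for x in coords])  # 1.0 [1.0, -1.0]
```

<p>The eigenvalue (principal inertia) is the quantity shown in a scree plot, and the mass-weighted mean of the row coordinates is zero by construction.</p>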
<p>The ordinate function currently supports correspondence analysis (CA) <xref ref-type="bibr" rid="pone.0061217-Greenacre1">[48]</xref>, constrained correspondence analysis (CCA) <xref ref-type="bibr" rid="pone.0061217-TerBraak1">[49]</xref>, detrended correspondence analysis (DCA) <xref ref-type="bibr" rid="pone.0061217-Hill1">[50]</xref>, redundancy analysis (RDA) <xref ref-type="bibr" rid="pone.0061217-Wollenberg1">[51]</xref>, principal components analysis (PCA) <xref ref-type="bibr" rid="pone.0061217-Hotelling1">[52]</xref>, double principal coordinate analysis (DPCoA) <xref ref-type="bibr" rid="pone.0061217-Pavoine1">[53]</xref>, multidimensional scaling (MDS, PCoA) <xref ref-type="bibr" rid="pone.0061217-Gower1">[54]</xref>, and non-metric multidimensional scaling (NMDS) <xref ref-type="bibr" rid="pone.0061217-Minchin1">[55]</xref>. For CA, CCA, DCA, RDA, and DPCoA, the ordination is based upon an evaluation of abundance values (in the case of DPCoA, the patristic distances between OTUs on the phylogenetic tree are also used), rather than on an ecological distance. For MDS and NMDS, the ordinate function requires a pre-calculated distance matrix (“dist” object) or the name of a supported ecological distance method. For example, PCoA/MDS can be calculated on an unweighted UniFrac distance matrix <xref ref-type="bibr" rid="pone.0061217-Lozupone2">[46]</xref>, using the following command:<disp-formula id="pone.0061217.e003"><graphic position="anchor" xlink:href="info:doi/10.1371/journal.pone.0061217.e003" xlink:type="simple"/></disp-formula></p>
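<p>Principal coordinates analysis (classical MDS) on a pre-computed distance matrix can likewise be sketched in a few lines: square the distances, double-center them (Gower's method), and take the leading eigenvector. The Python example below recovers only the first axis by power iteration and assumes a Euclidean-embeddable distance matrix; non-Euclidean distances can yield negative eigenvalues, for which some implementations offer corrections, whereas this sketch simply rejects a negative dominant eigenvalue. The helper pcoa_first_axis and the distance matrix are hypothetical.</p>

```python
import math

# Hypothetical sketch of the first principal coordinate; not phyloseq's ordinate.
def pcoa_first_axis(dist, iterations=200):
    """First principal coordinate of a symmetric distance matrix (classical MDS).

    Returns (eigenvalue, coordinates). Assumes the dominant eigenvalue of the
    double-centered matrix is positive, as it is for Euclidean distances.
    """
    n = len(dist)
    d2 = [[d * d for d in row] for row in dist]          # squared distances
    row_mean = [sum(row) / n for row in d2]              # equals column means (symmetric)
    grand = sum(row_mean) / n
    # Gower double-centering: B = -1/2 * J D2 J, with J the centering matrix.
    B = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    # Power iteration for the eigenvector of the largest-magnitude eigenvalue.
    u = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(iterations):
        w = [sum(B[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                                  # degenerate, e.g. all points coincide
            return 0.0, [0.0] * n
        u = [x / norm for x in w]
    lam = sum(u[i] * B[i][j] * u[j] for i in range(n) for j in range(n))  # Rayleigh quotient
    if 0.0 > lam:
        raise ValueError("dominant eigenvalue is negative; distances are not Euclidean")
    sign = math.copysign(1.0, u[0])                      # axis sign is arbitrary
    return lam, [sign * math.sqrt(lam) * x for x in u]

# Three points at positions 0, 1 and 3 on a line.
lam, coords = pcoa_first_axis([[0, 1, 3], [1, 0, 2], [3, 2, 0]])
print(round(lam, 6), [round(x, 6) for x in coords])  # 4.666667 [1.333333, 0.333333, -1.666667]
```

<p>The recovered coordinates are the original positions, centered on their mean and reflected, which illustrates that PCoA preserves distances but not absolute location or orientation.</p>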
<p>There are many combinations of approaches possible (even extending into time-series of table pairs), and the optimal approach depends on the goals of the experiment and characteristics of the data <xref ref-type="bibr" rid="pone.0061217-Thioulouse1">[56]</xref>. The phyloseq package also includes a specialized function for displaying ordination results in different ways, described in the following section.</p>
</sec><sec id="s2f">
<title>Specialized Graphics</title>
<p>One of the key features of the phyloseq package is a set of graphics functions custom-tailored for phylogenetic sequencing analysis, built using the ggplot2 package <xref ref-type="bibr" rid="pone.0061217-Wickham2">[57]</xref>. The ggplot2 package is an implementation of Wilkinson's <italic>The Grammar of Graphics</italic>, which provides an object-oriented description of analytical graphics that emphasizes the separation of data and its mapping to aesthetic attributes <xref ref-type="bibr" rid="pone.0061217-Wilkinson1">[58]</xref>. In the phyloseq package, functions having names beginning with “plot_” require a phyloseq object as input data and return a ggplot2 graphics object. These plot_ functions support optional mapping of color, size, and shape aesthetics to sample or OTU variables, usually by providing the name of the variable or taxonomic rank as a character string (e.g., color = “SampleType”). Legends are automatically generated based on the data and aesthetic mappings (unlike base R graphics), and all features of these graphics can be further modified in R via functions and options in the ggplot2 package.</p>
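<p>The idea of mapping a named sample variable to an aesthetic, with the legend derived automatically from the data, can be sketched without any plotting library. In this Python example, the helper map_color, its default palette, and the sample data are hypothetical; ggplot2 performs the equivalent mapping internally, with its own palettes and level ordering, when a plot_ function is given, for example, color = “SampleType”.</p>

```python
from itertools import cycle

# Hypothetical helper and palette for illustration only; not ggplot2 or phyloseq code.
def map_color(sample_data, variable, palette=("#1b9e77", "#d95f02", "#7570b3")):
    """Map a categorical sample variable to colors.

    Returns (colors, legend): colors maps sample -> color, and legend maps
    each distinct level of `variable` -> color, so a legend needs no extra input.
    """
    levels = sorted({fields[variable] for fields in sample_data.values()})
    legend = dict(zip(levels, cycle(palette)))  # recycle colors if levels exceed palette
    colors = {sample: legend[fields[variable]] for sample, fields in sample_data.items()}
    return colors, legend

sample_data = {
    "S1": {"SampleType": "Soil"},
    "S2": {"SampleType": "Feces"},
    "S3": {"SampleType": "Ocean"},
    "S4": {"SampleType": "Soil"},
}
colors, legend = map_color(sample_data, "SampleType")
print(legend)        # {'Feces': '#1b9e77', 'Ocean': '#d95f02', 'Soil': '#7570b3'}
print(colors["S4"])  # #7570b3
```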
<p>The following list summarizes the key graphics-producing functions in phyloseq, which are also demonstrated in <xref ref-type="fig" rid="pone-0061217-g004">Figure 4</xref>, and in phyloseq's online tutorials <xref ref-type="bibr" rid="pone.0061217-The1">[39]</xref>. File S2 provides the complete R code for creating <xref ref-type="fig" rid="pone-0061217-g004">Figures 4</xref> and <xref ref-type="fig" rid="pone-0061217-g005">5</xref>. We have also included some additional examples of graphics created by plot_ordination (<xref ref-type="fig" rid="pone-0061217-g005">Figure 5</xref>). They emphasize different aspects of ordination results, and the best choice depends heavily on characteristics of the data and research questions. The provided code also demonstrates a custom modification to the ggplot2 graphic, in this case the addition of a two-dimensional density estimate to the “OTUs-only” plot (File S2).</p>
<fig id="pone-0061217-g004" position="float"><object-id pub-id-type="doi">10.1371/journal.pone.0061217.g004</object-id><label>Figure 4</label><caption>
<title>Graphic functions of the phyloseq package.</title>
<p>The Global Patterns <xref ref-type="bibr" rid="pone.0061217-Caporaso2">[47]</xref> and Enterotypes <xref ref-type="bibr" rid="pone.0061217-Arumugam1">[91]</xref> datasets are included with the phyloseq package. The Global Patterns data was preprocessed such that each sample was transformed to the same total read depth; OTUs that were not observed at least 3 times in 20% of samples, or that had a coefficient of variation ≤ 3.0 across all samples, were removed. For the plot_tree and plot_bar subplots, only the Bacteroidetes phylum is shown. Each subplot title indicates the plot function that produced it. Complete details for reproducing this figure are provided in File S2. All of these functions return a ggplot object that can be further customized/modified by tools in the ggplot2 package <xref ref-type="bibr" rid="pone.0061217-Wickham2">[57]</xref>. See additional descriptions of each function in the body text, and at the phyloseq homepage <xref ref-type="bibr" rid="pone.0061217-The1">[39]</xref>.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pone.0061217.g004" position="float" xlink:type="simple"/></fig><fig id="pone-0061217-g005" position="float"><object-id pub-id-type="doi">10.1371/journal.pone.0061217.g005</object-id><label>Figure 5</label><caption>
<title>plot_ordination display methods included in phyloseq.</title>
<p>Each panel uses a “Bacteroidetes-only” subset of the preprocessed “Global Patterns” dataset that was also used in <xref ref-type="fig" rid="pone-0061217-g004">Figure 4</xref>. The coordinates are derived from an unconstrained correspondence analysis <xref ref-type="bibr" rid="pone.0061217-Greenacre2">[62]</xref>. Different panels illustrate different displays of the ordination results using the type argument to the plot_ordination function. (Top Left) Example of a samples-only display, with “SampleType” mapped to the color aesthetic, and a filled-polygon layer to emphasize plot regions where sample types co-occur. (Top Left Inset) A “scree” plot of the eigenvalues associated with each axis, which indicates the proportion of total variability represented by each axis. (Top Right) Biplot representation in which the sample and OTU ordination results are overlaid. Clumps of OTUs appear to co-occur with different sample types, and some correlation with taxonomic phylum is also evident. (Middle) An OTUs-only plot that has been faceted (separated into panels) by class, with a two-dimensional density estimate overlaid in blue. This view clearly shows a lack of association of the Sphingobacteria and Flavobacteria classes with fecal samples, which appear to be enriched in a subset of the Bacteroidia (relative to other OTUs in this Bacteroidetes-only dataset). Meanwhile, subsets of Bacteroidia appear to be enriched within multiple sample types. (Bottom) The “split” type for this graphic, in which both samples-only and OTUs-only plots are created and shown side by side with a single legend and a shared vertical axis. Both the “biplot” and “split” options allow dual projections of both OTU- and sample-space.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pone.0061217.g005" position="float" xlink:type="simple"/></fig>
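<p>The Global Patterns preprocessing summarized in the <xref ref-type="fig" rid="pone-0061217-g004">Figure 4</xref> caption can be sketched as a pure-Python function. This is one plausible reading of that description, not the code in File S2: it assumes that “observed at least 3 times” means a raw count of at least 3, applies that prevalence filter to raw counts, rescales each sample to a common total depth computed from the unfiltered table, and then keeps only OTUs whose coefficient of variation (sample standard deviation divided by mean) exceeds 3.0. The function name, the default depth, and the toy table are hypothetical.</p>

```python
import statistics

# Hypothetical reading of the Figure 4 preprocessing; not the File S2 code.
def preprocess(otu_table, depth=1_000_000, min_count=3, min_fraction=0.2, min_cv=3.0):
    """Filter and depth-normalize an OTU table.

    otu_table maps otu -> list of raw counts, one per sample, in a fixed sample order.
    Returns otu -> list of depth-normalized abundances for the OTUs that pass.
    """
    n_samples = len(next(iter(otu_table.values())))
    # Library size of each sample, from the full (unfiltered) table.
    totals = [sum(row[j] for row in otu_table.values()) for j in range(n_samples)]
    kept = {}
    for otu, row in otu_table.items():
        # Prevalence filter: at least min_count reads in at least min_fraction of samples.
        prevalent = sum(1 for x in row if x >= min_count)
        if min_fraction * n_samples > prevalent:
            continue
        # Rescale every sample to the same total depth.
        scaled = [x * depth / t if t else 0.0 for x, t in zip(row, totals)]
        mean = statistics.mean(scaled)
        # Variability filter: keep only OTUs with sd / mean above min_cv.
        if mean > 0 and statistics.stdev(scaled) / mean > min_cv:
            kept[otu] = scaled
    return kept

# Toy table: 10 samples, each with 2000 reads in total.
toy = {
    "rare":   [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],     # fails the prevalence filter
    "spiky":  [990, 3, 0, 0, 0, 0, 0, 0, 0, 0],   # prevalent and highly variable
    "even":   [100] * 10,                         # prevalent but CV = 0
    "filler": [909, 1897] + [1900] * 8,           # brings every sample to 2000
}
result = preprocess(toy, depth=2000)
print(sorted(result))  # ['spiky']
```

<p>A coefficient-of-variation cutoff this high retains only OTUs whose abundance is concentrated in a few samples, which suits displays aimed at contrasting sample types.</p>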
<list list-type="order"><list-item>
<p>plot_ordination. This is the main function for plotting the results of an ordination. It currently supports four different representations of the ordination results: samples-only, OTUs-only, “biplot” (combined), and “split”. These options are demonstrated in <xref ref-type="fig" rid="pone-0061217-g005">Figure 5</xref>. As these examples show, the “biplot” and “split” options support dual projections of both OTU- and sample-space. Additional arguments map a sample variable or taxonomic rank to the color, size, or shape aesthetics.</p>
</list-item><list-item>
<p>plot_heatmap. This is a special implementation of the ordination-organized heat map, similar to that of the NeatMap package <xref ref-type="bibr" rid="pone.0061217-Rajaram1">[59]</xref>. Briefly, the abundance matrix is represented as a grid of colored tiles, with the color of each tile mapped to the (usually transformed) abundance value. The ordering of the OTU and sample indices in this representation is critical for revealing patterns. Hierarchical clustering has traditionally been used for this ordering, but, as Rajaram and Oono recently pointed out <xref ref-type="bibr" rid="pone.0061217-Rajaram1">[59]</xref>, it can misrepresent the data when deeply-branching elements are arbitrarily placed next to one another. Instead, plot_heatmap orders the samples (and optionally the OTUs) by their angle in polar coordinates on the first two axes of an ordination. Any of the distances and ordinations supported by the distance and ordinate functions can be used, with non-metric multidimensional scaling as the default. Any color scale can be selected, as well as any numerical transformation for mapping color shades to abundance.</p>
</list-item><list-item>
<p>plot_network. This function plots an igraph-class network <xref ref-type="bibr" rid="pone.0061217-Csardi1">[60]</xref> representing binary relationships between samples or OTUs. The network is computed by the make_network function, which takes phyloseq data, an ecological distance, and a threshold value as input. Unlike ordination, where most of the data structure is summarized by relative positions along two or more axes, a network summarizes the data as connections between samples (or OTUs), drawn as straight lines. Two samples are considered “connected” if the distance between them is less than a user-defined threshold. The relative position of points is optimized for the visual display of network properties, but is otherwise arbitrary. Any of the ecological distances supported by the distance function can be selected, and the resulting network can be a powerful representation of major clusters among samples or OTUs, provided the distance threshold has been chosen carefully.</p>
</list-item><list-item>
<p>plot_tree. This function facilitates the graphical rendering and investigation of the phylogenetic tree, with sample data overlaid. In some cases an annotated tree can be a powerful representation of an underlying evolutionary structure. The plot_tree function optionally places successive points next to the tips of the tree, indicating the samples in which each OTU was observed. The color, shape, and size aesthetics of these points can be mapped to sample variables, revealing the correspondence between environmental variables and specific regions of the evolutionary tree. Standard ggplot2 customizations are supported, and this is, to our knowledge, the only function for ggplot2-based phylogenetic trees currently available in the CRAN/Bioconductor repositories. For phylogenetic sequencing of samples with high richness, some of the options in this function will be prohibitively slow to render or too dense to interpret, a general drawback of summarizing phylogenetic sequencing data with trees. We suggest either agglomerating or subsetting the data so that a given plot has no more than about 200 OTUs (tree tips), and sometimes fewer, depending on the complexity of the additional annotations mapped to the tree. In many modern datasets, 200 or fewer OTUs will be insufficient to summarize the entire dataset, in which case one or more of the other plot functions is recommended.</p>
</list-item><list-item>
<p>plot_bar. Although bar plots can become very complicated, a well-organized bar plot can be an effective graphical means for direct quantitative comparison of abundance values; we note that statisticians generally discourage the use of pie charts for this purpose <xref ref-type="bibr" rid="pone.0061217-Tufte1">[61]</xref>. The plot_bar function takes as input a phyloseq dataset and a collection of arbitrary expressions for grouping the data by taxonomic rank and sample variables. The returned graphic represents each abundance value as the height of a rectangular block that is outlined by a thin black line and filled with the color corresponding to a user-specified sample or taxonomic variable (grey by default). The OTU abundance rectangles at the same horizontal position (usually a sample or sample group) are stacked in order of abundance, so that the aggregate height of each stacked bar is also quantitatively informative.</p>
</list-item><list-item>
<p>plot_richness. This function plots richness estimates for each sample in a phyloseq data object, allowing horizontal grouping and color shading according to additional sample variables. Differences in richness (alpha diversity) between samples are often among the first questions asked of phylogenetic sequencing data.</p>
</list-item></list>
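<p>The ordering rule described for plot_heatmap and the connection rule described for make_network can each be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not phyloseq's R code: the helper names order_by_angle and threshold_network are hypothetical, ordination scores are assumed to be given as a dictionary of (axis 1, axis 2) pairs, and distances as a dictionary keyed by unordered pairs of sample names that covers every pair.</p>
<preformat preformat-type="code">
```python
import math
from itertools import combinations


def order_by_angle(scores):
    """Order samples by their polar angle on the first two ordination axes.

    scores maps a sample name to its (axis1, axis2) coordinates. Samples
    with similar angles end up adjacent, which is the idea behind the
    ordination-based ordering described for plot_heatmap.
    """
    return sorted(scores, key=lambda s: math.atan2(scores[s][1], scores[s][0]))


def threshold_network(dist, threshold):
    """Return the pairs of samples whose distance is below threshold.

    dist maps frozenset({a, b}) to the distance between a and b and must
    contain every pair of samples.
    """
    names = sorted({name for pair in dist for name in pair})
    edges = []
    for a, b in combinations(names, 2):
        # Two samples are "connected" only if their distance is under the cutoff
        if threshold > dist[frozenset((a, b))]:
            edges.append((a, b))
    return edges


if __name__ == "__main__":
    scores = {"S1": (1.0, 0.1), "S2": (-1.0, 0.2), "S3": (0.9, 0.3), "S4": (-0.8, -0.5)}
    print(order_by_angle(scores))  # ['S4', 'S1', 'S3', 'S2']
    dist = {
        frozenset(("S1", "S2")): 0.9,
        frozenset(("S1", "S3")): 0.1,
        frozenset(("S1", "S4")): 0.8,
        frozenset(("S2", "S3")): 0.85,
        frozenset(("S2", "S4")): 0.2,
        frozenset(("S3", "S4")): 0.7,
    }
    print(threshold_network(dist, 0.3))  # [('S1', 'S3'), ('S2', 'S4')]
```
</preformat>
<p>With these toy scores, the angle ordering places S1 and S3, which lie close together on the right of the ordination, next to each other, and a threshold of 0.3 connects only the two closest pairs.</p>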
</sec><sec id="s2g">
<title>Normalization and Standardization</title>
<p>In multivariate analyses such as PCA, large differences in variance between columns are corrected by standardizing each column, i.e., dividing each column by its standard deviation, so that each column has the same weight in the analysis. For OTU abundance tables such a procedure is inappropriate, as the column sums can differ 100-fold. Methods based on chi-square distances rather than variances deal with this by comparing weighted column profiles <xref ref-type="bibr" rid="pone.0061217-Greenacre2">[62]</xref>, computed as the relative abundance of each OTU within a column, with the overall column sum retained as a weighting factor. However, chi-square distances are sums of squares and can be overly sensitive to outliers and to sequencing “jackpot” effects such as those occurring in pyrosequencing data <xref ref-type="bibr" rid="pone.0061217-Pinto1">[63]</xref>. Bray-Curtis distances can be a useful alternative, as they are based on the <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pone.0061217.e004" xlink:type="simple"/></inline-formula> distance between profiles, provided that the differences in actual column sums are also accounted for in the final analysis. The other approach to disparities between column sums has been to subsample the over-abundant columns down to the size of the smaller ones. However, this discards data, which is rarely optimal in a statistical context. This subsampling procedure is inspired by the popular idea of rarefaction in coverage studies, first introduced by Sanders <xref ref-type="bibr" rid="pone.0061217-Sanders1">[64]</xref>, but it has yet to be shown beneficial for all microbial community structures. 
The parallels between gene expression microarray analyses and microbial abundance analyses were discussed in <xref ref-type="bibr" rid="pone.0061217-Holmes1">[65]</xref>, which proposed several expression-inspired strategies for making abundance measurements more robust. The main points were that ranking and thresholding are important in the presence of noise and highly variable sequencing depths. As in gene expression analysis, filtering OTUs is beneficial, especially for subsequent multiple-testing adjustments. The phyloseq package enables easy filtering and rank transformations in the same vein as robust multi-array averaging (RMA) <xref ref-type="bibr" rid="pone.0061217-Allison1">[66]</xref>. Further details are provided in McMurdie and Holmes <xref ref-type="bibr" rid="pone.0061217-Holmes2">[67]</xref>.</p>
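<p>The contrast between chi-square and Bray-Curtis distances discussed in this section can be made concrete with a small sketch. The Python below uses hypothetical helper names (profiles, chi_square_distance, bray_curtis) and a plain list-of-rows OTU-by-sample table; it is not the implementation used by phyloseq or vegan. The chi-square distance divides each squared profile difference by the OTU's overall relative abundance, so squared deviations from a single inflated count weigh heavily, whereas Bray-Curtis sums absolute differences; applied to relative-abundance profiles, Bray-Curtis equals half the sum of absolute profile differences.</p>
<preformat preformat-type="code">
```python
import math


def profiles(table):
    """Turn an OTU-by-sample count table (list of rows) into column profiles."""
    n_otu, n_samp = len(table), len(table[0])
    sums = [sum(table[i][j] for i in range(n_otu)) for j in range(n_samp)]
    return [[table[i][j] / sums[j] for j in range(n_samp)] for i in range(n_otu)]


def chi_square_distance(table, j, k):
    """Chi-square distance between the profiles of samples j and k.

    Each squared profile difference is divided by the OTU's share of all
    reads (its row mass), as in correspondence analysis. Assumes every OTU
    row has at least one read.
    """
    total = sum(map(sum, table))
    masses = [sum(row) / total for row in table]
    prof = profiles(table)
    return math.sqrt(sum((prof[i][j] - prof[i][k]) ** 2 / masses[i]
                         for i in range(len(table))))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum of |x_i - y_i| over sum of (x_i + y_i)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


if __name__ == "__main__":
    counts = [[1, 3], [3, 1]]  # two OTUs (rows) by two samples (columns)
    prof = profiles(counts)
    print(chi_square_distance(counts, 0, 1))
    print(bray_curtis([row[0] for row in prof], [row[1] for row in prof]))
```
</preformat>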
</sec><sec id="s2h">
<title>Confirmatory Analyses</title>
<p>Although useful for exploring and summarizing microbiome data, many of the graphics and ordination methods discussed here are not formal tests of any particular hypothesis. The most common testing framework in microbiome studies is the comparison of samples from different categories (e.g., healthy and obese; control and treated; different environments). Standard tests include the t-test, the paired permutation t-test, and ANOVA-type tests based on F or pseudo-F statistics. However, microbiome data have two distinctive features. First, the raw abundance counts are not normally distributed, so nonparametric methods are preferred. Second, additional information is available about the relationships between OTUs, as well as about variables measured on the samples, so testing is sometimes more elaborate than a two-sample test. The hypergeometric test, equivalent to a one-sided Fisher's exact test, is used when a test statistic is available for each OTU. The goal is to confirm that a certain property of the significant OTUs is overrepresented compared with the general population of OTUs, often called “the universe”. For instance, Holmes et al. <xref ref-type="bibr" rid="pone.0061217-Holmes1">[65]</xref> and Nelson et al. <xref ref-type="bibr" rid="pone.0061217-Nelson1">[68]</xref> used this hypergeometric test to show that several phyla were significantly over-abundant in IBS rats compared with healthy controls.</p>
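<p>The over-representation calculation can be written directly from the hypergeometric distribution. In the Python sketch below, the function name and argument names are our own: the function returns the upper-tail probability of observing at least k significant OTUs from a given category (for example, a phylum) when n_sig OTUs are significant out of a universe of n_total OTUs, n_cat of which belong to that category. Under these assumptions, the same value should be given in base R by phyper(k - 1, n_cat, n_total - n_cat, n_sig, lower.tail = FALSE).</p>
<preformat preformat-type="code">
```python
from math import comb


def hypergeom_upper_tail(k, n_sig, n_cat, n_total):
    """P(X >= k) for the number X of category members among significant OTUs.

    k: significant OTUs that belong to the category (e.g., one phylum)
    n_sig: total number of significant OTUs (the draws)
    n_cat: OTUs of that category in the whole universe
    n_total: size of the universe (all OTUs that were tested)
    """
    upper = min(n_sig, n_cat)
    # Sum the hypergeometric probabilities of k, k + 1, ..., upper successes
    hits = sum(comb(n_cat, i) * comb(n_total - n_cat, n_sig - i)
               for i in range(k, upper + 1))
    return hits / comb(n_total, n_sig)


if __name__ == "__main__":
    # 12 of 20 significant OTUs fall in a phylum that holds 100 of 1000 OTUs
    print(hypergeom_upper_tail(12, 20, 100, 1000))
```
</preformat>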
<p>An organizing principle in many nonparametric testing protocols is that repeating an analysis many times enables the user to control for multiple testing, or to evaluate the quality of estimators or the optimal values of tuning parameters. Modern confirmatory analyses depend on such repeated analyses under various data perturbation schemes, of which resampling, permutation, and Monte Carlo simulation are the most common. For instance, the bootstrap uses many thousands of analyses of resampled data to address problems such as statistical stability or bias estimation <xref ref-type="bibr" rid="pone.0061217-Efron1">[69]</xref>, and can even provide confidence regions <xref ref-type="bibr" rid="pone.0061217-Efron1">[69]</xref> for nonstandard parameters such as phylogenetic trees <xref ref-type="bibr" rid="pone.0061217-Holmes3">[70]</xref>. Repeating analyses on permuted data makes it possible to control the probability of one or more false positives (falsely rejected null hypotheses) among a group of simultaneous hypotheses, known as the family-wise error rate (FWER). For instance, Westfall and Young's permutation-based <bold>minP</bold> procedure controls the FWER <xref ref-type="bibr" rid="pone.0061217-Westfall1">[71]</xref> and is implemented in the multtest package <xref ref-type="bibr" rid="pone.0061217-Pollard1">[72]</xref>. The phyloseq package interfaces with minP in multtest through a wrapper function called mt. In the following example code we use the mt wrapper to control the FWER while simultaneously testing whether each OTU is associated with the “Enterotypes” classification of the samples. Note that we first remove samples that were not assigned an enterotype by the original authors (<xref ref-type="table" rid="pone-0061217-t001">Table 1</xref>).<disp-formula id="pone.0061217.e005"><graphic position="anchor" xlink:href="info:doi/10.1371/journal.pone.0061217.e005" xlink:type="simple"/></disp-formula></p>
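<p>The permutation logic of minP can be sketched compactly. The Python below is a simplified illustration under several assumptions: the helper names f_statistic and minp_adjust are hypothetical, each OTU is tested with a one-way F statistic, permutations are drawn at random with a fixed seed, and the adjustment is the single-step form of the Westfall and Young minP procedure. The multtest function mt.minP instead computes step-down adjusted p-values, which are less conservative, so this sketch should not be read as the code behind mt or Table 1.</p>
<preformat preformat-type="code">
```python
import random


def f_statistic(values, labels):
    """One-way ANOVA F statistic for one OTU's abundances grouped by labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def minp_adjust(table, labels, n_perm=999, seed=2013):
    """Single-step minP adjusted p-values from random label permutations.

    table: one list of abundances per OTU, aligned with labels.
    Returns (raw_p, adjusted_p), one value per OTU.
    """
    rng = random.Random(seed)
    labelings = [list(labels)]  # index 0 holds the observed labels
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        labelings.append(shuffled)
    n_b, m = len(labelings), len(table)
    # stats[b][j]: statistic of OTU j under labeling b
    stats = [[f_statistic(row, lab) for row in table] for lab in labelings]
    # pvals[b][j]: share of labelings whose statistic for OTU j is at least stats[b][j]
    pvals = [[sum(stats[c][j] >= stats[b][j] for c in range(n_b)) / n_b
              for j in range(m)] for b in range(n_b)]
    # Smallest p-value across all OTUs, for every labeling
    min_p = [min(row) for row in pvals]
    raw = pvals[0]
    adjusted = [sum(p >= mp for mp in min_p) / n_b for p in raw]
    return raw, adjusted


if __name__ == "__main__":
    labels = ["A"] * 5 + ["B"] * 5
    otu0 = [1, 2, 1, 2, 1, 10, 11, 10, 11, 10]  # strong group difference
    otu1 = [3, 5, 4, 6, 5, 4, 6, 3, 5, 4]  # essentially no group difference
    raw, adjusted = minp_adjust([otu0, otu1], labels, n_perm=199, seed=1)
    print(raw, adjusted)
```
</preformat>
<p>In the toy example, the first OTU separates the two groups almost perfectly and keeps a small adjusted p-value, while the second shows essentially no group difference. Each adjusted p-value is at least as large as the corresponding raw permutation p-value, because it is compared against the smallest p-value across all OTUs in every permutation.</p>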
<table-wrap id="pone-0061217-t001" position="float"><object-id pub-id-type="doi">10.1371/journal.pone.0061217.t001</object-id><label>Table 1</label><caption>
<title>Results from the mt function on the “Enterotypes” dataset.</title>
</caption><alternatives><graphic id="pone-0061217-t001-1" position="float" mimetype="image" xlink:href="info:doi/10.1371/journal.pone.0061217.t001" xlink:type="simple"/>
<table><colgroup span="1"><col align="left" span="1"/><col align="center" span="1"/><col align="center" span="1"/><col align="center" span="1"/><col align="center" span="1"/></colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1">genera</td>
<td align="left" rowspan="1" colspan="1">index</td>
<td align="left" rowspan="1" colspan="1">test stat</td>
<td align="left" rowspan="1" colspan="1">raw-p</td>
<td align="left" rowspan="1" colspan="1">adj-p</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Prevotella</td>
<td align="left" rowspan="1" colspan="1">207</td>
<td align="left" rowspan="1" colspan="1">344.73</td>
<td align="left" rowspan="1" colspan="1">0.0001</td>
<td align="left" rowspan="1" colspan="1">0.0158</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Bacteroides</td>
<td align="left" rowspan="1" colspan="1">203</td>
<td align="left" rowspan="1" colspan="1">85.01</td>
<td align="left" rowspan="1" colspan="1">0.0001</td>
<td align="left" rowspan="1" colspan="1">0.0158</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Blautia</td>
<td align="left" rowspan="1" colspan="1">187</td>
<td align="left" rowspan="1" colspan="1">19.52</td>
<td align="left" rowspan="1" colspan="1">0.0001</td>
<td align="left" rowspan="1" colspan="1">0.0158</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Bryantella</td>
<td align="left" rowspan="1" colspan="1">503</td>
<td align="left" rowspan="1" colspan="1">16.38</td>
<td align="left" rowspan="1" colspan="1">0.0001</td>
<td align="left" rowspan="1" colspan="1">0.0158</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Parabacteroides</td>
<td align="left" rowspan="1" colspan="1">205</td>
<td align="left" rowspan="1" colspan="1">12.89</td>
<td align="left" rowspan="1" colspan="1">0.0001</td>
<td align="left" rowspan="1" colspan="1">0.0158</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Alistipes</td>
<td align="left" rowspan="1" colspan="1">208</td>
<td align="left" rowspan="1" colspan="1">8.71</td>
<td align="left" rowspan="1" colspan="1">0.0002</td>
<td align="left" rowspan="1" colspan="1">0.0301</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Bifidobacterium</td>
<td align="left" rowspan="1" colspan="1">240</td>
<td align="left" rowspan="1" colspan="1">9.29</td>
<td align="left" rowspan="1" colspan="1">0.0004</td>
<td align="left" rowspan="1" colspan="1">0.0560</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Holdemania</td>
<td align="left" rowspan="1" colspan="1">201</td>
<td align="left" rowspan="1" colspan="1">7.64</td>
<td align="left" rowspan="1" colspan="1">0.0009</td>
<td align="left" rowspan="1" colspan="1">0.1146</td>
</tr>
</tbody>
</table>
</alternatives><table-wrap-foot><fn id="nt101"><label/><p>The original “Enterotypes” dataset <xref ref-type="bibr" rid="pone.0061217-Arumugam1">[91]</xref> (included in phyloseq), with OTU-wise testing of enterotype groups. Tests are F-tests with a permutation-based adjustment that controls the family-wise error rate (FWER) across multiple inferences (“adj-p” column). Not surprisingly, <italic>Prevotella</italic> and <italic>Bacteroides</italic> top the list, as they were major components of the “Enterotypes” classification described in the original article <xref ref-type="bibr" rid="pone.0061217-Arumugam1">[91]</xref>.</p></fn></table-wrap-foot></table-wrap></sec></sec><sec id="s3">
<title>Results and Discussion</title>
<p>As phylogenetic sequencing experiments continue to increase in complexity and sophistication, it is clear that a “one analysis fits all” approach is not sufficient. While it is often useful and convenient to have common analyses coupled within the application that decodes the sequences and clusters OTUs, we posit that a separate set of flexible, open-source analytical tools is also needed, one whose analyses can be consistently reproduced by peers and easily applied to new datasets and data sources. Such tools should include a large library of statistical functions and be independent of the choice of OTU-clustering method or sequencing technology. The phyloseq package helps satisfy this need by reducing the effort necessary to analyze OTU-clustered phylogenetic sequencing data in the R language and interactive computing environment.</p>
<sec id="s3a">
<title>Reproducible Research and Sharing</title>
<p>In exploratory statistical work it is easy to produce biased results <xref ref-type="bibr" rid="pone.0061217-Ioannidis1">[73]</xref> through poorly chosen metrics or tests, failure to properly control for multiple inferences, undisclosed data “pruning”, and probably many other means. Although not commonly required <xref ref-type="bibr" rid="pone.0061217-Merali1">[74]</xref>–<xref ref-type="bibr" rid="pone.0061217-Ince1">[76]</xref>, an important defense against biased (or poorly supported) findings is a higher standard for reproducibility in published research <xref ref-type="bibr" rid="pone.0061217-Carey1">[77]</xref>, in which journal articles are accompanied by sufficient data and software that all presented analyses, tables, and figures can be reproduced exactly and with minimal effort <xref ref-type="bibr" rid="pone.0061217-Peng1">[75]</xref>. In the context of highly parallel phylogenetic sequencing experiments, reproducible research can be partially facilitated by emerging standards for experimental design <xref ref-type="bibr" rid="pone.0061217-Knight1">[78]</xref> and file formats <xref ref-type="bibr" rid="pone.0061217-McDonald1">[37]</xref>. Virtual machine images and cloud-deployed “pipeline” analyses <xref ref-type="bibr" rid="pone.0061217-Caporaso1">[11]</xref>, <xref ref-type="bibr" rid="pone.0061217-Angiuoli1">[15]</xref>, <xref ref-type="bibr" rid="pone.0061217-Meyer1">[19]</xref> can further increase the accessibility of analyses by mitigating the need for expensive computing hardware while also avoiding complicated installation procedures. However, the use of publicly available “pipeline” tools does not fully meet the reproducibility standard unless accompanied by the complete code and data used in the published analysis <xref ref-type="bibr" rid="pone.0061217-Peng1">[75]</xref>. 
This is especially important given the many choices involved in decoding, OTU clustering, and preprocessing, as well as the many varied approaches to incorporating sample covariates and performing multivariate analyses on complex data. The recent release of the HMP data, and the multiple articles reporting analyses of it, underscore this point. Thresholding and noise filtering were done independently by each team, but no overall robustness study was performed <xref ref-type="bibr" rid="pone.0061217-Human1">[79]</xref>. Changes early in the analysis pipeline could have downstream effects that are now prohibitively difficult or impossible to evaluate. Generally speaking, preprocessing OTU abundance data through filtering, normalizing, centering, shrinking, and other transformations is common practice and necessary for analysis <xref ref-type="bibr" rid="pone.0061217-Allison1">[66]</xref>, but it varies widely among researchers and is often difficult to reproduce. This is particularly true when the preprocessing transformations result from “manual” adjustments in a spreadsheet, custom code or scripts not included in the publication, or random subsampling (“rarefying” to even sequencing effort) with no reported seed. A related example is the (often not-so-)reproducible choice of tuning parameters and perturbation-based statistical validation procedures; making these choices reproducible allows alternatives to be tested easily and the robustness of results to be assessed. To a large extent this revisits many of the same issues of reproducible research <xref ref-type="bibr" rid="pone.0061217-Donoho1">[80]</xref>–<xref ref-type="bibr" rid="pone.0061217-Gentleman2">[82]</xref> that have been addressed over the last decade for the analysis of microarray data <xref ref-type="bibr" rid="pone.0061217-Allison1">[66]</xref>, and for which many proven tools are already available in Bioconductor/R. 
The emphasis on preprocessing tools in phyloseq is intended to make these steps less opaque and idiosyncratic, and to make the results of different studies more comparable.</p>
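<p>The problem of unreported seeds in random subsampling can be illustrated with a short sketch. The Python function below is hypothetical (rarefy is our name, not a phyloseq function): it subsamples one sample's reads without replacement to an even depth using an explicitly seeded generator, so that reporting the depth and the seed, together with the software version, is enough for anyone to regenerate exactly the same rarefied counts.</p>
<preformat preformat-type="code">
```python
import random


def rarefy(counts, depth, seed):
    """Subsample one sample's reads, without replacement, to a fixed depth.

    counts: dict mapping OTU name to read count
    depth: target number of reads (must not exceed the sample total)
    seed: the random seed to report alongside the results
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in this sample")
    # One list entry per read, built in a fixed (sorted) OTU order so that, for
    # a given Python version, the result depends only on counts, depth, and seed
    reads = [otu for otu in sorted(counts) for _ in range(counts[otu])]
    rng = random.Random(seed)
    rarefied = {otu: 0 for otu in counts}
    for otu in rng.sample(reads, depth):
        rarefied[otu] += 1
    return rarefied


if __name__ == "__main__":
    sample = {"OTU1": 50, "OTU2": 30, "OTU3": 20}
    print(rarefy(sample, depth=40, seed=711))
```
</preformat>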
<p>One of the goals of the phyloseq project is to help close the gap in reproducible research that presently exists between pipeline results and the additional analyses required by investigators. This can be achieved when phyloseq is used (possibly with other R packages) in conjunction with documentation tools such as Sweave <xref ref-type="bibr" rid="pone.0061217-Hardle1">[32]</xref>, knitr <xref ref-type="bibr" rid="pone.0061217-Xie1">[33]</xref>, the IPython Notebook <xref ref-type="bibr" rid="pone.0061217-Prez1">[83]</xref> with the rmagic extension, or “R flavored markdown” (RFM) <xref ref-type="bibr" rid="pone.0061217-Allaire1">[84]</xref>. The Sweave-format approach is part of the reproducible research standards strongly encouraged by the journal <italic>Biostatistics</italic> <xref ref-type="bibr" rid="pone.0061217-Peng2">[81]</xref>, as well as in many disciplines related to statistics and bioinformatics <xref ref-type="bibr" rid="pone.0061217-Carey1">[77]</xref>, <xref ref-type="bibr" rid="pone.0061217-Gentleman3">[85]</xref>. The recently described RFM format and the IPython Notebook also work very well when a web browser is a satisfactory medium for delivering documentation, with RFM being our preferred source format for publishing reproducible online tutorials with embedded code and figures (HTML5) <xref ref-type="bibr" rid="pone.0061217-The1">[39]</xref>, <xref ref-type="bibr" rid="pone.0061217-The2">[86]</xref>. We emphasize that the benefits of reproducibility are not contingent on “pretty” code <xref ref-type="bibr" rid="pone.0061217-Barnes1">[87]</xref>, and we encourage researchers in the field to make their code available even if they feel insecure about its programmatic elegance. 
As an illustrative example, we have made available the Sweave (.Rnw) and supporting files required to completely reproduce this article, including the complete source as an RFM file (.Rmd) and its associated output HTML file, both of which provide the preprocessing steps and graphics commands needed to exactly reproduce each figure (File S2). We have also published a GitHub repository dedicated to reproducible demonstrations of analyses with phyloseq <xref ref-type="bibr" rid="pone.0061217-The2">[86]</xref>.</p>
</sec><sec id="s3b">
<title>Extending phyloseq</title>
<p>The new phyloseq-class is a significant departure from the originally proposed phyloseq-class structure <xref ref-type="bibr" rid="pone.0061217-McMurdie1">[31]</xref>, which used nested multiple inheritance and a naming convention. That approach was valid in principle, but overly complex for the goal of representing a phylogenetic sequencing experiment as a single object. The updated phyloseq-class is simple for developers to extend and easy to explain to users (<xref ref-type="fig" rid="pone-0061217-g003">Figure 3</xref>). In general, downstream analysis and plotting functions that operate on an instance of the phyloseq-class do not need to (re)perform common validity checks, because these checks are consolidated in the phyloseq-constructor method.</p>
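<p>The design choice of performing validity checks once, at construction time, can be illustrated outside R. The Python dataclass below is a hypothetical analogue (ExperimentBundle is our name), not phyloseq's S4 class or its constructor: its __post_init__ hook checks the components and trims them to the OTUs and samples they share, so that any function receiving an instance can skip these checks.</p>
<preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass
class ExperimentBundle:
    """Hypothetical Python analogue of one object holding a sequencing experiment.

    otu_table: OTU name -> {sample name: count}
    sample_data: sample name -> {variable: value}
    tax_table: OTU name -> {rank: taxon}
    """
    otu_table: dict
    sample_data: dict
    tax_table: dict

    def __post_init__(self):
        # All validity checks live here, so downstream functions can skip them
        sample_sets = {frozenset(row) for row in self.otu_table.values()}
        if len(sample_sets) != 1:
            raise ValueError("every OTU needs counts for the same set of samples")
        if any(0 > c for row in self.otu_table.values() for c in row.values()):
            raise ValueError("counts must be non-negative")
        otus = sorted(set(self.otu_table).intersection(self.tax_table))
        samples = sorted(set(next(iter(sample_sets))).intersection(self.sample_data))
        if not otus or not samples:
            raise ValueError("components share no OTUs or no samples")
        # Trim every component to the OTUs and samples they have in common
        self.otu_table = {o: {s: self.otu_table[o][s] for s in samples} for o in otus}
        self.tax_table = {o: self.tax_table[o] for o in otus}
        self.sample_data = {s: self.sample_data[s] for s in samples}


if __name__ == "__main__":
    bundle = ExperimentBundle(
        otu_table={"OTU1": {"S1": 5, "S2": 0}, "OTU2": {"S1": 1, "S2": 3},
                   "OTU3": {"S1": 2, "S2": 2}},
        sample_data={"S1": {"type": "feces"}, "S2": {"type": "soil"},
                     "S3": {"type": "ocean"}},
        tax_table={"OTU1": {"Phylum": "Bacteroidetes"},
                   "OTU2": {"Phylum": "Firmicutes"}},
    )
    print(sorted(bundle.otu_table), sorted(bundle.sample_data))  # ['OTU1', 'OTU2'] ['S1', 'S2']
```
</preformat>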
<p>Analysis tools available in R but not explicitly wrapped in phyloseq are nevertheless available to users and developers via accessors and other data infrastructure tools. This leverages the fact that phyloseq data components are based on standard R data classes and are easily used in other R packages. For example, we have included code that illustrates the use of the bioenv function from the vegan package, starting with data represented by the phyloseq-class (see File S2 for code, and the phyloseq demo <xref ref-type="bibr" rid="pone.0061217-The2">[86]</xref>). Similarly, as an open-source package in an open language and framework (R), phyloseq can easily be included at the relevant steps in pipelines, workbenches, and GUIs now under active development (e.g., ClovR <xref ref-type="bibr" rid="pone.0061217-Angiuoli1">[15]</xref>, MG-RAST <xref ref-type="bibr" rid="pone.0061217-Meyer1">[19]</xref>, QIIME <xref ref-type="bibr" rid="pone.0061217-Caporaso1">[11]</xref>, mcaGUI <xref ref-type="bibr" rid="pone.0061217-Copeland1">[88]</xref>). This offers investigators with limited programming experience a means to benefit from some of the tools included in, or facilitated by, phyloseq.</p>
</sec></sec><sec id="s4">
<title>Conclusions</title>
<p>The phyloseq project is a new open-source software tool for statistical analysis of phylogenetic sequencing data within the R programming language and environment. The tools in phyloseq make it easy to read the output of several of the most common OTU-clustering pipelines, and they represent these data in a unified, integrated form amenable to many modern analysis methods. With this integrated representation it is easy to use supervised methods (such as canonical correspondence analysis, discriminant correspondence analysis, and sparse linear discriminant analysis) to explain clinical or environmental response variables. We hope that this will provide a gateway for users to take their analyses toward more robust nonparametric alternatives to classical least-squares methods, and allow them to interact graphically with their data more easily and efficiently. By leveraging existing R infrastructure for reproducible research, the phyloseq project also enables reproducible preprocessing, analysis, and production of publication-quality graphics, making it easy to document, share, and modify analyses of phylogenetic sequencing data. The phyloseq package is released on Bioconductor <xref ref-type="bibr" rid="pone.0061217-Gentleman1">[34]</xref> and developed collaboratively on GitHub <xref ref-type="bibr" rid="pone.0061217-The1">[39]</xref>.</p>
</sec><sec id="s5">
<title>Availability and Requirements</title>
<p><bold>Project name:</bold> phyloseq</p>
<p><bold>Project Stable Release:</bold> <ext-link ext-link-type="uri" xlink:href="http://www.bioconductor.org/packages/release/bioc/html/phyloseq.html" xlink:type="simple">http://www.bioconductor.org/packages/release/bioc/html/phyloseq.html</ext-link></p>
<p><bold>Project Home Page:</bold> <ext-link ext-link-type="uri" xlink:href="http://joey711.github.com/phyloseq/" xlink:type="simple">http://joey711.github.com/phyloseq/</ext-link></p>
<p><bold>Project Issue Tracker:</bold> <ext-link ext-link-type="uri" xlink:href="https://github.com/joey711/phyloseq/issues" xlink:type="simple">https://github.com/joey711/phyloseq/issues</ext-link></p>
<p><bold>Project Demo Page:</bold> <ext-link ext-link-type="uri" xlink:href="http://joey711.github.com/phyloseq-demo/" xlink:type="simple">http://joey711.github.com/phyloseq-demo/</ext-link></p>
<p><bold>Operating System(s):</bold> Platform Independent</p>
<p><bold>Programming Language(s):</bold> R</p>
<p><bold>Other Requirements:</bold> R, R packages (ade4, ape, Biostrings, foreach, ggplot2, igraph0, multtest, picante, plyr, reshape, RJSONIO, scales, vegan)</p>
<p><bold>License:</bold> AGPL-3</p>
</sec><sec id="s6">
<title>Supporting Information</title>
<supplementary-material id="pone.0061217.s001" mimetype="application/pdf" xlink:href="info:doi/10.1371/journal.pone.0061217.s001" position="float" xlink:type="simple"><label>File S1</label><caption>
<p><bold>Summary of comparison between phyloseq and currently available software.</bold> This PDF file contains a table summarizing a comparison of supported capabilities between phyloseq and QIIME <xref ref-type="bibr" rid="pone.0061217-Caporaso1">[11]</xref>, mothur <xref ref-type="bibr" rid="pone.0061217-Schloss1">[12]</xref>, and the pair of packages OTUbase <xref ref-type="bibr" rid="pone.0061217-Beck1">[35]</xref> and mcaGUI <xref ref-type="bibr" rid="pone.0061217-Copeland1">[88]</xref>. A “+” or “–” indicates that the capability is supported or not directly supported, respectively. A symbol or word in place of “+” implies that the capability is supported, but with an extra caveat or detail, defined below the table where necessary. This is not a comprehensive summary of the capabilities of each package, but rather of the capabilities relevant to this article. The abbreviations CA, DCA, RDA, and DPCoA stand for the ordination methods correspondence analysis, detrended correspondence analysis, redundancy analysis, and double principal coordinates analysis, respectively. Note that in some cases the capabilities deemed “+” in this table are supported only for amplicon-sequencing data, sometimes from a specific sequencing platform and with the 16S rRNA gene as the target. However, the phyloseq package is implemented at a stage in the analysis process that can be applied more generally to any phylogenetic sequencing, including non-standard amplicon targets, shotgun metagenome sequencing, etc.</p>
<p>(PDF)</p>
</caption></supplementary-material><supplementary-material id="pone.0061217.s002" mimetype="application/zip" xlink:href="info:doi/10.1371/journal.pone.0061217.s002" position="float" xlink:type="simple"><label>File S2</label><caption>
<p><bold>Source materials for reproducing this manuscript.</bold> This is a compressed .zip directory containing the main source file in Sweave .Rnw format <xref ref-type="bibr" rid="pone.0061217-Hardle1">[32]</xref>, as well as the additional files necessary to completely recreate the original manuscript submitted to PLoS ONE. For the uninitiated, Sweave is an R/LaTeX2e interleaved hybrid language format <xref ref-type="bibr" rid="pone.0061217-Hardle1">[32]</xref> that allows advanced typesetting descriptions to accompany R code and its output (including graphics). Also included is the RFM source file that was used to create <xref ref-type="fig" rid="pone-0061217-g004">Figures 4</xref> and <xref ref-type="fig" rid="pone-0061217-g005">5</xref>, and its accompanying HTML output, which includes additional documentation details, links, and intermediate graphics. This latter file is “sourced” (re-run) by the Sweave commands if any of the expected output files are missing. This supporting information zip file also includes R code (at the end of the RFM/HTML files) that demonstrates how to use a phyloseq data object as an argument to other R functions. In this particular example, the bioenv function from the vegan package <xref ref-type="bibr" rid="pone.0061217-Oksanen1">[92]</xref> is demonstrated.</p>
<p>(ZIP)</p>
</caption></supplementary-material></sec></body>
<back>
<ack>
<p>We would like to thank Martin Morgan and Valerie Obenchain at Bioconductor for their useful suggestions regarding the architecture and organization of phyloseq. We would also like to thank the developers of the open source packages on which phyloseq depends, in particular Rob Knight and his lab for QIIME <xref ref-type="bibr" rid="pone.0061217-Caporaso1">[11]</xref>, Hadley Wickham for the ggplot2 <xref ref-type="bibr" rid="pone.0061217-Wickham2">[57]</xref>, reshape <xref ref-type="bibr" rid="pone.0061217-Wickham3">[89]</xref>, and plyr <xref ref-type="bibr" rid="pone.0061217-Wickham4">[90]</xref> packages, as well as the Bioconductor and R teams <xref ref-type="bibr" rid="pone.0061217-R1">[24]</xref>, <xref ref-type="bibr" rid="pone.0061217-Gentleman1">[34]</xref>. Thanks also to RStudio and GitHub for immensely useful and free development applications. Julia Fukuyama provided prototype code for the DPCoA wrapper. Gregory Jordan provided several core functions that make a ggplot2-based phylogenetic tree plot possible, borrowed with permission from his “ggphylo” repository. Scott Chamberlain provided useful example code for a ggplot2-based network plot in his “gggraph” repository. Julia Fukuyama, Sam Pimentel, Kris Sankaran and Dustin Janatpour provided early feedback on the phyloseq package. Les Dethlefsen, Diana Proctor and other members of the David Relman Lab provided ongoing feedback and example data. Alfred Spormann, Tyrrell Nelson and Tim Meyer also provided early versions of an illustrative data set. We also thank the communities at stackoverflow.com for useful advice during development of phyloseq.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pone.0061217-Metzker1"><label>1</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Metzker</surname><given-names>ML</given-names></name> (<year>2010</year>) <article-title>Sequencing technologies - the next generation</article-title>. <source>Nature Reviews Genetics</source> <volume>11</volume>: <fpage>31</fpage>–<lpage>46</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Hamady1"><label>2</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Hamady</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Walker</surname><given-names>JJ</given-names></name>, <name name-style="western"><surname>Harris</surname><given-names>JK</given-names></name>, <name name-style="western"><surname>Gold</surname><given-names>NJ</given-names></name>, <name name-style="western"><surname>Knight</surname><given-names>R</given-names></name> (<year>2008</year>) <article-title>Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex</article-title>. <source>Nature Methods</source> <volume>5</volume>: <fpage>235</fpage>–<lpage>237</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Pace1"><label>3</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Pace</surname><given-names>NR</given-names></name> (<year>1997</year>) <article-title>A molecular view of microbial diversity and the biosphere</article-title>. <source>Science</source> <volume>276</volume>: <fpage>734</fpage>–<lpage>740</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Liu1"><label>4</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Liu</surname><given-names>Z</given-names></name>, <name name-style="western"><surname>DeSantis</surname><given-names>TZ</given-names></name>, <name name-style="western"><surname>Andersen</surname><given-names>GL</given-names></name>, <name name-style="western"><surname>Knight</surname><given-names>R</given-names></name> (<year>2008</year>) <article-title>Accurate taxonomy assignments from 16S rRNA sequences produced by highly parallel pyrosequencers</article-title>. <source>Nucleic Acids Research</source> <volume>36</volume>: <fpage>e120</fpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-DeSantis1"><label>5</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>DeSantis</surname><given-names>TZ</given-names></name>, <name name-style="western"><surname>Hugenholtz</surname><given-names>P</given-names></name>, <name name-style="western"><surname>Keller</surname><given-names>K</given-names></name>, <name name-style="western"><surname>Brodie</surname><given-names>EL</given-names></name>, <name name-style="western"><surname>Larsen</surname><given-names>N</given-names></name>, <etal>et al</etal>. (<year>2006</year>) <article-title>NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes</article-title>. <source>Nucleic Acids Research</source> <volume>34</volume>: <fpage>W394</fpage>–<lpage>9</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-DeSantis2"><label>6</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>DeSantis</surname><given-names>TZ</given-names></name>, <name name-style="western"><surname>Hugenholtz</surname><given-names>P</given-names></name>, <name name-style="western"><surname>Larsen</surname><given-names>N</given-names></name>, <name name-style="western"><surname>Rojas</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Brodie</surname><given-names>EL</given-names></name>, <etal>et al</etal>. (<year>2006</year>) <article-title>Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB</article-title>. <source>Applied and Environmental Microbiology</source> <volume>72</volume>: <fpage>5069</fpage>–<lpage>5072</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Cole1"><label>7</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Cole</surname><given-names>JR</given-names></name>, <name name-style="western"><surname>Wang</surname><given-names>Q</given-names></name>, <name name-style="western"><surname>Cardenas</surname><given-names>E</given-names></name>, <name name-style="western"><surname>Fish</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Chai</surname><given-names>B</given-names></name>, <etal>et al</etal>. (<year>2009</year>) <article-title>The Ribosomal Database Project: improved alignments and new tools for rRNA analysis</article-title>. <source>Nucleic Acids Research</source> <volume>37</volume>: <fpage>D141</fpage>–<lpage>5</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Pruesse1"><label>8</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Pruesse</surname><given-names>E</given-names></name>, <name name-style="western"><surname>Quast</surname><given-names>C</given-names></name>, <name name-style="western"><surname>Knittel</surname><given-names>K</given-names></name>, <name name-style="western"><surname>Fuchs</surname><given-names>BM</given-names></name>, <name name-style="western"><surname>Ludwig</surname><given-names>W</given-names></name>, <etal>et al</etal>. (<year>2007</year>) <article-title>SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB</article-title>. <source>Nucleic Acids Research</source> <volume>35</volume>: <fpage>7188</fpage>–<lpage>7196</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Li1"><label>9</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Li</surname><given-names>W</given-names></name>, <name name-style="western"><surname>Godzik</surname><given-names>A</given-names></name> (<year>2006</year>) <article-title>CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences</article-title>. <source>Bioinformatics</source> <volume>22</volume>: <fpage>1658</fpage>–<lpage>1659</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Huang1"><label>10</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Huang</surname><given-names>Y</given-names></name>, <name name-style="western"><surname>Niu</surname><given-names>B</given-names></name>, <name name-style="western"><surname>Gao</surname><given-names>Y</given-names></name>, <name name-style="western"><surname>Fu</surname><given-names>L</given-names></name>, <name name-style="western"><surname>Li</surname><given-names>W</given-names></name> (<year>2010</year>) <article-title>CD-HIT Suite: a web server for clustering and comparing biological sequences</article-title>. <source>Bioinformatics</source> <volume>26</volume>: <fpage>680</fpage>–<lpage>682</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Caporaso1"><label>11</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Caporaso</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Kuczynski</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Stombaugh</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Bittinger</surname><given-names>K</given-names></name>, <name name-style="western"><surname>Bushman</surname><given-names>F</given-names></name>, <etal>et al</etal>. (<year>2010</year>) <article-title>QIIME allows analysis of high-throughput community sequencing data</article-title>. <source>Nature Methods</source> <volume>7</volume>: <fpage>335</fpage>–<lpage>336</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Schloss1"><label>12</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Schloss</surname><given-names>PD</given-names></name>, <name name-style="western"><surname>Westcott</surname><given-names>SL</given-names></name>, <name name-style="western"><surname>Ryabin</surname><given-names>T</given-names></name>, <name name-style="western"><surname>Hall</surname><given-names>JR</given-names></name>, <name name-style="western"><surname>Hartmann</surname><given-names>M</given-names></name>, <etal>et al</etal>. (<year>2009</year>) <article-title>Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities</article-title>. <source>Applied and Environmental Microbiology</source> <volume>75</volume>: <fpage>7537</fpage>–<lpage>7541</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Giongo1"><label>13</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Giongo</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Crabb</surname><given-names>DB</given-names></name>, <name name-style="western"><surname>Davis-Richardson</surname><given-names>AG</given-names></name>, <name name-style="western"><surname>Chauliac</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Mobberley</surname><given-names>JM</given-names></name>, <etal>et al</etal>. (<year>2010</year>) <article-title>PANGEA: pipeline for analysis of next generation amplicons</article-title>. <source>The ISME Journal</source> <volume>4</volume>: <fpage>852</fpage>–<lpage>861</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Kunin1"><label>14</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Kunin</surname><given-names>V</given-names></name> (<year>2010</year>) <article-title>PyroTagger: A fast, accurate pipeline for analysis of rRNA amplicon pyrosequence data</article-title>. <source>The Open Journal</source></mixed-citation>
</ref>
<ref id="pone.0061217-Angiuoli1"><label>15</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Angiuoli</surname><given-names>SV</given-names></name>, <name name-style="western"><surname>Matalka</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Gussman</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Galens</surname><given-names>K</given-names></name>, <name name-style="western"><surname>Vangala</surname><given-names>M</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing</article-title>. <source>BMC Bioinformatics</source> <volume>12</volume>: <fpage>356</fpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-8th1"><label>16</label>
<mixed-citation publication-type="journal" xlink:type="simple"><collab xlink:type="simple">8th Annual Biotechnology and Bioinformatics Symposium</collab> (<year>2011</year>) <article-title>The Genboree Microbiome Toolset and the Analysis of 16S rRNA Microbial Sequences. biotconf.org</article-title>.</mixed-citation>
</ref>
<ref id="pone.0061217-QIIME1"><label>17</label>
<mixed-citation publication-type="other" xlink:type="simple">QIIME EC2 image documentation. Available: <ext-link ext-link-type="uri" xlink:href="http://qiime.org/svn_documentation/tutorials/working_with_aws.html" xlink:type="simple">http://qiime.org/svn_documentation/tutorials/working_with_aws.html</ext-link>. Accessed 2013 March 22.</mixed-citation>
</ref>
<ref id="pone.0061217-University1"><label>18</label>
<mixed-citation publication-type="other" xlink:type="simple">University of Colorado Boulder Knight Lab. n3phele bioinformatics in the cloud. Available: <ext-link ext-link-type="uri" xlink:href="http://www.n3phele.com/" xlink:type="simple">http://www.n3phele.com/</ext-link>. Accessed 2013 March 22.</mixed-citation>
</ref>
<ref id="pone.0061217-Meyer1"><label>19</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Meyer</surname><given-names>F</given-names></name>, <name name-style="western"><surname>Paarmann</surname><given-names>D</given-names></name>, <name name-style="western"><surname>D'Souza</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Olson</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Glass</surname><given-names>EM</given-names></name>, <etal>et al</etal>. (<year>2008</year>) <article-title>The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes</article-title>. <source>BMC Bioinformatics</source> <volume>9</volume>: <fpage>386</fpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Venter1"><label>20</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Venter</surname><given-names>JC</given-names></name>, <name name-style="western"><surname>Adams</surname><given-names>MD</given-names></name>, <name name-style="western"><surname>Sutton</surname><given-names>GG</given-names></name>, <name name-style="western"><surname>Kerlavage</surname><given-names>AR</given-names></name>, <name name-style="western"><surname>Smith</surname><given-names>HO</given-names></name>, <etal>et al</etal>. (<year>1998</year>) <article-title>Shotgun sequencing of the human genome</article-title>. <source>Science</source> <volume>280</volume>: <fpage>1540</fpage>–<lpage>1542</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Fleischmann1"><label>21</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Fleischmann</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Adams</surname><given-names>M</given-names></name>, <name name-style="western"><surname>White</surname><given-names>O</given-names></name>, <name name-style="western"><surname>Clayton</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Kirkness</surname><given-names>E</given-names></name>, <etal>et al</etal>. (<year>1995</year>) <article-title>Whole-genome random sequencing and assembly of Haemophilus influenzae Rd</article-title>. <source>Science</source> <volume>269</volume>: <fpage>496</fpage>–<lpage>512</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Venter2"><label>22</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Venter</surname><given-names>JC</given-names></name>, <name name-style="western"><surname>Remington</surname><given-names>K</given-names></name>, <name name-style="western"><surname>Heidelberg</surname><given-names>JF</given-names></name>, <name name-style="western"><surname>Halpern</surname><given-names>AL</given-names></name>, <name name-style="western"><surname>Rusch</surname><given-names>D</given-names></name>, <etal>et al</etal>. (<year>2004</year>) <article-title>Environmental genome shotgun sequencing of the Sargasso Sea</article-title>. <source>Science</source> <volume>304</volume>: <fpage>66</fpage>–<lpage>74</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Sharpton1"><label>23</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Sharpton</surname><given-names>TJ</given-names></name>, <name name-style="western"><surname>Riesenfeld</surname><given-names>SJ</given-names></name>, <name name-style="western"><surname>Kembel</surname><given-names>SW</given-names></name>, <name name-style="western"><surname>Ladau</surname><given-names>J</given-names></name>, <name name-style="western"><surname>O'Dwyer</surname><given-names>JP</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>PhylOTU: a high-throughput procedure quantifies microbial community diversity and resolves novel taxa from metagenomic data</article-title>. <source>PLoS Computational Biology</source> <volume>7</volume>: <fpage>e1001061</fpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-R1"><label>24</label>
<mixed-citation publication-type="book" xlink:type="simple">R Development Core Team (2011) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.</mixed-citation>
</ref>
<ref id="pone.0061217-Stroustrup1"><label>25</label>
<mixed-citation publication-type="book" xlink:type="simple">Stroustrup B (2000) The C++ programming language. ISBN 0201700735. Addison-Wesley Professional, 3rd edition.</mixed-citation>
</ref>
<ref id="pone.0061217-Chambers1"><label>26</label>
<mixed-citation publication-type="book" xlink:type="simple">Chambers J (2008) Software for data analysis: programming with R. Springer Verlag.</mixed-citation>
</ref>
<ref id="pone.0061217-Simpson1"><label>27</label>
<mixed-citation publication-type="other" xlink:type="simple">Simpson GL. CRAN Task View: Analysis of Ecological and Environmental Data. Available: <ext-link ext-link-type="uri" xlink:href="http://cran.r-project.org/web/views/Environmetrics.html" xlink:type="simple">http://cran.r-project.org/web/views/Environmetrics.html</ext-link>. Accessed 2013 March 22.</mixed-citation>
</ref>
<ref id="pone.0061217-Chakerian1"><label>28</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Chakerian</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Holmes</surname><given-names>S</given-names></name> (<year>2010</year>) <article-title>distory: Distances between trees</article-title>.</mixed-citation>
</ref>
<ref id="pone.0061217-Schliep1"><label>29</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Schliep</surname><given-names>KP</given-names></name> (<year>2011</year>) <article-title>phangorn: phylogenetic analysis in R</article-title>. <source>Bioinformatics</source> <volume>27</volume>: <fpage>592</fpage>–<lpage>593</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Kembel1"><label>30</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Kembel</surname><given-names>SW</given-names></name>, <name name-style="western"><surname>Cowan</surname><given-names>PD</given-names></name>, <name name-style="western"><surname>Helmus</surname><given-names>MR</given-names></name>, <name name-style="western"><surname>Cornwell</surname><given-names>WK</given-names></name>, <name name-style="western"><surname>Morlon</surname><given-names>H</given-names></name>, <etal>et al</etal>. (<year>2010</year>) <article-title>Picante: R tools for integrating phylogenies and ecology</article-title>. <source>Bioinformatics</source> <volume>26</volume>: <fpage>1463</fpage>–<lpage>1464</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-McMurdie1"><label>31</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>McMurdie</surname><given-names>PJ</given-names></name>, <name name-style="western"><surname>Holmes</surname><given-names>S</given-names></name> (<year>2012</year>) <article-title>phyloseq: A Bioconductor Package for Handling and Analysis of High-Throughput Phylogenetic Sequence Data</article-title>. <source>Pacific Symposium on Biocomputing</source> <volume>17</volume>: <fpage>235</fpage>–<lpage>246</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Hardle1"><label>32</label>
<mixed-citation publication-type="book" xlink:type="simple">Leisch F (2002) Sweave: Dynamic generation of statistical reports using literate data analysis. In: Härdle W, Rönz B, editors. Compstat 2002: Proceedings in Computational Statistics.</mixed-citation>
</ref>
<ref id="pone.0061217-Xie1"><label>33</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Xie</surname><given-names>Y</given-names></name> (<year>2012</year>) <article-title>knitr: A general-purpose package for dynamic report generation in R</article-title>. <source>R package version 0.8</source></mixed-citation>
</ref>
<ref id="pone.0061217-Gentleman1"><label>34</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Gentleman</surname><given-names>RC</given-names></name>, <name name-style="western"><surname>Carey</surname><given-names>VJ</given-names></name>, <name name-style="western"><surname>Bates</surname><given-names>DM</given-names></name>, <name name-style="western"><surname>Bolstad</surname><given-names>B</given-names></name>, <name name-style="western"><surname>Dettling</surname><given-names>M</given-names></name>, <etal>et al</etal>. (<year>2004</year>) <article-title>Bioconductor: open software development for computational biology and bioinformatics</article-title>. <source>Genome Biology</source> <volume>5</volume>: <fpage>R80</fpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Beck1"><label>35</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Beck</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Settles</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Foster</surname><given-names>JA</given-names></name> (<year>2011</year>) <article-title>OTUbase: an R infrastructure package for operational taxonomic unit data</article-title>. <source>Bioinformatics</source></mixed-citation>
</ref>
<ref id="pone.0061217-OTUbase1"><label>36</label>
<mixed-citation publication-type="other" xlink:type="simple">OTUbase Bioconductor Release Page. (2012) Available: <ext-link ext-link-type="uri" xlink:href="http://www.bioconductor.org/packages/release/bioc/html/OTUbase.html" xlink:type="simple">http://www.bioconductor.org/packages/release/bioc/html/OTUbase.html</ext-link>. Accessed 2013 March 22.</mixed-citation>
</ref>
<ref id="pone.0061217-McDonald1"><label>37</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>McDonald</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Clemente</surname><given-names>JC</given-names></name>, <name name-style="western"><surname>Kuczynski</surname><given-names>J</given-names></name> (<year>2012</year>) <article-title>The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome</article-title>. <source>GigaScience</source></mixed-citation>
</ref>
<ref id="pone.0061217-McMurdie2"><label>38</label>
<mixed-citation publication-type="other" xlink:type="simple">McMurdie PJ, Holmes S. Package manual for phyloseq. Available: <ext-link ext-link-type="uri" xlink:href="http://bioconductor.org/packages/devel/bioc/manuals/phyloseq/man/phyloseq.pdf" xlink:type="simple">http://bioconductor.org/packages/devel/bioc/manuals/phyloseq/man/phyloseq.pdf</ext-link>. Accessed 2013 March 22.</mixed-citation>
</ref>
<ref id="pone.0061217-The1"><label>39</label>
<mixed-citation publication-type="book" xlink:type="simple">The phyloseq Homepage. Available: joey711.github.com/phyloseq/. Accessed 2013 March 22.</mixed-citation>
</ref>
<ref id="pone.0061217-R2"><label>40</label>
<mixed-citation publication-type="journal" xlink:type="simple"><collab xlink:type="simple">R Development Core Team</collab> (<year>2012</year>) <article-title>Writing R Extensions</article-title>. <source>Comprehensive R Archive Network</source> (CRAN).</mixed-citation>
</ref>
<ref id="pone.0061217-Wickham1"><label>41</label>
<mixed-citation publication-type="other" xlink:type="simple">Wickham H, Danenberg P, Eugster M. roxygen2: In-source documentation for R. R package version 2.2.2. Available: <ext-link ext-link-type="uri" xlink:href="http://cran.r-project.org/web/packages/roxygen2/index.html" xlink:type="simple">http://cran.r-project.org/web/packages/roxygen2/index.html</ext-link>. Accessed 2013 March 22.</mixed-citation>
</ref>
<ref id="pone.0061217-Faith1"><label>42</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Faith</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Minchin</surname><given-names>P</given-names></name> (<year>1987</year>) <article-title>Compositional dissimilarity as a robust measure of ecological distance</article-title>. <source>Vegetatio</source> <volume>69</volume>: <fpage>57</fpage>–<lpage>68</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Anderson1"><label>43</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Anderson</surname><given-names>MJ</given-names></name>, <name name-style="western"><surname>Ellingsen</surname><given-names>KE</given-names></name>, <name name-style="western"><surname>McArdle</surname><given-names>BH</given-names></name> (<year>2006</year>) <article-title>Multivariate dispersion as a measure of beta diversity</article-title>. <source>Ecology Letters</source> <volume>9</volume>: <fpage>683</fpage>–<lpage>693</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Hamady2"><label>44</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Hamady</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Lozupone</surname><given-names>C</given-names></name>, <name name-style="western"><surname>Knight</surname><given-names>R</given-names></name> (<year>2009</year>) <article-title>Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data</article-title>. <source>The ISME Journal</source></mixed-citation>
</ref>
<ref id="pone.0061217-Lozupone1"><label>45</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Lozupone</surname><given-names>CA</given-names></name>, <name name-style="western"><surname>Hamady</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Kelley</surname><given-names>ST</given-names></name>, <name name-style="western"><surname>Knight</surname><given-names>R</given-names></name> (<year>2007</year>) <article-title>Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities</article-title>. <source>Applied and Environmental Microbiology</source> <volume>73</volume>: <fpage>1576</fpage>–<lpage>1585</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Lozupone2"><label>46</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Lozupone</surname><given-names>C</given-names></name>, <name name-style="western"><surname>Knight</surname><given-names>R</given-names></name> (<year>2005</year>) <article-title>UniFrac: a new phylogenetic method for comparing microbial communities</article-title>. <source>Applied and Environmental Microbiology</source> <volume>71</volume>: <fpage>8228</fpage>–<lpage>8235</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Caporaso2"><label>47</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Caporaso</surname><given-names>JG</given-names></name>, <name name-style="western"><surname>Lauber</surname><given-names>CL</given-names></name>, <name name-style="western"><surname>Walters</surname><given-names>WA</given-names></name>, <name name-style="western"><surname>Berg-Lyons</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Lozupone</surname><given-names>CA</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample</article-title>. <source>Proceedings of the National Academy of Sciences</source> <volume>108</volume>: <fpage>4516</fpage>–<lpage>4522</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Greenacre1"><label>48</label>
<mixed-citation publication-type="book" xlink:type="simple">Greenacre MJ (1984) Theory and Applications of Correspondence Analysis. London: Academic Press.</mixed-citation>
</ref>
<ref id="pone.0061217-TerBraak1"><label>49</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Ter Braak</surname><given-names>CJF</given-names></name> (<year>1986</year>) <article-title>Canonical Correspondence Analysis: A new eigenvector technique for multivariate direct gradient analysis</article-title>. <source>Ecology</source> <volume>67</volume>: <fpage>1167</fpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Hill1"><label>50</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Hill</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Gauch</surname><given-names>H</given-names></name> (<year>1980</year>) <article-title>Detrended Correspondence Analysis, an improved ordination technique</article-title>. <source>Vegetatio</source> <volume>42</volume>: <fpage>47</fpage>–<lpage>58</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Wollenberg1"><label>51</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Wollenberg</surname><given-names>AL</given-names></name> (<year>1977</year>) <article-title>Redundancy analysis an alternative for canonical correlation analysis</article-title>. <source>Psychometrika</source> <volume>42</volume>: <fpage>207</fpage>–<lpage>219</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Hotelling1"><label>52</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Hotelling</surname><given-names>H</given-names></name> (<year>1933</year>) <article-title>Analysis of a complex of statistical variables into principal components</article-title>. <source>Journal of Educational Psychology</source> <volume>24</volume>: <fpage>417</fpage>–<lpage>441</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Pavoine1"><label>53</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Pavoine</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Dufour</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Chessel</surname><given-names>D</given-names></name> (<year>2004</year>) <article-title>From dissimilarities among species to dissimilarities among communities: a double principal coordinate analysis</article-title>. <source>Journal of Theoretical Biology</source> <volume>228</volume>: <fpage>523</fpage>–<lpage>537</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Gower1"><label>54</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Gower</surname><given-names>JC</given-names></name> (<year>1966</year>) <article-title>Some distance properties of latent root and vector methods used in multivariate analysis</article-title>. <source>Biometrika</source> <volume>53</volume>: <fpage>325</fpage>–<lpage>338</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Minchin1"><label>55</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Minchin</surname><given-names>PR</given-names></name> (<year>1987</year>) <article-title>An evaluation of the relative robustness of techniques for ecological ordination</article-title>. <source>Vegetatio</source> <volume>69</volume>: <fpage>89</fpage>–<lpage>107</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Thioulouse1"><label>56</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Thioulouse</surname><given-names>J</given-names></name> (<year>2011</year>) <article-title>Simultaneous analysis of a sequence of paired ecological tables: A comparison of several methods</article-title>. <source>Annals of Applied Statistics</source> <volume>5</volume>: <fpage>2300</fpage>–<lpage>2325</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Wickham2"><label>57</label>
<mixed-citation publication-type="book" xlink:type="simple">Wickham H (2009) ggplot2: elegant graphics for data analysis. Springer New York.</mixed-citation>
</ref>
<ref id="pone.0061217-Wilkinson1"><label>58</label>
<mixed-citation publication-type="book" xlink:type="simple">Wilkinson L, Wills G (2005) The Grammar of Graphics. Statistics and Computing. Springer, 2nd edition.</mixed-citation>
</ref>
<ref id="pone.0061217-Rajaram1"><label>59</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Rajaram</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Oono</surname><given-names>Y</given-names></name> (<year>2010</year>) <article-title>NeatMap–non-clustering heat map alternatives in R</article-title>. <source>BMC Bioinformatics</source> <volume>11</volume>: <fpage>45</fpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Csardi1"><label>60</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Csardi</surname><given-names>G</given-names></name>, <name name-style="western"><surname>Nepusz</surname><given-names>T</given-names></name> (<year>2006</year>) <article-title>The igraph software package for complex network research</article-title>. <source>InterJournal Complex Systems</source> <fpage>1695</fpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Tufte1"><label>61</label>
<mixed-citation publication-type="book" xlink:type="simple">Tufte ER (2001) The visual display of quantitative information, Graphics Press, Cheshire, Connecticut, chapter 9 Aesthetics and Technique in Data Graphical Design. 2nd edition, p. 178.</mixed-citation>
</ref>
<ref id="pone.0061217-Greenacre2"><label>62</label>
<mixed-citation publication-type="book" xlink:type="simple">Greenacre M (2007) Correspondence analysis in practice. Chapman &amp; Hall.</mixed-citation>
</ref>
<ref id="pone.0061217-Pinto1"><label>63</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Pinto</surname><given-names>AJ</given-names></name>, <name name-style="western"><surname>Raskin</surname><given-names>L</given-names></name> (<year>2012</year>) <article-title>PCR Biases Distort Bacterial and Archaeal Community Structure in Pyrosequencing Datasets</article-title>. <source>PLoS ONE</source> <volume>7</volume>: <fpage>e43093</fpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Sanders1"><label>64</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Sanders</surname><given-names>HL</given-names></name> (<year>1968</year>) <article-title>Marine benthic diversity: A comparative study</article-title>. <source>The American Naturalist</source> <volume>102</volume>: <fpage>243</fpage>–<lpage>282</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Holmes1"><label>65</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Holmes</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Alekseyenko</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Timme</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Nelson</surname><given-names>T</given-names></name>, <name name-style="western"><surname>Pasricha</surname><given-names>PJ</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>Visualization and statistical comparisons of microbial communities using R packages on phylochip data</article-title>. <source>Pacific Symposium on Biocomputing</source> <fpage>142</fpage>–<lpage>153</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Allison1"><label>66</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Allison</surname><given-names>DB</given-names></name>, <name name-style="western"><surname>Cui</surname><given-names>X</given-names></name>, <name name-style="western"><surname>Page</surname><given-names>GP</given-names></name>, <name name-style="western"><surname>Sabripour</surname><given-names>M</given-names></name> (<year>2006</year>) <article-title>Microarray Data Analysis: from Disarray to Consolidation and Consensus</article-title>. <source>Nat Rev Genet</source> <volume>7</volume>: <fpage>55</fpage>–<lpage>65</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Holmes2"><label>67</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Holmes</surname><given-names>S</given-names></name>, <name name-style="western"><surname>McMurdie</surname><given-names>PJ</given-names></name> (<year>2012</year>) <article-title>Statistical analysis challenges in the microbiome</article-title>. <source>To appear PNAS: The Social Biology of Microbial Communities forum on Microbial Threats</source></mixed-citation>
</ref>
<ref id="pone.0061217-Nelson1"><label>68</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Nelson</surname><given-names>T</given-names></name>, <name name-style="western"><surname>Pasricha</surname><given-names>P</given-names></name>, <name name-style="western"><surname>Holmes</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Spormann</surname><given-names>A</given-names></name> (<year>2010</year>) <article-title>Shifts in luminal and mucosal microbial communities associated with an experimental model of irritable bowel syndrome</article-title>. <source>Gastroenterology</source></mixed-citation>
</ref>
<ref id="pone.0061217-Efron1"><label>69</label>
<mixed-citation publication-type="book" xlink:type="simple">Efron B, Tibshirani R (1993) An introduction to the bootstrap, volume 57. Chapman &amp; Hall/CRC.</mixed-citation>
</ref>
<ref id="pone.0061217-Holmes3"><label>70</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Holmes</surname><given-names>S</given-names></name> (<year>2003</year>) <article-title>Bootstrapping phylogenetic trees: theory and methods</article-title>. <source>Statistical Science</source> <fpage>241</fpage>–<lpage>255</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Westfall1"><label>71</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Westfall</surname><given-names>PH</given-names></name>, <name name-style="western"><surname>Young</surname><given-names>SS</given-names></name> (<year>1993</year>) <article-title>Resampling-Based Multiple Testing. Examples and Methods for P-Value Adjustment</article-title>. <source>Wiley-Interscience</source></mixed-citation>
</ref>
<ref id="pone.0061217-Pollard1"><label>72</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Pollard</surname><given-names>KS</given-names></name>, <name name-style="western"><surname>Gilbert</surname><given-names>HN</given-names></name>, <name name-style="western"><surname>Ge</surname><given-names>Y</given-names></name>, <name name-style="western"><surname>Taylor</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Dudoit</surname><given-names>S</given-names></name> (<year>2010</year>) <article-title>multtest: Resampling-based multiple hypothesis testing</article-title>. <source>R package version 2.4.0</source></mixed-citation>
</ref>
<ref id="pone.0061217-Ioannidis1"><label>73</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Ioannidis</surname><given-names>JPA</given-names></name> (<year>2005</year>) <article-title>Why most published research findings are false</article-title>. <source>PLoS Medicine</source> <volume>2</volume>: <fpage>e124</fpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Merali1"><label>74</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Merali</surname><given-names>Z</given-names></name> (<year>2010</year>) <article-title>Computational science: Error, why scientific programming does not compute</article-title>. <source>Nature</source> <volume>467</volume>: <fpage>775</fpage>–<lpage>777</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Peng1"><label>75</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Peng</surname><given-names>RD</given-names></name> (<year>2011</year>) <article-title>Reproducible research in computational science</article-title>. <source>Science</source> <volume>334</volume>: <fpage>1226</fpage>–<lpage>1227</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Ince1"><label>76</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Ince</surname><given-names>DC</given-names></name>, <name name-style="western"><surname>Hatton</surname><given-names>L</given-names></name>, <name name-style="western"><surname>Graham-Cumming</surname><given-names>J</given-names></name> (<year>2012</year>) <article-title>The case for open computer programs</article-title>. <source>Nature</source> <volume>482</volume>: <fpage>485</fpage>–<lpage>488</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Carey1"><label>77</label>
<mixed-citation publication-type="book" xlink:type="simple">Carey VJ, Stodden V (2010) Reproducible Research Concepts and Tools for Cancer Bioinformatics. In: Ochs MF, Casagrande JT, Davuluri RV, editors, Biomedical Informatics for Cancer Research, Boston, MA: Springer US. pp. 149–175.</mixed-citation>
</ref>
<ref id="pone.0061217-Knight1"><label>78</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Knight</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Jansson</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Field</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Fierer</surname><given-names>N</given-names></name>, <name name-style="western"><surname>Desai</surname><given-names>N</given-names></name>, <etal>et al</etal>. (<year>2012</year>) <article-title>Unlocking the potential of metagenomics through replicated experimental design</article-title>. <source>Nature Biotechnology</source> <volume>30</volume>: <fpage>513</fpage>–<lpage>520</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Human1"><label>79</label>
<mixed-citation publication-type="journal" xlink:type="simple"><collab xlink:type="simple">Human Microbiome Project Consortium</collab> (<year>2012</year>) <article-title>Structure, function and diversity of the healthy human microbiome</article-title>. <source>Nature</source> <volume>486</volume>: <fpage>207</fpage>–<lpage>214</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Donoho1"><label>80</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Donoho</surname><given-names>DL</given-names></name> (<year>2010</year>) <article-title>An invitation to reproducible computational research</article-title>. <source>Biostatistics (Oxford, England)</source> <volume>11</volume>: <fpage>385</fpage>–<lpage>388</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Peng2"><label>81</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Peng</surname><given-names>RD</given-names></name> (<year>2009</year>) <article-title>Reproducible research and Biostatistics</article-title>. <source>Biostatistics (Oxford, England)</source> <volume>10</volume>: <fpage>405</fpage>–<lpage>408</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Gentleman2"><label>82</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Gentleman</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Temple Lang</surname><given-names>D</given-names></name> (<year>2004</year>) <article-title>Statistical analyses and reproducible research</article-title>. <source>Bioconductor Project Working Papers</source> <fpage>2</fpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Prez1"><label>83</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Pérez</surname><given-names>F</given-names></name>, <name name-style="western"><surname>Granger</surname><given-names>BE</given-names></name> (<year>2007</year>) <article-title>IPython: a System for Interactive Scientific Computing</article-title>. <source>Comput Sci Eng</source> <volume>9</volume>: <fpage>21</fpage>–<lpage>29</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Allaire1"><label>84</label>
<mixed-citation publication-type="other" xlink:type="simple">Allaire J, Horner J, Marti V, Porte N The markdown package: Markdown rendering for R. R package version 0.5.4. Available: <ext-link ext-link-type="uri" xlink:href="http://CRAN.R-project.org/package=markdown" xlink:type="simple">http://CRAN.R-project.org/package=markdown</ext-link>. Accessed 2013 March 22.</mixed-citation>
</ref>
<ref id="pone.0061217-Gentleman3"><label>85</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Gentleman</surname><given-names>R</given-names></name> (<year>2005</year>) <article-title>Reproducible research: a bioinformatics case study</article-title>. <source>Statistical applications in genetics and molecular biology</source> <volume>4</volume>: <fpage>Article2</fpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-The2"><label>86</label>
<mixed-citation publication-type="other" xlink:type="simple">The phyloseq Demo Repository. Available: <ext-link ext-link-type="uri" xlink:href="https://github.com/joey711/phyloseq-demo" xlink:type="simple">https://github.com/joey711/phyloseq-demo</ext-link>. Accessed 2013 March 22.</mixed-citation>
</ref>
<ref id="pone.0061217-Barnes1"><label>87</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Barnes</surname><given-names>N</given-names></name> (<year>2010</year>) <article-title>Publish your computer code: it is good enough</article-title>. <source>Nature</source> <volume>467</volume>: <fpage>753</fpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Copeland1"><label>88</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Copeland</surname><given-names>WK</given-names></name>, <name name-style="western"><surname>Krishnan</surname><given-names>V</given-names></name>, <name name-style="western"><surname>Beck</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Settles</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Foster</surname><given-names>JA</given-names></name>, <etal>et al</etal>. (<year>2012</year>) <article-title>mcaGUI: microbial community analysis R-Graphical User Interface (GUI)</article-title>. <source>Bioinformatics (Oxford, England)</source> <volume>28</volume>: <fpage>2198</fpage>–<lpage>2199</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Wickham3"><label>89</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Wickham</surname><given-names>H</given-names></name> (<year>2007</year>) <article-title>Reshaping data with the reshape package</article-title>. <source>Journal of Statistical Software</source> <volume>21</volume>: <fpage>1</fpage>–<lpage>20</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Wickham4"><label>90</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Wickham</surname><given-names>H</given-names></name> (<year>2011</year>) <article-title>The split-apply-combine strategy for data analysis</article-title>. <source>Journal of Statistical Software</source> <volume>40</volume>: <fpage>1</fpage>–<lpage>29</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Arumugam1"><label>91</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Arumugam</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Raes</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Pelletier</surname><given-names>E</given-names></name>, <name name-style="western"><surname>Le Paslier</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Yamada</surname><given-names>T</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>Enterotypes of the human gut microbiome</article-title>. <source>Nature</source> <volume>473</volume>: <fpage>174</fpage>–<lpage>180</lpage>.</mixed-citation>
</ref>
<ref id="pone.0061217-Oksanen1"><label>92</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Oksanen</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Blanchet</surname><given-names>FG</given-names></name>, <name name-style="western"><surname>Kindt</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Legendre</surname><given-names>P</given-names></name>, <name name-style="western"><surname>O'Hara</surname><given-names>RB</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>vegan: Community Ecology Package</article-title>. <source>R package version 1.17–10</source></mixed-citation>
</ref>
</ref-list></back>
</article>