URL: http://borensteinlab.com/software_fishtaco.html
Documentation: http://borenstein-lab.github.io/fishtaco/
Forum: https://groups.google.com/forum/#!forum/fishtaco-users
Publication: Manor, O., and Borenstein, E. (2017). Systematic Characterization and Analysis of the Taxonomic Drivers of Functional Shifts in the Human Microbiome. Cell Host & Microbe 21, 254–267.
Follow these steps to quickly run a FishTaco analysis and visualize the results.
If you do not have Python and either Anaconda or Miniconda installed, follow the instructions to do so here. We recommend Miniconda (it will be faster). If you are using a Windows computer, you will want to enter commands for the rest of this tutorial in the Anaconda Prompt.
Install FishTaco, if you haven’t done so yet. The easiest way to do so is to set up a conda environment with FishTaco and all of its dependencies. To set up and activate this environment:
For Mac and Linux computers, download and save this .yml file to your computer (if it opens in your browser, right-click it to save). Then run the following commands in a command line shell to create and activate the FishTaco environment:
conda env create -f fishtaco_1-1-1.yml
conda activate fishtaco
For Windows computers, run the following commands:
conda create -n fishtaco python=2.7.16 scipy=0.18.1 numpy=1.11.3 scikit-learn=0.17.1 pandas=0.23.4 statsmodels==0.9.0 pip
conda activate fishtaco
pip install musicc==1.0.2 fishtaco==1.1.1
Then, on any operating system, test your installation by running:
test_fishtaco.py
Note: The test_fishtaco.py script may fail on Windows computers, and you may need to provide the full path to these scripts to use them in the command prompt. You can try running:
run_fishtaco.py -h
If that successfully lists the options for running FishTaco, your installation is working and you should now be able to run the analysis.
If you encounter any errors, you can also try installing FishTaco and its dependencies individually by following the instructions here.
Download and unzip this zip file containing the example data. See below for more details on this dataset.
Run the FishTaco analysis! In a command line shell, navigate into the downloaded data directory and run the following command to analyze the example data:
run_fishtaco.py -ta bv_qpcr.txt -fu bv_metagenomes.txt -l bv_metadata_fishtaco.txt -inf -log -assessment single_taxa
If you just want to see the FishTaco visualization, note that the example data download also includes the main output of the run_fishtaco.py command on this dataset, which you can upload directly to the visualization server.
FishTaco is a tool for systematic analysis of the taxonomic contributors to shifts in functional abundances. FishTaco quantifies the extent to which an observed difference in functional abundances between cases and controls can be attributed to different taxa. Taxa can contribute to a difference in functional abundances via 4 different mechanisms (also illustrated in panel C below):
Make sure you understand conceptually how each mechanism affects the differential abundance of a function.
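The toy sketch below shows how a shift in a function's abundance can be traced back to taxa. It builds a taxa-based abundance for one function as a sum over taxa: each taxon's relative abundance times the number of copies of the function in its genome. It then compares cases with controls. The data are made up and the helper names are hypothetical. The shift score here is a simple difference in means, not the Wilcoxon statistic reported in the FishTaco output used later in this tutorial. Treat it as an illustration of the idea, not as FishTaco's implementation.

```python
from statistics import mean


def functional_profile(taxa_abund, copy_number):
    """Taxa-based abundance of one function in each sample (toy model).

    taxa_abund: dict taxon -> list of relative abundances, one per sample
    copy_number: dict taxon -> copies of the function in that taxon's genome
    """
    n_samples = len(next(iter(taxa_abund.values())))
    return [
        sum(taxa_abund[t][s] * copy_number.get(t, 0) for t in taxa_abund)
        for s in range(n_samples)
    ]


def mean_shift(values, labels):
    """Mean in cases (label 1) minus mean in controls (label 0)."""
    cases = [v for v, lab in zip(values, labels) if lab == 1]
    controls = [v for v, lab in zip(values, labels) if lab == 0]
    return mean(cases) - mean(controls)


labels = [0, 0, 1, 1]
# Hypothetical data: taxon A is enriched in cases, and abundances sum to 1 per sample.
taxa = {"A": [0.2, 0.2, 0.8, 0.8], "B": [0.8, 0.8, 0.2, 0.2]}

# A (case-enriched) encodes the function: the function shifts toward cases.
print(mean_shift(functional_profile(taxa, {"A": 2, "B": 0}), labels))
# A lacks the function and B (control-enriched) encodes it: the shift reverses.
print(mean_shift(functional_profile(taxa, {"A": 0, "B": 2}), labels))
```

In both runs taxon A is enriched in cases. Yet the sign of the functional shift flips depending on which taxon encodes the function. This is why it helps to reason about each contribution mechanism separately.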
In this tutorial, you will:
FishTaco is a Python library. It can be installed as an Anaconda environment or as a standalone package along with its dependencies. To install using the conda environment, follow the commands below (which were also included in the pre-workshop instructions).
First, if you do not have Python and either Anaconda or Miniconda installed, follow the instructions to do so here. We recommend Miniconda (it will be faster). If you are using a Windows computer, you will want to enter commands for the rest of this tutorial in the Anaconda Prompt.
Install a Python conda environment containing the MUSiCC and FishTaco packages. To set up the environment, download and save this .yml file to your computer (if it opens in your browser, right-click it to save). Then run the following commands in a command line shell to create and activate the environment and test the installation:
conda env create -f fishtaco_1-1-1.yml
conda activate fishtaco
test_fishtaco.py
Note: The test_fishtaco.py script may fail on Windows computers. If this happens, instead run:
run_fishtaco.py -h
If that successfully lists the options for running FishTaco, your installation is working and you should now be able to run the analysis.
If you run into problems, you can also install FishTaco and its dependencies individually by following the instructions here.
If you have a 16S rRNA dataset with samples in two groups (e.g., cases and controls) that you have already analyzed with PICRUSt (version 1 or 2), you can analyze it with FishTaco, although it will require some additional reformatting. You will need the following files:
The formatting required for each of these files is described in the FishTaco documentation (https://borenstein-lab.github.io/fishtaco/fishtaco_file_formats.html). You can also use the provided example data as a guide.
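Mismatched sample IDs between files are an easy mistake to make when reformatting your own data. The sketch below cross-checks them before you run FishTaco. It rests on assumptions about the layout, so verify them against the file-format page:

- Both abundance tables are tab-delimited, with taxon or function IDs in the first column and sample IDs across the header row.
- The label file has one row per sample, with a 0 (control) or 1 (case) label in the second column.

The helpers header_samples, label_samples and check_inputs are hypothetical and are not part of FishTaco.

```python
import csv
import io


def header_samples(table_text):
    """Sample IDs from the header row of a tab-delimited abundance table
    (assumes the first column holds taxon or function IDs)."""
    header = next(csv.reader(io.StringIO(table_text), delimiter="\t"))
    return [field.strip() for field in header[1:]]


def label_samples(label_text):
    """Map sample ID -> 0/1 label. Rows whose second field is not '0' or '1'
    (such as a header row) are skipped."""
    labels = {}
    for row in csv.reader(io.StringIO(label_text), delimiter="\t"):
        if len(row) >= 2 and row[1].strip() in ("0", "1"):
            labels[row[0].strip()] = int(row[1])
    return labels


def check_inputs(taxa_text, function_text, label_text):
    """Return a list of problems found (an empty list means none)."""
    labels = label_samples(label_text)
    problems = []
    for name, text in (("taxonomic", taxa_text), ("functional", function_text)):
        missing = set(labels) - set(header_samples(text))
        if missing:
            problems.append(f"{name} table lacks labeled samples: {sorted(missing)}")
    if set(labels.values()) != {0, 1}:
        problems.append("labels should include both controls (0) and cases (1)")
    return problems
```

To use it, read the full contents of each of your three files (for example with open(path).read()) and pass the three strings to check_inputs.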
Otherwise, we will use an example dataset describing the vaginal microbiome. This dataset is from the following publication:
Srinivasan, S., Morgan, M.T., Fiedler, T.L., Djukovic, D., Hoffman, N.G., Raftery, D., Marrazzo, J.M., and Fredricks, D.N. (2015). Metabolic Signatures of Bacterial Vaginosis. MBio 6, e00204-15.
Download and unzip this zip file containing the example data.
The dataset describes a cohort of women with and without Bacterial Vaginosis (BV), consisting of the following data types for 39 samples:
The small number of taxa in this dataset makes it possible to run a full analysis within the workshop time frame.
FishTaco relates any taxonomic abundances to any community-level gene abundances in the context of a case-control study. You may or may not have prior information on the functions encoded by each taxon. The taxonomic and functional abundances can be generated by any method, so many different data types and processing methods may be appropriate for generating FishTaco input data, each with its own pros and cons. A few possibilities include:
For this analysis, we are going to have FishTaco infer the functional content of each taxon de novo by providing the “-inf” flag. Alternatively, you could provide a file detailing the genomic content of each measured taxon, but not all of the taxa here have reference genomes available.
run_fishtaco.py -ta bv_qpcr.txt -fu bv_metagenomes.txt -l bv_metadata_fishtaco.txt -inf -log -assessment single_taxa
This command runs the simplified, “single_taxa” version of FishTaco. The full and more accurate FishTaco multi-taxa permutation analysis takes around 25-30 minutes to complete on this dataset. If you are using your own dataset, it could take much longer, depending on the number of taxa and the number of functions. You can read more about how to speed up a FishTaco analysis here.
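If you script several runs, it can help to assemble the command programmatically. One example is switching between the single_taxa assessment and FishTaco's default. The hypothetical helper below uses only the flags shown in this tutorial and calls run_fishtaco.py through Python's subprocess module. Passing assessment=None omits the flag, so FishTaco falls back to its default assessment. On Windows you may need to replace the script name with its full path, as noted earlier.

```python
import shutil
import subprocess


def build_command(taxa, functions, labels, infer=True, log=True,
                  assessment="single_taxa", genomic_content=None):
    """Assemble a run_fishtaco.py argument list from the flags used in this
    tutorial (hypothetical helper, not part of FishTaco)."""
    cmd = ["run_fishtaco.py", "-ta", taxa, "-fu", functions, "-l", labels]
    if genomic_content is not None:
        cmd += ["-gc", genomic_content]  # optional genomic content file
    if infer:
        cmd.append("-inf")  # infer each taxon's genomic content
    if log:
        cmd.append("-log")
    if assessment is not None:
        cmd += ["-assessment", assessment]
    return cmd


if __name__ == "__main__":
    cmd = build_command("bv_qpcr.txt", "bv_metagenomes.txt", "bv_metadata_fishtaco.txt")
    if shutil.which(cmd[0]) is None:
        print("run_fishtaco.py not found on PATH; activate the fishtaco environment first")
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

With its defaults, build_command reproduces the single_taxa command above flag for flag.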
You can examine the other possible options to provide for a FishTaco analysis by using the “-h” flag:
run_fishtaco.py -h
Some of these are briefly described below.
Inferring and/or providing genomic content (-gc, -inf): A file specifying the genomic KO content for each taxon can be provided with the “-gc” flag. If the “-inf” flag is also provided, FishTaco will still infer a genomic profile for each taxon, but will use the provided content as a prior for this inference.
Analyzing different levels of functions (-map_function_level): By default, FishTaco analyzes taxonomic contributors to the differential abundance of KEGG pathways. To decompose a more specific functional category, FishTaco can also analyze KEGG modules, or custom groups of functions (see the documentation).
Single-taxa versus multi-taxa permutations (-assessment): By default, FishTaco analyzes the contribution of each taxon to each function by permuting the observed abundances of varying subsets of taxa across samples. Supplying “-assessment single_taxa” will cause FishTaco to instead permute the abundances of each single taxon independently, which shortens the runtime but produces less accurate results. See the publication for more details.
Analyzing only a specific subset of functions (-single_function_filter, -multi_function_filter_list): Specify a single function (KEGG pathway or module) or list of functions to be analyzed with FishTaco.
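The single-taxon idea can be sketched in a few steps:

1. Compute the shift of the taxa-based functional profile.
2. Permute one taxon's abundances across samples, which breaks its association with the case/control labels.
3. Take the resulting drop in the shift, averaged over permutations, as that taxon's contribution.

This is a simplified illustration with made-up data and hypothetical helper names, not FishTaco's implementation. The shift score is a Wilcoxon rank-sum of the cases, centered so that 0 means no shift, and abundances are not renormalized after permuting; both are choices made for illustration. FishTaco's actual procedure, including the default multi-taxa permutations over subsets of taxa, is described in the publication.

```python
import random


def ranks(values):
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def rank_shift(values, labels):
    """Rank sum of cases minus its expected value under no shift."""
    r = ranks(values)
    n_cases = sum(labels)
    case_rank_sum = sum(rk for rk, lab in zip(r, labels) if lab == 1)
    return case_rank_sum - n_cases * (len(values) + 1) / 2


def profile(taxa_abund, copy_number):
    """Per-sample sum of abundance x copy number over taxa (toy model)."""
    n_samples = len(next(iter(taxa_abund.values())))
    return [sum(ab[s] * copy_number.get(t, 0) for t, ab in taxa_abund.items())
            for s in range(n_samples)]


def single_taxon_contribution(taxa_abund, copy_number, labels, taxon,
                              n_perm=200, seed=0):
    """Observed shift minus the mean shift after permuting one taxon."""
    rng = random.Random(seed)
    observed = rank_shift(profile(taxa_abund, copy_number), labels)
    total = 0.0
    for _ in range(n_perm):
        shuffled = list(taxa_abund[taxon])
        rng.shuffle(shuffled)
        permuted = {**taxa_abund, taxon: shuffled}  # other taxa untouched
        total += rank_shift(profile(permuted, copy_number), labels)
    return observed - total / n_perm


labels = [0, 0, 0, 1, 1, 1]
taxa = {"A": [0.1, 0.2, 0.1, 0.6, 0.7, 0.5],
        "B": [0.3, 0.1, 0.2, 0.2, 0.1, 0.3]}
copies = {"A": 2, "B": 1}
for name in taxa:
    print(name, round(single_taxon_contribution(taxa, copies, labels, name), 2))
```

Here taxon A, which is case-enriched and encodes the function, gets a positive contribution. Permuting B leaves the case/control ordering of the profile unchanged, so its contribution is zero. Scoring each taxon separately, as single_taxa does, ignores interactions between taxa. That is the trade-off the multi-taxa default addresses at a higher runtime cost.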
FishTaco uses a separate R package called FishTacoPlot to generate plots displaying the results of the analysis. You can also generate output plots via a web server, although you will have fewer options for customization. Both options are demonstrated below; you can choose to try one or both.
The example data download also includes the main FishTaco output file from the analysis above, so you can also use that for this portion of the tutorial.
To use the web server, navigate to http://elbo-spice.gs.washington.edu/shiny/FishTacoPlot/, and select the following options on the right-hand menu:
Install RStudio (here) if you haven’t done so yet, and open a new session.
First, install the FishTacoPlot package from GitHub, using the devtools package:
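if (!requireNamespace("devtools", quietly = TRUE)) { # Install devtools if you haven't already
  install.packages("devtools")
}
devtools::install_github("borenstein-lab/fishtaco-plot")
# KEGGREST (used below to look up pathway names) is distributed through Bioconductor
if (!requireNamespace("KEGGREST", quietly = TRUE)) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
  BiocManager::install("KEGGREST")
}
# data.table provides fread(); ggplot2 provides the plot-formatting functions used below
for (pkg in c("data.table", "ggplot2")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}
library(FishTacoPlot)
library(KEGGREST)
library(data.table)
library(ggplot2)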
##Number of functions to include in plot
n_functions = 5
## Obtain list of the n most differentially abundant functions
top_functions = fread("fishtaco_out_STAT_DA_function_SCORE_wilcoxon_ASSESSMENT_single_taxa.tab")[order(abs(StatValue), decreasing = T)][1:n_functions, Function]
## Obtain the names of the top pathways from KEGG
top_functions_names = sapply(top_functions, function(x){
return(keggGet(x)[[1]]$NAME)
})
# Make the plot
p = MultiFunctionTaxaContributionPlots(input_dir=getwd(), input_prefix="fishtaco_out",
input_taxa_taxonomy="taxonomy_vaginal_fishtaco.txt", sort_by="predicted_da", plot_type="bars", add_predicted_da_markers=TRUE, add_original_da_markers=TRUE, add_case_control_line = T,
add_names_in_bars = T, input_function_filter_list = top_functions, add_facet_labels = T)
## Adjust the plot formatting
p = p + scale_x_continuous(breaks = seq_len(n_functions), labels = top_functions_names) +
guides(fill=guide_legend(nrow=7)) + ylab("Wilcoxon test statistic (W)") +
theme(plot.title=element_blank(), axis.title.x=element_text(size=12,colour="black",face="plain"),
axis.text.x=element_text(size=10,colour="black",face="plain"), axis.title.y=element_blank(),
axis.text.y=element_text(size=9,colour="black",face="plain"), axis.ticks.y=element_blank(),
axis.ticks.x=element_blank(), panel.grid.major.x = element_line(colour="light gray"), panel.grid.major.y = element_line(colour="light gray"),
panel.grid.minor.x = element_line(colour="light gray"), panel.grid.minor.y = element_line(colour="light gray"), panel.background = element_rect(fill="transparent",colour=NA), panel.border = element_rect(fill="transparent",colour="black"), legend.background=element_rect(colour="black"), legend.title=element_text(size=10), legend.text=element_text(size=8,face="plain"),
legend.key.size=unit(0.8,"line"), legend.spacing=unit(0.1,"line"), legend.position="bottom")
## Display the plot
p
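If you prefer to inspect the output in Python before plotting, the sketch below mirrors the first step of the R snippet above. It returns the n functions with the largest absolute StatValue. The column names Function and StatValue come from that R code. The assumption that the .tab file is tab-delimited is mine, since fread detects the separator automatically. The helper name top_functions is hypothetical.

```python
import csv


def top_functions(path, n=5):
    """Function IDs with the largest |StatValue|, most extreme first
    (assumes a tab-delimited file with 'Function' and 'StatValue' columns)."""
    with open(path, newline="") as fh:
        rows = list(csv.DictReader(fh, delimiter="\t"))
    rows.sort(key=lambda row: abs(float(row["StatValue"])), reverse=True)
    return [row["Function"] for row in rows[:n]]
```

For example, top_functions("fishtaco_out_STAT_DA_function_SCORE_wilcoxon_ASSESSMENT_single_taxa.tab") should list the same pathways that the R code passes to input_function_filter_list.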
As you examine the FishTaco plots, recall that the upper bar represents case-associated taxa, while the lower bar represents control-associated taxa. Some questions to consider:
Depending on the type of data used, the information on the genomic content of each taxon and the total functional abundances may be more or less complete and accurate.
The FishTaco plots display both the differential abundance of a function as observed in the metagenomic profile (red diamond) and the differential abundance of the taxa-based functional profile (white diamond). If these are very far apart, it is an indication that the genomic content inferred for each taxon may not be capturing the observed community-level differences very accurately.
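To screen many functions at once for this problem, you can compare the two statistics numerically. The sketch below takes, for each function, the observed statistic (red diamond) and the taxa-based statistic (white diamond). It flags a function when the two disagree in sign or differ by more than a chosen fraction of the larger magnitude. Which FishTaco output columns hold these two values is not covered here, so the input is a plain dictionary. The rel_tol cutoff and the helper name are hypothetical choices, not FishTaco defaults.

```python
def flag_poorly_explained(da_values, rel_tol=0.5):
    """Return function IDs whose taxa-based shift disagrees with the observed one.

    da_values: dict function ID -> (observed_stat, taxa_based_stat)
    rel_tol: hypothetical cutoff, as a fraction of the larger magnitude
    """
    flagged = []
    for func, (observed, predicted) in da_values.items():
        opposite_sign = observed * predicted < 0
        scale = max(abs(observed), abs(predicted))
        far_apart = scale > 0 and abs(observed - predicted) > rel_tol * scale
        if opposite_sign or far_apart:
            flagged.append(func)
    return flagged
```

Functions flagged this way are the ones where the taxon-level breakdown in the plot deserves the most caution.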