Running FishTaco¶
The FishTaco python module handles all calculations internally. FishTaco offers an interface to the FishTaco functionality via the command line and the run_fishtaco.py script.
Usage¶
FishTaco can be used in two alternative modes, depending on the availability of genomic information for each taxon. Specifically, if such data is available (e.g., through reference genomes), FishTaco can be used with the -gc flag. However, FishTaco can also infer this data by using the -inf flag. If you are using 16S data coupled with PICRUSt, please read Can I run FishTaco with a PICRUSt-derived metagenomic functional profile?.
Running FishTaco with genomic content data:
run_fishtaco.py -ta TAXA_ABUN_FILE -fu FUNCTION_ABUN_FILE -l LABELS_FILE -gc GENOMIC_CONTENT_FILE [options]
Running FishTaco with genomic content inference:
run_fishtaco.py -ta TAXA_ABUN_FILE -fu FUNCTION_ABUN_FILE -l LABELS_FILE -inf [options]
Required arguments¶
Optional arguments¶
-fu, --function_abundance FUNCTION_ABUN_FILE
Input file of function abundance (format)
-gc, --genomic_content_of_taxa GENOMIC_CONTENT_FILE
Input file of genomic content of each taxa (format)
-inf, --perform_inference_of_genomic_content
Defines if genome content is inferred (either de-novo or prior-based if genomic content is also given, default: FALSE)
-label_to_find_enrichment_in
Define sample set label to find enrichment in (default: 1)
-label_to_find_enrichment_against
Define sample set label to find enrichment against (default: 0)
-op, --output_prefix OUTPUT_PREF
Output prefix for result files (default: fishtaco_out)
-map_function_level {pathway, module, none, custom}
Map KOs to pathways, modules, none, or custom (default: pathway)
-assessment, --taxa_assessment_method {single_taxa, multi_taxa}
The method used when assessing taxa to compute individual contributions. The running time of single_taxa will be significantly lower than multi_taxa, but less accurate (see manuscript for details) (default: multi_taxa)
Advanced usage arguments¶
-map_function_file FUNC_LEVEL_MAP_FILE
Mapping file from KOs to pathways, modules, or custom (default: use internal KEGG database downloaded 07/15/2013)
-perform_inference_on_ko_level
Indicates to perform the inference on the KO level (default: use the mapped functional level, e.g., pathway)
-mult_hyp, --multiple_hypothesis_correction {Bonf, FDR-0.01, FDR-0.05, FDR-0.1, none}
Multiple hypothesis correction for functional enrichment (default: FDR-0.05)
-max_func, --maximum_functions_to_analyze MAX_FUNCTIONS
Maximum number of enriched functions to consider (default: All)
-score, --score_to_compute {t_test, mean_diff, median_diff, wilcoxon, log_mean_ratio}
The enrichment score to compute for each function (default: wilcoxon)
-max_score, --max_score_cutoff MAX_SCORE_CUTOFF
The maximum score cutoff (for example, when dividing by zero) (default: 100)
-na_rep NA_REP
How to represent NAs in the output (default: NA)
-number_of_permutations NUMBER_OF_PERMUTATIONS
number of permutations (default: 100)
-number_of_shapley_orderings_per_taxa NUMBER_OF_SHAPLEY_ORDERINGS_PER_TAXA
number of shapley orderings per taxa (default: 5)
-en, --enrichment_results DA_RESULT_FILE
Pre-computed functional enrichment results from the compute_differential_abundance.py script (default: None)
-single_function_filter SINGLE_FUNCTION_FILTER
Limit analysis only to this single function (default: None)
-multi_function_filter_list MULTI_FUNCTION_FILTER
Limit analysis only to these comma-separated functions (default: None)
-h, --help
show help message and exit
-functional_profile_already_corrected_with_musicc
Indicates that the functional profile has been already corrected with MUSiCC prior to running FishTaco (default: False)
-log, --log
Write to log file (default: False)
FishTaco Output Files¶
Main output files¶
fishtaco_out_main_output_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains the taxon-level decomposition of shift scores for the differentially abundant functions. (format)
Supporting stats output files¶
fishtaco_out_STAT_taxa_contributions_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains the final taxon-level contribution score for every differentially abundant(shifted) function in the input data, as calculated by FishTaco
fishtaco_out_STAT_DA_function_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains statistics regarding the differential abundance for each function in the input file
fishtaco_out_STAT_DA_taxa_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains statistics regarding the differential abundance for each taxa in the input file
fishtaco_out_STAT_mean_stat_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains the mean taxon-level contribution score for every differentially abundant(shifted) function in the input data (in default settings, this is equal to the final score)
fishtaco_out_STAT_median_stat_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains the median taxon-level contribution score for every differentially abundant(shifted) function in the input data
fishtaco_out_STAT_std_stat_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains the standard deviation of taxon-level contribution score for every differentially abundant(shifted) function in the input data
fishtaco_out_STAT_original_value_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains the metagenome-based shift statistics value for each function in the input file
fishtaco_out_STAT_predicted_DA_value_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains the taxa-based shift statistics value for each function in the input file
fishtaco_out_STAT_predicted_function_abundance_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains the taxa-based abundance profile for each function in each sample
fishtaco_out_STAT_predicted_function_agreement_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains various statistics regarding the agreement between the metagenome- and taxa-based abundance profiles for each function
fishtaco_out_STAT_residual_function_abundance_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains the residual between the metagenome- and taxa-based abundance profiles for each function (in “remove-residual” mode the residual is equal to zero)
fishtaco_out_STAT_shapley_orderings_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains the random Shapley orderings used in the run (for “permuted_shapley_orderings” mode)
fishtaco_out_STAT_taxa_learned_copy_num_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains the inferred copy numbers of each function in each taxon (for FishTaco with prior-based or de novo inference)
fishtaco_out_STAT_taxa_learning_rsqr_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains various statistics regarding the agreement between the metagenome- and taxa-based abundance profiles for each function (on test data)
fishtaco_out_STAT_run_log_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab
contains the running log of FishTaco
Examples¶
The fishtaco/examples directory contains the following:
the file METAPHLAN_taxa_vs_SAMPLE_for_K00001.tab contains scaled abundance measurements of 10 species in 213 samples from the HMP dataset
the file WGS_KO_vs_SAMPLE_MUSiCC_only_K00001.tab contains MUSiCC-corrected abundance values for the K00001 orthology group in the same samples
the file METAPHLAN_taxa_vs_KO_only_K00001.tab contains the copy numbers of the K00001 orthology group in the 10 species as above
the file SAMPLE_vs_CLASS.tab contains class labels from the same samples (control vs. case)
Using these files as input for FishTaco results in the following output files (found in the fishtaco/examples/output directory):
Note: If you installed the FishTaco package using pip, the examples directory is located in your python packages directory, e.g., lib/python3.3/site-packages
FishTaco with no inference¶
Running FishTaco with no inference generates the output files found in fishtaco/examples/output/fishtaco_out_no_inf_STAT_*
run_fishtaco.py -ta fishtaco/examples/METAPHLAN_taxa_vs_SAMPLE_for_K00001.tab -fu fishtaco/examples/WGS_KO_vs_SAMPLE_MUSiCC_only_K00001.tab
-l fishtaco/examples/SAMPLE_vs_CLASS.tab -gc fishtaco/examples/METAPHLAN_taxa_vs_KO_only_K00001.tab -op fishtaco_out_no_inf
-map_function_level none -functional_profile_already_corrected_with_musicc -assessment single_taxa -log
FishTaco with prior-based inference¶
Running FishTaco with prior-based inference generates the output files found in fishtaco/examples/output/fishtaco_out_prior_based_inf_STAT_*
run_fishtaco.py -ta fishtaco/examples/METAPHLAN_taxa_vs_SAMPLE_for_K00001.tab -fu fishtaco/examples/WGS_KO_vs_SAMPLE_MUSiCC_only_K00001.tab
-l fishtaco/examples/SAMPLE_vs_CLASS.tab -gc fishtaco/examples/METAPHLAN_taxa_vs_KO_only_K00001.tab -op fishtaco_out_prior_based_inf
-map_function_level none -functional_profile_already_corrected_with_musicc -inf -assessment single_taxa -log
FishTaco with de novo inference¶
Running FishTaco with de novo inference generates the output files found in fishtaco/examples/output/fishtaco_out_de_novo_inf_STAT_*
run_fishtaco.py -ta fishtaco/examples/METAPHLAN_taxa_vs_SAMPLE_for_K00001.tab -fu fishtaco/examples/WGS_KO_vs_SAMPLE_MUSiCC_only_K00001.tab
-l fishtaco/examples/SAMPLE_vs_CLASS.tab -op fishtaco_out_de_novo_inf -map_function_level none -functional_profile_already_corrected_with_musicc
-inf -assessment single_taxa -log