Running FishTaco

The FishTaco Python module handles all calculations internally. Its functionality is exposed on the command line through the run_fishtaco.py script.

Usage

FishTaco can be used in two alternative modes, depending on whether genomic content information is available for each taxon. If such data is available (e.g., from reference genomes), provide it with the -gc flag. Otherwise, FishTaco can infer it when run with the -inf flag. If you are using 16S data coupled with PICRUSt, please read "Can I run FishTaco with a PICRUSt-derived metagenomic functional profile?"
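For intuition about what inferring genomic content involves, here is a minimal sketch that estimates the per-taxon copy numbers of a single function by non-negative least squares: it models the function's abundance in each sample as a non-negative weighted sum of taxon abundances and solves for the weights by projected gradient descent. This linear model and solver are illustrative assumptions, not FishTaco's actual inference procedure, and infer_copy_numbers is a hypothetical name.

```python
from typing import List


def infer_copy_numbers(taxa_abundance: List[List[float]],
                       function_abundance: List[float],
                       iterations: int = 5000) -> List[float]:
    """Estimate non-negative copy numbers c (one per taxon) that minimize
    0.5 * sum_s (sum_t a[s][t] * c[t] - f[s])^2, via projected gradient descent.

    taxa_abundance: rows are samples, columns are taxa (a[s][t]).
    function_abundance: abundance of one function in each sample (f[s]).
    Hypothetical helper for illustration; not part of FishTaco.
    """
    n_samples = len(taxa_abundance)
    n_taxa = len(taxa_abundance[0])
    # 1 / trace(A^T A) never exceeds 1 / (largest eigenvalue), so this step is safe.
    trace = sum(a * a for row in taxa_abundance for a in row)
    step = 1.0 / trace if trace > 0 else 0.0
    copy_numbers = [0.0] * n_taxa
    for _ in range(iterations):
        # Residual per sample: predicted minus observed function abundance.
        residual = [
            sum(taxa_abundance[s][t] * copy_numbers[t] for t in range(n_taxa))
            - function_abundance[s]
            for s in range(n_samples)
        ]
        # Gradient step on every copy number, then clip at zero (the projection).
        for t in range(n_taxa):
            grad = sum(taxa_abundance[s][t] * residual[s] for s in range(n_samples))
            copy_numbers[t] = max(0.0, copy_numbers[t] - step * grad)
    return copy_numbers
```

For example, infer_copy_numbers([[1, 0], [0, 1], [1, 1]], [2, 3, 5]) (three samples, two taxa) returns values very close to [2.0, 3.0], the copy numbers that reproduce the function profile exactly.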

Running FishTaco with genomic content data:

run_fishtaco.py -ta TAXA_ABUN_FILE -fu FUNCTION_ABUN_FILE -l LABELS_FILE -gc GENOMIC_CONTENT_FILE [options]

Running FishTaco with genomic content inference:

run_fishtaco.py -ta TAXA_ABUN_FILE -fu FUNCTION_ABUN_FILE -l LABELS_FILE -inf [options]
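If you drive FishTaco from a Python script, the invocations above can be assembled as an argument list and passed to subprocess. The sketch below uses only the flags documented on this page; passing both genomic content and inference corresponds to prior-based inference, as described for -inf in the option list. The helper name and the file names (taxa.tab, functions.tab, labels.tab) are hypothetical.

```python
import shlex
from typing import List, Optional


def build_fishtaco_command(taxa_abun: str, function_abun: str, labels: str,
                           genomic_content: Optional[str] = None,
                           infer: bool = False,
                           extra_options: Optional[List[str]] = None) -> List[str]:
    """Assemble an argument list for run_fishtaco.py (hypothetical helper).

    genomic_content only -> use the given genomic content (-gc)
    infer only           -> de novo inference (-inf)
    both                 -> prior-based inference (-gc and -inf)
    """
    if genomic_content is None and not infer:
        raise ValueError("provide genomic_content (-gc), set infer=True (-inf), or both")
    cmd = ["run_fishtaco.py", "-ta", taxa_abun, "-fu", function_abun, "-l", labels]
    if genomic_content is not None:
        cmd += ["-gc", genomic_content]
    if infer:
        cmd.append("-inf")
    cmd += extra_options or []
    return cmd


if __name__ == "__main__":
    cmd = build_fishtaco_command("taxa.tab", "functions.tab", "labels.tab",
                                 infer=True, extra_options=["-op", "my_run"])
    # prints: run_fishtaco.py -ta taxa.tab -fu functions.tab -l labels.tab -inf -op my_run
    print(shlex.join(cmd))
    # To actually run FishTaco: subprocess.run(cmd, check=True)
```

Keeping the command as a list (rather than one string) avoids shell-quoting problems with file paths that contain spaces.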

Required arguments

-ta, --taxa_abundance TAXA_ABUN_FILE: Input file of taxonomic abundance profiles (format)
-l, --labels LABELS_FILE: Input file of label assignment for the two sample sets being compared (format)

Optional arguments

-fu, --function_abundance FUNCTION_ABUN_FILE: Input file of function abundance (format)
-gc, --genomic_content_of_taxa GENOMIC_CONTENT_FILE: Input file of genomic content of each taxon (format)
-inf, --perform_inference_of_genomic_content: Infer the genomic content, either de novo or prior-based if genomic content is also given with -gc (default: False)
-label_to_find_enrichment_in: Label of the sample set in which to find enrichment (default: 1)
-label_to_find_enrichment_against: Label of the sample set against which to find enrichment (default: 0)
-op, --output_prefix OUTPUT_PREF: Output prefix for result files (default: fishtaco_out)
-map_function_level {pathway, module, none, custom}: Functional level to which KOs are mapped: pathways, modules, a custom mapping, or none (default: pathway)
-assessment, --taxa_assessment_method {single_taxa, multi_taxa}: Method used to assess taxa when computing their individual contributions. single_taxa runs significantly faster than multi_taxa but is less accurate (see the manuscript for details) (default: multi_taxa)
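The accuracy trade-off between the two assessment methods can be illustrated generically. Given any function value(S) that scores a subset S of taxa (here a toy function stands in for FishTaco's shift score), a leave-one-out assessment credits each taxon with the drop in value when that taxon alone is removed, whereas a Shapley-based assessment averages each taxon's marginal contribution over random orderings of the taxa. This is a sketch of the general permutation-sampling Shapley estimator, not FishTaco's implementation; using leave-one-out as the single-taxon stand-in, interpreting -number_of_shapley_orderings_per_taxa as "total orderings = orderings per taxon × number of taxa", and all names are assumptions.

```python
import random
from typing import Callable, Dict, FrozenSet, List

ValueFn = Callable[[FrozenSet[str]], float]


def leave_one_out(taxa: List[str], value: ValueFn) -> Dict[str, float]:
    """Credit each taxon with the drop in value when it alone is removed."""
    full = frozenset(taxa)
    return {t: value(full) - value(full - {t}) for t in taxa}


def shapley_by_orderings(taxa: List[str], value: ValueFn,
                         orderings_per_taxon: int = 5, seed: int = 0) -> Dict[str, float]:
    """Estimate Shapley values by averaging marginal contributions over random orderings."""
    rng = random.Random(seed)
    n_orderings = orderings_per_taxon * len(taxa)  # assumed meaning of "per taxon"
    totals = {t: 0.0 for t in taxa}
    for _ in range(n_orderings):
        order = taxa[:]
        rng.shuffle(order)
        included: FrozenSet[str] = frozenset()
        previous = value(included)
        # Add taxa one at a time; each is credited with the change it causes.
        for t in order:
            included = included | {t}
            current = value(included)
            totals[t] += current - previous
            previous = current
    return {t: total / n_orderings for t, total in totals.items()}


def toy_value(present: FrozenSet[str]) -> float:
    """Toy score: A and B are redundant (either alone yields 2.0); C adds 1.0."""
    return (2.0 if present & {"A", "B"} else 0.0) + (1.0 if "C" in present else 0.0)
```

With toy_value, leave_one_out credits A and B with 0.0 each (removing either one changes nothing because the other compensates) and C with 1.0, so the credits do not add up to the total score of 3.0. The Shapley estimate gives C exactly 1.0 and splits 2.0 between A and B, and its contributions always sum to 3.0, at the cost of many more evaluations of the score.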

Advanced usage arguments

-map_function_file FUNC_LEVEL_MAP_FILE: Mapping file from KOs to pathways, modules, or custom (default: use internal KEGG database downloaded 07/15/2013)
-perform_inference_on_ko_level: Indicates to perform the inference on the KO level (default: use the mapped functional level, e.g., pathway)
-mult_hyp, --multiple_hypothesis_correction {Bonf, FDR-0.01, FDR-0.05, FDR-0.1, none}: Multiple hypothesis correction for functional enrichment (default: FDR-0.05)
-max_func, --maximum_functions_to_analyze MAX_FUNCTIONS: Maximum number of enriched functions to consider (default: All)
-score, --score_to_compute {t_test, mean_diff, median_diff, wilcoxon, log_mean_ratio}: The enrichment score to compute for each function (default: wilcoxon)
-max_score, --max_score_cutoff MAX_SCORE_CUTOFF: Maximum score cutoff, applied for example when computing a score involves division by zero (default: 100)
-na_rep NA_REP: How to represent NAs in the output (default: NA)
-number_of_permutations NUMBER_OF_PERMUTATIONS: Number of permutations (default: 100)
-number_of_shapley_orderings_per_taxa NUMBER_OF_SHAPLEY_ORDERINGS_PER_TAXA: Number of Shapley orderings per taxon (default: 5)
-en, --enrichment_results DA_RESULT_FILE: Pre-computed functional enrichment results from the compute_differential_abundance.py script (default: None)
-single_function_filter SINGLE_FUNCTION_FILTER: Limit analysis only to this single function (default: None)
-multi_function_filter_list MULTI_FUNCTION_FILTER: Limit analysis only to these comma-separated functions (default: None)
-h, --help: Show the help message and exit
-functional_profile_already_corrected_with_musicc: Indicates that the functional profile has already been corrected with MUSiCC prior to running FishTaco (default: False)
-log, --log: Write to log file (default: False)
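Several of the options above control how functional shifts are scored and filtered. As a rough illustration, the sketch below computes three of the simpler -score choices between the case samples (label 1) and control samples (label 0) for one function, caps the result at the -max_score cutoff so that a division by zero yields the cap rather than infinity, and applies the standard Benjamini-Hochberg FDR procedure, comparable in spirit to -mult_hyp FDR-0.05. The exact formulas FishTaco uses (log base, sign of the cap, pseudo-counts) are assumptions here, as are the helper names.

```python
import math
import statistics
from typing import List


def shift_score(case: List[float], control: List[float], method: str,
                max_score: float = 100.0) -> float:
    """Score the shift of one function between two sample sets (illustrative only)."""
    if method == "mean_diff":
        score = statistics.mean(case) - statistics.mean(control)
    elif method == "median_diff":
        score = statistics.median(case) - statistics.median(control)
    elif method == "log_mean_ratio":
        case_mean, control_mean = statistics.mean(case), statistics.mean(control)
        if case_mean == 0 and control_mean == 0:
            score = 0.0
        elif control_mean == 0:
            score = max_score  # division by zero: fall back to the cap
        elif case_mean == 0:
            score = -max_score  # log of zero: fall back to the negative cap
        else:
            score = math.log(case_mean / control_mean)
    else:
        raise ValueError(f"method not covered by this sketch: {method}")
    # Clip extreme values to [-max_score, max_score].
    return max(-max_score, min(max_score, score))


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """For each p-value, report whether it passes the BH procedure at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k / m * alpha.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff_rank = rank
    # Every p-value ranked at or below that cutoff passes.
    passed = [False] * m
    for rank, i in enumerate(order, start=1):
        passed[i] = rank <= cutoff_rank
    return passed
```

For example, benjamini_hochberg([0.01, 0.04, 0.03, 0.5]) returns [True, False, False, False]: only the smallest p-value falls below its rank-adjusted threshold of 0.0125.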

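FishTaco Output Files

The file names listed below correspond to the default output prefix (fishtaco_out) and enrichment score (wilcoxon); the prefix changes with -op, as in the examples further down.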

Main output files

fishtaco_out_main_output_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains the taxon-level decomposition of shift scores for the differentially abundant functions. (format)
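To locate and load result files from a script, the sketch below rebuilds a file name from the pattern visible in the names listed on this page (prefix, file part, SCORE_ value, ASSESSMENT_ value) and reads a tab-delimited table with a header row. That this pattern holds for other prefixes, scores, and assessment modes, and that each file has a single header row, are assumptions; the helper names are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List


def fishtaco_output_path(output_dir: str, prefix: str = "fishtaco_out",
                         part: str = "main_output", score: str = "wilcoxon",
                         assessment: str = "permuted_shapley_orderings") -> Path:
    """Rebuild an output file name following the pattern of the names on this page.

    Use part="STAT_taxa_contributions" (etc.) for the supporting stats files.
    """
    return Path(output_dir) / f"{prefix}_{part}_SCORE_{score}_ASSESSMENT_{assessment}.tab"


def read_tab_file(path: Path) -> List[Dict[str, str]]:
    """Read a tab-delimited file with one header row into a list of row dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

All values come back as strings; convert the numeric columns you need (for example with float) after reading.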

Supporting stats output files

fishtaco_out_STAT_taxa_contributions_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains the final taxon-level contribution score for every differentially abundant (shifted) function in the input data, as calculated by FishTaco
fishtaco_out_STAT_DA_function_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains statistics regarding the differential abundance of each function in the input file
fishtaco_out_STAT_DA_taxa_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains statistics regarding the differential abundance of each taxon in the input file
fishtaco_out_STAT_mean_stat_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains the mean taxon-level contribution score for every differentially abundant (shifted) function in the input data (with default settings, this is equal to the final score)
fishtaco_out_STAT_median_stat_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains the median taxon-level contribution score for every differentially abundant (shifted) function in the input data
fishtaco_out_STAT_std_stat_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains the standard deviation of the taxon-level contribution score for every differentially abundant (shifted) function in the input data
fishtaco_out_STAT_original_value_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains the metagenome-based shift statistic for each function in the input file
fishtaco_out_STAT_predicted_DA_value_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains the taxa-based shift statistic for each function in the input file
fishtaco_out_STAT_predicted_function_abundance_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains the taxa-based abundance profile for each function in each sample
fishtaco_out_STAT_predicted_function_agreement_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains various statistics regarding the agreement between the metagenome- and taxa-based abundance profiles for each function
fishtaco_out_STAT_residual_function_abundance_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains the residual between the metagenome- and taxa-based abundance profiles for each function (in “remove-residual” mode the residual is equal to zero)
fishtaco_out_STAT_shapley_orderings_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains the random Shapley orderings used in the run (for “permuted_shapley_orderings” mode)
fishtaco_out_STAT_taxa_learned_copy_num_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains the inferred copy numbers of each function in each taxon (for FishTaco with prior-based or de novo inference)
fishtaco_out_STAT_taxa_learning_rsqr_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains various statistics regarding the agreement between the metagenome- and taxa-based abundance profiles for each function (on test data)
fishtaco_out_STAT_run_log_SCORE_wilcoxon_ASSESSMENT_permuted_shapley_orderings.tab: contains the running log of FishTaco
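Several of these files relate the metagenome-based function profile to a taxa-based one. Under the simplifying assumption that a function's taxa-based abundance in a sample is the sum, over taxa, of taxon abundance times the taxon's copy number of that function (the kind of information the genomic content file supplies), the predicted abundance and the residual can be sketched as follows. FishTaco's actual computation may include additional normalization, and the residual sign (observed minus predicted) and helper names are assumptions.

```python
from typing import Dict

Profile = Dict[str, Dict[str, float]]  # outer key: row id, inner key: column id


def predict_function_abundance(taxa_abundance: Profile, copy_numbers: Profile,
                               function: str) -> Dict[str, float]:
    """Taxa-based abundance of `function` in each sample (illustrative sketch).

    taxa_abundance[taxon][sample]: abundance of a taxon in a sample.
    copy_numbers[taxon][function]: copy number of a function in a taxon.
    """
    samples = {s for per_sample in taxa_abundance.values() for s in per_sample}
    return {
        s: sum(taxa_abundance[t].get(s, 0.0) * copy_numbers.get(t, {}).get(function, 0.0)
               for t in taxa_abundance)
        for s in sorted(samples)
    }


def residual(observed: Dict[str, float], predicted: Dict[str, float]) -> Dict[str, float]:
    """Metagenome-based minus taxa-based abundance, per sample."""
    return {s: observed[s] - predicted.get(s, 0.0) for s in observed}
```

A residual close to zero in every sample means the taxa and their copy numbers account for the measured function profile; large residuals point to functional variation that the taxonomic profile does not explain.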

Examples

The fishtaco/examples directory contains the following:

the file METAPHLAN_taxa_vs_SAMPLE_for_K00001.tab contains scaled abundance measurements of 10 species in 213 samples from the HMP dataset
the file WGS_KO_vs_SAMPLE_MUSiCC_only_K00001.tab contains MUSiCC-corrected abundance values for the K00001 orthology group in the same samples
the file METAPHLAN_taxa_vs_KO_only_K00001.tab contains the copy numbers of the K00001 orthology group in the 10 species as above
the file SAMPLE_vs_CLASS.tab contains class labels for the same samples (control vs. case)
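A quick way to sanity-check these files is to count their rows and columns. The sketch below assumes each file is a tab-delimited table with one header row and one leading ID column, as the file names (e.g. taxa_vs_SAMPLE) suggest; under that assumption the taxa file should report 10 data rows and 213 sample columns. table_shape is a hypothetical helper.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def table_shape(path: Path) -> Tuple[int, int]:
    """Return (data rows, data columns) of a tab-delimited table, assuming
    one header row and one leading ID column."""
    with open(path, newline="") as handle:
        rows: List[List[str]] = [row for row in csv.reader(handle, delimiter="\t") if row]
    if not rows:
        return 0, 0
    header, body = rows[0], rows[1:]
    return len(body), len(header) - 1
```

For example, table_shape(Path("fishtaco/examples/METAPHLAN_taxa_vs_SAMPLE_for_K00001.tab")) inspects the taxa abundance file when run from the directory that contains fishtaco/examples.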

Running FishTaco on these files in each of the modes below produces the output files found in the fishtaco/examples/output directory.

Note: If you installed the FishTaco package using pip, the examples directory is located in your Python packages directory, e.g., lib/python3.3/site-packages
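To find that directory programmatically, a small sketch based on the standard library's importlib can look for an examples folder inside the installed package. It assumes the package is importable as fishtaco and that the examples ship inside the package directory, consistent with the fishtaco/examples paths used on this page; find_package_examples is a hypothetical helper.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_examples(package: str = "fishtaco") -> Optional[Path]:
    """Return <package dir>/examples if the package is installed and has one, else None."""
    spec = importlib.util.find_spec(package)
    # Plain modules (or missing packages) have no search locations to inspect.
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / "examples"
        if candidate.is_dir():
            return candidate
    return None
```

If the function returns None even though FishTaco is installed, the examples were placed elsewhere; search your site-packages directory manually.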

FishTaco with no inference

Running FishTaco with no inference generates the output files found in fishtaco/examples/output/fishtaco_out_no_inf_STAT_*

run_fishtaco.py -ta fishtaco/examples/METAPHLAN_taxa_vs_SAMPLE_for_K00001.tab -fu fishtaco/examples/WGS_KO_vs_SAMPLE_MUSiCC_only_K00001.tab \
-l fishtaco/examples/SAMPLE_vs_CLASS.tab -gc fishtaco/examples/METAPHLAN_taxa_vs_KO_only_K00001.tab -op fishtaco_out_no_inf \
-map_function_level none -functional_profile_already_corrected_with_musicc -assessment single_taxa -log

FishTaco with prior-based inference

Running FishTaco with prior-based inference generates the output files found in fishtaco/examples/output/fishtaco_out_prior_based_inf_STAT_*

run_fishtaco.py -ta fishtaco/examples/METAPHLAN_taxa_vs_SAMPLE_for_K00001.tab -fu fishtaco/examples/WGS_KO_vs_SAMPLE_MUSiCC_only_K00001.tab \
-l fishtaco/examples/SAMPLE_vs_CLASS.tab -gc fishtaco/examples/METAPHLAN_taxa_vs_KO_only_K00001.tab -op fishtaco_out_prior_based_inf \
-map_function_level none -functional_profile_already_corrected_with_musicc -inf -assessment single_taxa -log

FishTaco with de novo inference

Running FishTaco with de novo inference generates the output files found in fishtaco/examples/output/fishtaco_out_de_novo_inf_STAT_*

run_fishtaco.py -ta fishtaco/examples/METAPHLAN_taxa_vs_SAMPLE_for_K00001.tab -fu fishtaco/examples/WGS_KO_vs_SAMPLE_MUSiCC_only_K00001.tab \
-l fishtaco/examples/SAMPLE_vs_CLASS.tab -op fishtaco_out_de_novo_inf -map_function_level none -functional_profile_already_corrected_with_musicc \
-inf -assessment single_taxa -log
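The three example runs differ only in their genomic content flags and output prefix: no inference uses -gc, prior-based inference uses -gc together with -inf, and de novo inference uses -inf alone. A minimal Python sketch that assembles and runs all three, assuming run_fishtaco.py is on your PATH, the examples directory sits at fishtaco/examples relative to the working directory, and output files are written to the working directory under the given prefix (the helper names are hypothetical):

```python
import glob
import subprocess
from pathlib import Path
from typing import Dict, List

EXAMPLES = Path("fishtaco/examples")  # adjust if FishTaco was installed with pip

COMMON = [
    "-ta", str(EXAMPLES / "METAPHLAN_taxa_vs_SAMPLE_for_K00001.tab"),
    "-fu", str(EXAMPLES / "WGS_KO_vs_SAMPLE_MUSiCC_only_K00001.tab"),
    "-l", str(EXAMPLES / "SAMPLE_vs_CLASS.tab"),
    "-map_function_level", "none",
    "-functional_profile_already_corrected_with_musicc",
    "-assessment", "single_taxa", "-log",
]
GENOMIC_CONTENT = ["-gc", str(EXAMPLES / "METAPHLAN_taxa_vs_KO_only_K00001.tab")]

# Output prefix -> mode-specific flags, mirroring the three commands above.
RUNS: Dict[str, List[str]] = {
    "fishtaco_out_no_inf": GENOMIC_CONTENT,
    "fishtaco_out_prior_based_inf": GENOMIC_CONTENT + ["-inf"],
    "fishtaco_out_de_novo_inf": ["-inf"],
}


def example_commands() -> Dict[str, List[str]]:
    """Build the argument list for each example run, keyed by output prefix."""
    return {prefix: ["run_fishtaco.py", *COMMON, *flags, "-op", prefix]
            for prefix, flags in RUNS.items()}


def run_examples() -> Dict[str, List[str]]:
    """Run each example and return the STAT_* files found for its prefix."""
    produced: Dict[str, List[str]] = {}
    for prefix, cmd in example_commands().items():
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        produced[prefix] = sorted(glob.glob(f"{prefix}_STAT_*"))
    return produced
```

Comparing the STAT files produced by run_examples() with those in fishtaco/examples/output is a simple way to confirm that an installation behaves as expected.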