taxa_coverage.py – Generates taxa coverage graphs and reports for given primers or primer pairs

Description:

This module generates taxa coverage graphs and text file summaries for a single primer, multiple primers, or primer pairs. It uses the primer hits data produced by the analyze_primers.py module to determine whether a primer will amplify a given sequence.

A single _hits.txt file, multiple hits files, or a directory of _hits.txt files can be passed as input (see analyze_primers.py for a description of the hits file format).
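The three input styles can be resolved into a list of files roughly as follows. This is a minimal sketch, not the module's own code: the helper name resolve_hits_paths is hypothetical, and it assumes the -i value is either colon-separated file paths or, when -r is given, a directory whose *_hits.txt files are all tested.

```python
import os
from glob import glob


def resolve_hits_paths(hits_fps, all_files=False):
    """Return the hits files to test (hypothetical helper).

    hits_fps mirrors the -i value; all_files mirrors the -r flag.
    """
    if all_files:
        # -r: treat -i as a directory and collect every *_hits.txt file in it
        return sorted(glob(os.path.join(hits_fps, "*_hits.txt")))
    # Without -r: -i holds one or more file paths separated by colons
    return [fp for fp in hits_fps.split(":") if fp]
```

For example, resolve_hits_paths("primer1f_hits.txt:primer2r_hits.txt") yields the two file paths, while resolve_hits_paths("hits_dir/", all_files=True) lists the matching files in hits_dir.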

A taxonomy mapping file is required. Sequence IDs in the hits files that have no corresponding entry in the taxonomy mapping file are binned in an “Unknown” category at the domain level of the graph and text summary output. Each line of a taxonomy mapping file has the format sequence ID<tab>taxonomy, with taxonomy levels separated by semicolons and starting with domain.

Example (taxonomy comes from the Silva 97 reference set):

EU884952<tab>Bacteria;Bacteroidetes-Chlorobi;Bacteroidetes;Rikenella
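A mapping line like the one above could be parsed and truncated to the requested depth (-d) along these lines. This is an illustrative sketch; the function names are hypothetical, and the "Unknown" fallback follows the binning rule described above rather than the module's actual implementation.

```python
def parse_taxonomy_map(lines):
    """Map each sequence ID to its list of taxa, domain first (hypothetical helper)."""
    taxa = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        seq_id, taxonomy = line.split("\t", 1)
        # Taxonomy levels are separated by semicolons
        taxa[seq_id] = [t.strip() for t in taxonomy.split(";") if t.strip()]
    return taxa


def taxon_at_depth(taxa, seq_id, depth):
    """Return the lineage truncated to `depth` levels, or "Unknown" if unmapped."""
    levels = taxa.get(seq_id)
    if levels is None:
        # IDs without taxonomy data are binned as "Unknown"
        return "Unknown"
    return ";".join(levels[:depth])
```

With the default depth of 3, the example line above resolves to Bacteria;Bacteroidetes-Chlorobi;Bacteroidetes.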

Forward and reverse primer pairs can be tested with the -p option. In this case, amplification is decided by the poorer score of the two primers in each pair. Warning: this module does not check for logical combinations of primers, so when -p is enabled every forward primer is paired with every reverse primer, including pairs that are not positioned to generate amplicons (e.g., 910f and 495r).
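The pairing and "poorer score" rules can be sketched as below. Two assumptions are made here and are not taken from the module: primer direction is read from an "f"/"r" suffix on the primer name (as in 910f and 495r), and scores are per-sequence dicts already read from the hits files. Because success means a score at or below the threshold, the poorer score of a pair is the larger one.

```python
from itertools import product


def all_primer_pairs(primer_names):
    """Pair every forward primer with every reverse primer, with no positional check.

    Assumes direction is encoded as an "f" or "r" suffix (hypothetical convention).
    """
    forwards = [p for p in primer_names if p.endswith("f")]
    reverses = [p for p in primer_names if p.endswith("r")]
    return list(product(forwards, reverses))


def pair_scores(forward_scores, reverse_scores):
    """Combine per-sequence scores for a pair; lower is better, so keep the max."""
    if forward_scores.keys() != reverse_scores.keys():
        # Hits files for a pair must cover the same sequences
        raise ValueError("primer pair hits files do not have matching sequences")
    return {seq_id: max(forward_scores[seq_id], reverse_scores[seq_id])
            for seq_id in forward_scores}
```

Note that all_primer_pairs(["910f", "495r"]) happily returns ("910f", "495r"), mirroring the warning above.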

Usage: taxa_coverage.py [options]

Input Arguments:


[REQUIRED]

-i, --hits_fps
Primer hits file(s) to test. Separate multiple files with a colon, or pass a directory together with -r.
-T, --taxa_fp
Taxonomy mapping file.

[OPTIONAL]

-d, --taxa_depth
Depth of taxa to generate graphs and summaries for, starting with domain. [default: 3]
-r, --all_files
Test all _hits.txt files in the directory specified with -i. [default: False]
-p, --primer_pairs
Test primer pairs. All input hits files for forward and reverse primers will be tested as pairs. The hits files must have matching sequences. The worse-scoring primer of each pair dictates amplification success. [default: False]
-o, --output_dir
Base output directory for the taxa summary. A log file will be written to this directory. Taxonomy graphs and text summaries will be generated in separate subdirectories of the main output directory. [default: .]
-s, --score_type
Value from the primer hits file used to determine a given primer’s amplification success. Valid choices are weighted_score, overall_mismatches, and tp_mismatches. Gibbs energy scores are not currently implemented. [default: weighted_score]
-t, --score_threshold
If a primer has a score at or below this threshold, amplification is considered successful. [default: 1.0]
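Taken together, -s and -t reduce to a simple per-sequence test. In this sketch the function name amplifies is hypothetical, and each hit is assumed to be a dict of one hits-file row's score columns keyed by the -s names; parsing the actual hits file is left to the format described in analyze_primers.py.

```python
VALID_SCORE_TYPES = ("weighted_score", "overall_mismatches", "tp_mismatches")


def amplifies(hit, score_type="weighted_score", score_threshold=1.0):
    """Return True if the score chosen by -s is at or below the -t threshold."""
    if score_type not in VALID_SCORE_TYPES:
        # Gibbs energy scores, among others, are not supported
        raise ValueError("unsupported score_type: %s" % score_type)
    return float(hit[score_type]) <= score_threshold
```

With the defaults, a weighted_score of exactly 1.0 still counts as a successful amplification.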

Output:

For each hits file tested, a log file is generated in the output directory. For each primer or primer pair tested, a subdirectory of the main output directory is created containing a taxonomy graph for each taxonomy level tested, as well as a text file summary of the taxonomy coverage.
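The text summaries report coverage per taxon; their exact layout is not described here, so the following is only a hypothetical sketch of the underlying tally. It assumes each sequence ID has already been assigned a taxon label at the chosen depth (with unmapped IDs labelled "Unknown") and that the set of amplified IDs is known.

```python
from collections import Counter


def coverage_by_taxon(seq_taxa, amplified_ids):
    """Return taxon -> (amplified count, total count, fraction amplified).

    seq_taxa: dict of sequence ID -> taxon label at the chosen depth
    amplified_ids: set of sequence IDs the primer or primer pair amplifies
    """
    totals = Counter(seq_taxa.values())
    hits = Counter(taxon for seq_id, taxon in seq_taxa.items()
                   if seq_id in amplified_ids)
    # Counter returns 0 for taxa with no amplified sequences
    return {taxon: (hits[taxon], total, hits[taxon] / total)
            for taxon, total in totals.items()}
```

Running this once per taxonomy level, up to the -d depth, gives one table per level, which matches the one-graph-per-level layout of the output.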

Standard example usage:

taxa_coverage.py [options] {-i hits_filepath [required] -T taxa_mapping_filepath [required]}

Change taxa depth reported to 5, manually pass 2 hits files to test, use overall mismatches for scoring:

taxa_coverage.py -i primer1f_hits.txt:primer2r_hits.txt -T taxa_map.txt -d 5 -s overall_mismatches

Test all hits files in a target directory, get primer pair results, and put the output in the taxa_coverage directory:

taxa_coverage.py -i hits_dir/ -T taxa_map.txt -r -p -o taxa_coverage/
