sort_denovo_primers.py – Analyze, sort, and format de novo primers from generate_primers_denovo.py output

Description:

The purpose of this module is to analyze the conserved sequence hits file from generate_primers_denovo.py to determine whether the sequences upstream or downstream of the conserved 3’ primer end are reasonably conserved and suitable for use in designing primers.

This module uses the sequences given in the hits file to calculate the Shannon entropy for each position in the primers. The output from this module will contain the overall consensus sequence for each potential primer (going upstream or downstream), a degenerate IUPAC sequence that considers all bases found in each position, a filtered degenerate IUPAC sequence that ignores bases whose frequency is under a specified value, and the Shannon entropy scores for the overall sequences.
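As a rough illustration of the per-position calculation described above, the following sketch computes the Shannon entropy and the consensus base for each column of a set of equal-length hit sequences. The function names, the example sequences, and the choice of log base 2 are assumptions made for this example; they are not taken from sort_denovo_primers.py.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the characters in one column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def consensus_and_entropy(seqs):
    """Return (consensus string, list of per-position entropies).

    Assumes all sequences have the same length.
    """
    columns = list(zip(*seqs))
    consensus = "".join(Counter(col).most_common(1)[0][0] for col in columns)
    return consensus, [column_entropy(col) for col in columns]


# Made-up hit sequences, for illustration only.
hits = ["ACGT", "ACGA", "ACGT", "ACTT"]
consensus, entropies = consensus_and_entropy(hits)
```

A fully conserved position has an entropy of zero, and the score grows as the bases at a position become more evenly mixed.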

Note: the non-filtered IUPAC sequence output will contain a “.” for positions that contain a “-” character (this results from filling in gap characters at unknown bases that exceed the beginning or end of a sequence in the generate_primers_denovo.py module).
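The difference between the non-filtered and filtered degenerate sequences can be sketched as follows. This is a minimal example under stated assumptions: the helper names and the sample data are hypothetical, gap handling in the filtered case (gaps are simply ignored) is a guess, and only the IUPAC code table itself is standard.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the sorted bases they cover.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K",
    "ACG": "V", "ACT": "H", "AGT": "D", "CGT": "B", "ACGT": "N",
}


def degenerate_base(column, min_freq=0.0):
    """IUPAC code for one column, ignoring bases rarer than min_freq.

    Assumption: with no filtering (min_freq == 0), a column that
    contains a gap character '-' is reported as '.'.
    """
    if min_freq == 0.0 and "-" in column:
        return "."
    counts = Counter(column)
    total = len(column)
    kept = sorted(base for base, n in counts.items()
                  if base != "-" and n / total >= min_freq)
    # Fall back to '.' if nothing survives the filter.
    return IUPAC.get("".join(kept), ".")


def degenerate_sequence(seqs, min_freq=0.0):
    """Degenerate sequence for equal-length sequences."""
    return "".join(degenerate_base(col, min_freq) for col in zip(*seqs))


# Made-up hits: one sequence starts with a gap, one ends in a rare A.
hits = ["ACGT"] * 5 + ["ACGA", "ACTT", "ACTT", "ACGT", "-CGT"]
unfiltered = degenerate_sequence(hits)
filtered = degenerate_sequence(hits, min_freq=0.2)
```

In this example the rare A at the last position (1 of 10 sequences) is dropped by the 0.2 filter, while the T at the third position (2 of 10) is kept.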

Following this initial analysis, the primers are sorted into a summary file containing information about each prospective primer and a primer file that is formatted for use with the analyze_primers.py module. The primers can be sorted according to sensitivity (greatest to least), specificity (most to least), or Shannon entropy of the overall primer.

Finally, a list of known primers can be passed via the -k option for comparison with the de novo primers generated by generate_primers_denovo.py. The de novo primers are compared to the known primers, and any primers that overlap are flagged (matching degenerate characters are considered as well). The ‘primers_overlap.txt’ file contains information about the overlapping primers for the entire primer set. A ‘match’ section shows primers that overlap with the supplied primers and match at the 3’ end. An ‘overlap’ section shows details about primers that overlap with the supplied primers but do not match at the 3’ end. Finally, a ‘unique’ section shows details about primers that do not overlap with the supplied primer set.
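To illustrate what matching degenerate characters can mean in practice, here is a small sketch in which two IUPAC codes are treated as compatible when the sets of bases they represent share at least one base. The function names and the idea of comparing a fixed number of 3’ bases are assumptions for illustration; the actual comparison logic in sort_denovo_primers.py may differ.

```python
# Bases represented by each standard IUPAC nucleotide code.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def bases_compatible(a, b):
    """True if two IUPAC codes share at least one possible base."""
    return bool(set(IUPAC_SETS[a]) & set(IUPAC_SETS[b]))


def three_prime_match(primer, known, length=10):
    """True if the last `length` bases of both primers are compatible.

    Hypothetical helper: compares only the 3' tails, position by position.
    """
    tail_a, tail_b = primer[-length:], known[-length:]
    if len(tail_a) != len(tail_b):
        return False
    return all(bases_compatible(x, y) for x, y in zip(tail_a, tail_b))


# Made-up sequences: R covers A/G and Y covers C/T.
print(three_prime_match("AAAAACGTRY", "CCCCACGTAC", length=4))
print(three_prime_match("AAAAACGTRY", "CCCCACGTCC", length=4))
```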

These formatted primer files are in the following format:

primer_id <tab> primer sequence (5’->3’)

Any comments are preceded by the pound (#) symbol. If known primers are passed with the -k parameter, they need to be in this format as well.
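A file in this format can be read with a few lines of Python. The sketch below is only an illustration of the layout described above; the function name parse_primers and the example primer IDs and sequences are made up.

```python
def parse_primers(lines):
    """Parse 'primer_id<tab>sequence' lines into a dict.

    Blank lines and lines starting with '#' are skipped.
    """
    primers = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError("expected primer_id<tab>sequence: %r" % line)
        primers[fields[0]] = fields[1].strip()
    return primers


# Made-up file contents, for illustration only.
example = [
    "# primer_id\tprimer sequence (5'->3')",
    "100f\tACGTACGTRYACGT",
    "",
    "450r\tTTGACCGGWAACGT",
]
primers = parse_primers(example)
```

To read from disk, an open file handle can be passed in place of the list, for example `parse_primers(open("known_primers.txt"))`.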

If a standard alignment was used to record indices in the generate_primers_denovo.py module (-a option), this module will detect the presence of the standard aligned indices. If absent, the primers’ numeric names will be based on the initial unaligned index of the sequence they were found in.

Usage: sort_denovo_primers.py [options]

Input Arguments:

[REQUIRED]

-i, --hits_file
Conserved Xmers file, output file from generate_primers_denovo.py module.

[OPTIONAL]

-o, --output_dir
Output directory
-p, --variable_pos_freq
Frequency, given as a fraction between 0 and 1, at which a variable base is considered for degenerate primer design. [default: 0.2]
-k, --known_primers_filepath
Optional primers filepath to compare with the de novo primers. Comparisons will be used to separate the de novo primers into unique, partially overlapping, or overlapping primers with the known primers supplied. [default: None]
-S, --sort_method
Sorting method for the reverse and forward primer output files. Pass S, O, or P for [S]ensitivity, [O]verall Shannon entropy, or s[P]ecificity. [default: S]
-P, --primer_name
Root name for primers in the formatted_primers.txt output file. [default: ]
-m, --match_len
Number of base pairs in overlapping sequences to be considered as a significant overlap. [default: 10]
-T, --truncate_len
Number of bases to truncate for the 3’ end of the primer comparison to the universal primer set. [default: 10]
-a, --amplicon_len
Generate primer pair output file that will yield amplicons with an estimated size within the given range. Requires that the standard alignment option (-a) was used with generate_primers_denovo.py. Pass the min and max amplicon size separated by a colon, for example -a 250:500 [default: False]
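The min:max value passed to -a can be thought of as a simple inclusive size range. The following sketch shows one way such a value might be parsed and applied; the function names are hypothetical, and the way sort_denovo_primers.py estimates amplicon size from the standard aligned indices is not shown here.

```python
def parse_amplicon_range(value):
    """Parse a 'min:max' string such as '250:500' into two integers."""
    parts = value.split(":")
    if len(parts) != 2:
        raise ValueError("expected min:max, got %r" % value)
    min_len, max_len = int(parts[0]), int(parts[1])
    if min_len > max_len:
        raise ValueError("minimum size exceeds maximum size")
    return min_len, max_len


def size_in_range(estimated_size, value):
    """True if an estimated amplicon size falls inside the given range."""
    min_len, max_len = parse_amplicon_range(value)
    return min_len <= estimated_size <= max_len


print(parse_amplicon_range("250:500"))
print(size_in_range(300, "250:500"))
```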

Output:

The output files are a formatted_primers.txt file containing primers in a format compatible with analyze_primers.py, a primers_details.txt file giving information about sensitivity, specificity, and Shannon entropy for each primer, and a primers_overlap.txt file showing information about overlap with known primers if the -k parameter is used.

Standard Example usage:

sort_denovo_primers.py [options] {-i input_primer_hits_filepath [required] -o output_directory [required]}

Sort prospective primers with default settings:

sort_denovo_primers.py -i conserved_site_hits.txt -o denovo_primers/

Sort the same primers, increase degeneracy allowed (include nucleotides that occur as little as 10% of the time), test primers against primers from the literature (known_primers.txt), and sort according to specificity:

sort_denovo_primers.py -i conserved_sites_hits.txt -o denovo_primers/ -p 0.10 -k known_primers.txt -S P
