Description:
The purpose of this module is to analyze the conserved sequence hits file produced by generate_primers_denovo.py and to determine whether the sequences upstream or downstream of each conserved 3’ primer end are conserved enough to be suitable for designing primers.
This module uses the sequences in the hits file to calculate the Shannon entropy at each position in the primers. For each potential primer (extending upstream or downstream), the output contains the overall consensus sequence, a degenerate IUPAC sequence that includes every base found at each position, a filtered degenerate IUPAC sequence that excludes bases whose frequency falls below a specified value, and the Shannon entropy scores for the overall sequences.
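As an illustration of the per-position calculation, the sketch below computes a consensus base and a Shannon entropy for each column of a set of equal-length sequences. It is not the module's own code: the function name column_stats, the use of log base 2, counting "-" as an ordinary symbol, and summing the per-position values into an "overall" score are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def column_stats(seqs):
    """Return (consensus, per-position Shannon entropies) for equal-length sequences.

    Assumptions of this sketch: entropy uses log base 2, every character
    (including '-') is counted as a symbol, and consensus ties go to the
    symbol seen first.
    """
    consensus = []
    entropies = []
    for i in range(len(seqs[0])):
        counts = Counter(seq[i] for seq in seqs)
        total = sum(counts.values())
        # H = sum(p * log2(1/p)) over the symbols observed at this position
        h = sum((n / total) * log2(total / n) for n in counts.values())
        entropies.append(h)
        # The most frequent symbol becomes the consensus base
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus), entropies


seqs = ["ACGT", "ACGA", "ACGT", "TCGT"]
consensus, entropies = column_stats(seqs)
overall = sum(entropies)  # hypothetical overall score: sum of per-position entropies
print(consensus, [round(h, 3) for h in entropies], round(overall, 3))
# prints: ACGT [0.811, 0.0, 0.0, 0.811] 1.623
```

A fully conserved position scores 0, and the score rises as bases become more evenly mixed, which is why low entropy is a reasonable proxy for a well-conserved primer site.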
Note: the non-filtered IUPAC sequence will contain a “.” at any position that contains a “-” character. These gap characters are filled in by the generate_primers_denovo.py module at unknown positions that extend past the beginning or end of a sequence.
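The difference between the non-filtered and filtered degenerate sequences can be sketched as follows. This is a minimal illustration, not the module's implementation: the function name degenerate, the min_fraction parameter (meant to correspond loosely to a threshold such as the 0.10 passed with -p), ignoring gaps in the filtered output, and the "N" fallback are assumptions. The IUPAC table itself is the standard nucleotide ambiguity code.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases each one represents
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate(seqs, min_fraction=None):
    """Return a degenerate IUPAC string for equal-length aligned sequences.

    min_fraction=None gives the non-filtered sequence: every observed base is
    included and any position containing '-' becomes '.'. A number such as 0.10
    gives the filtered sequence: bases seen in fewer than that fraction of the
    sequences are dropped. Assumption of this sketch: in the filtered sequence,
    '-' and other non-ACGT characters are ignored rather than reported.
    """
    out = []
    for i in range(len(seqs[0])):
        counts = Counter(seq[i].upper() for seq in seqs)
        total = sum(counts.values())
        if min_fraction is None:
            if "-" in counts:
                out.append(".")
                continue
            kept = frozenset(b for b in counts if b in "ACGT")
        else:
            kept = frozenset(b for b, n in counts.items()
                             if b in "ACGT" and n / total >= min_fraction)
        # Hypothetical fallback: 'N' if no base survives the filter
        out.append(IUPAC.get(kept, "N"))
    return "".join(out)


seqs = ["ACGT"] * 9 + ["GCA-"]
print(degenerate(seqs))        # RCR.
print(degenerate(seqs, 0.20))  # ACGT
print(degenerate(seqs, 0.10))  # RCRT
```

Lowering the threshold from 0.20 to 0.10 lets the rare G and A back in, which is the "increase degeneracy allowed" trade-off described in the usage examples.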
After this initial analysis, the primers are sorted and written to a summary file containing information about each prospective primer and to a primer file formatted for use with the analyze_primers.py module. The primers can be sorted by sensitivity (greatest to least), specificity (most to least), or the Shannon entropy of the overall primer.
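The three sort orders can be pictured with a small sketch. The PrimerSummary record, its field names, and the key strings are hypothetical (the mapping from the -S flag values to these keys is not shown here), and listing the lowest entropy first is an assumption, since lower entropy indicates a more conserved primer.

```python
from dataclasses import dataclass


@dataclass
class PrimerSummary:
    # Hypothetical record; field names are illustrative, not the module's own
    primer_id: str
    sequence: str
    sensitivity: float
    specificity: float
    entropy: float  # overall Shannon entropy of the primer


def sort_primers(primers, key="sensitivity"):
    """Return a new list of primers in the requested order."""
    if key == "sensitivity":
        # Greatest to least
        return sorted(primers, key=lambda p: p.sensitivity, reverse=True)
    if key == "specificity":
        # Most to least
        return sorted(primers, key=lambda p: p.specificity, reverse=True)
    if key == "entropy":
        # Assumption: lowest entropy (most conserved) first
        return sorted(primers, key=lambda p: p.entropy)
    raise ValueError(f"unknown sort key: {key!r}")


primers = [
    PrimerSummary("a", "ACGT", 0.5, 0.90, 1.2),
    PrimerSummary("b", "ACGA", 0.8, 0.70, 0.4),
    PrimerSummary("c", "ACGG", 0.6, 0.95, 0.9),
]
print([p.primer_id for p in sort_primers(primers, "specificity")])  # ['c', 'a', 'b']
```

Because Python's sorted is stable, primers with tied scores keep their input order.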
Finally, a list of known primers can be passed via the -k option for comparison with the de novo primers generated by generate_primers_denovo.py. The de novo primers are compared to the supplied primers, and any primers that overlap (including matches between degenerate characters) are flagged. The ‘primers_overlap.txt’ file reports overlap information for the entire primer set in three sections: a ‘match’ section listing primers that overlap a supplied primer and match it at the 3’ end, an ‘overlap’ section giving details about primers that overlap a supplied primer but do not match at the 3’ end, and a ‘unique’ section giving details about primers that do not overlap any supplied primer.
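One way to picture the match/overlap/unique classification is the sketch below. It is only a guess at the idea, not the module's algorithm: it assumes both primers are written 5’->3’ in the same orientation, treats two IUPAC characters as matching when they can represent a common base, and uses a hypothetical min_overlap length. The function names compatible and classify are illustrative.

```python
# IUPAC code -> the bases it can represent
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compatible(x, y):
    """True if two IUPAC characters share at least one possible base."""
    return bool(set(IUPAC_BASES[x.upper()]) & set(IUPAC_BASES[y.upper()]))


def classify(denovo, known, min_overlap=5):
    """Classify a de novo primer against one known primer.

    Sketch assumptions: an overlap is any ungapped placement with at least
    min_overlap aligned positions that are all IUPAC-compatible; 'match' means
    such a placement also lines up the two 3' ends.
    """
    found_overlap = False
    # offset = position of denovo[0] relative to known[0]
    for offset in range(-(len(denovo) - 1), len(known)):
        start = max(0, offset)
        end = min(len(known), offset + len(denovo))
        if end - start < min_overlap:
            continue
        if all(compatible(known[k], denovo[k - offset]) for k in range(start, end)):
            if offset + len(denovo) == len(known):
                return "match"  # 3' ends coincide
            found_overlap = True
    return "overlap" if found_overlap else "unique"


known = "ACGTACGTAC"
for candidate in ["GTACGTAC", "GTRCGTAC", "ACGTACGG", "TTTTTTTT"]:
    print(candidate, classify(candidate, known))
```

Here "GTRCGTAC" still counts as a match because the degenerate R can stand for the A in the known primer.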
The formatted primer files use the following format: primer_id <tab> primer sequence (5’->3’). Comments are preceded by the pound (#) symbol. Known primers passed with the -k parameter must use this format as well.
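A minimal reader and writer for this tab-separated format might look like the following. The function names are hypothetical, and treating a "#" anywhere on a line as the start of a comment is an assumption (the description only says comments are preceded by "#").

```python
def parse_primer_lines(lines):
    """Parse 'primer_id<TAB>sequence' lines, skipping blank lines and '#' comments."""
    primers = []
    for raw in lines:
        # Assumption: everything after '#' is a comment
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected 'primer_id<TAB>sequence', got {raw!r}")
        primers.append((fields[0].strip(), fields[1].strip().upper()))
    return primers


def format_primer_lines(primers):
    """Write (primer_id, sequence) pairs back out as tab-separated lines."""
    return [f"{primer_id}\t{sequence}" for primer_id, sequence in primers]


lines = ["# hypothetical known primers", "", "primer_a\tACGTACGT", "primer_b\tacgtrygg  # note"]
print(parse_primer_lines(lines))  # [('primer_a', 'ACGTACGT'), ('primer_b', 'ACGTRYGG')]
```

To read a real file, pass an open file handle, e.g. `with open("known_primers.txt") as fh: parse_primer_lines(fh)`.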
If a standard alignment was used to record indices in the generate_primers_denovo.py module (-a option), this module detects the standard aligned indices and names the primers from them. If they are absent, the primers’ numeric names are based on the initial unaligned index of the sequence in which each primer was found.
Usage: sort_denovo_primers.py [options]
Input Arguments:
Note
[REQUIRED]
[OPTIONAL]
Output:
The output files are a formatted_primers.txt file containing primers in a format compatible with analyze_primers.py, a primers_details.txt file giving the sensitivity, specificity, and Shannon entropy of each primer, and, if the -k parameter is used, a primers_overlap.txt file describing overlap with the known primers.
Standard Example usage:
sort_denovo_primers.py [options] {-i input_primer_hits_filepath [required] -o output_directory [required]}
Sort prospective primers with default settings:
sort_denovo_primers.py -i conserved_site_hits.txt -o denovo_primers/
Sort the same primers, but increase the allowed degeneracy (include nucleotides that occur at a frequency as low as 10%), test the primers against primers from the literature (known_primers.txt), and sort by specificity:
sort_denovo_primers.py -i conserved_site_hits.txt -o denovo_primers/ -p 0.10 -k known_primers.txt -S P