Module Detail Information

Type: Module
Short URL:
Description: water uses the Smith-Waterman algorithm (modified for speed enhancements) to calculate the local alignment of two sequences. A local alignment searches for regions of local similarity between two sequences and need not include the entire length of the sequences. Local alignment methods are very useful for scanning databases, or in other circumstances when you wish to find matches between small regions of sequences, for example between protein domains.

Algorithm

The Smith-Waterman algorithm is a member of the class of algorithms that can calculate the best score and local alignment in the order of mn steps (where 'n' and 'm' are the lengths of the two sequences). These dynamic programming algorithms were first developed for protein sequence comparison by Smith and Waterman, though similar methods were independently devised during the late 1960s and early 1970s for use in the fields of speech processing and computer science.

Dynamic programming methods ensure the optimal local alignment by exploring all possible alignments and choosing the best. water does this by reading in a scoring matrix that contains a value for every possible pairing of residues or nucleotides. It then finds an alignment with the maximum possible score, where the score of an alignment is the sum of the scoring matrix values for its aligned pairs, less any gap penalties.

An important problem is the treatment of gaps, i.e., spaces inserted to optimise the alignment score. A penalty is subtracted from the score for each gap opened (the 'gap open' penalty), and a further penalty is subtracted for the total number of gap spaces multiplied by a cost (the 'gap extension' penalty). Typically, the cost of extending a gap is set to be 5-10 times lower than the cost of opening a gap.
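
The dynamic programming step described above can be sketched in a few lines of Python. This is only an illustration, not the code used by water: the function name, the simple match/mismatch scores (standing in for a full scoring matrix file) and the default penalty values are all assumptions made for the example. It also assumes that a gap of n positions costs gap_open + (n - 1) * gap_extend, and it returns the best score only, without recovering the alignment itself.

```python
def smith_waterman_score(a, b, match=5.0, mismatch=-4.0,
                         gap_open=10.0, gap_extend=0.5):
    """Best local alignment score of strings a and b (score only).

    Hypothetical helper for illustration. Assumes a gap of n positions
    costs gap_open + (n - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n_cols = len(b) + 1
    # h: best score of a local alignment ending at cell (i, j)
    # e: best score ending at (i, j) with a gap in a (b residue unpaired)
    # f: best score ending at (i, j) with a gap in b (a residue unpaired)
    h_prev = [0.0] * n_cols
    f_prev = [neg_inf] * n_cols
    best = 0.0
    for i in range(1, len(a) + 1):
        h_cur = [0.0] * n_cols
        f_cur = [neg_inf] * n_cols
        e = neg_inf
        for j in range(1, n_cols):
            # Either open a new gap or extend the one already open.
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # Flooring at zero is what makes the alignment local:
            # no cell of the path matrix is ever negative.
            h_cur[j] = max(0.0, h_prev[j - 1] + pair, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

With a cheap gap (gap_open=2.0), aligning "AAAACCCC" against "AAAAGCCCC" pays for one single-position gap so that all eight residues match; with the default gap_open=10.0 the ungapped alignment with one mismatch scores higher instead.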
There are two ways to compute a penalty for a gap of n positions:

  1. gap opening penalty + (n - 1) * gap extension penalty
  2. gap penalty + n * gap length penalty

The first way is used by EMBOSS and WU-BLAST. The second way is used by NCBI-BLAST, GCG, Staden and CLUSTAL. FASTA used the first way for a long time, but Prof. Pearson later decided to shift to the second. The two methods are basically equivalent: setting the gap penalty to (gap opening penalty - gap extension penalty) and the gap length penalty to the gap extension penalty gives the same cost for every gap length.

The Smith-Waterman algorithm contains no negative scores in the path matrix it creates. The algorithm starts the alignment at the highest path matrix score and works backwards until a cell contains zero. See the Reference Smith et al. for details.
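
As a small check on the two gap penalty conventions, the following Python sketch computes the cost of a gap both ways and converts parameters from the first convention to the second. The function names are hypothetical and chosen for this example only; they do not correspond to any EMBOSS or BLAST option names.

```python
def gap_cost_open_extend(n, gap_open, gap_extend):
    """First convention: the opening penalty covers the first gap position."""
    return gap_open + (n - 1) * gap_extend


def gap_cost_penalty_length(n, gap_penalty, gap_length_penalty):
    """Second convention: a fixed gap penalty plus a per-position cost."""
    return gap_penalty + n * gap_length_penalty


def to_second_convention(gap_open, gap_extend):
    """Return (gap_penalty, gap_length_penalty) giving the same costs.

    Follows from open + (n - 1) * ext == (open - ext) + n * ext.
    """
    return gap_open - gap_extend, gap_extend
```

For example, a gap opening penalty of 10 with an extension penalty of 0.5 under the first convention corresponds to a gap penalty of 9.5 with a gap length penalty of 0.5 under the second, and both give a cost of 11 for a gap of three positions.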
Input Parameters:
 - A Sequence
 - B Sequence
 - A Start
 - A End
 - A Reverse
 - A Ask
 - A Nucleotide
 - A Protein
 - A Lower Case
 - A Upper Case
 - A Format
 - A Database Name
 - A Entry Name
 - A UFO Features
 - A Features Format
 - A Features File
 - Gap Opening Penalty
 - Gap Extension Penalty
 - B Start
 - B End
 - B Reverse
 - B Ask
 - B Nucleotide
 - B Protein
 - B Lower Case
 - B Upper Case
 - B Format
 - B Database Name
 - B Entry Name
 - B UFO Features
 - B Features Format
 - B Features File
 - Alignment format
 - File Name extension
 - Output directory
 - Base file name
 - Alignment width
 - Show Accession Number
 - Show Description
 - Show Full USA
 - Show Full sequence
 - Turn Off Prompts
 - Standard Output
 - Filter
 - Options
 - Debug
 - Verbose
 - Help
 - Report Warnings
 - Report Errors
 - Report Fatal Errors
 - Report Dying Program Messages
 - Scoring Matrix File
Output Parameters:
 - Output
File size:35.7 KB