Module Detail Information

Name: Needle
Type: Module
Short URL:
http://bit.ly/1B7Eg44
Description: This program uses the Needleman-Wunsch global alignment algorithm to find the optimal alignment (including gaps) of two sequences over their entire length.

What is the optimal alignment? Dynamic programming guarantees the optimal global alignment by implicitly exploring all possible alignments and choosing the best. Needle does this by reading in a scoring matrix that contains a value for every possible residue or nucleotide pair. It finds an alignment with the maximum possible score, where the score of an alignment is the sum of the matrix values for the aligned pairs, minus the penalties for opening and extending gaps.

Algorithm

The Needleman-Wunsch algorithm belongs to the class of algorithms that can calculate the best score and alignment in the order of mn steps, where 'm' and 'n' are the lengths of the two sequences. These dynamic programming algorithms were first developed for protein sequence comparison by Needleman and Wunsch, though similar methods were independently devised during the late 1960s and early 1970s for use in the fields of speech processing and computer science.

An important problem is the treatment of gaps, i.e., spaces inserted to optimise the alignment score. A penalty is subtracted from the score for each gap opened (the 'gap open' penalty), and a further penalty is subtracted for the total number of gap spaces multiplied by a cost (the 'gap extension' penalty). Typically, the cost of extending a gap is set to be 5-10 times lower than the cost of opening a gap.

There are two ways to compute the penalty for a gap of n positions:

  gap opening penalty + (n - 1) * gap extension penalty
  gap penalty + n * gap length penalty

The first is used by EMBOSS and WU-BLAST. The second is used by NCBI-BLAST, GCG, Staden and CLUSTAL. Fasta used the first for a long time, but Prof. Pearson recently decided to switch to the second. The two methods are basically equivalent: a gap opening penalty under the first corresponds to a gap penalty lower by one extension penalty under the second.
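The two gap-penalty conventions and the dynamic programming recurrence described above can be sketched as follows. This is a minimal illustration, not Needle's own source code: the function names, the `score` callback (standing in for a scoring matrix file) and the three-state affine-gap recurrence are assumptions chosen for clarity, and only the score is computed, not the alignment itself.

```python
NEG_INF = float("-inf")


def gap_cost_open_first(n, gap_open, gap_extend):
    """First convention: gap opening penalty + (n - 1) * gap extension penalty."""
    return 0.0 if n <= 0 else gap_open + (n - 1) * gap_extend


def gap_cost_per_position(n, gap_penalty, gap_length_penalty):
    """Second convention: gap penalty + n * gap length penalty."""
    return 0.0 if n <= 0 else gap_penalty + n * gap_length_penalty


def global_alignment_score(a, b, score, gap_open=10.0, gap_extend=0.5):
    """Best global alignment score with affine gaps (first convention).

    `score(x, y)` is a hypothetical stand-in for a scoring matrix lookup.
    Runs in the order of m * n steps for sequences of length m and n.
    """
    m, n = len(a), len(b)
    # M: alignment ends with a[i-1] paired to b[j-1]
    # X: alignment ends with a[i-1] paired to a gap
    # Y: alignment ends with b[j-1] paired to a gap
    M = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    X = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    Y = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0.0
    for i in range(1, m + 1):
        X[i][0] = -gap_cost_open_first(i, gap_open, gap_extend)
    for j in range(1, n + 1):
        Y[0][j] = -gap_cost_open_first(j, gap_open, gap_extend)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = score(a[i - 1], b[j - 1])
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[m][n], X[m][n], Y[m][n])


def simple_score(x, y):
    """Toy scoring scheme (assumption): +1 for a match, -1 for a mismatch."""
    return 1.0 if x == y else -1.0
```

With these definitions, an opening penalty of 10 and an extension penalty of 0.5 under the first convention give the same cost for every gap length as a gap penalty of 9.5 and a gap length penalty of 0.5 under the second, which is the sense in which the two methods are equivalent.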
In a Needleman-Wunsch global alignment, the entire length of each sequence is aligned. The result can be thought of as an overlap between the two sequences: one may lie completely within the other, or their ends may overlap. No penalty is applied for the hanging ends of the overlap. In bioinformatics, it is usually reasonable to assume that the sequences are incomplete, so there should be no penalty for failing to align the missing bases.
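The idea of leaving the hanging ends unpenalised can be illustrated with a small variant of the dynamic programming table. The sketch below is an assumption-laden simplification rather than Needle's implementation: it uses a single linear gap cost instead of separate open and extension penalties, a toy match/mismatch score instead of a scoring matrix, and a hypothetical function name. Leading end gaps are free because the first row and column start at zero, and trailing end gaps are free because the best score is read from anywhere in the last row or last column.

```python
def overlap_alignment_score(a, b, match=1.0, mismatch=-1.0, gap=2.0):
    """Alignment score in which hanging ends carry no gap penalty.

    Internal gaps still cost `gap` per position (linear cost, an
    assumption made here to keep the sketch short).
    """
    m, n = len(a), len(b)
    # Row 0 and column 0 stay at 0.0: gaps before either sequence are free.
    H = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                          H[i - 1][j] - gap,       # a[i-1] against a gap
                          H[i][j - 1] - gap)       # b[j-1] against a gap
    # Gaps after either sequence are free: take the best cell on the
    # bottom row (a fully consumed) or right column (b fully consumed).
    last_row = max(H[m])
    last_col = max(row[n] for row in H)
    return max(last_row, last_col)
```

For example, "GGGACGT" and "ACGTTTT" overlap on "ACGT", and the unaligned "GGG" and "TTT" ends do not lower the score; likewise a short sequence lying entirely within a longer one is scored only on the region they share.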
Executable:
/needle
Input Parameters:
 - A Sequence
 - B Sequence
 - A Start
 - A End
 - A Reverse
 - A Ask
 - A Nucleotide
 - A Protein
 - A Lower Case
 - A Upper Case
 - A Format
 - A Database Name
 - A Entry Name
 - A UFO Features
 - A Features Format
 - A Features File
 - Gap Opening Penalty
 - Gap Extension Penalty
 - B Start
 - B End
 - B Reverse
 - B Ask
 - B Nucleotide
 - B Protein
 - B Lower Case
 - B Upper Case
 - B Format
 - B Database Name
 - B Entry Name
 - B UFO Features
 - B Features Format
 - B Features File
 - Alignment format
 - File Name extension
 - Output directory
 - Base file name
 - Alignment width
 - Show Accession Number
 - Show Description
 - Show Full USA
 - Show Full sequence
 - Turn Off Prompts
 - Standard Output
 - Filter
 - Options
 - Debug
 - Verbose
 - Help
 - Report Warnings
 - Report Errors
 - Report Fatal Errors
 - Report Dying Program Messages
 - Scoring Matrix File
 - No Brief
Output Parameters:
 - Output
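As a rough illustration of how a few of the parameters above might be passed to the executable, the sketch below assembles a command line using EMBOSS-style qualifiers. The mapping from the parameter labels listed here to the qualifiers (`-asequence`, `-bsequence`, `-gapopen`, `-gapextend`, `-datafile`, `-outfile`, `-auto`) is an assumption, as are the helper name, the file names and the penalty values; check them against the installed version before use. The command is only built and printed, not run.

```python
import shlex


def build_needle_command(a_path, b_path, out_path,
                         gap_open=10.0, gap_extend=0.5,
                         matrix=None, executable="needle"):
    """Return an argument list for a needle run (hypothetical helper)."""
    cmd = [
        executable,
        "-asequence", a_path,            # A Sequence
        "-bsequence", b_path,            # B Sequence
        "-gapopen", str(gap_open),       # Gap Opening Penalty
        "-gapextend", str(gap_extend),   # Gap Extension Penalty
        "-outfile", out_path,            # Output
        "-auto",                         # Turn Off Prompts
    ]
    if matrix is not None:
        cmd += ["-datafile", matrix]     # Scoring Matrix File
    return cmd


if __name__ == "__main__":
    # File names below are placeholders, not files shipped with the module.
    command = build_needle_command("seq_a.fasta", "seq_b.fasta", "result.needle")
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later handed to `subprocess.run`.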
File size: 36.5 KB