What is BLOSUM62 and why is it useful?
BLOSUM62 is one of the most common substitution matrices used to score amino acid matches in sequence alignments. It is useful because not all amino acid changes (mutations) are equally probable, so likely substitutions score higher than unlikely ones.
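A minimal sketch of how such a matrix is used to score an ungapped alignment. The dictionary below is a small hand-copied excerpt of BLOSUM62 (not the full matrix), and the names `BLOSUM62_EXCERPT`, `pair_score` and `ungapped_score` are illustrative assumptions:

```python
# Small excerpt of BLOSUM62 scores; the real matrix covers all 20 amino acids.
BLOSUM62_EXCERPT = {
    ("L", "L"): 4,    # identical residues score positive
    ("W", "W"): 11,   # rare residues score higher when conserved
    ("L", "I"): 2,    # chemically similar substitution, mildly positive
    ("D", "E"): 2,    # both acidic, mildly positive
    ("L", "D"): -4,   # hydrophobic vs. acidic, strongly penalised
}


def pair_score(a, b):
    """Look up a substitution score; the matrix is symmetric, so try both orders."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def ungapped_score(seq1, seq2):
    """Sum the pair scores column by column for two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))
```

For example, `ungapped_score("LW", "IW")` adds the L/I score to the W/W score, while swapping in an L/D column lowers the total.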
What is GenBank?
NCBI's public database of annotated nucleotide sequences
What is UniProt?
A comprehensive public database of protein sequences and their functional annotation
What does it mean when two sequences are homologs?
That they descend from a common ancestor
What is an E-value?
E-values are used to assess the significance of hits in sequence database searches. The E-value for a given hit is the expected number of hits with an equal or better score that would occur by chance in a database of the same size. Typically an E-value < 0.001 is considered significant.
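A rough sketch of how an E-value relates to an alignment score, using the Karlin-Altschul form E = K * m * n * exp(-lambda * S). The function name `e_value` and the default values of `lam` and `k` are illustrative assumptions; real search tools derive these parameters from the scoring system and use effective sequence lengths:

```python
import math


def e_value(raw_score, query_len, db_len, lam=0.267, k=0.041):
    """Expected number of chance hits scoring at least raw_score.

    lam and k are assumed, illustrative statistical parameters.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def is_significant(raw_score, query_len, db_len, threshold=0.001):
    """Apply the E-value cutoff mentioned on the card."""
    return e_value(raw_score, query_len, db_len) < threshold
```

Note that the E-value grows linearly with database size and shrinks exponentially with score, so the same score is less significant in a larger database.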
What is BLAST and what is it best for?
Heuristic tool for finding local sequence similarity
Fast, but may miss weak/distant matches
Works by finding short exact matches (“words”) and extending them
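The seed-and-extend idea can be sketched as follows. This is a simplified illustration, not BLAST itself: it extends only while residues are identical, whereas the real tool scores extensions with a substitution matrix and a drop-off cutoff. The names `find_seeds` and `extend_seed` are hypothetical:

```python
from collections import defaultdict


def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where an exact word is shared."""
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, word_size=3):
    """Extend a seed left and right while residues stay identical (no gaps)."""
    start_q, start_s = i, j
    while start_q > 0 and start_s > 0 and query[start_q - 1] == subject[start_s - 1]:
        start_q -= 1
        start_s -= 1
    end_q, end_s = i + word_size, j + word_size
    while end_q < len(query) and end_s < len(subject) and query[end_q] == subject[end_s]:
        end_q += 1
        end_s += 1
    return query[start_q:end_q]
```

Indexing the query words first is what makes the search fast: only positions sharing a word are ever examined, which is also why weak matches without any shared word can be missed.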
What is PSI-BLAST and how does it differ from BLAST?
Iterative version of BLAST for proteins
Builds a position-specific scoring matrix (PSSM)
More sensitive than BLAST; detects distant homologs
Slower than BLAST due to iterations
What is Smith-Waterman and why is it important?
Exact method for local sequence alignment using dynamic programming
Guarantees the optimal alignment
Very sensitive, but much slower than heuristic tools such as BLAST
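A minimal sketch of the dynamic programming behind Smith-Waterman. It assumes a simple match/mismatch score and a linear gap penalty (the parameter values are illustrative); practical implementations typically use a substitution matrix and affine gap penalties:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 option lets an alignment restart anywhere (local alignment).
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Every cell of the matrix is filled, so the cost grows with the product of the two sequence lengths. That exhaustive fill is what guarantees the optimal score under the chosen scoring scheme and also what makes the method slow on large databases.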
Rank BLAST, PSI-BLAST and Smith-Waterman by speed
BLAST > PSI-BLAST > Smith-Waterman
Rank BLAST, PSI-BLAST and Smith-Waterman by sensitivity
Smith-Waterman > PSI-BLAST > BLAST
What is a PSSM?
Position Specific Scoring Matrix
Represents conservation of residues at each position in a sequence alignment
Scores indicate how likely a residue is at a position compared to background
How is a PSSM constructed?
Start with a query sequence
BLAST search to find similar sequences
Align hits to the query
Count residue frequencies at each position
Convert frequencies to log-odds scores (observed vs background)
Optionally adjust for small sample sizes with pseudo-counts
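The counting and log-odds steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the alignment is gap-free with equal-length sequences, the background is uniform unless one is supplied, and a flat pseudo-count is added per residue. PSI-BLAST itself uses sequence weighting and more refined pseudo-counts; `build_pssm` is a hypothetical name:

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, alphabet=AMINO_ACIDS, background=None, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score."""
    if background is None:
        # Assumption: uniform background frequencies.
        background = {r: 1.0 / len(alphabet) for r in alphabet}
    length = len(aligned_seqs[0])
    total = len(aligned_seqs) + pseudocount * len(alphabet)
    pssm = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_seqs)
        column = {}
        for r in alphabet:
            freq = (counts[r] + pseudocount) / total  # smoothed observed frequency
            column[r] = math.log2(freq / background[r])  # observed vs background
        pssm.append(column)
    return pssm
```

With the pseudo-count, residues never observed in a column get a finite negative score instead of minus infinity, which matters when only a few sequences are aligned.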
What is a PSSM used for?
Detect distant homologs
Iterative searches in PSI-BLAST
Identify motifs/domains in proteins
Score new sequences against known patterns
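Scoring a new sequence against a known pattern can be sketched by sliding the matrix along the sequence and summing the per-position scores. The toy matrix `TOY_PSSM`, the fallback `DEFAULT_SCORE` and the function names are made-up values for illustration only:

```python
# Toy 3-column PSSM; residues not listed in a column get DEFAULT_SCORE.
TOY_PSSM = [
    {"M": 3.0},
    {"K": 2.5, "R": 1.5},
    {"V": 2.0, "L": 1.0},
]
DEFAULT_SCORE = -1.0


def window_score(pssm, window):
    """Sum the position-specific scores for one window of the sequence."""
    return sum(col.get(res, DEFAULT_SCORE) for col, res in zip(pssm, window))


def best_window(pssm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pssm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the PSSM")
    scored = [(window_score(pssm, seq[i:i + width]), i)
              for i in range(len(seq) - width + 1)]
    return max(scored)
```

Unlike a fixed substitution matrix, the same residue can score differently depending on which column it falls in, which is what lets a PSSM pick out motifs that plain pairwise scoring would miss.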