Biochemistry: Homology and Sequence Analysis
Homology: Distinguishing Relationships
Divergent Evolution: Lineages descended from a common ancestor accumulate differences over time.
Convergent Evolution: Separate lineages independently evolve similar traits; the similarity is not inherited from a shared ancestor.
Homologs: Proteins derived from a common ancestor.
Paralogs: Homologs within the same species; may have different functions (e.g., human ribonuclease and human angiogenin).
Orthologs: Homologs in different species with similar functions (e.g., bovine ribonuclease and human ribonuclease).
Sequence Alignment for Homology Detection
Sequence Alignment: A systematic process to compare protein sequences and identify similarities, helping to rule out chance resemblances.
Determining Identities: Involves sliding one sequence past another and counting amino acid matches.
Introducing Gaps: Gaps can be inserted into sequences to improve alignments, consolidating matches into a single, more meaningful alignment.
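Alignment Scoring Systems: Award a fixed number of points for each match and deduct a fixed number of points for each gap, so that an alignment is credited for its identities but penalized for the gaps needed to produce them.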
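The sliding, gap, and scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production aligner: the function names are hypothetical, the point values are placeholders chosen for illustration, and treating each contiguous run of '-' characters as a single gap is an assumption.

```python
def count_identities(seq_a, seq_b, offset):
    """Count matching residues when seq_b is shifted by `offset` along seq_a."""
    matches = 0
    for i, residue in enumerate(seq_b):
        j = i + offset  # position in seq_a that lines up with seq_b[i]
        if 0 <= j < len(seq_a) and seq_a[j] == residue:
            matches += 1
    return matches


def best_offset(seq_a, seq_b):
    """Slide seq_b past seq_a; return (offset, identities) for the best overlap."""
    offsets = range(-(len(seq_b) - 1), len(seq_a))
    return max(
        ((off, count_identities(seq_a, seq_b, off)) for off in offsets),
        key=lambda pair: pair[1],
    )


# Assumed, illustrative point values (placeholders, not taken from the notes).
MATCH_POINTS = 10
GAP_PENALTY = 25


def alignment_score(aligned_a, aligned_b):
    """Score two equal-length aligned strings in which '-' marks a gap position.

    Assumption: a contiguous run of '-' in one sequence counts as one gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identities = 0
    gaps = 0
    gap_in = None  # which sequence the current gap run belongs to, if any
    for a, b in zip(aligned_a, aligned_b):
        current = "a" if a == "-" else "b" if b == "-" else None
        if current is not None and current != gap_in:
            gaps += 1  # a new gap run starts here
        elif current is None and a == b:
            identities += 1
        gap_in = current
    return identities * MATCH_POINTS - gaps * GAP_PENALTY
```

For example, best_offset("GHLVEALYLV", "VEAL") reports the shift at which the short sequence lines up best, and alignment_score("ACDEFG", "AC--FG") credits four identities while charging for one gap.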
Statistical Significance of Alignments
Assessing Significance: Randomly shuffle one of the two sequences being compared many times, realign each shuffled version against the other sequence, and collect the scores to build a distribution of scores expected by chance.
Interpretation: If the score of the original, unshuffled alignment is significantly higher than those from randomized sequences, the similarity is statistically significant and likely not due to chance.
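The shuffling procedure can be sketched as follows. This is a rough illustration under stated assumptions: the scoring function (best ungapped identity count), the function names, the number of trials, and the add-one correction in the estimated p-value are all choices made for this example, not part of the notes.

```python
import random


def best_ungapped_identities(seq_a, seq_b):
    """Highest identity count over all ungapped shifts of seq_b along seq_a."""
    best = 0
    for offset in range(-(len(seq_b) - 1), len(seq_a)):
        matches = sum(
            1
            for i, residue in enumerate(seq_b)
            if 0 <= i + offset < len(seq_a) and seq_a[i + offset] == residue
        )
        best = max(best, matches)
    return best


def shuffle_test(seq_a, seq_b, score=best_ungapped_identities, trials=1000, seed=0):
    """Compare the real score with scores from shuffled copies of seq_b.

    Shuffling keeps the amino acid composition of seq_b but destroys its order,
    so the shuffled scores approximate what chance alone would produce.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = score(seq_a, seq_b)
    residues = list(seq_b)
    shuffled_scores = []
    for _ in range(trials):
        rng.shuffle(residues)
        shuffled_scores.append(score(seq_a, "".join(residues)))
    # Fraction of random alignments scoring at least as well as the real one.
    # The add-one correction (an assumption here) avoids reporting exactly zero.
    at_least = sum(1 for s in shuffled_scores if s >= observed)
    p_estimate = (at_least + 1) / (trials + 1)
    return observed, shuffled_scores, p_estimate
```

A small p_estimate means the unshuffled score sits far out in the tail of the random distribution, which is the situation the interpretation above describes as statistically significant.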
Substitution Matrices
Purpose: Used to detect distant evolutionary relationships by scoring amino acid substitutions based on their likelihood in nature.
Mechanism: Awards positive points for common, conservative substitutions (similar residues) and subtracts points for rare, radical substitutions (dissimilar residues).
Example: BLOSUM-62 is a commonly used substitution matrix.
Advantage: Can reveal homologies that basic sequence identity comparison alone might miss.
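The mechanism can be illustrated with a deliberately simplified scoring rule. The residue classes and the three point values below are hypothetical stand-ins chosen for readability; they are not BLOSUM-62 scores, which are derived from observed substitution frequencies and differ for each residue pair.

```python
# Hypothetical, simplified residue classes (an assumption for this sketch).
RESIDUE_CLASSES = {
    "hydrophobic": "AVLIMFW",
    "polar": "STNQYC",
    "acidic": "DE",
    "basic": "KRH",
    "special": "GP",
}
CLASS_OF = {res: name for name, members in RESIDUE_CLASSES.items() for res in members}

# Toy point values, not taken from any real matrix.
IDENTITY, CONSERVATIVE, RADICAL = 3, 1, -1


def toy_substitution_score(a, b):
    """Score one aligned pair: identity > conservative swap > radical swap."""
    if a == b:
        return IDENTITY
    # Raises KeyError for letters outside the 20 standard amino acids.
    if CLASS_OF[a] == CLASS_OF[b]:
        return CONSERVATIVE
    return RADICAL


def score_ungapped(seq_a, seq_b):
    """Sum pair scores over two equal-length, already aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(toy_substitution_score(a, b) for a, b in zip(seq_a, seq_b))


def count_identities(seq_a, seq_b):
    """Plain identity count, for comparison with the substitution score."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)
```

With this rule, "LDKV" against "IERA" has no identities at all yet scores positively because every position is a conservative swap, whereas "LDKV" against "DKLG" also has no identities but scores negatively. Identity counting cannot tell these two cases apart; substitution scoring can.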
Importance of 3D Structure and Databases
3D Structure: Tertiary structure is more closely tied to protein function than primary sequence alone. Proteins with dramatically different functions can still be recognized as paralogs when their three-dimensional structures are clearly similar.
Sequence Templates: Maps of the conserved residues that are structurally and functionally important to a protein family. These allow detection of homologies even when overall sequence similarity is low.
BLAST (Basic Local Alignment Search Tool): A computational tool to search databases for homologous sequences, helping to infer potential functions for unknown proteins based on high-similarity matches.
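The sequence-template idea can be sketched with a regular expression that fixes only the conserved positions and leaves the rest free. The template below is hypothetical and invented purely for illustration; a real template would come from the known structures and alignments of an actual protein family.

```python
import re

# Hypothetical template: a conserved Cys, any two residues, a second conserved
# Cys, then three to five residues of any kind, then a conserved His.
TEMPLATE = re.compile(r"C.{2}C.{3,5}H")


def find_template(sequence, template=TEMPLATE):
    """Return (start_index, matched_segment) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in template.finditer(sequence)]
```

A sequence such as "MKACAACLLLGHKK" matches because the conserved residues fall at the required spacing, even though nothing is demanded of the positions in between. Changing the second Cys to another residue removes the match, which mirrors how a template flags only proteins that keep the key residues.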