Biochemistry: Homology and Sequence Analysis

Homology: Distinguishing Relationships

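  • Divergent Evolution: Descendants of a common ancestor (organisms or proteins) gradually accumulate differences.

  • Convergent Evolution: Unrelated organisms or proteins independently evolve similar traits without inheriting them from a common ancestor.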

  • Homologs: Proteins derived from a common ancestor.

    • Paralogs: Homologs within the same species, arising from gene duplication; they may have different functions (e.g., human ribonuclease and human angiogenin).

    • Orthologs: Homologs in different species with similar functions (e.g., bovine ribonuclease and human ribonuclease).

Sequence Alignment for Homology Detection

  • Sequence Alignment: A systematic process to compare protein sequences and identify similarities, helping to rule out chance resemblances.

  • Determining Identities: Involves sliding one sequence past another and counting amino acid matches.

  • Introducing Gaps: Gaps can be inserted into one or both sequences to improve an alignment, bringing together matches that would otherwise fall in different registers into a single, more meaningful alignment.

  • Alignment Scoring Systems: Award points for each identity (e.g., +10 points) and deduct points for each gap (e.g., -25 points).
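The sliding and scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function names are hypothetical, it assumes mismatches score 0, and it assumes each gap position is penalized once (no separate gap-opening and gap-extension costs).

```python
def identities_at_offset(s1, s2, offset):
    """Count identical residues when s2[i] sits under s1[i + offset]."""
    return sum(
        1
        for i, residue in enumerate(s2)
        if 0 <= i + offset < len(s1) and s1[i + offset] == residue
    )


def best_offset(s1, s2):
    """Slide s2 past s1 (no gaps) and return (offset, identities) with the most matches."""
    offsets = range(-(len(s2) - 1), len(s1))
    return max(((o, identities_at_offset(s1, s2, o)) for o in offsets), key=lambda t: t[1])


def score_gapped_alignment(top, bottom, match=10, gap=-25):
    """Score two aligned rows of equal length, where '-' marks a gap position.

    Identities earn `match`, every gap position costs `gap`, mismatches score 0.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
    return score


# Two made-up sequences that differ by one inserted residue (W).
s1, s2 = "ACDEFGHIK", "ACDEWFGHIK"
print(best_offset(s1, s2))                                  # (-1, 5): best ungapped register
print(score_gapped_alignment("ACDE-FGHIK", "ACDEWFGHIK"))   # 9 identities, 1 gap
```

Without gaps, the best register for these two sequences lines up only five identities (50 points); inserting a single gap brings all nine identities into one alignment, and even after the 25-point penalty the gapped score (65) is higher.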

Statistical Significance of Alignments

  • Assessing Significance: One of the two sequences being compared is randomly shuffled many times, and each shuffled version is aligned against the other sequence to build a distribution of scores expected by chance.

  • Interpretation: If the score of the original, unshuffled alignment is significantly higher than those from randomized sequences, the similarity is statistically significant and likely not due to chance.
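A shuffle test of this kind can be sketched as follows. As a simplifying assumption, each shuffled sequence is scored by its best ungapped identity alignment (+10 per identity) rather than by a full gapped alignment, and the names `best_ungapped_score` and `shuffle_test` are hypothetical. The empirical p-value uses the common "(count + 1) / (trials + 1)" form so it is never exactly zero.

```python
import random


def best_ungapped_score(s1, s2, match=10):
    """Highest identity score over all ungapped offsets of s2 against s1."""
    best = 0
    for offset in range(-(len(s2) - 1), len(s1)):
        score = 0
        for i, residue in enumerate(s2):
            j = i + offset
            if 0 <= j < len(s1) and s1[j] == residue:
                score += match
        best = max(best, score)
    return best


def shuffle_test(s1, s2, n_shuffles=1000, seed=0):
    """Compare the real score with scores after shuffling s2.

    Shuffling keeps the amino acid composition of s2 but destroys its order,
    so the shuffled scores show what similarity looks like by chance.
    Returns (observed_score, empirical_p_value).
    """
    rng = random.Random(seed)
    observed = best_ungapped_score(s1, s2)
    residues = list(s2)
    as_good = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if best_ungapped_score(s1, "".join(residues)) >= observed:
            as_good += 1
    return observed, (as_good + 1) / (n_shuffles + 1)


# An arbitrary example sequence compared with itself.
seq = "MKTAYIAKQRQISFVKSHFS"
observed, p_value = shuffle_test(seq, seq)
print(observed, p_value)
```

A small p-value means few or no shuffled sequences scored as well as the original, which is the "significantly higher" case described above.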

Substitution Matrices

  • Purpose: Used to detect distant evolutionary relationships by scoring each amino acid substitution according to how often it is observed among related proteins.

  • Mechanism: Awards positive points for common, conservative substitutions (similar residues) and subtracts points for rare, radical substitutions (dissimilar residues).

  • Example: BLOSUM62 is a widely used substitution matrix.

  • Advantage: Can reveal homologies that basic sequence identity comparison alone might miss.
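The contrast between identity counting and substitution-matrix scoring can be sketched with a small excerpt of BLOSUM62. The 28 entries below cover only seven residues (I, L, V, K, R, D, E) and were transcribed by hand for illustration, so they should be checked against the published matrix before any real use; the helper names are hypothetical.

```python
# Hand-transcribed excerpt of BLOSUM62 for I, L, V, K, R, D, E (symmetric).
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("K", "K"): 5, ("R", "R"): 5, ("D", "D"): 6, ("E", "E"): 5,
    # Conservative substitutions score positive.
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
    ("K", "R"): 2, ("D", "E"): 2, ("K", "E"): 1, ("R", "E"): 0,
    ("K", "D"): -1, ("R", "D"): -2,
    # Hydrophobic <-> charged substitutions are radical and score negative.
    ("I", "K"): -3, ("I", "R"): -3, ("I", "D"): -3, ("I", "E"): -3,
    ("L", "K"): -2, ("L", "R"): -2, ("L", "D"): -4, ("L", "E"): -3,
    ("V", "K"): -2, ("V", "R"): -3, ("V", "D"): -3, ("V", "E"): -2,
}


def substitution_score(a, b):
    """Look up a residue pair in either order, since the matrix is symmetric."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def matrix_score(s1, s2):
    """Sum substitution scores over an ungapped alignment of equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(substitution_score(a, b) for a, b in zip(s1, s2))


def identity_count(s1, s2):
    """Number of positions with identical residues."""
    return sum(1 for a, b in zip(s1, s2) if a == b)


print(identity_count("ILKD", "VLRE"), matrix_score("ILKD", "VLRE"))   # 1 11
print(identity_count("ILKD", "KDVR"), matrix_score("ILKD", "KDVR"))   # 0 -11
```

"ILKD" and "VLRE" share only one identical residue, but every position is a conservative substitution, so the matrix score is clearly positive; swapping in radical substitutions drives the score negative.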

Importance of 3D Structure and Databases

  • 3D Structure: Tertiary structure is more closely tied to protein function, and more strongly conserved during evolution, than primary sequence. Structural similarity can reveal that proteins with dramatically different functions are nevertheless paralogs.

  • Sequence Templates: Maps of the conserved residues that are structurally and functionally important to a protein family. Templates allow homologies to be detected even when overall sequence similarity is low.

  • BLAST (Basic Local Alignment Search Tool): A computational tool to search databases for homologous sequences, helping to infer potential functions for unknown proteins based on high-similarity matches.
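The core idea of a database search can be sketched by scoring a query against every entry with a local alignment and ranking the results. Note that this uses exhaustive Smith-Waterman local alignment, whereas BLAST itself uses faster heuristics to find high-scoring local alignments; this is not BLAST's implementation. The match (+10) and gap (-25) values follow the scoring example earlier in these notes, the mismatch penalty (-5) is an assumption, and the database entries are made up.

```python
def local_alignment_score(a, b, match=10, mismatch=-5, gap=-25):
    """Smith-Waterman local alignment score (score only, linear gap penalty).

    Each cell holds the best score of an alignment ending at a[i-1], b[j-1];
    clamping at 0 lets an alignment start anywhere, which makes it local.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def search(query, database):
    """Score the query against each named sequence and rank hits, best first."""
    hits = [(name, local_alignment_score(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database.
database = {
    "seqA": "MMMMWKRHEQMMMM",
    "seqB": "GGGGGGGG",
    "seqC": "AAWKRHAQAA",
}
for name, score in search("WKRHEQ", database):
    print(name, score)
```

The top-ranked entry is the one whose best-matching local region most resembles the query, which is what lets a search tool suggest a possible function for an uncharacterized protein.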