Bioinformatics: Patterns & Profiles

Learning Objectives

  • Classifying Sequences

    • Importance of classification

    • Methods used

  • Levels of Classification

    • Families

    • Domains

    • Motifs

    • Patterns

    • Profiles

    • Fingerprints

  • Classification Tools

    • Example: InterProScan

Sequence Classification and Functionality

  • Sequence Similarity

    • BLAST can identify unknown sequences by finding homologous sequences

    • Rule of thumb: Similar sequences tend to have similar structures and functions

  • Protein Classification

    • Classifying proteins by families, domains, and sequence features

    • Grouping new proteins with sequences that share common features
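
The idea of sequence similarity above can be made concrete with percent identity over a pairwise alignment. The following is a minimal sketch, not how BLAST scores hits: the function name `percent_identity` and the two aligned sequences are made up for illustration, and gap columns are simply skipped, which is one of several common conventions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned sequences (hypothetical, not real proteins):
# 7 identical residues out of 8 gap-free columns.
print(percent_identity("MKT-AYIAK", "MKTQAYLAK"))  # 87.5
```

A high percent identity only suggests homology; in practice a search tool also reports a statistical measure such as an E-value, which this sketch does not attempt to compute.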

Protein Families

  • Definition

    • A protein family consists of proteins sharing a common evolutionary origin and related functions, often with similar sequences or structures

    • Importance of evolutionary relationships in predicting function

  • Examples of Families

    • G protein-coupled receptors (GPCRs), including secretin-like GPCRs, rhodopsin-like GPCRs, and metabotropic glutamate receptors

Protein Domains

  • Definition

    • Domains are distinct functional and/or structural units within a protein, contributing to its role and functionality

  • Characteristics of Domains

    • Perform specific functions

    • Similar domains across different proteins

  • Example: Cystatin Domains

    • Length: 115 amino acids

    • Function: Protease inhibitors with conserved motifs that block protease active sites

Protein Functional Features

  • Multiple Domains

    • Proteins can have multiple domains, each with a distinct role

    • Example: Cathepsin F includes a signal peptide for secretion, a regulatory cystatin domain, a prosegment, and a protease domain that cleaves proteins

Sequence Features

  • Definition

    • Small groups of amino acids that impart biochemical properties to proteins

  • Types:

    • Active Sites: Catalytic residues (e.g., Serine proteases have an active site triad of His, Asp, Ser)

    • Binding Sites: Residues that bind molecules or ions (e.g., tubulin's GTP-binding sites)

    • Post-Translational Modification (PTM) Sites: Sites for enzymatic modifications, such as glycosylation and phosphorylation

Types of Sequence Features

  • Motifs

    • Conserved short sequences with structural and functional importance

    • Example: The “LSH” motif in cysteine proteases

  • Patterns

    • Qualitative consensus sequences derived from multiple sequence alignments

    • Example: N-glycosylation site denoted as N-{P}-[ST]-{P}

  • Profiles

    • Quantitative descriptions of a conserved region, captured as position-specific scoring matrices (PSSMs) built from multiple sequence alignments, with substitution matrices contributing to the scores

  • Fingerprints

    • Groups of conserved motifs, occurring in a specific order, that together characterize a protein family and its function
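
The difference between a qualitative pattern and a quantitative profile can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used by PROSITE or InterProScan: the helper names (`prosite_to_regex`, `find_pattern`, `build_pssm`, `score_window`) and the toy sequences are hypothetical, the pattern translator covers only a basic subset of the PROSITE syntax, and the profile uses simple log-odds scores with pseudocounts and a uniform background rather than a substitution matrix.

```python
import math
import re
from collections import Counter


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression.

    Supported subset: x (any residue), [ABC] (allowed residues),
    {ABC} (forbidden residues), repeats such as x(2) or x(2,4),
    and the < / > sequence-end anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.strip()
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Separate an optional repeat count, e.g. "x(2,4)" -> "x" and "2,4"
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = match.groups()
        if core in ("x", "X"):
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"
        # [ABC] classes and single residues are already valid regex
        parts.append(prefix + core + ("{" + repeat + "}" if repeat else "") + suffix)
    return "".join(parts)


def find_pattern(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every hit, including overlaps."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


def build_pssm(alignment, alphabet="ACDEFGHIKLMNPQRSTVWY", pseudocount=1.0):
    """Build a log-odds PSSM from gap-free aligned sequences of equal length.

    Assumption: uniform background frequencies and add-one style pseudocounts.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)
    total = len(alignment) + pseudocount * len(alphabet)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in alphabet
        })
    return pssm


def score_window(pssm, window: str) -> float:
    """Sum the per-position scores; assumes every residue is in the alphabet."""
    return sum(column[aa] for column, aa in zip(pssm, window))


# Pattern: the N-glycosylation site, scanned against a made-up sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}"))                  # N[^P][ST][^P]
print(find_pattern("N-{P}-[ST]-{P}", "MKNGSALNPTQNVTA"))   # [(3, 'NGSA'), (12, 'NVTA')]

# Profile: built from three made-up aligned windows.
toy_pssm = build_pssm(["NGSA", "NVTA", "NATG"])
print(score_window(toy_pssm, "NGSA") > score_window(toy_pssm, "PPPP"))  # True
```

The pattern gives a yes/no answer for each position, whereas the profile assigns every window a score, so near-misses can still be ranked. Real profile methods weight sequences and choose background frequencies more carefully than this sketch does.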

Bioinformatics Tools for Classification

  • InterPro

    • A comprehensive resource for protein sequence analysis that integrates patterns, profiles, and fingerprints from different databases

  • PROSITE

    • Database of documentation entries describing protein domains, families, and functional sites, together with the patterns and profiles used to identify them

  • SignalP

    • Predicts the presence of signal peptides and the location of their cleavage sites, indicating whether a protein is likely to enter the secretory pathway

  • TargetP

    • Predicts the subcellular location of proteins from their N-terminal targeting sequences

Summary

  • Key Concepts in Sequence Classification

    • Families, Domains, Sequence Features (Motifs, Patterns, Profiles, Signals, Fingerprints)

  • Use of tools: PROSITE, SignalP, TargetP, and InterProScan to support protein classification and function prediction