Bioinformatics: Patterns & Profiles
Learning Objectives
Classifying Sequences
Importance of classification
Methods used
Levels of Classification
Families
Domains
Motifs
Patterns
Profiles
Fingerprints
Classification Tools
Example: InterProScan
Sequence Classification and Functionality
Sequence Similarity
BLAST can help identify an unknown sequence by finding similar, likely homologous sequences in a database
Rule of thumb: Similar sequences tend to have similar structures and functions
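Sequence similarity is often summarized as percent identity over an alignment. The sketch below is a minimal illustration, assuming the two sequences have already been aligned and that gaps are written as "-". The function name percent_identity is an assumption made for this example. This is not how BLAST works internally: BLAST uses heuristic local alignment, substitution matrices, and E-values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

Ignoring gapped columns is one of several common conventions. Other definitions divide by the full alignment length or by the length of the shorter sequence, so the convention should be stated whenever a percentage is reported.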
Protein Classification
Classifying proteins by:
Families
Domains
Sequence features
Grouping new proteins with sequences that share common features
Protein Families
Definition
A protein family consists of proteins sharing a common evolutionary origin and related functions, often with similar sequences or structures
Importance of evolutionary relationships in predicting function
Examples of Families
G protein-coupled receptors
Secretin-like GPCRs
Rhodopsin-like GPCRs
Metabotropic glutamate receptors
Protein Domains
Definition
Domains are distinct functional and/or structural units within a protein; each contributes to the protein's overall function
Characteristics of Domains
Perform specific functions
Similar domains occur in different proteins
Example: Cystatin Domains
Length: 115 amino acids
Function: Protease inhibitors with conserved motifs that block protease active sites
Protein Functional Features
Multiple Domains
Proteins can have multiple domains, each with a distinct role
Example: Cathepsin F includes:
Signal peptide for secretion
Regulatory cystatin domain
Prosegment
Protease domain that cleaves proteins
Sequence Features
Definition
Small groups of amino acids that impart biochemical properties to proteins
Types:
Active Sites: Catalytic residues (e.g., serine proteases have a catalytic triad of His, Asp, and Ser)
Binding Sites: Residues that bind molecules or ions (e.g., tubulin's GTP-binding sites)
Post-Translational Modification (PTM) Sites: Sites for enzymatic modifications, such as glycosylation and phosphorylation
Types of Sequence Features
Motifs
Conserved short sequences with structural and functional importance
Example: The “LSH” motif in cysteine proteases
Patterns
Qualitative consensus sequences derived from multiple sequence alignments
Example: N-glycosylation site denoted as N-{P}-[ST]-{P}
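A PROSITE-style pattern such as N-{P}-[ST]-{P} can be read as a regular expression: square brackets list allowed residues, curly braces list forbidden residues, x means any residue, and (n) or (n,m) gives a repeat count. The sketch below is a minimal translation covering only these elements plus the < and > terminal anchors. The function names prosite_to_regex and find_sites are assumptions made for this example, not part of any PROSITE tool.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as x(2) or x(2,4).
        repeat = ""
        if element.endswith(")") and "(" in element:
            element, count = element[:-1].split("(", 1)
            repeat = "{" + count + "}"
        if element == "x":
            core = "."                   # any residue
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"  # forbidden residues
        else:
            core = element               # single residue or [..] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every match, including overlaps."""
    # A lookahead lets the search report matches that overlap each other.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]
```

For example, prosite_to_regex("N-{P}-[ST]-{P}") gives N[^P][ST][^P]. A match only marks a candidate site: a pattern this short occurs often by chance, so a hit does not show that the residue is actually glycosylated.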
Profiles
Quantitative descriptions of an alignment, stored as position-specific scoring matrices (PSSMs) that score every residue at every position, typically with the help of substitution matrices
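The idea of position-specific scoring can be sketched with a small log-odds matrix built from an ungapped alignment. This is a simplified illustration, not the method used by any particular profile database: it assumes a uniform background frequency and a flat pseudocount in place of substitution-matrix weighting, and it expects only the 20 standard residues in the scanned sequence. The names build_pssm, score_window, and best_hit are assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm, window):
    """Sum the column scores for a window as wide as the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (1-based start, score) of the highest-scoring window."""
    width = len(pssm)
    hits = [(i + 1, score_window(pssm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]
    return max(hits, key=lambda hit: hit[1])
```

Unlike a pattern, which either matches or does not, the matrix gives every window a score, so a residue that is rare but tolerated at a position lowers the score without excluding the hit.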
Fingerprints
Groups of conserved motifs that occur in a specific order and together characterize a protein family
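The ordering requirement of a fingerprint can be sketched as a check that several motifs all occur, one after another, in the same sequence. This is a simplified illustration: the motifs are written as regular expressions, every motif must match exactly, and the function name match_fingerprint and the toy motifs in the usage note are hypothetical. Fingerprint databases score motifs more flexibly and can report partial matches.

```python
import re


def match_fingerprint(sequence, motifs):
    """Return 1-based start positions if all motifs occur in order, else None."""
    positions = []
    cursor = 0  # the next motif must start at or after this index
    for motif in motifs:
        match = re.compile(motif).search(sequence, cursor)
        if match is None:
            return None  # a motif is missing or out of order
        positions.append(match.start() + 1)
        cursor = match.end()  # motifs may not overlap
    return positions
```

With the made-up motifs ["AAC", "G.T", "WW"], the sequence "MAACKKGLTPPWW" matches because all three occur in order, while "MWWKKGLTAAC" does not, even though it contains the same three motifs. Taking the leftmost match at each step is reliable for fixed-length motifs; variable-length expressions would need a more careful search.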
Bioinformatics Tools for Classification
InterPro
A comprehensive resource for protein sequence analysis that integrates patterns, profiles, and fingerprints from different databases
PROSITE
Contains documentation entries for protein domains, families, and functional sites, together with the patterns and profiles used to identify them
SignalP
Predicts the presence of signal peptides and their cleavage sites, indicating whether a protein is likely to enter the secretory pathway
TargetP
Predicts the subcellular location of proteins from their N-terminal targeting sequences
Summary
Key Concepts in Sequence Classification
Families, Domains, Sequence Features (Motifs, Patterns, Profiles, Signals, Fingerprints)
Use of tools: PROSITE, SignalP, TargetP, and InterProScan to improve protein classification and predict function