Gene Ontology Lecture Notes
Introduction to Gene Ontology (GO)
- Definition: GO is a standardized controlled vocabulary for representing and processing information about gene products and their functions.
- Purpose: To describe cellular components, molecular functions, and biological processes.
- History: Launched in 1998 by representatives from various model organism databases to create a common reference.
Structure and Terms of Gene Ontology
- As of June 19, 2003:
  - Total Terms: 11020 (1297 components, 5396 functions, 7290 processes)
- Organization: Terms are structured in a parent-child hierarchy, allowing associations between GO terms and gene entries.
GO Classification
- Cellular Component: Locations within the cell where functions occur (e.g., mitochondria, nucleus).
- Molecular Function: The activities performed by gene products (e.g., catalysis, transport).
- Biological Process: Series of events or actions leading to a biological function (e.g., glycolysis, immune response).
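The three-way classification above can be sketched as a small data structure. This is only an illustration: the class and function names (Aspect, GOTerm, group_by_aspect) are hypothetical, and the GO identifiers are quoted as commonly cited for these terms, so they should be checked against the current ontology release before use.

```python
from dataclasses import dataclass
from enum import Enum


class Aspect(Enum):
    """The three GO ontologies (hypothetical enum, names chosen for this sketch)."""
    CELLULAR_COMPONENT = "cellular_component"
    MOLECULAR_FUNCTION = "molecular_function"
    BIOLOGICAL_PROCESS = "biological_process"


@dataclass(frozen=True)
class GOTerm:
    go_id: str
    name: str
    aspect: Aspect


# Example terms taken from the notes; IDs are assumptions to verify against GO.
TERMS = [
    GOTerm("GO:0005739", "mitochondrion", Aspect.CELLULAR_COMPONENT),
    GOTerm("GO:0005634", "nucleus", Aspect.CELLULAR_COMPONENT),
    GOTerm("GO:0003824", "catalytic activity", Aspect.MOLECULAR_FUNCTION),
    GOTerm("GO:0005215", "transporter activity", Aspect.MOLECULAR_FUNCTION),
    GOTerm("GO:0006096", "glycolytic process", Aspect.BIOLOGICAL_PROCESS),
    GOTerm("GO:0006955", "immune response", Aspect.BIOLOGICAL_PROCESS),
]


def group_by_aspect(terms):
    """Return a dict mapping each aspect to the names of its terms, in input order."""
    grouped = {aspect: [] for aspect in Aspect}
    for term in terms:
        grouped[term.aspect].append(term.name)
    return grouped
```

Calling group_by_aspect(TERMS) returns one list of term names per ontology, mirroring the three categories listed above.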
Understanding Molecular Function (MF)
- Definition: Refers to molecular-level activities (e.g., catalytic activity).
- Characteristics:
  - Applicable to individual gene products or complexes.
  - Named with terms such as "protein kinase activity".
  - Does not specify where, when, or in what context the activity takes place.
Exploring Cellular Components (CC)
- Definition: Captures the physical locations where molecular functions are performed.
- Examples:
  - Chromosomes, organelles, virion components.
- Significance: Provides insights into the physical structure associated with gene products.
Biological Process Ontology
- Definition: A series of molecular functions working together to result in a specific biological effect.
- Examples of Terms:
  - Glycolysis (specific), development (general).
- Relation with Molecular Functions: Molecular functions are the individual activities that carry out a process, but the two are kept in separate ontologies; a molecular function on its own is not a process.
Ontological Distinctions
- Universals vs. Particulars:
  - Universals: General types or kinds (e.g., a species, a type of function).
  - Particulars: Specific instances of a universal (e.g., an individual bacterium).
- Continuants vs. Occurrents:
  - Continuants: Entities that persist through time (e.g., cells, organisms).
  - Occurrents: Events or processes that unfold over time.
Organization and Relationships in GO
- Hierarchical Structure: Organized as a Directed Acyclic Graph (DAG).
  - Each term can have multiple parents and multiple children, and inherits the properties of all its ancestors.
- Types of Relationships:
  - "is a" (e.g., glycolysis is a metabolic process).
  - "part of" (e.g., DNA replication is part of the cell cycle).
  - "regulates" (e.g., one process influences another).
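A minimal sketch of the DAG and its relationship types, assuming a plain in-memory graph. The GODag class, build_toy_dag helper, and the toy edges are hypothetical and heavily simplified compared with the real ontology; the point is only to show how a term with several parents collects ancestors along "is a" and "part of" edges.

```python
from collections import defaultdict


class GODag:
    """Toy directed acyclic graph of terms (hypothetical class for illustration)."""

    def __init__(self):
        # Maps each child term to a list of (relation, parent) pairs.
        self.parents = defaultdict(list)

    def add_edge(self, child, relation, parent):
        self.parents[child].append((relation, parent))

    def ancestors(self, term, relations=("is_a", "part_of")):
        """Collect every term reachable by following the given relation types upward."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for relation, parent in self.parents.get(current, []):
                if relation in relations and parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


def build_toy_dag():
    """Build a simplified graph from the examples in the notes (edges are assumptions)."""
    dag = GODag()
    dag.add_edge("glycolysis", "is_a", "metabolic process")
    dag.add_edge("metabolic process", "is_a", "biological process")
    dag.add_edge("cell cycle", "is_a", "biological process")
    # One term with two parents, reached through different relation types.
    dag.add_edge("DNA replication", "is_a", "metabolic process")
    dag.add_edge("DNA replication", "part_of", "cell cycle")
    dag.add_edge("regulation of glycolysis", "regulates", "glycolysis")
    return dag
```

With this toy graph, build_toy_dag().ancestors("DNA replication") gathers terms along both the "is a" and the "part of" path. The "regulates" edge is deliberately left out of the default traversal, since a regulating process is not itself a kind or part of the process it regulates.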
Using GO in Research
- Functional Annotation: GO terms are assigned to gene products based on experimental evidence or computational predictions.
- Comparative Genomics: Allows function comparison across species, identifying conserved pathways.
- Gene Set Enrichment Analysis (GSEA): Identifies GO terms that are overrepresented in a dataset (e.g., a list of differentially expressed genes), suggesting which biological processes are affected.
- Tools for GO:
  - DAVID: Functional annotation and enrichment analysis.
  - GSEA: Gene Set Enrichment Analysis Tool.
  - PANTHER: Analysis through evolutionary relationships.
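The idea of an overrepresented GO term can be illustrated with a one-sided hypergeometric test, a common way to score over-representation. This is a sketch of that general technique only: it is not the rank-based algorithm used by the GSEA tool, nor the exact procedure of DAVID or PANTHER, and it applies no multiple-testing correction. The function name and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in a random sample.

    population: total number of genes in the background set
    annotated:  genes in the background annotated with the GO term
    sample:     size of the gene list under study
    hits:       genes in the list annotated with the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for k = hits .. upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

For instance, with a background of 10 genes of which 3 carry the term, drawing a list of 3 genes that all carry it gives hypergeom_enrichment_p(10, 3, 3, 3) = 1/120. A small value like this suggests the term appears in the list more often than chance alone would explain.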
Benefits of GO
- Terms can be populated immediately, without requiring complex formal logic.
- Intuitive incorporation of existing biological vocabularies.
- Unique identifiers for consistency in databases.
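GO identifiers follow a fixed shape, the prefix "GO:" followed by seven digits (e.g., GO:0006096), which is what makes them easy to check and exchange between databases. A minimal sketch of a format check follows; the helper names are hypothetical, and the check covers the format only, not whether the term actually exists in the ontology.

```python
import re

# "GO:" followed by exactly seven ASCII digits.
GO_ID_PATTERN = re.compile(r"GO:[0-9]{7}")


def is_valid_go_id(identifier: str) -> bool:
    """Return True if the string has the GO identifier format."""
    return GO_ID_PATTERN.fullmatch(identifier) is not None


def format_go_id(number: int) -> str:
    """Zero-pad a numeric accession to the seven-digit GO identifier form."""
    return f"GO:{number:07d}"
```

A format check like this catches common data-entry problems such as dropped leading zeros (GO:6096 instead of GO:0006096) before identifiers are loaded into a database.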
Drawbacks of GO
- No clear rationale for which uses of the hierarchies are permissible.
- The rationale behind subclassifications is not recorded.
- No clear procedures for validation of the ontology.
- Insufficient rules for representation of concepts.
Conclusion
- GO serves as a vital framework for describing gene products and their functions, locations, and biological processes across species, and it shapes research strategies and methods in genomics and bioinformatics.