Gene Ontology Lecture Notes

Introduction to Gene Ontology (GO)
  • Definition: GO is a standardized controlled vocabulary for representing and processing information about gene products and their functions.
  • Purpose: To describe cellular components, molecular functions, and biological processes.
  • History: Launched in 1998 by representatives from various model organism databases to create a common reference.
Structure and Terms of Gene Ontology
  • Size (as of June 19, 2003): 13983 terms in total (1297 components, 5396 functions, 7290 processes).
  • Organization: Terms are structured in a parent-child hierarchy, allowing associations between GO terms and gene entries.
GO Classification
  1. Cellular Component: Locations within the cell where functions occur (e.g., mitochondria, nucleus).
  2. Molecular Function: The activities performed by gene products (e.g., catalysis, transport).
  3. Biological Process: Series of events or actions leading to a biological function (e.g., glycolysis, immune response).
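To make the three categories concrete, the sketch below records one annotation per category for a single gene product and groups them by category. The gene name `geneA`, the `Annotation` class, and the `group_by_aspect` helper are hypothetical and exist only for illustration; the GO identifiers shown should be checked against the current ontology release before use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str    # hypothetical gene product name
    go_id: str   # GO identifier of the term
    label: str   # human-readable term name
    aspect: str  # one of the three GO categories


# Illustrative annotations for one made-up gene product.
annotations = [
    Annotation("geneA", "GO:0005634", "nucleus", "cellular_component"),
    Annotation("geneA", "GO:0003824", "catalytic activity", "molecular_function"),
    Annotation("geneA", "GO:0006260", "DNA replication", "biological_process"),
]


def group_by_aspect(records):
    """Collect term labels under the GO category they belong to."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.aspect].append(rec.label)
    return dict(grouped)


by_aspect = group_by_aspect(annotations)
for aspect, labels in by_aspect.items():
    print(f"{aspect}: {', '.join(labels)}")
```

Each gene product can carry terms from all three categories at once, which is why the category is stored on the annotation rather than on the gene.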
Understanding Molecular Function (MF)
  • Definition: Refers to molecular-level activities (e.g., catalytic activity).
  • Characteristics:
    • Applies to individual gene products or to complexes.
    • Expressed through terms such as "protein kinase activity."
    • Does not specify where, when, or in what context the activity takes place.
Exploring Cellular Components (CC)
  • Definition: Captures the physical locations where molecular functions are performed.
  • Examples: Chromosomes, organelles, virion components.
  • Significance: Provides insights into the physical structure associated with gene products.
Biological Process Ontology
  • Definition: A series of molecular functions working together to result in a specific biological effect.
  • Examples of Terms: Glycolysis (specific), development (general).
  • Relation with Molecular Functions: Processes are carried out through molecular functions, but the two are classified separately; not every function corresponds to a process.
Ontological Distinctions
  • Universals vs. Particulars:
    • Universals: General kinds that can have many instances (e.g., a species or a type of function).
    • Particulars: Individual instances of those kinds (e.g., one specific bacterium).
  • Continuants vs. Occurrents:
    • Continuants: Entities that persist through time (e.g., cells, organisms).
    • Occurrents: Events or processes that unfold over time.
Organization and Relationships in GO
  • Hierarchical Structure: Organized as a Directed Acyclic Graph (DAG).
    • Each term can have multiple parents and multiple children, and it inherits the properties of its ancestors.
  • Types of Relationships:
    • "is a" (e.g., glycolysis is a metabolic process).
    • "part of" (e.g., DNA replication is part of the cell cycle).
    • "regulates" (e.g., one process modulates another).
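A DAG with typed relationships can be sketched as a mapping from each term to its (relationship, parent) edges. The fragment below is a toy: term labels stand in for GO identifiers, the edges are simplified and are not an exact copy of GO, and the `ancestors` helper is a hypothetical name. It shows how a term with two parents reaches all of its ancestors, and how the result changes when only "is a" edges are followed.

```python
# Each term maps to a list of (relationship, parent) edges.
# Toy fragment only: simplified edges, labels instead of GO identifiers.
edges = {
    "glycolysis": [("is_a", "metabolic process")],
    "metabolic process": [("is_a", "biological process")],
    "DNA replication": [("is_a", "metabolic process"), ("part_of", "cell cycle")],
    "cell cycle": [("is_a", "biological process")],
    "biological process": [],
}


def ancestors(term, graph, relations=("is_a", "part_of")):
    """Return every term reachable from `term` by following the given relations upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for rel, parent in graph.get(current, []):
            if rel in relations and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Following both relationship types reaches "cell cycle" as well.
print(sorted(ancestors("DNA replication", edges)))
# Restricting to "is a" edges drops the "part of" parent.
print(sorted(ancestors("DNA replication", edges, relations=("is_a",))))
```

Because the graph is acyclic, the traversal always terminates; the `seen` set only prevents revisiting ancestors that are reachable along more than one path.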
Using GO in Research
  • Functional Annotation: GO terms are assigned to gene products based on experimental evidence or computational predictions.
  • Comparative Genomics: Allows function comparison across species, identifying conserved pathways.
  • Gene Set Enrichment Analysis (GSEA): Identifies GO terms that are overrepresented in a gene set relative to a background set, suggesting which biological processes are active.
  • Tools for GO:
    • DAVID: Functional annotation and enrichment analysis.
    • GSEA: Gene Set Enrichment Analysis Tool.
    • PANTHER: Analysis through evolutionary relationships.
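One common way to test whether a GO term is overrepresented in a gene list is a one-sided hypergeometric test. The sketch below is a minimal illustration of that idea, not a description of how DAVID, GSEA, or PANTHER compute their statistics. The function name `hypergeom_pvalue` and all counts in the example are made up for illustration.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n genes.

    N: genes in the background set
    K: background genes annotated with the GO term
    n: genes in the list of interest
    k: genes in that list annotated with the GO term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: 20 background genes, 5 carry the term,
# and 3 of the 4 genes in the list of interest carry it.
p = hypergeom_pvalue(k=3, n=4, K=5, N=20)
print(f"p = {p:.4f}")
```

In practice the test is repeated for many GO terms, so the resulting p-values would normally be corrected for multiple testing before interpretation.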
Benefits of GO
  1. Terms can be added quickly, without requiring complex formal logic.
  2. Intuitive incorporation of existing biological vocabularies.
  3. Unique identifiers for consistency in databases.
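GO identifiers take the form "GO:" followed by seven digits, which makes them easy to validate before they are stored or compared across databases. The helper below is a small sketch; the names `GO_ID_PATTERN` and `is_go_id` are hypothetical.

```python
import re

# "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


def is_go_id(text):
    """Return True if `text` is a well-formed GO identifier."""
    return GO_ID_PATTERN.fullmatch(text) is not None


print(is_go_id("GO:0006096"))
print(is_go_id("GO:6096"))
```

A check like this only confirms the format; whether the identifier refers to a current term still has to be looked up in the ontology itself.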
Drawbacks of GO
  1. It is unclear what kinds of reasoning the hierarchies permit.
  2. The rationale behind subclassifications is not recorded.
  3. There are no clear procedures for validating the ontology.
  4. The rules for how concepts should be represented are insufficient.
Conclusion
  • GO provides a shared framework for describing the functions, locations, and biological processes of gene products across species, and it shapes research strategies and methods in genomics and bioinformatics.