Ch. 17 - Intro to Bioinformatics
Introduction to Bioinformatics
Presenter: Dr. Toby
Course Context: General Biology I
Getting Started
Video Resource
What is Bioinformatics?
General Definition:
Computational techniques for solving biological problems.
Data Problems:
Representation (graphics)
Storage and Retrieval (databases)
Analysis (statistics, artificial intelligence, optimization, etc.)
Biology Problems:
Sequence analysis
Structure or function prediction
Data mining
Sometimes described as “data science” for biology
Chapter Outline
Biotechnology
Mapping Genomes
Whole-Genome Sequencing
Applying Genomics
Genomics and Proteomics
Introduction
The study of nucleic acids began with the discovery of DNA and has since evolved into genomics.
Genomics:
Study of entire genomes, including:
Complete set of genes
Nucleotide sequence and organization
Genetic interactions within and between species
Advances in genomics have been enabled by DNA sequencing technology.
Much as information technology tools like Google Maps help people navigate, genomic maps support fields such as anthropology and medicine (e.g., studying human migration and mapping genetic diseases).
Molecular Biology Basics
DNA
DNA, which is made of nucleotides, can be thought of as the "recipe" for an organism.
There are four different nucleotides, distinguished by their bases:
Adenine (A)
Cytosine (C)
Guanine (G)
Thymine (T)
Structure:
DNA is a polymer made of repeating units (nucleotides) and can be viewed as a string of letters: A, C, G, T.
Example sequence: ctgctggaccgggtgctaggaccctgactgcccggg…
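Treating DNA as a string of letters makes simple analyses straightforward. The sketch below (in Python, chosen here only for illustration) counts each base in the visible part of the example sequence and computes GC content, the fraction of bases that are G or C. The function names are illustrative assumptions, and the trailing "…" of the example is not filled in.

```python
from collections import Counter

# Visible part of the example sequence; the original continues ("…").
EXAMPLE = "ctgctggaccgggtgctaggaccctgactgcccggg"

def base_counts(seq: str) -> dict[str, int]:
    """Count A, C, G and T in a DNA string (case-insensitive)."""
    seq = seq.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"not a DNA base: {sorted(invalid)}")
    counts = Counter(seq)
    # Report all four bases, including any that do not occur.
    return {base: counts[base] for base in "ACGT"}

def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty string)."""
    counts = base_counts(seq)
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0

if __name__ == "__main__":
    print(base_counts(EXAMPLE))
    print(f"GC content: {gc_content(EXAMPLE):.2f}")
```

For the 36 visible bases this reports 4 A, 12 C, 14 G and 6 T, a GC content of about 0.72.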
The Double Helix
DNA typically consists of two strands twisted into a double helix structure.
Watson-Crick Base Pairing:
A pairs with T
C pairs with G
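The pairing rules above amount to a lookup table. A minimal sketch, with hypothetical function names, that builds the complement of a strand base by base and checks whether two aligned strands pair correctly:

```python
# Watson-Crick pairing: A-T and C-G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement (raises KeyError on non-ACGT characters)."""
    return "".join(PAIRS[base] for base in strand.upper())

def is_complementary(strand1: str, strand2: str) -> bool:
    """True if the strands pair at every aligned position."""
    return len(strand1) == len(strand2) and complement(strand1) == strand2.upper()

print(complement("ATCG"))  # TAGC
```

This reproduces the replication example later in the chapter, where the parental strand ATCG pairs with a new strand TAGC.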
Structural Components:
Phosphate Molecule
Deoxyribose Sugar Molecule
Weak Hydrogen Bonds between Nitrogenous Bases
Sugar-Phosphate Backbone
Directionality of DNA Strands
Each strand has a direction, denoted as 5' and 3' ends:
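By convention, a strand is written from its 5' end (where the 5' carbon of the terminal deoxyribose carries a phosphate group) to its 3' end (where the 3' carbon carries a free hydroxyl group).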
The DNA strands run antiparallel to each other.
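Because the strands are antiparallel and sequences are conventionally written 5' to 3', the partner strand read in its own 5' to 3' direction is the reverse complement: complement every base, then reverse the order. A short sketch (the function name is illustrative):

```python
# Translation table mapping each base to its Watson-Crick partner;
# characters other than A, C, G, T pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(strand: str) -> str:
    """Given a strand written 5'->3', return its partner strand, also written 5'->3'."""
    return strand.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, reflecting that each strand fully determines the other.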
DNA Replication Prior to Cell Division
Each parental strand serves as a template for synthesizing a new complementary strand.
Example:
Parental Strand: ATCG
New Strand: TAGC
The illustration also indicates RNA's role in transcription.
Chromosomes
DNA is organized into chromosomes:
Prokaryotes:
Typically have a single circular chromosome (e.g., bacteria, archaea)
Eukaryotes:
Typically have multiple linear chromosomes, with a number characteristic of each species (e.g., plants, animals, fungi).
Human Chromosomes
Human cells contain 23 pairs of chromosomes: 22 pairs of autosomes and one pair of sex chromosomes (XX or XY).
Genomes
Definition: Complete set of DNA for a given species.
Human genome: 23 pairs of chromosomes.
Example chromosome counts:
Mosquitoes: 3 pairs
Camels: 37 pairs
Nearly every cell contains a complete copy of the organism's genome. Exceptions include sex cells, which carry only one chromosome from each pair, and mature red blood cells in mammals, which lack a nucleus.
Genes
Genes as the basic units of heredity:
Definition: A sequence of bases that carries information for constructing a specific protein (polypeptide).
Encoding Proteins:
The human genome has approximately 20,000 to 25,000 protein-coding genes.
Gene Density
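Not all DNA encodes proteins, and the coding fraction varies widely:
Bacteria: ~90% of the genome is coding sequence (roughly one gene per kilobase)
Humans: ~1.5% of the genome is coding sequence, with genes far more sparsely spaced (on the order of one gene per 35 kilobases)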
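Coding percentages like these are ratios: bases covered by coding sequence divided by total genome length. A minimal sketch, assuming gene annotations are given as hypothetical (start, end) pairs (0-based, end-exclusive); overlapping intervals are merged so no base is counted twice. The toy genome below is made up for illustration and is not real data.

```python
def coding_fraction(genome_length: int, genes: list[tuple[int, int]]) -> float:
    """Fraction of a genome covered by coding intervals.

    `genes` holds hypothetical (start, end) pairs, 0-based and end-exclusive.
    Overlapping intervals are merged so shared bases are counted once.
    """
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(genes):
        if current_end is None or start > current_end:
            # Close the previous merged interval (if any) and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length

# Toy 10 kb "genome" with three genes, two of which overlap.
toy_genes = [(0, 900), (800, 2000), (5000, 5500)]
print(f"{coding_fraction(10_000, toy_genes):.1%}")  # 25.0%
```

Merging first matters: simply summing gene lengths would count the 100 shared bases of the first two genes twice.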
RNA
RNA: Similar to DNA with distinctions:
Ribose sugar in the backbone instead of deoxyribose
Often single-stranded
Base Uracil (U) replaces Thymine (T)
Like DNA, RNA can be represented as a string of characters: A, C, G, U.
Transcription
The enzyme RNA polymerase synthesizes an RNA copy of a gene; for protein-coding genes, the product is messenger RNA (mRNA).
Example transcription:
The mRNA has the same sequence as the coding strand of DNA, with U in place of T:
DNA: ATGCCGTTAGACCGTTAGCGGACCTGAC
mRNA: AUGCCGUUAGACCGUUAGCGGACCUGAC
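The example can be reproduced in code. RNA polymerase actually reads the template strand (the partner of the coding strand), so the sketch below shows both views: from the coding strand, transcription is a T-to-U substitution; from the template strand, it is a reverse complement with U pairing opposite A. The function names are illustrative.

```python
def transcribe_coding_strand(coding: str) -> str:
    """mRNA has the coding strand's sequence with U in place of T."""
    return coding.upper().replace("T", "U")

def transcribe_template_strand(template: str) -> str:
    """mRNA is the reverse complement of the template strand (both written 5'->3'),
    with U pairing opposite A."""
    pairs = {"A": "U", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(template.upper()))

coding = "ATGCCGTTAGACCGTTAGCGGACCTGAC"
print(transcribe_coding_strand(coding))  # AUGCCGUUAGACCGUUAGCGGACCUGAC
```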
Proteins
Definition: Molecules composed of one or more polypeptides.
Polypeptides: Chains of amino acids drawn from a set of 20 standard amino acids.
Functions of Proteins:
Structural support
Storage of amino acids
Transport of substances
Coordination of activities
Response to stimuli
Movement
Disease protection
Acceleration of chemical reactions
Amino Acids
List of standard amino acids with three-letter abbreviations:
Alanine (Ala)
Arginine (Arg)
Aspartic Acid (Asp)
and the remaining standard amino acids, alphabetically through Tyrosine (Tyr) and Valine (Val).
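Sequence listings often use one-letter codes (e.g., D for aspartic acid) instead of three-letter abbreviations. A small lookup table covering the 20 standard amino acids converts between the two; the function name is an illustrative assumption.

```python
# Three-letter -> one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(residues: list[str]) -> str:
    """Convert three-letter residue names (any capitalization) to a one-letter string."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)

print(to_one_letter(["Ala", "Arg", "Asp"]))  # ARD
```

Ambiguity codes such as Asx are deliberately left out; looking one up raises a KeyError rather than guessing.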
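Example Protein Sequence: Hexokinase
Illustrative amino acid sequence (residue number: residue):
1: ASX (Asx: asparagine or aspartic acid, when the two are not distinguished)
2: D
… (sequence continues for the full length of the protein).
Hexokinase catalyzes the first step of glycolysis and is found across a wide range of organisms.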
Hemoglobin
Structure: Made from 4 polypeptides (in adults, two alpha and two beta chains).
Function: Responsible for oxygen transport in red blood cells.
Translation
Ribosomes synthesize proteins from mRNA.
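Structure: mRNA is read as consecutive, non-overlapping three-nucleotide codons; this grouping is the reading frame.
Translation begins at a start codon (usually AUG) and ends at a stop codon (UAA, UAG, or UGA).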
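A minimal sketch of translation, assuming the standard genetic code (some organisms and organelles use slight variants). The 64-codon table is built from a compact string in which codons are ordered UUU, UUC, UUA, UUG, UCU, and so on, with "*" marking stops. Translation starts at the first AUG and stops at the first stop codon; function and variable names are illustrative.

```python
BASES = "UCAG"
# Standard genetic code; index = 16*first + 4*second + third (bases ordered U, C, A, G).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon.

    Returns "" if there is no AUG. Without a stop codon, translation runs to
    the last complete codon. Non-ACGU characters raise KeyError.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGCCGUUAGACCGUUAGCGGACCUGAC"))  # MPLDR
```

Applied to the mRNA from the transcription example, the codons AUG CCG UUA GAC CGU are read before the stop codon UAG, giving Met-Pro-Leu-Asp-Arg.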
Codons and Reading Frames
Example Codon Sequence:
Codon 1: UUU
Codon 2: UUC (UUU and UUC both encode Phenylalanine, an amino acid)
… (64 codons in total make up the genetic code).
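Where reading starts determines which codons are read. The sketch below splits a short, made-up mRNA into complete codons in each of the three forward reading frames; shifting the start by one base changes every codon.

```python
def codons_in_frame(seq: str, frame: int) -> list[str]:
    """Split a sequence into complete codons, starting at offset `frame` (0, 1, or 2)."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

seq = "AUGUUUUUCGA"  # hypothetical example sequence
for frame in range(3):
    print(frame, codons_in_frame(seq, frame))
```

In frame 0 this yields AUG, UUU, UUC (Met, Phe, Phe); in frame 1 the same bases read as UGU, UUU, UCG, and in frame 2 as GUU, UUU, CGA. Incomplete codons at the end are dropped.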
RNA Processing in Eukaryotes
Eukaryotes: Organisms whose cells have membrane-enclosed nuclei (animals, plants, fungi, etc.).
Characteristic: Many genes (and their initial mRNA transcripts) consist of alternating exons and introns:
Exons: Segments retained in the mature mRNA; they contain the coding sequence used in translation.
Introns: Non-coding segments spliced out before translation.
RNA Splicing
Process Diagram:
Gene: Exon1, Intron1, Exon2, Intron2, Exon3
Result after transcription and splicing: Exon1, Exon2, Exon3 (mature mRNA).
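Splicing can be modeled as cutting intron intervals out of a string and joining the exons in order. A minimal sketch, assuming intron positions are already known as hypothetical (start, end) coordinates (0-based, end-exclusive); real splice-site recognition is far more involved. The toy transcript uses uppercase for exons and lowercase for introns purely for readability.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove intron intervals and join the remaining exons in order.

    `introns` holds hypothetical, non-overlapping (start, end) coordinates,
    0-based and end-exclusive.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon preceding this intron
        position = end
    exons.append(pre_mrna[position:])  # final exon
    return "".join(exons)

# Toy transcript: Exon1, Intron1, Exon2, Intron2, Exon3 (6 bases each).
pre_mrna = "AUGGCC" + "guaaag" + "UUUCCA" + "guccag" + "GGAUAA"
print(splice(pre_mrna, [(6, 12), (18, 24)]))  # AUGGCCUUUCCAGGAUAA
```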
Protein Synthesis: Eukaryotes vs Prokaryotes
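In eukaryotes, introns are spliced out in the nucleus before the mRNA is exported to the cytoplasm for translation. Prokaryotes lack a nucleus, so translation can begin while the mRNA is still being transcribed.
Figure: comparison chart of gene expression in eukaryotic and prokaryotic cells.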
Impact of DNA Sequence Variation
Example: variant sequences of Gene A show how a change in the DNA can alter the amino acid produced.
Such genetic variation can in turn affect phenotypes (observable traits).
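The Gene A sequences are not reproduced here, so the sketch below uses toy codons to show the common outcomes of a single-base substitution under the standard genetic code: synonymous (same amino acid), missense (different amino acid), nonsense (a stop codon appears), or stop-loss (a stop codon becomes an amino acid). The function name and category labels are illustrative assumptions.

```python
BASES = "UCAG"
# Standard genetic code; index = 16*first + 4*second + third (bases ordered U, C, A, G).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change at `position` (0, 1, or 2) within one mRNA codon."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "stop-loss"
    return "missense"

print(classify_substitution("UUU", 2, "C"))  # UUU -> UUC, Phe -> Phe: synonymous
print(classify_substitution("UUU", 0, "C"))  # UUU -> CUU, Phe -> Leu: missense
print(classify_substitution("UGG", 2, "A"))  # UGG -> UGA, Trp -> stop: nonsense
```

Because the code is redundant, changes in the third codon position are often synonymous, which is one reason not every DNA change alters a trait.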
RNA Genes
Not all genes produce proteins; some encode RNA products such as:
Ribosomal RNA (rRNA)
Transfer RNA (tRNA)
MicroRNAs (miRNAs)
The Dynamics of Cells
All cells in an organism share essentially the same genome; however, gene expression varies by cell type, time, and environment.
Biochemical interactions form networks, including:
Metabolism
Signaling
Gene regulation
Overview of Metabolic Pathways
Illustration: E. coli metabolic pathways, covering routes for carbohydrates, lipids, amino acids, etc.
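Pathway maps like this are naturally stored as directed graphs, with metabolites as nodes and reactions as edges. A minimal sketch using a tiny fragment of central carbon metabolism (the first steps of glycolysis plus the entry into the pentose phosphate pathway); reaction reversibility and cofactors are deliberately omitted, and breadth-first search finds a route with the fewest reaction steps. The dictionary layout and function name are illustrative assumptions.

```python
from collections import deque

# Tiny, simplified fragment of central metabolism: metabolite -> direct products.
PATHWAY = {
    "glucose": ["glucose-6-phosphate"],
    "glucose-6-phosphate": ["fructose-6-phosphate", "6-phosphogluconolactone"],
    "fructose-6-phosphate": ["fructose-1,6-bisphosphate"],
    "fructose-1,6-bisphosphate": [],
    "6-phosphogluconolactone": [],
}

def shortest_route(graph, source, target):
    """Breadth-first search: return metabolites on a fewest-step route, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through predecessors to rebuild the route.
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

print(shortest_route(PATHWAY, "glucose", "fructose-1,6-bisphosphate"))
```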
Databases in Bioinformatics
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a resource for mining molecular datasets and metabolic pathways.
Various genomic and protein databases listed with specific entries and significant statistics.
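Many sequence databases distribute entries in FASTA format: a header line starting with ">" followed by one or more sequence lines. A minimal parser sketch, assuming well-formed input; blank lines and any text before the first header are ignored, and a repeated header overwrites the earlier record. The example records are hypothetical.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    Each record starts with a '>' header line; the lines that follow, up to
    the next header, are concatenated into one sequence.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records

example = """>seq1 hypothetical DNA record
ATGCCGTTAGAC
CGTTAGCGGACCTGAC
>seq2 hypothetical protein record
MPLDR
"""
for name, sequence in parse_fasta(example).items():
    print(name, len(sequence))
```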
Significance of Genomics Revolution
Data-Driven Biology:
Functional genomics
Comparative genomics
Systems biology
Molecular medicine (gene therapy, pharmacogenomics)
Toxicogenomics
Summary of Bioinformatics Focus
Focus on representation, storage, retrieval, and analysis of biological data:
Including sequences, structures, functions, activity levels, and interactions among biomolecules.
Also encompassing textual data from the scientific literature.