Study Notes on Genomes and Genomics
Chapter 14 – Genomes and Genomics
Chapter 10.4 – Determining the Base Sequence of a DNA Segment
Lecture Course: Genetics, BIOL 3166 Lecture 4
Lecture Outline
Obtaining the sequence of a genome.
Bioinformatics: meaning from genomic sequence.
The structure of the human genome.
Comparative genomics of humans with other species.
Functional genomics.
Importance of Gene Sequencing
The sequence of a gene can be utilized to determine the amino acid sequence of the corresponding protein.
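As a minimal sketch of that step, the snippet below translates a coding DNA sequence with the standard genetic code. The function name, the example sequence, and the assumption that the input is already in frame and written 5' to 3' are illustrative and not taken from the lecture.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(coding_dna):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCCTTTAAATGA"))  # MAFK
```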
Key applications of DNA sequencing include:
Genetic testing
DNA fingerprinting
Characterization of infectious diseases
Paternity testing
Understanding evolutionary processes, such as gene divergence and gene duplication.
DNA Sequencing Techniques
Dideoxy (Sanger) Sequencing
Traditional method of DNA sequencing, often referred to as the classic approach.
Current practices include various next-generation sequencing (NGS) methods utilized for whole-genome sequencing (WGS).
Dideoxynucleotides
Dideoxy (Sanger) sequencing relies on modified nucleotides called dideoxynucleotides (ddNTPs).
A ddNTP lacks the 3'-OH group, so once it has been added to a growing strand, no phosphodiester bond can form with the next nucleotide.
Incorporation of a ddNTP therefore terminates DNA synthesis at that position.
Sequencing Reaction Setup
The sequencing reaction consists of:
DNA template
Sequencing primer
Four dNTPs (dATP, dGTP, dCTP, dTTP)
One ddNTP (for example ddATP; any one of the four ddNTPs can be used in a given mix)
DNA polymerase
Key Outcome: Incorporation of ddATP halts DNA synthesis. Because normal dATP is also present, ddATP is incorporated at random at any position that calls for an A, so the reaction yields a set of fragments of different lengths, each ending in A.
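The outcome of a single ddNTP reaction can be sketched as follows. This is an illustration only: the template is made up, the primer is ignored, and instead of simulating random incorporation the code lists every product that could arise.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesized_strand(template_3to5):
    """Strand built by the polymerase, written 5'->3', for a template given 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)

def termination_fragments(new_strand, dd_base):
    """Every product that ends at a position where the ddNTP could be incorporated."""
    return [new_strand[:i + 1] for i, base in enumerate(new_strand) if base == dd_base]

new_strand = synthesized_strand("TACGTTAGCA")   # hypothetical template
print(new_strand)                               # ATGCAATCGT
print(termination_fragments(new_strand, "A"))   # ['A', 'ATGCA', 'ATGCAA']
```

Each fragment length marks one position of A in the newly synthesized strand.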
Electrophoresis Separation
The obtained DNA fragments can be separated via electrophoresis.
Four separate reactions are run, each with a different ddNTP, so each produces its own set of termination products.
When the four reactions are run side by side on a gel, the sequence is read from bottom to top, because the shortest fragments migrate farthest. The fragments are radioactively labeled so that the bands can be detected on X-ray film.
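Reading a four-lane gel amounts to ordering all bands by fragment length and noting which lane each band is in. The sketch below assumes hypothetical band lengths counted from the first added nucleotide; the data structure and function name are invented for illustration.

```python
def read_gel(lanes):
    """lanes maps each ddNTP base to the fragment lengths observed in its lane."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    # Shortest fragment first: this order gives the new strand from 5' to 3'.
    return "".join(base for _, base in sorted(bands))

lanes = {"A": [1, 5, 6], "T": [2, 7, 10], "G": [3, 9], "C": [4, 8]}
print(read_gel(lanes))  # ATGCAATCGT
```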
Capillary Gel Electrophoresis
The current form of dideoxy sequencing uses capillary gel electrophoresis.
In this technique:
ddNTPs are fluorescently labeled with different colors.
All four ddNTPs can be used in a single reaction.
Reaction products are separated by size within a capillary rather than on a slab gel.
A scanner then detects the fluorescently marked products, generating a chromatogram, where each color/peak indicates a different base.
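A very simplified picture of how a chromatogram becomes a sequence: at each peak, the base whose color channel is strongest is called. The intensities below are invented, and real base-calling software also handles noise, peak spacing, and quality scores, none of which is modeled here.

```python
def call_bases(trace):
    """Call one base per peak by picking the strongest of the four color channels."""
    return "".join(max(peak, key=peak.get) for peak in trace)

# Hypothetical intensities for four consecutive peaks (one dict per peak).
trace = [
    {"A": 0.91, "C": 0.04, "G": 0.03, "T": 0.02},
    {"A": 0.05, "C": 0.02, "G": 0.08, "T": 0.85},
    {"A": 0.03, "C": 0.06, "G": 0.88, "T": 0.03},
    {"A": 0.02, "C": 0.90, "G": 0.05, "T": 0.03},
]
print(call_bases(trace))  # ATGC
```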
Whole Genome Sequencing (WGS)
Overview of WGS Process
Cut many genome copies into random fragments.
Sequence each fragment.
Overlapping sequence reads are assembled into contigs.
Overlapping contigs are joined to give the complete sequence.
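The assembly steps above can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest suffix-to-prefix overlap. This is a teaching sketch under strong assumptions (error-free reads, one strand, no repeats); real assemblers use more robust methods, and the read set and `min_len` parameter are made up.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain; returns the contigs."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # the remaining reads do not overlap, so several contigs are left
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads

print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTGA"]))  # ['ATGCGTACCTGA']
```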
Purpose of Whole Genome Sequencing
Comparative genomics: Compares genomes of related species to gain evolutionary insight and to help define gene function.
Functional genomics: Employs reverse genetic methods to comprehend gene functions and interactions within biological networks.
Major Commercial DNA Sequencing Technologies
| Technology | Sequencing Machine | Read Length (Nucleotides) | Reads per Run | Run Time |
|---|---|---|---|---|
| Conventional (Sanger dideoxy sequencing) | ABI Prism 3730 | 400-900 | 96 | 20 minutes to 3 hours |
| Roche/454 Pyrosequencer | Roche/454 | 400-600 | 1 million | 7 hours |
| Illumina/Solexa HiSeq 2000 | Illumina | 150 × 2 | hundreds of millions | 2 days to 10 days |
| Life Technologies Ion Torrent | Ion Torrent | 200 | 5 million | 1 hour |
| Pacific Biosciences SMRT sequencing | Pacific Biosciences | ~3000 | up to 75,000 | 7 days |
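To compare platforms, the table values can be turned into a rough yield per run (read length × reads per run). The sketch below uses only the rows with explicit numbers; the Illumina row is left out because its read count is given only as "hundreds of millions", and the PacBio figure uses the "up to" maximum.

```python
# (technology, (min read length, max read length), reads per run), copied from the table.
runs = [
    ("Sanger", (400, 900), 96),
    ("Roche/454", (400, 600), 1_000_000),
    ("Ion Torrent", (200, 200), 5_000_000),
    ("PacBio SMRT", (3000, 3000), 75_000),
]

def bases_per_run(read_length_range, reads):
    """Approximate lower and upper bound on bases sequenced in one run."""
    shortest, longest = read_length_range
    return shortest * reads, longest * reads

for name, read_length_range, reads in runs:
    low, high = bases_per_run(read_length_range, reads)
    print(f"{name}: {low:,} to {high:,} bases per run")
```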
Next-Gen Sequencing (NGS) Technologies
Various NGS technologies utilize specific procedures for DNA sequencing, such as:
Ligating adaptors to DNA,
Amplifying DNA in emulsion,
Employing computing resources for processing and analysis.
Detailed technical processes vary significantly across methods, but generally include:
Single-stranded DNA immobilization,
Random oligonucleotide techniques,
Target-specific amplification, among others.
Bioinformatics
Bioinformatics emerged in the 1960s due to the need for computational tools to manage and analyze biological data, particularly DNA sequences.
Primary functions of bioinformatics include:
Accessing and processing data,
Storing and sharing information,
Visualizing and annotating genomic data.
Identifying Features in the Genome
Important DNA features to identify in any genome include:
Regulatory elements (sites where proteins bind to DNA).
Transcription elements, such as promoters.
Ribosome and tRNA binding sites along the mRNA.
Splice sites at the boundaries between introns and exons.
The goal is to understand complex genomic interactions and the specific roles of genes within a broader biological context.
Gene Identification Approaches
Genes are identified using conserved sequence features and expression evidence, such as:
Ribosome binding sites,
TATA boxes,
The transcriptome, i.e. a comprehensive analysis of mRNA sequences.
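Searching for a conserved motif such as a TATA box can be sketched with a regular expression. The pattern below (TATA, then A or T, then A, then A or T) is a commonly quoted consensus used here as an assumption; real promoter prediction relies on more than a single motif match, and the example sequence is made up.

```python
import re

# Assumed consensus: TATA, then A or T, then A, then A or T.
TATA_BOX = re.compile(r"TATA[AT]A[AT]")

def find_tata_boxes(dna):
    """Return the 0-based start positions of non-overlapping TATA-box-like motifs."""
    return [match.start() for match in TATA_BOX.finditer(dna.upper())]

print(find_tata_boxes("GGCCTATAAAAGGCGC"))  # [4]
```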
Proteomic analyses, such as mass spectrometry, can connect proteins' amino acid sequences back to their respective gene sequences.
Annotation of Genomic Data
Annotation is the identification of functional elements within a genome, including:
Open Reading Frames (ORFs) and sites for protein and RNA binding.
Expressed Sequence Tags (ESTs), which correspond to mRNAs and reveal which regions are transcribed.
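Finding ORFs can be sketched as a scan of each reading frame for an ATG followed by an in-frame stop codon. This minimal version checks only the three forward frames and ignores introns; the `min_codons` threshold and the example sequence are arbitrary choices for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for forward-strand ORFs; 0-based, end exclusive."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                   # look for the next ORF in this frame
    return orfs

print(find_orfs("CCATGGCCTTTTGACC"))  # [(2, 14, 'ATGGCCTTTTGA')]
```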
Utilizing Databases and Tools
Researchers utilize online tools like BLAST (Basic Local Alignment Search Tool) for matching and analyzing DNA and protein sequences, with available search types such as:
Nucleotide BLAST (BLASTN): Matches nucleotide sequences.
Protein BLAST (BLASTP): Matches amino acid sequences.
Translated BLAST (BLASTX): Translates a nucleotide query in all six reading frames and matches the resulting protein sequences against a protein database.
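BLAST is a heuristic that begins by finding short exact word matches between query and database sequences and then extends them into alignments. The sketch below shows only that first seeding idea on two made-up sequences; it is not the BLAST implementation, and the function name and word size are illustrative.

```python
def find_seeds(query, subject, word_size=4):
    """Return (query_pos, subject_pos) pairs where a word of word_size matches exactly."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds

print(find_seeds("GATTACA", "TTGATTAC"))  # [(0, 2), (1, 3), (2, 4)]
```

The three seeds share the same offset (subject position minus query position equals 2), which is the kind of signal an extension step would turn into one longer alignment.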
Learning about the Human Genome
The Human Genome Project was a fundamental initiative launched in October 1990 and completed in April 2003, culminating in the complete sequence of the human genome.
It revealed the human genetic blueprint, enhancing the understanding of human biology and healthcare practices.
Findings from the Human Genome Project
Key findings include:
Approximately 20,500 protein-coding genes in the human genome.
About 45% of the genome consists of repetitive sequences, predominantly transposons.
Less than 3% of the genome consists of exons; an exon averages ~150 base pairs, and most mRNAs contain around 10 exons.
Introns can range from 1,000 to 100,000 base pairs, significantly larger than exons.
There are roughly 19,000 pseudogenes present.
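A quick back-of-the-envelope check shows how the gene and exon figures fit together. The haploid genome size of about 3.2 billion base pairs is an assumption not stated in these notes, and the calculation treats every gene as having exactly 10 exons of 150 bp.

```python
genes = 20_500            # protein-coding genes (from the findings above)
exons_per_gene = 10       # typical number of exons per mRNA
bp_per_exon = 150         # average exon length in base pairs
genome_bp = 3_200_000_000 # assumed haploid genome size, not from the notes

exon_bp = genes * exons_per_gene * bp_per_exon
exon_fraction = exon_bp / genome_bp

print(f"{exon_bp:,} bp in exons")             # 30,750,000 bp in exons
print(f"{exon_fraction:.2%} of the genome")   # 0.96% of the genome
```

Under these assumptions exons make up roughly 1% of the genome, in line with the "less than 3%" figure.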
Comparative Genomics
Comparative genomics aids in understanding evolutionary relationships by examining the genomes of closely related and distantly related species.
Homologs are genes related by descent from a common ancestral gene and can be categorized into:
Orthologs: Genes in different species derived from a common ancestor.
Paralogs: Genes within the same genome resulting from gene duplication.
Non-Coding Region Conservation
Non-coding regions are generally less conserved between species than coding regions, because most changes in them have little effect on fitness, but some non-coding sequences are conserved because they serve crucial regulatory functions.
Functional Genomics Approach
The field of functional genomics investigates the following:
The function of specific genes
Timing and location of gene expression
Interaction of gene products in biological systems
Reverse Genetics
Reverse genetics starts with identifying a gene, followed by experimentation to:
Mutate, knockout, or over-express the gene
Analyze resulting phenotypic changes.
Genome Editing Techniques
Gene Knockout (KO) eliminates gene expression by creating insertion/deletion mutations (indels) in the DNA.
Gene Knockdown (KD) reduces gene expression by targeting the mRNA directly, using methods like RNAi.
Techniques for knocking out or editing a gene include CRISPR-Cas9 and TALENs, both of which enable precise genomic alterations.
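Why a small indel can knock out a gene is easiest to see with a frameshift: deleting a single base changes every codon downstream of it. The sketch below uses a made-up coding sequence and a simple translation helper; it illustrates the principle and does not model any particular editing tool.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

def translate(coding_dna):
    """Translate in frame from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

wild_type = "ATGAAATGGCATTACGGATAA"          # hypothetical coding sequence
knockout = wild_type[:3] + wild_type[4:]     # 1-bp deletion right after the start codon

print(translate(wild_type))  # MKWHYG
print(translate(knockout))   # MNGITD
```

After the deletion, every amino acid following the start codon differs from the wild type, and the original stop codon is no longer read in frame.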
Summary
Review practical exercises related to gene sequencing and genomics to enhance understanding.
Engage with assignments and quiz material available on educational platforms for better grasp of genomic concepts.