Study Notes on Genomes and Genomics

Chapter 14 – Genomes and Genomics

Chapter 10.4 – Determining the Base Sequence of a DNA Segment

Lecture Course: Genetics, BIOL 3166 Lecture 4


Lecture Outline

  1. Obtaining the sequence of a genome.

  2. Bioinformatics: meaning from genomic sequence.

  3. The structure of the human genome.

  4. Comparative genomics of humans with other species.

  5. Functional genomics.


Importance of Gene Sequencing

  • The sequence of a gene can be utilized to determine the amino acid sequence of the corresponding protein.

  • Key applications of DNA sequencing include:

    • Genetic testing

    • DNA fingerprinting

    • Characterization of infectious diseases

    • Paternity testing

    • Understanding evolutionary processes, such as gene divergence and gene duplication.
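
As a minimal sketch of the first point above, a coding DNA sequence can be translated into amino acids with the standard genetic code. The function name `translate` and the example sequence are illustrative assumptions, not part of the lecture.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(coding_dna: str) -> str:
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        aa = CODON_TABLE[coding_dna[i:i + 3].upper()]
        if aa == "*":  # "*" marks a stop codon
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCCAAGTGA"))  # hypothetical sequence: Met-Ala-Lys-stop
```

The sketch assumes the sequence starts in frame at the first codon and contains only A, C, G, and T.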


DNA Sequencing Techniques

Dideoxy (Sanger) Sequencing

  • Traditional method of DNA sequencing, often referred to as the classic approach.

  • Current practices include various next-generation sequencing (NGS) methods utilized for whole-genome sequencing (WGS).


Dideoxynucleotides

  • Dideoxy (Sanger) sequencing relies on modified nucleotides called dideoxynucleotides (ddNTPs).

  • ddNTPs lack the 3′ hydroxyl group, so once one is incorporated, no phosphodiester bond can form with the next nucleotide.

  • As a result, DNA synthesis terminates wherever a ddNTP is added to the growing strand.


Sequencing Reaction Setup

  • The sequencing reaction consists of:

    • DNA template

    • Sequencing primer

    • Four dNTPs (dATP, dGTP, dCTP, dTTP)

    • One ddNTP (ddATP in this example; any one of the four ddNTPs can be used in the mix)

    • DNA polymerase

  • Key outcome: Incorporation of ddATP halts DNA synthesis. Because normal dATP is also present, ddATP is incorporated at random at any given A position, so the reaction yields a set of fragments of different lengths, each ending at an A.
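
The key outcome above can be sketched as a small calculation: a ddATP reaction can terminate at every position where the newly synthesized strand has an A. The function name and the example strand are hypothetical.

```python
def termination_lengths(new_strand: str, dd_base: str) -> list[int]:
    """Fragment lengths (counted from the end of the primer) at which
    the given ddNTP could be incorporated and stop synthesis."""
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]

# Hypothetical sequence of the newly synthesized strand, 5' to 3'.
new_strand = "GATTACAGA"
print(termination_lengths(new_strand, "A"))  # [2, 5, 7, 9]
```

Each length corresponds to one band in the ddATP reaction; the primer length is left out for simplicity.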


Electrophoresis Separation

  • The obtained DNA fragments can be separated via electrophoresis.

  • Four different reactions can be conducted, each with a unique ddNTP producing distinct termination products.

  • When run on a gel, the sequence is read from the bottom (shortest fragments) to the top (longest fragments), which gives the newly synthesized strand 5′ to 3′. The ddNTPs are radioactively labeled for detection on X-ray film.
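
A rough sketch of how the four reactions combine: each lane holds the fragment lengths produced by one ddNTP, and ordering all bands from shortest to longest recovers the sequence. The function names and the example strand are assumptions made for illustration.

```python
def run_reactions(new_strand: str) -> dict[str, list[int]]:
    """Map each ddNTP reaction (one gel lane) to its fragment lengths."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from the shortest fragment to the longest."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))

lanes = run_reactions("GATTACAGA")  # hypothetical newly synthesized strand
print(lanes)
print(read_gel(lanes))
```

Sorting by length stands in for electrophoresis, where shorter fragments migrate farther through the gel.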


Capillary Gel Electrophoresis

  • The current form of dideoxy sequencing uses capillary gel electrophoresis.

  • In this technique:

    • ddNTPs are fluorescently labeled with different colors.

    • All four ddNTPs can be used in a single reaction.

    • Reaction products are separated by size within a gel-filled capillary rather than on a slab gel.

    • A scanner then detects the fluorescently marked products, generating a chromatogram, where each color/peak indicates a different base.
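
A simplified sketch of reading a chromatogram: at each peak, the base is called from the dye channel with the strongest signal. The intensity values are invented for illustration, and real base callers also handle noise, peak spacing, and quality scores.

```python
# Hypothetical peak intensities: one dict per peak, one value per dye channel.
peaks = [
    {"A": 0.04, "C": 0.02, "G": 0.91, "T": 0.03},
    {"A": 0.88, "C": 0.05, "G": 0.04, "T": 0.03},
    {"A": 0.02, "C": 0.03, "G": 0.05, "T": 0.90},
    {"A": 0.10, "C": 0.80, "G": 0.06, "T": 0.04},
]

def call_bases(peaks: list[dict[str, float]]) -> str:
    """Call each peak as the base whose dye channel has the highest signal."""
    return "".join(max(peak, key=peak.get) for peak in peaks)

print(call_bases(peaks))  # GATC
```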


Whole Genome Sequencing (WGS)

Overview of WGS Process

  1. Cut many genome copies into random fragments.

  2. Sequence each fragment.

  3. Assemble overlapping sequence reads into contigs.

  4. Join overlapping contigs into a complete sequence.
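
The assembly steps above can be sketched with a greedy overlap merge. This is a toy illustration under the assumption of error-free reads from one strand, not how production assemblers work; the function names and reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:  # no remaining overlaps: what is left are the contigs
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

reads = ["ATGGCGT", "GCGTACA", "ACATTGC"]  # hypothetical overlapping reads
print(greedy_assemble(reads))  # ['ATGGCGTACATTGC']
```

Reads that share no overlap of at least `min_len` bases remain as separate contigs.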

Purpose of Whole Genome Sequencing

  • Comparative genomics: Compares the genomes of related species to gain evolutionary insight and to help define gene function.

  • Functional genomics: Employs reverse genetic methods to comprehend gene functions and interactions within biological networks.


Major Commercial DNA Sequencing Technologies

Technology                                Sequencing Machine   Read Length (Nucleotides)  Reads per Run         Run Time
Conventional (Sanger dideoxy sequencing)  ABI Prism 3730       400-900                    96                    20 minutes to 3 hours
Roche/454 Pyrosequencer                   Roche/454            400-600                    1 million             7 hours
Illumina/Solexa HiSeq 2000                Illumina             150 × 2                    hundreds of millions  2 days to 10 days
Life Technologies Ion Torrent             Ion Torrent          200                        5 million             1 hour
Pacific Biosciences SMRT sequencing       Pacific Biosciences  ~3000                      up to 75,000          7 days


Next-Gen Sequencing (NGS) Technologies

  • Various NGS technologies utilize specific procedures for DNA sequencing, such as:

    • Ligating adaptors to DNA,

    • Amplifying DNA in emulsion,

    • Employing computing resources for processing and analysis.

  • Detailed technical processes vary significantly across methods, but generally include:

    • Single-stranded DNA immobilization,

    • Random oligonucleotide techniques,

    • Target-specific amplification, among others.


Bioinformatics

  • Bioinformatics emerged in the 1960s due to the need for computational tools to manage and analyze biological data, particularly DNA sequences.

  • Primary functions of bioinformatics include:

    • Accessing and processing data,

    • Storing and sharing information,

    • Visualizing and annotating genomic data.


Identifying Features in the Genome

  • Important DNA features to identify in any genome include:

    • Regulatory elements (where proteins bind to DNA).

    • Transcription elements, such as promoters and other regulatory sequences.

    • Ribosome and tRNA binding sites along the mRNA.

    • Splice sites at intron-exon boundaries.

  • The goal is to understand complex genomic interactions and the specific roles of genes within a broader biological context.

Gene Identification Approaches

  • Genes are identified using conserved sequence features and expression evidence, such as:

    • Ribosome binding sites,

    • TATA boxes,

    • The transcriptome (comprehensive analysis of mRNA sequences).

  • Proteomic analyses, such as mass spectrometry, can connect proteins' amino acid sequences back to their respective gene sequences.
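
As a rough illustration of locating a conserved feature, the sketch below scans a sequence for a TATA box. The pattern `TATA[AT]A[AT]` is a simplified consensus assumed here, and the example sequence is invented; real promoter prediction uses more flexible models.

```python
import re

# Simplified TATA box consensus (an assumption for this sketch).
TATA_BOX = re.compile(r"TATA[AT]A[AT]")

def find_tata_boxes(dna: str) -> list[int]:
    """Return 0-based start positions of non-overlapping TATA box matches."""
    return [m.start() for m in TATA_BOX.finditer(dna.upper())]

seq = "GGCGCTATAAAAGGCTGCATGACC"  # hypothetical promoter region
print(find_tata_boxes(seq))  # [5]
```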


Annotation of Genomic Data

  • Annotation is the identification of functional elements within a genome, including:

    • Open Reading Frames (ORFs) and sites for protein and RNA binding.

    • Expressed Sequence Tags (ESTs), which correspond to mRNAs and reveal which regions are transcribed.
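
A minimal sketch of ORF detection, one step in annotation: scan each forward reading frame for an ATG followed by an in-frame stop codon. It assumes the forward strand only (a fuller search would also scan the reverse complement), and the names and sequence are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) slices of forward-strand ORFs, stop codon included."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in this frame
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

seq = "CCATGGCTAAATGATTT"  # hypothetical sequence
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])
```

Here `min_codons` counts the codons before the stop, so very short ATG-to-stop runs can be filtered out.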

Utilizing Databases and Tools

  • Researchers utilize online tools like BLAST (Basic Local Alignment Search Tool) for matching and analyzing DNA and protein sequences, with available search types such as:

    • Nucleotide BLAST (BLASTN): Matches nucleotide sequences.

    • Protein BLAST (BLASTP): Matches amino acid sequences.

    • Translated BLAST (BLASTX): Translates a nucleotide query in all six reading frames and matches the resulting protein sequences against a protein database.
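
To show what "local alignment" means, the sketch below computes a Smith-Waterman score, the exact dynamic-programming method that BLAST approximates with a faster seed-and-extend heuristic. This is not BLAST itself, and the scoring values are arbitrary assumptions.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below 0, so an alignment can start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

# Hypothetical sequences sharing the local region "ACGT".
print(local_alignment_score("TTACGTTT", "GGACGTGG"))  # 8
```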


Learning about the Human Genome

  • The Human Genome Project was a fundamental initiative launched in October 1990 and completed in April 2003, culminating in the complete sequence of the human genome.

    • It revealed the human genetic blueprint, enhancing the understanding of human biology and healthcare practices.

Findings from the Human Genome Project

  • Key findings include:

    • Approximately 20,500 protein-coding genes in the human genome.

    • About 45% of the genome consists of repetitive sequences, predominantly transposons.

    • Under 3% of the genome encodes exons of genes; exons average ~150 base pairs, and most mRNAs contain around 10 exons.

    • Introns can range from 1,000 to 100,000 base pairs, significantly larger than exons.

    • There are roughly 19,000 pseudogenes present.
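
A quick back-of-the-envelope check shows that the gene count and exon figures above are consistent with exons making up under 3% of the genome. The genome size of about 3.2 billion base pairs is an assumption added here; it is not stated in these notes.

```python
genes = 20_500            # protein-coding genes (from the notes)
exons_per_mrna = 10       # typical exon count (from the notes)
bp_per_exon = 150         # average exon length (from the notes)
genome_size_bp = 3.2e9    # assumed haploid genome size, not from the notes

exonic_bp = genes * exons_per_mrna * bp_per_exon
fraction = exonic_bp / genome_size_bp
print(f"{exonic_bp:,} bp of exons, about {fraction:.1%} of the genome")
```

The estimate comes to roughly 31 million base pairs, or about 1% of the genome.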


Comparative Genomics

  • Comparative genomics aids in understanding evolutionary relationships by examining the genomes of closely related and distantly related species.

  • Homologs are genes that descend from a common ancestral gene and can be categorized into:

    • Orthologs: Genes in different species derived from a single gene in their common ancestor.

    • Paralogs: Genes within the same genome resulting from gene duplication.

Non-Coding Region Conservation

  • Non-coding regions are generally less conserved between species than coding regions, but some non-coding sequences are conserved because they serve crucial regulatory functions.
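
One simple way to quantify conservation is percent identity between two aligned sequences. The sketch below assumes the sequences are already aligned, with "-" marking gaps; the function name and example sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned, non-gap positions where the two sequences match."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100 * matches / len(pairs)

# Hypothetical aligned region from two species.
print(percent_identity("ATGGCCAAGT", "ATGGCTAAGT"))  # 90.0
```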


Functional Genomics Approach

  • The field of functional genomics investigates the following:

    • The function of specific genes

    • Timing and location of gene expression

    • Interaction of gene products in biological systems

Reverse Genetics

  • Reverse genetics starts with identifying a gene, followed by experimentation to:

    • Mutate, knockout, or over-express the gene

    • Analyze resulting phenotypic changes.

Genome Editing Techniques

  • Gene Knockout (KO) eliminates gene expression by creating insertion/deletion mutations (InDels) in DNA.

  • Gene Knockdown (KD) reduces gene expression by targeting mRNA directly, using methods like RNAi.

  • Techniques for knocking out or editing a gene include CRISPR-Cas9 and TALENs, both of which enable precise genomic alterations.
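
As a hedged sketch of how a CRISPR-Cas9 target might be chosen, the code below lists 20-nucleotide sites followed by an NGG PAM on the forward strand. The 20-nt length and NGG PAM are assumptions based on the commonly used SpCas9 enzyme, which these notes do not specify; the function name and sequence are hypothetical.

```python
import re

# 20-nt target followed by an NGG PAM; the lookahead lets overlapping sites match.
SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")

def find_cas9_targets(dna: str) -> list[tuple[int, str, str]]:
    """Return (start, target, PAM) for candidate sites on the forward strand."""
    return [(m.start(), m.group(1), m.group(2)) for m in SITE.finditer(dna.upper())]

seq = "ACGTACGTACGTACGTACGTAGGTT"  # hypothetical sequence
for start, target, pam in find_cas9_targets(seq):
    print(start, target, pam)
```

A fuller search would also scan the reverse complement and screen candidates for off-target matches elsewhere in the genome.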


Summary

  1. Review practical exercises related to gene sequencing and genomics to enhance understanding.

  2. Engage with assignments and quiz material available on educational platforms for better grasp of genomic concepts.