Study Notes on Genomes and Genomics

Chapter 14 – Genomes and Genomics

Chapter 10.4 – Determining the Base Sequence of a DNA Segment

Lecture Course: Genetics, BIOL 3166 Lecture 4

Lecture Outline

Obtaining the sequence of a genome.
Bioinformatics: meaning from genomic sequence.
The structure of the human genome.
Comparative genomics of humans with other species.
Functional genomics.

Importance of Gene Sequencing

The sequence of a gene can be utilized to determine the amino acid sequence of the corresponding protein.
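As a minimal sketch of how a coding sequence maps to an amino acid sequence, the snippet below translates DNA codon by codon using the standard genetic code. The function name `translate` and the example sequence are illustrative assumptions, and the input is assumed to be an intron-free coding strand that begins in frame.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna):
    """Translate an in-frame coding strand (hypothetical helper), stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3].upper()]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAGTAA"))  # ATG GCC AAG TAA
```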
Key applications of DNA sequencing include:
- Genetic testing
- DNA fingerprinting
- Characterization of infectious diseases
- Paternity testing
- Understanding evolutionary processes, such as gene divergence and gene duplication.

DNA Sequencing Techniques

Dideoxy (Sanger) Sequencing

Dideoxy (Sanger) sequencing is the traditional, classic method of DNA sequencing.
Whole-genome sequencing (WGS) now relies mostly on various next-generation sequencing (NGS) methods.

Dideoxynucleotides

Dideoxy (Sanger) sequencing relies on modified nucleotides called dideoxynucleotides (ddNTPs).
A ddNTP lacks the 3'-OH group, so once it is incorporated, no phosphodiester bond can form with the next nucleotide; that bond is required for normal DNA synthesis.
As a result, incorporation of a ddNTP terminates DNA synthesis at that position.

Sequencing Reaction Setup

The sequencing reaction consists of:
- DNA template
- Sequencing primer
- Four dNTPs (dATP, dGTP, dCTP, dTTP)
- One ddNTP (for example ddATP; any one of the four ddNTPs can be used in the mix)
- DNA polymerase
Key Outcome: Because normal dATP is also present, ddATP is incorporated only at random in place of dATP. Each incorporation halts DNA synthesis, so the reaction yields a set of fragments of different lengths, each ending at an A.
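The outcome of a single reaction can be sketched as follows. This is an illustration under simplifying assumptions, not a model of the chemistry: it lists every fragment that could arise when synthesis stops at each occurrence of the chosen base. The function name `termination_fragments` and the example strand are hypothetical.

```python
def termination_fragments(new_strand, dd_base):
    """Return every possible terminated fragment of the newly synthesized strand.

    A fragment ends wherever the ddNTP for `dd_base` could have been
    incorporated instead of the normal dNTP.
    """
    return [
        new_strand[:i + 1]
        for i, base in enumerate(new_strand)
        if base == dd_base
    ]


# Newly synthesized strand (5' to 3'), reaction containing ddATP.
fragments = termination_fragments("GATTACA", "A")
print(fragments)                    # fragments ending at each A
print([len(f) for f in fragments])  # their lengths
```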

Electrophoresis Separation

The resulting DNA fragments are separated by size using gel electrophoresis.
Four separate reactions are run, each with a different ddNTP, so each reaction produces fragments that end at a different base.
When run on a gel, the sequence is read from bottom to top, because the shortest fragments migrate farthest. The fragments are radioactively labeled so that the bands can be detected on X-ray film.
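Reading the four lanes amounts to ordering the bands from the shortest fragment to the longest and noting which lane each band sits in. A minimal sketch, assuming the band positions have already been converted to fragment lengths (the `lanes` data and the name `read_gel` are made up for illustration):

```python
def read_gel(lanes):
    """Reconstruct the synthesized strand from fragment lengths per ddNTP lane.

    `lanes` maps a base to the lengths of the fragments seen in that lane.
    """
    bands = [
        (length, base)
        for base, lengths in lanes.items()
        for length in lengths
    ]
    bands.sort()  # shortest fragment first
    return "".join(base for _, base in bands)


lanes = {"A": [2, 5, 7], "C": [6], "G": [1], "T": [3, 4]}
print(read_gel(lanes))
```

The result is the newly synthesized strand, which is complementary to the template.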

Capillary Gel Electrophoresis

The current form of dideoxy sequencing uses capillary gel electrophoresis.
In this technique:
- ddNTPs are fluorescently labeled with different colors.
- All four ddNTPs can be used in a single reaction.
- Reaction products are separated by size within a thin capillary rather than on a slab gel.
- A scanner then detects the fluorescently marked products, generating a chromatogram, where each color/peak indicates a different base.
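Base calling from a chromatogram can be sketched very simply: at each peak, the base is the one whose dye channel gives the strongest signal. Real base callers also model peak shape and assign quality scores; the intensity values and the name `call_bases` below are invented for illustration.

```python
def call_bases(peaks, channels=("A", "C", "G", "T")):
    """Call one base per peak by picking the brightest dye channel."""
    calls = []
    for intensities in peaks:
        strongest = max(range(len(channels)), key=lambda k: intensities[k])
        calls.append(channels[strongest])
    return "".join(calls)


# Each tuple holds hypothetical intensities for the A, C, G and T channels at one peak.
peaks = [
    (5, 2, 90, 3),
    (88, 4, 6, 2),
    (3, 5, 2, 95),
    (4, 2, 3, 91),
    (92, 3, 5, 4),
    (2, 85, 4, 6),
    (90, 5, 3, 2),
]
print(call_bases(peaks))
```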

Whole Genome Sequencing (WGS)

Overview of WGS Process

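Cut many copies of the genome into random, overlapping fragments.
Sequence each fragment to produce sequence reads.
Assemble overlapping sequence reads into longer contiguous sequences (contigs).
Join overlapping contigs to obtain the complete sequence.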
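The assembly steps can be illustrated with a small greedy overlap merger. This is a teaching sketch, not how production assemblers work: it assumes error-free reads from one strand, and the names `overlap`, `greedy_assemble` and the `min_len` threshold are assumptions made for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [
            c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)
        ] + [merged]
    return contigs


reads = ["ATGGCGT", "GCGTACC", "TACCGTA"]
print(greedy_assemble(reads))
```

If some reads share no overlap of at least `min_len` bases, the function returns several contigs instead of one sequence.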

Purpose of Whole Genome Sequencing

Comparative genomics: Analyzes genomes from related species to provide evolutionary insights and to help define gene function.
Functional genomics: Employs reverse genetic methods to comprehend gene functions and interactions within biological networks.

Major Commercial DNA Sequencing Technologies

Technology	Sequencing Machine	Read Length (Nucleotides)	Reads per Run	Run Time
Conventional (Sanger dideoxy sequencing)	ABI Prism 3730	400-900	96	20 minutes to 3 hours
Roche/454 Pyrosequencer	Roche/454	400-600	1 million	7 hours
Illumina/Solexa HiSeq 2000	Illumina	150 × 2	hundreds of millions	2 days to 10 days
Life Technologies Ion Torrent	Ion Torrent	200	5 million	1 hour
Pacific Biosciences SMRT sequencing	Pacific Biosciences	~3000	up to 75,000	7 days

Next-Gen Sequencing (NGS) Technologies

Various NGS technologies utilize specific procedures for DNA sequencing, such as:
- Ligating adaptors to DNA,
- Amplifying DNA in emulsion,
- Employing computing resources for processing and analysis.
Detailed technical processes vary significantly across methods, but generally include:
- Single-stranded DNA immobilization,
- Random oligonucleotide techniques,
- Target-specific amplification, among others.

Bioinformatics

Bioinformatics emerged in the 1960s due to the need for computational tools to manage and analyze biological data, particularly DNA sequences.
Primary functions of bioinformatics include:
- Accessing and processing data,
- Storing and sharing information,
- Visualizing and annotating genomic data.

Identifying Features in the Genome

Important DNA features to identify in any genome include:
- Regulatory elements (sites where proteins bind to DNA).
- Transcription elements, such as promoters.
- Ribosome binding sites and tRNA binding sites (codons) along the mRNA.
- Splice sites at the boundaries between introns and exons.
The goal is to understand complex genomic interactions and the specific roles of genes within a broader biological context.

Gene Identification Approaches

Genes are identified from conserved sequence features and from expression evidence, such as:
- Ribosome binding sites,
- TATA boxes,
- The transcriptome, that is, a comprehensive analysis of mRNA sequences.
Proteomic analyses, such as mass spectrometry, can connect proteins' amino acid sequences back to their respective gene sequences.
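Searching for a conserved sequence feature such as a TATA box can be sketched as a pattern scan. The consensus used here, TATA(A/T)A(A/T), is a commonly quoted form and is an assumption of this example, as is the function name `find_tata_boxes`; real promoter prediction weighs many signals together.

```python
import re

# Commonly quoted TATA box consensus: TATA, then A or T, then A, then A or T.
# The lookahead lets overlapping matches be reported.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_boxes(dna):
    """Return (position, matched sequence) for each candidate TATA box."""
    return [(m.start(), m.group(1)) for m in TATA_PATTERN.finditer(dna.upper())]


print(find_tata_boxes("GGCTATAAAAGGC"))
```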

Annotation of Genomic Data

Annotation is the identification of functional elements within a genome, including:
- Open Reading Frames (ORFs), and sites for protein and RNA binding.
- Expressed Sequence Tags (ESTs) that correspond to mRNAs, revealing which regions are transcribed.
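Finding ORFs can be sketched as a scan of each reading frame for a start codon followed in frame by a stop codon. The version below is a simplification: it checks only the three forward frames, ignores introns, and uses a hypothetical `min_codons` cutoff; the name `find_orfs` and the example sequence are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for ORFs in the three forward frames.

    `end` is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATGGTAGCC"))
```

A fuller annotation would also scan the reverse complement to cover all six reading frames.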

Utilizing Databases and Tools

Researchers utilize online tools like BLAST (Basic Local Alignment Search Tool) for matching and analyzing DNA and protein sequences, with available search types such as:
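- Nucleotide BLAST (BLASTN): Compares a nucleotide query against nucleotide sequences.
- Protein BLAST (BLASTP): Compares an amino acid query against protein sequences.
- Translated BLAST (BLASTX): Translates a nucleotide query in all six reading frames and compares the translations against protein sequences.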
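BLAST's heuristic is considerably more elaborate than anything shown here, but the idea it starts from, finding short exact word matches ("seeds") shared by the query and a database sequence, can be sketched in a few lines. The function name `shared_words` and the word size of 4 are assumptions chosen for a small example; this is not the BLAST program or its API.

```python
def shared_words(query, subject, word_size=4):
    """Return (query_pos, subject_pos) for every exact word shared by both sequences."""
    # Index every word of the subject by its start position.
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)

    # Look up each query word in the index.
    hits = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            hits.append((i, j))
    return hits


print(shared_words("GATTACA", "CCGATTAGG"))
```

Hits that lie on the same diagonal (constant difference between the two positions) are candidates for extension into a longer local alignment.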

Learning about the Human Genome

The Human Genome Project was a fundamental initiative launched in October 1990 and completed in April 2003, culminating in the complete sequence of the human genome.
- It revealed the human genetic blueprint, enhancing the understanding of human biology and healthcare practices.

Findings from the Human Genome Project

Key findings include:
- Approximately 20,500 protein-coding genes in the human genome.
- About 45% of the genome consists of repetitive sequences, predominantly transposons.
- Under 3% of the genome consists of exons, which average ~150 base pairs each; most mRNAs contain around 10 exons.
- Introns can range from 1,000 to 100,000 base pairs, significantly larger than exons.
- There are roughly 19,000 pseudogenes present.
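The exon figures can be checked with rough arithmetic. The gene count, exon size and exon number come from the findings above; the genome size of about 3.2 billion base pairs is an assumption added for this calculation and is not stated in these notes.

```python
GENES = 20_500              # protein-coding genes
EXONS_PER_MRNA = 10         # typical exon count per mRNA
BP_PER_EXON = 150           # average exon length in base pairs
GENOME_BP = 3_200_000_000   # ASSUMPTION: approximate haploid genome size

exon_bp_per_gene = EXONS_PER_MRNA * BP_PER_EXON
total_exon_bp = GENES * exon_bp_per_gene
exon_fraction = total_exon_bp / GENOME_BP

print(f"Exon sequence per gene: {exon_bp_per_gene} bp")
print(f"Total exon sequence: {total_exon_bp:,} bp")
print(f"Share of genome: {exon_fraction:.1%}")
```

Under these assumptions exons come to roughly 1% of the genome, consistent with the "under 3%" figure.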

Comparative Genomics

Comparative genomics aids in understanding evolutionary relationships by examining the genomes of closely related and distantly related species.
Homologs are genes related by descent from a common ancestral gene and can be categorized into:
- Orthologs: Genes in different species derived from a common ancestor through speciation.
- Paralogs: Genes within the same genome resulting from gene duplication.

Non-Coding Region Conservation

Non-coding regions are generally less conserved between species than coding regions, but some non-coding sequences are conserved, which suggests that they affect fitness, for example through regulatory functions.

Functional Genomics Approach

The field of functional genomics investigates the following:
- The function of specific genes
- Timing and location of gene expression
- Interaction of gene products in biological systems

Reverse Genetics

Reverse genetics starts with identifying a gene, followed by experimentation to:
- Mutate, knockout, or over-express the gene
- Analyze resulting phenotypic changes.

Genome Editing Techniques

Gene Knockout (KO) eliminates gene expression by creating insertion/deletion mutations (indels) in the DNA.
Gene Knockdown (KD) reduces gene expression by targeting the mRNA directly, using methods like RNAi.
Techniques for knocking out or editing a gene include CRISPR-Cas9 and TALENs, both of which enable precise genomic alterations.
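As a rough illustration of how a CRISPR-Cas9 target site is chosen, the sketch below lists candidate sites on one strand. It assumes the commonly used Streptococcus pyogenes Cas9, which needs an NGG motif (the PAM) immediately 3' of a 20-nucleotide target; those details are not given in these notes. The name `find_cas9_targets` and the example sequence are hypothetical, and real guide design also checks the opposite strand and off-target matches.

```python
import re

# NGG PAM; the lookahead allows overlapping PAMs to be found.
PAM_PATTERN = re.compile(r"(?=([ACGT]GG))")


def find_cas9_targets(dna, guide_len=20):
    """Return (target, pam, pam_position) for forward-strand candidate sites."""
    dna = dna.upper()
    targets = []
    for match in PAM_PATTERN.finditer(dna):
        pam_start = match.start()
        if pam_start >= guide_len:  # need a full-length target 5' of the PAM
            target = dna[pam_start - guide_len:pam_start]
            targets.append((target, match.group(1), pam_start))
    return targets


print(find_cas9_targets("ACGTACGTACGTACGTACGTTGGAC"))
```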

Summary

Review practical exercises related to gene sequencing and genomics to enhance understanding.
Engage with assignments and quiz material available on educational platforms for better grasp of genomic concepts.