Genome Sequencing & HGP Notes

  • Definition: Whole genome sequencing (WGS) determines the complete DNA sequence of an organism's genome, that is, the order of every nucleotide it contains. This supports a deeper understanding of genetic information and function.

  • Significance: Over 500,000 human genomes have been sequenced, providing a valuable database for genetics research. These data let scientists study genetic variability, disease predisposition, and the impact of environmental factors on gene expression.

Key Milestones in Genomic Sequencing

  • Phage ΦX174: Sequenced in 1977, this bacteriophage was the first DNA genome to be sequenced, at 5,375 base pairs (bp). This milestone marked the beginning of modern genomic sequencing.

  • Haemophilus influenzae: Sequenced in 1995, this bacterium was the first free-living organism to have its genome decoded, at approximately 1,830,000 bp. Its sequencing paved the way for the understanding of bacterial genomics and antibiotic resistance.

  • Human Genome Project (HGP): Completed in 2003, the HGP resulted in the sequencing of approximately 3,200,000,000 bp of the human genome. It fundamentally shifted the field of genetics, revealing insights about human biology, health, and disease.

  • Various organisms that have been sequenced include:

    • Drosophila melanogaster (fruit fly): Important for genetic research and understanding development.

    • Arabidopsis thaliana (thale cress; the first plant genome sequenced): Significant in plant biology and genetics.

    • Escherichia coli (bacterium): Widely used in biotechnology and genetic engineering.

    • Ciona intestinalis (sea squirt): Provides insights into evolutionary biology and developmental processes.

Genome Sequencing Strategies

  • Basic Strategy:

    1. Fragment a large DNA molecule into smaller, manageable pieces.

    2. Create a DNA library of these fragments for sequencing.

    3. Sequence the individual fragments and then assemble the complete DNA sequence by aligning the overlapping regions of the fragments.

  • Cutting and Cloning:

    • DNA libraries are created by cloning specific DNA fragments into vectors, enabling their amplification in bacterial colonies.

    • The cloning process allows for replication of the DNA for easier sequencing and analysis.
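The basic strategy above (fragment, sequence, assemble by overlaps) can be illustrated with a minimal sketch. It assumes error-free reads, a single strand, and a tiny made-up sequence; the function names `overlap` and `greedy_assemble` and the greedy merging rule are illustrative choices, not the method used by any particular sequencing project. Real assemblers must also handle sequencing errors, repeats, and both DNA strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the longest overlap.

    Returns the remaining contigs; a single contig means the fragments
    were joined into one sequence.
    """
    # Remove duplicates (keeping order) and fragments contained in others.
    unique = list(dict.fromkeys(fragments))
    frags = [f for f in unique if not any(f != g and f in g for g in unique)]

    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps left; the contigs stay separate
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical 20 bp "genome" and four overlapping fragments, listed out of order.
genome = "ATGCGTACGTTAGCCTAGGA"
fragments = ["TAGCCTAG", "ATGCGTAC", "CTAGGA", "TACGTTAG"]

contigs = greedy_assemble(fragments)
print(contigs)  # ['ATGCGTACGTTAGCCTAGGA']
```

In this toy case every overlap is unambiguous, so the greedy rule recovers the original sequence. With repeated regions longer than the reads, several merges become equally plausible, which is one reason long genomes are hard to assemble from short fragments.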

Types of DNA Libraries

  • Genomic DNA Libraries:

    • Contain all DNA sequences, both coding (genes) and non-coding regions (regulatory sequences, introns).

    • These libraries are crucial for comprehensive sequencing projects, like the human genome project, as they represent the full genetic blueprint of an organism.

  • cDNA Libraries:

    • Represent the genes expressed in a particular cell or tissue at a specific time, providing insight into gene activity and expression patterns.

    • Created from mRNA through reverse transcription to cDNA, thus lacking introns, making them ideal for studying gene function and regulation.
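To see why cDNA lacks introns, the following sketch walks a made-up gene through splicing and reverse transcription. The sequence, the exon coordinates, and the function names are hypothetical and chosen only for illustration; real splicing and reverse transcription involve many details (such as poly-A priming and second-strand synthesis) that are omitted here.

```python
# Maps each mRNA base to its complementary DNA base.
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")


def transcribe_and_splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon intervals (0-based, end-exclusive) of the coding
    strand and replace T with U to obtain the mature mRNA."""
    return "".join(gene[start:end] for start, end in exons).replace("T", "U")


def reverse_transcribe(mrna: str) -> str:
    """First-strand cDNA: the DNA complement of the mRNA, written 5'->3'."""
    return mrna.translate(RNA_TO_CDNA)[::-1]


# Hypothetical gene: exon 1 + intron + exon 2 (coding strand, 5'->3').
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "GAATTCTAA"
gene = exon1 + intron + exon2
exons = [(0, 6), (18, 27)]

mrna = transcribe_and_splice(gene, exons)
cdna = reverse_transcribe(mrna)

print(len(gene), len(cdna))  # 27 15
print(mrna)                  # AUGGCCGAAUUCUAA
print(cdna)                  # TTAGAATTCGGCCAT
```

The genomic copy of the gene is 27 bp, while the cDNA is only 15 bp: the 12 bp intron was removed from the mRNA before reverse transcription, so it never appears in the cDNA library.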

Human Genome Project (HGP)

  • Initiation: Launched in 1990, the HGP aimed to sequence the entire human genome, which consists of approximately 3.2 billion base pairs of DNA organized into 23 pairs of chromosomes.

  • Goals:

    • Identify and map all human genes, estimated to be between 20,000 and 25,000 in number.

    • Determine DNA sequences of model organisms to facilitate comparative studies.

    • Create accessible databases to store sequencing data for researchers worldwide.

    • Address ethical, legal, and social issues arising from genomic research, ensuring the benefits of genomic information are shared equitably.

  • Sequencing Strategies: Two primary strategies were used to sequence the human genome:

    1. Whole-genome shotgun method, used by the private company Celera: The genome is broken into small random fragments that are sequenced directly and then assembled computationally, an approach that favors speed.

    2. Hierarchical shotgun method, used by the public project: The genome is first divided into large clones whose positions are mapped, and each clone is then sequenced, giving a more organized route to assembling the complete genomic sequence.

Advances in DNA Sequencing Technologies

  • First Generation: Sanger sequencing, based on the chain-termination method, reads individual DNA fragments of up to roughly 1,000 bp and set the standard for genetic analyses.

  • Second Generation (Next-Generation Sequencing, NGS): Technologies such as Illumina vastly improved throughput by sequencing millions of short fragments in parallel, significantly reducing cost and time.

  • Third Generation: Real-time, single-molecule platforms such as PacBio and Oxford Nanopore produce much longer reads, which makes structural variants within genomes easier to detect.
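A rough feel for why throughput and read length matter comes from the standard back-of-envelope relation: average depth = (number of reads × read length) / genome size. The sketch below rearranges it to estimate how many reads are needed for a human-sized genome. The read lengths and the 30x target depth are illustrative assumptions, not specifications of any platform, and the function name `reads_needed` is hypothetical.

```python
GENOME_SIZE_BP = 3_200_000_000  # approximate human genome size from these notes


def reads_needed(genome_bp: int, read_len_bp: int, depth: int) -> int:
    """Smallest number of reads whose total length covers the genome
    `depth` times on average (integer ceiling division)."""
    total_bases = genome_bp * depth
    return (total_bases + read_len_bp - 1) // read_len_bp


# Assumed, order-of-magnitude read lengths for each generation.
illustrative_read_lengths = {
    "First generation (Sanger)": 800,
    "Second generation (short reads)": 150,
    "Third generation (long reads)": 10_000,
}

TARGET_DEPTH = 30  # assumed average depth

for name, read_len in illustrative_read_lengths.items():
    n = reads_needed(GENOME_SIZE_BP, read_len, TARGET_DEPTH)
    print(f"{name}: {n:,} reads of {read_len:,} bp")
```

Under these assumptions the estimate is 120 million reads at 800 bp, 640 million at 150 bp, and 9.6 million at 10,000 bp. Short-read platforms are practical only because they generate reads in massive parallel, while long-read platforms need far fewer reads to span the same genome.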

Future Implications

  • Cost Reduction: The cost of sequencing a human genome has fallen from roughly $2.7 billion for the HGP as a whole to under $1,000 today. This reduction opens the door to widespread genomic testing and personalized medicine.

  • Biological and Medical Insights: The advancements in genomic sequencing technologies have revolutionized our understanding of genetic foundations of diseases, enabling the development of targeted therapies and precision medicine approaches for various genetic conditions.

Number of Protein-Coding Genes in the Human Genome

  • Current estimates put the number of protein-coding genes at approximately 21,306. Earlier estimates varied widely, from as low as 6,700 to 100,000 or more, reflecting the evolving understanding of gene functionality and the complexity of genomic interpretation.

  • This ongoing debate about the number of genes underscores the intricacies involved in gene structure and regulation, as well as our expanding knowledge of non-coding DNA's role in genome function.