initial plan for the HGP (Human Genome Project)
Expect it to be a 15-year initiative
Gain experience with model organisms first, before giving full attention to the human genome
In each case, map the DNA first and then sequence it
Wait to sequence the human genome until one or more new ‘revolutionary’ DNA sequencing methods become available to replace Sanger DNA sequencing
Make generating the first sequence of the human genome the signature accomplishment of the HGP
Ensure that this fundamental research benefits everyone by championing ELSI
ELSI
acronym for Ethical, Legal, and Social Issues
includes all non-technical issues that arise when developing emerging science and technologies and implementing them in society
Term coined in 1988 by James Watson, then head of human genome research at NIH
The ELSI Research Program was born in the Human Genome Project that started in the US in January 1990
ELSI priority areas
Privacy and fairness in the use and interpretation of genetic information
Clinical integration of new genetic technologies
Issues surrounding genetics research
Public and professional education
1990 in genomics
the scientific community had mixed opinions about the HGP
No detailed start-to-finish plan for executing the HGP (i.e., an overt expectation to ‘figure it out along the way’)
Genomics was a ‘toddler’ field, growing up as a melting pot of other scientific disciplines
Either no internet or painfully early days of a functional internet
HGP organisms
Yeast: Used to study fundamental biological processes in a eukaryotic organism
Fruit Fly: A long-standing model for understanding genetics, especially related to human biological processes
Mouse: The most common animal model due to significant genetic and physiological similarities to humans (protein-coding regions are about 85% identical on average), allowing for the study of human diseases and the discovery of new drugs
Zebrafish: A vertebrate model organism with a short life cycle, allowing for the study of processes like heart and blood vessel formation and development of complex structures like eyes and brains
Humans
clone-based physical mapping
restriction enzymes cut the chromosomal DNA at specific recognition sequences, producing fragments that are cloned and then ordered by their overlaps
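To make the restriction-digest idea concrete, here is a minimal Python sketch that cuts a DNA string at every EcoRI site (which cuts G^AATTC). It assumes top-strand-only cutting and ignores sticky ends; the function names and the example sequence are hypothetical. The list of fragment lengths it yields is the kind of size "fingerprint" that can be compared between clones to spot overlaps.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """Return the 0-based start of every occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # step by 1 so no site is skipped
    return positions


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut `seq` `cut_offset` bases into each recognition site.

    The defaults model EcoRI, which cuts between G and AATTC. Only the top
    strand is modelled, so sticky (overhanging) ends are not represented.
    """
    seq = seq.upper()
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(seq[prev:cut])
        prev = cut
    fragments.append(seq[prev:])  # the piece after the last cut
    return fragments


if __name__ == "__main__":
    dna = "ATGGAATTCCGTAGAATTCTTA"  # hypothetical sequence with two EcoRI sites
    pieces = digest(dna)
    print(pieces)                    # ['ATGG', 'AATTCCGTAG', 'AATTCTTA']
    print([len(p) for p in pieces])  # [4, 10, 8]
```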
contigs
a continuous stretch of DNA sequence assembled from overlapping DNA fragments, such as short reads or cloned DNA segments.
overlapping pieces are aligned and merged to form a complete, gap-free sequence that represents a larger, contiguous region of the genome.
Contigs are further ordered and oriented into scaffolds to build toward a complete genome assembly
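As a rough illustration of scaffolding, the Python sketch below joins contigs in a given order and orientation and pads the estimated gaps between them with runs of N, a common placeholder for unknown bases. It assumes the layout (order, strand, gap size) is already known, for example from mapping data; the contig names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(contigs: dict[str, str],
                   layout: list[tuple[str, str, int]]) -> str:
    """Join contigs following `layout`.

    Each layout entry is (contig name, strand '+' or '-', gap in bases after
    that contig). Gaps of unknown sequence are filled with 'N'.
    """
    parts = []
    for name, strand, gap in layout:
        seq = contigs[name]
        if strand == "-":
            seq = reverse_complement(seq)  # contig lies on the opposite strand
        parts.append(seq + "N" * gap)
    return "".join(parts)


if __name__ == "__main__":
    contigs = {"ctg1": "ATGCGT", "ctg2": "TTAGGC"}  # hypothetical contigs
    layout = [("ctg1", "+", 3), ("ctg2", "-", 0)]
    print(build_scaffold(contigs, layout))  # ATGCGTNNNGCCTAA
```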
cystic fibrosis gene mapping
the cystic fibrosis gene (CFTR) was identified on chromosome 7
subclone construction
a molecular biology technique to transfer a specific DNA fragment (the "insert") from one plasmid vector to another "destination" vector.
isolating the insert and preparing the destination vector
joining the two DNA fragments through ligation
inserting the resulting recombinant plasmid into bacterial cells
screening for successful subclones
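The subcloning steps can be mimicked on strings. This Python sketch assumes both plasmids are cut with EcoRI (G^AATTC), that the insert is flanked by exactly two sites in the source plasmid, and that the destination vector has exactly one site. It models only the top strand and one insert orientation; transformation into bacteria is not modelled, and the "screen" is just a check for the expected junctions. All names and sequences are hypothetical.

```python
ECORI = "GAATTC"
CUT = 1  # EcoRI cuts the top strand between G and AATTC


def excise_insert(source: str) -> str:
    """Isolate the insert: cut it out of the source plasmid at two EcoRI sites."""
    first = source.find(ECORI)
    second = source.find(ECORI, first + 1) if first != -1 else -1
    if second == -1:
        raise ValueError("insert must be flanked by two EcoRI sites")
    return source[first + CUT:second + CUT]


def open_vector(destination: str) -> tuple[str, str]:
    """Prepare the destination vector: linearise it at its single EcoRI site."""
    pos = destination.find(ECORI)
    if pos == -1 or destination.find(ECORI, pos + 1) != -1:
        raise ValueError("destination vector needs exactly one EcoRI site")
    return destination[:pos + CUT], destination[pos + CUT:]


def ligate(left: str, insert: str, right: str) -> str:
    """Join the vector arms and the insert (one orientation only)."""
    return left + insert + right


def is_subclone(product: str, insert: str) -> bool:
    """Toy screen: insert present and both EcoRI junctions re-formed."""
    return insert in product and product.count(ECORI) == 2


if __name__ == "__main__":
    source = "TTGAATTCAAACCCGGGGAATTCTT"  # hypothetical plasmid carrying the insert
    destination = "CCCGAATTCGGG"          # hypothetical destination vector
    insert = excise_insert(source)        # 'AATTCAAACCCGGGG'
    left, right = open_vector(destination)
    product = ligate(left, insert, right)
    print(product)                        # CCCGAATTCAAACCCGGGGAATTCGGG
    print(is_subclone(product, insert))   # True
```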
shotgun sequencing
a laboratory technique for determining the DNA sequence of an organism’s genome (or part of the genome).
method involves randomly breaking up the DNA into small fragments that are then sequenced individually. A computer program looks for overlaps in the DNA sequences, using them to reassemble the fragments in their correct order to determine the sequence of the starting DNA.
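The shotgun idea (random fragments, then overlap-based reassembly) can be sketched in Python with a simple greedy assembler that repeatedly merges the pair of fragments with the longest suffix-prefix overlap. This is an illustrative simplification, not the method of any particular real assembler: it assumes error-free reads from one strand, and repeats longer than the reads will mislead it. Function names and the min_len parameter are hypothetical.

```python
import random


def shotgun_reads(genome: str, read_len: int, n_reads: int,
                  seed: int = 0) -> list[str]:
    """Sample fixed-length reads from random start positions."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    return [genome[s:s + read_len]
            for s in (rng.randint(0, last_start) for _ in range(n_reads))]


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` equal to a prefix of `b` (at least min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by largest overlap; returns one or more contigs."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Reads lying entirely inside another read add no new information.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # no remaining overlaps: the assembly has gaps
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


if __name__ == "__main__":
    # Reads listed out of genome order, as they come off the sequencer.
    print(greedy_assemble(["CTTAGGA", "ATGCGTAC", "GTACCTTA"]))
    # ['ATGCGTACCTTAGGA']
    genome = "ATGCGTACCTTAGGATCCAGTTGACA"  # hypothetical genome
    reads = shotgun_reads(genome, read_len=8, n_reads=25, seed=1)
    print(greedy_assemble(reads))  # may leave gaps if coverage is uneven
```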
clone-by-clone sequencing strategy
A map of each chromosome is made before the DNA is split into fragments. These chunks of DNA are inserted into bacterial artificial chromosomes (BACs) to create BAC libraries and grown inside bacterial cells before sequencing.
sequence reads
the specific DNA sequence obtained from a single, short piece of DNA that is sequenced during a DNA sequencing experiment
These individual reads are then computationally assembled, like pieces of a puzzle, to reconstruct the original, complete DNA sequence
Sequence of bases: A read represents an ordered sequence of DNA's chemical building blocks, the nucleotide bases adenine (A), guanine (G), cytosine (C), and thymine (T).
Fragment-based: Each read comes from a single, fragmented section of a larger DNA molecule.
genome sequencing milestones up to the HGP
First DNA Genome (1977): The DNA genome of bacteriophage ϕX174, a small virus, was the first DNA genome sequenced by Frederick Sanger's team.
First Cellular Genome (1995): The bacterium Haemophilus influenzae was the first complete sequence of a cellular organism.
First Eukaryotic Genome (1996): Saccharomyces cerevisiae, or baker's yeast, was the first eukaryotic genome to be fully sequenced.
First Multicellular Genome (1998): The genomic sequence for the nematode C. elegans was announced in 1998, making it the first multicellular organism to have its genome sequenced.
Fruit Fly Genome (2000): The Drosophila melanogaster (fruit fly) genome was sequenced and published in March 2000.
First Human Genome (2001): The world's first draft of the human genome was published, a monumental map of human genetic information.
HGP donor genome
70% from one individual of blended ancestry
30% from 19 individuals, mostly of European ancestry
the result is a mosaic representation, not the genome of any one person
challenges of human genome sequencing
Human genome: ~3,000,000,000 nucleotides (bases or base pairs)
Sanger DNA sequencing circa 1990: ~500-800 bases per read
‘Coverage’ (i.e., the number of times each base is read) needed to be high (e.g., >30-fold) to attain high accuracy
Roughly half of the human genome consists of repetitive DNA, much of it remnants of transposable elements (difficult to assemble, because reads from different repeat copies look alike)
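The scale problem can be checked with arithmetic. This Python sketch computes how many Sanger reads a target coverage implies, using the figures above (~3 billion bases, 500-800 bases per read, 30-fold). It also adds the idealized Lander-Waterman (Poisson) estimate that a fraction of about e^(-c) of bases is never read at coverage c; that model is an assumption not stated in these notes, and it ignores repeats and sequencing bias.

```python
import math

GENOME_SIZE = 3_000_000_000  # approximate haploid human genome, in bases


def reads_needed(genome_size: int, read_len: int, coverage: int) -> int:
    """Reads required so that total sequenced bases = coverage x genome size."""
    return -(-coverage * genome_size // read_len)  # integer ceiling division


def fraction_unread(coverage: float) -> float:
    """Idealized Poisson estimate of the fraction of bases never read."""
    return math.exp(-coverage)


if __name__ == "__main__":
    for read_len in (500, 800):
        print(read_len, reads_needed(GENOME_SIZE, read_len, 30))
    # 500 180000000
    # 800 112500000
    for c in (1, 5, 30):
        print(c, fraction_unread(c))  # about 37% of bases unread at 1-fold
```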
first human genome sequence
6 countries, 20 centers, 1000s of researchers
~1,000 bases/second, 24 hours/day, 7 days/week for ~6 years
Brute force using Sanger DNA sequencing and massive computational help
Both clone-by-clone (public HGP) and whole-genome shotgun (Celera) sequencing were used
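The throughput figure above can be sanity-checked: running at ~1,000 bases/second around the clock for ~6 years yields roughly 1.9 x 10^11 bases, on the order of 60 times the ~3 billion-base genome. This Python sketch counts raw output only; it assumes 365-day years and ignores failed reads and other overheads, so it is an illustration rather than actual HGP accounting.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def raw_bases(bases_per_second: int, years: int, days_per_year: int = 365) -> int:
    """Total bases read when sequencing nonstop at a constant rate."""
    return bases_per_second * SECONDS_PER_DAY * days_per_year * years


if __name__ == "__main__":
    total = raw_bases(1_000, 6)
    print(total)                  # 189216000000
    print(total / 3_000_000_000)  # 63.072
```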
James Watson and Francis Collins
HGP: clone-by-clone shotgun sequencing
Collins, a determined and elegant research scientist who had already helped clone major disease genes (e.g., CF), led the HGP with a strategic, mapping-based approach.
Celera Genomics (Craig Venter)
whole-genome shotgun sequencing
Initially, Venter worked within the HGP consortium, but he clashed with its leaders over strategy and personality. His faster, more technology-driven approach raised the game and pioneered methods that led to more rapid progress.
2022
a truly complete (‘telomere-to-telomere’) human genome sequence was finally generated
Bermuda Principles for Data Sharing
The HGP paid significant attention to the release and sharing of its genome sequence data
Two seminal meetings were held in Bermuda, in 1996 and 1997
Landmark agreement for rapid data release and public access to HGP genome sequence data
Became known as ‘Bermuda Principles’
Among the most important legacies of the HGP
HGP output
Declared complete April 2003
~3 billion base pairs (bp) (3,164.7 million, more precisely)
1.1% exons, 24% introns, 75% intergenic DNA
Approx. 3 million single nucleotide polymorphisms (SNPs)
Fewer than 1% of all SNPs cause changes in proteins
Approximately 20,000 protein-coding genes
The average gene consists of 3,000 bases, but sizes vary greatly (the dystrophin gene is 2.4 million bases).
Almost all (99.9%) bases are exactly the same in all people
The functions are unknown for over 50% of discovered genes
Chromosome 1 had the most genes (2968, now 4220), and the Y the fewest (231, now 693)
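Several of these numbers can be cross-checked with simple arithmetic. This Python sketch converts the composition percentages into megabases using the 3,164.7 Mb figure (the three shares sum to 100.1% because of rounding), and shows that 0.1% of the genome is about 3 million bases, the same order as the ~3 million SNPs listed. Treating that match as meaningful is only an order-of-magnitude consistency check assumed here, not how SNP counts were derived.

```python
from fractions import Fraction

GENOME_MB = 3164.7          # total sequence, in millions of base pairs
GENOME_BP = 3_164_700_000
COMPOSITION = {"exons": 1.1, "introns": 24.0, "intergenic": 75.0}  # percent


def composition_mb(genome_mb: float, percents: dict[str, float]) -> dict[str, float]:
    """Convert a percentage breakdown into megabases."""
    return {part: genome_mb * pct / 100 for part, pct in percents.items()}


def differing_bases(genome_bp: int, percent_identical: str = "99.9") -> int:
    """Bases expected to differ between two people at the given identity."""
    identical = Fraction(percent_identical)  # exact decimal, no float rounding
    return int(genome_bp * (100 - identical) / 100)


if __name__ == "__main__":
    for part, mb in composition_mb(GENOME_MB, COMPOSITION).items():
        print(f"{part}: {mb:.1f} Mb")
    # exons: 34.8 Mb
    # introns: 759.5 Mb
    # intergenic: 2373.5 Mb
    print(differing_bases(GENOME_BP))  # 3164700
```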
Challenges: What We Still Don’t Know
reference genome
a digital nucleic acid sequence database, assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species.
It is assembled from the DNA of a number of individual donors and does not reflect any single individual at the genetic level.
Instead, a reference provides a haploid mosaic of different DNA sequences from multiple donors.
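To illustrate what a haploid mosaic means, this Python sketch stitches a single reference sequence from ordered segments contributed by different donors and reports each donor's share. The donor IDs, segments and the 70/15/15 split are hypothetical toy values, loosely echoing the HGP donor mix; real reference assembly is far more involved.

```python
from collections import Counter


def build_mosaic(segments: list[tuple[str, str]]) -> tuple[str, dict[str, float]]:
    """Concatenate (donor_id, sequence) segments into one haploid sequence.

    Each position comes from exactly one donor, so the result matches no
    single individual. Returns the sequence and each donor's share of it.
    """
    reference = "".join(seq for _, seq in segments)
    if not reference:
        raise ValueError("no sequence supplied")
    lengths = Counter()
    for donor, seq in segments:
        lengths[donor] += len(seq)
    shares = {donor: n / len(reference) for donor, n in lengths.items()}
    return reference, shares


if __name__ == "__main__":
    segments = [                      # hypothetical donors and segments
        ("donor_A", "ACGTACGTAC"),
        ("donor_B", "GGC"),
        ("donor_A", "TTAA"),
        ("donor_C", "GAT"),
    ]
    ref, shares = build_mosaic(segments)
    print(ref)     # ACGTACGTACGGCTTAAGAT
    print(shares)  # {'donor_A': 0.7, 'donor_B': 0.15, 'donor_C': 0.15}
```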
human pangenome
new human pangenome (2023)
The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals
Captures known variants and haplotypes and reveals new alleles at structurally complex loci
Adds 119 million base pairs of euchromatic polymorphic sequence and 1,115 gene duplications relative to the existing reference GRCh38
first draft of the human pangenome reference.
These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels.
The draft pangenome is built from alignments of these assemblies
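A pangenome is often represented as a graph in which shared sequence forms common nodes and variants form alternative branches, with each haplotype being a path through the graph. The toy Python sketch below assumes that representation; node IDs, sequences and haplotype names are hypothetical. It spells out each haplotype and totals the sequence that lies off the reference path, a miniature analogue of counting bases added relative to GRCh38.

```python
# Toy variation graph: node id -> sequence (hypothetical).
NODES = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA", 5: "GGG", 6: "CC"}

# Each haplotype is an ordered path of node ids.
PATHS = {
    "reference": [1, 2, 4, 6],
    "hap1": [1, 3, 4, 6],     # SNP: A -> G
    "hap2": [1, 2, 4, 5, 6],  # 3-base insertion
}


def spell(path: list[int], nodes: dict[int, str] = NODES) -> str:
    """Concatenate node sequences along a path to get the haplotype sequence."""
    return "".join(nodes[n] for n in path)


def non_reference_bases(paths: dict[str, list[int]],
                        nodes: dict[int, str] = NODES,
                        ref: str = "reference") -> int:
    """Total length of nodes used by some haplotype but absent from the reference path."""
    ref_nodes = set(paths[ref])
    extra = {n for name, path in paths.items() if name != ref for n in path} - ref_nodes
    return sum(len(nodes[n]) for n in extra)


if __name__ == "__main__":
    for name, path in PATHS.items():
        print(name, spell(path))
    # reference ACGTATTCACC
    # hap1 ACGTGTTCACC
    # hap2 ACGTATTCAGGGCC
    print(non_reference_bases(PATHS))  # 4
```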