In the preceding sections you have learnt that it is the sequence of bases in DNA that determines the genetic information of a given organism. In other words, the genetic make-up of an organism or an individual lies in its DNA sequences. If two individuals differ, then their DNA sequences should also be different, at least at some places. These assumptions led to the quest of finding out the complete DNA sequence of the human genome. With the establishment of genetic engineering techniques that made it possible to isolate and clone any piece of DNA, and the availability of simple and fast techniques for determining DNA sequences, a very ambitious project of sequencing the human genome was launched in the year 1990.

Human Genome Project (HGP) was called a mega project. You can imagine the magnitude and the requirements of the project if we simply define its aims as follows:

Human genome is said to have approximately 3 × 10⁹ bp, and if the cost of sequencing required is US $ 3 per bp (the estimated cost in the beginning), the total estimated cost of the project would be approximately 9 billion US dollars. Further, if the obtained sequences were to be stored in typed form in books, and if each page of the book contained 1000 letters and each book contained 1000 pages, then 3300 such books would be required to store the information of DNA sequence from a single human cell (the figure of 3300 corresponds to a genome of about 3.3 × 10⁹ bp; exactly 3 × 10⁹ bp would fill 3000 books). The enormous amount of data expected to be generated also necessitated the use of high speed computational devices for data storage, retrieval and analysis.
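The cost and storage estimates above are simple arithmetic, and a short sketch makes them easy to check. The function and constant names below are illustrative only; the figures are the ones quoted in the text (3 × 10⁹ bp, US $ 3 per bp, 1000 letters per page, 1000 pages per book).

```python
# Figures quoted in the text; names are illustrative, not from any library.
GENOME_BP = 3 * 10**9        # approximate size of the human genome in base pairs
COST_PER_BP_USD = 3          # estimated sequencing cost per bp at the start of HGP
LETTERS_PER_PAGE = 1000
PAGES_PER_BOOK = 1000


def sequencing_cost_usd(genome_bp, cost_per_bp=COST_PER_BP_USD):
    """Total cost if every base pair costs the same amount to sequence."""
    return genome_bp * cost_per_bp


def books_needed(genome_bp,
                 letters_per_page=LETTERS_PER_PAGE,
                 pages_per_book=PAGES_PER_BOOK):
    """Number of books needed to print one letter per base (rounded up)."""
    letters_per_book = letters_per_page * pages_per_book
    # Ceiling division: a partly filled book still counts as a book.
    return -(-genome_bp // letters_per_book)


if __name__ == "__main__":
    print("Cost (USD):", sequencing_cost_usd(GENOME_BP))
    print("Books for 3.0 x 10^9 bp:", books_needed(GENOME_BP))
    print("Books for 3.3 x 10^9 bp:", books_needed(33 * 10**8))
```

Run with the rounded figure of 3 × 10⁹ bp, the sketch gives 3000 books; the 3300 books mentioned in the text follow if the genome is taken as 3.3 × 10⁹ bp.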
HGP was closely associated with the rapid development of a new area in biology called Bioinformatics.

Goals of HGP

Some of the important goals of HGP were as follows:
(i) Identify all the approximately 20,000-25,000 genes in human DNA;
(ii) Determine the sequences of the 3 billion chemical base pairs that make up human DNA;
(iii) Store this information in databases;
(iv) Improve tools for data analysis;
(v) Transfer related technologies to other sectors, such as industries;
(vi) Address the ethical, legal, and social issues (ELSI) that may arise from the project.

The Human Genome Project was a 13-year project coordinated by the U.S. Department of Energy and the National Institutes of Health. During the early years of the HGP, the Wellcome Trust (U.K.) became a major partner; additional contributions came from Japan, France, Germany, China and others. The project was completed in 2003. Knowledge about the effects of DNA variations among individuals can lead to revolutionary new ways to diagnose, treat and someday prevent the thousands of disorders that affect human beings. Besides providing clues to understanding human biology, learning about the DNA sequences of non-human organisms can lead to an understanding of their natural capabilities that can be applied toward solving challenges in health care, agriculture, energy production and environmental remediation. Many non-human model organisms, such as bacteria, yeast, Caenorhabditis elegans (a free living non-pathogenic nematode), Drosophila (the fruit fly), plants (rice and Arabidopsis), etc., have also been sequenced.

Methodologies: The methods involved two major approaches. One approach focused on identifying all the genes that are expressed as RNA (referred to as Expressed Sequence Tags, ESTs). The other took the blind approach of simply sequencing the whole genome, containing all the coding and non-coding sequences, and later assigning functions to different regions in the sequence (a process referred to as Sequence Annotation). For sequencing, the total DNA from a cell is isolated and converted into random fragments of relatively smaller sizes (recall that DNA is a very long polymer, and there are technical limitations in sequencing very long pieces of DNA) and cloned in a suitable host using specialised vectors. The cloning resulted in amplification of each DNA fragment so that it could subsequently be sequenced with ease. The commonly used hosts were bacteria and yeast, and the vectors were called BAC (bacterial artificial chromosomes) and YAC (yeast artificial chromosomes).

The fragments were sequenced using automated DNA sequencers that worked on the principle of a method developed by Frederick Sanger. (Remember, Sanger is also credited with developing a method for determining amino acid sequences in proteins.) These sequences were then arranged based on some overlapping regions present in them. This required generation of overlapping fragments for sequencing. Alignment of these sequences was not humanly possible. Therefore, specialised computer based programs were developed (Figure 5.15). These sequences were subsequently annotated and assigned to each chromosome. The sequence of chromosome 1 was completed only in May 2006; it was the last of the 24 human chromosomes (22 autosomes, X and Y) to be sequenced. Another challenging task was assigning the genetic and physical maps on the genome. These maps were generated using information on polymorphism of restriction endonuclease recognition sites and some repetitive DNA sequences known as microsatellites (one of the applications of polymorphism in repetitive DNA sequences will be explained in the next section, on DNA fingerprinting).
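The idea of arranging sequenced fragments by their overlapping regions can be illustrated with a toy program. The sketch below is not the software used in the HGP; it is a minimal greedy approach, assuming error-free fragments from a single strand, and the names `overlap_length` and `greedy_assemble` are chosen here for illustration.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of fragments with the longest overlap.

    Returns the list of sequences left when no further merge is possible
    (a single sequence if all fragments could be joined).
    """
    reads = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining pair overlaps enough to be merged
        # Join the two fragments, writing the shared region only once.
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Three overlapping fragments of the made-up sequence ATGCGTACGTTAGC.
    fragments = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
    print(greedy_assemble(fragments))
```

Even this small example compares every fragment with every other fragment at each step, which hints at why aligning millions of real fragments by hand was out of the question.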
Salient Features of Human Genome

Some of the salient observations drawn from the human genome project are as follows:
(i) The human genome contains 3164.7 million bp.
(ii) The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.
(iii) The total number of genes is estimated at 30,000, much lower than previous estimates of 80,000 to 1,40,000 genes. Almost all (99.9 per cent) nucleotide bases are exactly the same in all people.
(iv) The functions are unknown for over 50 per cent of the discovered genes.
(v) Less than 2 per cent of the genome codes for proteins.
(vi) Repeated sequences make up a very large portion of the human genome.
(vii) Repetitive sequences are stretches of DNA sequences that are repeated many times, sometimes hundreds to thousands of times. They are thought to have no direct coding functions, but they shed light on chromosome structure, dynamics and evolution.
(viii) Chromosome 1 has the most genes (2968), and the Y has the fewest (231).
(ix) Scientists have identified about 1.4 million locations where single-base DNA differences (SNPs, single nucleotide polymorphisms, pronounced as ‘snips’) occur in humans. This information promises to revolutionise the processes of finding chromosomal locations for disease-associated sequences and tracing human history.
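To get a feel for these percentages, they can be converted into base-pair counts. The sketch below uses only the figures listed above; the variable and function names are illustrative, and reading "99.9 per cent the same" as "about 0.1 per cent of positions may differ between people" is an assumption made here for the sake of the calculation.

```python
# Figures taken from the list above; names are illustrative only.
GENOME_BP = 3_164_700_000     # 3164.7 million bp
SNP_LOCATIONS = 1_400_000     # about 1.4 million identified SNP locations


def variable_positions(genome_bp):
    """Bases that may differ if 99.9 per cent are identical (0.1 per cent vary)."""
    return genome_bp // 1000


def coding_upper_bound(genome_bp):
    """Upper bound on protein-coding bases, taking 'less than 2 per cent' as 2."""
    return genome_bp * 2 // 100


def average_snp_spacing(genome_bp, snp_locations):
    """Average number of bases between identified SNP locations."""
    return genome_bp / snp_locations


if __name__ == "__main__":
    print("Positions that may vary:", variable_positions(GENOME_BP))
    print("Coding bases (upper bound):", coding_upper_bound(GENOME_BP))
    print("Average bp per SNP location:",
          round(average_snp_spacing(GENOME_BP, SNP_LOCATIONS)))
```

On these figures, roughly 3 million positions may vary between individuals, fewer than about 63 million bases code for proteins, and the identified SNP locations are spaced a little over 2000 bases apart on average.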
Applications and Future Challenges

Deriving meaningful knowledge from the DNA sequences will define research through the coming decades, leading to our understanding of biological systems. This enormous task will require the expertise and creativity of tens of thousands of scientists from varied disciplines in both the public and private sectors worldwide. One of the greatest impacts of having the human genome sequence may well be enabling a radically new approach to biological research. In the past, researchers studied one or a few genes at a time. With whole-genome sequences and new high-throughput technologies, researchers can approach questions systematically and on a much broader scale. They can study all the genes in a genome, for example, all the transcripts in a particular tissue or organ or tumor, or how tens of thousands of genes and proteins work together in interconnected networks to orchestrate the chemistry of life.