Chapter 27 - Genomics Fundamentals: Detailed Notes

Genomics Fundamentals

Concepts to Review

  • Structure, synthesis, and function of DNA (Sections 26.2 and 26.3).
  • Base pairing and heredity (Sections 26.4 and 26.5).
  • Replication of DNA (Section 26.6).
  • Transcription, translation, and the genetic code (Sections 26.8–26.10).

Study of DNA

  • Genetics: Study of genes and how traits are passed on (heredity/inheritance), including control of expression.
  • Genomics: Study of whole sets of genes and their functions at the organism level.
  • Epigenetics: Study of how behaviors and environment cause changes affecting gene function; these changes are reversible and don't alter DNA sequence, representing global control of gene expression.

What is a Gene?

  • Example: 1606 bases are copied to make a protein that is 147 amino acids long, yet only 3 × 147 = 441 of those bases are needed to encode the amino acids. The remaining copied bases do not code for amino acids.
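
The arithmetic in this example can be checked with a short sketch. The function name `coding_bases` is illustrative and not part of the notes; it assumes three bases per codon and does not count the stop codon.

```python
BASES_PER_CODON = 3  # each amino acid is specified by one three-base codon

def coding_bases(amino_acids: int) -> int:
    """Bases needed to encode a protein of this length (stop codon not counted)."""
    return BASES_PER_CODON * amino_acids

transcribed = 1606                 # bases copied, from the example above
needed = coding_bases(147)         # 441
not_coding = transcribed - needed  # 1165

print(f"{needed} of {transcribed} copied bases encode amino acids")
print(f"{not_coding} copied bases do not")
```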

DNA and Chromosomes - Genes

  • Genes consist of coding (exons) and noncoding (introns) segments.
  • Chromosome 22: The first chromosome to have its nonrepetitive DNA fully sequenced and mapped, containing 49 million bases and 693 genes, with an average of 8 exons and 7 introns per gene.

What is a Gene?

  • Gene: A segment of DNA that directs the synthesis of a single polypeptide.
  • Gene expression: The frequency and timing of protein creation based on instructions within genes.
  • Additional control elements include promoters and enhancers (DNA regions) and the transcription factors and activators that bind them, with RNA polymerase II playing a key role in transcription.

Mapping the Human Genome

  • Genomics: Study of whole sets of genes and their functions.
  • Human genome consists of 3 billion base pairs.
  • Average chromosome has 130,400,000 base pairs.
  • There are approximately 20,500 genes in the human genome.
  • Average is 890 genes per chromosome.
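
The two averages above follow from dividing the genome totals by the number of chromosomes. A minimal sketch, assuming the divisor is 23 (one set of distinct human chromosomes), which is an assumption not stated in the notes:

```python
GENOME_BP = 3_000_000_000  # base pairs in the human genome (from the notes)
GENES = 20_500             # approximate gene count (from the notes)
CHROMOSOMES = 23           # assumption: one set of 23 chromosomes

avg_bp_per_chromosome = GENOME_BP / CHROMOSOMES
avg_genes_per_chromosome = GENES / CHROMOSOMES

print(f"{avg_bp_per_chromosome:,.0f} base pairs per chromosome on average")
print(f"{avg_genes_per_chromosome:.0f} genes per chromosome on average")
```

Both results round to the figures quoted in the list (about 130,400,000 base pairs and about 890 genes).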

Chromosomes

  • Total bases in genes:
    • Average gene size is 10-15 kb, but sizes range from ~0.2 kb (tyrosine tRNA gene) to ~2500 kb (dystrophin gene). Taking 28 kbp as a working average: 28,000 bases × 20,500 genes = 574,000,000 bases.
    • Approximately 3,000,000,000 - 574,000,000 = 2,426,000,000 leftover bases. What are they?
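
The estimate of bases inside and outside genes can be reproduced as follows. This is only a check of the arithmetic, using the 28 kbp working average from the notes; the variable names are illustrative.

```python
GENOME_BP = 3_000_000_000  # total base pairs
GENES = 20_500             # approximate number of genes
AVG_GENE_BP = 28_000       # working average gene size used in the notes

bases_in_genes = AVG_GENE_BP * GENES        # 574,000,000
leftover_bases = GENOME_BP - bases_in_genes  # 2,426,000,000
fraction_in_genes = bases_in_genes / GENOME_BP

print(f"{bases_in_genes:,} bases in genes")
print(f"{leftover_bases:,} leftover bases")
print(f"{fraction_in_genes:.1%} of the genome lies in genes under this estimate")
```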

Mapping the Human Genome

  • A genetic map is a physical representation of landmarks in a genome, showing their relative positions.
  • Initial studies of genetic diseases (pre-1990) involved identifying landmarks co-inherited with the disease gene.
  • This approach provided information about the chromosome and general location of the gene.

  • Early sequencing experiments could only provide about 300 base pairs of information.
  • The Human Genome Project was initiated in 1990 by a collection of 20 groups at not-for-profit institutes and universities led by NIH.

Human Genome Project Strategy

  • A genetic map was generated, showing the physical location of markers (identifiable DNA sequences known to be inherited).
  • The physical map refined the distance between markers to about 100,000 base pairs.

  • A chromosome was cut into large segments, and multiple copies (clones) of the segments were produced.
  • Overlapping clones, which covered the entire length of the chromosome, were arranged in order to produce the next level of map.

  • Each clone was cut into 500 base-pair fragments, and the identity and order of bases in each fragment were determined.
  • All 500 base-pair sequences were assembled into a completed nucleotide map of the chromosome.

Celera Genomics Project Strategy

  • In 1998, Celera Genomics started a separate effort to sequence the human genome using a shotgun approach, breaking the human genome into fragments without identifying the origin of any given fragment.

  • Fragments were copied many times to generate many clones of each area of the genome.
  • Ultimately, they were cut into 500-base-long pieces and modified with fluorescently labeled bases that could be sequenced by high-speed machines. This involved new technology.

  • Sequences were reassembled by identifying overlapping ends; this was carried out using the world’s largest nongovernmental supercomputing center.
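
Reassembly by overlapping ends can be illustrated with a toy greedy assembler. This is a minimal sketch under stated assumptions, not the method Celera actually used: the function names, the minimum overlap of 3 bases, and the example reads are all invented for illustration, and real assemblers must also handle sequencing errors and repeated sequences.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments that share the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags

# Hypothetical reads taken from the sequence ATGGCCTTAGACGTTCA, in scrambled order.
reads = ["AGACGTTCA", "ATGGCCTT", "CCTTAGAC"]
print(greedy_assemble(reads))
```

The reads carry no information about where they came from; the order is recovered only from the shared ends, which is the core idea of the shotgun approach.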

Mapping the Human Genome

  • In 2001, 90% of the human genome sequence had been mapped in 15 months instead of the originally anticipated four years.
  • By October 2004, 99% of the genome was sequenced and declared to be 99.999% accurate.
  • The mapped sequence correctly identifies almost all known genes, allowing researchers to rely on highly accurate sequence information.

DNA and Chromosomes

  • Understanding DNA structure should provide insight into the biotechnology revolution ushered in by the HGP.

Telomeres and Centromeres

  • Telomeres are specialized regions of DNA at both ends of every chromosome.
  • Each telomere is a long, noncoding series of a repeating sequence of nucleotides, (TTAGGG)_n.

  • Telomeres act as “endcaps,” protecting the ends of the chromosome from accidental changes that might alter the more important DNA coding sequences.
  • Telomeres also prevent the DNA ends from fusing to the DNA in other chromosomes or to DNA fragments.

  • Each new cell starts with approximately 8000 bp of telomeric DNA.
  • Between 50 and 250 bases are lost with each cell division, so that as the cell ages, the telomere gets shorter. An elderly person may have only 1500 bp of telomere in typical cells.
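
The figures above imply a rough range for how many divisions separate a new cell from an elderly one. A minimal sketch, assuming a constant loss per division (a simplification; the notes give only a range) and a 6-base TTAGGG repeat unit:

```python
START_BP = 8000    # telomeric DNA in a new cell
ELDERLY_BP = 1500  # telomeric DNA in typical cells of an elderly person
REPEAT_LEN = len("TTAGGG")  # 6 bases per telomere repeat

def divisions_to_reach(start_bp: int, end_bp: int, loss_per_division: int) -> int:
    """Whole divisions needed to shorten from start_bp to end_bp at a fixed loss rate."""
    return (start_bp - end_bp) // loss_per_division

fewest = divisions_to_reach(START_BP, ELDERLY_BP, 250)  # fastest loss
most = divisions_to_reach(START_BP, ELDERLY_BP, 50)     # slowest loss

print(f"Between {fewest} and {most} divisions")
print(f"About {START_BP // REPEAT_LEN} repeats at the start, {ELDERLY_BP // REPEAT_LEN} when elderly")
```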

  • A very short telomere is associated with senescence (cells that no longer divide).
  • Continuation of shortening beyond this stage is associated with DNA instability and cell death.
  • Telomerase increases telomere length in DNA and is active during embryonic development.
  • In adults, telomerase is only active in germ cells.

  • There is widespread speculation that telomere shortening plays a role in aging.
  • Experiments with mice whose telomerase activity has been “knocked out” show premature aging, and embryos do not survive if the mice become pregnant.

  • The majority of cancer cells are known to contain active telomerase, which is thought to confer immortality on these tumor cells.
  • Current research suggests that the genes responsible for regulating telomerase expression are altered in cancer cells.
  • There are ongoing experiments on the consequences of telomerase inactivation on cancer cells.

Centromeres

  • As the DNA in each chromosome is duplicated in preparation for cell division, the two copies remain joined together at the centromere.
  • The duplicated chromosomes bound together at the centromere are known as sister chromatids.

Noncoding DNA

  • Only about 1.5% of the genome codes for proteins.
  • There are noncoding promoter sequences, which are regulatory regions of DNA that determine which of its genes are turned on.
  • Only the genes needed by any individual cell will be activated in that cell.
  • Out of 20,500 genes, only about 2000 are expressed in a given cell.
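
To put these percentages in absolute terms, the sketch below converts them using the 3 billion base pair genome size given earlier in the notes. The variable names are illustrative.

```python
GENOME_BP = 3_000_000_000  # human genome size from the notes
TOTAL_GENES = 20_500
EXPRESSED_GENES = 2_000

# 1.5% of the genome, computed with integers to avoid rounding error.
protein_coding_bp = GENOME_BP * 15 // 1000
expressed_fraction = EXPRESSED_GENES / TOTAL_GENES

print(f"{protein_coding_bp:,} bases code for proteins")
print(f"{expressed_fraction:.1%} of genes are expressed in a given cell")
```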

  • Some scientists have suggested that the segments of noncoding DNA are needed to accommodate the folding of DNA within the nucleus.
  • Others think these segments may play a role in evolution.

  • Some scientists argue that the segments are functional, but the functions are not yet understood.
  • The function of noncoding DNA remains to be discovered, and the debate over its role continues.

Epigenetics

  • Epigenetic mechanisms are affected by several factors and processes including development in utero and in childhood, environmental chemicals, drugs and pharmaceuticals, aging, and diet.
  • DNA methylation occurs when methyl groups tag DNA and activate or repress genes.
  • Histones are proteins around which DNA can wind for compaction and gene regulation.
  • Histone modification occurs when epigenetic factors bind to histone “tails,” altering the extent to which DNA is wrapped around histones and the availability of genes in the DNA to be activated.
  • These factors and processes can affect health, possibly resulting in cancer, autoimmune disease, mental disorders, or diabetes.

Mutations and Polymorphisms

  • Mutation: An error in base sequence that is carried along during DNA replication.
  • Mutation commonly refers to variations in DNA sequence found in a very small number of individuals of a species.
  • An error in nucleic acid composition that occurs once in 3–4 million lobsters is responsible for the beautiful color of this crustacean.

Types of Mutations

  • Point mutations: A single base change.
    • Silent: Specifies the same amino acid (e.g., GUU → GUC, gives Val → Val).
    • Missense: Specifies a different amino acid (e.g., GUU → GCU gives Val → Ala).
    • Nonsense: Produces a stop codon (e.g., CGA → UGA gives Arg → Stop).
  • Frameshift: Insertion or deletion of bases. If the number of inserted or deleted bases is not a multiple of 3, then all triplets following the mutation are read differently.
    • Insertion: Addition of one or more bases.
    • Deletion: Loss of one or more bases.
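
The mutation types above can be illustrated with a small sketch. The codon table is the standard genetic code written compactly in UCAG order, with `*` marking stop codons; the function names `classify_point_mutation` and `codons` are illustrative, and the frameshift sequence is invented for the example.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_point_mutation(before: str, after: str) -> str:
    """Label a single-base change in one codon as silent, missense, or nonsense."""
    if len(before) != 3 or len(after) != 3:
        raise ValueError("codons must be three bases long")
    if sum(x != y for x, y in zip(before, after)) != 1:
        raise ValueError("exactly one base must differ")
    old, new = CODON_TABLE[before], CODON_TABLE[after]
    if new == old:
        return "silent"
    if new == "*":
        return "nonsense"
    return "missense"

def codons(seq: str) -> list[str]:
    """Split a sequence into complete triplets, dropping any leftover bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

print(classify_point_mutation("GUU", "GUC"))  # Val -> Val
print(classify_point_mutation("GUU", "GCU"))  # Val -> Ala
print(classify_point_mutation("CGA", "UGA"))  # Arg -> Stop

# Frameshift: inserting one base after the first codon changes every later triplet.
original = "AUGGUUCGA"
shifted = original[:3] + "C" + original[3:]
print(codons(original), codons(shifted))
```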

Mutations and Polymorphisms

  • Some mutations result from spontaneous and random events; error rate of replication is about 1 in 1 billion.
  • Others are induced by exposure to a mutagen—an external agent that can cause a mutation.
  • Viruses, chemicals, and ionizing radiation can all be mutagenic.

  • Polymorphisms: Variations in the nucleotide sequence of DNA that are common within a given population.
  • Most polymorphisms are simply differences in the DNA sequence between individuals due to geographical and ethnic differences, and are part of the biodiversity exhibited by life on Earth.

  • The vast majority of polymorphisms seen have neither advantageous nor deleterious effects; some have been shown to give rise to various disease states.

Mutations and Polymorphisms - Hereditary Diseases

  • Examples of hereditary diseases, their causes, and their prevalence:
    • Phenylketonuria (PKU): Brain damage in infants caused by the defective enzyme phenylalanine hydroxylase; prevalence is 1 in 40,000.
    • Albinism: Absence of skin pigment caused by the defective enzyme tyrosinase; prevalence is 1 in 20,000.
    • Tay-Sachs disease: Mental retardation caused by a defect in production of the enzyme hexosaminidase A; prevalence is 1 in 6000 (Ashkenazi Jews) and 1 in 100,000 (general population).
    • Cystic fibrosis: Bronchopulmonary, liver, and pancreatic obstructions by thickened mucus; prevalence is 1 in 3000.
    • Sickle-cell anemia: Anemia and obstruction of blood flow caused by a defect in hemoglobin; prevalence is 1 in 185 (African Americans).

Single-Nucleotide Polymorphism and Disease

  • The replacement of one nucleotide by another in the same location along the DNA sequence is a single-nucleotide polymorphism (SNP).
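
Because a SNP is a difference at the same location, candidate SNPs between two already aligned sequences can be found by a position-by-position comparison. A minimal sketch; the function name `find_snps` and both sequences are hypothetical, and real analyses compare many individuals against a reference genome.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference base, sample base) wherever two aligned sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]

# Hypothetical sequences that differ at one position (0-based index 5).
reference = "ATGGTTCGA"
sample = "ATGGTCCGA"
print(find_snps(reference, sample))
```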

  • The biological effects of SNPs range from negligible to normal variations (e.g., eye or hair color) to genetic diseases.
  • SNPs are the most common source of variations between individual human beings.

  • In addition to producing a change in the identity of an amino acid, a SNP might specify the same amino acid (for example, changing GUU to GUC, both of which code for valine), or it might terminate protein synthesis by introducing a stop codon.

  • Their frequency is roughly one SNP for about every 300 nucleotides, with many of them in coding regions.
  • Knowing their exact locations may one day help doctors to predict an individual’s risk of developing a disease.
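
The stated frequency gives a rough sense of scale when combined with the 3 billion base pair genome size from earlier in the notes. This is an order-of-magnitude sketch only, assuming SNPs are spread evenly:

```python
GENOME_BP = 3_000_000_000  # human genome size from the notes
BASES_PER_SNP = 300        # roughly one SNP per 300 nucleotides

expected_snp_sites = GENOME_BP // BASES_PER_SNP
print(f"About {expected_snp_sites:,} SNP sites at this frequency")
```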

  • The SNP catalog has been used to locate SNPs responsible for 30 abnormal conditions, including total color blindness, one type of epilepsy, and susceptibility to the development of breast cancer.
  • As of June 2015, the SNP catalog maintained by the National Human Genome Research Institute contains over 147 million SNP entries.

  • The cataloging of SNPs has ushered in the era of genetic medicine.
  • The SNP catalog may allow physicians to predict for an individual the potential age at which inherited diseases will become active, their severity, and their reactions to various types of treatment.
  • The therapeutic course will be designed to meet the distinctive genomic profile of the person.

Ancestry and Disease Testing

  • Companies like Ancestry.com and 23andMe offer ancestry and disease testing services based on SNP analysis.
  • These tests may be incomplete, as dozens of SNPs or mutations are known for some genetic diseases.

Worked Example 27.1

  • The severity of a mutation in a DNA sequence that changes a single amino acid in a protein depends on the type of amino acid replaced and the nature of the new amino acid.
    • (a) What kind of change would have little effect on the protein containing the alternative amino acid?
    • (b) What kind of change could have a major effect on the protein that contains the alternative amino acid? Give an example of each type of mutation.
  • Analysis: Exchanging one amino acid for another depends on the change in the nature of the amino acid side chains.
  • Solution: (a) Exchange of an amino acid with a small nonpolar side chain for another with the same type of side chain (e.g., glycine for alanine) or exchange of amino acids with very similar side chains (e.g., serine for threonine) might have little effect.
  • (b) Conversion of an amino acid with a nonpolar side chain to one with a polar, acidic, or basic side chain could have a major effect because the side-chain interactions that affect protein folding may change. Some examples of this type include exchanging threonine, glutamate, or lysine for isoleucine. In hemoglobin, a single replacement of glutamic acid with a valine leads to sickle-cell anemia.

Recombinant DNA

  • Recombinant DNA: DNA that contains two or more DNA segments not found together in nature. The technology predates the Human Genome Project.

  • Progress in all aspects of genomics has built upon information gained in the application of recombinant DNA.
  • Using recombinant DNA technology, it is possible to cut a gene out of one organism and splice it into (recombine it with) the DNA of a second organism.

  • Bacteria provide excellent hosts for recombinant DNA.
  • Bacterial cells contain part of their DNA in small circular pieces called plasmids, each of which carries just a few genes.
    • Can be passed from one bacterium to another
    • Many carry antibiotic resistance genes

  • Plasmids are extremely easy to isolate, several copies of each plasmid may be present in a cell, and each plasmid replicates through the normal base-pairing pathway. Plasmids from the bacterium Escherichia coli, a common host for recombinant DNA, are typical examples.

  • The ease of isolating and manipulating plasmids plus the rapid replication of bacteria create ideal conditions for production of recombinant DNA and the proteins whose synthesis it directs.

  • The plasmid is cut open with a restriction endonuclease or restriction enzyme, which recognizes a specific sequence.
  • The restriction enzyme makes its cut at the same spot in the sequence of both strands of the double-stranded DNA when read in the same 5’ to 3’ direction.

  • This results in unpaired bases, known as sticky ends because they are available to match up with complementary base sequences.

  • Consider a gene fragment that has been cut from human DNA and is to be inserted into a plasmid.
    • The first step is cutting the gene and plasmid with the same restriction enzyme.
    • The next step is re-forming their phosphodiester bonds with ligase.
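
The cut-and-ligate steps can be sketched with strings standing in for the top strand of each DNA molecule. The notes do not name an enzyme; EcoRI is used here only as an illustration. It recognizes GAATTC and cuts between G and A on each strand, leaving AATT sticky ends. The sequences and function names are invented, and the circular plasmid is shown as a linear string for simplicity.

```python
SITE = "GAATTC"  # EcoRI recognition sequence, used only as an illustration
CUT_OFFSET = 1   # EcoRI cuts after the first base: G^AATTC

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def digest(top_strand: str) -> list[str]:
    """Cut a linear top strand at every recognition site."""
    pieces, start = [], 0
    pos = top_strand.find(SITE)
    while pos != -1:
        pieces.append(top_strand[start:pos + CUT_OFFSET])
        start = pos + CUT_OFFSET
        pos = top_strand.find(SITE, pos + 1)
    pieces.append(top_strand[start:])
    return pieces

# The site reads the same on both strands in the 5' to 3' direction,
# so both strands are cut at the same spot in the sequence.
print(reverse_complement(SITE) == SITE)

# Hypothetical sequences, invented for this example.
plasmid = "TTGAATTCAA"            # one site
human = "ACGAATTCGGGCCCGAATTCTT"  # gene fragment flanked by two sites

left, right = digest(plasmid)  # plasmid opened at its single site
_, insert, _ = digest(human)   # middle piece is the excised gene fragment

# Ligase rejoins the matching sticky ends (modeled here as concatenation).
recombinant = left + insert + right
print(recombinant)
```

Using the same enzyme on both molecules is what makes the ends compatible: the recombinant string contains an intact recognition site at each junction.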

  • The altered plasmid is inserted back into a bacterial cell where the normal processes of transcription and translation synthesize the protein encoded by the inserted gene.
  • Bacteria multiply rapidly; there are soon a large number of them, all containing the recombinant DNA and all manufacturing the protein encoded by the recombinant DNA.

  • There are some technical hurdles that have to be overcome before a protein manufactured in this way can be used commercially. They include the following:
    • The recombinant plasmid must be inserted into a bacterium.
    • Host organisms may modify the protein (e.g., by glycosylation).
    • The protein of interest must be isolated from endotoxins—potentially toxic natural compounds found inside the host organism.

  • Despite the obstacles, proteins manufactured in this manner have already reached the marketplace, including human insulin, human growth hormone, and blood clotting factors for hemophiliacs.
  • A major advantage of this technology is that large amounts of these proteins can be made, thus allowing their practical therapeutic use.

Genomics: Using What We Know - Genetically Modified Plants and Animals

  • The development of new varieties of plants and animals has been proceeding for centuries as the result of natural accidents and occasional success in the hybridization of known varieties.
  • The mapping and study of plant and animal genomes can greatly accelerate our ability to generate crop plants and farm animals with desirable characteristics and lacking undesirable ones.

  • Examples of genetically modified crops:
    • Corn modified with a bacterial gene (from Bacillus thuringiensis, Bt) to produce a toxin that kills the European corn borer.
    • Tests are under way with genetically modified coffee beans that are caffeine-free, potatoes that absorb less fat when they are fried, and “Golden Rice,” a yellow rice that provides the vitamin A desperately needed in poor populations where insufficient vitamin A causes death and blindness.

  • Concerns and Debates:
    • Will genetically modified plants and animals intermingle with natural varieties and cause harm to them?
    • Should food labels state whether the food contains genetically modified ingredients?
    • Might unrecognized harmful substances enter the food supply?
    • These hotly debated questions have led to the establishment of the Non-GMO Project, the goal of which is to offer consumers a non-GMO choice for organic and natural products.

Genomics: Using What We Know - Gene Therapy

  • Gene therapy is based on the premise that a disease-causing gene can be corrected or replaced by inserting a functional, healthy gene.
  • The most clear-cut expectations for gene therapy lie in treating monogenic diseases, those that result from defects of a single gene.

  • The focus has been on using nonpathogenic viruses as vectors, the agents that deliver therapeutic quantities of DNA directly into cell nuclei.
  • The expectation was that this method could result in lifelong elimination of an inherited disease, and many studies have been undertaken.

  • Expectations remain greater than achievements thus far.
  • The Food and Drug Administration (FDA) has, as of 2014, not yet approved any human gene therapy product for sale.
  • An early vector was AAV (adeno-associated virus), which apparently caused an allergic reaction.
  • CRISPR-Cas9 allows targeted modification of specific sequences in living cells.

FDA Considering CRISPR Therapy for Sickle Cell Disease

  • Average life expectancy with sickle cell disease: 45 years (compared to 77 in USA).
  • Sickle cell disease is caused by a single base change in the β-globin gene.
  • Proposed Treatment regimen:
    • Extract patient’s stem cells
    • Treat with CRISPR (targeted modification of the globin gene)
    • Return cells to patient
  • Test population: 40 patients, 39 “had no vaso-occlusive crisis” (blocked blood vessel).
  • Concern: Off-target modification of the DNA, which raises the question of benefit versus risk.

Genomics: Using What We Know - A Personal Genomic Survey

  • If a patient lacks an enzyme needed for a drug’s metabolism or has a monogenic defect, therapies could be individually tailored.
  • In cancer therapy, understanding the genetic differences between normal cells and tumor cells could guide chemotherapy, immune cell therapy, and CAR-T cell therapy.

  • Genetic screening of infants might permit the use of gene therapy to eliminate the threat of a monogenically based disease, or a lifestyle adjustment for an individual with SNPs that predict a susceptibility to a disease that results from combinations of genetic and environmental influences.

Genomics: Using What We Know - Bioethics

  • One area of major concern that has arisen from the genomics revolution is that of the ethical and social implications this groundbreaking work has brought to the fore.
  • The ELSI program of the National Human Genome Research Institute deals with the ethical, legal, and social implications of human genetic research.

  • The scope of ELSI is broad and thought-provoking, dealing with questions such as:
    • Who should have access to personal genetic information and how will it be used?
    • Who should own and control genetic information?
    • Should genetic testing be performed when no treatment is available?
    • Are disabilities diseases? Do they need to be cured or prevented?
    • Preliminary attempts at gene therapy are exorbitantly expensive. Who will have access to these therapies? Who will pay for their use?
    • Should we re-engineer the genes we pass on to our children?
    • Should we get every newborn’s full genetic sequence?