Chapter 9 - Molecular Genetic Techniques and Genomics
The information encoded in the DNA sequence of genes specifies the sequence and therefore the structure and function of every protein molecule in a cell
The power of genetics as a tool for studying cells and organisms lies in the ability of researchers to selectively alter every copy of just one type of protein in a cell by making a change in the gene for that protein.
The different forms, or variants, of a gene are referred to as alleles. Geneticists commonly refer to the numerous naturally occurring genetic variants that exist in populations, particularly human populations, as alleles
A fundamental genetic difference between experimental organisms is whether their cells carry a single set of chromosomes or two copies of each chromosome
The former are referred to as haploid; the latter, as diploid
Since diploid organisms carry two copies of each gene, they may carry identical alleles, that is, be homozygous for a gene, or carry different alleles, that is, be heterozygous for a gene
Recessive alleles usually result from a mutation that inactivates the affected gene, leading to a partial or complete loss of function
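Dominant mutations in certain genes, however, are associated with a loss of function; for example, some genes are haplo-insufficient, meaning that both alleles are required for normal function, so inactivating a single copy produces a mutant phenotype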
Geneticists exploit the normal life cycle of an organism to test for the dominance or recessivity of alleles
Like somatic cells, premeiotic germ cells are diploid, containing two homologs of each morphologic type of chromosome
The two homologs constituting each pair of homologous chromosomes are descended from different parents, and thus their genes may exist in different allelic forms.
Geneticists usually strive to begin breeding experiments with strains that are homozygous for the genes under examination
In such true-breeding strains, every individual will receive the same allele from each parent and therefore the composition of alleles will not change from one generation to the next
The procedures used to identify and isolate mutants, referred to as genetic screens, depend on whether the experimental organism is haploid or diploid and, if the latter, whether the mutation is recessive or dominant
Genes that encode proteins essential for life are among the most interesting and important ones to study
In haploid yeast cells, essential genes can be studied through the use of conditional mutations
Among the most common conditional mutations are temperature-sensitive mutations, which can be isolated in bacteria and lower eukaryotes but not in warm-blooded eukaryotes
When temperature-sensitive mutants were isolated in screens for yeast cell-division-cycle (cdc) mutants, further analysis revealed that they indeed were defective in cell division
In S. cerevisiae, cell division occurs through a budding process, and the size of the bud, which is easily visualized by light microscopy, indicates a cell’s position in the cell cycle.
In diploid organisms, phenotypes resulting from recessive mutations can be observed only in individuals homozygous for the mutant alleles
Since mutagenesis in a diploid organism typically changes only one allele of a gene, yielding heterozygous mutants, genetic screens must include inbreeding steps to generate progeny that are homozygous for the mutant alleles
In the genetic approach to studying a particular cellular process, researchers often isolate multiple recessive mutations that produce the same phenotype
A common test for determining whether these mutations are in the same gene or in different genes exploits the phenomenon of genetic complementation, that is, the restoration of the wild-type phenotype by the mating of two different mutants
Complementation analysis of a set of mutants exhibiting the same phenotype can distinguish the individual genes in a set of functionally related genes, all of which must function to produce a given phenotypic trait
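The grouping logic of complementation analysis can be sketched in Python. This is a minimal illustration under simplifying assumptions: each mutant carries one recessive loss-of-function mutation, a cross that fails to restore the wild-type phenotype is taken to mean that both mutations lie in the same gene, and the mutant names and cross results are hypothetical. Mutants that fail to complement one another are merged into one complementation group, which corresponds to one gene.

```python
from itertools import combinations

def complementation_groups(mutants, complements):
    """Group recessive mutants into complementation groups.

    `complements[(a, b)]` is True if the diploid from crossing mutant a with
    mutant b shows the wild-type phenotype (complementation). Failure to
    complement (False) is assumed to mean both mutations are in the same gene.
    """
    parent = {m: m for m in mutants}  # union-find forest, one tree per group

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving keeps trees shallow
            m = parent[m]
        return m

    for a, b in combinations(mutants, 2):
        result = complements.get((a, b), complements.get((b, a)))
        if result is False:  # no complementation: merge into one gene
            parent[find(a)] = find(b)

    groups = {}
    for m in mutants:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical cross results: True = wild type restored by the mating
mutants = ["m1", "m2", "m3", "m4"]
crosses = {
    ("m1", "m2"): False, ("m1", "m3"): True, ("m1", "m4"): True,
    ("m2", "m3"): True, ("m2", "m4"): True, ("m3", "m4"): False,
}
print(complementation_groups(mutants, crosses))  # [['m1', 'm2'], ['m3', 'm4']]
```

Real data can deviate from this idealization (some pairs of alleles of a single gene do complement each other), so such groups are best treated as a first-pass assignment of mutations to genes.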
Based on careful analysis of mutant phenotypes associated with a particular cellular process, researchers often can deduce the order in which a set of genes and their protein products function
Ordering of Biosynthetic Pathways: A simple example of a biosynthetic process whose steps can be ordered genetically is the biosynthesis of a metabolite such as the amino acid tryptophan in bacteria
Ordering of Signaling Pathways: The expression of many eukaryotic genes is regulated by signaling pathways that are initiated by extracellular hormones, growth factors, or other signals
Such signaling pathways may include numerous components, and double-mutant analysis often can provide insight into the functions and interactions of these components
The only prerequisite for obtaining useful information from this type of analysis is that the two mutations must have opposite effects on the output of the same regulated pathway
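The reasoning behind ordering a pathway by double-mutant analysis can be illustrated with a toy model in Python. The model is an assumption made for illustration only: a linear pathway of positively acting components, in which a loss-of-function ("lof") mutation blocks transmission and a constitutively active ("gof") mutation turns the output on regardless of upstream input. Under these assumptions the double mutant shows the phenotype of whichever mutation acts further downstream; `geneX` and `geneY` are hypothetical names.

```python
def pathway_output(order, mutations, signal=False):
    """Output of a toy linear pathway (True = target genes expressed).

    `order` lists genes from upstream to downstream. `mutations` maps a gene
    to "lof" (blocks transmission) or "gof" (passes a signal on even without
    input). Unmutated genes simply relay the current state.
    """
    state = signal
    for gene in order:
        kind = mutations.get(gene)
        if kind == "lof":
            state = False
        elif kind == "gof":
            state = True
    return state

def infer_order(gene_a, mut_a, gene_b, mut_b, observed):
    """Return the upstream-to-downstream orders consistent with the observed
    double-mutant output. The two mutations must have opposite effects."""
    candidates = [[gene_a, gene_b], [gene_b, gene_a]]
    mutations = {gene_a: mut_a, gene_b: mut_b}
    return [o for o in candidates if pathway_output(o, mutations) == observed]

# Hypothetical result: the lof/gof double mutant expresses the target genes
print(infer_order("geneX", "lof", "geneY", "gof", observed=True))  # [['geneX', 'geneY']]
```

Because the downstream mutation masks (is epistatic to) the upstream one, a double mutant whose mutations had the same effect would give the same output for both orders, which is why the two mutations must have opposite effects.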
Two other types of genetic analysis can provide additional clues about how proteins that function in the same cellular process may interact with one another in the living cell
Suppressor Mutations: The first type of analysis is based on genetic suppression
To understand this phenomenon, suppose that point mutations lead to structural changes in one protein (A) that disrupt its ability to associate with another protein (B) involved in the same cellular process
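Similarly, mutations in protein B lead to small structural changes that inhibit its ability to interact with protein A. However, a particular mutation in B may compensate for the structural change in A, so that the two mutant proteins can again associate; such a second mutation is said to suppress the first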
Synthetic Lethal Mutations: A phenomenon called synthetic lethality produces a phenotypic effect opposite to that of suppression
In this case, the deleterious effect of one mutation is greatly exacerbated (rather than suppressed) by a second mutation in the same or a related gene
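One situation in which such synthetic lethal mutations can occur involves two proteins that form a heterodimer: a mutation in either subunit alone only partially impairs their interaction, but mutations in both subunits can nearly abolish it, producing a severe phenotype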
A variety of techniques, often referred to as recombinant DNA technology, are used in DNA cloning, which permits researchers to prepare large numbers of identical DNA molecules
Recombinant DNA is simply any DNA molecule composed of sequences derived from different sources
A major objective of DNA cloning is to obtain discrete, small regions of an organism’s DNA that constitute specific genes
Only relatively small DNA molecules can be cloned in any of the available vectors
Cutting DNA Molecules into Small Fragments: Restriction enzymes are endonucleases produced by bacteria that typically recognize specific 4- to 8-bp sequences, called restriction sites, and then cleave both DNA strands at this site
Restriction sites commonly are short palindromic sequences; that is, the restriction-site sequence is the same on each DNA strand when read in the 5’ → 3’ direction
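Many restriction enzymes make staggered cuts in the two DNA strands at their recognition site, generating fragments that have a single-stranded “tail” at both ends; these complementary tails, called sticky ends, can base-pair with the tails of any other fragment cut by the same enzyme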
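The palindromic structure of restriction sites and the generation of single-stranded tails can be sketched in Python, using EcoRI as the example enzyme (it recognizes GAATTC and cuts between G and A on each strand, leaving 5′ AATT tails). The helper names and the example DNA are illustrative, only the top strand is modeled, and `overhang` assumes a blunt cut or a 5′ overhang.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (both read 5'->3')."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))

def is_palindromic(site):
    """A site is palindromic if it equals its own reverse complement."""
    return site.upper() == reverse_complement(site)

def digest(seq, site="GAATTC", cut_offset=1):
    """Cut the top strand of `seq` at every occurrence of `site`.

    `cut_offset` is where, within the site, the top strand is cut
    (EcoRI cuts G^AATTC, so the offset is 1). Returns top-strand fragments.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(seq[prev:cut])
        prev = cut
    fragments.append(seq[prev:])
    return fragments

def overhang(site="GAATTC", cut_offset=1):
    """Single-stranded 5' tail left by a staggered cut in a palindromic site.

    Because the site is palindromic, the bottom strand is cut at the mirror
    position (len(site) - cut_offset). Assumes cut_offset <= len(site) / 2,
    i.e. a 5' overhang or a blunt cut (empty string).
    """
    return site[cut_offset:len(site) - cut_offset]

print(is_palindromic("GAATTC"))               # True
print(digest("TTGAATTCAAAGAATTCGG"))          # ['TTG', 'AATTCAAAG', 'AATTCGG']
print(overhang())                             # AATT
```

Each internal fragment begins with the AATT tail, which is why any two EcoRI fragments can be annealed and then joined by DNA ligase.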
Inserting DNA Fragments into Vectors: DNA fragments with either sticky ends or blunt ends can be inserted into vector DNA with the aid of DNA ligases
During normal DNA replication, DNA ligase catalyzes the end-to-end joining (ligation) of short fragments of DNA, called Okazaki fragments
Plasmids are circular, double-stranded DNA (dsDNA) molecules that are separate from a cell’s chromosomal DNA
These extrachromosomal DNAs, which occur naturally in bacteria and in lower eukaryotic cells (e.g., yeast), exist in a parasitic or symbiotic relationship with their host cell
The plasmids most commonly used in recombinant DNA technology are those that replicate in E. coli. Investigators have engineered these plasmids to optimize their use as vectors in DNA cloning
DNA fragments from a few base pairs up to ≈20 kb commonly are inserted into plasmid vectors
If special precautions are taken to avoid manipulations that might mechanically break DNA, even longer DNA fragments can be inserted into a plasmid vector
When a recombinant plasmid with an inserted DNA fragment transforms an E. coli cell, all the antibiotic-resistant progeny cells that arise from the initially transformed cell will contain plasmids with the same inserted DNA
Vectors constructed from bacteriophage λ are about a thousand times more efficient than plasmid vectors in cloning large numbers of DNA fragments
Phage vectors have been widely used to generate DNA libraries, comprehensive collections of DNA fragments representing the genome or expressed mRNAs of an organism
A λ virion consists of a head, which contains the phage DNA genome, and a tail, which functions in infecting E. coli host cells
The λ genes encoding the head and tail proteins, as well as various proteins involved in phage DNA replication and cell lysis, are grouped in discrete regions of the ≈50-kb viral genome
It is technically feasible to use λ phage cloning vectors to generate a genomic library, that is, a collection of λ clones that collectively represent all the DNA sequences in the genome of a particular organism
The first step in preparing a cDNA library is to isolate the total mRNA from the cell type or tissue of interest
Because of their poly(A) tails, mRNAs are easily separated from the much more prevalent rRNAs and tRNAs present in a cell extract by use of a column in which short strings of thymidylate (oligo-dTs) are linked to the matrix
Because each plaque arises from a single recombinant phage, all the progeny λ phages that develop from it are genetically identical and constitute a clone carrying a cDNA derived from a single mRNA; collectively, these clones constitute a λ cDNA library
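One feature of cDNA libraries arises because different genes are transcribed at very different rates: cDNA clones corresponding to abundant mRNAs are represented many times in a library, whereas clones corresponding to rare mRNAs are scarce or may be absent altogether
In the case of higher eukaryotes, both genomic and cDNA libraries contain hundreds of thousands to upwards of a million individual clones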
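How large a library must be follows from a commonly used probability estimate: if a particular mRNA makes up a fraction f of all mRNA molecules, the number of clones N needed to include at least one copy with probability P is N = ln(1 − P) / ln(1 − f). The sketch assumes that every mRNA is converted into cDNA and cloned with equal efficiency, which real libraries only approximate; the abundance values are illustrative.

```python
import math

def clones_needed(abundance, probability=0.99):
    """Clones required to include at least one clone of an mRNA that makes up
    `abundance` (a fraction) of the mRNA population, with the given probability.
    Assumes all mRNAs are cloned with equal efficiency.
    """
    return math.ceil(math.log(1 - probability) / math.log(1 - abundance))

for label, f in [("abundant (1 in 100)", 1e-2), ("rare (1 in 100,000)", 1e-5)]:
    print(f"{label}: {clones_needed(f):,} clones for 99% confidence")
```

Under these assumptions an mRNA present at 1 copy in 100,000 requires roughly 460,000 clones for 99% confidence, consistent with library sizes in the hundreds of thousands.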
The basis for screening with oligonucleotide probes is hybridization, the ability of complementary single-stranded DNA or RNA molecules to associate (hybridize) specifically with each other via base pairing
Clearly, the identification of specific clones by the membrane-hybridization technique depends on the availability of complementary radiolabeled probes
For an oligonucleotide to be useful as a probe, it must be long enough for its sequence to occur uniquely in the clone of interest and not in any other clones
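The length requirement can be made concrete with a back-of-the-envelope calculation in Python. It assumes a random genome with equal frequencies of all four bases, so a specific N-nucleotide sequence is expected to occur about 2G / 4^N times by chance in a genome of G base pairs (counting both strands). Real genomes are not random and contain repeats, so practical probes are somewhat longer than this minimum; the genome sizes are approximate.

```python
def expected_occurrences(length, genome_size):
    """Expected chance matches of one specific oligonucleotide in a random
    genome, counting both strands and assuming equal base frequencies."""
    return 2 * genome_size / 4 ** length

def min_unique_length(genome_size, threshold=1.0):
    """Shortest probe length whose expected number of chance matches is
    below `threshold`."""
    n = 1
    while expected_occurrences(n, genome_size) >= threshold:
        n += 1
    return n

HUMAN_GENOME_BP = 3e9   # approximate haploid human genome size
ECOLI_GENOME_BP = 4.6e6  # approximate E. coli genome size
print(min_unique_length(HUMAN_GENOME_BP))  # 17
print(min_unique_length(ECOLI_GENOME_BP))  # 12
```

Because each added nucleotide cuts the expected number of chance matches by a factor of 4, a probe only a few nucleotides longer than this minimum is very likely to be unique even in a large genome.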
Chemical synthesis of single-stranded DNA probes of defined sequence can be accomplished by a series of reactions that add one nucleotide at a time to the growing chain
In some cases, a DNA library can be screened for the ability to express a functional protein that complements a recessive mutation
Such a screening strategy would be an efficient way to isolate a cloned gene that corresponds to an interesting recessive mutation identified in an experimental organism
Libraries constructed for the purpose of screening among yeast gene sequences usually are constructed from genomic DNA rather than cDNA
To increase the probability that all regions of the yeast genome are successfully cloned and represented in the plasmid library, the genomic DNA usually is only partially digested to yield overlapping restriction fragments of ≈10 kb
In order to manipulate or sequence a cloned DNA fragment, it first must be separated from the vector DNA
This can be accomplished by cutting the recombinant DNA clone with the same restriction enzyme used to produce the recombinant vectors originally
Near neutral pH, DNA molecules carry a large negative charge and therefore move toward the positive electrode during gel electrophoresis
A common method for visualizing separated DNA bands on a gel is to incubate the gel in a solution containing the fluorescent dye ethidium bromide
This planar molecule binds to DNA by intercalating between the base pairs
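Fragment sizes are usually read off a gel by comparison with marker bands of known size. Over a limited size range, the logarithm of fragment length is approximately linear in migration distance, so a straight-line fit to a ladder gives a standard curve. The sketch below assumes that log-linear relationship; the ladder distances and sizes are hypothetical.

```python
import math

def fit_standard_curve(ladder):
    """Least-squares fit of log10(size) = slope * distance + intercept.

    `ladder` is a list of (distance_mm, size_bp) pairs for marker bands.
    """
    xs = [d for d, _ in ladder]
    ys = [math.log10(s) for _, s in ladder]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def estimate_size(distance, slope, intercept):
    """Interpolate a fragment's size (bp) from its migration distance."""
    return 10 ** (slope * distance + intercept)

ladder = [(10, 5000), (20, 2000), (30, 1000), (40, 500)]  # hypothetical markers
slope, intercept = fit_standard_curve(ladder)
print(round(estimate_size(25, slope, intercept)))  # unknown band at 25 mm
```

With this ladder, a band at 25 mm comes out near 1.5 kb; estimates outside the range spanned by the markers are unreliable because the log-linear relationship breaks down for very large and very small fragments.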
The complete characterization of any cloned DNA fragment requires the determination of its nucleotide sequence. F. Sanger and his colleagues developed the method now most commonly used to determine the exact nucleotide sequence of DNA fragments up to ≈500 nucleotides long
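The principle of Sanger's dideoxy chain-termination method can be sketched in Python. This is an idealized model: DNA polymerase copies the template from a primer, and in each of four reactions a dideoxynucleotide (ddA, ddC, ddG, or ddT) occasionally terminates the chain wherever that nucleotide is incorporated. Assumed here: termination occurs at every possible position, lengths are counted from the end of the primer, and the template is written 3′→5′ in the direction the polymerase reads it; the template sequence is made up.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def chain_termination_reactions(template):
    """Simulate the four dideoxy reactions.

    `template` is written 3'->5', so the newly synthesized strand (5'->3')
    is its base-by-base complement. Returns {ddNTP: lengths of the chains
    terminated by that ddNTP}.
    """
    new_strand = "".join(COMPLEMENT[b] for b in template)
    reactions = {"A": [], "C": [], "G": [], "T": []}
    for i, base in enumerate(new_strand):
        # A chain ending at position i carries the dideoxy form of `base`.
        reactions[base].append(i + 1)
    return reactions

def read_gel(reactions):
    """Read bands from shortest to longest, as on a sequencing gel."""
    by_length = {n: base for base, lengths in reactions.items() for n in lengths}
    return "".join(by_length[n] for n in sorted(by_length))

lanes = chain_termination_reactions("TACGGA")
print(lanes)            # lengths of the bands in each ddNTP lane
print(read_gel(lanes))  # ATGCCT: sequence of the new strand, 5'->3'
```

Reading the bands in order of increasing length gives the sequence of the newly synthesized strand, which is the complement of the template.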
If the nucleotide sequences at the ends of a particular DNA region are known, the intervening fragment can be amplified directly by the polymerase chain reaction (PCR)
The PCR depends on the ability to alternately denature (melt) double-stranded DNA molecules and renature (anneal) complementary single strands in a controlled fashion
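The power of the PCR comes from the repetition of denaturation, primer annealing, and extension: each cycle can at most double the number of target molecules, so after n cycles N = N₀(1 + E)ⁿ, where E is the per-cycle efficiency (E = 1 for perfect doubling). The sketch below applies this formula; the 90% efficiency is an illustrative assumption, and the model ignores the plateau reached in late cycles when primers or polymerase activity become limiting.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Copies of the target sequence after `cycles` rounds of PCR.

    Each cycle multiplies the number of target molecules by (1 + efficiency);
    efficiency = 1.0 means perfect doubling. No plateau phase is modeled.
    """
    return initial_copies * (1 + efficiency) ** cycles

print(f"{pcr_copies(1, 30):.3g}")       # ideal doubling from a single molecule
print(f"{pcr_copies(1, 30, 0.9):.3g}")  # assumed 90% efficiency per cycle
```

Thirty cycles of ideal doubling turn a single template molecule into about a billion (2³⁰) copies, which is why so little starting DNA is needed.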
Direct Isolation of a Specific Segment of Genomic DNA: For organisms in which all or most of the genome has been sequenced, PCR amplification starting with the total genomic DNA often is the easiest way to obtain a specific DNA region of interest for cloning
Preparation of Probes: Preparation of such probes by PCR amplification requires chemical synthesis of only two relatively short primers corresponding to the two ends of the target sequence
Tagging of Genes by Insertion Mutations: Another useful application of the PCR is to amplify a “tagged” gene from the genomic DNA of a mutant strain
This approach is a simpler method for identifying genes associated with a particular mutant phenotype than the screening of a library by functional complementation
The key to this use of PCR is the ability to produce mutations by insertion of a known DNA sequence into the genome of an experimental organism
Two very sensitive methods for detecting a particular DNA or RNA sequence within a complex mixture combine separation by gel electrophoresis and hybridization with a complementary radiolabeled DNA probe
Southern Blotting: The first blotting technique to be devised is known as Southern blotting after its originator E. M. Southern
This technique is capable of detecting a single specific restriction fragment in the highly complex mixture of fragments produced by cleavage of the entire human genome with a restriction enzyme
Northern Blotting: One of the most basic ways to characterize a cloned gene is to determine when and where in an organism the gene is expressed
Expression of a particular gene can be followed by assaying for the corresponding mRNA by Northern blotting, named, in a play on words, after the related method of Southern blotting
The first step in producing large amounts of a low-abundance protein is to obtain a cDNA clone encoding the full-length protein by methods discussed previously
The second step is to insert the cDNA into a plasmid expression vector engineered to produce large amounts of the encoded protein in E. coli cells
One disadvantage of bacterial expression systems is that many eukaryotic proteins undergo various modifications (e.g., glycosylation, hydroxylation) after their synthesis on ribosomes, and bacteria cannot carry out many of these modifications
To get around this limitation, cloned genes are introduced into cultured animal cells, a process called transfection
Two common methods for transfecting animal cells differ in whether the recombinant vector DNA is or is not integrated into the host-cell genomic DNA
In both methods, cultured animal cells must be treated to facilitate their initial uptake of a recombinant plasmid vector
Transient Transfection: The simpler of the two expression methods, called transient transfection, employs a vector similar to the yeast shuttle vectors described previously
Stable Transfection (Transformation): If an introduced vector integrates into the genome of the host cell, the genome is permanently altered and the cell is said to be transformed
Integration most likely is accomplished by mammalian enzymes that function normally in DNA repair and recombination
Epitope Tagging: In addition to their use in producing proteins that are modified after translation, eukaryotic expression vectors provide an easy way to study the intracellular localization of eukaryotic proteins
In this method, a cloned cDNA is modified by fusing it to a short DNA sequence encoding an amino acid sequence recognized by a known monoclonal antibody
Proteins with similar functions often contain similar amino acid sequences that correspond to important functional domains in the three-dimensional structure of the proteins
By comparing the amino acid sequence of the protein encoded by a newly cloned gene with the sequences of proteins of known function, an investigator can look for sequence similarities that provide clues to the function of the encoded protein
Even when a protein shows no significant similarity to other proteins with the BLAST algorithm, it may nevertheless share with other proteins a short sequence that is functionally important
Such short segments recurring in many different proteins, referred to as motifs, generally have similar functions
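Motifs are often written as patterns that allow some positions to vary. The Python sketch below converts a simple PROSITE-style pattern into a regular expression and scans a protein sequence with it. The example pattern is the nucleotide-binding P-loop (Walker A) motif, [AG]-x(4)-G-K-[ST], and the example sequence is the N-terminal segment of human H-Ras; the converter handles only a subset of the PROSITE syntax (no anchors), and the function names are illustrative.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern (e.g. "[AG]-x(4)-G-K-[ST]")
    into a Python regular expression. Supports x (any residue), single
    residues, [..] alternatives, {..} exclusions and (n) or (n,m) repeats."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element}")
        residue, lo, hi = m.groups()
        if residue == "x":
            regex = "."
        elif residue.startswith("{"):
            regex = f"[^{residue[1:-1]}]"  # any residue except those listed
        else:
            regex = residue  # a single residue or [..] class is valid regex
        if lo and hi:
            regex += f"{{{lo},{hi}}}"
        elif lo:
            regex += f"{{{lo}}}"
        parts.append(regex)
    return "".join(parts)

def find_motifs(sequence, pattern):
    """Return (1-based start, matched residues) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]

P_LOOP = "[AG]-x(4)-G-K-[ST]"  # nucleotide-binding P-loop (Walker A) motif
ras_n_terminus = "MTEYKLVVVGAGGVGKSALTIQ"
print(prosite_to_regex(P_LOOP))             # [AG].{4}GK[ST]
print(find_motifs(ras_n_terminus, P_LOOP))  # [(10, 'GAGGVGKS')]
```

A pattern match is only a clue: short motifs can also occur by chance, so a hit suggests, rather than proves, that the protein binds nucleotides.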
BLAST searches for related protein sequences may reveal that proteins belong to a protein family. (The corresponding genes constitute a gene family)
Protein families are thought to arise by two different evolutionary processes: gene duplication and speciation
All the different members of the tubulin family are sufficiently similar in sequence to suggest a common ancestral sequence
All these sequences are considered to be homologous
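More specifically, sequences that presumably diverged as a result of gene duplication (e.g., the α- and β-tubulin sequences) are described as paralogous, whereas sequences that diverged as a result of speciation (e.g., the α-tubulin genes of different species) are described as orthologous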
The complete genomic sequence of an organism contains within it the information needed to deduce the sequence of every protein made by the cells of that organism
For organisms such as bacteria and yeast, whose genomes have few introns and short intergenic regions, most protein-coding sequences can be found simply by scanning the genomic sequence for open reading frames (ORFs) of significant length
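An ORF scan of this kind can be sketched in Python: each of the six reading frames (three on each strand) is read codon by codon, and a stretch from an ATG start codon to the next in-frame stop codon (TAA, TAG, or TGA) is reported if it is long enough. The minimum length of 100 codons is an illustrative cutoff, not a fixed rule, and the toy sequence in the usage example is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq, min_codons=100):
    """Find ORFs (ATG ... stop) in all six reading frames.

    Returns (strand, frame, start, end) tuples with 0-based, end-exclusive
    coordinates on the strand that was scanned. The length check counts
    codons including the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq.upper()))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the reading frame
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs

# Toy example with a deliberately low cutoff
print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))  # [('+', 2, 2, 17)]
```

In a random sequence a stop codon appears on average every 21 codons (3 of the 64 codons are stops), so long ORFs rarely arise by chance, which is what makes ORF length a useful signal in compact genomes.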
The best gene-finding algorithms combine all the available data that might suggest the presence of a gene at a particular genomic site
The combination of genomic sequencing and gene-finding computer algorithms has yielded the complete inventory of protein-coding genes for a variety of organisms
The functions of about half the proteins encoded in these genomes are known or have been predicted on the basis of sequence comparisons
One of the surprising features of this comparison is that the number of protein-coding genes within different organisms does not seem proportional to our intuitive sense of their biological complexity
Monitoring the expression of thousands of genes simultaneously is possible with DNA microarray analysis
A DNA microarray consists of thousands of individual, closely packed gene-specific sequences attached to the surface of a glass microscope slide
Preparation of DNA Microarrays: In one method for preparing microarrays, a ≈1-kb portion of the coding region of each gene analyzed is individually amplified by PCR
In an alternative method, multiple DNA oligonucleotides, usually at least 20 nucleotides in length, are synthesized from an initial nucleotide that is covalently bound to the surface of a glass slide
The synthesis of an oligonucleotide of a specific sequence can be programmed in a small region on the surface of the slide
Effect of Carbon Source on Gene Expression in Yeast: The initial step in a microarray expression study is to prepare fluorescently labeled cDNAs corresponding to the mRNAs expressed by the cells under study
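A common way to summarize such an experiment, assuming a two-color design in which cDNA from one condition is labeled with a red fluorescent dye and cDNA from the other with a green dye, is to compute the log₂ ratio of red to green signal for each spot. The sketch below does this; the gene names, intensities, 100-unit intensity floor, and twofold cutoff are hypothetical choices for illustration.

```python
import math

def log2_ratios(spots, min_intensity=100.0):
    """Return {gene: log2(red / green)} for each array spot.

    `spots` maps gene names to (red, green) background-subtracted intensities.
    Positive values mean higher expression in the red-labeled sample. Spots
    where either channel is below `min_intensity` are skipped as unreliable.
    """
    ratios = {}
    for gene, (red, green) in spots.items():
        if red >= min_intensity and green >= min_intensity:
            ratios[gene] = math.log2(red / green)
    return ratios

def changed_genes(ratios, fold=2.0):
    """Genes whose expression differs by at least `fold` in either direction."""
    cutoff = math.log2(fold)
    return sorted(g for g, r in ratios.items() if abs(r) >= cutoff)

spots = {  # hypothetical (red, green) intensities
    "GENE_A": (4000, 1000),
    "GENE_B": (500, 2000),
    "GENE_C": (1200, 1000),
    "GENE_D": (50, 30),
}
ratios = log2_ratios(spots)
print(ratios)                 # GENE_A: 2.0, GENE_B: -2.0, GENE_C: about 0.26
print(changed_genes(ratios))  # ['GENE_A', 'GENE_B']
```

The log scale makes increases and decreases symmetric: a fourfold induction and a fourfold repression give +2 and −2.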
Firm conclusions rarely can be drawn from a single microarray experiment about whether genes that exhibit similar changes in expression are co-regulated and hence likely to be closely related functionally
Genes that appear to be co-regulated in a single microarray expression experiment may undergo changes in expression for very different reasons and may actually have very different biological functions
A solution to this problem is to combine the information from a set of expression array experiments to find genes that are similarly regulated under a variety of conditions or over a period of time
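One simple way to find genes that are similarly regulated across many experiments is to compare their expression profiles (for example, log₂ ratios over a time course) with a correlation coefficient. The sketch below uses the Pearson correlation and a fixed threshold; both are assumptions for illustration, since published analyses typically apply clustering algorithms on top of such a similarity measure. The gene names and profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coregulated_pairs(profiles, threshold=0.9):
    """Gene pairs whose profiles across many arrays correlate at or above
    `threshold`."""
    genes = sorted(profiles)
    return [(a, b) for i, a in enumerate(genes) for b in genes[i + 1:]
            if pearson(profiles[a], profiles[b]) >= threshold]

profiles = {  # hypothetical log2 ratios at four time points
    "GENE_A": [0.1, 1.0, 2.1, 2.9],
    "GENE_B": [0.0, 0.9, 2.0, 3.1],
    "GENE_C": [2.0, 1.1, 0.2, -0.9],
}
print(coregulated_pairs(profiles))  # [('GENE_A', 'GENE_B')]
```

Two genes that happen to change together in one experiment will rarely track each other this closely over many conditions unless they share regulation, which is the rationale for combining experiments.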
The elucidation of DNA and protein sequences in recent years has led to the identification of many genes, using sequence patterns in genomic DNA and the sequence similarity of the encoded proteins with proteins of known function
Modifying the genome of the yeast Saccharomyces is particularly easy for two reasons: yeast cells readily take up exogenous DNA under certain conditions, and the introduced DNA is efficiently exchanged for the homologous chromosomal site in the recipient cell
This specific, targeted recombination of identical stretches of DNA allows any gene in yeast chromosomes to be replaced with a mutant allele
Disruption of yeast genes by this method is proving particularly useful in assessing the role of proteins identified by ORF analysis of the entire genomic DNA sequence
A large consortium of scientists has replaced each of the approximately 6000 genes identified by ORF analysis with the kanMX disruption construct and determined which gene disruptions lead to nonviable haploid spores
Although disruption of an essential gene required for cell growth will yield nonviable spores, this method provides little information about what the encoded protein actually does in cells
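To study the function of an essential gene, researchers can place it under the control of a regulated promoter so that its expression can be switched off experimentally. A useful promoter for this purpose is the yeast GAL1 promoter, which is active in cells grown on galactose but completely inactive in cells grown on glucose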
In this approach, the coding sequence of an essential gene (X) ligated to the GAL1 promoter is inserted into a yeast shuttle vector
In an early application of this method, researchers explored the function of cytosolic Hsc70 genes in yeast
Haploid cells with a disruption in all four redundant Hsc70 genes were non-viable unless the cells carried a vector containing a copy of the Hsc70 gene that could be expressed from the GAL1 promoter on galactose medium
Many of the methods for disrupting genes in yeast can be applied to genes of higher eukaryotes
Disrupted alleles of these genes can be introduced into the germline via homologous recombination to produce animals with a gene knockout, or simply “knockout”
Gene-targeted knockout mice are generated by a two-stage procedure
In the first stage, a DNA construct containing a disrupted allele of a particular target gene is introduced into embryonic stem (ES) cells
These cells, which are derived from the blastocyst, can be grown in culture through many generations
In the second stage in the production of knockout mice, ES cells heterozygous for a knockout mutation in gene X are injected into a recipient wild-type mouse blastocyst, which subsequently is transferred into a surrogate pseudopregnant female mouse
Investigators often are interested in examining the effects of knockout mutations in a particular tissue of the mouse, at a specific stage in development, or both
Mice carrying a germ-line knockout may have defects in numerous tissues or die before the developmental stage of interest
To address this problem, mouse geneticists have devised a clever technique to inactivate target genes in specific types of somatic cells or at particular times during development
This technique employs site-specific DNA recombination sites (called loxP sites) and the enzyme Cre that catalyzes recombination between them
The loxP-Cre recombination system is derived from bacteriophage P1, but this site-specific recombination system also functions when placed in mouse cells
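The outcome of Cre-mediated recombination on a "floxed" allele, in which two loxP sites in the same orientation flank a segment of the target gene, can be modeled as a string operation in Python: recombination between the sites excises the intervening DNA and leaves a single loxP site. The 34-bp loxP sequence is the standard one; the flanking and exon sequences are hypothetical, and inverted loxP sites (which cause inversion rather than excision) are not modeled.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # 34-bp loxP site

def cre_excise(seq, site=LOXP):
    """Model Cre-mediated excision between two directly repeated loxP sites.

    Recombination between the first two same-orientation sites removes the
    intervening DNA (released as a circle, discarded here) and leaves one
    loxP site. Returns `seq` unchanged if fewer than two sites are present.
    """
    first = seq.find(site)
    if first == -1:
        return seq
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq
    return seq[:first] + site + seq[second + len(site):]

exon = "ATGGCTAGC"  # hypothetical floxed exon
floxed_allele = "TTT" + LOXP + exon + LOXP + "AAA"
knockout_allele = cre_excise(floxed_allele)
print(len(floxed_allele), len(knockout_allele))  # 83 40
print(exon in knockout_allele)                   # False
```

In the mouse, the floxed allele behaves normally until Cre is made; expressing Cre from a tissue-specific promoter therefore restricts the excision, and the resulting knockout, to the cells in which that promoter is active.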
For certain genes, the difficulties in producing homozygous knockout mutants can be avoided by the use of an allele carrying a dominant-negative mutation
These alleles are genetically dominant; that is, they produce a mutant phenotype even in cells carrying a wild-type copy of the gene
But unlike other types of dominant alleles, dominant-negative alleles produce a phenotype equivalent to that of a loss-of-function mutation
Useful dominant-negative alleles have been identified for a variety of genes and can be introduced into cultured cells by transfection or into the germline of mice or other organisms
Researchers are exploiting a recently discovered phenomenon known as RNA interference (RNAi) to inhibit the function of specific genes
This approach is technically simpler than the methods described above for disrupting genes
To use RNAi for intentional silencing of a gene of interest, investigators first produce dsRNA based on the sequence of the gene to be inactivated
This dsRNA is injected into the gonad of an adult worm, where it has access to the developing embryos
As the embryos develop, the mRNA molecules corresponding to the injected dsRNA are rapidly destroyed
The resulting worms display a phenotype similar to the one that would result from disruption of the corresponding gene itself
Initially, the phenomenon of RNAi was quite mysterious to geneticists. Recent studies have shown that specialized RNA-processing enzymes cleave dsRNA into short segments, which base-pair with endogenous mRNA
Inherited human diseases are the phenotypic consequence of defective human genes
Although a “disease” gene may result from a new mutation that arose in the preceding generation, most cases of inherited diseases are caused by preexisting mutant alleles that have been passed from one generation to the next for many generations
In many cases, the genes responsible for inherited diseases must be found without any prior knowledge or reasonable hypotheses about the nature of the affected gene or its encoded protein
Human genetic diseases that result from a mutation in one specific gene exhibit several inheritance patterns depending on the nature and chromosomal location of the alleles that cause them
One characteristic pattern is that exhibited by a dominant allele in an autosome (that is, one of the 22 human chromosomes that are not a sex chromosome)
A recessive allele in an autosome exhibits a quite different segregation pattern. For an autosomal recessive allele, both parents must be heterozygous carriers of the allele in order for their children to be at risk of being affected by the disease
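The risk figures that follow from these two inheritance patterns can be computed with a small Punnett-square function in Python. The genotype notation is an assumption for illustration: "M" is the disease-causing allele and "+" the normal allele, and each parent passes on either of its two alleles with equal probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Genotype probabilities for one autosomal gene (a Punnett square).

    Each parent is a two-character genotype such as "M+"; each passes on one
    of its two alleles with equal probability (Mendelian segregation).
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}

def prob_affected(parent1, parent2, dominant):
    """Probability that a child is affected ("M" is the disease allele)."""
    genotypes = offspring_genotypes(parent1, parent2)
    if dominant:
        return sum(p for g, p in genotypes.items() if "M" in g)
    return sum(p for g, p in genotypes.items() if g == "MM")

print(prob_affected("M+", "M+", dominant=False))  # 1/4: two carriers, recessive
print(prob_affected("M+", "++", dominant=True))   # 1/2: one affected heterozygous parent
```

For the recessive case, each child of two carriers also has a 1/2 chance of being an unaffected carrier, which is how such alleles persist in a population across generations.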
The independent segregation of chromosomes during meiosis provides the basis for determining whether genes are on the same or different chromosomes
Genetic traits that segregate together during meiosis more frequently than expected from random segregation are controlled by genes located on the same chromosome
The presence of many different already mapped genetic traits, or markers, distributed along the length of a chromosome facilitates the mapping of a new mutation by assessing its possible linkage to these marker genes in appropriate crosses
The more markers that are available, the more precisely a mutation can be mapped
Many different genetic markers are needed to construct a high-resolution genetic map
In the experimental organisms commonly used in genetic studies, numerous markers with easily detectable phenotypes are readily available for genetic mapping of mutations
Restriction fragment length polymorphisms (RFLPs) were the first type of molecular markers used in linkage studies
RFLPs arise because mutations can create or destroy the sites recognized by specific restriction enzymes, leading to variations between individuals in the length of restriction fragments produced from identical regions of the genome
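The origin of an RFLP can be illustrated in Python by digesting two alleles of the same region that differ by a single base change that destroys an EcoRI site (GAATTC). The two 30-bp alleles are hypothetical and deliberately short; in a real analysis the fragments would be kilobases long and detected by Southern blotting with a probe from the region.

```python
def fragment_lengths(seq, site="GAATTC", cut_offset=1):
    """Lengths of the fragments produced by cutting `seq` at every `site`
    (EcoRI cuts G^AATTC, so the cut lies 1 nt into the site)."""
    cuts = [i + cut_offset for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

allele1 = "A" * 9 + "GAATTC" + "T" * 15  # hypothetical allele: site present
allele2 = "A" * 9 + "GAGTTC" + "T" * 15  # single base change destroys the site

for name, allele in [("allele 1", allele1), ("allele 2", allele2)]:
    print(name, fragment_lengths(allele))  # allele 1 [10, 20]; allele 2 [30]
```

A heterozygote carries both alleles, so a probe covering the whole region detects all three fragment sizes; following these band patterns through a family is what allows an RFLP to serve as a genetic marker.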
Consider how the allele conferring a particular dominant trait (e.g., familial hypercholesterolemia) might be mapped:
The first step is to obtain DNA samples from all the members of a family containing individuals that exhibit the disease
The DNA from each affected and unaffected individual then is analyzed to determine the identity of a large number of known DNA polymorphisms (either SSR or SNP markers can be used)
The segregation pattern of each DNA polymorphism within the family is then compared with the segregation of the disease under study to find those polymorphisms that tend to segregate along with the disease
Lastly, computer analysis of the segregation data is used to calculate the likelihood of linkage between each DNA polymorphism and the disease-causing allele
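The likelihood calculation is commonly expressed as a LOD score, the log₁₀ of the odds that the data arose from linkage at recombination fraction θ rather than from independent segregation (θ = 0.5). The sketch below uses the simple two-point formula for meioses in which recombinants and nonrecombinants can be scored unambiguously (phase known); real linkage programs also handle unknown phase, incomplete penetrance, and missing data. The counts are hypothetical. A LOD score of 3 or more (odds of 1000:1) is the conventional threshold for declaring linkage.

```python
import math

def lod_score(recombinants, nonrecombinants, theta):
    """Two-point LOD score at recombination fraction theta (0 < theta < 0.5)
    for phase-known, fully informative meioses."""
    n = recombinants + nonrecombinants
    linked = recombinants * math.log10(theta) + nonrecombinants * math.log10(1 - theta)
    unlinked = n * math.log10(0.5)  # likelihood if marker and disease are unlinked
    return linked - unlinked

def max_lod(recombinants, nonrecombinants, steps=49):
    """Scan theta = 0.01 ... 0.49 and return (best_theta, best_lod)."""
    thetas = [i / 100 for i in range(1, steps + 1)]
    return max(((t, lod_score(recombinants, nonrecombinants, t)) for t in thetas),
               key=lambda pair: pair[1])

# Hypothetical family data: 1 recombinant among 10 informative meioses
theta, lod = max_lod(1, 9)
print(theta, round(lod, 2))             # best theta 0.1, LOD about 1.6
print(round(lod_score(2, 18, 0.1), 2))  # twice as many meioses: about 3.2
```

Because LOD scores from independent families add, data from several families are often needed before a marker passes the threshold of 3.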
A phenomenon called linkage disequilibrium is the basis for an alternative strategy, which in some cases can afford a higher degree of resolution in mapping studies
Although linkage mapping can usually locate a human disease gene to a region containing about 7.5 × 10⁵ base pairs, as many as 50 different genes may be located in a region of this size
The ultimate objective of a mapping study is to locate the gene within a cloned segment of DNA and then to determine the nucleotide sequence of this fragment
In many cases, point mutations that give rise to disease-causing alleles may result in no detectable change in the level of expression or electrophoretic mobility of mRNAs
So if the comparison of the mRNAs expressed in normal and affected individuals reveals no detectable differences in the candidate mRNAs, a search for point mutations in the DNA regions encoding the mRNAs is undertaken
Most of the inherited human diseases that are now understood at the molecular level are monogenic traits
That is, a clearly discernible disease state is produced by the presence of a defect in a single gene
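Monogenic diseases caused by a mutation in one specific gene exhibit characteristic segregation patterns, such as the autosomal dominant and autosomal recessive patterns described above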
Many other inherited diseases show more complicated patterns of inheritance, making the identification of the underlying genetic cause much more difficult
Human geneticists used two different approaches to identify the many genes associated with retinitis pigmentosa
A further complication in the genetic dissection of human diseases is posed by diseases such as diabetes, heart disease, obesity, predisposition to cancer, and a variety of mental disorders, which have at least some heritable properties but do not follow simple single-gene inheritance patterns
Models of human disease in experimental organisms may also contribute to unraveling the genetics of complex traits such as obesity or diabetes
The information encoded in the DNA sequence of genes specifies the sequence and therefore the structure and function of every protein molecule in a cell
The power of genetics as a tool for studying cells and organisms lies in the ability of researchers to selectively alter every copy of just one type of protein in a cell by making a change in the gene for that protein.
The different forms, or variants, of a gene, are referred to as alleles. Geneticists commonly refer to the numerous naturally occurring genetic variants that exist in populations, particularly human populations, as alleles
A fundamental genetic difference between experimental organisms is whether their cells carry a single set of chromosomes or two copies of each chromosome
The former are referred to as haploid; the latter, as diploid
Since diploid organisms carry two copies of each gene, they may carry identical alleles, that is, be homozygous for a gene, or carry different alleles, that is, be heterozygous for a gene
Recessive alleles usually result from a mutation that inactivates the affected gene, leading to a partial or complete loss of function
Dominant mutations in certain genes are associated with a loss of function
Geneticists exploit the normal life cycle of an organism to test for the dominance or recessivity of alleles
Like somatic cells, premeiotic germ cells are diploid, containing two homologs of each morphologic type of chromosome
The two homologs constituting each pair of homologous chromosomes are descended from different parents, and thus their genes may exist in different allelic forms.
Geneticists usually strive to begin breeding experiments with strains that are homozygous for the genes under examination
In such true-breeding strains, every individual will receive the same allele from each parent and therefore the composition of alleles will not change from one generation to the next
The procedures used to identify and isolate mutants, referred to as genetic screens, depend on whether the experimental organism is haploid or diploid and, if the latter, whether the mutation is recessive or dominant
Genes that encode proteins essential for life are among the most interesting and important ones to study
In haploid yeast cells, essential genes can be studied through the use of conditional mutations
Among the most common conditional mutations are temperature-sensitive mutations, which can be isolated in bacteria and lower eukaryotes but not in warm-blooded eukaryotes
Once temperature-sensitive mutants were isolated, further analysis revealed that they indeed were defective in cell division
In S. cerevisiae, cell division occurs through a bidding process, and the size of the bud, which is easily visualized by light microscopy, indicates a cell’s position in the cell cycle.
In diploid organisms, phenotypes resulting from recessive mutations can be observed only in individuals homozygous for the mutant alleles
Since mutagenesis in a diploid organism typically changes only one allele of a gene, yielding heterozygous mutants, genetic screens must include inbreeding steps to generate progeny that are homozygous for the mutant alleles
In the genetic approach to studying a particular cellular process, researchers often isolate multiple recessive mutations that produce the same phenotype
A common test for determining whether these mutations are in the same gene or in different genes exploits the phenomenon of genetic complementation, that is, the restoration of the wild-type phenotype by the mating of two different mutants
Complementation analysis of a set of mutants exhibiting the same phenotype can distinguish the individual genes in a set of functionally related genes, all of which must function to produce a given phenotypic trait
Based on careful analysis of mutant phenotypes associated with a particular cellular process, researchers often can deduce the order in which a set of genes and their protein products function
Ordering of Biosynthetic Pathways: A simple example of a biosynthetic pathway whose steps can be ordered genetically is the biosynthesis of a metabolite such as the amino acid tryptophan in bacteria
Ordering of Signaling Pathways: The expression of many eukaryotic genes is regulated by signaling pathways that are initiated by extracellular hormones, growth factors, or other signals
Such signaling pathways may include numerous components, and double-mutant analysis often can provide insight into the functions and interactions of these components
The only prerequisite for obtaining useful information from this type of analysis is that the two mutations must have opposite effects on the output of the same regulated pathway
Two other types of genetic analysis can provide additional clues about how proteins that function in the same cellular process may interact with one another in the living cell
Suppressor Mutations: The first type of analysis is based on genetic suppression
To understand this phenomenon, suppose that point mutations lead to structural changes in one protein (A) that disrupt its ability to associate with another protein (B) involved in the same cellular process
Similarly, mutations in protein B lead to small structural changes that inhibit its ability to interact with protein A
Synthetic Lethal Mutations: A phenomenon called synthetic lethality produces a phenotypic effect opposite to that of suppression
In this case, the deleterious effect of one mutation is greatly exacerbated (rather than suppressed) by a second mutation in the same or a related gene
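One situation in which such synthetic lethal mutations can occur is when two proteins carry out redundant functions, so that cells survive the loss of either protein alone but not the loss of both
A variety of techniques, often referred to as recombinant DNA technology, are used in DNA cloning, which permits researchers to prepare large numbers of identical DNA molecules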
Recombinant DNA is simply any DNA molecule composed of sequences derived from different sources
A major objective of DNA cloning is to obtain discrete, small regions of an organism’s DNA that constitute specific genes
Only relatively small DNA molecules can be cloned in any of the available vectors
Cutting DNA Molecules into Small Fragments: Restriction enzymes are endonucleases produced by bacteria that typically recognize specific 4- to 8-bp sequences, called restriction sites, and then cleave both DNA strands at this site
Restriction sites commonly are short palindromic sequences; that is, the restriction-site sequence is the same on each DNA strand when read in the 5’ → 3’ direction
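As a minimal sketch (Python, chosen only for illustration), the palindrome property can be checked by comparing a site with its reverse complement. The helper names are hypothetical, and the EcoRI, BamHI, and HindIII sites are standard examples not taken from the text above.

```python
# Minimal sketch: palindromic restriction sites read the same 5'->3' on both strands.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of site on the given strand."""
    dna, site = dna.upper(), site.upper()
    return [i for i in range(len(dna) - len(site) + 1) if dna[i:i + len(site)] == site]


if __name__ == "__main__":
    for name, site in [("EcoRI", "GAATTC"), ("BamHI", "GGATCC"), ("HindIII", "AAGCTT")]:
        print(name, site, is_palindromic(site))  # all three print True
    print(find_sites("TTGAATTCAAGGATCCGAATTC", "GAATTC"))  # [2, 16]
```

For a palindromic site, scanning one strand is enough, because every occurrence is also present on the other strand at the same position.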
Many restriction enzymes make staggered cuts in the two DNA strands at their recognition site, generating fragments that have a single-stranded “tail” at both ends
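The kind of end a cut leaves follows from where each strand is cut. This sketch assumes a palindromic site and a hypothetical `cut` offset counted from the 5' end of the top strand, so the bottom strand is cut at the mirror-image position; EcoRI (G^AATTC), PstI (CTGCA^G), and SmaI (CCC^GGG) serve as standard examples.

```python
# Sketch: classify the ends left by a restriction enzyme at a palindromic site.
# `cut` = number of bases of the site lying 5' of the top-strand cut (assumed convention).


def end_type(site: str, cut: int) -> tuple[str, str]:
    """Return (kind of end, single-stranded overhang sequence on the top strand)."""
    n = len(site)
    # For a palindrome, the bottom-strand cut sits at n - cut in top-strand coordinates.
    bottom_cut = n - cut
    if cut < bottom_cut:
        return "5' overhang", site[cut:bottom_cut]
    if cut > bottom_cut:
        return "3' overhang", site[bottom_cut:cut]
    return "blunt", ""


def digest_top_strand(dna: str, site: str, cut: int) -> list[str]:
    """Split the top strand at every top-strand cut position."""
    pieces, last, i = [], 0, dna.find(site)
    while i != -1:
        pieces.append(dna[last:i + cut])
        last = i + cut
        i = dna.find(site, i + 1)
    pieces.append(dna[last:])
    return pieces


if __name__ == "__main__":
    print(end_type("GAATTC", 1))  # EcoRI: 5' overhang AATT
    print(end_type("CTGCAG", 5))  # PstI: 3' overhang TGCA
    print(end_type("CCCGGG", 3))  # SmaI: blunt
    print(digest_top_strand("AAGAATTCTTGAATTCCC", "GAATTC", 1))
```

Because the EcoRI overhang AATT is its own reverse complement, any two EcoRI-cut ends can base-pair with each other, which is what makes them "sticky."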
Inserting DNA Fragments into Vectors: DNA fragments with either sticky ends or blunt ends can be inserted into vector DNA with the aid of DNA ligases
During normal DNA replication, DNA ligase catalyzes the end-to-end joining (ligation) of short fragments of DNA, called Okazaki fragments
Plasmids are circular, double-stranded DNA (dsDNA) molecules that are separate from a cell’s chromosomal DNA
These extrachromosomal DNAs, which occur naturally in bacteria and in lower eukaryotic cells (e.g., yeast), exist in a parasitic or symbiotic relationship with their host cell
The plasmids most commonly used in recombinant DNA technology are those that replicate in E. coli. Investigators have engineered these plasmids to optimize their use as vectors in DNA cloning
DNA fragments from a few base pairs up to ≈20 kb commonly are inserted into plasmid vectors
If special precautions are taken to avoid manipulations that might mechanically break DNA, even longer DNA fragments can be inserted into a plasmid vector
When a recombinant plasmid with an inserted DNA fragment transforms an E. coli cell, all the antibiotic-resistant progeny cells that arise from the initially transformed cell will contain plasmids with the same inserted DNA
Vectors constructed from bacteriophage λ are about a thousand times more efficient than plasmid vectors in cloning large numbers of DNA fragments
Phage vectors have been widely used to generate DNA libraries, comprehensive collections of DNA fragments representing the genome or expressed mRNAs of an organism
A λ virion consists of a head, which contains the phage DNA genome, and a tail, which functions in infecting E. coli host cells
The λ genes encoding the head and tail proteins, as well as various proteins involved in phage DNA replication and cell lysis, are grouped in discrete regions of the ≈50-kb viral genome
It is technically feasible to use λ phage cloning vectors to generate a genomic library, that is, a collection of λ clones that collectively represent all the DNA sequences in the genome of a particular organism
The first step in preparing a cDNA library is to isolate the total mRNA from the cell type or tissue of interest
Because of their poly(A) tails, mRNAs are easily separated from the much more prevalent rRNAs and tRNAs present in a cell extract by use of a column in which short strings of thymidylate (oligo-dT) are linked to the matrix
Because each plaque arises from a single recombinant phage, all the progeny λ phages that develop in it are genetically identical and constitute a clone carrying a cDNA derived from a single mRNA; collectively, such clones constitute a λ cDNA library
One feature of cDNA libraries arises because different genes are transcribed at very different rates: cDNAs from abundant mRNAs are represented many times in a library, whereas cDNAs from rare mRNAs are present at very low frequency
Both genomic and cDNA libraries of various organisms contain hundreds of thousands to upwards of a million individual clones in the case of higher eukaryotes
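How many clones a genomic library needs can be estimated with the Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f), where P is the desired probability that any given sequence is present and f is the insert size as a fraction of the genome. The sketch assumes random, uniformly sized inserts; the genome and insert sizes in the example are illustrative round numbers, not values from the text.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, prob: float = 0.99) -> int:
    """Clarke-Carbon estimate of library size for a given probability of coverage."""
    f = insert_bp / genome_bp  # fraction of the genome carried by one clone
    return math.ceil(math.log(1 - prob) / math.log1p(-f))


if __name__ == "__main__":
    # Assumed values: ~3e9-bp human genome, ~20-kb inserts in a lambda vector.
    print(clones_needed(3e9, 20e3))
    # Assumed values: ~12-Mb yeast genome, ~10-kb inserts.
    print(clones_needed(12e6, 10e3))
```

Under these assumptions a human library needs on the order of 7 × 10⁵ clones for 99% confidence, consistent with the library sizes given above, while a yeast library needs only a few thousand.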
The basis for screening with oligonucleotide probes is hybridization, the ability of complementary single-stranded DNA or RNA molecules to associate (hybridize) specifically with each other via base pairing
Clearly, the identification of specific clones by the membrane-hybridization technique depends on the availability of complementary radiolabeled probes
For an oligonucleotide to be useful as a probe, it must be long enough for its sequence to occur uniquely in the clone of interest and not in any other clones
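A back-of-the-envelope way to see why probe length matters: if genomic sequence were random with equal base frequencies (a simplifying assumption; real genomes contain repeats and biased composition), a given n-nucleotide sequence would be expected to occur about 2G/4ⁿ times in a genome of G bp, counting both strands.

```python
def expected_matches(probe_len: int, genome_bp: float) -> float:
    """Expected chance occurrences of one specific oligo in a random genome (both strands)."""
    return 2 * genome_bp / 4 ** probe_len


def shortest_unique_length(genome_bp: float) -> int:
    """Smallest probe length expected to occur less than once by chance."""
    n = 1
    while expected_matches(n, genome_bp) >= 1:
        n += 1
    return n


if __name__ == "__main__":
    human = 3e9  # assumed approximate haploid human genome size, bp
    for n in (12, 16, 20):
        print(n, round(expected_matches(n, human), 3))
    print(shortest_unique_length(human))  # 17
```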
Chemical synthesis of single-stranded DNA probes of defined sequence can be accomplished by a repeated series of reactions that add one nucleotide at a time to a growing chain
In some cases, a DNA library can be screened for the ability to express a functional protein that complements a recessive mutation
Such a screening strategy would be an efficient way to isolate a cloned gene that corresponds to an interesting recessive mutation identified in an experimental organism
Libraries constructed for the purpose of screening among yeast gene sequences usually are constructed from genomic DNA rather than cDNA
To increase the probability that all regions of the yeast genome are successfully cloned and represented in the plasmid library, the genomic DNA usually is only partially digested to yield overlapping restriction fragments of ≈10 kb
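Partial digestion yields overlapping fragments because any site may be left uncut. The sketch below lists every product a partial digest could yield (every stretch between two cut sites with all internal sites uncut) and keeps those near a target size; the region length and site positions are hypothetical.

```python
from itertools import combinations


def partial_digest_products(length: int, sites: list[int]) -> list[tuple[int, int]]:
    """All (start, end) fragments possible when only some sites are cut."""
    bounds = [0] + sorted(sites) + [length]
    # Any two boundaries can be fragment ends if every site between them stays uncut.
    return list(combinations(bounds, 2))


def in_size_window(products: list[tuple[int, int]], low: int, high: int) -> list[tuple[int, int]]:
    """Keep products whose size lies in [low, high]."""
    return [(s, e) for s, e in products if low <= e - s <= high]


if __name__ == "__main__":
    # Hypothetical 30-kb region with sites roughly every 4 kb.
    sites = [4000, 8000, 12500, 16000, 20500, 24000]
    for start, end in in_size_window(partial_digest_products(30000, sites), 8000, 12000):
        print(start, end, end - start)
```

The selected products overlap one another (for example, 0-8000 and 4000-12500), so neighboring clones share sequence and together span the region.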
In order to manipulate or sequence a cloned DNA fragment, it first must be separated from the vector DNA
This can be accomplished by cutting the recombinant DNA clone with the same restriction enzyme used to produce the recombinant vectors originally
Near neutral pH, DNA molecules carry a large negative charge and therefore move toward the positive electrode during gel electrophoresis
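Within a suitable range, smaller fragments migrate farther, and log₁₀(size) varies roughly linearly with migration distance, so fragment sizes are commonly estimated from a standard curve drawn with marker bands. This sketch fits that line by least squares; the ladder values are hypothetical.

```python
import math


def fit_standard_curve(ladder: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log10(size) = slope * distance + intercept."""
    xs = [d for d, _ in ladder]
    ys = [math.log10(bp) for _, bp in ladder]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def estimate_size(distance_mm: float, slope: float, intercept: float) -> float:
    """Interpolate an unknown band's size (bp) from its migration distance."""
    return 10 ** (slope * distance_mm + intercept)


if __name__ == "__main__":
    # Hypothetical ladder: (distance migrated in mm, size in bp).
    ladder = [(10, 10000), (20, 1000), (30, 100)]
    slope, intercept = fit_standard_curve(ladder)
    print(round(estimate_size(25, slope, intercept)))  # 316
```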
A common method for visualizing separated DNA bands on a gel is to incubate the gel in a solution containing the fluorescent dye ethidium bromide
This planar molecule binds to DNA by intercalating between the base pairs
The complete characterization of any cloned DNA fragment requires the determination of its nucleotide sequence. F. Sanger and his colleagues developed the method now most commonly used to determine the exact nucleotide sequence of DNA fragments up to ≈500 nucleotides long
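The logic of dideoxy (chain-termination) sequencing can be simulated: in the reaction containing a given ddNTP, synthesis stops wherever that nucleotide is added, so reading all fragments from shortest to longest spells out the newly synthesized strand. This is a minimal sketch assuming a perfect reaction; the template and primer length are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def dideoxy_lanes(template: str, primer_len: int) -> dict[str, list[int]]:
    """Lengths of chain-terminated fragments in each of the four ddNTP reactions.

    The primer anneals to the 3' end of the template (written 5'->3'), so the new
    strand is the template's reverse complement, beginning with the primer.
    """
    new_strand = reverse_complement(template)
    lanes = {base: [] for base in "ACGT"}
    for i in range(primer_len, len(new_strand)):
        lanes[new_strand[i]].append(i + 1)  # fragment ends with ddN at position i
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest to longest to recover the synthesized sequence."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    template = "TTGCAGGCTAACGTACCGATTG"  # hypothetical; its 3' end pairs with the primer
    new_seq = read_gel(dideoxy_lanes(template, primer_len=6))
    print(new_seq, "-> template region read:", reverse_complement(new_seq))
```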
If the nucleotide sequences at the ends of a particular DNA region are known, the intervening fragment can be amplified directly by the polymerase chain reaction (PCR)
The PCR depends on the ability to alternately denature (melt) double-stranded DNA molecules and renature (anneal) complementary single strands in a controlled fashion
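Because each denature-anneal-extend cycle can at most double the number of copies of the target, amplification is exponential: N = N₀(1 + E)ⁿ after n cycles, where E is the per-cycle efficiency (1.0 for perfect doubling). The efficiency values below are assumptions for illustration.

```python
def pcr_copies(start_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies of the target after `cycles` rounds at per-cycle efficiency 0..1."""
    return start_copies * (1 + efficiency) ** cycles


def cycles_needed(start_copies: float, target_copies: float, efficiency: float = 1.0) -> int:
    """Fewest cycles needed to reach at least target_copies."""
    n = 0
    while pcr_copies(start_copies, n, efficiency) < target_copies:
        n += 1
    return n


if __name__ == "__main__":
    print(pcr_copies(1, 30))       # 1073741824.0 with perfect doubling
    print(pcr_copies(1, 30, 0.9))  # fewer copies at 90% efficiency
    print(cycles_needed(1, 1e9))   # 30
```

In practice amplification plateaus after many cycles as primers and nucleotides are depleted, so the formula describes only the exponential phase.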
Direct Isolation of a Specific Segment of Genomic DNA: For organisms in which all or most of the genome has been sequenced, PCR amplification starting with the total genomic DNA often is the easiest way to obtain a specific DNA region of interest for cloning
Preparation of Probes: Preparation of such probes by PCR amplification requires chemical synthesis of only two relatively short primers corresponding to the two ends of the target sequence
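Choosing primers from the ends of a known target, and checking what they would amplify, can be sketched as string operations: the forward primer matches the 5' end of the top strand, and the reverse primer is the reverse complement of the 3' end. The sequences are hypothetical, and the Wallace rule (Tm ≈ 2(A+T) + 4(G+C) °C) is only a rough rule of thumb for short oligos; real primer design also weighs GC content, self-complementarity, and uniqueness.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(target: str, k: int = 20) -> tuple[str, str]:
    """Forward primer = first k nt of the top strand; reverse = revcomp of the last k nt."""
    return target[:k], reverse_complement(target[-k:])


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the top-strand product the primer pair would amplify, if any."""
    start = template.find(forward)
    if start == -1:
        return None
    end_site = template.find(reverse_complement(reverse), start)
    if end_site == -1:
        return None
    return template[start:end_site + len(reverse)]


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) for short oligos: 2(A+T) + 4(G+C)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    target = "ATGACCGTTA" + "GGGCCCAAAT" + "CTTGAGCATC"  # hypothetical target region
    template = "TTTT" + target + "GGGG"
    fwd, rev = design_primers(target, k=10)
    print(fwd, rev, wallace_tm(fwd), wallace_tm(rev))
    print(in_silico_pcr(template, fwd, rev) == target)  # True
```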
Tagging of Genes by Insertion Mutations: Another useful application of the PCR is to amplify a “tagged” gene from the genomic DNA of a mutant strain
This approach is a simpler method for identifying genes associated with a particular mutant phenotype than the screening of a library by functional complementation
The key to this use of PCR is the ability to produce mutations by insertion of a known DNA sequence into the genome of an experimental organism
Two very sensitive methods for detecting a particular DNA or RNA sequence within a complex mixture combine separation by gel electrophoresis and hybridization with a complementary radiolabeled DNA probe
Southern Blotting: The first blotting technique to be devised is known as Southern blotting after its originator E. M. Southern
This technique is capable of detecting a single specific restriction fragment in the highly complex mixture of fragments produced by cleavage of the entire human genome with a restriction enzyme
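A quick estimate shows why this mixture is so complex: if bases occurred at random with equal frequency (an assumption), a site n bp long would occur about once every 4ⁿ bp, so a 6-bp cutter (once per 4096 bp) would cut a ≈3 × 10⁹-bp genome into roughly 7 × 10⁵ fragments.

```python
def expected_fragments(genome_bp: float, site_len: int) -> float:
    """Expected fragment count if sites occur randomly, once every 4**site_len bp."""
    return genome_bp / 4 ** site_len


if __name__ == "__main__":
    human = 3e9  # assumed approximate haploid human genome size, bp
    for n in (4, 6, 8):
        print(f"{n}-bp site: ~1 per {4 ** n} bp, ~{expected_fragments(human, n):,.0f} fragments")
```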
Northern Blotting: One of the most basic ways to characterize a cloned gene is to determine when and where in an organism the gene is expressed
Expression of a particular gene can be followed by assaying for the corresponding mRNA by Northern blotting, named, in a play on words, after the related method of Southern blotting
The first step in producing large amounts of a low-abundance protein is to obtain a cDNA clone encoding the full-length protein by methods discussed previously
The second step is to engineer plasmid vectors that will express large amounts of the encoded protein when introduced into E. coli cells
One disadvantage of bacterial expression systems is that many eukaryotic proteins undergo various modifications (e.g., glycosylation, hydroxylation) after their synthesis on ribosomes, and bacterial cells generally cannot carry out these modifications
To get around this limitation, cloned genes are introduced into cultured animal cells, a process called transfection
Two common methods for transfecting animal cells differ in whether the recombinant vector DNA is or is not integrated into the host-cell genomic DNA
In both methods, cultured animal cells must be treated to facilitate their initial uptake of a recombinant plasmid vector
Transient Transfection: The simpler of the two expression methods, called transient transfection, employs a vector similar to the yeast shuttle vectors described previously
Stable Transfection (Transformation): If an introduced vector integrates into the genome of the host cell, the genome is permanently altered and the cell is said to be transformed
Integration most likely is accomplished by mammalian enzymes that function normally in DNA repair and recombination
Epitope Tagging: In addition to their use in producing proteins that are modified after translation, eukaryotic expression vectors provide an easy way to study the intracellular localization of eukaryotic proteins
In this method, a cloned cDNA is modified by fusing it to a short DNA sequence encoding an amino acid sequence recognized by a known monoclonal antibody
Proteins with similar functions often contain similar amino acid sequences that correspond to important functional domains in the three-dimensional structure of the proteins
By comparing the amino acid sequence of the protein encoded by a newly cloned gene with the sequences of proteins of known function, an investigator can look for sequence similarities that provide clues to the function of the encoded protein
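BLAST itself is a fast heuristic, but the underlying idea of scoring the best local alignment between two sequences can be illustrated with the Smith-Waterman algorithm. This minimal sketch uses simple match/mismatch/gap scores (assumed values) instead of a substitution matrix such as BLOSUM62, and returns only the best score.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr[j] = max(0, diag, up, left)  # 0 lets a new local alignment start here
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))
    print(smith_waterman("KKKKWGHEKKK", "AAWGHEAA"))  # 8: the shared WGHE segment
```

A local (rather than global) alignment is the natural choice here because related proteins often share only one conserved domain.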
Even when a protein shows no significant similarity to other proteins with the BLAST algorithm, it may nevertheless share with other proteins a short sequence that is functionally important
Such short segments recurring in many different proteins, referred to as motifs, generally have similar functions
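Motifs are often written as patterns and searched by pattern matching. For example, the P-loop (Walker A) nucleotide-binding motif is commonly written in PROSITE notation as [AG]-x(4)-G-K-[ST]; the sketch translates it by hand into a regular expression and scans a short peptide taken from the N terminus of human H-Ras. A pattern match alone is only a hint of function, since short patterns also occur by chance.

```python
import re

# PROSITE [AG]-x(4)-G-K-[ST] written as a Python regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_motif(protein: str, pattern: re.Pattern = P_LOOP) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


if __name__ == "__main__":
    ras_n_terminus = "MTEYKLVVVGAGGVGKSALTIQ"
    print(find_motif(ras_n_terminus))  # [(9, 'GAGGVGKS')]
```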
BLAST searches for related protein sequences may reveal that proteins belong to a protein family. (The corresponding genes constitute a gene family)
Protein families are thought to arise by two different evolutionary processes: gene duplication and speciation
All the different members of the tubulin family are sufficiently similar in sequence to suggest a common ancestral sequence
All these sequences are considered to be homologous
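More specifically, sequences that presumably diverged as a result of gene duplication (e.g., the α- and β-tubulin sequences) are described as paralogous; sequences that diverged as a result of speciation (e.g., α-tubulin sequences of different species) are described as orthologous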
The complete genomic sequence of an organism contains within it the information needed to deduce the sequence of every protein made by the cells of that organism
For organisms such as bacteria and yeast, whose genomes have few introns and short intergenic regions, most protein-coding sequences can be found simply by scanning the genomic sequence for open reading frames (ORFs) of significant length
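A bare-bones ORF scan of this kind checks all six reading frames (three per strand) for an ATG followed in frame by a stop codon and keeps ORFs above a length cutoff. The cutoff value and test sequence below are assumptions; practical gene finders combine additional evidence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for ATG...stop ORFs of at least min_codons.

    Coordinates are 0-based, end-exclusive, and include the stop codon; for the
    '-' strand they refer to the reverse-complement sequence. Only the first ATG
    after a stop is used, so nested in-frame ATGs are not reported separately.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:  # codons before the stop
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"  # hypothetical 6-codon ORF
    print(find_orfs(toy, min_codons=6))  # [('+', 2, 2, 23)]
```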
The best gene-finding algorithms combine all the available data that might suggest the presence of a gene at a particular genomic site
The combination of genomic sequencing and gene-finding computer algorithms has yielded the complete inventory of protein-coding genes for a variety of organisms
The functions of about half the proteins encoded in these genomes are known or have been predicted on the basis of sequence comparisons
One of the surprising features of this comparison is that the number of protein-coding genes within different organisms does not seem proportional to our intuitive sense of their biological complexity
Monitoring the expression of thousands of genes simultaneously is possible with DNA microarray analysis
A DNA microarray consists of thousands of individual, closely packed gene-specific sequences attached to the surface of a glass microscopic slide
Preparation of DNA Microarrays: In one method for preparing microarrays, a ≈1-kb portion of the coding region of each gene analyzed is individually amplified by PCR
In an alternative method, multiple DNA oligonucleotides, usually at least 20 nucleotides in length, are synthesized from an initial nucleotide that is covalently bound to the surface of a glass slide
The synthesis of an oligonucleotide of a specific sequence can be programmed in a small region on the surface of the slide
Effect of Carbon Source on Gene Expression in Yeast: The initial step in a microarray expression study is to prepare fluorescently labeled cDNAs corresponding to the mRNAs expressed by the cells under study
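When two cDNA samples (for example, from cells grown on two different carbon sources) are labeled with two different fluorescent dyes and hybridized to the same array, each spot's ratio of the two intensities reports the relative abundance of that mRNA. The sketch assumes such a two-color design; gene names and intensities are hypothetical, and real analyses also subtract background and normalize between dyes.

```python
import math


def log2_ratios(spots: dict[str, tuple[float, float]]) -> dict[str, float]:
    """log2(condition-2 intensity / condition-1 intensity) for each spot."""
    return {gene: math.log2(c2 / c1) for gene, (c1, c2) in spots.items()}


def classify(ratios: dict[str, float], fold: float = 2.0) -> dict[str, str]:
    """Call genes up, down, or unchanged using a fold-change cutoff."""
    cutoff = math.log2(fold)
    calls = {}
    for gene, r in ratios.items():
        if r >= cutoff:
            calls[gene] = "up"
        elif r <= -cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


if __name__ == "__main__":
    # Hypothetical (condition 1, condition 2) intensities.
    spots = {"GENE_A": (200.0, 800.0), "GENE_B": (400.0, 100.0), "GENE_C": (280.0, 300.0)}
    print(classify(log2_ratios(spots)))
```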
Firm conclusions rarely can be drawn from a single microarray experiment about whether genes that exhibit similar changes in expression are co-regulated and hence likely to be closely related functionally
Genes that appear to be co-regulated in a single microarray expression experiment may undergo changes in expression for very different reasons and may actually have very different biological functions
A solution to this problem is to combine the information from a set of expression array experiments to find genes that are similarly regulated under a variety of conditions or over a period of time
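One simple way to combine several experiments is to treat each gene's measurements as a profile and compare profiles with the Pearson correlation coefficient; genes whose profiles correlate strongly with a gene of interest are candidates for co-regulation. The profiles and the 0.9 cutoff below are hypothetical, and published analyses typically go further, for example with hierarchical clustering.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def co_regulated_with(profiles: dict[str, list[float]], query: str, cutoff: float = 0.9) -> list[str]:
    """Genes whose profiles correlate with the query gene at or above the cutoff."""
    return [g for g, p in profiles.items()
            if g != query and pearson(profiles[query], p) >= cutoff]


if __name__ == "__main__":
    # Hypothetical log2 expression ratios at four time points.
    profiles = {
        "GENE_A": [0.1, 1.0, 2.0, 3.1],
        "GENE_B": [0.2, 1.1, 2.1, 2.9],
        "GENE_C": [3.0, 2.0, 1.0, 0.0],
        "GENE_D": [1.0, -1.0, 1.0, -1.0],
    }
    print(co_regulated_with(profiles, "GENE_A"))  # ['GENE_B']
```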
The elucidation of DNA and protein sequences in recent years has led to the identification of many genes, using sequence patterns in genomic DNA and the sequence similarity of the encoded proteins with proteins of known function
Modifying the genome of the yeast Saccharomyces is particularly easy for two reasons: yeast cells readily take up exogenous DNA under certain conditions, and the introduced DNA is efficiently exchanged for the homologous chromosomal site in the recipient cell
This specific, targeted recombination of identical stretches of DNA allows any gene in yeast chromosomes to be replaced with a mutant allele
Disruption of yeast genes by this method is proving particularly useful in assessing the role of proteins identified by ORF analysis of the entire genomic DNA sequence
A large consortium of scientists has replaced each of the approximately 6000 genes identified by ORF analysis with the kanMX disruption construct and determined which gene disruptions lead to nonviable haploid spores
Although disruption of an essential gene required for cell growth will yield nonviable spores, this method provides little information about what the encoded protein actually does in cells
Placing an essential gene under the control of a regulatable promoter allows researchers to shut off its expression and observe the consequences. A useful promoter for this purpose is the yeast GAL1 promoter, which is active in cells grown on galactose but completely inactive in cells grown on glucose
In this approach, the coding sequence of an essential gene (X) ligated to the GAL1 promoter is inserted into a yeast shuttle vector
In an early application of this method, researchers explored the function of cytosolic Hsc70 genes in yeast
Haploid cells with a disruption in all four redundant Hsc70 genes were non-viable unless the cells carried a vector containing a copy of the Hsc70 gene that could be expressed from the GAL1 promoter on galactose medium
Many of the methods for disrupting genes in yeast can be applied to genes of higher eukaryotes
Disrupted alleles of such genes can be introduced into the germline via homologous recombination to produce animals with a gene knockout, or simply “knockout”
Gene-targeted knockout mice are generated by a two-stage procedure
In the first stage, a DNA construct containing a disrupted allele of a particular target gene is introduced into embryonic stem (ES) cells
These cells, which are derived from the blastocyst, can be grown in culture through many generations
In the second stage in the production of knockout mice, ES cells heterozygous for a knockout mutation in gene X are injected into a recipient wild-type mouse blastocyst, which subsequently is transferred into a surrogate pseudopregnant female mouse
Investigators often are interested in examining the effects of knockout mutations in a particular tissue of the mouse, at a specific stage in development, or both
Mice carrying a germ-line knockout may have defects in numerous tissues or die before the developmental stage of interest
To address this problem, mouse geneticists have devised a clever technique to inactivate target genes in specific types of somatic cells or at particular times during development
This technique employs site-specific DNA recombination sites (called loxP sites) and the enzyme Cre that catalyzes recombination between them
The loxP-Cre recombination system is derived from bacteriophage P1, but this site-specific recombination system also functions when placed in mouse cells
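The outcome of Cre-mediated recombination between two loxP sites in the same orientation can be modeled as a string edit: the segment between the sites is excised and one loxP remains. The loxP sequence used is the standard 34-bp site; the "floxed" exon and flanking sequence are hypothetical, and the sketch ignores the inversion produced by oppositely oriented sites.

```python
# loxP: two 13-bp inverted repeats flanking an 8-bp spacer (34 bp total).
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(dna: str) -> str:
    """Model Cre acting on two directly repeated loxP sites.

    The DNA between the sites is removed (in cells it is released as a circle),
    leaving a single loxP. Inverted sites would give inversion instead; that case,
    and sites on the other strand, are not modeled here.
    """
    first = dna.find(LOXP)
    if first == -1:
        return dna
    second = dna.find(LOXP, first + len(LOXP))
    if second == -1:
        return dna
    return dna[:first] + LOXP + dna[second + len(LOXP):]


if __name__ == "__main__":
    floxed_exon = "GGGGCCCC"  # hypothetical stand-in for an essential exon
    allele = "TTTT" + LOXP + floxed_exon + LOXP + "AAAA"
    # Only cells that express Cre (e.g., from a tissue-specific promoter) make this change.
    print(cre_excise(allele) == "TTTT" + LOXP + "AAAA")  # True
```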
For certain genes, the difficulties in producing homozygous knockout mutants can be avoided by the use of an allele carrying a dominant-negative mutation
These alleles are genetically dominant; that is, they produce a mutant phenotype even in cells carrying a wild-type copy of the gene
But unlike other types of dominant alleles, dominant-negative alleles produce a phenotype equivalent to that of a loss-of-function mutation
Useful dominant-negative alleles have been identified for a variety of genes and can be introduced into cultured cells by transfection or into the germline of mice or other organisms
Researchers are exploiting a recently discovered phenomenon known as RNA interference (RNAi) to inhibit the function of specific genes
This approach is technically simpler than the methods described above for disrupting genes
To use RNAi for intentional silencing of a gene of interest, investigators first produce dsRNA based on the sequence of the gene to be inactivated
This dsRNA is injected into the gonad of an adult worm, where it has access to the developing embryos
As the embryos develop, the mRNA molecules corresponding to the injected dsRNA are rapidly destroyed
The resulting worms display a phenotype similar to the one that would result from disruption of the corresponding gene itself
Initially, the phenomenon of RNAi was quite mysterious to geneticists. Recent studies have shown that specialized RNA-processing enzymes cleave dsRNA into short segments, which base-pair with endogenous mRNA
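The base-pairing step can be sketched as a search for the stretch of mRNA that is perfectly complementary to the antisense (guide) strand of each short dsRNA segment. The segment length of 21 nt is an assumption (the processing enzymes typically produce fragments of roughly 21 to 23 nt), the sequences are hypothetical, and real silencing can tolerate some mismatches.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(seq: str) -> str:
    return seq.translate(RNA_COMPLEMENT)[::-1]


def chop_dsrna(sense: str, size: int = 21) -> list[tuple[int, str]]:
    """Cut the sense strand into consecutive segments; return (offset, antisense guide)."""
    return [(i, rna_reverse_complement(sense[i:i + size]))
            for i in range(0, len(sense) - size + 1, size)]


def target_sites(mrna: str, guide: str) -> list[int]:
    """Positions where the mRNA is perfectly complementary to the guide strand."""
    site = rna_reverse_complement(guide)
    return [i for i in range(len(mrna) - len(site) + 1) if mrna[i:i + len(site)] == site]


if __name__ == "__main__":
    mrna = "AUGGCUAGCAAGGUUCCAGACUGGAUCCAUCGUACGGAAUUCUAA"  # hypothetical
    dsrna_sense = mrna[3:45]  # dsRNA made from part of the gene's sequence
    for offset, guide in chop_dsrna(dsrna_sense):
        print(guide, "pairs with mRNA at", target_sites(mrna, guide))
```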
Inherited human diseases are the phenotypic consequence of defective human genes
Although a “disease” gene may result from a new mutation that arose in the preceding generation, most cases of inherited diseases are caused by preexisting mutant alleles that have been passed from one generation to the next for many generations
In many cases, the genes responsible for inherited diseases must be found without any prior knowledge or reasonable hypotheses about the nature of the affected gene or its encoded protein
Human genetic diseases that result from a mutation in one specific gene exhibit several inheritance patterns depending on the nature and chromosomal location of the alleles that cause them
One characteristic pattern is that exhibited by a dominant allele in an autosome (that is, one of the 22 human chromosomes that are not a sex chromosome)
A recessive allele in an autosome exhibits a quite different segregation pattern. For an autosomal recessive allele, both parents must be heterozygous carriers of the allele in order for their children to be at risk of being affected by the disease
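The expected proportions follow from enumerating the gametes of the two parents (a Punnett square). In this sketch the genotypes are hypothetical two-letter strings, the disease allele is passed in explicitly, and complete penetrance is assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Each parent passes one of its two alleles with equal probability."""
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


def affected_fraction(parent1: str, parent2: str, disease_allele: str, dominant: bool) -> Fraction:
    """Probability that a child is affected, assuming complete penetrance."""
    risk = Fraction(0)
    for genotype, p in offspring_genotypes(parent1, parent2).items():
        copies = genotype.count(disease_allele)
        if (dominant and copies >= 1) or (not dominant and copies == 2):
            risk += p
    return risk


if __name__ == "__main__":
    # Autosomal dominant: affected heterozygote (Dd) x unaffected (dd).
    print(affected_fraction("Dd", "dd", "D", dominant=True))   # 1/2
    # Autosomal recessive: two heterozygous carriers (Aa x Aa).
    print(affected_fraction("Aa", "Aa", "a", dominant=False))  # 1/4
```

If only one parent is a carrier (Aa × AA), no child is affected, which is why both parents must be carriers for a recessive disease to appear.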
The independent segregation of chromosomes during meiosis provides the basis for determining whether genes are on the same or different chromosomes
Genetic traits that segregate together during meiosis more frequently than expected from random segregation are controlled by genes located on the same chromosome
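Linkage is quantified by the recombination frequency, the fraction of meioses (or progeny) in which the parental combination of alleles has been reshuffled. Unlinked genes give about 50% recombinants, and by convention 1% recombination corresponds to 1 centimorgan (cM). The counts below are hypothetical, the `looks_linked` margin is an arbitrary illustrative threshold rather than a statistical test, and the percentage approximates map distance well only for short intervals.

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    return recombinants / total


def map_distance_cm(recombinants: int, total: int) -> float:
    """Map distance in centimorgans (1% recombinants = 1 cM; valid for short distances)."""
    return 100 * recombinants / total


def looks_linked(recombinants: int, total: int, margin: float = 0.1) -> bool:
    """Crude check: clearly fewer than the 50% recombinants expected for unlinked genes."""
    return recombination_frequency(recombinants, total) < 0.5 - margin


if __name__ == "__main__":
    print(map_distance_cm(12, 200), looks_linked(12, 200))  # 6.0 True
    print(map_distance_cm(98, 200), looks_linked(98, 200))  # 49.0 False
```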
The presence of many different already mapped genetic traits, or markers, distributed along the length of a chromosome facilitates the mapping of a new mutation by assessing its possible linkage to these marker genes in appropriate crosses
The more markers that are available, the more precisely a mutation can be mapped
Many different genetic markers are needed to construct a high-resolution genetic map
In the experimental organisms commonly used in genetic studies, numerous markers with easily detectable phenotypes are readily available for genetic mapping of mutations
Restriction fragment length polymorphisms (RFLPs) were the first type of molecular markers used in linkage studies
RFLPs arise because mutations can create or destroy the sites recognized by specific restriction enzymes, leading to variations between individuals in the length of restriction fragments produced from identical regions of the genome
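How a single base change becomes a length difference can be shown by digesting two hypothetical alleles of the same region in silico: in the second allele a point mutation destroys one EcoRI site (GAATTC), so two fragments are replaced by one longer fragment. Cutting after the first base of the site follows EcoRI's G^AATTC cleavage; the sequences are made up.

```python
def fragment_lengths(dna: str, site: str = "GAATTC", cut: int = 1) -> list[int]:
    """Lengths of the fragments produced by cutting at every occurrence of site."""
    cuts, i = [], dna.find(site)
    while i != -1:
        cuts.append(i + cut)
        i = dna.find(site, i + 1)
    bounds = [0] + cuts + [len(dna)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    allele_1 = "A" * 10 + "GAATTC" + "A" * 10 + "GAATTC" + "A" * 10
    allele_2 = allele_1[:26] + "GATTTC" + allele_1[32:]  # A->T point mutation destroys site 2
    print(fragment_lengths(allele_1))  # [11, 16, 15]
    print(fragment_lengths(allele_2))  # [11, 31]
```

On a Southern blot probed for this region, the two alleles would therefore give bands of different sizes, which is what makes the polymorphism usable as a marker.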
Consider how the allele conferring a particular dominant trait (e.g., familial hypercholesterolemia) might be mapped
The first step is to obtain DNA samples from all the members of a family containing individuals that exhibit the disease
The DNA from each affected and unaffected individual then is analyzed to determine the identity of a large number of known DNA polymorphisms (either SSR or SNP markers can be used)
The segregation pattern of each DNA polymorphism within the family is then compared with the segregation of the disease under study to find those polymorphisms that tend to segregate along with the disease
Lastly, computer analysis of the segregation data is used to calculate the likelihood of linkage between each DNA polymorphism and the disease-causing allele
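The usual measure of linkage is the LOD score: the log₁₀ of the ratio between the likelihood of the observed data at recombination fraction θ and the likelihood if the loci were unlinked (θ = 0.5). A LOD of 3 or more (odds of 1000:1) is conventionally taken as evidence of linkage. The sketch assumes the simplest case, in which every meiosis can be scored unambiguously as recombinant or not; the counts are hypothetical, and real pedigree-analysis software also handles unknown phase and missing data.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses (0 < theta <= 0.5)."""
    n = recombinants + nonrecombinants
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1 - theta)
            + n * math.log10(2))


def best_theta(recombinants: int, nonrecombinants: int) -> tuple[float, float]:
    """Scan theta = 0.01..0.50 and return (theta, LOD) with the highest LOD."""
    grid = [i / 100 for i in range(1, 51)]
    return max(((t, lod_score(recombinants, nonrecombinants, t)) for t in grid),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical: 1 recombinant among 20 scored meioses.
    theta, lod = best_theta(1, 19)
    print(theta, round(lod, 2), "linked" if lod >= 3 else "not significant")
```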
A phenomenon called linkage disequilibrium is the basis for an alternative strategy, which in some cases can afford a higher degree of resolution in mapping studies
Although linkage mapping can usually locate a human disease gene to a region containing about 7.5 × 10⁵ base pairs, as many as 50 different genes may be located in a region of this size
The ultimate objective of a mapping study is to locate the gene within a cloned segment of DNA and then to determine the nucleotide sequence of this fragment
In many cases, point mutations that give rise to disease-causing alleles may result in no detectable change in the level of expression or electrophoretic mobility of mRNAs
So if the comparison of the mRNAs expressed in normal and affected individuals reveals no detectable differences in the candidate mRNAs, a search for point mutations in the DNA regions encoding the mRNAs is undertaken
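Once normal and patient sequences of a candidate region have been determined and aligned, point mutations can be listed by comparing them position by position. The sketch assumes equal-length, already aligned coding sequences (no insertions or deletions) and reports substitutions in an HGVS-style c.&lt;position&gt;&lt;ref&gt;&gt;&lt;alt&gt; form, numbering from the A of the ATG start codon as 1; the sequences are hypothetical.

```python
def substitutions(reference: str, patient: str) -> list[str]:
    """List single-base differences between two aligned, equal-length sequences."""
    if len(reference) != len(patient):
        raise ValueError("sequences must be aligned and of equal length")
    return [f"c.{i + 1}{ref}>{alt}"
            for i, (ref, alt) in enumerate(zip(reference, patient)) if ref != alt]


if __name__ == "__main__":
    normal = "ATGGCTGAACGTTAA"  # hypothetical coding sequence
    patient = "ATGGCTAAACGTTAA"
    print(substitutions(normal, patient))  # ['c.7G>A']
```

In this example the change converts codon 3 from GAA (Glu) to AAA (Lys), a missense mutation that would not alter mRNA size or abundance.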
Most of the inherited human diseases that are now understood at the molecular level are monogenic traits
That is, a clearly discernible disease state is produced by the presence of a defect in a single gene
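Monogenic diseases caused by a mutation in one specific gene exhibit one of several characteristic inheritance patterns, such as autosomal dominant, autosomal recessive, or X-linked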
Many other inherited diseases show more complicated patterns of inheritance, making the identification of the underlying genetic cause much more difficult
Human geneticists used two different approaches to identify the many genes associated with retinitis pigmentosa
A further complication in the genetic dissection of human diseases is posed by diabetes, heart disease, obesity, predisposition to cancer, and a variety of mental disorders that have at least some heritable properties
Models of human disease in experimental organisms may also contribute to unraveling the genetics of complex traits such as obesity or diabetes