Human Genome Evolution and Replication Notes
Human Genome Evolution
- Anatomy of the human genome.
- Genome replication and evolution.
- Genetic Ancestry.
Human Genome
- We have trillions of cells, each containing 46 chromosomes.
- The human genome comprises 3.2 billion base pairs of DNA.
- There are 20,000 to 25,000 protein-coding genes and tens of thousands of RNA genes.
Organisation of the Human Genome
Mitochondrial Genome
- The mitochondrial genome is 16,569 base pairs long with 93% coding sequence.
- It contains protein-coding genes for 13 respiratory chain subunits.
- It also includes tRNA genes for each amino acid and 12S and 16S rRNA genes. (Refer to mitochondrial genetic code).
- Human cells have approximately 5,000 to 10,000 mitochondrial DNA (mtDNA) copies on average.
- mtDNA is inherited maternally (sperm cells contribute only nuclear DNA).
- Genetic drift has caused differences in mtDNA codon usage compared to the universal genetic code.
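To make the difference from the universal code concrete, the sketch below lists the four codons whose meaning changes in the vertebrate mitochondrial code (NCBI translation table 2). The table and function names are illustrative and are not part of the notes.

```python
# Codons whose meaning differs between the standard (universal) code and
# the vertebrate mitochondrial code (NCBI translation table 2).
STANDARD = {"UGA": "Stop", "AUA": "Ile", "AGA": "Arg", "AGG": "Arg"}
VERTEBRATE_MITO = {"UGA": "Trp", "AUA": "Met", "AGA": "Stop", "AGG": "Stop"}


def decode(codon, mitochondrial=False):
    """Return the meaning of one of the four reassigned codons.

    Accepts DNA or RNA spelling (T is treated as U). Only the four
    reassigned codons are covered; any other codon raises KeyError.
    """
    table = VERTEBRATE_MITO if mitochondrial else STANDARD
    return table[codon.upper().replace("T", "U")]


if __name__ == "__main__":
    for codon in STANDARD:
        print(codon, decode(codon), decode(codon, mitochondrial=True))
```

All other codons read the same in both codes, so a nuclear-style translation of a mitochondrial gene would mainly go wrong at UGA (read as stop instead of tryptophan).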
The Nuclear Genome
- The nuclear genome consists of 24 distinct chromosomes (22 autosomes plus X and Y).
- It contains 3000 Mb of euchromatin (light bands) which has been sequenced.
- There is 200 Mb of constitutive heterochromatin (dark bands).
- There is variation across the genome in AT/GC content.
Human RNA Genes
- Ribosomal RNA (rRNA) genes are involved in translation.
- Transfer RNA (tRNA) genes are involved in translation.
- Small nuclear RNA (snRNA) genes encode components of the spliceosome and are involved in splicing, including differential splicing.
- Small nucleolar RNA (snoRNA) genes encode RNAs found in the nucleolus that guide chemical modification of nucleotides in other RNAs.
- Regulatory RNA molecules
- MicroRNAs are transcribed as 70-80 nucleotide sequences that fold in on themselves, are exported, and have multiple binding sites for mRNA.
Transfer RNA (tRNA) Genes
- There are 497 nuclear tRNA genes grouped into 49 families.
- There are 324 pseudogenes (false genes), which are similar but non-functional. Having more copies helps limit the effect of mutation.
- They are dispersed throughout the genome.
- Two major clusters are located on chromosomes 1 and 6.
Regulatory RNA Molecules
- miRNAs: Approximately 2000 miRNAs control the expression of about a third of protein-coding genes.
- Moderate to large non-coding RNAs (e.g., XIST, involved in X-inactivation): the RNA acts as an adaptor and condenses chromatin, stopping gene expression.
- Antisense regulatory RNAs (e.g., TSIX involved in regulating XIST).
MicroRNAs (miRNAs)
- miRNAs are ~22 nucleotide-long RNA molecules acting as antisense regulators.
- They derive from ~70 nucleotide-long precursors having a hairpin structure cleaved by a ribonuclease III (dicer).
- Many are developmentally regulated and control development.
- They bind to complementary sequences in the 3' UTRs of target mRNAs.
miRNAs regulate gene expression at a post-transcriptional level
- They act through translation repression and mRNA destabilization.
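The binding of a miRNA to complementary sequence in a 3' UTR can be sketched as a simple string search. This is a minimal illustration that assumes targeting is driven by perfect Watson-Crick pairing to a "seed" at miRNA nucleotides 2-8, which is a common simplification not stated in the notes. Both sequences are made up for the example.

```python
def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def find_seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based positions in the UTR that pair perfectly with the seed.

    The default slice [1:8] selects miRNA nucleotides 2-8 (assumed seed).
    """
    seed = mirna[seed_start:seed_end]
    target = reverse_complement_rna(seed)  # sequence expected in the mRNA
    width = len(target)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == target]


if __name__ == "__main__":
    mirna = "UACGGAUUCCAGGCUAAUCGGA"    # hypothetical 22-nt miRNA
    utr = "GGCAAUCCGUCCAUAGAAUCCGUAG"   # hypothetical 3' UTR fragment
    print(find_seed_sites(mirna, utr))  # [3, 16]
```

Finding more than one site in the same UTR, as in this example, matches the idea that a target mRNA can carry multiple binding sites.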
Functional Gene Families
- Functionally identical genes: identical gene copies tend to be clustered.
- Functionally similar genes: tend to be clustered.
- Functionally related genes.
Interspersed Repetitive Non-coding DNA
- Up to 45% of the human genome comprises transposon-derived repeats.
- Most arose through RNA intermediates.
- Retrotransposons.
- DNA transposons.
- LTR transposons.
LINE-1: A Retrotransposon
- Autonomous transposable elements.
- Encode gene products necessary for retrotransposition.
- RNA-binding protein (p40)
- Endonuclease/reverse transcriptase
- LINE1: transcribed then translated
- LINE1 RNA forms a complex with its encoded proteins and moves to the nucleus
Pseudogenes and Gene Fragments
- Non-processed pseudogenes: defective gene copies copied at the DNA level by tandem gene duplication.
- Processed pseudogenes: genes copied at the cDNA level by retrotransposition by the LINE1 machinery.
Summary: Human Genome Organisation
- Mitochondrial and nuclear genomes
- RNA genes and gene families
- Protein-coding gene families
- Repetitive sequence elements
Genome Evolution: DNA Replication
- The coiled DNA is unwound by helicase.
- The leading strand acts as a continuous template for DNA synthesis by DNA polymerase ε (Pol ε).
- The lagging strand is copied discontinuously as Okazaki fragments by Pol δ.
- Synthesis is primed by RNA primers produced by Pol α (primase) at the replication fork.
- Ligase joins the Okazaki fragments after the primers, displaced by the extending Pol δ, have been removed by FEN1.
The DNA Polymerase Active Site
Two metal ions are held in place by two highly conserved Aspartate residues.
Metal ion A: reduces the affinity of the 3' OH for its H producing a nucleophilic 3'O-.
Metal ion B: co-ordinates the negative charge of the β and γ phosphates of the incoming dNTP and stabilises the pyrophosphate resulting from the catalysis.
Correct base pairing provides the optimum positioning for catalysis (a).
Mispairing makes the spacing catalytically unfavourable i.e. a mismatch slows catalysis (b).
Spontaneous replication errors
Environmental mutagens
Mechanisms of DNA Repair
Evolution of Gene Families
- Gene duplications (facilitated by the multiple copies of genome-wide repeats).
- Exon duplication (facilitated by the multiple copies of genome-wide repeats).
- Exon shuffling (facilitated by autonomously transposing elements (genome-wide repeats)).
Homologous Recombination
- Occurs between sequences that have significant homology.
- Cross-strand Exchange occurs during meiosis.
- Increases genetic diversity down the generations.
Hardy-Weinberg
- If the genotype proportions in the following generations are the same as the parental generation, then there is no evolution.
- The population is at Hardy-Weinberg Equilibrium.
- If genotype proportions are not at equilibrium, then evolution is occurring.
- Assumptions: infinite number of randomly mating sexually reproducing diploid organisms; no selection, no mutation, no migration.
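A worked check of Hardy-Weinberg equilibrium can be sketched as follows. With allele frequencies p and q = 1 - p, the expected genotype proportions are p², 2pq and q². The chi-square comparison and the genotype counts are illustrative additions, not part of the notes.

```python
def hardy_weinberg_expected(n_AA, n_Aa, n_aa):
    """Return (p, q, expected genotype counts) from observed genotype counts."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele a
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    return p, q, expected


def chi_square(observed, expected):
    """Pearson chi-square statistic comparing observed with expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    observed = (50, 20, 30)  # hypothetical counts of AA, Aa, aa
    p, q, expected = hardy_weinberg_expected(*observed)
    print(p, q, expected)
    # With 1 degree of freedom, a statistic above 3.84 suggests departure
    # from equilibrium at the 5% level.
    print(chi_square(observed, expected))
```

In this made-up sample there are far fewer heterozygotes than expected, so the genotype proportions are not at equilibrium and at least one of the assumptions must be violated.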
Mutations
- Germline vs. somatic: evolution only follows from changes in the germline.
- Fate of a mutation: fixation in a population (rare) or loss (more likely). If a mutation is neutral (confers no advantage), its probability of fixation equals its frequency in the population, so the smaller the population, the greater the chance that a new mutation is fixed.
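The link between population size and fixation can be made explicit with a little arithmetic. The sketch below assumes a diploid population, in which a new mutation starts as a single copy among 2N gene copies. The function name and population sizes are illustrative.

```python
def neutral_fixation_probability(copies, population_size, ploidy=2):
    """Probability that a neutral allele is eventually fixed.

    For a neutral allele this equals its current frequency:
    copies / (ploidy * population_size).
    """
    total_copies = ploidy * population_size
    if not 0 <= copies <= total_copies:
        raise ValueError("copies must lie between 0 and ploidy * population_size")
    return copies / total_copies


if __name__ == "__main__":
    # A single new mutation in populations of different sizes.
    for n in (50, 5000):
        print(n, neutral_fixation_probability(1, n))
```

A single new neutral mutation therefore has a 1 in 100 chance of fixation in a population of 50 individuals, but only 1 in 10,000 in a population of 5,000.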
Evolution of the Globin Gene Family
- Involved in oxygen homeostasis.
- Iron-containing metalloproteins.
- Heme group with a porphyrin ring coordinating an iron ion.
- Multiple gene duplication events throughout evolutionary history.
- Mutation produces sequence divergence. The further back in time the duplication happened, the more the sequences have diverged.
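Sequence divergence between two gene copies can be quantified very simply as the proportion of aligned sites that differ (the p-distance). The sketch assumes the sequences are already aligned, of equal length and gap-free. The sequences shown are invented and are not real globin sequences.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


if __name__ == "__main__":
    ancestral_copy = "ATGGTGCACC"  # hypothetical aligned fragments
    recent_copy = "ATGGTGCACA"     # 1 difference: recent duplication
    older_copy = "ATGCTGAACA"      # 3 differences: older duplication
    print(p_distance(ancestral_copy, recent_copy))
    print(p_distance(ancestral_copy, older_copy))
```

Under the reasoning above, the pair with the larger p-distance would be inferred to descend from the earlier duplication event.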
Evolution of the Coagulation Cascade Proteins
- Platelet Activation and Factors for Clot Formation
- Exon shuffling (facilitated by autonomously transposing elements (genome-wide repeats)).
- LINE1 transposons facilitate 3' transduction.
- Exon duplication (facilitated by the multiple copies of genome-wide repeats).
Domain Structure of Coagulation Proteins
DNA Markers
Autosomal markers:
- SNPs (single nucleotide polymorphisms)
- Microsatellites (short tandem repeats (STRs))
- Alu insertions
Non-recombining sequences:
- Mitochondrial DNA
- Y chromosome STRs
Haploid sequences (single copy) having a haplotype (does not recombine). Variation is derived from mutation.
Tandem Repeat Variants
- Simple sequence length polymorphisms (SSLPs), such as short tandem repeats, are made up of multiple copies of a simple repeating DNA sequence (such as GATA, a common tetranucleotide repeat). They occur on all of our chromosomes.
- An allele has a specific number of repeats: e.g. allele 1 has 4 repeats and allele 2 has 8 repeats.
- SSLPs can arise through slippage during replication of repetitive template sequence.
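Calling an STR allele amounts to counting how many times the motif is repeated in tandem. The sketch below uses the GATA motif and the 4-repeat and 8-repeat alleles from the example above. The flanking sequences and function name are made up.

```python
import re


def longest_repeat_run(sequence, motif="GATA"):
    """Return the largest number of consecutive copies of motif in sequence."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


if __name__ == "__main__":
    # Hypothetical flanking sequence around the repeat array.
    allele_1 = "CCTAG" + "GATA" * 4 + "TTGCA"
    allele_2 = "CCTAG" + "GATA" * 8 + "TTGCA"
    print(longest_repeat_run(allele_1))  # 4
    print(longest_repeat_run(allele_2))  # 8
```

In practice alleles are usually sized by fragment length after PCR rather than by reading the sequence, but the repeat count is the same quantity being measured.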
Multiplex PCR of STRs: Electropherogram
- STR analysis can be used in human identification.
- It can also be used in diagnostics to look for maternal contamination of prenatal samples or follow the efficacy of stem cell transplantation.
- UK DNA17 marker set
Ancestry Informative SNP Markers
- An ancestry informative marker (AIM) has alleles that exhibit different frequencies in different populations.
- We have around 15 million SNPs in our genome, a proportion of which are ancestry informative.
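One simple way to judge how ancestry informative a SNP is would be the absolute difference in allele frequency between two populations. This is a sketch of that idea only. The marker names and frequencies are invented, and real AIM panels use more elaborate statistics.

```python
def allele_frequency_difference(freq_pop1, freq_pop2):
    """Absolute difference in the frequency of one allele between populations."""
    return abs(freq_pop1 - freq_pop2)


def rank_markers(markers):
    """Sort marker names from most to least ancestry informative.

    markers maps a marker name to (frequency in population 1,
    frequency in population 2) for the same allele.
    """
    return sorted(
        markers,
        key=lambda name: allele_frequency_difference(*markers[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical SNPs and allele frequencies.
    markers = {
        "snp_A": (0.90, 0.10),
        "snp_B": (0.50, 0.45),
        "snp_C": (0.20, 0.70),
    }
    print(rank_markers(markers))  # ['snp_A', 'snp_C', 'snp_B']
```

A marker such as snp_B, with nearly equal frequencies in both populations, says little about ancestry even though it is polymorphic.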
The Modern Human Genome
- Tracing the peopling of the world through genomics
Next-generation sequencing (NGS)
- A generic term that includes a number of different sequencing technologies
- The common strategy in all these technologies is to fragment the DNA to be sequenced into small overlapping fragments.
- Each fragment is sequenced and the overall sequence pieced back together.
- The number of times a base is read in different fragments is called the read depth.
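Read depth can be illustrated by counting, for each base of a reference, how many overlapping fragments cover it. The sketch assumes each read has already been placed on the reference and is described by a 0-based start position and a length. The reads are invented.

```python
def read_depth(reads, reference_length):
    """Return per-base read depth across a reference.

    reads is a list of (start, length) pairs with 0-based start positions.
    Any part of a read that falls outside the reference is ignored.
    """
    depth = [0] * reference_length
    for start, length in reads:
        first = max(start, 0)
        last = min(start + length, reference_length)
        for position in range(first, last):
            depth[position] += 1
    return depth


if __name__ == "__main__":
    reads = [(0, 5), (3, 5), (4, 4)]  # hypothetical overlapping fragments
    print(read_depth(reads, 10))      # [1, 1, 1, 2, 3, 2, 2, 2, 0, 0]
```

Positions with a depth of 0 were not sequenced at all, and a higher depth gives more independent reads of the same base.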
The Modern Human Genome
- Genomes have been sequenced from modern human populations around the world and from ancient Neanderthals and Denisovans.
- There is evidence that ancient hominin populations (modern humans, Neanderthals, Denisovans, and possibly as yet undiscovered hominins) split, diverged, and then reconnected (introgression).
- Our genetic variation can be traced back to multiple ancient populations.
Genome Replication and Evolution
- Replication and genome instability.
- Homologous recombination and genome evolution.
- Ancestry and DNA markers.
- Gene flow among hominins.