Human Genome Evolution and Replication Notes

Human Genome Evolution

  • Anatomy of the human genome.
  • Genome replication and evolution.
  • Genetic Ancestry.

Human Genome

  • We have trillions of cells; most nucleated somatic cells contain 46 chromosomes.
  • The human genome comprises 3.2 billion base pairs of DNA.
  • There are 20,000 to 25,000 protein-coding genes and tens of thousands of RNA genes.

Organisation of the Human Genome

Mitochondrial Genome

  • The mitochondrial genome is 16,569 base pairs long with 93% coding sequence.
  • It contains protein-coding genes for 13 respiratory chain subunits.
  • It also includes 22 tRNA genes (at least one for each amino acid) and the 12S and 16S rRNA genes.
  • Human cells contain on average approximately 5,000 to 10,000 copies of mitochondrial DNA (mtDNA), although the number varies between cell types.
  • mtDNA is inherited maternally (sperm cells contribute only nuclear DNA).
  • mtDNA is translated using its own genetic code, which differs from the universal genetic code (e.g., UGA encodes tryptophan rather than acting as a stop codon). These differences are attributed to genetic drift.
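The codon reassignments can be made concrete by translating the same DNA sequence with the standard code and with a mitochondrial code. This Python sketch assumes the vertebrate mitochondrial code (NCBI translation table 2), in which TGA encodes tryptophan, ATA encodes methionine, and AGA/AGG are treated as stop codons; the sequence is a hypothetical example written in the DNA alphabet (T rather than U).

```python
from itertools import product

# Standard genetic code: codons enumerated in T, C, A, G order, one amino acid letter each ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Vertebrate mitochondrial code (assumed NCBI table 2): the standard table with four codons reassigned.
MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}

def translate(seq: str, code: dict[str, str]) -> str:
    """Translate a DNA coding sequence codon by codon, ignoring any trailing partial codon."""
    seq = seq.upper()
    return "".join(code[seq[i:i + 3]] for i in range(0, len(seq) - len(seq) % 3, 3))

print(translate("ATGATATGAAGA", STANDARD))  # MI*R
print(translate("ATGATATGAAGA", MITO))      # MMW*
```

The codons read as isoleucine, stop and arginine under the standard code become methionine, tryptophan and stop under the mitochondrial code.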

The Nuclear Genome

  • The nuclear genome consists of 24 different chromosomes (22 autosomes, X and Y).
  • It contains about 3,000 Mb of euchromatin (light bands), which has been sequenced.
  • There is about 200 Mb of constitutive heterochromatin (dark bands).
  • AT/GC content varies across the genome.
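Variation in AT/GC content is usually measured as the GC fraction in windows along a sequence. A minimal Python sketch, assuming a plain sequence string; the window size, step size and toy sequence are hypothetical (genome-scale analyses typically use windows of many kilobases), and ambiguous bases such as N are simply counted as non-GC.

```python
def gc_content_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start position, GC fraction) for each full window along seq."""
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / window))
    return results

# Hypothetical toy sequence: an AT-rich stretch followed by a GC-rich stretch.
toy = "ATATTAATAT" + "GCGCGGCCGC"
for start, fraction in gc_content_windows(toy, window=10, step=5):
    print(start, fraction)
```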

Human RNA Genes

  • Ribosomal RNA (rRNA) genes are involved in translation.
  • Transfer RNA (tRNA) genes are involved in translation.
  • Small nuclear RNA (snRNA) genes encode components of the spliceosome and are involved in splicing, including alternative splicing.
  • Small nucleolar RNA (snoRNA) genes act in the nucleolus, where they guide chemical modification of bases in other RNAs (mainly rRNA).
  • Regulatory RNA molecules.
  • MicroRNAs are produced from 70-80 nucleotide precursors that fold back on themselves into hairpins, are exported from the nucleus, and can bind to many different target mRNAs.

Transfer RNA (tRNA) Genes

  • There are 497 nuclear tRNA genes grouped into 49 families.
  • There are also 324 tRNA pseudogenes, which resemble tRNA genes but are non-functional. Having multiple functional copies of each tRNA gene limits the effect of a mutation in any single copy.
  • They are dispersed throughout the genome.
  • Two major clusters are located on chromosomes 1 and 6.

Regulatory RNA Molecules

  • miRNAs: approximately 2,000 miRNAs are estimated to regulate the expression of about a third of protein-coding genes.
  • Medium-sized to large non-coding RNAs (e.g., XIST, involved in X-inactivation) act as adaptors that recruit proteins to condense chromatin, silencing gene expression.
  • Antisense regulatory RNAs (e.g., TSIX, which regulates XIST).

MicroRNAs (miRNAs)

  • miRNAs are ~22 nucleotide-long RNA molecules acting as antisense regulators.
  • They derive from ~70 nucleotide-long precursors with a hairpin structure, which are cleaved by the ribonuclease III enzyme Dicer.
  • Many are developmentally regulated and in turn control development.
  • They bind to complementary sequences in the 3' UTRs of target mRNAs.
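Target recognition can be sketched as a search of a 3' UTR for sequence complementary to the miRNA. This Python sketch assumes the commonly used simplification that a target site is perfect pairing with the miRNA "seed" (nucleotides 2-8); it ignores G:U wobble pairs, pairing outside the seed and site context, and both sequences are hypothetical.

```python
# Complement table for RNA bases (A-U, C-G).
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence, read 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]

def seed_sites(mirna: str, utr: str) -> list[int]:
    """0-based start positions in a 3' UTR that pair perfectly with miRNA nucleotides 2-8."""
    seed = mirna[1:8]                # nucleotides 2-8 of the miRNA, 5'->3'
    site = reverse_complement(seed)  # sequence the mRNA must contain, 5'->3'
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]

mirna = "UCCGAUGACAAGCUUAGGCAUU"   # hypothetical ~22 nt miRNA
utr = "AAAUCAUCGGAAAAUCAUCGGCC"    # hypothetical 3' UTR fragment
print(seed_sites(mirna, utr))  # [3, 14]
```

Finding two sites in one UTR reflects the point that a single mRNA can carry several binding sites.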

miRNAs regulate gene expression at a post-transcriptional level

  • They act through translation repression and mRNA destabilization.

Functional Gene Families

  • Functionally identical genes: identical gene copies tend to be clustered.
  • Functionally similar genes: tend to be clustered.
  • Functionally related genes.

Interspersed Repetitive Non-coding DNA

  • Up to 45% of the human genome comprises transposon-derived repeats.
  • Most arose through RNA intermediates.
  • Non-LTR retrotransposons (LINEs and SINEs, such as Alu).
  • LTR retrotransposons.
  • DNA transposons.
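Estimating what fraction of a genome is repeat-derived requires merging overlapping annotations so that no base is counted twice. A minimal Python sketch, assuming repeat annotations are given as half-open [start, end) coordinates on a single sequence; the intervals and sequence length are hypothetical.

```python
def covered_fraction(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of a sequence covered by half-open intervals, counting overlaps once."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length

# Hypothetical repeat annotations on a 1,000 bp toy sequence (two of them overlap).
repeats = [(100, 400), (350, 500), (700, 750)]
print(covered_fraction(repeats, 1000))  # 0.45
```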

LINE-1: A Retrotransposon

  • Autonomous transposable elements.
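  • Encode the gene products necessary for retrotransposition:
  • RNA-binding protein (p40)
  • Endonuclease/reverse transcriptase
  • LINE-1 is transcribed, then its RNA is translated.
  • LINE-1 RNA forms a complex with its encoded proteins and moves to the nucleus, where the endonuclease nicks the target DNA and the reverse transcriptase copies the RNA into cDNA at the insertion site.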

Pseudogenes and Gene Fragments

  • Non-processed pseudogenes: defective gene copies generated at the DNA level by tandem gene duplication.
  • Processed pseudogenes: gene copies made from mRNA at the cDNA level by retrotransposition using the LINE-1 machinery; they typically lack introns and promoters.

Summary: Human Genome Organisation

  • Mitochondrial and nuclear genomes
  • RNA genes and gene families
  • Protein-coding gene families
  • Repetitive sequence elements

Genome Evolution: DNA Replication

  • The coiled DNA is unwound by helicase.
  • The leading strand acts as a continuous template for DNA synthesis by DNA polymerase ε (Pol ε).
  • The lagging strand is copied discontinuously as Okazaki fragments by Pol δ.
  • Synthesis is primed by RNA primers produced by the Pol α-primase complex at the replication fork.
  • Ligase joins the Okazaki fragments after the primers, displaced by the extending Pol δ, are removed by FEN1.

The DNA Polymerase Active Site

  • Two metal ions are held in place by two highly conserved aspartate residues.
  • Metal ion A: lowers the affinity of the 3'-OH for its hydrogen, producing a nucleophilic 3'-O⁻ that attacks the α-phosphate of the incoming dNTP.
  • Metal ion B: coordinates the negative charges of the β and γ phosphates of the incoming dNTP and stabilises the pyrophosphate released by catalysis.
  • Correct base pairing provides the optimum positioning for catalysis (a).
  • Mispairing makes the spacing catalytically unfavourable, i.e., a mismatch slows catalysis (b).

  • Spontaneous replication errors
  • Environmental mutagens

Mechanisms of DNA Repair

Evolution of Gene Families

  • Gene duplications (facilitated by the multiple copies of genome-wide repeats).
  • Exon duplication (facilitated by the multiple copies of genome-wide repeats).
  • Exon shuffling (facilitated by autonomously transposing elements, i.e., genome-wide repeats).

Homologous Recombination

  • Occurs between sequences that have significant homology.
  • Cross-strand exchange (crossing over) occurs during meiosis.
  • It increases genetic diversity across generations.

Hardy-Weinberg

  • If the genotype proportions in subsequent generations are the same as in the parental generation, there is no evolution at that locus.
  • The population is then at Hardy-Weinberg equilibrium: for two alleles at frequencies p and q, the genotype frequencies are p², 2pq and q².
  • If genotype proportions are not at equilibrium, at least one of the assumptions is violated, which may indicate that evolution is occurring.
  • Assumptions: an infinitely large population of randomly mating, sexually reproducing diploid organisms; no selection, no mutation and no migration.
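Under the Hardy-Weinberg assumptions, two alleles at frequencies p and q (p + q = 1) give expected genotype frequencies p², 2pq and q², which sum to 1. A common way to check whether observed counts fit these proportions is a chi-square goodness-of-fit test; this Python sketch assumes a single biallelic locus and uses hypothetical genotype counts.

```python
def hwe_expected(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float, float]:
    """Expected genotype counts (AA, Aa, aa) under Hardy-Weinberg equilibrium."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p                         # frequency of allele a
    return (p * p * n, 2 * p * q * n, q * q * n)

def chi_square(observed, expected) -> float:
    """Pearson chi-square statistic comparing observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for counts in [(36, 48, 16), (50, 20, 30)]:  # hypothetical samples of 100 individuals
    expected = hwe_expected(*counts)
    print(counts, [round(e, 1) for e in expected], round(chi_square(counts, expected), 2))
```

Both samples have p = 0.6, so both expect 36, 48 and 16 individuals. The first matches exactly (chi-square ≈ 0); the second gives a chi-square of about 34, well above 3.84, the 5% critical value for one degree of freedom (three genotype classes, minus one, minus one estimated allele frequency), so it departs from equilibrium.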

Mutations

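  • Germline vs. somatic: only germline mutations are passed to offspring, so only they contribute to evolution.
  • Fate of a mutation: fixation in a population (rare) or loss (more likely). For a neutral mutation (one with no advantage), the probability of fixation equals its frequency in the population. A new mutation starts as a single copy among the 2N gene copies of a diploid population of N individuals, so its fixation probability is 1/(2N): the smaller the population, the greater the chance of fixation.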
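The claim that a neutral mutation's fixation probability equals its frequency can be checked by simulation; for a single new mutation in a diploid population of N individuals that frequency is 1/(2N). This Python sketch assumes an idealised Wright-Fisher model (constant size, non-overlapping generations, each generation sampled at random from the previous one), which the notes do not specify; the population sizes and trial counts are arbitrary.

```python
import random

def fixation_probability(pop_size: int, trials: int, seed: int = 1) -> float:
    """Fraction of simulations in which one new neutral allele drifts to fixation.

    Assumes a Wright-Fisher population of pop_size diploid individuals (2N gene copies).
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size
    fixed = 0
    for _ in range(trials):
        count = 1  # a single new mutant copy: initial frequency 1/(2N)
        while 0 < count < n_copies:
            freq = count / n_copies
            # Each gene copy in the next generation is drawn at random from the current pool.
            count = sum(rng.random() < freq for _ in range(n_copies))
        fixed += int(count == n_copies)
    return fixed / trials

for n in (5, 50):
    estimate = fixation_probability(n, trials=2000)
    print(f"N={n}: simulated {estimate:.3f}, predicted 1/(2N) = {1 / (2 * n):.3f}")
```

The seeded generator makes runs reproducible; the simulated values should lie close to 0.1 and 0.01, showing that drift fixes a new neutral mutation more often in a small population.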

Evolution of the Globin Gene Family

  • Involved in oxygen homeostasis.
  • Iron-containing metalloproteins.
  • Heme group with a porphyrin ring coordinating an iron ion.
  • Multiple gene duplication events throughout evolutionary history.
  • Mutation produces sequence divergence. The further back in time the duplication happened, the more the sequences have diverged.
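Divergence between duplicated genes can be quantified as the proportion of differing sites (p-distance) in an alignment. A minimal Python sketch, assuming the sequences are already aligned without gaps; the sequences are hypothetical, with the copy from the older duplication given more differences.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ (no gaps allowed)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)

# Hypothetical aligned fragments of paralogous genes.
reference = "ACGTACGTACGTACGTACGT"
recent_copy = "ACGTACGTACGAACGTACGT"   # diverged after a recent duplication
ancient_copy = "ACCTATGTACGAACGTTCGA"  # diverged after an older duplication
print(p_distance(reference, recent_copy))   # 0.05
print(p_distance(reference, ancient_copy))  # 0.25
```

Because repeated substitutions at the same site go unseen, p-distance underestimates divergence for very old duplications; substitution models such as Jukes-Cantor are commonly used to correct for this.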

Evolution of the Coagulation Cascade Proteins

  • Platelet Activation and Factors for Clot Formation

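Exon shuffling (facilitated by autonomously transposing elements, i.e., genome-wide repeats)

  • LINE-1 transposons facilitate 3' transduction, in which genomic sequence downstream of a LINE-1 element (which may include exons) is retrotransposed together with it.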
  • Exon duplication (facilitated by the multiple copies of genome-wide repeats).

Domain Structure of Coagulation Proteins

DNA Markers

Autosomal markers:
  • SNPs (single nucleotide polymorphisms)
  • Microsatellites (short tandem repeats, STRs)
  • Alu insertions

Non-recombining sequences:
  • Mitochondrial DNA
  • Y chromosome STRs
  • These are haploid (single-copy) sequences inherited as a haplotype because they do not recombine; their variation derives from mutation alone.

Tandem Repeat Variants

  • Simple sequence length polymorphisms (SSLPs), such as short tandem repeats, are made up of multiple copies of a simple repeating DNA sequence (such as GATA, a common tetranucleotide repeat). They occur on all of our chromosomes.
  • An allele has a specific number of repeats: e.g., allele 1 has 4 repeats and allele 2 has 8 repeats.
  • SSLPs can arise through replication slippage on a repetitive template sequence.
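Since an STR allele is defined by its number of repeats, calling an allele from sequence amounts to measuring the longest uninterrupted run of the motif. A minimal Python sketch using the GATA example; the flanking sequences are hypothetical, and in routine STR typing the allele is instead inferred from PCR product size on an electropherogram.

```python
import re

def count_repeats(seq: str, motif: str = "GATA") -> int:
    """Length, in motif units, of the longest uninterrupted tandem run of motif in seq."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)

# Hypothetical alleles: the same flanking sequence around 4 or 8 GATA repeats.
allele_1 = "TTC" + "GATA" * 4 + "CCA"
allele_2 = "TTC" + "GATA" * 8 + "CCA"
print(count_repeats(allele_1), count_repeats(allele_2))  # 4 8
```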

Multiplex PCR of STRs: Electropherogram

  • STR analysis can be used in human identification.
  • It can also be used in diagnostics to look for maternal contamination of prenatal samples or follow the efficacy of stem cell transplantation.
  • UK DNA-17 marker set
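For identification, an STR profile is a set of genotypes (pairs of repeat numbers) across loci, and two samples are consistent with a common source when the genotypes agree at every shared locus. A simplified Python sketch; the locus names and allele values are hypothetical, and it ignores match probabilities, mixtures and allele drop-out, which real casework must consider.

```python
# A profile maps each STR locus to the two allele repeat numbers observed there.
Profile = dict[str, tuple[int, int]]

def profiles_match(a: Profile, b: Profile) -> bool:
    """True if two profiles have identical genotypes at every locus typed in both."""
    shared = a.keys() & b.keys()
    # Allele order within a genotype is arbitrary, so compare sorted pairs.
    return bool(shared) and all(sorted(a[locus]) == sorted(b[locus]) for locus in shared)

# Hypothetical loci and repeat numbers.
crime_scene = {"LOCUS_1": (12, 14), "LOCUS_2": (8, 8), "LOCUS_3": (15, 17)}
suspect_1 = {"LOCUS_1": (14, 12), "LOCUS_2": (8, 8), "LOCUS_3": (15, 17)}
suspect_2 = {"LOCUS_1": (12, 14), "LOCUS_2": (8, 8), "LOCUS_3": (15, 16)}
print(profiles_match(crime_scene, suspect_1))  # True
print(profiles_match(crime_scene, suspect_2))  # False
```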

Ancestry Informative SNP Markers

  • An ancestry informative marker (AIM) has alleles that exhibit different frequencies in different populations.
  • We have around 15 million SNPs in our genome, a proportion of which are ancestry informative.
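Ancestry informative markers are useful because the likelihood of an individual's genotypes differs between populations with different allele frequencies. A minimal Python sketch, assuming Hardy-Weinberg proportions within each population and independent markers; the populations, SNP names and allele frequencies are all hypothetical.

```python
import math

# Hypothetical frequencies of allele 'A' at three ancestry informative SNPs in two populations.
ALLELE_FREQ = {
    "pop1": {"snp1": 0.9, "snp2": 0.8, "snp3": 0.1},
    "pop2": {"snp1": 0.2, "snp2": 0.3, "snp3": 0.7},
}

def genotype_log_likelihood(genotype: dict[str, int], freqs: dict[str, float]) -> float:
    """log P(genotype | population); genotype gives copies of allele 'A' (0, 1 or 2) per SNP."""
    total = 0.0
    for snp, copies in genotype.items():
        p = freqs[snp]
        q = 1 - p
        # Hardy-Weinberg genotype frequencies; frequencies must lie strictly between 0 and 1.
        prob = {2: p * p, 1: 2 * p * q, 0: q * q}[copies]
        total += math.log(prob)
    return total

person = {"snp1": 2, "snp2": 1, "snp3": 0}
scores = {pop: genotype_log_likelihood(person, f) for pop, f in ALLELE_FREQ.items()}
print(max(scores, key=scores.get))  # pop1
```

Real ancestry inference uses many more markers and typically estimates admixture proportions rather than assigning a single population.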

The Modern Human Genome

  • Tracing the peopling of the world through genomics

Next-generation sequencing (NGS)

  • A generic term that includes a number of different sequencing technologies
  • The common strategy in all these technologies is to break the DNA to be sequenced into many small, overlapping fragments.
  • Each fragment is sequenced and the overall sequence pieced back together.
  • The number of times a base is read in different fragments is called the read depth.
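Read depth can be computed once the reads have been aligned back to reference coordinates. A minimal Python sketch using a difference array; it assumes each read is a half-open [start, end) interval within the sequence, and the read positions are hypothetical.

```python
def read_depth(reads: list[tuple[int, int]], length: int) -> list[int]:
    """Per-base read depth: how many reads cover each position of a sequence."""
    diff = [0] * (length + 1)
    for start, end in reads:
        diff[start] += 1  # coverage rises where a read begins
        diff[end] -= 1    # and falls just after it ends
    depth, running = [], 0
    for pos in range(length):
        running += diff[pos]
        depth.append(running)
    return depth

reads = [(0, 6), (2, 8), (4, 10)]  # hypothetical aligned read positions
print(read_depth(reads, 10))  # [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
```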

The Modern Human Genome

  • Genomes have been sequenced from modern human populations around the world and from ancient Neanderthal and Denisovan remains.
  • There is evidence that ancient hominin populations (modern humans, Neanderthals, Denisovans, and possibly as yet undiscovered hominins) split, diverged, and then interbred, allowing gene flow between them (introgression).
  • Our genetic variation can be traced back to multiple ancient populations.

Genome Replication and Evolution

  • Replication and genome instability.
  • Homologous recombination and genome evolution.
  • Ancestry and DNA markers.
  • Gene flow among hominins.