KM

Genome Content and Organisation Notes

Genes and Genomes: An Overview

  • Genome: Entire DNA content, including genes and intergenic regions.

  • Transcriptome: Entire set of RNA transcripts (mRNAs and non-coding RNAs) in a cell.

  • Proteome: Entire set of proteins encoded by a genome.

The Human Genome

  • Size: 3.3 x 10^9 base pairs (over 3 billion) per haploid genome.

  • Chromosomes: 24 distinct types (22 autosomes plus the X and Y sex chromosomes); a diploid cell carries 46 (23 pairs).

  • Protein-coding exons: Roughly 2% of the genome.

  • Number of protein-coding genes: Approximately 20,000.

Mitochondrial Genome

  • Size: 16,569 base pairs.

  • Structure: One circular chromosome.

  • Copies: A few (roughly 2-10) per mitochondrion, giving hundreds to thousands per cell.

  • Genes: 37 (13 protein-coding, 2 ribosomal RNA, 22 transfer RNAs).

Genome Size and Genetic Complexity

  • No simple correlation between genome size (C-value) and genetic complexity (gene number).

  • C-value paradox: the amount of DNA in a haploid genome does not correlate with the complexity of the organism.

  • Gene density and the proportion of coding vs. non-coding DNA are important factors.
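
As a rough illustration of gene density, the sketch below divides gene number by genome size using only the figures quoted in these notes. Expressing density as "genes per Mb" and the helper name genes_per_megabase are assumptions made for this example; comparing a nuclear genome with an organelle genome is meant only to show the arithmetic.

```python
# Figures are the ones quoted in these notes; "genes per Mb" is just one
# illustrative way to express gene density.
GENOMES = {
    "human nuclear": {"size_bp": 3.3e9, "genes": 20_000},
    "human mitochondrial": {"size_bp": 16_569, "genes": 37},
}


def genes_per_megabase(size_bp, genes):
    """Return gene density as genes per 1,000,000 bp."""
    return genes / (size_bp / 1e6)


for name, genome in GENOMES.items():
    density = genes_per_megabase(genome["size_bp"], genome["genes"])
    print(f"{name}: {density:.1f} genes per Mb")
```

The two values differ by more than two orders of magnitude, which is the point of the bullet above: genome size alone says little about how many genes a genome carries.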

Genome Content Percentages

  • Exons: 2%

  • Introns: 24%

  • Large duplications: 5%

  • Simple repeats: 3%

  • Retrotransposons: 45%

  • Other intergenic DNA: 21%
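
A quick check of the percentages above, sketched in Python: the categories should add up to 100%, and each can be converted to an approximate amount of DNA using the 3.3 x 10^9 bp genome size quoted earlier in these notes. The helper name approx_megabases is hypothetical, and the resulting values are only as precise as the rounded percentages.

```python
HAPLOID_GENOME_BP = 3.3e9  # genome size quoted earlier in these notes

CONTENT_PERCENT = {
    "Exons": 2,
    "Introns": 24,
    "Large duplications": 5,
    "Simple repeats": 3,
    "Retrotransposons": 45,
    "Other intergenic DNA": 21,
}

# The listed categories should account for the whole genome.
total = sum(CONTENT_PERCENT.values())
assert total == 100


def approx_megabases(percent, genome_bp=HAPLOID_GENOME_BP):
    """Convert a percentage of the genome into megabases (Mb)."""
    return genome_bp * percent / 100 / 1e6


for category, percent in CONTENT_PERCENT.items():
    print(f"{category:<22}{percent:>3}%  ~{approx_megabases(percent):,.0f} Mb")
```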

Repetitive vs. Non-Repetitive DNA

  • Repetitive DNA: Sequences (5-5000 bp) present in multiple copies, usually non-coding.

    • Examples: centromeres, telomeres.

  • Non-repetitive DNA: Unique sequences, one copy per haploid genome.

    • Includes >98% of protein-coding genes, as well as gene clusters.

Non-Repetitive DNA

  • Protein-coding genes and genes encoding certain RNAs.

  • Gene clusters/families: Sets of genes with similar functions (e.g., globin genes).

  • Pseudogenes: Non-functional gene sequences similar to functional genes.

Gene Families

  • Arise through duplication during evolution.

  • Members are related at the sequence level and perform similar functions.

  • Example: Globin family (myoglobin vs. hemoglobin).

Pseudogenes

  • Classed as part of non-repetitive DNA.

  • Sequences that resemble functional genes but do not produce functional products.

  • Types: processed (an mRNA is reverse transcribed and the copy is inserted back into the genome) and non-processed (a gene is duplicated and the copy is then disabled by mutation).

Repetitive DNA Details

  • Present in multiple copies.

  • Located in specific regions (centromeres, telomeres).

  • Often have no known coding function, although some (e.g., centromeric and telomeric repeats) play structural roles.

Types of Repetitive DNA

  • Highly repetitive: Satellite DNA, tandem repeats.

  • Middle repetitive: Interspersed retrotransposons (SINEs, LINEs), multiple copy genes.

Repetition Frequency (f)

  • Number of times a DNA sequence appears in a haploid genome.

  • Different types of repetitive DNA have different f values.

  • Highly repetitive: >10^6 (simple sequence).

  • Middle repetitive: 10^3-10^5.
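
The thresholds above can be written as a small classifier. This is a minimal sketch using only the ranges given in these notes; the function name classify_repetition is hypothetical. Note that the notes leave copy numbers between 10^5 and 10^6 (and between 2 and 10^3) unassigned, so the function reports those as outside the stated ranges rather than guessing.

```python
def classify_repetition(f):
    """Classify a sequence by its repetition frequency f (copies per haploid genome)."""
    if f < 1:
        raise ValueError("f must be at least 1")
    if f == 1:
        return "non-repetitive (unique)"
    if f > 10**6:
        return "highly repetitive"
    if 10**3 <= f <= 10**5:
        return "middle repetitive"
    # The notes do not assign a class to the remaining copy numbers.
    return "outside the ranges given in these notes"


for copies in (1, 5_000, 500_000, 2_000_000):
    print(copies, "->", classify_repetition(copies))
```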

Repetitive DNA: Satellite DNA

  • Satellite: e.g., human alphoid DNA repeat.

  • Minisatellite: e.g., human telomeres (TTAGGG repeats).

  • Microsatellite: Very short repeat units found throughout the genome, including within genes; e.g., the CAG repeats whose expansion causes Huntington's disease.
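
Tandem repeats such as TTAGGG or CAG can be located with a simple pattern search. The sketch below counts the longest uninterrupted run of a repeat unit in a sequence; the function name longest_tandem_run and the toy sequences are made up for illustration and are not real gene or telomere sequences.

```python
import re


def longest_tandem_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    # The lookahead tries every start position, so runs that begin in a
    # different register from an earlier match are not missed.
    pattern = re.compile(f"(?=((?:{re.escape(unit.upper())})+))")
    runs = (len(m.group(1)) // len(unit) for m in pattern.finditer(sequence.upper()))
    return max(runs, default=0)


# Toy sequences, invented for this example.
toy_gene = "ATG" + "CAG" * 5 + "CCGCAGTAA"
toy_telomere = "TTAGGG" * 3

print(longest_tandem_run(toy_gene, "CAG"))         # 5
print(longest_tandem_run(toy_telomere, "TTAGGG"))  # 3
```

Checking every start position is slow on long sequences, so this is suited to short illustrative inputs rather than whole chromosomes.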

Repetitive DNA: Interspersed Genome-Wide Repeats

  • LINEs (Long Interspersed Nuclear Elements): e.g., human LINE-1.

  • SINEs (Short Interspersed Nuclear Elements): e.g., human Alu sequence (10% of the human genome).

Summary of Genome Organization

  • Cells contain nuclear and mitochondrial genomes.

  • Genome size does not correlate well with genetic complexity (gene number).

  • Genomes are classified into repetitive and non-repetitive sequences.

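  • Repetitive sequences make up roughly half or more of the genome in higher mammals (retrotransposons alone account for about 45% of the human genome).

  • Repetitive sequences such as centromeric and telomeric repeats have structural roles in genome organization.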