CH8.1 - Genomes and Genomics
Genome Definition: All hereditary information specifying an organism.
Haploids: The genome is the complete set of hereditary information.
Diploids: The genome is one complete haploid complement (one copy of each chromosome).
Examples of Genomes:
Viral Genome (SFP10 Bacteriophage):
Double-stranded DNA.
Size: ~158 kilobases (157,950 base pairs).
Haploid (one copy of each gene).
E. coli Genome:
Size: 4.6 megabases (4,639,221 base pairs).
Haploid (one copy of each gene).
Human Genome:
Diploid (2n).
Karyotype: Shows metaphase-arrested chromosomes.
Two copies of each autosome (chromosomes 1-22).
Sex chromosomes: X and Y in males (X and X in females).
Genome: 22 autosomes (one of each pair) + 2 sex chromosomes (X and Y).
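The kilobase and megabase figures above are plain unit conversions from base pairs. A minimal sketch, assuming 1 kb = 1,000 bp and 1 Mb = 1,000,000 bp (the helper name `format_genome_size` is hypothetical, and values are rounded to the nearest displayed unit):

```python
def format_genome_size(base_pairs: int) -> str:
    """Express a genome size in bp, kb, or Mb, rounded for display."""
    if base_pairs >= 1_000_000:
        return f"{base_pairs / 1_000_000:.1f} Mb"
    if base_pairs >= 1_000:
        return f"{base_pairs / 1_000:.0f} kb"
    return f"{base_pairs} bp"


# Sizes quoted in the notes for the SFP10 phage and E. coli genomes
phage_size = format_genome_size(157_950)
ecoli_size = format_genome_size(4_639_221)
print(phage_size, ecoli_size)
```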
Increase in Genome Sequencing:
Exponential increase over the past 25 years.
Data from NCBI (National Center for Biotechnology Information).
1980s: Few genomes sequenced.
Exponential increase in viruses, prokaryotes, and eukaryotes.
Landmark Genome Sequencing Events:
First genome sequenced: Bacteriophage (small genome).
First cellular organism: Haemophilus influenzae (bacterium, 1995).
First eukaryotic organism: Saccharomyces cerevisiae (yeast, 1996).
Human genome: Completed in 2003.
Took 13 years (1990-2003) and roughly $2.7 billion.
Now: Less than 24 hours and around $1,000.
Genome Annotation:
Locating genes and critical sequences (regulatory sequences).
Assigning putative functions.
Computational Approaches:
Identifying genes and regulatory sequences by comparison with previously studied genomes.
Algorithms to identify gene-like features:
Start codon.
Shine-Dalgarno sequence.
Open reading frame: a run of codons ending in a stop codon (TAA, TAG, or TGA).
Consensus sequences (promoter sequences).
-10 consensus (TATAAT).
-35 consensus (TTGACA).
Intrinsic terminator (GC-rich inverted repeat that forms a hairpin, followed by a run of T's).
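The feature list above can be turned into a very rough scanner. The sketch below is illustrative only: it looks for ATG-to-stop open reading frames on one strand and for the core Shine-Dalgarno motif AGGAGG just upstream of the start codon. The function names, the 15-nucleotide upstream window, and the toy sequence are assumptions made for this example; real gene finders use statistical models and tolerate imperfect matches.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
SHINE_DALGARNO = "AGGAGG"  # core consensus; real sites often differ slightly


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end) indices of ATG...stop ORFs on the given strand."""
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Read codon by codon, in frame with the start codon
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3  # codons before the stop, ATG included
                if n_codons >= min_codons:
                    orfs.append((start, pos + 3))
                break
    return orfs


def has_shine_dalgarno(dna: str, orf_start: int, window: int = 15) -> bool:
    """Check for the Shine-Dalgarno core within `window` nt upstream of the ORF."""
    upstream = dna.upper()[max(0, orf_start - window):orf_start]
    return SHINE_DALGARNO in upstream


# Toy sequence (made up): SD motif, short spacer, then ATG AAA CCC GGG TAA
toy = "TTAGGAGGTAACATATGAAACCCGGGTAAGG"
orfs = find_orfs(toy)
for start, end in orfs:
    print(start, end, toy[start:end], has_shine_dalgarno(toy, start))
```

Promoter consensus sequences and terminators could be searched the same way, although in practice they are matched with position weight matrices rather than exact strings.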
Comparative Genomics:
Using previously studied genomes to assign gene functions.
Looking for genes with homology (sequence similarity).
Homologs: Genes whose sequence similarity reflects shared ancestry; homology often suggests a similar function.
Orthologs: Homologs in different species that descend from a single ancestral gene and typically perform the same function.
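Sequence similarity is usually reported as percent identity over an alignment. The sketch below assumes the two sequences have already been aligned to equal length, with "-" marking gaps; the function name and the two short sequences are made up for illustration. Real homology searches use alignment tools and statistical scores rather than a raw identity count.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, non-gap positions where both sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments from two species (one mismatch in ten)
fragment_a = "ATGGCCAAGT"
fragment_b = "ATGGCTAAGT"
identity = percent_identity(fragment_a, fragment_b)
print(identity)
```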
Synteny:
Conservation of gene order on the chromosome in different species.
Example: Homologs between humans and mice with similar order on chromosomes.
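Conserved gene order can be checked by looking for runs of genes that appear consecutively and in the same order in two species. This is a minimal sketch, assuming orthologous genes have already been given the same label in both lists; the gene names and the function name are hypothetical.

```python
def longest_syntenic_block(order_a, order_b):
    """Longest run of genes that is consecutive and same-ordered in both lists."""
    best = []
    for i in range(len(order_a)):
        for j in range(len(order_b)):
            k = 0
            # Extend the run while both chromosomes list the same gene
            while (i + k < len(order_a) and j + k < len(order_b)
                   and order_a[i + k] == order_b[j + k]):
                k += 1
            if k > len(best):
                best = order_a[i:i + k]
    return best


# Hypothetical gene orders along one chromosome in two species
species_a = ["geneA", "geneB", "geneC", "geneD", "geneE"]
species_b = ["geneX", "geneB", "geneC", "geneD", "geneY"]
block = longest_syntenic_block(species_a, species_b)
print(block)
```

The sketch ignores inversions and gene strand, which real synteny tools take into account.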
Unassigned Functions:
Some genes still have no assigned function, leaving room for discovery.
Types of Sequences in Genomes:
Coding sequences: Direct the synthesis of proteins (only about 1.5% of the human genome).
Non-coding sequences: Much more abundant than coding sequences.
Transposons and retrotransposons: ~45% of the human genome.
SINEs (Short Interspersed Nuclear Elements).
LINEs (Long Interspersed Nuclear Elements).
Additional repetitive sequences.
Miscellaneous unique sequences.
Introns: Non-coding sequences that interrupt exons in pre-mRNA (removed during mRNA processing).
Variation in Human Genomes:
Genotypic differences (nucleotide level) may or may not produce phenotypic differences.
Single Nucleotide Polymorphism (SNP):
Single nucleotide change.
Approximately one SNP per 1,000 base pairs in the human genome.
Example: Four individuals with a SNP; one has T, others have C.
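The four-individual example can be sketched as a column-by-column comparison of aligned sequences. The sequences and the names `person1` to `person4` are invented for illustration, and the function assumes all sequences cover the same region and have equal length.

```python
def find_snps(sequences):
    """Return {position: {individual: allele}} for positions that vary."""
    names = list(sequences)
    length = len(sequences[names[0]])
    if any(len(sequences[name]) != length for name in names):
        raise ValueError("sequences must have equal length")
    snps = {}
    for pos in range(length):
        alleles = {name: sequences[name][pos] for name in names}
        if len(set(alleles.values())) > 1:  # more than one allele observed
            snps[pos] = alleles
    return snps


# Made-up sequences: person3 carries T where the others carry C
individuals = {
    "person1": "AAGCCTA",
    "person2": "AAGCCTA",
    "person3": "AAGCTTA",
    "person4": "AAGCCTA",
}
snps = find_snps(individuals)
print(snps)
```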
Haplotype:
Groups of SNPs or other genetic variations close together on a chromosome.
Tend to segregate together (not separated by recombination).
Tag SNP:
Representative SNP chosen to identify a haplotype.
Indicates what the rest of the haplotype sequence looks like.
Haplotypes as markers for certain populations.
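Because SNPs in a haplotype are inherited together, the allele at one well-chosen site can predict the alleles at the others. The sketch below uses two invented haplotypes over four hypothetical SNP sites (`snp1` to `snp4`); it checks whether a site distinguishes the haplotypes and then infers the full haplotype from that site alone. It assumes no recombination within the block.

```python
# Hypothetical haplotypes: alleles at four linked SNP sites
HAPLOTYPES = {
    "hap1": {"snp1": "A", "snp2": "C", "snp3": "G", "snp4": "T"},
    "hap2": {"snp1": "G", "snp2": "C", "snp3": "A", "snp4": "C"},
}


def is_tag_snp(site, haplotypes):
    """A site can serve as a tag if every haplotype has a different allele there."""
    alleles = [hap[site] for hap in haplotypes.values()]
    return len(set(alleles)) == len(alleles)


def infer_haplotype(tag_site, allele, haplotypes):
    """Return the name of the haplotype carrying `allele` at `tag_site`, or None."""
    for name, hap in haplotypes.items():
        if hap[tag_site] == allele:
            return name
    return None


tag_ok = is_tag_snp("snp1", HAPLOTYPES)       # alleles differ: A vs G
not_tag = is_tag_snp("snp2", HAPLOTYPES)      # both haplotypes carry C
inferred = infer_haplotype("snp1", "G", HAPLOTYPES)
print(tag_ok, not_tag, inferred, HAPLOTYPES[inferred])
```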
Human vs. Chimpanzee Genomes:
Total genomic difference: 4%.
Differences include SNPs, chromosomal insertions, duplications, and other rearrangements.
Duplications can come from transposons.
Determining if differences occurred in humans or chimpanzees:
Compare sequences to a more distantly related outgroup (e.g., orangutan).
Example: The Gene X sequence differs between humans and chimps. The orangutan sequence matches the chimp sequence, so the change occurred in the human lineage.
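The outgroup reasoning can be written as a small decision rule for a single site. This is a sketch of the logic only, assuming one substitution per site; the function name and the example alleles are hypothetical.

```python
def lineage_of_change(human: str, chimp: str, outgroup: str) -> str:
    """Assign a human/chimp difference to a lineage using an outgroup allele."""
    if human == chimp:
        return "no difference"
    if outgroup == chimp:
        # Chimp keeps the ancestral state, so the human sequence changed
        return "human lineage"
    if outgroup == human:
        # Human keeps the ancestral state, so the chimp sequence changed
        return "chimpanzee lineage"
    return "unresolved"  # outgroup matches neither sequence


# Gene X example: orangutan matches chimp, so the change is on the human lineage
result = lineage_of_change(human="A", chimp="G", outgroup="G")
print(result)
```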
Molecular Basis of Human Genetic Disease:
Linkage Analysis:
Mapping disease condition relative to genetic polymorphisms (SNPs, deletions).
Example: Early-onset Alzheimer's linked to PS1 gene on chromosome 14.
Process:
Look at pedigrees of affected families.
Collect DNA from affected and unaffected individuals.
Follow which SNPs or haplotypes the disease segregates with.
Genetic markers from chromosome 14 (including the marker D14S43) segregated with Alzheimer's.
Researchers focused on the D14S43 region of chromosome 14, finding 19 expressed genes.
Mutations in S182 (later named PS1) were present in affected individuals, not in unaffected.
Summary: Compare affected and unaffected family members and track which SNPs or other genetic markers segregate with the disease.
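The segregation step can be illustrated with a toy pedigree. The sketch below looks for marker alleles carried by every affected family member and by no unaffected member. The family, the individual labels, and the allele names are invented, and the rule assumes a fully penetrant dominant disease; real linkage analysis scores the evidence statistically (for example with LOD scores) rather than requiring perfect co-segregation.

```python
def cosegregating_alleles(family):
    """Alleles present in all affected and absent from all unaffected members."""
    affected = [set(p["alleles"]) for p in family if p["affected"]]
    unaffected = [set(p["alleles"]) for p in family if not p["affected"]]
    if not affected:
        return set()
    shared = set.intersection(*affected)
    for alleles in unaffected:
        shared -= alleles  # drop alleles also seen in unaffected relatives
    return shared


# Hypothetical two-generation family typed at one chromosome 14 marker
family = [
    {"name": "I-1", "affected": True, "alleles": ("allele2", "allele3")},
    {"name": "I-2", "affected": False, "alleles": ("allele1", "allele1")},
    {"name": "II-1", "affected": True, "alleles": ("allele2", "allele1")},
    {"name": "II-2", "affected": False, "alleles": ("allele3", "allele1")},
    {"name": "II-3", "affected": True, "alleles": ("allele2", "allele1")},
]
linked = cosegregating_alleles(family)
print(linked)
```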