what is a gene?
a defined stretch of DNA encoding a specific protein or ribonucleic acid
alternatively, an open reading frame (ORF), often identified computationally in genome sequences
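A minimal sketch of how an ORF might be identified by computer. It assumes the standard start codon (ATG) and stop codons (TAA, TAG, TGA), scans only the forward strand, and uses a hypothetical `min_codons` cutoff. The function name `find_orfs` is illustrative, and real gene finders are considerably more sophisticated.

```python
# Assumption: standard stop codons; forward strand only; no nested ORFs reported.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    start is the 0-based index of the ATG; end is the index just past
    the stop codon. min_codons counts the start and stop codons too.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i + 3 - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

print(find_orfs("CCATGAAATTTTAGGG"))
```

In the toy sequence, the only ORF starts at the ATG in reading frame 2 and ends at the TAG stop codon.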
ENCODE project gene definition
a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions, and/or other functional sequence regions
what is a genome?
the entire genetic component of a living organism
does a species have a genome?
different strains of E. coli often share <40% of their genes
prokaryotic genomes are evolutionary mosaics
concept of prokaryotic species is problematic
is there a human genome?
yes and no
publicly sequenced genome is a composite of data from 4 DNA samples chosen at random from dozens of samples provided by anonymous volunteers
genes vary between individuals and within individuals
is the genome of an individual stable?
no: copy number variation (CNV) occurs between and within individuals (e.g. in the brain)
cancer
genomics
sequencing genomes of cultured microorganisms, animal/plant tissue samples, etc.
re-sequencing
a part of genomics: determining the sequence of a genome for the purpose of comparison to a reference genome (e.g. multiple human genomes, strains of E. coli, etc.)
transcriptomics
traditional approach: sequencing of ESTs (expressed sequence tags—cDNA derived from mRNAs)
evolved into RNA-seq technology
RNA-seq technology
whole transcriptome shotgun sequencing, which uses next-gen technologies to provide a comprehensive picture of the RNA present in a sample
has begun to replace microarray technologies as a tool of choice for gene expression studies
metagenomics
culture-independent sequencing
DNA isolated from a community of organisms, shotgun sequenced, and assembled
goal is to gain insight into the composition of the microbial community
community can be anything: soil, water, air, gut contents, feces, etc
proteomics
generally refers to large-scale experimental analysis of proteins in complex mixtures, but smaller partially purified samples can also be examined
large-scale studies combine fragmentation of proteins, separation of proteins by liquid chromatography and detection by tandem mass spectrometry
requires a reference genome with predicted proteins, which are digested in silico; peptide fragments from the biological sample are compared to the in silico predictions, and a match indicates that the protein is present
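The matching step can be sketched roughly as follows. This is a simplified illustration, not a real proteomics pipeline. It assumes a trypsin-like cleavage rule (cut after K or R unless the next residue is P), which the notes do not specify, and it compares peptide sequences directly, whereas real workflows compare measured masses and fragmentation spectra. The names `digest` and `identify_proteins` and the example sequences are hypothetical.

```python
def digest(protein):
    """In silico digestion under an assumed trypsin-like rule:
    cleave after K or R unless the next residue is P."""
    peptides, current = [], ""
    for i, aa in enumerate(protein):
        current += aa
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(current)
            current = ""
    if current:  # trailing peptide with no cleavage site at its end
        peptides.append(current)
    return peptides

def identify_proteins(predicted_proteins, observed_peptides):
    """Map each predicted protein to the observed peptides it explains."""
    observed = set(observed_peptides)
    hits = {}
    for name, seq in predicted_proteins.items():
        matched = observed & set(digest(seq))
        if matched:
            hits[name] = sorted(matched)
    return hits

# Hypothetical predicted proteome and observed peptides
predicted = {"protA": "MKRPAGKLLR", "protB": "MSTKAAR"}
print(digest("MKRPAGKLLR"))
print(identify_proteins(predicted, ["LLR", "XYZ"]))
```

Here `protA` is called present because one of its predicted peptides was observed, while `protB` has no matching peptide.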
metabolomics
study of small-molecule metabolite profiles using gas chromatography, high performance liquid chromatography, etc for sample separation, and mass spectrometry for detection
can provide insight into the physiology of the cell at the time the sample was taken
where do genes and genomes come from?
from the first living things
by comparing extant (living) and extinct organisms (the latter via fossils and indirect evidence), we infer that shared traits originated in a common ancestor
feasibility experiments
test hypotheses about the origin of life by demonstration of similar events in the lab
e.g. Miller-Urey experiment of 1952: water + methane + ammonia + hydrogen + electricity = amino acids
relics
examine fossils (biological or chemical) or putatively primitive features of modern organisms
e.g. evidence of organisms that lived ~3 billion years ago
fossilized stromatolites suggest cyanobacteria existed ~2.7 BYA
2-methylhopanoids serve as membrane biomarkers for cyanobacterial oxygenic photosynthesis at around that time
comparative biology
compare and contrast properties of living beings, infer what the biology of the last universal common ancestor (LUCA) might have been like
e.g. which came first, RNA or DNA? what was the genetic code like?
ancient DNA research
genomes of extinct organisms (e.g. Neanderthals, mammoths, etc.) have been sequenced, but DNA degrades very quickly (upper limit of DNA preservation is thought to be ~1 million years)
RNA world
era during which the genetic information resided in the sequence of RNA molecules and the phenotype derived from the catalytic properties of RNA
preceded DNA and protein based life
which came first, DNA or protein?
need DNA to store information to make proteins, but need protein to synthesize DNA
likely, neither came first
likely was RNA with catalytic activity
RNA with catalytic activity and origin of life
can replicate itself and synthesize peptides
with mutation, chemical selection leads to proteins, enzymes, complex systems, which eventually leads to cells and natural selection
compartmentalization and the RNA world
important, served to bring metabolites in close proximity
hard to reason how complex metabolism evolved otherwise
RNA world stages
pre-biotic chemistry → nucleic acid precursors → polymerization catalyzed by minerals → random sequence RNAs, some with very modest self-replication ability → better and better mutant replicators selected (darwinian evolution begins here) → RNAs with metabolic activity, specialized replicators → self-assembled membrane, metabolism, RNA organism
examples of RNA-mediated biochemistry found
all across the tree of life (e.g. importance of non-coding RNAs in gene expression, genome defense, etc)
RNA may have been preceded by
some other replicating, catalytic molecule, in the same way that DNA and proteins seem to have been preceded by RNA
while the modern cellular world is one of ribonucleoproteins,
most enzymes are proteins (20 amino acid character states, much greater potential for structural/functional diversity)
last universal common ancestor (LUCA)
its properties can be inferred from protein/DNA sequence similarity, e.g. that it had a double-stranded DNA genome and genes involved in housekeeping functions and core metabolic functions
housekeeping functions for genes
transcription, translation, DNA replication, protein folding and turnover, etc
core metabolic functions of genes
amino acid metabolism, purine/pyrimidine biosynthesis, carbon metabolism
where do new genes come from?
from pre-existing genes (either from within the genome by duplication, or acquired from another organism by horizontal gene transfer, HGT)
from non-coding DNA (‘from scratch’)
gene duplication examples
globin gene family: has a myoglobin branch, an alpha family branch, and a beta family branch; this history shapes the arrangement of globin genes in humans
over evolutionary time, repeated duplication, mutation, and transposition gave rise to these families
tubulin gene family
HSP90 gene family: eukaryotes have cytosolic and ER isoforms that evolved by duplication
% of duplicate genes in Homo sapiens
38%
homologs
in evolutionary biology, genes/proteins that share common ancestry
orthologs
genes/proteins in different species that evolved from a common ancestral gene by speciation
genes diverging due to species lineages separating
these genes/proteins typically have the same function in different species
paralogs
genes within a genome that are related to one another by gene duplication, often evolve new functions
genes evolving in parallel with species after a duplication
DNA-based duplication
usually involves a transposable element or other type of repeated element
may occur by unequal crossing-over mediated by these elements
different fates for resulting genes
may acquire new functions by evolving new expression patterns or novel biochemical protein/RNA functions
RNA-based duplication
termed retroposition or retroduplication
new retroposed gene copies may arise through reverse transcription of mRNA from parental source genes
functional retrogenes with new functional properties may evolve from these copies after acquisition or evolution of promoters in their 5’ flanking regions that may drive transcription
DNA-based gene fusion
origin of new chimeric gene or transcript structures
partial duplication (hence fission) of ancestral source genes precedes juxtaposition of partial duplicates and subsequent fusion (presumably mediated by evolution of novel splicing signals and/or transcription termination/polyadenylation sites)
transcription-mediated gene fusion
novel transcript structures may arise from intergenic splicing after evolution of novel splicing signals and transcriptional readthrough from upstream gene
new chimeric mRNAs may sometimes be reverse transcribed to yield new chimeric retrogenes
what causes new chimeric transcripts?
exons, transcriptional start sites, constitutive splicing, splicing of ancestral gene structures, intergenic splicing
orphan genes
genes without obvious homologs in the genomes of other organisms (ORFans)
can evolve by gene duplication coupled with extreme sequence divergence (evolving to the point that homology cannot be detected using sequence- and structure-based comparisons), or can arise from non-coding DNA
how to find a new gene (comparative genomics)
one should show that the corresponding DNA region is present in related species and syntenic in these species
the region should still be alignable: the species compared should be sufficiently close to each other to ensure that even neutrally diverging sequences have not yet acquired too many mutations
finally, no outgroup should show a sign of a gene in the respective position, except possibly the most closely related ones, which could have a proto-gene or an RNA gene in that position
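The three criteria above can be written as a simple checklist. This is only an illustrative sketch: the `SpeciesEvidence` record, its field names, and `is_new_gene_candidate` are hypothetical, and in practice each flag would come from alignment and annotation analyses rather than being supplied by hand.

```python
from dataclasses import dataclass

@dataclass
class SpeciesEvidence:
    """Hypothetical summary of what is seen at the locus in one related species."""
    name: str
    region_present: bool   # corresponding DNA region exists
    syntenic: bool         # region sits in the same genomic neighborhood
    alignable: bool        # region still aligns to the focal sequence
    gene_signal: bool      # any sign of a gene (incl. proto-gene or RNA gene)
    closest_relative: bool = False  # one of the most closely related species

def is_new_gene_candidate(related_species):
    """Apply the comparative-genomics criteria to a list of related species."""
    if not related_species:
        return False  # no comparison possible
    for sp in related_species:
        # Criteria 1 and 2: region present, syntenic, and alignable
        if not (sp.region_present and sp.syntenic and sp.alignable):
            return False
        # Criterion 3: no gene signal, except in the closest relatives
        if sp.gene_signal and not sp.closest_relative:
            return False
    return True

species = [
    SpeciesEvidence("sister species", True, True, True, True, closest_relative=True),
    SpeciesEvidence("distant outgroup", True, True, True, False),
]
print(is_new_gene_candidate(species))
```

A gene signal in the sister species is tolerated here because it is flagged as a closest relative; the same signal in the more distant outgroup would reject the candidate.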
life cycles of genes
gene → pseudo-gene (loss of translation, then transcription: gene death) → non-genic sequence
non-genic sequence → proto-gene (gain of transcription, then translation: gene birth) → gene
gene → single copy → multiple copies (via duplication) → function x, y
gene duplication is a significant force in
evolution of cells and their genomes
for any given gene family/genome, birth rate and fate of duplicate genes is influenced by:
intrinsic rate of duplication (how much repetitive DNA? abundance of mobile elements? can vary greatly)
intrinsic features of DNA recombination machinery (some more recombinogenic than others)
structure of gene families (present in arrays like rRNA or dispersed in genome? impacts frequency of gene conversion/unequal crossing over)
population size of organism (impacts likelihood that duplicate gene will become fixed)
impact of newly evolved duplicate genes on organism (positive, negative, neutral)