What are the levels of biological systems?
Gene, protein, metabolite - What is the function of the gene?
Network, transcription, protein complex, metabolic pathways - Which genes are involved in the process?
Cell/Single-celled organism, genome, proteome, metabolome - What is the metabolic capacity of the cell/organism? How does it replicate, grow, divide, interact with its environment?
Multicellular organism, symbiosis, signaling, metabolism - How do different cells/cell types/tissues interact?
Community/Population, metagenome, foodwebs, symbiosis - Which organisms are present and what are their ecological roles?
Ciliates
Have two nuclei: a micronucleus (MIC) carrying the germline genome and a macronucleus (MAC) carrying the somatic genome = nuclear dimorphism. They are an exception to the central dogma. During sexual reproduction, the MAC breaks down and a new MAC is formed from the MIC, resetting the genetic state of the organism.
Forward genetics
Start with observable phenotype, work to identify gene(s) responsible. Phenotype → genotype.
Reverse genetics
Start with known gene or genetic variation and investigate its impact on the phenotype. Genotype → phenotype.
Workflow of forward genetics
Random mutagenesis: e.g. treat with drug/chemical
Whole genome re-sequencing: sequence whole genome of wt and rare phenotype(s).
alt. Map gene causing phenotype: by crossing strains carrying genetic markers.
Workflow of reverse genetics
Gene of interest (GOI)
Introduce target changes (e.g. KO)
Observe effect (e.g. phenotype)
Interpret results (link gene to trait)
Monogenic disease
Disease caused by one single gene mutation
Polygenic disease
Disease caused by several gene mutations
Paired-end sequencing
Improves mapping and assembly. Sequence from both ends, produce two reads for each fragment. Higher coverage, better accuracy, good for long fragments.
De novo assembly
New assembly, we assemble without reference genome.
Reference mapping
Assemble genome by aligning it to an existing genome (reference genome)
Workflow of de novo assembly
Short and/or long reads, assemble contigs by adding several reads together, assemble scaffold or chromosome.
Contig
Continuous piece of genomic sequence.
Scaffold
Several contigs added together.
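The step from reads to contigs can be illustrated with a toy greedy overlap merge. This is only a sketch under simplifying assumptions (error-free reads, same strand, no repeats); real assemblers use overlap or de Bruijn graphs, and the function names and the `min_len` parameter are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap until none overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
print(contigs)
```

Reads that share no overlap of at least `min_len` bases remain as separate contigs, which is the situation scaffolding then tries to resolve.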
Workflow of reference mapping
Reads, align to reference genome, insight into coverage and depth, identify SNPs and CNVs.
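A minimal sketch of what "depth" and "identify SNPs" mean once reads are aligned. It assumes the alignment is already done and gap-free, so each read is given as a hypothetical `(start, sequence)` pair; the `min_depth` threshold is an illustrative choice, not a standard value.

```python
from collections import Counter


def pileup(reference, aligned_reads):
    """Count the bases observed at every reference position."""
    columns = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    return columns


def depth_and_snps(reference, aligned_reads, min_depth=2):
    """Return per-base depth and positions where the most common read base differs from the reference."""
    columns = pileup(reference, aligned_reads)
    depth = [sum(col.values()) for col in columns]
    snps = []
    for pos, col in enumerate(columns):
        if depth[pos] >= min_depth:
            base, _ = col.most_common(1)[0]
            if base != reference[pos]:
                snps.append((pos, reference[pos], base))
    return depth, snps


reference = "ACGTACGT"
reads = [(0, "ACGA"), (2, "GAAC"), (3, "AACGT")]
depth, snps = depth_and_snps(reference, reads)
print(depth)
print(snps)
```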
Reporter gene
Insert a reporter gene into the genome to be able to visualise where the gene of interest is expressed.
Enzymatic reporters
Enzymes that act as reporters by converting a substrate into a detectable product (colored or light-emitting), e.g. lacZ, luciferase or GUS.
Fluorescent reporters
Reporters that emit light of different wavelength than light absorbed, e.g. green fluorescent protein (GFP).
Transcriptional reporters
Reporters that cause GFP to be expressed only under conditions where the promoter of the GOI is active.
Workflow of shotgun metagenomics
Environmental sample
Isolation of prokaryotic cells
Cell lysis and DNA isolation
High-throughput sequencing
Genomic assembly
Microbial community analysis
Differences between Bacteria, Archaea and Eukarya
Nucleus: Bacteria no, archaea no, eukarya yes.
Organelles: Bacteria no, archaea no, eukarya yes.
Operons: Bacteria yes, archaea yes, eukarya no.
Ester lipids: Bacteria yes, archaea no, eukarya yes.
Ether lipids: Bacteria no, archaea yes, eukarya no.
Peptidoglycan: Bacteria yes, archaea no, eukarya no.
Coupled transcription and translation: Bacteria yes, archaea yes, eukarya no.
mRNA splicing: Bacteria no, archaea no, eukarya yes.
The lac operon
Lactose (in the form of allolactose) binds to the repressor and releases it from the operator, which allows RNA pol to transcribe a single mRNA encoding the three proteins needed to take up and break down lactose (lacZ, lacY and lacA).
Phospholipids
Ether: In archaea - R-O-R, linked by C-O bond. Mono/bilayer is stiffer, less ordered and thicker.
Ester: In bacteria and eukarya. R-CO-O-R, linked by O-C-O bond. Mono/bilayer is softer, more ordered and thinner.
Peptidoglycan cell wall
Only bacteria have it, gram positive or gram negative. Rigid, mesh-like structure which gives shape, strength and protection against osmotic pressure.
Gram positive
No outer membrane, no lipoproteins, thick peptidoglycan layer, only cytoplasmic membrane (1).
Gram negative
Outer membrane, lipoproteins, thin peptidoglycan layer, cytoplasmic membrane.
Homology-based annotation
Previously used to predict gene function: if a protein's sequence is similar to that of a protein with known function, we assume they might share function. Uses the protein sequence rather than the DNA sequence, since different codons can encode the same amino acid.
Codon usage bias
Synonymous codons (different codons for the same amino acid) are not used equally often, and the preferred codon differs between species. E.g. for arginine, AGG is common in humans but rare in E. coli, which prefers CGT.
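Codon usage can be measured directly by counting synonymous codons in a coding sequence. A minimal sketch for the six standard arginine codons; the toy sequence and function names are made up for illustration, and a real analysis would use many genes per species.

```python
from collections import Counter

# The six codons that encode arginine in the standard genetic code.
ARGININE_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")


def codon_counts(cds):
    """Split an in-frame coding sequence into codons and count them."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def arginine_usage(cds):
    """Fraction of arginine positions encoded by each arginine codon."""
    counts = codon_counts(cds)
    total = sum(counts[c] for c in ARGININE_CODONS)
    if total == 0:
        return {}
    return {c: counts[c] / total for c in ARGININE_CODONS if counts[c] > 0}


toy_cds = "ATGCGTCGTAGGCGCTAA"  # ATG CGT CGT AGG CGC TAA
print(arginine_usage(toy_cds))
```

Comparing these fractions between genomes is what reveals the bias.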
Homologs
Genes of common origin
Orthologs
Genes resulting from a speciation event
Paralogs
Genes resulting from duplication event
Gene neighborhood
Genes that are functionally related tend to be organised in operons.
Amplicon sequencing
Target is specific marker genes like 16S rRNA, ITS and 18S. It is cost effective, scales to large sample sets, has well-established pipelines and is good for hypothesis generation. But it only has genus-level resolution, gives no functional information, has a PCR bias and cannot detect viruses.
Workflow amplicon sequencing
Extract DNA
Amplify marker with PCR
Sequence amplicons
Compare to reference database
Shotgun sequencing
Target is all the DNA in the sample. It has strain-level resolution, gives functional information, there is no PCR bias and you get a comprehensive overview. But it is expensive, computationally intensive, requires more DNA and a complex analysis.
Workflow shotgun sequencing
Extract DNA
Fragment DNA randomly
Sequence all fragments
Assemble/map
Alpha diversity
Within sample diversity. How many species? How evenly distributed?
Beta diversity
Between sample diversity. How different are communities?
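To make the two questions concrete, here is a small sketch using three common measures: species richness and the Shannon index for alpha diversity, and Bray-Curtis dissimilarity for beta diversity. The choice of these particular metrics is an assumption (the cards do not name any), and the abundance counts are invented.

```python
import math


def richness(counts):
    """Alpha diversity: how many species are present?"""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Alpha diversity: Shannon index, higher when species are many and evenly distributed."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Beta diversity: 0 = identical communities, 1 = no shared species."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


# Species counts per sample, same species order in both lists (made-up data).
sample_a = [10, 10, 10, 10]
sample_b = [37, 1, 1, 1]

print(richness(sample_a), richness(sample_b))
print(shannon(sample_a), shannon(sample_b))
print(bray_curtis(sample_a, sample_b))
```

Both samples have the same richness, but the even sample has the higher Shannon index.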
Promoter structure
Core promoter - ~100 bp, TSS, TATA box, RNA pol binding sites
Proximal promoter - ~250-500 bp, primary TF binding sites
Extended promoter - 1-5 kb upstream, captures distal regulatory elements that influence transcription, e.g. enhancers.
Holoenzyme
Bacterial RNA pol core enzyme together with sigma factor.
Motifs
Short, recurring patterns of nucleotides in a DNA sequence that carry biological meaning. Serve as specific binding sites for proteins or signals for essential cellular processes, e.g. the TATA box. They are recognised by DNA-binding protein motifs such as helix-turn-helix, homeodomain and zinc fingers.
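Finding a motif in a sequence can be sketched as a pattern search. The example below assumes a simplified TATA-box consensus written as a regular expression (`TATA[AT]A[AT]`); the sequence is invented, and real motif scanning usually uses position weight matrices rather than an exact pattern.

```python
import re


def find_motif(seq, pattern):
    """Return (position, matched sequence) for every occurrence, overlapping ones included."""
    # The lookahead (?=...) lets matches overlap; group 1 captures the motif itself.
    return [(m.start(), m.group(1)) for m in re.finditer(f"(?=({pattern}))", seq.upper())]


TATA_BOX = "TATA[AT]A[AT]"  # simplified consensus, an assumption for illustration
hits = find_motif("GGCTATAAAAGGC", TATA_BOX)
print(hits)
```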
Microarrays
A collection of microscopic DNA spots attached to a solid surface. Known DNA sequences (probes) are fixed in specific grid positions. DNA or RNA samples are applied to the chip where complementary strands in the sample bind to corresponding probes. Scanners measure fluorescent or chemiluminescent light emitted from the binding sites to determine sample composition.
Epigenetics
Accessory chemical modification on the DNA or proteins that pack DNA. E.g. DNA methylation, small RNAs, histone modifications and chromatin structure.
Euchromatin
True chromatin: loosely packed and transcriptionally active, the one expressing.
Heterochromatin
Other chromatin: tightly packed and transcriptionally silent, the one not expressing.
Nucleosome
DNA wrapped around a core of eight histone proteins (histone octamer).
Core histones
H2A, H2B, H3 and H4.
CpGs
Cytosine followed by guanine on the same strand, separated by a single phosphate group (hence CpG).
CpG islands
Regions with a high density of CpGs, i.e. CG-dense regions.
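CpG density can be quantified by counting CG dinucleotides and comparing the count with what the C and G content alone would predict. A minimal sketch; the observed/expected ratio is a commonly used heuristic and is an assumption here, since the card gives no formal criterion.

```python
def cpg_stats(seq):
    """Return (CpG count, GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CG cannot overlap itself, so a plain count is enough
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: c * g / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return cpg, gc_content, obs_exp


print(cpg_stats("CGCGCG"))
print(cpg_stats("ATATAT"))
```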
Epigenetic re-programming
When the genome, during the pre-implantation period, is depleted of methylation, to later be restored. Starts with migration of primordial germ cells (PGCs).
Genomic imprinting
Phenomenon that results in monoallelic gene expression according to parental origin. In some genes the maternal copy is silenced, in others the paternal copy.
Sanger sequencing
A special kind of PCR where we build new DNA strands with ddNTPs which stop synthesis at random points. By collecting and separating these fragments we can read the DNA sequence base by base.
Phred score 10
Means that 1 in 10 bases are a wrong call, accuracy is 90%
Phred score 20
Means that 1 in 100 bases are a wrong call, accuracy is 99%
Phred score 30
Means that 1 in 1,000 bases are a wrong call, accuracy is 99.9%
Phred score 40
Means that 1 in 10,000 bases are a wrong call, accuracy is 99.99%
Phred score 50
Means that 1 in 100,000 bases are a wrong call, accuracy is 99.999%.
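All of the Phred cards above follow from one formula, Q = -10 · log10(P), where P is the probability that the base call is wrong. A short sketch that reproduces the values; the function names are illustrative.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score for a base-call error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40, 50):
    p = phred_to_error_probability(q)
    print(f"Q{q}: 1 wrong call in {round(1 / p):,} bases, accuracy {1 - p:.3%}")
```

Every step of 10 in the score makes a wrong call ten times less likely.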
Properties of a good assembly
Read length longer than repeated regions, high coverage and high quality.
Paired-end sequencing
Fragment DNA into smaller fragments (200-800 bps), attach adapters to both ends and thus get two reads per fragment.
Mate-pair sequencing
Fragment DNA into bigger fragments (2-5 kbps), biotinylate the ends and circularise the DNA. Fragment the circles into smaller fragments (200-600 bps) and select only the pieces that contain the biotinylated junction between the two ends of the original fragment, then attach adapters to both ends. Reading these pieces gives one read from end A and one from end B of the original fragment; when aligning, we know the long fragment lies in between.
Sequencing coverage
The average number of reads covering each base of the genome (depth). It determines how many reads we need to cover the whole reference genome: with longer reads (like in Sanger) we need fewer reads, while with shorter reads (like in Illumina) we need more.
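The link between read length and read count follows from the usual estimate coverage = number of reads × read length / genome size. A minimal sketch; the genome size, read lengths and target depth are illustrative assumptions, not values from the cards.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Average number of times each base is covered."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


genome_size = 5_000_000  # bp, roughly a bacterial genome (assumed)
print(reads_needed(30, 150, genome_size))  # short reads
print(reads_needed(30, 800, genome_size))  # long reads
```

For the same depth, the longer reads need fewer reads in total.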
Which sequencing technologies need amplification?
Illumina and IonTorrent, the signal is too weak for them to read without amplification.
Which sequencing technologies do not need amplification?
PacBio and Nanopore, they can read the signal from one single DNA molecule.
DNA barcoding
= DNA multiplexing, is when we add known sequences to the DNA strand so that we can pool samples together meaning we can sequence more samples for less of a cost (multiple samples in one run).
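After a pooled run, the reads are sorted back to their samples by their barcode (demultiplexing). A minimal sketch assuming the barcode sits at the start of each read and must match exactly; the barcodes, sample names and reads are made up, and real tools also tolerate sequencing errors in the barcode.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode and trim the barcode off."""
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No known barcode found at the start of the read.
            bins["unassigned"].append(read)
    return dict(bins)


barcodes = {"ACGT": "sample_1", "TTAG": "sample_2"}
reads = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTTTTT", "GGGGGGGG"]
print(demultiplex(reads, barcodes))
```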
Dual indexing
When we barcode on both ends of the DNA.
Illumina sequencing
Sequencing by synthesis, uses 4 (or 2) colors, has high accuracy and capacity, uses short reads.
Workflow Illumina sequencing
Sample prep - add adapters containing sequencing primer binding sites, indices and sequences complementary to the flow cell oligos.
Cluster generation - fragments are amplified on a flow cell coated with oligos: a strand attaches to an oligo, polymerase synthesises the complementary strand, and in bridge amplification the strand folds over to a neighbouring oligo and is copied over and over. Reverse strands are then washed off.
Sequencing - fluorescently tagged nucleotides are added to synthesise the strand, clusters are excited by a light source and emit a fluorescent signal. The number of cycles determines read length.
Beijing Genomics Institute sequencing
= BGI/MGI sequencing. Similar to Illumina but has different cluster generation - DNA Nanoballs (DNB).
Workflow of BGI sequencing
DNA extraction
DNA fragmentation
End repair
Adaptor ligation
Single strand separation
Circularization
Make DNB
Load DNB
Pattern array
cPAS sequencing
IonTorrent sequencing
Is cheap, fast and has a good cost per base. But has lower data output and homopolymer errors.
Workflow of IonTorrent sequencing
DNA fragmentation
Attach fragment to bead
Copy fragment until bead is covered
Beads flow across semiconductor chip into wells
Flooding chip with one nucleotide at a time
When nucleotide is incorporated, hydrogen ion is released - base is called.
PacBio sequencing
Immobilised DNA pol, four fluorophores, high error rate, long reads, can detect DNA modification (methylation).
Workflow of PacBio sequencing
Give DNA pol phospholinked nucleotides; when a nucleotide is incorporated, its fluorescent label is cleaved off and the signal is detected.
Nanophotonic visualisation chamber, ZMW (zero-mode waveguide) - detects light when a nucleotide is incorporated (longer signal).
Nanopore sequencing
Single-stranded DNA, high error rate.
Workflow of Nanopore sequencing
DNA or RNA are attached with a motor protein and adapter at end.
Strand attaches to tether which guides it into the nanopore
DNA strand is separated, one strand into nanopore
Bases yield signal in the form of ionic current when passing through the pore.
Polycistronic transcript
A single transcript (mRNA) that carries several genes, which are transcribed together as one unit.
Differences between human and yeast mtDNA
Yeast = Longer, 4 origins of replication (bi-directional), contains introns and non-coding regions, DNA is transcribed in smaller units.
Human = Shorter, 2 origins of replication (unidirectional), no introns or non-coding regions, DNA is transcribed pretty much all at once.
Mitochondrial vs nuclear genome
Nuclear: Longer, linear, has histones, Mendelian inheritance, mostly non-coding, universal codon usage, monocistronic, replication depends on mitosis, one copy per cell.
Mitochondrial: Shorter (~16 kb), circular, no histones, maternal inheritance, mostly coding, not always universal codon usage, polycistronic, replication independent of mitosis, multiple copies per cell.
Applications of metabolomics
Metabolic profiles for diseases, identify disease phenotypes, diagnose and assess, identify function of genes, monitor gene knock-outs, monitor metabolic flux, monitor enzyme/pathway kinetics, monitor gene/environment interactions, track effects of drugs, diet, treatments etc.
Targeted analysis vs global profiling
Targeted analysis = aim to get only target analytes, remove all other compounds - usually impossible. Multistep procedures, optimized for best recovery of the compounds.
Global profiling = aim to get all compounds that can be analysed with selected technique. Large range of compounds, impossible to get optimal recovery for all.
Metabolites
E.g. lipids, organic acids, ketones, aldehydes, amines, amino acids…
Attributes of Machine Learning
Automated learning - learning and improvement from data without rule-based programming.
Pattern recognition - identifies patterns and makes predictions or decisions.
Adaptability - models adapt and evolve as they are exposed to more data, becoming increasingly accurate and effective.
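Random forest
Build many decision trees, each on a random sample of the data and a random subset of the features. The decision is based on the majority: if 800 out of 1000 trees predict disease, we take that as the prediction. The trees are independent of each other, so several can be built at the same time.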
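The majority decision across trees can be sketched as a simple vote count. The per-tree predictions below are hard-coded placeholders rather than the output of trained trees, and the function name is illustrative.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most common prediction and the fraction of trees that voted for it."""
    votes = Counter(tree_predictions)
    label, count = votes.most_common(1)[0]
    return label, count / len(tree_predictions)


# Placeholder output of 1000 trees: 800 predict disease, 200 predict healthy.
predictions = ["disease"] * 800 + ["healthy"] * 200
print(majority_vote(predictions))
```

The vote fraction doubles as a rough measure of how confident the forest is.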
XGBoost
Builds decision trees one after another (gradient boosting) instead of all at once. Each new tree is fitted to the errors left by the trees before it, and the final prediction combines all trees.
k-fold cross-validation
Define k: into how many parts (folds) do you want to split your data? Each fold is used once for testing while the remaining folds are used for training, so no sample is in the test and training set at the same time.
10-fold cross-validation
Split data into 10, use 90% to train and 10% to test. Is validated 10 times and trained 10 times.
Nested k-fold cross validation
Cross-validation within cross-validation.
Nested 10-fold cross-validation
Split data into 10, use 90% to train and 10% to test, split the 90% and do the same again. Validated 100 times, trained 100 times.
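The splitting schemes in the cross-validation cards above can be sketched with plain index lists. This is an illustration of the bookkeeping only: no model is trained, the data are not shuffled, and the `k_fold` helper is a made-up name rather than a library function.

```python
def k_fold(items, k):
    """Yield (train, test) lists; each of the k folds serves as the test set exactly once."""
    items = list(items)
    folds = [items[i::k] for i in range(k)]  # every k-th item goes into the same fold
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


samples = range(100)

# 10-fold: 10 rounds, each training on 90% and testing on 10%.
outer_rounds = list(k_fold(samples, 10))
print(len(outer_rounds), len(outer_rounds[0][0]), len(outer_rounds[0][1]))

# Nested 10-fold: the 90% training part of every outer round is split again.
inner_rounds = 0
for outer_train, outer_test in k_fold(samples, 10):
    for inner_train, inner_validation in k_fold(outer_train, 10):
        inner_rounds += 1
print(inner_rounds)
```

In practice the data would be shuffled before splitting, and the inner loop is typically used to tune settings while the outer loop estimates performance.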
Supervised learning
Systems are trained on labelled data to predict outputs for new, unseen inputs.
Unsupervised learning
Systems are trained on unlabelled data and find patterns in it without guidance.
Reinforcement learning
Systems are trained by taking actions in an environment and receiving rewards or penalties.
Direct measurement of gene expression
Extract RNA from tissues and measure it using qRT-PCR or RNA-seq. Measure how much RNA is there, or whether it is there at all.
Indirect measurement of gene expression
Measure promoter activity using reporter genes. Build a construct, insert construct into transgenic animal, observe expression using GFP, visualise real-time using fluorescent reporters.
Transcriptional fusion
A gene construct that links the promoter of a gene to a reporter gene.
Luciferase
One of the most famous reporter genes, naturally very stable; by adding a PEST domain we reduce its half-life.
Northern blot
= RNA blot. Direct method to measure mRNA in which RNA molecules are separated by size and transferred to a membrane for detection using a labeled probe that hybridises to the RNA of interest.
In situ hybridisation
Direct method to measure mRNA molecules in which a labeled RNA or DNA probe is hybridised to the complementary RNA molecule of interest, allowing the visualisation of the location and expression pattern of the target RNA.
RT-PCR
A qualitative way to detect presence or absence of RNA
qRT-PCR
A quantitative way to measure the relative or absolute expression levels of RNA.
No reverse transcriptase
NRT - control without reverse transcriptase to check for DNA contamination; if the NRT is amplified, there is contamination.