Functional genomics



Last updated 8:58 AM on 6/3/26

233 Terms

1
New cards

What are the levels of biological systems?

Gene, protein, metabolite - What is the function of the gene?
Network, transcription, protein complex, metabolic pathways - Which genes are involved in the process?
Cell/Single-celled organism, genome, proteome, metabolome - What is the metabolic capacity of the cell/organism? How does it replicate, grow, divide, interact with its environment?
Multicellular organism, symbiosis, signaling, metabolism - How do different cells/cell types/tissues interact?
Community/Population, metagenome, food webs, symbiosis - Which organisms are present and what are their ecological roles?

2
New cards

Ciliates

Have two nuclei: a micronucleus (MIC) holding the germline genome and a macronucleus (MAC) holding the somatic genome = nuclear dimorphism. They are presented as an exception to the central dogma. During sexual reproduction, the MAC breaks down and a new MAC is formed from the MIC, resetting the genetic state of the organism.

3
New cards

Forward genetics

Start with observable phenotype, work to identify gene(s) responsible. Phenotype → genotype.

4
New cards

Reverse genetics

Start with known gene or genetic variation and investigate its impact on the phenotype. Genotype → phenotype.

5
New cards

Workflow of forward genetics

  1. Random mutagenesis: e.g. treat with a drug/chemical mutagen.

  2. Whole-genome re-sequencing: sequence the whole genome of the wt and of the rare phenotype(s).

  2 (alt). Map the gene causing the phenotype by crossing strains carrying genetic markers.

6
New cards

Workflow of reverse genetics

  1. Gene of interest (GOI)

  2. Introduce target changes (e.g. KO)

  3. Observe effect (e.g. phenotype)

  4. Interpret results (link gene to trait)

7
New cards

Monogenic disease

Disease caused by a mutation in one single gene.

8
New cards

Polygenic disease

Disease caused by the combined effect of mutations in several genes.

9
New cards

Paired-end sequencing

Improves mapping and assembly. Sequence from both ends, produce two reads for each fragment. Higher coverage, better accuracy, good for long fragments.

10
New cards

De novo assembly

"From new" assembly: the genome is assembled from the reads alone, without a reference genome.

11
New cards

Reference mapping

Reconstruct the genome by aligning the reads to an existing genome (the reference genome).

12
New cards

Workflow of de novo assembly

Start from short and/or long reads, join overlapping reads into contigs, then join contigs into scaffolds or chromosomes.

13
New cards

Contig

Contiguous piece of genomic sequence, built from overlapping reads without gaps.

14
New cards

Scaffold

Several contigs placed in the right order and orientation, possibly with gaps between them.

15
New cards

Workflow of reference mapping

Align reads to the reference genome, get insight into coverage and depth, then identify variants such as SNPs and CNVs.
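
To make "coverage and depth" concrete, here is a minimal sketch that counts how many reads cover each reference position. It assumes the alignment step is already done and that each read is given as a hypothetical `(start, length)` pair; the function name `depth_per_base` and the toy numbers are illustrative only.

```python
def depth_per_base(ref_length, alignments):
    """Count how many aligned reads cover each reference position.

    `alignments` is a list of (start, read_length) pairs, 0-based.
    """
    depth = [0] * ref_length
    for start, length in alignments:
        # Clip reads that run past the end of the reference.
        for pos in range(start, min(start + length, ref_length)):
            depth[pos] += 1
    return depth


# Three toy reads aligned to a 10 bp reference.
depth = depth_per_base(10, [(0, 5), (3, 5), (4, 6)])
mean_depth = sum(depth) / len(depth)
breadth = sum(1 for d in depth if d > 0) / len(depth)  # fraction of bases covered
```

Positions with unusually high or low depth compared with the mean are the kind of signal that can point to a CNV.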

16
New cards

Reporter gene

Put a reporter gene into the genome to be able to visualise where the gene of interest is expressed.

17
New cards

Enzymatic reporters

Enzymes that act as reporters by converting a substrate into a coloured or light-emitting product, e.g. lacZ, luciferase or GUS.

18
New cards

Fluorescent reporters

Reporters that emit light of different wavelength than light absorbed, e.g. green fluorescent protein (GFP).

19
New cards

Transcriptional reporters

Reporters that cause GFP to be expressed only under conditions where the promoter of the GOI is active.

20
New cards

Workflow of shotgun metagenomics

  1. Environmental sample

  2. Isolation of prokaryotic cells

  3. Cell lysis and DNA isolation

  4. High-throughput sequencing

  5. Genomic assembly

  6. Microbial community analysis

21
New cards

Differences between Bacteria, Archaea and Eukarya

Nucleus: Bacteria no, archaea no, eukarya yes.

Organelles: Bacteria no, archaea no, eukarya yes.

Operons: Bacteria yes, archaea yes, eukarya no.

Ester lipids: Bacteria yes, archaea no, eukarya yes.

Ether lipids: Bacteria no, archaea yes, eukarya no.

Peptidoglycan: Bacteria yes, archaea no, eukarya no.

Co-transcriptional translation: Bacteria yes, archaea yes, eukarya no.

mRNA splicing: Bacteria no, archaea no, eukarya yes.

22
New cards

The lac operon

Lactose (as allolactose) binds to the repressor and releases it from the operator, which allows RNA pol to transcribe one mRNA that encodes the three proteins needed to take up and break down lactose (lacZ, lacY, lacA).

23
New cards

Phospholipids

Ether: In archaea - R-O-R, linked by C-O bond. Mono/bilayer is stiffer, less ordered and thicker.

Ester: In bacteria and eukarya. R-CO-O-R, linked by a C(=O)-O bond. Mono/bilayer is softer, more ordered and thinner.

24
New cards

Peptidoglycan cell wall

Only bacteria have it, both Gram positive and Gram negative. Rigid, mesh-like structure which gives shape, strength and protection against osmotic pressure.

25
New cards

Gram positive

No outer membrane, no lipoproteins, thick peptidoglycan layer, only one membrane (the cytoplasmic membrane).

26
New cards

Gram negative

Outer membrane, lipoproteins, thin peptidoglycan layer, cytoplasmic membrane.

27
New cards

Homology-based annotation

Used to predict gene function: if a protein's sequence is similar to another with known function, we assume they might share function. Compares protein sequences rather than DNA, because different codons can code for the same amino acid.

28
New cards

Codon usage bias

Codons that code for the same amino acid are not used equally often, and the preferred codons differ between species. E.g. arginine is often encoded by AGG or AGA in humans, whereas E. coli prefers CGT or CGC and rarely uses AGG.
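
A minimal sketch of how codon usage could be counted for one coding sequence. The function name `codon_usage` and the toy sequence are made up for illustration; real analyses would count over many genes.

```python
from collections import Counter


def codon_usage(cds):
    """Count how often each codon occurs in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Toy sequence: start codon, three arginine codons (CGT, CGT, AGG), stop codon.
counts = codon_usage("ATGCGTCGTAGGTAA")
arginine_codons = {c: counts[c] for c in ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")}
```

Comparing such counts between species shows which synonymous codons each one prefers.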

29
New cards

Homologs

Genes of common origin

30
New cards

Orthologs

Homologous genes in different species that result from a speciation event.

31
New cards

Paralogs

Homologous genes that result from a gene duplication event.

32
New cards

Gene neighborhood

Genes that are functionally related tend to be located close together, e.g. organised in operons, so the neighbours of a gene give clues about its function.

33
New cards

Amplicon sequencing

Targets specific marker genes like 16S rRNA, ITS and 18S rRNA. It is cost-effective, scales to large sample sets, has well-established pipelines and is good for hypothesis generation. But it typically gives only genus-level resolution and no functional information, suffers from PCR bias and can't detect viruses.

34
New cards

Workflow amplicon sequencing

  1. Extract DNA

  2. Amplify marker with PCR

  3. Sequence amplicons

  4. Compare to reference database

35
New cards

Shotgun sequencing

Targets all the DNA in the sample. It has strain-level resolution, gives functional information, has no PCR bias and gives a comprehensive overview. But it is expensive and computationally intensive, and it requires more DNA and a complex analysis.

36
New cards

Workflow shotgun sequencing

  1. Extract DNA

  2. Fragment DNA randomly

  3. Sequence all fragments

  4. Assemble/map

37
New cards

Alpha diversity

Within sample diversity. How many species? How evenly distributed?

38
New cards

Beta diversity

Between sample diversity. How different are communities?
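
The deck does not name specific indices, so as an assumption this sketch uses two common ones: the Shannon index for alpha diversity and Bray-Curtis dissimilarity for beta diversity. The species counts are made up.

```python
import math


def shannon_index(counts):
    """Alpha diversity: H' = -sum(p * ln p) over the species present."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Beta diversity: 0 means identical communities, 1 means nothing shared."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


# Counts of the same four species in two samples.
sample_a = [10, 10, 10, 10]  # evenly distributed
sample_b = [37, 1, 1, 1]     # dominated by one species

h_a = shannon_index(sample_a)
h_b = shannon_index(sample_b)
dissimilarity = bray_curtis(sample_a, sample_b)
```

Both samples contain four species, but the even sample has the higher Shannon index, which is the "how evenly distributed" part of alpha diversity.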

39
New cards

Promoter structure

Core promoter - ~100 bp, TSS, TATA box, RNA pol binding sites

Proximal promoter - ~250-500 bp, primary TF binding sites

Extended promoter - 1-5 kb upstream, captures distal regulatory elements that influence transcription, e.g. enhancers.

40
New cards

Holoenzyme

Bacterial RNA pol core enzyme together with sigma factor.

41
New cards

Motifs

Short, recurring patterns of nucleotides in a DNA sequence that carry biological meaning. Serve as specific binding sites for proteins or signals for essential cellular processes, e.g. the TATA box. The proteins that bind them have their own structural motifs, e.g. helix-turn-helix, homeodomain, zinc fingers.

42
New cards

Microarrays

A collection of microscopic DNA spots attached to a solid surface. Known DNA sequences (probes) are fixed in specific grid positions. DNA or RNA samples are applied to the chip where complementary strands in the sample bind to corresponding probes. Scanners measure fluorescent or chemiluminescent light emitted from the binding sites to determine sample composition.

43
New cards

Epigenetics

Accessory chemical modification on the DNA or proteins that pack DNA. E.g. DNA methylation, small RNAs, histone modifications and chromatin structure.

44
New cards

Euchromatin

"True" chromatin: loosely packed and actively expressed.

45
New cards

Heterochromatin

"Other" chromatin: tightly packed and not expressed.

46
New cards

Nucleosome

DNA (about 147 bp) wrapped around an octamer of histones.

47
New cards

Core histones

H2A, H2B, H3 and H4.

48
New cards

CpGs

Cytosine followed by guanine (5' to 3'), separated by a single phosphate group.

49
New cards

CpG islands

Regions with a high density of CpGs and a high GC content, often found at promoters.
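
A minimal sketch of how a candidate CpG island could be scored. It assumes the commonly cited criteria (GC content above 50%, observed/expected CpG ratio above 0.6, length of at least 200 bp), which are not stated in the deck; the function name `cpg_stats` and the toy sequence are illustrative.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently: c * g / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


# Toy sequence, far shorter than a real island, just to show the arithmetic.
gc, ratio = cpg_stats("CGCGCGATCGCG")
```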

50
New cards

Epigenetic re-programming

Genome-wide erasure of DNA methylation, which is later restored. It happens in primordial germ cells (PGCs), starting around their migration, and in the embryo during the pre-implantation period.

51
New cards

Genomic imprinting

Phenomenon that results in monoallelic gene expression according to parental origin. In some genes the maternal copy is silenced, in others the paternal copy.

52
New cards

Sanger sequencing

A special kind of PCR where we build new DNA strands with a mix of normal dNTPs and labelled ddNTPs, which stop synthesis at random points. By collecting these fragments and separating them by size we can read the DNA sequence base by base.

53
New cards

Phred score 10

Means that 1 in 10 bases are a wrong call, accuracy is 90%

54
New cards

Phred score 20

Means that 1 in 100 bases are a wrong call, accuracy is 99%

55
New cards

Phred score 30

Means that 1 in 1,000 bases are a wrong call, accuracy is 99.9%

56
New cards

Phred score 40

Means that 1 in 10,000 bases are a wrong call, accuracy is 99.99%

57
New cards

Phred score 50

Means that 1 in 100,000 bases are a wrong call, accuracy is 99.999%.
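
The Phred cards above all follow one formula, Q = -10 * log10(P), where P is the probability that the base call is wrong. A minimal sketch (the function names are illustrative):

```python
import math


def phred_to_error(q):
    """Probability that a base call with Phred score q is wrong: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score for an error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40, 50):
    p = phred_to_error(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):g}%")
```

Every 10 points on the Phred scale therefore means a tenfold drop in the error rate.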

58
New cards

Properties of a good assembly

Read length longer than repeated regions, high coverage and high quality.

59
New cards

Paired-end sequencing

Fragment DNA into smaller fragments (200-800 bp), attach adapters to both ends and sequence from both ends, giving two reads per fragment.

60
New cards

Mate-pair sequencing

Fragment DNA into bigger fragments (2-5 kb), biotinylate the ends and circularise the fragment so that its two ends are joined. Fragment the circles into smaller pieces (200-600 bp), select only the pieces that contain the junction between the ends of the original fragment, and attach adapters to both ends. Each selected piece gives two reads, one from the A end and one from the B end; when aligning, the long fragment lies in between.

61
New cards

Sequencing coverage

The average number of reads that cover each base of the genome (number of reads x read length / genome size). For the same coverage, longer reads (like in Sanger) mean we need fewer reads, while shorter reads (like in Illumina) mean we need more reads.
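
A minimal sketch of the read-count arithmetic, assuming the usual definition of average coverage C = N * L / G (N reads of length L on a genome of size G). The genome size and read lengths are made-up round numbers.

```python
import math


def coverage(n_reads, read_length, genome_size):
    """Average coverage C = N * L / G."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads needed to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


genome = 5_000_000  # hypothetical 5 Mb genome
short_reads = reads_needed(30, 150, genome)  # Illumina-like read length
long_reads = reads_needed(30, 800, genome)   # Sanger-like read length
```

For the same 30x coverage, the 150 bp reads need more than five times as many reads as the 800 bp reads.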

62
New cards

Which sequencing technologies need amplification?

Illumina and IonTorrent, the signal is too weak for them to read without amplification.

63
New cards

Which sequencing technologies do not need amplification?

PacBio and Nanopore, they can read the signal from one single DNA molecule.

64
New cards

DNA barcoding

= DNA multiplexing. We add known sequences (barcodes) to the DNA of each sample so that we can pool samples together and tell them apart afterwards, meaning we can sequence more samples at a lower cost (multiple samples in one run).
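
A minimal sketch of how pooled reads could be sorted back into samples. It assumes the barcode sits at the start of each read and must match exactly; the barcodes, sample names and reads are made up, and real pipelines usually tolerate mismatches.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode and trim the barcode off."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}  # hypothetical 4-base barcodes
reads = ["ACGTGGGCCC", "TGCAAATTT", "ACGTTTTAAA", "GGGGGGGG"]
result = demultiplex(reads, barcodes)
```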

65
New cards

Dual indexing

When we barcode on both ends of the DNA.

66
New cards

Illumina sequencing

Sequencing by synthesis, uses 4 (or 2) colors, has high accuracy and capacity, produces short reads.

67
New cards

Workflow Illumina sequencing

  1. Sample prep - add adapters containing the sequencing primer binding site, indices and regions complementary to the flow cell oligos.

  2. Cluster generation - fragments are amplified on the flow cell, which is coated with oligos. A strand attaches to an oligo and polymerase synthesises the complementary strand. Bridge amplification = the strand folds over to a neighbouring oligo and is copied, over and over. Reverse strands are washed off.

  3. Sequencing - fluorescently tagged nucleotides are added to synthesise the strand; clusters are excited by a light source and emit a fluorescent signal. The number of cycles determines the read length.

68
New cards

Beijing Genomics Institute sequencing

= BGI/MGI sequencing. Similar to Illumina but has a different cluster generation - DNA Nanoballs (DNB).

69
New cards

Workflow of BGI sequencing

  1. DNA extraction

  2. DNA fragmentation

  3. End repair

  4. Adaptor ligation

  5. Single strand separation

  6. Circularization

  7. Make DNB

  8. Load DNB

  9. Pattern array

  10. cPAS sequencing

70
New cards

IonTorrent sequencing

Is cheap and fast and has a good cost per base, but has lower data output and makes homopolymer errors.

71
New cards

Workflow of IonTorrent sequencing

  1. DNA fragmentation

  2. Attach fragment to bead

  3. Copy fragment until bead is covered

  4. Beads flow across the semiconductor chip, each into a well

  5. Flooding chip with one nucleotide at a time

  6. When a nucleotide is incorporated, a hydrogen ion is released and detected - the base is called.
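
The flow steps above can be sketched as a toy simulation that also shows where homopolymer errors come from. It assumes an ideal signal and a repeating flow order T, A, C, G, which is a simplification chosen for illustration; `new_strand` is the strand being synthesised.

```python
def flowgram(new_strand, flow_order="TACG", max_flows=40):
    """Simulate ideal signals when building `new_strand` one nucleotide flow at a time.

    The signal of a flow equals the number of bases incorporated, so a
    homopolymer such as GGG gives a single signal of height 3.
    """
    signals, pos = [], 0
    for i in range(max_flows):
        if pos >= len(new_strand):
            break
        base = flow_order[i % len(flow_order)]
        run = 0
        # Every consecutive matching base is incorporated in the same flow.
        while pos < len(new_strand) and new_strand[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals


signals = flowgram("TTACGGG")
```

Telling a signal of height 3 from one of height 4 is harder than telling 0 from 1, which is why long runs of the same base are miscounted.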

72
New cards

PacBio sequencing

Immobilised DNA pol, four fluorophores, high error rate, long reads, can detect DNA modification (methylation).

73
New cards

Workflow of PacBio sequencing

  1. Give DNA pol phospholinked nucleotides; on incorporation the fluorescent label is cleaved off and its signal is detected.

  2. Nanophotonic visualisation chamber, ZMW (zero-mode waveguide) - detects light when a nucleotide is incorporated (longer signal).

74
New cards

Nanopore sequencing

Single-stranded DNA, high error rate.

75
New cards

Workflow of Nanopore sequencing

  1. An adapter with a motor protein is attached to the end of the DNA or RNA.

  2. The strand attaches to a tether, which guides it to the nanopore.

  3. The DNA strands are separated and one strand passes into the nanopore.

  4. Bases yield signal in the form of ionic current when passing through the pore.

76
New cards

Polycistronic transcript

One mRNA that carries several genes, transcribed together as a bundle from one promoter.

77
New cards

Differences between human and yeast mtDNA

Yeast = Longer, 4 origins of replication (bi-directional), contains introns and non-coding regions, DNA is transcribed in smaller units.

Human = Shorter, 2 origins of replication (unidirectional), no introns and hardly any non-coding regions, DNA is transcribed pretty much all at once.

78
New cards

Mitochondrial vs nuclear genome

Nuclear: Longer, linear, has histones, mendelian inheritance, mostly non-coding, universal codon usage, monocistronic, replication depends on mitosis, one copy per cell.

Mitochondrial: Shorter (~16 kb), circular, no histones, maternal inheritance, mostly coding, not always universal codon usage, polycistronic, replication independent of mitosis, multiple copies per cell.

79
New cards

Applications of metabolomics

Metabolic profiles for diseases, identify disease phenotypes, diagnose and assess, identify function of genes, monitor gene knock-outs, monitor metabolic flux, monitor enzyme/pathway kinetics, monitor gene/environment interactions, track effects of drugs, diet, treatments etc.

80
New cards

Targeted analysis vs global profiling

Targeted analysis = aim to get only target analytes, remove all other compounds - usually impossible. Multistep procedures, optimized for best recovery of the compounds.

Global profiling = aim to get all compounds that can be analysed with selected technique. Large range of compounds, impossible to get optimal recovery for all.

81
New cards

Metabolites

E.g. lipids, organic acids, ketones, aldehydes, amines, amino acids…

82
New cards

Attributes of Machine Learning

  1. Automated learning - learning and improvement from data without rule-based programming.

  2. Pattern recognition - identifies patterns and makes predictions or decisions.

  3. Adaptability - models adapt and evolve as they are exposed to more data, becoming increasingly accurate and effective.

83
New cards

Random forest

Many decision trees, each built on a random sample of the data and a random subset of the features. The decision is based on majority - if 800 out of 1000 trees predict disease we assume it to be correct. The trees are independent, so several can be built at the same time.
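
A minimal sketch of the majority vote only, not of tree building. The per-tree predictions are made up to match the 800 out of 1000 example, and the function name `majority_vote` is illustrative.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most common class among the trees and its share of the votes."""
    label, votes = Counter(tree_predictions).most_common(1)[0]
    return label, votes / len(tree_predictions)


# 1000 hypothetical trees: 800 predict "disease", 200 predict "healthy".
tree_predictions = ["disease"] * 800 + ["healthy"] * 200
label, share = majority_vote(tree_predictions)
```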

84
New cards

XGBoost

Gradient-boosted decision trees. Trees are built one after another, and each new tree is trained to correct the errors left by the trees before it. The final prediction combines all trees instead of taking a simple majority.

85
New cards

k-fold cross-validation

k defines how many parts (folds) you want to split your data into. Each fold is used once to test while the other folds are used to train, so every sample is used for both testing and training but not at the same time.

86
New cards

10-fold cross-validation

Split data into 10 folds; in each round use 90% to train and 10% to test. The model is trained 10 times and validated 10 times.
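
A minimal sketch of the splitting itself, without any model. The function name `k_fold_splits` is illustrative; in practice the data are usually shuffled before splitting, which is left out here.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(k_fold_splits(100, 10))
tested = sorted(i for _, test in splits for i in test)
```

Each of the 10 rounds trains on 90 samples and tests on the other 10, and every sample lands in a test fold exactly once.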

87
New cards

Nested k-fold cross validation

Cross-validation within cross-validation.

88
New cards

Nested 10-fold cross-validation

Split data into 10, use 90% to train and 10% to test, then split the 90% into 10 again and do the same (inner loop, e.g. to tune the model). Validated 100 times, trained 100 times.
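
A minimal sketch that only counts the rounds of a nested 10-fold scheme, to show where the 100 comes from. The helper `folds` is a simplified, unshuffled splitter written for this example, and no model is trained.

```python
def folds(items, k):
    """Split items into k (train, test) pairs; every item is tested exactly once."""
    return [
        ([x for j, x in enumerate(items) if j % k != i],
         [x for j, x in enumerate(items) if j % k == i])
        for i in range(k)
    ]


samples = list(range(100))
inner_rounds = 0
outer_rounds = 0
for outer_train, outer_test in folds(samples, 10):
    # Inner loop: only the outer training data (90%) is split again.
    for inner_train, inner_validation in folds(outer_train, 10):
        inner_rounds += 1
    outer_rounds += 1  # the chosen model is then tested once on outer_test
```

The inner loop runs 10 x 10 = 100 times, while the held-out outer test sets are used only 10 times.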

89
New cards

Supervised learning

Systems are trained on labelled data to predict outputs for new, unseen inputs.

90
New cards

Unsupervised learning

Systems are trained on unlabelled data and find patterns in it without guidance.

91
New cards

Reinforcement learning

Systems are trained by taking actions in an environment and receiving rewards or penalties.

92
New cards

Direct measurement of gene expression

Extract RNA from tissues and analyse it with qRT-PCR or RNA-seq. Measure how much RNA is there, or if RNA is there or not.

93
New cards

Indirect measurement of gene expression

Measure promoter activity using reporter genes. Build a construct, insert construct into transgenic animal, observe expression using GFP, visualise real-time using fluorescent reporters.

94
New cards

Transcriptional fusion

A gene construct that links the promoter of a gene to a reporter gene.

95
New cards

Luciferase

One of the most famous reporter genes, naturally very stable; by adding a PEST domain we reduce its half-life.

96
New cards

Northern blot

= RNA blot, direct measurement of mRNA molecules in which RNA molecules are separated by size and transferred to a membrane for detection using a labeled probe that hybridises to the RNA of interest.

97
New cards

In situ hybridisation

Direct method to measure mRNA molecules in which a labeled RNA or DNA probe is hybridised to the complementary RNA molecule of interest, allowing the visualisation of the location and expression pattern of the target RNA.

98
New cards

RT-PCR

A qualitative way to detect presence or absence of RNA

99
New cards

qRT-PCR

A quantitative way to measure the relative or absolute expression levels of RNA.
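
For relative expression, one widely used calculation is the 2^(-ddCt) method, which the deck does not mention; this minimal sketch assumes it, together with roughly 100% PCR efficiency and one reference gene. The Ct values are made up.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene, treated vs control, by 2^(-ddCt)."""
    # Normalise the target gene to the reference gene within each condition.
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** -dd_ct


# Hypothetical Ct values: the target crosses the threshold 3 cycles earlier when treated.
fold_change = relative_expression(22.0, 18.0, 25.0, 18.0)
```

A lower Ct means more starting RNA, so a target that appears 3 cycles earlier corresponds to about 2^3 = 8 times higher expression.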

100
New cards

No reverse transcriptase

NRT - control without reverse transcriptase to check for DNA contamination; if the NRT is amplified, there's contamination.