origins of bioinformatics
the earliest foundations (1950–1970) focused primarily on protein sequence analysis, before DNA could be easily sequenced
COMPROTEIN
the first known bioinformatics software (early 1960s), developed by Margaret Dayhoff, designed to assemble whole protein sequences (de novo) from small Edman peptide fragments.
paradigm shift (1970–1980)
bioinformatics began shifting its focus from protein analysis to DNA analysis after the deciphering of the genetic code and the invention of efficient sequencing methods (Sanger)
Needleman-Wunsch (1970)
developed the first dynamic programming algorithm for performing protein sequence alignments
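The dynamic programming idea can be sketched in a few lines. This is a minimal illustration of global alignment, not the 1970 formulation: the match/mismatch/gap scores (+1/-1/-1) and the function name `needleman_wunsch` are assumptions chosen for readability, whereas real protein alignments score residue pairs with a substitution matrix.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1] (illustrative values)
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell depends on its three neighbors
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

The table has (n+1) x (m+1) cells, so time and memory both grow with the product of the two sequence lengths.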
homology: orthology
defined by walter m. fitch (1970) as homology resulting from a speciation event
dayhoff/pam matrix
developed the first probabilistic model of amino acid substitutions (Point Accepted Mutations, PAMs) in 1978, using probability to measure evolutionary change
De Novo Sequencing
the determination of a full-genome sequence without using a known template or reference sequence
massively parallel / multiplexing
Massively Parallel involves multiple processors working simultaneously. Multiplexing combines multiple inputs/samples into a single sequence run.
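Multiplexed runs are later split back into samples ("demultiplexing"). A minimal sketch, assuming each read starts with an exact-match sample barcode. The barcode sequences, sample names, and the function `demultiplex` are hypothetical, and real demultiplexers usually tolerate mismatches.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by their leading barcode and trim the barcode off."""
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only
barcodes = {"ACGT": "sample_A", "TTAG": "sample_B"}
reads = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTCCCAAA", "GGGGGGGGGG"]
print(demultiplex(reads, barcodes))
```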
overfitting
occurs when a model built on training data shows high accuracy (e.g., classifying stream condition using OTUs) but significantly decreased accuracy when applied to separate validation data, indicating the model is too specific to the initial dataset features
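The gap between training and validation accuracy can be shown with a deliberately bad model. This is a sketch on synthetic data, not the stream study: the "model" simply memorizes its training examples, the features stand in for OTU abundances, and the labels are random noise, so there is nothing real to learn. All names here are hypothetical.

```python
import random
from collections import Counter


def train_memorizer(features, labels):
    """'Train' by memorizing every example; unseen inputs get the majority label."""
    table = dict(zip(features, labels))
    majority = Counter(labels).most_common(1)[0][0]
    return lambda x: table.get(x, majority)


def accuracy(model, features, labels):
    correct = sum(model(x) == y for x, y in zip(features, labels))
    return correct / len(labels)


rng = random.Random(0)  # fixed seed so the sketch is repeatable


def random_dataset(n):
    # Synthetic data: 5 random "abundances" per sample, labels are pure noise
    xs = [tuple(rng.random() for _ in range(5)) for _ in range(n)]
    ys = [rng.choice(["good", "poor"]) for _ in range(n)]
    return xs, ys


train_x, train_y = random_dataset(200)
val_x, val_y = random_dataset(200)
model = train_memorizer(train_x, train_y)

train_acc = accuracy(model, train_x, train_y)  # perfect: every example was memorized
val_acc = accuracy(model, val_x, val_y)        # close to chance on unseen samples
print(train_acc, val_acc)
```

A large drop from training to validation accuracy is the signature to look for; holding out validation data is what makes it visible.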
sanger (dideoxy) read length
long reads (~600–1000 bp)
sanger (dideoxy) throughput/cost
low throughput, typically for single samples
sanger (dideoxy) accuracy/output
reads suffer quality loss at the beginning and end of the sequence
sanger (dideoxy) key feature/mechanism
based on chain-terminating dideoxynucleotides (ddNTPs)
illumina (MiSeq) read length
short reads (100–300 bp)
illumina (MiSeq) throughput/cost
high/massively parallel throughput
illumina (MiSeq) accuracy/output
high accuracy
illumina (MiSeq) key feature/mechanism
uses Sequencing by Synthesis / Bridge Amplification where fragments attached to a flow cell are amplified into clusters
Oxford Nanopore (MinION) read length
ultra-long reads (up to 1,000,000 bp/millions of bases)
Oxford Nanopore (MinION) throughput/cost
high throughput, portable, USB-powered
Oxford Nanopore (MinION) accuracy/output
moderate error rate compared to other platforms
Oxford Nanopore (MinION) key feature/mechanism
DNA strand passes through a nanopore; changes in electrical current are decoded into the DNA sequence (called basecalling).
FASTQ file
a file format that incorporates both the nucleotide sequence and the associated quality scores
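Each FASTQ record spans four lines: a header starting with `@`, the sequence, a `+` separator, and one quality character per base. A minimal reader, assuming unwrapped four-line records; the function name `read_fastq` and the example record are illustrative.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


# Illustrative record; a real file would be opened with open(path)
example = io.StringIO("@read1\nACGT\n+\nII5I\n")
for read_id, seq, qual in read_fastq(example):
    print(read_id, seq, qual)
```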
phred score (Q)
a measure of base-call quality, defined as Q = -10 × log10(P), where P is the probability that the base call is wrong. A Phred score of 20 (Q20) corresponds to a 1% error probability per base (99% base-call accuracy); Q30 corresponds to a 0.1% error probability (99.9% accuracy)
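The relationship between Q and error probability is Q = -10 × log10(P). A short sketch of the conversion in both directions, plus decoding a FASTQ quality string. The Phred+33 ASCII offset is an assumption (it is the common modern encoding), and the function names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_quality(quality_string, offset=33):
    """Convert FASTQ quality characters to Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


print(phred_to_error_prob(20))  # 0.01
print(phred_to_error_prob(30))  # 0.001
print(decode_quality("I5"))     # [40, 20]
```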
coverage
the average number of reads that align to, or "cover," known reference bases. 50X genome coverage is recommended for robust taxonomic work
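Average coverage is commonly estimated as (number of reads × read length) / genome size. A small sketch of that arithmetic; the 5 Mb genome and 250 bp read length are hypothetical values chosen for illustration.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Expected average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: 5 Mb genome, 250 bp reads
print(mean_coverage(1_000_000, 250, 5_000_000))  # 50.0
print(reads_needed(50, 250, 5_000_000))          # 1000000
```

This is an average: actual depth varies along the genome, so some regions fall below the nominal figure.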
single-end reads
sequence in one direction of the fragment
paired-end reads
report sequences from both directions of a DNA fragment, which is valuable for assembly
3 domains of life
archaea, bacteria, and eukarya
why might we move to two domains?
based on the discovery that eukaryotes evolved from the domain Archaea, rather than separately, invoking the idea that there are now 2 domains: bacteria and archaea, with eukarya branching inside of archaea
membrane bond differences
The cell membranes of Bacteria and Eukarya contain Ester linkages in their phospholipids; Archaea use Ether bonds in their membrane lipids.
what is the cell wall made of?
Peptidoglycan: The polymer forming a mesh-like layer outside the bacterial cell membrane, providing protection against osmotic pressure (preventing cell bursting). It is the target of many antibiotics.
Archaea do not possess peptidoglycan
porins
channels found in the outer membrane of Gram-negative bacteria, facilitating the exchange of nutrients
chemoautotrophy
(metabolic diversity) obtains energy by oxidizing inorganic compounds and carbon from an inorganic source (CO2); utilized only by prokaryotes (bacteria and archaea)
Classic Biological Species Concept
a species is a group of organisms that can interbreed naturally and produce viable, fertile offspring, and are reproductively isolated from other groups
meaningless for microbes primarily because they reproduce asexually.
Horizontal Gene Transfer (HGT)
Mechanisms for new gene acquisition in bacteria, including Transformation (naked DNA uptake), Transduction (DNA transfer via viruses/phage), and Conjugation (DNA transfer via cell-to-cell contact).
chemotaxis
(bacterial movement) describes movement toward chemical attractants or away from repellents
The Species Problem for Microbes
long-standing difficulty in defining what constitutes a microbial species
based on 16S rRNA gene similarity, organisms are often grouped by operational definitions.
Operational Taxonomic Unit (OTU)
Typically defined by sequences being 97% or 99% similar
Amplicon Sequence Variant (ASV)
sequences that are 100% identical and function as unique identifiers for taxa
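The OTU/ASV distinction comes down to the identity threshold. A simplified sketch, assuming the sequences are already aligned and of equal length: OTUs are formed by greedy clustering against the first member of each cluster, while ASVs are just the distinct exact sequences. The function names and toy sequences are hypothetical, and this is not how any particular pipeline implements clustering or denoising.

```python
def percent_identity(a, b):
    """Percent of matching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def greedy_otus(sequences, threshold=97.0):
    """Assign each sequence to the first cluster whose seed it matches at >= threshold."""
    clusters = []  # each cluster is a list; its first element is the seed
    for seq in sequences:
        for cluster in clusters:
            if percent_identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


# Toy 100 bp sequences: identical, 99% identical, and 90% identical to the first
base = "A" * 100
one_diff = "A" * 99 + "C"
far = "A" * 90 + "C" * 10
seqs = [base, one_diff, far]

print(len(greedy_otus(seqs, threshold=97.0)))  # 2 OTUs at 97%
print(len(set(seqs)))                          # 3 ASVs (exact sequences)
```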
16S rRNA Gene Sequencing
This marker gene is commonly used for taxonomic diversity studies (metabarcoding). The V3-V4 hypervariable region (approx. 464 bp) is typically targeted in short-read sequencing (Illumina MiSeq). Full-length 16S sequencing (V1-V9 regions, approx. 1465 bp) is possible with long-read platforms like MinION.
Long Read Advantage for 16S rRNA Gene Sequencing
Sequencing the near full-length 16S rRNA gene (MinION) generally provides significantly higher taxonomic resolution at the species level compared to short-read sequencing that analyzes only partial regions (Illumina MiSeq)
Phyla Dominance in Streams
Proteobacteria dominated both water and sediment stream samples in the Maryland study
Sediment microbial communities
proved much better at predicting ecological condition (BIBI scores) than water column samples
Alpha Diversity
Measures diversity within a single sample
Measures include species richness and evenness
Indices often used: Shannon, Simpson, Chao1, ACE
species richness
number of distinct phylotypes in a sample
species evenness
relative abundance/distribution of individuals per phylotype
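Richness, evenness, and two of the indices named above can be computed directly from a vector of per-phylotype counts. A minimal sketch: Simpson is given here in its 1 - Σp² form (other forms exist), evenness uses Pielou's J = H / ln(S), and Chao1 and ACE are omitted. The count vectors are made up for illustration.

```python
import math


def richness(counts):
    """Number of phylotypes observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over observed phylotypes."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def simpson(counts):
    """Simpson diversity in the 1 - sum(p^2) form."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def pielou_evenness(counts):
    """Pielou's J: Shannon index divided by its maximum, ln(richness)."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


# Two made-up samples with the same richness but different evenness
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]
for sample in (even, skewed):
    print(richness(sample), shannon(sample), simpson(sample), pielou_evenness(sample))
```

Both samples contain four phylotypes, but the skewed one scores much lower on Shannon, Simpson, and evenness because one phylotype dominates.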
beta diversity
measures the difference or change in diversity of species between communities (e.g., between two different environments)
a High Beta diversity measure indicates low similarity between the two communities
Indices include Jaccard, Bray-Curtis, Euclidean, and UniFrac distances
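Two of these measures are easy to compute by hand from count vectors that list the same taxa in the same order. A minimal sketch: Jaccard uses presence/absence only, Bray-Curtis uses abundances, and both run from 0 (identical) to 1 (nothing shared). UniFrac is omitted because it needs a phylogenetic tree. The sample counts are made up.

```python
def jaccard_distance(a, b):
    """1 - (shared taxa / taxa present in either sample), presence/absence only."""
    present_a = {i for i, c in enumerate(a) if c > 0}
    present_b = {i for i, c in enumerate(b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1 - len(present_a & present_b) / len(union)


def bray_curtis(a, b):
    """Sum of absolute count differences divided by the total count of both samples."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Made-up counts for three taxa in two samples
site_1 = [6, 4, 0]
site_2 = [3, 4, 3]
print(jaccard_distance(site_1, site_2))  # 1 - 2/3
print(bray_curtis(site_1, site_2))       # 6 / 20 = 0.3
```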
quorum sensing (QS)
a system of cell-to-cell communication where bacteria regulate specific group behaviors in response to population density
QS uses small signal molecules called autoinducers
Regulated behaviors include biofilm formation, bioluminescence, and virulence factor production
Microbial Resilience / Dysbiosis (AKP)
As corals undergo stress, their microbial community becomes destabilized, often resulting in increased variability (or dispersion) in microbial community composition. This pattern of increased variability in stressed individuals is sometimes related to the Anna Karenina Principle (AKP)
Anna Karenina Principle (AKP)
healthy microbiomes are similar across individuals, while disease-associated microbiomes are often unique and vary from person to person
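The "increased variability" in the AKP can be quantified as within-group dispersion. A rough sketch using the mean pairwise Bray-Curtis distance within each group, a simple stand-in for the distance-to-centroid measures used in formal dispersion tests. The count data and group names are invented for illustration.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors (0 = identical)."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def dispersion(samples):
    """Mean pairwise Bray-Curtis distance within a group (needs >= 2 samples)."""
    pairs = list(combinations(samples, 2))
    return sum(bray_curtis(a, b) for a, b in pairs) / len(pairs)


# Invented communities: healthy samples resemble each other, stressed ones diverge
healthy = [[10, 10, 10], [11, 9, 10], [10, 11, 9]]
stressed = [[28, 1, 1], [1, 28, 1], [2, 2, 26]]
print(dispersion(healthy), dispersion(stressed))
```

Under the AKP, the stressed group is expected to show the larger dispersion, as it does in this toy example.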