What is a locus?
A specific fixed position on a chromosome where a gene or genetic marker is located.
basically just the "address" of a gene on a chromosome
What is an allele?
One of two or more versions of a gene at a given locus.
different "versions" of the same gene (e.g. brown vs blue eye color)
What is a genotype?
The specific combination of alleles an organism carries at a locus (e.g. AA, Aa, aa).
what alleles you actually have
What is a phenotype?
The observable trait or characteristic expressed from the genotype.
what you actually look like / express
What is a SNP?
Single Nucleotide Polymorphism — a single base position in the genome that varies between individuals in a population.
one letter difference in DNA between people
What is allele frequency?
The proportion of a specific allele among all alleles at that locus in a population.
how common one version of a gene is in a population
What is genotype frequency?
The proportion of a specific genotype among all individuals in a population.
how common a specific allele combo (AA, Aa, aa) is in a population
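A minimal sketch of how both frequencies fall out of a list of genotypes. The function name `frequencies` and the sample data are made up for illustration, and it assumes a diploid organism with genotypes written as two-letter strings such as "Aa".

```python
from collections import Counter

def frequencies(genotypes):
    """Return (genotype_freqs, allele_freqs) for diploid genotypes like 'Aa'."""
    n = len(genotypes)
    # Sort the two letters so 'aA' and 'Aa' are counted as the same genotype.
    geno_counts = Counter("".join(sorted(g)) for g in genotypes)
    # Each individual contributes two alleles, so there are 2 * n alleles in total.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    geno_freqs = {g: c / n for g, c in geno_counts.items()}
    allele_freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    return geno_freqs, allele_freqs

# Hypothetical population of 10 individuals: 4 AA, 4 Aa, 2 aa.
sample = ["AA"] * 4 + ["Aa"] * 4 + ["aa"] * 2
geno_freqs, allele_freqs = frequencies(sample)
print(geno_freqs)
print(allele_freqs)
```

Genotype frequencies divide by the number of individuals, while allele frequencies divide by twice that number because each individual carries two alleles.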
What are the 5 assumptions of Hardy-Weinberg Equilibrium?
No mutation, no natural selection, random mating, no genetic drift (large population), no gene flow (no migration).
perfectly ideal population that never changes — doesn't exist in reality
What do HWE equations look like?
p + q = 1 (allele frequencies). p² + 2pq + q² = 1 (genotype frequencies). p² = AA, 2pq = Aa, q² = aa.
math to predict how often genotypes should appear if nothing is messing with the population
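The equations above can be sketched as a small calculation. The function name `hwe_expected` and the value p = 0.6 are only illustrative assumptions.

```python
def hwe_expected(p):
    """Expected genotype frequencies under Hardy-Weinberg, given frequency p of allele A."""
    q = 1.0 - p  # from p + q = 1
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

# Hypothetical example: allele A has frequency 0.6, so allele a has frequency 0.4.
expected = hwe_expected(0.6)
print(expected)
print(sum(expected.values()))  # the three genotype frequencies add up to 1
```

Comparing these expected values with the genotype frequencies actually observed in a sample is how you check whether a population looks like it is in HWE.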
What are the forces of evolution / HWE violations?
Mutation, natural selection, genetic drift, gene flow, non-random mating.
anything that causes a real population to deviate from the ideal HWE prediction
What is linkage equilibrium?
When alleles at two loci are randomly associated — knowing one allele tells you nothing about the other.
two loci are independent of each other, no association
What is linkage disequilibrium (LD)?
When alleles at two loci are non-randomly associated: they occur together more or less often than expected by chance.
two alleles show up together more (or less) often than random chance would predict
What is the D calculation for LD?
D = frequency(AB) − frequency(A) × frequency(B). D=0 means equilibrium. D≠0 means disequilibrium.
D is just: observed frequency minus what you'd expect if they were independent
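A short sketch of the D calculation starting from haplotype counts. The counts and the function name `linkage_d` are hypothetical; the haplotypes are written as two letters, one allele per locus (A/a at the first locus, B/b at the second).

```python
def linkage_d(hap_counts):
    """Compute D = freq(AB) - freq(A) * freq(B) from counts of the four haplotypes."""
    total = sum(hap_counts.values())
    freq_ab = hap_counts["AB"] / total
    # Allele A appears on AB and Ab haplotypes; allele B appears on AB and aB.
    freq_a = (hap_counts["AB"] + hap_counts["Ab"]) / total
    freq_b = (hap_counts["AB"] + hap_counts["aB"]) / total
    return freq_ab - freq_a * freq_b

# Hypothetical sample of 100 chromosomes.
d = linkage_d({"AB": 50, "Ab": 10, "aB": 20, "ab": 20})
print(d)  # positive: A and B occur together more often than expected
```

Here freq(AB) = 0.5 while freq(A) × freq(B) = 0.6 × 0.7 = 0.42, so D is about 0.08 and the two loci are in disequilibrium.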
What is a haplotype?
A set of alleles on the same chromosome that tend to be inherited together due to linkage.
a chunk of chromosome that gets passed down together as a unit
How does crossing-over relate to linkage?
Crossing-over breaks up linkage over time. Loci far apart recombine more, moving toward equilibrium. Loci close together stay in LD longer.
recombination shuffles the deck — close genes stay linked, far genes separate
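One way to see why close loci stay in LD longer is the standard textbook decay model D_t = D_0 × (1 − r)^t, where r is the recombination fraction between the two loci and t is the number of generations. This model is not from the cards themselves and assumes random mating; the function name and numbers below are illustrative.

```python
def ld_after(d0, r, generations):
    """LD remaining after some generations, using D_t = D_0 * (1 - r) ** t."""
    return d0 * (1.0 - r) ** generations

d0 = 0.2  # hypothetical starting disequilibrium
far_apart = ld_after(d0, r=0.5, generations=10)       # unlinked loci
close_together = ld_after(d0, r=0.01, generations=10)  # tightly linked loci
print(far_apart, close_together)
```

With r = 0.5 the disequilibrium halves every generation, while with r = 0.01 most of it is still there after ten generations.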
What causes differences in population genetic structure?
Genetic drift, natural selection, gene flow, mutation, founder effects, population bottlenecks.
different evolutionary pressures acting differently on isolated or separate populations
What is GWAS?
Genome-Wide Association Study — scans the entire genome across many individuals to find SNPs statistically associated with a trait or disease.
massive scan to find which DNA differences correlate with a disease or trait
What does an LD pattern tell you in GWAS?
Regions of high LD mean nearby SNPs are correlated — a significant SNP likely tags a whole haplotype block, not just one variant.
one significant SNP probably represents a whole region inherited together, not just that one spot
How does variation arise in populations?
Through mutation (creates new alleles), recombination (reshuffles existing variation), and gene flow (introduces variation from other populations).
new variation = mutation. reshuffled variation = recombination. imported variation = gene flow
What is comparative genomics?
The study of similarities and differences in genome structure, function, and evolution across different species or strains.
comparing whole genomes to see what's conserved, what's different, and why
What are the reasons for studying comparative genomics?
Identify conserved genes and functions, understand evolution, find unique genes, identify virulence factors, study genome rearrangements.
find what genes matter, what makes organisms different, and how they evolved
What is MAUVE?
A tool for aligning multiple whole genomes to identify conserved regions, rearrangements, inversions, and insertions/deletions across genomes.
lines up whole genomes side by side to spot what moved, flipped, or changed
What is phylogenomics?
Building evolutionary trees using whole genome data or large sets of genes rather than a single marker gene.
like phylogenetics but uses the whole genome instead of one gene
How is phylogenomics different from phylogenetics?
Phylogenetics uses one or a few marker genes. Phylogenomics uses whole genomes or hundreds of genes, which generally gives a more accurate, higher-resolution tree.
more data = more accurate tree
How do you build a bacterial phylogenomic tree in BV-BRC?
Select genomes → create a group → use the Bacterial Genome Tree service → BV-BRC identifies conserved single-copy genes across genomes → builds tree.
group your genomes, run the tree service, it finds shared genes and builds the tree for you
How do you build a viral phylogenomic tree in BV-BRC?
Similar workflow, but it uses whole-genome alignment since viral genomes are smaller: select genomes, run the Viral Genome Tree service, interpret the output.
same idea but viral genomes are tiny so you can align the whole thing
What is the purpose of creating a group in BV-BRC?
To organize selected genomes so you can run comparative analyses (trees, MAUVE alignment, feature comparisons) on that specific set.
just a way to bundle your genomes together before running any analysis
What does Similar Genome Finder do in BV-BRC?
Finds genomes in the database most similar to your query genome based on Mash distance (genome-wide similarity).
upload your genome, it tells you what's most similar in the database
How is Similar Genome Finder different from Genome Search?
Genome Search lets you browse/filter by metadata (organism name, taxonomy). Similar Genome Finder ranks by actual sequence similarity to your genome.
Search = browse by name/taxonomy. Finder = ranked by actual sequence similarity
When assembling a viral genome, how do you identify which contigs are the actual viral genome?
BLAST the contigs against a viral database — contigs that hit your target virus with high identity and coverage are the viral genome. Discard host or low-hit contigs.
BLAST everything, keep contigs that match your virus, toss the rest
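The filtering step could be sketched like this for BLAST tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The thresholds, contig names, subject ID and function name are all hypothetical, and coverage is approximated from a single hit per row rather than merged across hits.

```python
def viral_contigs(blast_rows, contig_lengths, min_identity=90.0, min_coverage=0.8):
    """Return contigs with at least one hit passing both identity and coverage cutoffs."""
    keep = set()
    for row in blast_rows:
        fields = row.split("\t")
        contig = fields[0]           # qseqid
        pident = float(fields[2])    # percent identity
        qstart, qend = int(fields[6]), int(fields[7])
        # Fraction of the contig covered by this one alignment.
        coverage = (abs(qend - qstart) + 1) / contig_lengths[contig]
        if pident >= min_identity and coverage >= min_coverage:
            keep.add(contig)
    return keep

# Made-up example rows; "target_virus" stands in for a real subject ID.
rows = [
    "contig_1\ttarget_virus\t98.5\t9500\t140\t2\t101\t9600\t1\t9500\t0.0\t17000",
    "contig_2\ttarget_virus\t99.0\t300\t3\t0\t1\t300\t500\t799\t1e-150\t540",
    "contig_3\ttarget_virus\t72.0\t2900\t800\t12\t1\t2900\t1\t2900\t1e-50\t400",
]
lengths = {"contig_1": 10000, "contig_2": 5000, "contig_3": 3000}
kept = viral_contigs(rows, lengths)
print(kept)
```

contig_2 is dropped for low coverage and contig_3 for low identity, which mirrors the "keep high identity and coverage, discard the rest" rule on the card.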
What is metagenome analysis?
Sequencing all genetic material from an environmental sample (no culturing) to identify what organisms are present and their functional potential.
sequence everything in a sample at once to see who's there without growing anything
BLAST in BV-BRC vs NCBI — what's the difference?
BV-BRC BLAST searches within the BV-BRC database (bacterial/viral genomes). NCBI BLAST searches the broader GenBank/RefSeq database across all organisms.
BV-BRC = bacteria/viruses only. NCBI = everything
How do you align multiple genomes using MAUVE in BV-BRC?
Services tab → Genome Alignment (MAUVE) → select reference genome first → add comparison genomes (up to 20, must already be in BV-BRC) → submit → results show colored blocks = conserved regions across genomes, inverted blocks = inversions/rearrangements
colored blocks that line up = conserved. blocks that are flipped or out of order = rearrangements or inversions between genomes
How do you choose the correct assembly strategy?
Short reads only (Illumina) → SPAdes. Long reads only (PacBio/Nanopore) → Flye or Canu. Both short + long → Unicycler hybrid or SPAdes hybrid. Unicycler hybrid generally gives the best result.
match your assembler to your read type — hybrid reads = best assembly
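The decision rule on this card can be written out as a tiny lookup. This is only a memory aid with a made-up function name, not anything BV-BRC itself exposes.

```python
def choose_assembler(has_short_reads, has_long_reads):
    """Map the available read types to the assembler suggested on the card."""
    if has_short_reads and has_long_reads:
        return "Unicycler (hybrid)"   # card: hybrid generally gives the best result
    if has_short_reads:
        return "SPAdes"               # Illumina only
    if has_long_reads:
        return "Flye or Canu"         # PacBio/Nanopore only
    raise ValueError("no reads provided")

print(choose_assembler(True, False))
print(choose_assembler(False, True))
print(choose_assembler(True, True))
```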
How do you interpret the assembly report and access the FASTA file?
Jobs page → click your job → View icon → click AssemblyReport.html to see Bandage plot (visualizes assembly graph) and QUAST report (contig stats: number of contigs, N50, coverage, depth). Click the .fasta file row to access the actual assembled sequences.
QUAST = quality stats. Bandage = visual of how contigs connected. FASTA = the actual assembled genome sequences
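Since N50 is the QUAST statistic that tends to cause the most confusion, here is a small sketch of how it is defined: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly. The contig lengths are invented for the example.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative length reaches half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")

# Hypothetical assembly of 300 bases in 6 contigs: half is 150.
print(n50([100, 80, 50, 30, 20, 20]))  # 100 + 80 = 180 >= 150, so N50 is 80
```

A higher N50 with fewer contigs usually means a more contiguous assembly.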
What are the steps to annotate a genome in BV-BRC?
Services tab → Genomics → Genome Annotation → upload contigs (FASTA file from assembly) → select taxonomy/organism → submit. Results page contains annotated genome files in multiple formats.
annotation always starts from contigs — you must assemble first, then annotate
What does the Annotation Results page contain?
Annotated genome files in multiple formats (FASTA, GenBank, GFF), genome quality report, specialty genes, functional categories, and a phylogenetic tree of closest relatives.
it gives you the full picture — what genes are there, quality, and where your organism fits evolutionarily
How do you find antimicrobial resistance genes in BV-BRC?
Annotate genome → open your genome → Specialty Genes tab → filter by Antibiotic Resistance → see genes mapped from CARD/NDARO databases
annotation does the work automatically — you're just navigating to where the results live
How do you find the closest related species to your genome in BV-BRC?
Services tab → Similar Genome Finder → enter your genome name/ID or upload FASTA → submit → results ranked by Mash/MinHash distance → closest genomes at top of list
Mash distance = whole genome similarity score — lower distance = more closely related
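To make "lower distance = more closely related" concrete, here is a simplified sketch of the Mash distance formula, D = −(1/k) × ln(2j / (1 + j)), where j is the Jaccard index of the two genomes' k-mer sets. As an assumption for readability, it compares full k-mer sets with a tiny k, whereas the real tool estimates j from MinHash sketches with a much larger k. The sequences are made up.

```python
import math

def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def mash_distance(seq_a, seq_b, k=4):
    """Mash-style distance from the exact Jaccard index of two k-mer sets."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a or not b:
        raise ValueError("sequences must be at least k bases long")
    j = len(a & b) / len(a | b)
    if j == 0:
        return 1.0  # nothing shared: treat as maximum distance
    if j == 1:
        return 0.0  # identical k-mer content
    return -math.log(2 * j / (1 + j)) / k

same = mash_distance("ACGTTGCAAGCT", "ACGTTGCAAGCT")
close = mash_distance("ACGTTGCAAGCT", "ACGTTGCATGCT")   # one base changed
unrelated = mash_distance("AAAAAAAA", "CCCCCCCC")
print(same, close, unrelated)
```

Identical sequences give 0, a single-base change gives a small positive distance, and sequences sharing no k-mers sit at the maximum.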
Bacterial genome tree vs viral genome tree in BV-BRC — what's the difference?
Bacterial → Services → Bacterial Genome Tree → uses conserved single-copy protein families (PGFams) across genomes. Viral → Services → Viral Genome Tree → uses whole genome alignment via MAFFT. Both use RAxML/PhyML/FastTree to build the final tree.
bacterial genomes are too big for whole-genome alignment, so the tree uses shared genes instead. viral genomes are small enough to align the whole thing