Bioinformatics final

Last updated 5:50 PM on 4/30/26

40 Terms

New cards

What is a locus?

A specific fixed position on a chromosome where a gene or genetic marker is located.

basically just the "address" of a gene on a chromosome

What is an allele?

One of two or more versions of a gene at a given locus.

different "versions" of the same gene (e.g. brown vs blue eye color)

What is a genotype?

The specific combination of alleles an organism carries at a locus (e.g. AA, Aa, aa).

what alleles you actually have

What is a phenotype?

The observable trait or characteristic expressed from the genotype.

what you actually look like / express

What is a SNP?

Single Nucleotide Polymorphism — a single base position in the genome that varies between individuals in a population.

one letter difference in DNA between people

What is allele frequency?

The proportion of a specific allele among all alleles at that locus in a population.

how common one version of a gene is in a population

What is genotype frequency?

The proportion of a specific genotype among all individuals in a population.

how common a specific allele combo (AA, Aa, aa) is in a population
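
The two frequency definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming diploid individuals written as two-letter genotype strings; the function names and the sample data are made up for the example.

```python
from collections import Counter

def genotype_frequencies(genotypes):
    """Proportion of each genotype among all individuals."""
    # Sort the two letters so that "aA" and "Aa" count as the same genotype.
    counts = Counter("".join(sorted(g)) for g in genotypes)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}

def allele_frequencies(genotypes):
    """Proportion of each allele among all allele copies (2 per individual)."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()}

# Hypothetical sample of 10 individuals: 4 AA, 4 Aa, 2 aa.
sample = ["AA"] * 4 + ["Aa"] * 4 + ["aa"] * 2
geno = genotype_frequencies(sample)    # AA 0.4, Aa 0.4, aa 0.2
allele = allele_frequencies(sample)    # A 12/20 = 0.6, a 8/20 = 0.4
```

Note the different denominators: genotype frequency divides by the number of individuals, allele frequency divides by the number of allele copies.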

What are the 5 assumptions of Hardy-Weinberg Equilibrium?

No mutation, no natural selection, random mating, no genetic drift (large population), no gene flow (no migration).

perfectly ideal population that never changes — doesn't exist in reality

What do HWE equations look like?

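p + q = 1 (allele frequencies, where p is the frequency of allele A and q is the frequency of allele a). p² + 2pq + q² = 1 (genotype frequencies). p² = expected frequency of AA, 2pq = expected frequency of Aa, q² = expected frequency of aa.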

math to predict how often genotypes should appear if nothing is messing with the population
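
As a worked example of the equations above, the sketch below turns an allele frequency p into expected genotype frequencies, and turns observed genotype counts into the counts HWE would predict. The function names and the sample counts are assumptions chosen for illustration.

```python
def hwe_expected(p):
    """Expected genotype frequencies under HWE for allele frequency p of A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

def hwe_expected_counts(n_AA, n_Aa, n_aa):
    """Expected genotype counts under HWE, given observed genotype counts."""
    n = n_AA + n_Aa + n_aa
    # Each AA carries two A copies, each Aa carries one.
    p = (2 * n_AA + n_Aa) / (2 * n)
    return {g: f * n for g, f in hwe_expected(p).items()}

expected = hwe_expected(0.6)               # AA 0.36, Aa 0.48, aa 0.16
counts = hwe_expected_counts(50, 20, 30)   # p = 0.6, so about 36 / 48 / 16
```

In the hypothetical sample, 20 heterozygotes were observed where about 48 are expected, which is the kind of gap that suggests one of the HWE assumptions is being violated.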

What are the forces of evolution / HWE violations?

Mutation, natural selection, genetic drift, gene flow, non-random mating.

anything that causes a real population to deviate from the ideal HWE prediction

What is linkage equilibrium?

When alleles at two loci are randomly associated — knowing one allele tells you nothing about the other.

two loci are independent of each other, no association

What is linkage disequilibrium (LD)?

When alleles at two loci are non-randomly associated, occurring together more or less often than expected by chance.

two alleles show up together more (or less) often than they should by random chance

What is the D calculation for LD?

D = frequency(AB) − frequency(A) × frequency(B). D=0 means equilibrium. D≠0 means disequilibrium.

D is just: observed frequency minus what you'd expect if they were independent
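
The D formula above can be checked on a small set of haplotypes. This is a minimal sketch, assuming two biallelic loci with haplotypes written as two-letter strings (AB, Ab, aB, ab); the function name and the counts are hypothetical.

```python
from collections import Counter

def linkage_d(haplotypes):
    """D = frequency(AB) - frequency(A) * frequency(B)."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    freq_AB = counts["AB"] / n
    freq_A = (counts["AB"] + counts["Ab"]) / n   # any haplotype carrying A
    freq_B = (counts["AB"] + counts["aB"]) / n   # any haplotype carrying B
    return freq_AB - freq_A * freq_B

# A and B travel together more often than expected: 0.5 observed vs 0.36 expected.
linked = ["AB"] * 50 + ["Ab"] * 10 + ["aB"] * 10 + ["ab"] * 30
d_linked = linkage_d(linked)            # about 0.14

# Same allele frequencies, but haplotypes match the independent expectation.
independent = ["AB"] * 36 + ["Ab"] * 24 + ["aB"] * 24 + ["ab"] * 16
d_independent = linkage_d(independent)  # about 0
```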

What is a haplotype?

A set of alleles on the same chromosome that tend to be inherited together due to linkage.

a chunk of chromosome that gets passed down together as a unit

How does crossing-over relate to linkage?

Crossing-over breaks up linkage over time. Loci far apart recombine more, moving toward equilibrium. Loci close together stay in LD longer.

recombination shuffles the deck — close genes stay linked, far genes separate
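
The decay of LD under recombination is often summarized as D_t = D_0 × (1 − r)^t, where r is the recombination fraction between the two loci and t is the number of generations. The sketch below assumes that simple model (random mating, no other forces acting); the function name and the example values are made up.

```python
def ld_after(d0, r, generations):
    """LD remaining after some generations: D_t = D_0 * (1 - r) ** t."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction r must be between 0 and 0.5")
    return d0 * (1.0 - r) ** generations

d0 = 0.14
close_loci = ld_after(d0, r=0.01, generations=10)   # tightly linked: most LD remains
far_loci = ld_after(d0, r=0.5, generations=10)      # unlinked: LD is nearly gone
```

With r = 0.5 the value halves every generation, while with r = 0.01 it drops by only about 10% over the same ten generations.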

What causes differences in population genetic structure?

Genetic drift, natural selection, gene flow, mutation, founder effects, population bottlenecks.

different evolutionary pressures acting differently on isolated or separate populations

What is GWAS?

Genome-Wide Association Study — scans the entire genome across many individuals to find SNPs statistically associated with a trait or disease.

massive scan to find which DNA differences correlate with a disease or trait
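
To make "statistically associated" concrete, here is a toy version of the per-SNP test: a chi-square statistic on a 2×2 table of allele counts in cases versus controls. It is only a sketch under simplifying assumptions; the SNP names and counts are invented, and real GWAS pipelines typically use regression models with covariates and correct for testing many SNPs at once.

```python
def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square statistic for a 2x2 table of allele counts (1 degree of freedom)."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so there is nothing to compare
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical allele counts: (case_alt, case_ref, control_alt, control_ref).
snps = {
    "snp_1": (60, 40, 40, 60),
    "snp_2": (50, 50, 50, 50),
    "snp_3": (55, 45, 45, 55),
}
scores = {name: allelic_chi_square(*counts) for name, counts in snps.items()}
ranked = sorted(scores, key=scores.get, reverse=True)  # strongest association first
```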

What does an LD pattern tell you in GWAS?

Regions of high LD mean nearby SNPs are correlated — a significant SNP likely tags a whole haplotype block, not just one variant.

one significant SNP probably represents a whole region inherited together, not just that one spot

How does variation arise in populations?

Through mutation (creates new alleles), recombination (reshuffles existing variation), and gene flow (introduces variation from other populations).

new variation = mutation. reshuffled variation = recombination. imported variation = gene flow

What is comparative genomics?

The study of similarities and differences in genome structure, function, and evolution across different species or strains.

comparing whole genomes to see what's conserved, what's different, and why

What are the reasons for studying comparative genomics?

Identify conserved genes and functions, understand evolution, find unique genes, identify virulence factors, study genome rearrangements.

find what genes matter, what makes organisms different, and how they evolved

What is MAUVE?

A tool for aligning multiple whole genomes to identify conserved regions, rearrangements, inversions, and insertions/deletions across genomes.

lines up whole genomes side by side to spot what moved, flipped, or changed

What is phylogenomics?

Building evolutionary trees using whole genome data or large sets of genes rather than a single marker gene.

like phylogenetics but uses the whole genome instead of one gene

How is phylogenomics different from phylogenetics?

Phylogenetics uses one or a few marker genes. Phylogenomics uses whole genomes or hundreds of genes, which generally gives a more accurate, higher-resolution tree.

more data = more accurate tree

How do you build a bacterial phylogenomic tree in BV-BRC?

Select genomes → create a group → use the Bacterial Genome Tree service → BV-BRC identifies conserved single-copy genes across genomes → builds tree.

group your genomes, run the tree service, it finds shared genes and builds the tree for you
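
The "conserved single-copy genes" step can be pictured as a set filter: keep only the gene families that occur exactly once in every genome of the group. The sketch below illustrates that idea only; it is not BV-BRC's implementation, and the genome names and family IDs are invented.

```python
from collections import Counter

def single_copy_core(families_by_genome):
    """Gene families present exactly once in every genome."""
    if not families_by_genome:
        return []
    per_genome = [Counter(fams) for fams in families_by_genome.values()]
    candidates = set(per_genome[0])
    return sorted(f for f in candidates if all(c[f] == 1 for c in per_genome))

# Hypothetical gene family assignments for three genomes.
group = {
    "genome_1": ["F1", "F2", "F3", "F3"],  # F3 is duplicated here
    "genome_2": ["F1", "F2", "F3"],
    "genome_3": ["F1", "F2", "F4"],        # F3 is missing here
}
core = single_copy_core(group)  # only F1 and F2 qualify
```

Families like these are the ones that can be aligned gene by gene and combined into a single tree, since each genome contributes exactly one sequence per family.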

How do you build a viral phylogenomic tree in BV-BRC?

Similar workflow, but it uses whole-genome alignment since viral genomes are smaller: select genomes, run the Viral Genome Tree service, interpret the output.

same idea but viral genomes are tiny so you can align the whole thing

What is the purpose of creating a group in BV-BRC?

To organize selected genomes so you can run comparative analyses (trees, MAUVE alignment, feature comparisons) on that specific set.

just a way to bundle your genomes together before running any analysis

What does Similar Genome Finder do in BV-BRC?

Finds genomes in the database most similar to your query genome based on Mash distance (genome-wide similarity).

upload your genome, it tells you what's most similar in the database
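
For intuition about Mash distance, the sketch below computes the same kind of score on toy sequences. Mash estimates the Jaccard index of two k-mer sets from small MinHash sketches and converts it to a distance with D = −(1/k) · ln(2j / (1 + j)). To stay short, this example uses the exact Jaccard index of the full k-mer sets instead of a MinHash estimate, and the tiny k and the sequences are assumptions for illustration only.

```python
import math

def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def mash_distance(seq_a, seq_b, k=4):
    """Mash-style distance from the exact k-mer Jaccard index (0 = identical sets)."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("both sequences are shorter than k")
    j = len(a & b) / len(union)
    if j == 0:
        return 1.0   # nothing shared: maximum distance
    if j == 1:
        return 0.0   # identical k-mer sets
    # Capped at 1.0 in this sketch so very small j values stay in range.
    return min(1.0, -math.log(2 * j / (1 + j)) / k)

same = mash_distance("ACGTACGTAA", "ACGTACGTAA")
close = mash_distance("ACGTACGTAA", "ACGTACGTCC")
unrelated = mash_distance("AAAAAA", "CCCCCC")
```

Ranking database genomes by this value, smallest first, gives the "most similar at the top" ordering described above.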

How is Similar Genome Finder different from Genome Search?

Genome Search lets you browse/filter by metadata (organism name, taxonomy). Similar Genome Finder ranks by actual sequence similarity to your genome.

Search = browse by name/taxonomy. Finder = ranked by actual sequence similarity

When assembling a viral genome, how do you identify which contigs are the actual viral genome?

BLAST the contigs against a viral database — contigs that hit your target virus with high identity and coverage are the viral genome. Discard host or low-hit contigs.

BLAST everything, keep contigs that match your virus, toss the rest
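
The keep-or-toss decision can be written as a simple filter over BLAST hits. The sketch below assumes each hit has already been parsed into a dictionary with the field names shown; those names, the thresholds, and the example hits are all hypothetical, so adjust them to your own BLAST output and data.

```python
# Hypothetical cutoffs, not recommended values.
MIN_IDENTITY = 90.0   # percent identity of the alignment
MIN_COVERAGE = 0.80   # fraction of the contig covered by the alignment

def viral_contigs(hits, target, min_identity=MIN_IDENTITY, min_coverage=MIN_COVERAGE):
    """Names of contigs whose hit matches the target virus with high identity and coverage."""
    keep = []
    for hit in hits:
        coverage = hit["align_len"] / hit["contig_len"]
        if (target.lower() in hit["subject"].lower()
                and hit["identity"] >= min_identity
                and coverage >= min_coverage):
            keep.append(hit["contig"])
    return keep

# Invented hits: one good viral contig, one host contig, one weak partial match.
hits = [
    {"contig": "contig_1", "subject": "Example virus A", "identity": 98.7,
     "align_len": 9500, "contig_len": 10000},
    {"contig": "contig_2", "subject": "Homo sapiens chromosome 1", "identity": 99.9,
     "align_len": 800, "contig_len": 900},
    {"contig": "contig_3", "subject": "Example virus A", "identity": 72.0,
     "align_len": 150, "contig_len": 1200},
]
kept = viral_contigs(hits, "Example virus A")  # only contig_1 passes
```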

What is metagenome analysis?

Sequencing all genetic material from an environmental sample (no culturing) to identify what organisms are present and their functional potential.

sequence everything in a sample at once to see who's there without growing anything

BLAST in BV-BRC vs NCBI — what's the difference?

BV-BRC BLAST searches within the BV-BRC database (bacterial/viral genomes). NCBI BLAST searches the broader GenBank/RefSeq database across all organisms.

BV-BRC = bacteria/viruses only. NCBI = everything

How do you align multiple genomes using MAUVE in BV-BRC?

Services tab → Genome Alignment (MAUVE) → select reference genome first → add comparison genomes (up to 20, must already be in BV-BRC) → submit → results show colored blocks = conserved regions across genomes, inverted blocks = inversions/rearrangements

colored blocks that line up = conserved. blocks that are flipped or out of order = rearrangements or inversions between genomes

How do you choose the correct assembly strategy?

Short reads only (Illumina) → SPAdes. Long reads only (PacBio/Nanopore) → Flye or Canu. Both short + long → Unicycler hybrid or SPAdes hybrid. Unicycler hybrid generally gives the best result.

match your assembler to your read type — hybrid reads = best assembly
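
The rule of thumb above is a small decision table, which can be written out as a function for memorization. This only restates the card; the function name is made up, and it does not call any assembler.

```python
def choose_assembler(has_short_reads, has_long_reads):
    """Assembler suggested by the card for the available read types."""
    if has_short_reads and has_long_reads:
        return "Unicycler (hybrid)"   # the card's preferred option for hybrid data
    if has_short_reads:
        return "SPAdes"               # Illumina only
    if has_long_reads:
        return "Flye or Canu"         # PacBio / Nanopore only
    raise ValueError("no reads supplied")

illumina_only = choose_assembler(True, False)
nanopore_only = choose_assembler(False, True)
hybrid = choose_assembler(True, True)
```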

How do you interpret the assembly report and access the FASTA file?

Jobs page → click your job → View icon → click AssemblyReport.html to see Bandage plot (visualizes assembly graph) and QUAST report (contig stats: number of contigs, N50, coverage, depth). Click the .fasta file row to access the actual assembled sequences.

QUAST = quality stats. Bandage = visual of how contigs connected. FASTA = the actual assembled genome sequences
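
N50 is the statistic in the report that tends to need explaining: it is the contig length at which, going from the longest contig down, the running total first reaches half of the whole assembly length. The sketch below computes it from FASTA text; the function names and the tiny example sequences are assumptions, and QUAST reports this value for you.

```python
def contig_lengths(fasta_text):
    """Length of each sequence in a FASTA-formatted string."""
    lengths = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):       # header line starts a new contig
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:      # sequence line belongs to the current contig
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

def n50(lengths):
    """Contig length at which the running total reaches half the assembly size."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0

fasta = ">contig_1\nACGTACGTAC\n>contig_2\nACGTAC\n>contig_3\nACGT\n"
lengths = contig_lengths(fasta)    # three contigs of 10, 6 and 4 bases
assembly_n50 = n50(lengths)        # the 10-base contig alone covers half of 20 bases
```

Fewer contigs and a larger N50 generally point to a more contiguous assembly.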

What are the steps to annotate a genome in BV-BRC?

Services tab → Genomics → Genome Annotation → upload contigs (FASTA file from assembly) → select taxonomy/organism → submit. Results page contains annotated genome files in multiple formats.

annotation always starts from contigs — you must assemble first, then annotate

What does the Annotation Results page contain?

Annotated genome files in multiple formats (FASTA, GenBank, GFF), genome quality report, specialty genes, functional categories, and a phylogenetic tree of closest relatives.

it gives you the full picture — what genes are there, quality, and where your organism fits evolutionarily

How do you find antimicrobial resistance genes in BV-BRC?

Annotate genome → open your genome → Specialty Genes tab → filter by Antibiotic Resistance → see genes mapped from CARD/NDARO databases

annotation does the work automatically — you're just navigating to where the results live

How do you find the closest related species to your genome in BV-BRC?

Services tab → Similar Genome Finder → enter your genome name/ID or upload FASTA → submit → results ranked by Mash/MinHash distance → closest genomes at top of list

Mash distance = whole genome similarity score — lower distance = more closely related

Bacterial genome tree vs viral genome tree in BV-BRC — what's the difference?

Bacterial → Services → Bacterial Genome Tree → uses conserved single-copy protein families (PGFams) across genomes. Viral → Services → Viral Genome Tree → uses whole genome alignment via MAFFT. Both use RAxML/PhyML/FastTree to build the final tree.

bacterial genomes too big for whole alignment so they use shared genes instead. viral genomes small enough to align the whole thing