Bio 397 - Bioinformatics Final Exam

Last updated 5:56 AM on 5/6/26

87 Terms

1
New cards

what is the principle of operation of Sanger technology

chain termination

2
New cards

What is Chain Termination?

Uses fluorescently labeled ddNTPs to stop DNA synthesis at specific bases

  • ddNTPs = dideoxynucleoside triphosphates = modified DNA building blocks that lack the 3'-OH group needed to attach the next base

3
New cards

what is the principle of operation of Illumina technology

sequencing by synthesis

4
New cards

What is Sequencing by Synthesis?


Reversible terminator fluorescent nucleotides are added; a camera records the "flash" of each base.

5
New cards

what is the principle of operation of PacBio technology

SMRT (Single Molecule Real Time)

6
New cards

What is SMRT (Single Molecule Real Time)?

Uses a ZMW (Zero-Mode Waveguide) to observe a single polymerase incorporate bases in real-time

7
New cards

what is the principle of operation of MinION technology

nanopore

8
New cards

What is Nanopore?

Measures changes in electrical current as a single-stranded DNA molecule passes through a protein pore

9
New cards

what uses fluorescently labeled ddNTPs to stop DNA synthesis at specific bases

Sanger

10
New cards

what uses reversible terminator fluorescent nucleotides, with a camera recording the "flash" of each base

Illumina

11
New cards

what uses a ZMW (zero-mode waveguide) to observe a single polymerase incorporate bases in real time

PacBio

12
New cards

what measures changes in electrical current as a single-stranded DNA molecule passes through a protein pore

MinION

13
New cards

what are pros to sanger technology

very high accuracy, relatively long reads (~800 bp)

14
New cards

what are cons to sanger technology

low throughput, very expensive per base. used for single genes/plasmids

15
New cards

what are pros of illumina technology

massive throughput (billions of reads), lowest cost per base, very accurate

16
New cards

what are cons of illumina

very short reads (150-350bp), difficult to assemble repetitive regions

17
New cards

what are pros of pacbio

long reads, can span repetitive regions and detect base modifications

18
New cards

what are cons of pacbio

historically higher error rates, lower throughput than illumina, higher cost

19
New cards

what are pros of minion

ultra long reads, real time data, highly portable (USB)

20
New cards

what are cons of minion

high error rate compared to illumina; requires high-quality dna input

21
New cards

De Novo Assembly

the process of taking short, overlapping DNA reads and stitching them together into contigs (contiguous sequences) without a reference genome
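
A minimal sketch of the idea, assuming error-free reads and a simple greedy "merge the best overlap" strategy. Real assemblers use overlap or de Bruijn graphs and tolerate sequencing errors; the function names and the `min_len` parameter here are illustrative only.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Returns a list of contigs. Reads that never overlap stay separate,
    which is how gaps show up in a toy assembly like this one.
    """
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left to merge
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGGCG", "GCGTTA", "TTACCA"]))  # ['ATGGCGTTACCA']
```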

22
New cards

what type of reads make assembly easier

longer reads (PacBio/MinION)

23
New cards

why do longer reads make assembly easier?

they bridge repetitive "dead zones" that short illumina reads cannot

24
New cards

True or False: A 200 bp read length is good for Illumina, which only produces short reads, but indicates low quality for PacBio, which is expected to produce much longer reads.

True

25
New cards

Ambiguous Bases


Represented by the letter N. A high "N" count indicates areas where the sequencer couldn't determine the base. Lower is better.
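
As a rough illustration, the N content of a sequence can be checked like this (the function name is made up for this sketch):

```python
def n_fraction(seq):
    """Fraction of ambiguous bases (N) in a sequence; lower is better."""
    seq = seq.upper()
    return seq.count("N") / len(seq) if seq else 0.0


print(n_fraction("ACGTNNACGT"))  # 0.2
```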

26
New cards

what does a high N count indicate

areas where the sequencer couldn't determine the base

27
New cards

what type of N count is better

lower

28
New cards

what is the PHRED Quality Score (Q)

a measure of base-call quality: how confident the sequencer is that each nucleobase was identified correctly

29
New cards

what is the standard benchmark for "high quality"

Q30 (99.9% base-call accuracy, i.e. 1 error in 1,000 bases)
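
The cards do not give the formula, but the standard definition is Q = -10 · log10(P), where P is the probability that the base call is wrong. A small sketch, assuming the common Phred+33 ASCII encoding used in FASTQ files (helper names are illustrative):

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with quality Q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality for a given error probability."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual, offset=33):
    """Turn a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in qual]


print(phred_to_error_prob(30))      # about 0.001, i.e. 99.9% accuracy
print(decode_fastq_quality("I?5"))  # [40, 30, 20]
```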

30
New cards

Coverage (Depth)

the average number of times a base is sequenced

31
New cards

What does a higher coverage increase?

confidence in consensus sequence
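
A common back-of-the-envelope estimate of average coverage is (number of reads × read length) / genome size. The numbers below are hypothetical:

```python
def average_coverage(n_reads, read_length, genome_size):
    """Expected average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


# 1 million 150 bp reads on a 5 Mb bacterial genome
print(average_coverage(1_000_000, 150, 5_000_000))  # 30.0
```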

32
New cards

Contig Length & N50

  • statistical measure of assembly "contiguity."

  • It is the length of the shortest contig such that all contigs of that length or longer sum to 50% of the total assembly
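
The definition above can be sketched as follows (contig lengths are made up for the example):

```python
def n50(contig_lengths):
    """Length of the contig at which the running total, counting from the
    longest contig down, first reaches half of the total assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


# Total = 300, half = 150; 100 + 80 = 180 >= 150, so N50 is 80
print(n50([100, 80, 60, 40, 20]))  # 80
```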

33
New cards

N50 is a statistical measure of…

assembly "contiguity"

34
New cards

What type of N50 indicates a better assembly?

higher N50

35
New cards

What occurs due to repetitive sequences or regions that are difficult to sequence?

gaps

  • ex: high GC content

36
New cards

What are the 2 things used to close gaps?

Primer Walking & Hybrid Assembly

37
New cards

Primer Walking

designing specific primers at the end of a contig & using Sanger sequencing to "walk" into the gap

38
New cards

Hybrid Assembly

Combining Illumina (for accuracy) with PacBio/MinION (to span long gaps)

39
New cards

What gap-closing method is normally used & why?

Hybrid Assembly

  • cheap & efficient

40
New cards

What are the 2 main strategies for Hybrid Assembly?

Short-read First (Scaffolding) OR Long-read First (Polishing)

41
New cards

Short-read First (Scaffolding)

  1. You assemble the highly accurate Illumina reads into contigs.

  2. You then use the Long Reads as "bridges" to tell the assembler which contigs sit
    next to each other, creating a scaffold.

42
New cards

Step 1 in short-read first (scaffolding)

assemble highly accurate illumina reads into contigs

43
New cards

Step 2 in short-read first (scaffolding)

use long reads as "bridges" to tell the assembler which contigs sit next to each other, creating a scaffold

44
New cards

Long-read First (Polishing)

  1. You assemble the genome using only the Long Reads. This creates a very
    "complete" genome (few gaps) but with many small "typos" (Indels).

  2. You then map the Short Reads onto that assembly to "polish" it—correcting
    the 1% error rate of the long reads with the 99.9% accuracy of Illumina.
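
A toy sketch of the polishing step, assuming the short reads are already placed on the draft and that errors are substitutions only. Real polishers work from proper alignments and also fix indels; the data and function name here are hypothetical.

```python
from collections import Counter


def polish(draft, aligned_reads):
    """Replace each draft base with the majority base from the short reads.

    aligned_reads: list of (start_position, read_sequence) pairs, gapless.
    Positions with no short-read coverage keep the draft base.
    """
    piles = [Counter() for _ in draft]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(draft):
                piles[pos][base] += 1
    return "".join(
        pile.most_common(1)[0][0] if pile else base
        for base, pile in zip(draft, piles)
    )


draft = "ACGTTCGA"  # long-read assembly with a typo at position 4
short_reads = [(0, "ACGTA"), (2, "GTACG"), (3, "TACGA")]
print(polish(draft, short_reads))  # ACGTACGA
```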

45
New cards

Step 1 in long-read first (polishing)

assemble genome using only long reads

46
New cards

Step 2 of long-read first (polishing)

map short reads onto that assembly to "polish" it

47
New cards

.

.

48
New cards

What is the role of a polisher in Hybrid Assembly?

fixes typos

49
New cards

What is the role of the bridge in Hybrid Assembly?

spans the repeats

50
New cards

Short Reads (Illumina) Overview

  • Accuracy: High (Q30+)

  • Contiguity: Fragmented (many gaps)

  • Cost: Cheap

  • Role: Polisher

51
New cards

Long Reads (PacBio/Nanopore) Overview:

  • Accuracy: lower

  • Contiguity: High (can produce “closed” genomes)

  • Cost: Expensive

  • Role: Bridge

52
New cards

Node

represents a common ancestor where a lineage splits

53
New cards

Branches

the evolution of a lineage over time

54
New cards

Tips (leaves)

represent the existing taxa (species/sequences) being compared

55
New cards

Clade (monophyletic group)

a group consisting of a common ancestor and all descendants

56
New cards

Root

the common ancestor of all sequences in the tree

57
New cards

Branch Length

the horizontal length correlates with the amount of genetic change (mutations) over time

58
New cards

What does the horizontal length correlate with in a phylogenetic tree?

the amount of genetic change (mutations) over time

59
New cards

Sister Taxa

2 lineages that emerged from the same immediate node
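
The tree terms above can be tried out on a toy tree written as nested tuples. This is a sketch with hypothetical taxa A to E; the sister-taxa check only handles two tips, not whole lineages.

```python
# Toy rooted tree ((A, B), (C, (D, E))): tuples are nodes, strings are tips
tree = (("A", "B"), ("C", ("D", "E")))


def tips(node):
    """All tips (leaves) descending from a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))


def clades(node):
    """Yield the tip set under every internal node (each one is a clade)."""
    if isinstance(node, str):
        return
    yield frozenset(tips(node))
    for child in node:
        yield from clades(child)


def is_monophyletic(root, group):
    """True if `group` is exactly one ancestor plus all of its descendants."""
    return frozenset(group) in set(clades(root))


def are_sister_tips(node, a, b):
    """True if tips a and b come from the same immediate node."""
    if isinstance(node, str):
        return False
    if len(node) == 2 and set(node) == {a, b}:
        return True
    return any(are_sister_tips(child, a, b) for child in node)


print(is_monophyletic(tree, {"C", "D", "E"}))  # True
print(is_monophyletic(tree, {"C", "D"}))       # False (E is left out)
print(are_sister_tips(tree, "D", "E"))         # True
```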

60
New cards

Transcriptomics

the study of the transcriptome— the sum of all RNA transcripts in a cell

61
New cards

What is the sum of all RNA transcripts in a cell?

transcriptome

62
New cards

Transcriptomics: Microarray

  • Detection: hybridization — pre-designed probes

  • Novelty: can only detect known genes on the chip

  • Dynamic Range: low (signals saturate/are too weak)

  • Cost: fixed per sample

63
New cards

Transcriptomics: RNA-Seq

  • Detection: sequences all cDNA

  • Novelty: can discover new transcripts, isoforms, & non-coding RNAs

  • Dynamic Range: high (can detect low & very high expression)

  • Cost: depends on sequencing depth

64
New cards

Microarray uses what type of probes?

pre-designed (hybridization)

65
New cards

Microarray can only detect what type of genes?

known genes on the chip

66
New cards

What is the cost of Microarray?

fixed per sample

67
New cards

RNA-seq has what type of sequencing?

direct sequencing (sequences all cDNA)

68
New cards

What type of method can discover new transcripts, isoforms, and non-coding RNAs?

RNA-seq

69
New cards

The logic that more mRNA molecules in the sample means…?

more cDNA produced

70
New cards

The logic that more cDNA is produced means…?

more reads mapped to that gene during sequencing

71
New cards

what is often used for differential expression?

log fold change (LogFC)

72
New cards

What does Log2FC of 1 mean? Of -1? Of 0?

  • 1: expression has doubled

  • -1: expression has halved

  • 0: expression is the same
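
A quick check of these values, using made-up expression numbers. This sketch assumes both values are greater than zero; zero counts need separate handling.

```python
import math


def log2_fold_change(treated, control):
    """log2 of the expression ratio treated / control."""
    return math.log2(treated / control)


print(log2_fold_change(200, 100))  # 1.0  (doubled)
print(log2_fold_change(50, 100))   # -1.0 (halved)
print(log2_fold_change(100, 100))  # 0.0  (unchanged)
```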

73
New cards

What is the concept of bulk RNA-seq?

take a whole piece of tissue —> grind it up —> sequence total RNA

74
New cards

What do you get from bulk RNA-seq?

an average expression profile for entire sample

75
New cards

What are the pros of bulk RNA-seq?

  • High sensitivity for low-abundance transcripts

  • Cost-effective

  • Robust for comparing “Condition A” vs “Condition B”

76
New cards

What is the con of bulk RNA-seq?

Masks heterogeneity.

  • if a rare cell type is the only one expressing a gene, its signal might be drowned out by the “average”
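
A small numeric illustration of the masking effect, with hypothetical numbers:

```python
# Hypothetical expression of one gene across 100 cells:
# 99 cells do not express it, one rare cell expresses it strongly.
per_cell_expression = [0] * 99 + [500]


def bulk_signal(per_cell):
    """Average over all cells, roughly what bulk RNA-seq reports."""
    return sum(per_cell) / len(per_cell)


print(bulk_signal(per_cell_expression))  # 5.0, looks like weak expression
print(max(per_cell_expression))          # 500, the rare cell's real level
```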

77
New cards

What is the concept of Single-Cell RNA-seq / scRNA-seq?

dissociate the tissue into individual cells & “barcode” RNA from each cell before sequencing

78
New cards

What do you get from scRNA-seq?

list of every gene expressed in every individual cell
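
Conceptually, the barcodes let reads be grouped back into per-cell gene counts. A simplified sketch, assuming each read has already been reduced to a (cell barcode, gene) pair; barcodes and gene names are invented:

```python
from collections import Counter, defaultdict


def count_matrix(reads):
    """Group mapped reads by cell barcode, then count reads per gene."""
    counts = defaultdict(Counter)
    for barcode, gene in reads:
        counts[barcode][gene] += 1
    return counts


reads = [
    ("AAAC", "GeneX"), ("AAAC", "GeneX"), ("AAAC", "GeneY"),
    ("TTTG", "GeneY"),
]
matrix = count_matrix(reads)
print(matrix["AAAC"]["GeneX"])  # 2
print(matrix["TTTG"]["GeneX"])  # 0 (not detected in this cell)
```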

79
New cards

What are the pros of scRNA-seq?

  • Can identify rare cell types

  • Allows for pseudotime analysis (tracing how a cell changes over time, like during development or cancer progression…)

80
New cards

What are the cons of scRNA-seq?

  • You lose the "where":

    • Once you dissociate the cells, you don't know which cell
      was sitting next to which

  • Prone to “dropout”

    • a gene is expressed but the sequencer misses it b/c starting material is so small

81
New cards

What is the concept of Spatial Transcriptomics?

You sequence the RNA while it is still attached to a thin slice of the tissue, using a slide with "spatial barcodes" that record coordinates

82
New cards

What do you get from Spatial Transcriptomics?

gene expression data mapped directly onto physical structure of tissue
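
The mapping step can be pictured as a lookup from spatial barcode to slide coordinates. This is a sketch with invented barcodes, coordinates, and gene names, not any platform's actual pipeline:

```python
from collections import Counter

# Hypothetical slide layout: spatial barcode -> (x, y) spot position
barcode_coords = {"AAAC": (0, 0), "TTTG": (0, 1)}


def map_to_tissue(reads, coords):
    """Count reads per gene at each (x, y) position on the slide."""
    spots = {}
    for barcode, gene in reads:
        xy = coords[barcode]
        spots.setdefault(xy, Counter())[gene] += 1
    return spots


reads = [("AAAC", "GeneX"), ("AAAC", "GeneX"), ("TTTG", "GeneY")]
expression_map = map_to_tissue(reads, barcode_coords)
print(expression_map[(0, 0)]["GeneX"])  # 2
print(expression_map[(0, 1)]["GeneY"])  # 1
```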

83
New cards

Pros of Spatial Transcriptomics

  • Crucial for understanding microenvironments

    • (e.g., how immune cells interact
      with the edge of a tumor).

  • Provides "geographic" context that scRNA-seq lacks

84
New cards

Cons of Spatial Transcriptomics

  • Expensive

  • Many platforms have lower resolution

    • (each "spot" might contain 5–10 cells rather than just one)

85
New cards

Bulk RNA-Seq overview

  • Resolution: Tissue level (average)

  • Cell Heterogeneity: Hidden/masked

  • Spatial Context: Lost

  • Best For: General biomarkers, comparing treatments

86
New cards

Single-Cell RNA-seq (scRNA-seq) overview

  • Resolution: Cellular level

  • Cell Heterogeneity: Fully visible

  • Spatial Context: Lost

  • Best For: Finding new cell types, developmental lineages

87
New cards

Spatial RNA-seq overview

  • Resolution: Tissue architecture

  • Cell Heterogeneity: Visible in context

  • Spatial Context: Preserved

  • Best For: Tumor microenvironments, brain mapping