COMPSCI 260 Unit 2 Midterm Review

Description and Tags

Comprehensive vocabulary flashcards covering genomic sequencing, string algorithms, and sequence alignment concepts from the COMPSCI 260 Unit 2 midterm.

Last updated 6:33 AM on 4/29/26

50 Terms

1

Frederick Sanger

The scientist whose separate contributions to protein sequencing and DNA sequencing earned him two Nobel Prizes, and for whom a central DNA sequencing method (Sanger sequencing) is named.

2

Leroy Hood

The person who improved DNA sequencing by replacing radio-labeled ddNTPs with fluorophore-conjugated ddNTPs, reducing the reactions required from four down to one.

3

Gene Myers

The co-inventor of suffix arrays and co-author of BLAST who suggested that paired-end reads would allow the human genome to be sequenced by whole genome shotgun (WGS) sequencing.

4

James Watson

A Nobel Prize winner for determining the double-helix structure of DNA who served as the initial leader of the public human genome sequencing effort at NIH.

5

Francis Collins

The Director of the National Human Genome Research Institute (NHGRI) at the NIH who led the Human Genome Project to completion and later led the entire NIH for 12 years.

6

Bacterial Artificial Chromosome (BAC)

A technology used to develop the hierarchical strategy to facilitate genome assembly, overcoming the challenges posed by the repetitive property of the human genome.

7

Whole Genome Shotgun (WGS) sequencing

A sequencing strategy that involves sequencing sufficient numbers of paired-end reads to assemble a genome, used by Celera Genomics.

8

Coverage

The total number of nucleotides sequenced across all reads divided by the total length of the genome $G$; equivalently, the average number of times each position is sequenced.

9

Expected number of unsequenced nucleotides

A value calculated using the expression $G \times e^{-C}$, where $G$ is the genome length and $C$ is the coverage.
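
As a rough worked example of the coverage and unsequenced-nucleotide cards, the Python sketch below computes both quantities. The read count, read length, and genome length are made-up illustrative values, and the function names are hypothetical.

```python
import math


def coverage(num_reads, read_length, genome_length):
    """Total nucleotides sequenced divided by genome length."""
    return num_reads * read_length / genome_length


def expected_unsequenced(genome_length, cov):
    """Expected number of positions covered by no read: G * e^(-C)."""
    return genome_length * math.exp(-cov)


# Illustrative numbers only (assumptions, not from the course material).
G = 1_000_000                      # genome length in nucleotides
C = coverage(num_reads=50_000, read_length=100, genome_length=G)
missed = expected_unsequenced(G, C)

print(C)                 # 5.0
print(round(missed))     # 6738
```

Even at 5x coverage, a few thousand positions of a one-megabase genome are expected to go unsequenced, which is why projects aim for higher depth.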

10

DNA Motive Force

The force experienced by DNA molecules in an electric field because DNA molecules are negatively charged.

11

DNA Migration in Gel

The process where shorter DNA molecules move faster than longer DNA molecules through a gel when an electric field is applied.

12

DNA Migration in a Vacuum

The condition where DNA molecules of different lengths will move at the same rate when an electric field is applied.

13

V(i, j) in Global Alignment

The score of the optimal global alignment of the prefix of sequence $X$ of length $i$ and the prefix of sequence $Y$ of length $j$.
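
One common way to write the recurrence behind this definition is sketched below. It assumes a linear gap score $g \le 0$ and a substitution score $s(x_i, y_j)$ for aligning the $i$-th character of $X$ with the $j$-th character of $Y$; both symbols are introduced here for illustration and the course may use different notation or gap models.

```latex
\begin{align*}
V(0,0) &= 0, \qquad V(i,0) = i\,g, \qquad V(0,j) = j\,g \\
V(i,j) &= \max
\begin{cases}
V(i-1,\,j-1) + s(x_i, y_j) & \text{(match or mismatch column)} \\
V(i-1,\,j) + g & \text{(gap in } Y\text{)} \\
V(i,\,j-1) + g & \text{(gap in } X\text{)}
\end{cases}
\end{align*}
```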

14

BLOSUM62 Diagonal Entries

Values in the scoring matrix representing match scores; these are positive because an amino acid aligned with itself reflects conservation, which is biologically favored.

15

BLOSUM62 Off-Diagonal Entries

Values representing substitution scores; some are positive if the two amino acids have similar physicochemical properties.

16

Gap Column Score

The penalty applied in sequence alignment when a character in one sequence is aligned against a gap symbol (a dash) in the other.

17

Burrows-Wheeler Transform (BWT)

A string transformation developed by Burrows and Wheeler, often used in bioinformatics to facilitate efficient searching.

18

Suffix Array

A data structure consisting of the starting positions of all suffixes of a string, listed in the lexicographically sorted order of those suffixes, which can be used to quickly produce the BWT.
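
To illustrate how a suffix array yields the BWT, here is a minimal Python sketch. It builds the suffix array by naive sorting (fine for short examples, too slow for genomes) and assumes the text already ends in the `$` terminator. The function names are hypothetical.

```python
def suffix_array(text):
    """Start positions of all suffixes, in sorted order of the suffixes."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sa):
    """BWT character for each sorted suffix is the character just before it.

    For the suffix starting at 0, index -1 wraps around to the final '$'.
    """
    return "".join(text[i - 1] for i in sa)


text = "banana$"
sa = suffix_array(text)
bwt = bwt_from_suffix_array(text, sa)

print(sa)    # [6, 5, 3, 1, 0, 4, 2]
print(bwt)   # annb$aa
```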

19

Mantra (Prefix of Suffix)

A central mantra of string processing stating that every substring is the prefix of some suffix.

20

Mantra (Suffix of Prefix)

A central mantra of string processing stating that every substring is the suffix of some prefix.
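
Both mantras can be checked by brute force on a short string. The sketch below is only a sanity check with hypothetical helper names, using an arbitrary example string.

```python
def is_prefix_of_some_suffix(text, sub):
    """True if sub is a prefix of at least one suffix of text."""
    return any(text[i:].startswith(sub) for i in range(len(text) + 1))


def is_suffix_of_some_prefix(text, sub):
    """True if sub is a suffix of at least one prefix of text."""
    return any(text[:j].endswith(sub) for j in range(len(text) + 1))


text = "GATTACA"
# Every non-empty substring of the example string.
substrings = {text[i:j] for i in range(len(text))
              for j in range(i + 1, len(text) + 1)}

holds = all(is_prefix_of_some_suffix(text, s) and
            is_suffix_of_some_prefix(text, s) for s in substrings)
print(holds)   # True
```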

21

Cyclic Permutation

A string rearrangement formed by shifting characters from one end of the string to the other, used in constructing the BWT matrix.

22

End-of-string token ($)

A special character added to the end of a string to denote its termination and ensure proper sorting in transforms like BWT.

23

0-indexing

A convention where the first character of a string is at index 0, used for BWT and FM-index problems.

24

Burrows-Wheeler Matrix (BWM)

A matrix containing all cyclic permutations of a string sorted lexicographically.
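
A small Python sketch of the BWM construction follows, assuming the input already ends in `$`. It materializes every rotation, which is only practical for toy strings. The function name is hypothetical.

```python
def burrows_wheeler_matrix(text):
    """All cyclic permutations (rotations) of text, sorted lexicographically."""
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    return sorted(rotations)


bwm = burrows_wheeler_matrix("banana$")
first_column = "".join(row[0] for row in bwm)    # F: sorted characters
last_column = "".join(row[-1] for row in bwm)    # L: the BWT itself

for row in bwm:
    print(row)
print(first_column)   # $aaabnn
print(last_column)    # annb$aa
```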

25

Inverse BWT

The process used to reverse the Burrows-Wheeler transformation and recover the original input string.
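
One simple way to invert the BWT is to rebuild the sorted rotation table column by column. The sketch below uses that naive approach for clarity; practical implementations typically use the LF mapping instead. It assumes the original string ended in a unique `$` terminator, and the function name is hypothetical.

```python
def inverse_bwt(bwt, terminator="$"):
    """Recover the original string from its BWT by repeated prepend-and-sort."""
    n = len(bwt)
    table = [""] * n
    for _ in range(n):
        # Prepend the BWT column to the current rows, then re-sort.
        table = sorted(bwt[i] + table[i] for i in range(n))
    # After n rounds the table holds all sorted rotations;
    # the original string is the rotation that ends with the terminator.
    for row in table:
        if row.endswith(terminator):
            return row
    raise ValueError("terminator not found in BWT")


print(inverse_bwt("annb$aa"))   # banana$
```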

26

Optimal Local Alignment

The highest scoring alignment between any substring of sequence $X$ and any substring of sequence $Y$; its score is the maximum value anywhere in the DP table.
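
A minimal Python sketch of the local alignment score follows. It assumes a simple scoring function (match +1, mismatch -1, linear gap -2); these values and the function name are illustrative assumptions, not the course's scoring scheme.

```python
def local_alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score of the best local alignment: the maximum cell in the DP table."""
    rows, cols = len(x) + 1, len(y) + 1
    v = [[0] * cols for _ in range(rows)]   # first row and column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if x[i - 1] == y[j - 1] else mismatch
            v[i][j] = max(0,                        # start a fresh alignment
                          v[i - 1][j - 1] + s,      # match/mismatch column
                          v[i - 1][j] + gap,        # gap in y
                          v[i][j - 1] + gap)        # gap in x
            best = max(best, v[i][j])
    return best


print(local_alignment_score("TTTACGTAAA", "GGACGTCC"))   # 4
```

The shared substring `ACGT` scores 4, and the `max(0, ...)` term lets the alignment ignore the unrelated flanking characters.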

27

Optimal Global Alignment

The highest scoring alignment that spans the entire length of both sequence $X$ and sequence $Y$; its score is found in the bottom-right cell of the DP table.
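
For comparison with local alignment, here is a minimal Python sketch of the global alignment score. The scoring values (match +1, mismatch -1, linear gap -2) and the function name are illustrative assumptions.

```python
def global_alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment: the bottom-right DP cell."""
    rows, cols = len(x) + 1, len(y) + 1
    v = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        v[i][0] = i * gap        # prefix of x aligned entirely against gaps
    for j in range(1, cols):
        v[0][j] = j * gap        # prefix of y aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if x[i - 1] == y[j - 1] else mismatch
            v[i][j] = max(v[i - 1][j - 1] + s,   # match/mismatch column
                          v[i - 1][j] + gap,     # gap in y
                          v[i][j - 1] + gap)     # gap in x
    return v[len(x)][len(y)]


print(global_alignment_score("ACGT", "AGT"))   # 1
```

Here the best alignment pairs `ACGT` with `A-GT`: three matches and one gap column give 3 - 2 = 1.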

28

FM-index

A compressed index structure based on the BWT that allows for fast string matching in a reference genome.

29

Thermus aquaticus

The genus and species of the bacterium that is the source of Taq polymerase, a thermostable enzyme used in DNA sequencing and PCR.

30

Taq polymerase

An early example of a thermostable polymerase required for DNA sequencing experiments.

31

Match Column

In sequence alignment, a column where both sequences contain the same character.

32

Mismatch Column

In sequence alignment, a column where both sequences contain different characters.

33

Hierarchical BAC-based strategy

An assembly method that uses cloned segments of DNA to manage the complexity of repetitive genomic sequences.

34

Paired-end reads

Sequence reads from both ends of a DNA fragment used to bridge repetitive regions during genome assembly.

35

Celera Genomics

A private organization that utilized WGS sequencing and paired-end reads to sequence the human genome.

36

BLAST

A widely used sequence alignment tool authored by several researchers including Gene Myers.

37

NIH Human Genome Research Institute

The NIH institute (formally the National Human Genome Research Institute, NHGRI) led by Francis Collins that played a major role in the Human Genome Project.

38

Depth of Coverage

An equivalent term for the average number of times each position in the genome is sequenced during an assembly project.

39

Lexicographical Order

The alphabetical sorting method used to arrange cyclic permutations in a BWT matrix or suffixes in a Suffix Array.

40

Range Search (FM-index)

The process of finding appearances of a query string by iteratively narrowing the range of indices in the first and last columns of the BWM.
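
The range-narrowing step can be sketched in Python as a backward search over the BWT. This is a simplified, uncompressed illustration: it stores full occurrence tables rather than the sampled checkpoints a real FM-index would use, and all function and variable names are hypothetical.

```python
from collections import Counter


def build_tables(bwt):
    """Return (first, occ): first[c] is the first F-column row holding c;
    occ[c][i] is the number of occurrences of c in bwt[:i]."""
    counts = Counter(bwt)
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, c in enumerate(bwt):
        for ch in counts:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == c else 0)
    return first, occ


def count_matches(bwt, query):
    """Number of occurrences of query in the original text."""
    first, occ = build_tables(bwt)
    lo, hi = 0, len(bwt)              # half-open range of BWM rows
    for ch in reversed(query):        # process the query right to left
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


print(count_matches("annb$aa", "ana"))   # 2
```

The BWT `annb$aa` corresponds to `banana$`, where `ana` occurs twice (0-indexed positions 1 and 3).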

41

Approximate Seconds in a Year

A memorable approximation defined as $\pi \times 10^7$ seconds.
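
A quick arithmetic check of how close the approximation is, assuming a 365.25-day year:

```python
import math

exact = 365.25 * 24 * 60 * 60        # seconds in a 365.25-day year
approx = math.pi * 10**7             # the memorable approximation
relative_error = abs(exact - approx) / exact

print(exact)                         # 31557600.0
print(relative_error < 0.005)        # True: within half a percent
```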

42

Nobel Prize Organizations

Entities, such as the International Committee of the Red Cross and the UNHCR, that have each won multiple Nobel Prizes.

43

ddNTPs

Dideoxynucleotide triphosphates used in Sanger sequencing to terminate DNA chain elongation.

44

Fluorophore-conjugated

A modification to ddNTPs using fluorescent dyes to allow for single-reaction automated sequencing.

45

Repetitive Property

The quality of the human genome that made de novo assembly difficult using only short reads.

46

Lexical First Column (F)

The first column of the Burrows-Wheeler Matrix (BWM), which contains all characters of the input string sorted lexicographically.

47

BWT Column (L)

The last column of the BWM, which constitutes the actual Burrows-Wheeler Transform of the string.

48

Prefix

A substring that starts at the beginning of a sequence.

49

Suffix

A substring that extends to the end of a sequence.

50

Scoring Function

A defined set of values for matches, mismatches, and gaps used to calculate the optimal alignment in dynamic programming.
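
To show how a scoring function is applied, the Python sketch below scores an already-built alignment one column at a time, classifying each column as a match, mismatch, or gap column. The specific values (match +1, mismatch -1, gap -2) and the function name are illustrative assumptions.

```python
def score_alignment(row_x, row_y, match=1, mismatch=-1, gap=-2):
    """Sum the column scores of an alignment written with '-' for gaps."""
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            total += gap           # gap column
        elif a == b:
            total += match         # match column
        else:
            total += mismatch      # mismatch column
    return total


print(score_alignment("GATTACA", "GACT-CA"))   # 2
```

That alignment has five match columns, one mismatch column, and one gap column: 5 - 1 - 2 = 2.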