Comprehensive vocabulary flashcards covering genomic sequencing, string algorithms, and sequence alignment concepts from the COMPSCI 260 Unit 2 midterm.
Frederick Sanger
The person whose separate contributions to both protein and DNA sequencing resulted in receiving two separate Nobel Prizes and whose name is in a central DNA sequencing method.
Leroy Hood
The person who improved DNA sequencing by replacing radio-labeled ddNTPs with fluorophore-conjugated ddNTPs, reducing the reactions required from four down to one.
Gene Myers
The co-inventor of suffix arrays and co-author of BLAST who suggested that paired-end reads would allow the human genome to be sequenced with the whole genome shotgun (WGS) strategy.
James Watson
A Nobel Prize winner for determining the double-helix structure of DNA who served as the initial leader of the public human genome sequencing effort at NIH.
Francis Collins
The Director of the NIH Human Genome Research Institute who led the Human Genome Project to completion and later led the entire NIH for 12 years.
Bacterial Artificial Chromosome (BAC)
A technology used to develop the hierarchical strategy to facilitate genome assembly, overcoming the challenges posed by the repetitive property of the human genome.
Whole Genome Shotgun (WGS) sequencing
A sequencing strategy that involves sequencing sufficient numbers of paired-end reads to assemble a genome, used by Celera Genomics.
Coverage
The total number of nucleotides sequenced in all reads divided by the genome length G; equivalently, the average number of times each position is sequenced.
Expected number of unsequenced nucleotides
A value calculated using the expression G × e^(−C), where G is the genome length and C is the coverage.
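A minimal sketch of the two coverage formulas above. The function names and the read counts in the example are illustrative assumptions, not values from the course.

```python
import math

def coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """C = total nucleotides sequenced in all reads / genome length G."""
    return num_reads * read_length / genome_length

def expected_unsequenced(genome_length: int, cov: float) -> float:
    """Expected number of unsequenced nucleotides: G * e^(-C)."""
    return genome_length * math.exp(-cov)

# Hypothetical project: one million 150-nt reads over a 5 Mb genome.
c = coverage(num_reads=1_000_000, read_length=150, genome_length=5_000_000)
print(c)  # 30.0
print(expected_unsequenced(5_000_000, c))
```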
DNA Motive Force
The force experienced by DNA molecules in an electric field because DNA molecules are negatively charged.
DNA Migration in Gel
The process where shorter DNA molecules move faster than longer DNA molecules through a gel when an electric field is applied.
DNA Migration in a Vacuum
The condition where DNA molecules of different lengths will move at the same rate when an electric field is applied.
V(i, j) in Global Alignment
The score of the optimal global alignment of the prefix of sequence X of length i and the prefix of sequence Y of length j.
BLOSUM62 Diagonal Entries
Values in the scoring matrix representing match scores; these are positive because an amino acid that stays the same between related proteins is conserved, which the matrix rewards.
BLOSUM62 Off-Diagonal Entries
Values representing substitution scores; some are positive if the two amino acids have similar physicochemical properties.
Gap Column Score
The penalty applied in sequence alignment when a character is aligned against a null or dash symbol.
Burrows-Wheeler Transform (BWT)
A string transformation developed by Burrows and Wheeler, often used in bioinformatics to facilitate efficient searching.
Suffix Array
A data structure listing all suffixes of a string in sorted order (stored as their starting positions), which can be used to quickly produce the BWT.
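A minimal sketch of how a suffix array can produce the BWT, assuming the input already ends with the `$` token. The simple sort below is for illustration only, not an efficient construction; `banana$` is an example string chosen here.

```python
def suffix_array(s: str) -> list[int]:
    """Starting positions of the suffixes of s, in lexicographic order."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def bwt_from_suffix_array(s: str, sa: list[int]) -> str:
    """The BWT character for a suffix starting at i is the one just before it.
    For i == 0, s[i - 1] wraps around to the final '$' (0-indexed)."""
    return "".join(s[i - 1] for i in sa)

text = "banana$"
sa = suffix_array(text)
print(sa)                               # [6, 5, 3, 1, 0, 4, 2]
print(bwt_from_suffix_array(text, sa))  # annb$aa
```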
Mantra (Prefix of Suffix)
A central mantra of string processing stating that every substring is the prefix of some suffix.
Mantra (Suffix of Prefix)
A central mantra of string processing stating that every substring is the suffix of some prefix.
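Both mantras can be checked by brute force on a small string. This is only a sketch; the helper names and the example string are assumptions made for illustration.

```python
def is_prefix_of_some_suffix(text: str, sub: str) -> bool:
    """True if some suffix text[i:] starts with sub."""
    return any(text[i:].startswith(sub) for i in range(len(text) + 1))

def is_suffix_of_some_prefix(text: str, sub: str) -> bool:
    """True if some prefix text[:j] ends with sub."""
    return any(text[:j].endswith(sub) for j in range(len(text) + 1))

text = "banana"
# Enumerate every non-empty substring text[i:j].
substrings = {text[i:j] for i in range(len(text)) for j in range(i + 1, len(text) + 1)}
print(all(is_prefix_of_some_suffix(text, s) for s in substrings))  # True
print(all(is_suffix_of_some_prefix(text, s) for s in substrings))  # True
```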
Cyclic Permutation
A string rearrangement formed by shifting characters from one end of the string to the other, used in constructing the BWT matrix.
End-of-string token ($)
A special character added to the end of a string to denote its termination and ensure proper sorting in transforms like BWT.
0-indexing
A convention where the first character of a string is at index 0, used for BWT and FM-index problems.
Burrows-Wheeler Matrix (BWM)
A matrix containing all cyclic permutations of a string sorted lexicographically.
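A minimal sketch of building the BWM and reading off its first and last columns, assuming the input ends with `$` and that `$` sorts before every other character (true for Python's default string ordering). The example string is an assumption.

```python
def burrows_wheeler_matrix(s: str) -> list[str]:
    """All cyclic permutations (rotations) of s, sorted lexicographically."""
    rotations = [s[i:] + s[:i] for i in range(len(s))]
    return sorted(rotations)

bwm = burrows_wheeler_matrix("banana$")
for row in bwm:
    print(row)

first_column = "".join(row[0] for row in bwm)   # F: characters of s, sorted
last_column = "".join(row[-1] for row in bwm)   # L: the BWT itself
print(first_column)  # $aaabnn
print(last_column)   # annb$aa
```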
Inverse BWT
The process used to reverse the Burrows-Wheeler transformation and recover the original input string.
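One simple way to invert the BWT is to rebuild the sorted rotation table one column at a time. The sketch below uses that naive approach, which is easy to follow but slow; it is not necessarily the method taught in the course, and practical tools use the last-to-first mapping instead.

```python
def inverse_bwt(bwt: str) -> str:
    """Recover the original string (ending in '$') from its BWT."""
    n = len(bwt)
    table = [""] * n
    for _ in range(n):
        # Prepend the BWT column to the current rows, then re-sort.
        table = sorted(bwt[i] + table[i] for i in range(n))
    # After n rounds the table holds all sorted rotations;
    # the original string is the one that ends with '$'.
    return next(row for row in table if row.endswith("$"))

print(inverse_bwt("annb$aa"))  # banana$
```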
Optimal Local Alignment
The highest scoring alignment between any substring of sequence X and any substring of sequence Y, identified as the maximum value in a DP table.
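A minimal sketch of the local alignment DP, where every cell is floored at 0 and the answer is the maximum over the whole table. The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def local_alignment_score(x: str, y: str, match=1, mismatch=-1, gap=-2) -> int:
    """Score of the best alignment between any substring of x and any substring of y."""
    n, m = len(x), len(y)
    V = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            V[i][j] = max(
                0,                    # start a fresh alignment here
                V[i - 1][j - 1] + s,  # match or mismatch column
                V[i - 1][j] + gap,    # x[i-1] aligned against a gap
                V[i][j - 1] + gap,    # y[j-1] aligned against a gap
            )
            best = max(best, V[i][j])
    return best

print(local_alignment_score("TTACGTTT", "GGACGGG"))  # 3
```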
Optimal Global Alignment
The highest scoring alignment that spans the entire length of both sequence X and sequence Y, found at the bottom-right cell of the DP table.
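A minimal sketch of the global alignment DP, where V[i][j] is the best score for aligning the length-i prefix of X with the length-j prefix of Y and the answer is the bottom-right cell. The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def global_alignment_score(x: str, y: str, match=1, mismatch=-1, gap=-2) -> int:
    """Score of the optimal global alignment of x and y."""
    n, m = len(x), len(y)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # Base cases: a prefix aligned against nothing is all gap columns.
    for i in range(1, n + 1):
        V[i][0] = i * gap
    for j in range(1, m + 1):
        V[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            V[i][j] = max(
                V[i - 1][j - 1] + s,  # match or mismatch column
                V[i - 1][j] + gap,    # x[i-1] aligned against a gap
                V[i][j - 1] + gap,    # y[j-1] aligned against a gap
            )
    return V[n][m]  # bottom-right cell

print(global_alignment_score("ACGT", "AGT"))  # 1
```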
FM-index
A compressed index structure based on the BWT that allows for fast string matching in a reference genome.
Thermus aquaticus
The genus and species of the bacterium that is the source of Taq polymerase, a thermostable enzyme used in DNA sequencing and PCR.
Taq polymerase
An early example of a thermostable polymerase required for DNA sequencing experiments.
Match Column
In sequence alignment, a column where both sequences contain the same character.
Mismatch Column
In sequence alignment, a column where the two sequences contain different characters.
Hierarchical BAC-based strategy
An assembly method that uses cloned segments of DNA to manage the complexity of repetitive genomic sequences.
Paired-end reads
Sequence reads from both ends of a DNA fragment used to bridge repetitive regions during genome assembly.
Celera Genomics
A private organization that utilized WGS sequencing and paired-end reads to sequence the human genome.
BLAST
A widely used sequence alignment tool authored by several researchers including Gene Myers.
NIH Human Genome Research Institute
The organization led by Francis Collins that played a major role in the Human Genome Project.
Depth of Coverage
An equivalent term for the average number of times each position in the genome is sequenced during an assembly project.
Lexicographical Order
The alphabetical sorting method used to arrange cyclic permutations in a BWT matrix or suffixes in a Suffix Array.
Range Search (FM-index)
The process of finding appearances of a query string by iteratively narrowing the range of indices in the first and last columns of the BWM.
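A minimal sketch of the range-narrowing search, processing the query from its last character to its first. The names `C` and `occ` are assumptions introduced here, and `occ` is computed by plain counting rather than with the precomputed tables a real FM-index would use. Ranges are 0-indexed and half-open.

```python
def backward_search(bwt: str, pattern: str) -> tuple[int, int]:
    """Return the half-open range [lo, hi) of BWM rows that start with pattern."""
    # C[ch] = number of characters in the text strictly smaller than ch,
    # i.e. the row where the block of ch begins in the first column.
    counts: dict[str, int] = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1
    C: dict[str, int] = {}
    total = 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    def occ(ch: str, k: int) -> int:
        """Occurrences of ch in the last column above row k (bwt[:k])."""
        return bwt[:k].count(ch)

    lo, hi = 0, len(bwt)  # start with every row
    for ch in reversed(pattern):
        if ch not in C:
            return (0, 0)
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return (0, 0)  # range is empty: no match
    return (lo, hi)

lo, hi = backward_search("annb$aa", "ana")  # BWT of "banana$"
print(lo, hi, hi - lo)  # 2 4 2
```

The width of the final range, `hi - lo`, is the number of appearances of the query in the text.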
Approximate Seconds in a Year
A memorable approximation defined as π × 10^7 seconds.
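A quick check of how close the approximation is, assuming a year of 365.25 days (the year length is an assumption made here). Under that assumption the two values agree to within about half a percent.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
seconds_per_year = 365.25 * SECONDS_PER_DAY  # assumed year length
approximation = math.pi * 10**7

relative_error = abs(seconds_per_year - approximation) / seconds_per_year
print(seconds_per_year, approximation, relative_error)
```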
Nobel Prize Organizations
Entities including the International Committee of the Red Cross or the UNHCR which have won multiple Nobel Prizes.
ddNTPs
Dideoxynucleotide triphosphates used in Sanger sequencing to terminate DNA chain elongation.
Fluorophore-conjugated
A modification to ddNTPs using fluorescent dyes to allow for single-reaction automated sequencing.
Repetitive Property
The quality of the human genome that made de novo assembly difficult using only short reads.
Lexical First Column (F)
The first column of the BWT matrix, which contains all characters of the input string sorted lexicographically.
BWT Column (L)
The last column of the BWM, which constitutes the actual Burrows-Wheeler Transform of the string.
Prefix
A substring that starts at the beginning of a sequence.
Suffix
A substring that extends to the end of a sequence.
Scoring Function
A defined set of values for matches, mismatches, and gaps used to calculate the optimal alignment in dynamic programming.
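A minimal sketch of applying a scoring function column by column to an alignment that is already written out, with `-` as the gap symbol. The specific values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def score_alignment(row_x: str, row_y: str, match=1, mismatch=-1, gap=-2) -> int:
    """Sum the scores of the match, mismatch, and gap columns of an alignment."""
    if len(row_x) != len(row_y):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == "-" or b == "-":
            total += gap        # gap column
        elif a == b:
            total += match      # match column
        else:
            total += mismatch   # mismatch column
    return total

print(score_alignment("ACGT", "A-GT"))  # 1
```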