Why are sequence alignments important in bioinformatics?
They are used to identify genes, infer evolutionary relationships, predict function, and detect disease-related variants.
Why does sequence similarity imply shared function or ancestry?
Because sequences are inherited from common ancestors and functionally important regions are conserved through evolution, so similar sequences often share ancestry and function.
Why were 3rd-generation sequencing technologies developed?
Because NGS produces short reads that make genome assembly difficult, while 3rd-generation methods produce long reads.
What is the key feature of PacBio SMRT sequencing?
It reads single DNA molecules in real time and produces long reads whose errors can be corrected by consensus.
What is the key feature of Oxford Nanopore sequencing?
It reads single DNA molecules as they pass through a nanopore, detecting bases via changes in electrical current.
What are 4 major advantages of Oxford Nanopore sequencing?
Long reads, no PCR or cloning, portability, and usefulness in field-based and rapid sequencing.
Why do random errors in long-read sequencing become less problematic?
Because random errors average out when multiple reads covering the same region are combined into a consensus sequence.
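The averaging-out idea can be sketched as a per-position majority vote. This is a minimal illustration, not how any real consensus caller works: it assumes the reads are already aligned and of equal length, and the function name `consensus` and the example reads are made up for the card.

```python
from collections import Counter

def consensus(reads):
    """Majority-vote consensus over equal-length, already-aligned reads."""
    if not reads:
        return ""
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to the same length")
    # For each column, keep the most common base across all reads.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

truth = "ACGTACGTAC"
reads = [
    "ACGTACGTAC",
    "ACCTACGTAC",  # random error at index 2
    "ACGTACGAAC",  # random error at index 7
    "ACGTTCGTAC",  # random error at index 4
    "ACGTACGTAC",
]
print(consensus(reads) == truth)
```

Each error sits at a different position, so it is outvoted by the other reads. Systematic errors that recur at the same position would not be removed this way.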
What is genome assembly?
The process of joining overlapping sequencing reads to reconstruct a genome sequence.
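A toy sketch of joining overlapping reads, assuming error-free reads from one strand and a greedy "merge the largest overlap first" strategy. The names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical; real assemblers use graph-based methods and handle errors and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest suffix/prefix overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps left; the remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
print(contigs)
```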
What is genome alignment?
The process of aligning sequencing reads to a known reference genome.
Why is alignment generally preferred over assembly in humans?
Because a high-quality human reference genome exists, making alignment faster and more efficient.
What biological question does sequence comparison fundamentally address?
Whether sequences share evolutionary ancestry and functional similarity.
What are the three main types of sequence comparisons?
Pairwise alignment (one-to-one), database searching (one-to-many), and multiple sequence alignment (many-to-many).
What is global alignment best suited for?
Comparing sequences of similar length that are closely related across their entire length.
What is local alignment best suited for?
Finding short regions of high similarity within otherwise dissimilar sequences.
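The difference between global and local alignment can be sketched with dynamic programming. The cards do not name an algorithm; the sketch below follows the standard Needleman-Wunsch (global) and Smith-Waterman (local) recurrences, with assumed scores of match +1, mismatch -1 and a linear gap penalty of -2. It returns only the score, not the alignment itself.

```python
def align_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score of a and b (global by default, local if local=True)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are charged along the first row and column.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                # Local alignment: a score never drops below zero, so a new
                # alignment can start anywhere; track the best cell seen.
                score = max(score, 0)
                best = max(best, score)
            H[i][j] = score
    return best if local else H[-1][-1]

x = "TTTTACGTGGGG"
y = "CCCACGTAAA"
print(align_score(x, y, local=True))   # scores the shared ACGT region
print(align_score(x, y))               # end-to-end score is dragged down by the flanks
```

The two sequences share only a short core, so the local score is high relative to the global score, which must pay for every mismatched flank position.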
Why can low global similarity still be biologically meaningful?
Because short conserved regions may indicate shared functional domains.
What is a dot plot?
A matrix-based visual method that compares every position in one sequence to every position in another.
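A minimal text rendering of that matrix, assuming a simple exact-match rule per cell. The function name `dot_plot` and the example sequence are illustrative only.

```python
def dot_plot(seq_x, seq_y):
    """Return text rows: '*' where seq_y[i] == seq_x[j], '.' elsewhere."""
    return ["".join("*" if y == x else "." for x in seq_x) for y in seq_y]

plot = dot_plot("GATTACA", "GATTACA")
for row in plot:
    print(row)
```

Comparing a sequence with itself puts a `*` on every cell of the main diagonal; the off-diagonal `*` cells are chance matches between identical letters.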
What does a diagonal line in a dot plot represent?
Strong similarity between sequences.
What does a broken diagonal in a dot plot indicate?
Related sequences with mutations such as insertions or deletions.
What does a short diagonal or small square in a dot plot indicate?
Partial similarity between sequences.
What does the absence of diagonals in a dot plot indicate?
Unrelated sequences.
Why do dot plots contain noise?
Random matches occur by chance, especially in DNA sequences, which use only four letters.
What is the approximate level of random matches in DNA vs protein dot plots?
DNA ~25% random matches; protein ~5% random matches.
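These figures follow from a simple calculation, assuming every letter is equally frequent and positions are independent (real sequences have biased composition, so the true rates differ somewhat). Here $p_i$ is the frequency of letter $i$ and $k$ is the alphabet size:

```latex
P(\text{match}) = \sum_{i=1}^{k} p_i^2 = k \cdot \left(\frac{1}{k}\right)^2 = \frac{1}{k},
\qquad
P_{\text{DNA}} = \frac{1}{4} = 25\%,
\qquad
P_{\text{protein}} = \frac{1}{20} = 5\%.
```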
How can noise in dot plots be reduced?
By using a sliding window and requiring a minimum number of matches.
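A sketch of the window filter, assuming a dot is kept only when the diagonal window starting at that cell contains at least `min_matches` matches. The function name and the default values for `window` and `min_matches` are hypothetical choices for illustration.

```python
def windowed_dot_plot(seq_x, seq_y, window=3, min_matches=3):
    """Return the set of (row, col) cells whose diagonal window has enough matches."""
    dots = set()
    for i in range(len(seq_y) - window + 1):
        for j in range(len(seq_x) - window + 1):
            matches = sum(seq_y[i + k] == seq_x[j + k] for k in range(window))
            if matches >= min_matches:
                dots.add((i, j))
    return dots

seq = "GATTACA"
raw = windowed_dot_plot(seq, seq, window=1, min_matches=1)       # every single-letter match
filtered = windowed_dot_plot(seq, seq, window=3, min_matches=3)  # only 3-letter exact runs
print(len(raw), len(filtered))
```

With a window of 1 every chance match is plotted; requiring three consecutive matches leaves only the main diagonal for this example.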
What types of biological features influence window size choice in dot plots?
Exon size, protein domains, enzyme active sites, and promoters.
What types of mutations can be detected using dot plots?
Substitutions, insertions and deletions (indels), duplications, inversions, and translocations.
How do insertions or deletions appear on dot plots?
As shifts or breaks in the diagonal.
Why are amino acid alignments more sensitive than nucleotide alignments?
Because the genetic code is degenerate, so DNA can change without changing the protein, and protein sequences better reflect conserved function.
Why are protein alignments preferred for distantly related species?
Protein sequences retain functional conservation even when DNA sequences diverge.
What is simple identity scoring in sequence alignment?
Matches score 1, mismatches score 0, and the total score is the number of matches.
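Identity scoring of two already-aligned, equal-length sequences can be sketched in a few lines. The function name `identity_score` and the example sequences are illustrative.

```python
def identity_score(a, b):
    """Count matching positions: each match scores 1, each mismatch scores 0."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b))

print(identity_score("GATTACA", "GACTATA"))
```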
Why is identity scoring alone insufficient?
The score changes depending on how the sequences are shifted relative to each other, and identity counts alone give no principled way to choose between the competing alignments.
Why are gaps introduced in sequence alignments?
To reflect biological insertions and deletions and improve alignment quality.
Why must gaps be penalised?
Because excessive gaps can create biologically unrealistic alignments.
What is a gap opening penalty?
A large penalty applied when a gap is first introduced.
What is a gap extension penalty?
A smaller penalty applied when an existing gap is extended.
Why are fewer large gaps preferred over many small gaps?
Because they better reflect biological mutation processes.
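The effect of separate opening and extension penalties can be sketched as follows. The values 10 and 1 are illustrative assumptions, and the convention used here (the first gap position pays the opening penalty, each further position pays the extension penalty) is one of several in use; some tools charge opening plus extension for every position.

```python
def affine_gap_cost(length, gap_open=10, gap_extend=1):
    """Penalty for one gap of `length` positions: open once, extend for the rest."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)

one_long_gap = affine_gap_cost(3)          # a single 3-position gap
three_short_gaps = 3 * affine_gap_cost(1)  # three separate 1-position gaps
print(one_long_gap, three_short_gaps)
```

Both cases remove three positions, but the single long gap is much cheaper, which is how the scoring scheme favours fewer, larger gaps.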
What are transitions in nucleotide substitutions?
Substitutions between purines (A↔G) or between pyrimidines (C↔T).
What are transversions in nucleotide substitutions?
Substitutions between a purine and a pyrimidine.
Why should transitions be penalised less than transversions?
Because transitions occur more frequently in evolution.
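A small sketch of classifying a substitution and scoring it accordingly. The score values (match +1, transition -1, transversion -2) are assumptions chosen only to show transitions being penalised less; the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref, alt):
    """Return 'match', 'transition', or 'transversion' for two DNA bases."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        return "match"
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"

def nucleotide_score(ref, alt, match=1, transition=-1, transversion=-2):
    """Score a base pair, penalising transitions less than transversions."""
    scores = {"match": match, "transition": transition, "transversion": transversion}
    return scores[classify_substitution(ref, alt)]

print(classify_substitution("A", "G"), classify_substitution("A", "C"))
```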
Why are substitution matrices used instead of simple match/mismatch scoring?
Because different amino acid substitutions occur with different frequencies and have different biochemical impacts, which simple match/mismatch scoring cannot capture.
What are PAM matrices based on?
They are based on global alignments of closely related proteins and model how amino acids change over evolutionary time.
What is a limitation of PAM matrices?
Matrices for larger evolutionary distances are extrapolated from closely related sequences, so they can be inaccurate for distant relationships.
What are BLOSUM matrices based on?
Local, ungapped alignments of conserved blocks in related proteins, with substitution frequencies observed directly rather than extrapolated.
Why is BLOSUM62 commonly used?
It performs well for detecting local similarity in protein alignments.
What scoring patterns are typical in BLOSUM62?
Identical amino acids score highly, chemically similar substitutions score moderately, and rare substitutions score negatively.
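A few entries illustrate the pattern. The values below are a small hand-copied excerpt of the standard BLOSUM62 table and should be checked against a full copy of the matrix before being relied on; the dictionary and the helper `blosum62_excerpt_score` are hypothetical names for this sketch.

```python
# Hand-copied excerpt of BLOSUM62 (symmetric, so each pair is stored once).
BLOSUM62_EXCERPT = {
    ("W", "W"): 11,  # identical, rare amino acid: very high score
    ("A", "A"): 4,   # identical, common amino acid: positive score
    ("I", "L"): 2,   # chemically similar (both hydrophobic): moderate score
    ("D", "E"): 2,   # chemically similar (both acidic): moderate score
    ("D", "W"): -4,  # dissimilar, rarely substituted: negative score
    ("D", "L"): -4,  # dissimilar, rarely substituted: negative score
}

def blosum62_excerpt_score(a, b):
    """Look up a pair in the excerpt, in either order."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    if (b, a) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(b, a)]
    raise KeyError(f"pair {a}/{b} is not in this excerpt")

print(blosum62_excerpt_score("L", "I"), blosum62_excerpt_score("W", "D"))
```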
When should BLOSUM matrices be preferred over PAM matrices?
For local alignments and similarity searches.
What are the three core components of alignment scoring?
Matches, gap penalties, and substitution matrices.