bioinformatics lecture 3 (IMPORTANT)

47 Terms

1

Why are sequence alignments important in bioinformatics?

They are used to identify genes, infer evolutionary relationships, predict function, and detect disease-related variants.

2

Why does sequence similarity imply shared function or ancestry?

Because sequences are inherited from common ancestors and conserved through evolution, often preserving function.

3

Why were 3rd-generation sequencing technologies developed?

Because NGS produces short reads that make genome assembly difficult, while 3rd-generation methods produce long reads.

4

What is the key feature of PacBio SMRT sequencing?

It reads single DNA molecules in real time 🕰 and produces long reads with errors that can be corrected by consensus.

5

What is the key feature of Oxford Nanopore sequencing?

It reads single DNA molecules as they pass through a nanopore, detecting bases via changes in electrical current.

6

What are 4 major advantages of Oxford Nanopore sequencing?

Long reads, no PCR or cloning, portability, and usefulness in field-based and rapid sequencing.

7

Why do random errors in long-read sequencing become less problematic?

Because random errors average out when multiple reads are combined into a consensus sequence.
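
Note: a minimal sketch of how random errors average out, assuming the reads are already aligned and of equal length. The function name `consensus` and the toy reads are illustrative only; real pipelines also handle indels and quality scores.

```python
from collections import Counter

def consensus(reads):
    """Majority-vote consensus over equal-length, already-aligned reads."""
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be non-empty and of equal length")
    # At each column, keep the most frequent base.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

# Toy reads (made up): three of the five carry one random error each.
reads = [
    "ACGTTGCA",
    "ACGATGCA",  # error at index 3
    "ACGTTGGA",  # error at index 6
    "TCGTTGCA",  # error at index 0
    "ACGTTGCA",
]
print(consensus(reads))
```

Because each error sits at a different position, every column still has a clear majority and the errors drop out of the consensus.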

8

What is genome assembly?

The process of joining overlapping sequencing reads to reconstruct a genome sequence.
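
Note: a rough sketch of joining overlapping reads, using a greedy "merge the best overlap first" strategy. This strategy is an assumption chosen for brevity, not something the lecture names, and real assemblers are far more elaborate. The names `overlap` and `greedy_assemble` and the toy reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```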

9

What is genome alignment?

The process of aligning sequencing reads to a known reference genome.
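
Note: a toy illustration of placing a read on a known reference, assuming no gaps and a brute-force scan of every position. Real read aligners use indexes and allow indels; `map_read` and the sequences are made up for illustration.

```python
def map_read(read, reference):
    """Return (position, mismatches) of the best ungapped placement of read."""
    best_pos, best_mm = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for x, y in zip(read, window) if x != y)
        if mismatches < best_mm:
            best_pos, best_mm = pos, mismatches
    return best_pos, best_mm

reference = "ATGGCGTACCTGA"
read = "CGTTCCT"  # matches reference[4:11] except for one base
print(map_read(read, reference))
```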

10

Why is alignment generally preferred over assembly in humans?

Because a high-quality human reference genome exists, making alignment faster and more efficient.

11

What biological question does sequence comparison fundamentally address?

Whether sequences share evolutionary ancestry and functional similarity.

12

What are the three main types of sequence comparisons?

Pairwise alignment (one-to-one), database searching (one-to-many), and multiple sequence alignment (many-to-many).

13

What is global alignment best suited for?

Comparing sequences of similar length that are closely related across their entire length.
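
Note: the cards do not name an algorithm for global alignment. The sketch below assumes the standard Needleman-Wunsch dynamic programming approach with a simple linear gap penalty. The scores (+1, -1, -2) and the name `global_score` are illustrative choices.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best end-to-end alignment score of a and b (Needleman-Wunsch, linear gaps)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # The bottom-right cell covers both sequences in full.
    return score[-1][-1]

print(global_score("GATTACA", "GATTACA"))
print(global_score("ACGT", "AGT"))
```

Every position of both sequences has to be accounted for, which is why this suits sequences of similar length that are related across their entire length.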

14

What is local alignment best suited for?

Finding short regions of high similarity within otherwise dissimilar sequences.
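
Note: for local alignment the usual dynamic programming method is Smith-Waterman, assumed here because the cards do not name one. Scores are never allowed to drop below zero, so a short shared region can score well even when the flanking sequence is unrelated. The scores and the name `local_score` are illustrative.

```python
def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (Smith-Waterman, linear gaps)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# The shared core "GATTACA" is embedded in unrelated flanks.
print(local_score("TTTTGATTACATTTT", "CCCGATTACACCC"))
```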

15

Why can low global similarity still be biologically meaningful?

Because short conserved regions may indicate shared functional domains.

16

What is a dot plot?

A matrix-based visual method that compares every position in one sequence to every position in another.

17

What does a diagonal line in a dot plot represent?

Strong similarity between sequences.

18

What does a broken diagonal in a dot plot indicate?

Related sequences with mutations such as insertions or deletions.

19

What does a short diagonal or small square in a dot plot indicate?

Partial similarity between sequences.

20

What does the absence of diagonals in a dot plot indicate?

Unrelated sequences.

21

Why do dot plots contain noise?

Random matches occur by chance, especially in DNA sequences.

22

What is the approximate level of random matches in DNA vs protein dot plots?

DNA ~25% random matches; protein ~5% random matches.
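
Note: these figures follow from a simple calculation, assuming every letter of the alphabet is equally common and positions are independent (real sequences are biased, so treat the numbers as approximations). The name `chance_match_rate` is made up for this sketch.

```python
from fractions import Fraction

def chance_match_rate(alphabet_size):
    """Probability that two random residues match, assuming uniform frequencies."""
    p = Fraction(1, alphabet_size)
    # Sum over all letters of P(letter in seq 1) * P(same letter in seq 2).
    return alphabet_size * p * p

print(float(chance_match_rate(4)))   # DNA: 4 bases
print(float(chance_match_rate(20)))  # protein: 20 amino acids
```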

23

How can noise in dot plots be reduced?

By using a sliding window and requiring a minimum number of matches.
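
Note: a small sketch of a dot plot with window filtering. A dot is drawn at (i, j) only if at least `min_matches` of the `window` positions starting there agree. The function names and parameters are illustrative, not taken from any specific tool.

```python
def dot_plot(a, b, window=1, min_matches=1):
    """Boolean matrix: True where the windows starting at a[i] and b[j] agree enough."""
    rows = len(a) - window + 1
    cols = len(b) - window + 1
    plot = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            row.append(matches >= min_matches)
        plot.append(row)
    return plot

def render(plot):
    """Draw the matrix as text, '*' for a dot and '.' for empty."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in plot)

seq = "GATTACA"
raw = dot_plot(seq, seq)                                # every single-base match
filtered = dot_plot(seq, seq, window=3, min_matches=3)  # only 3-in-a-row matches
print(render(raw))
print()
print(render(filtered))
```

In the unfiltered plot the repeated A and T bases create off-diagonal dots; with a window of 3 only the main diagonal remains.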

24

What types of biological features influence window size choice in dot plots?

Exon size, protein domains, enzyme active sites, and promoters.

25

What types of mutations can be detected using dot plots?

Substitutions, insertions and deletions (indels), duplications, inversions, and translocations.

26

How do insertions or deletions appear on dot plots?

As shifts or breaks in the diagonal.

27

Why are amino acid alignments more sensitive than nucleotide alignments?

Because the genetic code is degenerate and proteins better reflect conserved function.

28

Why are protein alignments preferred for distantly related species?

Protein sequences retain functional conservation even when DNA sequences diverge.

29

What is simple identity scoring in sequence alignment?

Matches score 1, mismatches score 0, and the total score is the number of matches.
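
Note: a minimal sketch of identity scoring without gaps. The `shift` parameter (an illustrative addition) slides the second sequence to the right, which shows how the same pair of sequences can score very differently depending on how they are lined up.

```python
def identity_score(a, b, shift=0):
    """Count matching positions when b is slid `shift` places to the right of a."""
    score = 0
    for i, residue in enumerate(a):
        j = i - shift  # index in b that sits under a[i]
        if 0 <= j < len(b) and residue == b[j]:
            score += 1  # match scores 1, mismatch scores 0
    return score

a = "ACGTACGT"
b = "CGTACGTA"
print(identity_score(a, b, shift=0))
print(identity_score(a, b, shift=1))
```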

30

Why is identity scoring alone insufficient?

The score changes depending on how the sequences are shifted relative to each other, so identity alone cannot unambiguously pick the best alignment.

31

Why are gaps introduced in sequence alignments?

To reflect biological insertions and deletions and improve alignment quality.

32

Why must gaps be penalised?

Because excessive gaps can create biologically unrealistic alignments.

33

What is a gap opening penalty?

A large penalty applied when a gap is first introduced.

34

What is a gap extension penalty?

A smaller penalty applied when an existing gap is extended.
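
Note: opening and extension penalties together form what is usually called an affine gap penalty. The sketch assumes the convention that the opening penalty covers the first gap position and each further position pays the extension penalty. Tools differ on this convention, and the values 10 and 1 are only illustrative.

```python
def affine_gap_penalty(length, gap_open=10, gap_extend=1):
    """Total penalty for one gap of the given length."""
    if length <= 0:
        return 0
    return gap_open + (length - 1) * gap_extend

one_long_gap = affine_gap_penalty(3)
three_short_gaps = 3 * affine_gap_penalty(1)
print(one_long_gap, three_short_gaps)
```

With these values one gap of length 3 costs 12 while three separate gaps of length 1 cost 30, so the scoring favours fewer, longer gaps.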

35

Why are fewer large gaps preferred over many small gaps?

Because they better reflect biological mutation processes.

36

What are transitions in nucleotide substitutions?

Substitutions between purines (A↔G) or between pyrimidines (C↔T).

37

What are transversions in nucleotide substitutions?

Substitutions between a purine and a pyrimidine.

38

Why should transitions be penalised less than transversions?

Because transitions occur more frequently in evolution.
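
Note: a short sketch of scoring that penalises transitions less than transversions. The penalty values (-1 and -2) and the function names are assumptions for illustration; only the purine/pyrimidine classification comes from the cards.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(x, y):
    """Label a pair of aligned bases as match, transition, or transversion."""
    if x not in PURINES | PYRIMIDINES or y not in PURINES | PYRIMIDINES:
        raise ValueError("expected bases A, C, G or T")
    if x == y:
        return "match"
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"  # purine<->purine or pyrimidine<->pyrimidine
    return "transversion"    # purine<->pyrimidine

SCORES = {"match": 1, "transition": -1, "transversion": -2}  # illustrative values

def ungapped_score(a, b):
    """Sum the per-column scores of two equal-length aligned sequences."""
    return sum(SCORES[classify(x, y)] for x, y in zip(a, b))

print(classify("A", "G"), classify("A", "C"))
print(ungapped_score("ACGT", "GCTT"))
```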

39

Why are substitution matrices used instead of simple match/mismatch scoring?

Because different amino acid substitutions occur with different frequencies and have different biochemical impacts, which simple match/mismatch scoring cannot capture.

40

What are PAM matrices based on?

They are based on global alignments of closely related proteins and model how amino acids change over evolutionary time.

41

What is a limitation of PAM matrices?

Matrices for larger evolutionary distances are extrapolated from closely related proteins, so they can be inaccurate for distant relationships.

42

What are BLOSUM matrices based on?

Conserved local blocks of related protein regions, with substitution frequencies observed directly rather than extrapolated.

43

Why is BLOSUM62 commonly used?

It performs well for detecting local similarity in protein alignments.

44

What scoring patterns are typical in BLOSUM62?

Identical amino acids score highly, chemically similar substitutions score moderately, and rare substitutions score negatively.
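
Note: the pattern can be seen in a handful of entries. The values below are a small excerpt of BLOSUM62 entered by hand, so check them against an authoritative copy of the matrix before relying on them. The names `BLOSUM62_EXCERPT` and `pair_score` are hypothetical.

```python
# Hand-entered excerpt of BLOSUM62 (symmetric, so each pair is stored once).
BLOSUM62_EXCERPT = {
    ("W", "W"): 11,  # identical, rare residue: high score
    ("L", "L"): 4,   # identical, common residue
    ("I", "L"): 2,   # both hydrophobic: moderate score
    ("D", "E"): 2,   # both acidic
    ("K", "R"): 2,   # both basic
    ("D", "L"): -4,  # acidic vs hydrophobic: negative score
}

def pair_score(x, y):
    """Look up a pair in either order; raise if it is not in the excerpt."""
    for key in ((x, y), (y, x)):
        if key in BLOSUM62_EXCERPT:
            return BLOSUM62_EXCERPT[key]
    raise KeyError(f"{x}/{y} is not in this excerpt")

for pair in [("W", "W"), ("L", "I"), ("L", "D")]:
    print(pair, pair_score(*pair))
```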

45

When should BLOSUM matrices be preferred over PAM matrices?

For local alignments and similarity searches.

46

What are the three core components of alignment scoring?

Matches, gap penalties, and substitution matrices.

47