Alignment and Assembly

41 Terms

1

Sequence alignment

The process of arranging DNA, RNA, or protein sequences to identify regions of similarity

2

Importance of alignment in NGS

Reads are short and fragmented; alignment maps them to a reference genome for interpretation

3

Downstream analyses requiring alignment

Variant calling, gene annotation, gene expression analysis

4

Global alignment

Aligns entire sequences end-to-end (e.g., Needleman–Wunsch)
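As a rough illustration of global alignment, the sketch below fills a Needleman–Wunsch score matrix and returns the optimal end-to-end score. The function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example, not values from the card set.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)

    # The bottom-right cell covers both sequences in full.
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "ACT"))
```

Because the score is read from the bottom-right cell, every base of both sequences has to be accounted for, which is what makes the alignment global.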

5

Local alignment

Aligns best matching subsequences (e.g., Smith–Waterman)
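A minimal sketch of local alignment in the Smith–Waterman style follows. It differs from a global alignment in two ways: cell scores are never allowed to drop below zero, and the result is the best cell anywhere in the matrix. The function name and scoring values (match +2, mismatch -1, gap -2) are assumptions for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            # Clamping at 0 lets an alignment restart anywhere.
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])

    return best


print(smith_waterman_score("TTTACGTTT", "GGACGGG"))
```

Here the two sequences share only the subsequence ACG, so the best local score comes from that region alone and the unrelated flanks are ignored.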

6

Example of local alignment tool

BLAST (Basic Local Alignment Search Tool)

7

Dot plot purpose

Visualises all possible alignments; diagonals indicate similarity
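A text-mode dot plot can be sketched in a few lines. The helper name `dot_plot` is hypothetical; it marks every position pair where the two sequences carry the same character, so runs of similarity show up as diagonals of `*`.

```python
def dot_plot(seq_x, seq_y):
    """Return one text row per character of seq_y.

    A '*' at row i, column j means seq_y[i] == seq_x[j]; '.' means no match.
    """
    return ["".join("*" if y == x else "." for x in seq_x) for y in seq_y]


for row in dot_plot("ACGTAC", "ACGTAC"):
    print(row)
```

Comparing a sequence with itself gives an unbroken main diagonal; shorter off-diagonal runs in such a plot point to repeated segments.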

8

Dynamic programming in alignment

Computes optimal scores using match, mismatch, and gap penalties

9

Indexing in alignment

Preprocessing the reference genome to speed up sequence searching
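One simple way to picture indexing is a k-mer lookup table: every length-k substring of the reference is stored with its positions once, and each read is then located by a dictionary lookup instead of a scan. This is a sketch under that assumption; widely used short-read aligners such as BWA and Bowtie rely on a more compact Burrows–Wheeler (FM) index instead. The function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_seed_hits(read, index, k):
    """Return candidate reference positions for the read, seeded by its first k-mer."""
    return index.get(read[:k], [])


index = build_kmer_index("ACGTACGTGA", k=3)
print(find_seed_hits("GTGA", index, k=3))
```

The index is built once per reference and reused for every read, which is where the speed-up comes from. Each seed hit would still need to be verified by aligning the rest of the read.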

10

Features of a high-quality alignment

Many matches, few mismatches, few gaps

11

SAM acronym

Sequence Alignment/Map

12

What a SAM file contains

Information on where and how each read aligns to the reference genome
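To make the "where and how" concrete, the sketch below splits one SAM alignment line into its 11 mandatory tab-separated fields and decodes two FLAG bits. The example record is invented for illustration, and `parse_sam_line` is a hypothetical helper rather than part of any library.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Parse the 11 mandatory fields of a SAM alignment line into a dict."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    # FLAG is a bit field: 0x4 = read unmapped, 0x10 = read on the reverse strand.
    record["is_unmapped"] = bool(record["FLAG"] & 0x4)
    record["is_reverse"] = bool(record["FLAG"] & 0x10)
    return record


# Invented example record: an 8-base read aligned to chr1 at position 100.
example = "read1\t16\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
rec = parse_sam_line(example)
print(rec["RNAME"], rec["POS"], rec["CIGAR"], rec["is_reverse"])
```

RNAME and POS give where the read aligns; CIGAR, FLAG and MAPQ describe how it aligns and how confident the aligner is in that placement.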

13

Why BAM files are used

They are binary, compressed versions of SAM files, so they save space and can be indexed for fast access

14

Examples of analyses using SAM/BAM

Mutation detection, genome assembly, gene expression studies

15

Genome assembly

Reconstructing a genome from short reads

16

De novo assembly

Building a genome from scratch without a reference

17

Reference-guided assembly

Aligning reads to an existing reference genome to build the assembly; faster and less computationally intensive than de novo assembly

18

Reads definition

Short DNA fragments produced by a sequencer

19

Contigs definition

Contiguous sequences formed from overlapping reads

20

Scaffolds definition

Ordered and oriented groups of contigs

21

DNA features making assembly difficult

High GC content and repeat regions

22

Recommended read depth for bacterial assembly

About 50×
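Read depth can be estimated as (number of reads × read length) / genome size. The sketch below applies that formula in both directions; the 5 Mb genome and 150 bp read length are assumed example values, not figures from the card set.

```python
import math


def mean_depth(num_reads, read_length, genome_size):
    """Expected average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_depth, read_length, genome_size):
    """Smallest number of reads that reaches the target average depth."""
    return math.ceil(target_depth * genome_size / read_length)


# Assumed example: a 5 Mb bacterial genome sequenced with 150 bp reads.
print(mean_depth(1_000_000, 150, 5_000_000))
print(reads_needed(50, 150, 5_000_000))
```

This gives an average only. Depth varies along a real genome, so some regions will sit well below the mean.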

23

Solution to poor coverage

Sequence the sample again (or more deeply) to increase read depth

24

Hybrid assembly

Combining short and long reads for improved assembly

25

N50 definition

Length of the shortest contig in the set of longest contigs that together cover at least 50% of the total assembly length

26

L50 definition

Smallest number of contigs, counted from the longest down, whose combined length makes up at least 50% of the total assembly length
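N50 and L50 come out of the same calculation: sort the contigs from longest to shortest and add up lengths until half of the total is reached. The sketch below assumes the total assembly length stands in for genome size, and the contig lengths are made-up example values.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            # length is the N50; count is the L50.
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Made-up contig lengths (total 290, so half is 145).
print(n50_l50([80, 70, 50, 40, 30, 20]))
```

With these lengths the two longest contigs (80 + 70 = 150) already cross the halfway mark, so N50 is 70 and L50 is 2. A higher N50 and a lower L50 both indicate a more contiguous assembly.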

27

Largest contig size

Longest continuous assembled DNA sequence

28

Total assembly length expectation

Should approximate the organism’s true genome size

29

Illumina read length

50–600 bp

30

Nanopore read length

Kilobases to megabases

31

Why Illumina requires fragmentation

Read length is limited by the number of sequencing cycles, so DNA must be broken into short fragments before sequencing

32

Advantage of Nanopore sequencing

Real-time long-read sequencing

33

Why long reads help assembly

They span repetitive regions more easily

34

Bioinformatics pipeline

A sequence of connected steps that transform input data into results

35

Example hybrid assembly tools

Unicycler and Hybracter

36

Three phases of the Human Genome Project

Mapping, sequencing, bioinformatics

37

Mapping techniques used in HGP

Restriction mapping, FISH, linkage maps, BAC/YAC libraries

38

2001 draft human genome

Covered 83% of the genome; error rate <1/1000

39

2003 finished genome

Covered 99% of gene-containing regions; error rate ~1/20,000

40

Information gained from alignment to a reference

Variants, structural differences, expression levels, mapping quality

41

Three alignment methods

Dot plots, dynamic programming, BLAST