Sequence alignment
The process of arranging DNA, RNA, or protein sequences to identify regions of similarity
Importance of alignment in NGS
Reads are short and fragmented; alignment maps them to a reference genome for interpretation
Downstream analyses requiring alignment
Variant calling, gene annotation, gene expression analysis
Global alignment
Aligns entire sequences end-to-end (e.g., Needleman–Wunsch)
Local alignment
Aligns best matching subsequences (e.g., Smith–Waterman)
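To make the idea of "best matching subsequences" concrete, here is a minimal sketch of Smith–Waterman scoring. The scoring values (match +2, mismatch -1, gap -2) and the function name are illustrative assumptions, not values from the deck. It returns only the best score, not the aligned strings.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # The floor of 0 lets an alignment restart anywhere: this is
            # what makes the alignment local rather than global.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


# The two sequences share only the substring "ACGT" (4 matches x 2 = 8).
print(smith_waterman_score("TTTACGTAAA", "GGACGTCC"))
```

Because scores never drop below zero, unrelated flanking sequence does not penalise a strong internal match.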
Example of local alignment tool
BLAST
Dot plot purpose
Visualises all possible alignments; diagonals indicate similarity
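A dot plot can be sketched as a simple match matrix. The helper names below are hypothetical, and real dot plot tools usually compare short windows rather than single bases to reduce noise.

```python
def dot_plot(a, b):
    """Return a matrix where cell [i][j] is True when a[i] == b[j]."""
    return [[a[i] == b[j] for j in range(len(b))] for i in range(len(a))]


def render(matrix):
    """Draw the matrix as text: '*' for a match, '.' for no match."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


# Identical sequences give an unbroken main diagonal.
print(render(dot_plot("ACG", "ACG")))
```

Runs of `*` along a diagonal mark stretches where the two sequences agree; a shifted diagonal suggests an insertion or deletion.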
Dynamic programming in alignment
Computes optimal scores using match, mismatch, and gap penalties
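As an illustration of dynamic programming with match, mismatch, and gap penalties, here is a minimal sketch of Needleman–Wunsch global scoring. The penalty values (+1, -1, -2) and the function name are assumptions chosen for the example. It computes only the optimal score and omits traceback.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global (end-to-end) alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per base.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = F[i - 1][j] + gap      # gap in b
            left = F[i][j - 1] + gap    # gap in a
            F[i][j] = max(diag, up, left)
    # The bottom-right cell covers both sequences in full.
    return F[-1][-1]


# ACGT vs AC-T: three matches (+3) and one gap (-2).
print(needleman_wunsch_score("ACGT", "ACT"))
```

Each cell reuses the three neighbouring cells already computed, so the table is filled in time proportional to the product of the two sequence lengths.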
Indexing in alignment
Preprocessing the reference genome to speed up sequence searching
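A toy sketch of indexing, assuming a simple k-mer hash table and exact matching only. Production aligners such as BWA and Bowtie 2 use a compressed FM-index instead; the function names and the example reference here are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_exact(read, reference, index, k):
    """Look up the read's first k-mer, then verify the full read at each hit."""
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


reference = "ACGTACGTTAGC"          # illustrative reference
index = build_kmer_index(reference, k=3)   # built once, reused for every read
print(find_exact("ACGTT", reference, index, k=3))
```

The index is built once per reference, so each read only checks the handful of positions that share its seed instead of scanning the whole genome.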
Features of a high-quality alignment
Many matches, few mismatches, few gaps
SAM acronym
Sequence Alignment/Map
What a SAM file contains
Information on where and how each read aligns to the reference genome
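A minimal sketch of reading one SAM alignment line. It relies on the 11 mandatory tab-separated fields of the SAM format (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL); the example record and the dictionary key names are made up for illustration.

```python
def parse_sam_line(line):
    """Extract where and how a read aligned from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    return {
        "read_name": fields[0],
        "flag": flag,
        "reference": fields[2],
        "position": int(fields[3]),         # 1-based leftmost position
        "mapping_quality": int(fields[4]),
        "cigar": fields[5],                 # e.g. 8M = 8 aligned bases
        "sequence": fields[9],
        "is_unmapped": bool(flag & 0x4),    # FLAG bit 0x4: read unmapped
        "is_reverse": bool(flag & 0x10),    # FLAG bit 0x10: reverse strand
    }


# Hypothetical record: read1 maps to chr1 at position 100 on the reverse strand.
example = "read1\t16\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
print(parse_sam_line(example))
```

Header lines (those starting with `@`) are not handled here; for real files a dedicated library is a safer choice than hand parsing.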
Why BAM files are used
BAM is the binary, compressed form of SAM; it takes less disk space and can be indexed for fast access
Examples of analyses using SAM/BAM
Mutation detection, genome assembly, gene expression studies
Genome assembly
Reconstructing a genome from short reads
De novo assembly
Building a genome from scratch without a reference
Reference-guided assembly
Aligning reads to an existing reference genome to guide reconstruction; faster and less computationally intensive than de novo assembly
Reads definition
Short DNA fragments produced by a sequencer
Contigs definition
Contiguous sequences formed from overlapping reads
Scaffolds definition
Ordered and oriented groups of contigs
DNA features making assembly difficult
High GC content and repeat regions
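GC content is the fraction of bases that are G or C. A minimal sketch, with a hypothetical function name:

```python
def gc_content(sequence):
    """Return the fraction of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


print(gc_content("ATGC"))  # two of the four bases are G or C
```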
Recommended read depth for bacterial assembly
About 50×
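Mean depth is usually estimated as (number of reads × read length) / genome size. The sketch below applies that relation; the 5 Mb genome size and 150 bp read length are assumed example values, not figures from the deck.

```python
import math


def mean_depth(num_reads, read_length, genome_size):
    """Estimated mean coverage: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_depth, read_length, genome_size):
    """Number of reads required to reach a target mean depth."""
    return math.ceil(target_depth * genome_size / read_length)


# Assumed example: 5 Mb bacterial genome, 150 bp reads, 50x target.
print(reads_needed(50, 150, 5_000_000))
print(mean_depth(1_000_000, 150, 5_000_000))
```

This is an average only; actual depth varies along the genome, which is why difficult regions can still be poorly covered.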
Solution to poor coverage
Repeat sequencing to increase depth
Hybrid assembly
Combining short and long reads for improved assembly
N50 definition
Length of the shortest contig in the smallest set of longest contigs that together cover at least 50% of the total assembly length
L50 definition
Smallest number of contigs, counted from longest to shortest, whose combined length reaches at least 50% of the total assembly length
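N50 and L50 come from the same calculation: sort contigs from longest to shortest and add them up until half of the total assembly length is reached. A minimal sketch, with a hypothetical function name and made-up contig lengths:

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    ordered = sorted(contig_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # length = shortest contig needed to reach 50% (N50)
            # count  = how many contigs it took (L50)
            return length, count
    raise ValueError("no contigs supplied")


# Total is 300, so half is 150; 80 + 70 reaches it after two contigs.
print(n50_l50([80, 70, 50, 40, 30, 20, 10]))
```

A higher N50 together with a lower L50 generally indicates a more contiguous assembly.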
Largest contig size
Longest continuous assembled DNA sequence
Total assembly length expectation
Should approximate the organism’s true genome size
Illumina read length
50–600 bp
Nanopore read length
Kilobases to megabases
Why Illumina requires fragmentation
Read length is limited by the number of sequencing cycles
Advantage of Nanopore sequencing
Real-time long-read sequencing
Why long reads help assembly
They span repetitive regions more easily
Bioinformatics pipeline
A sequence of connected steps that transform input data into results
Example hybrid assembly tools
Unicycler and Hybracter
Three phases of the Human Genome Project
Mapping, sequencing, bioinformatics
Mapping techniques used in HGP
Restriction mapping, FISH, linkage maps, BAC/YAC libraries
2001 draft human genome
Covered 83% of the genome; error rate <1/1000
2003 finished genome
Covered 99% of gene-containing regions; error rate ~1/20,000
Information gained from alignment to a reference
Variants, structural differences, expression levels, mapping quality
Three alignment methods
Dot plots, dynamic programming, BLAST