Quality parameters to check for

Last updated 6:54 PM on 3/14/26
11 Terms

1

Obtaining reads

The genome is fragmented before sequencing, and each fragment results in a sequencing read.

2

FastQC

Tool used for assessing the quality of high-throughput sequencing data. Its primary purpose is to provide a comprehensive report on various quality metrics, allowing users to identify potential issues in the sequencing data.

3

Components of a FastQC report

  1. Per base sequence quality

  2. Per base sequence content

  3. Per sequence GC content

  4. Adapter content

  5. Overrepresented sequences

4

Per base sequence quality

Shows an overview of the range of quality values across all bases at each position in the FASTQ file. The y-axis shows the quality scores. The higher the score, the better the base call. The background of the graph divides the y-axis into very good quality scores (green), scores of reasonable quality (orange) and scores of poor quality (red).
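
As a rough illustration of what this plot summarizes, the sketch below decodes FASTQ quality strings and takes the median score at each read position. It assumes Phred+33 encoding. The function names and the 28/20 cut-offs used for the colour bands are assumptions for illustration, not taken from the card.

```python
from statistics import median

def phred_scores(quality_line, offset=33):
    """Decode one FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]

def per_base_median_quality(quality_lines):
    """Median Phred score at each read position across all reads."""
    columns = {}
    for line in quality_lines:
        for pos, q in enumerate(phred_scores(line)):
            columns.setdefault(pos, []).append(q)
    return [median(columns[pos]) for pos in sorted(columns)]

def zone(q, good=28, reasonable=20):
    """Map a score to a colour band; the cut-offs are illustrative assumptions."""
    if q >= good:
        return "green"
    if q >= reasonable:
        return "orange"
    return "red"

# Three toy quality strings whose last position is clearly worse.
quals = ["IIII5", "III5#", "IIII+"]
medians = per_base_median_quality(quals)
zones = [zone(q) for q in medians]
```

Here the final position would fall in the red band, which is the typical pattern of quality dropping toward the end of reads.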

5

Per base sequence content

Plots the % of each of the 4 nucleotides at each position across all reads in the input sequence file.

In a random library you would expect little to no difference between the bases at different positions of a sequencing run, so the lines in this plot should run parallel with each other. If you see strong biases that change from position to position, this usually indicates an overrepresented sequence that is contaminating the library. A bias that is consistent across all positions indicates either that the original library was sequence biased or that there was a systematic problem during the sequencing of the library.
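
A minimal sketch of how the values behind this plot could be computed, assuming the reads are already loaded as plain strings. The function name is hypothetical, and bases other than A, C, G and T (such as N) are simply left out of the percentages.

```python
from collections import Counter

def per_base_content(reads):
    """Percentage of A, C, G and T at each position across all reads."""
    length = max(len(r) for r in reads)
    table = []
    for pos in range(length):
        # Count the base seen at this position in every read long enough to have one.
        counts = Counter(r[pos] for r in reads if pos < len(r))
        total = sum(counts[b] for b in "ACGT")
        table.append(
            {b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return table

reads = ["ACGT", "ACGA", "TCGT", "GCGC"]
content = per_base_content(reads)
```

In this toy input position 1 is 100% C, the kind of strong position-specific bias the card describes. In an unbiased library the four percentages would stay roughly constant from position to position.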

6

Good per base sequence content report

The percentage of each nucleotide at every position across all reads is shown.

The four lines should run parallel.

The proportions should be relatively constant across the reads. (A=T and G=C for most genomes).

Skewed base composition = the library was likely generated using random primers; the bias typically appears in the first bases of each read. The adapter/priming sequences contribute non-random bases.

7

Per sequence GC content

Gives the GC distribution over all sequences. Good: the GC content of the central peak corresponds to the expected %GC for the organism. The distribution should be normal unless there is an overrepresented sequence or contamination with another organism.

If the observed distribution does not match the theoretical distribution, sharp peaks point to some type of overrepresented sequence, indicating either contamination or a highly overexpressed gene.
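
The distribution described here can be sketched by computing the GC percentage of each read and counting how many reads fall at each value. This is a simplified illustration with hypothetical function names. It does not fit the theoretical normal curve that the report overlays.

```python
from collections import Counter

def gc_percent(read):
    """GC content of a single read, as a percentage of its length."""
    gc = sum(1 for base in read.upper() if base in "GC")
    return 100.0 * gc / len(read)

def gc_histogram(reads):
    """Number of reads at each GC percentage, rounded to a whole number."""
    return Counter(round(gc_percent(r)) for r in reads if r)

reads = ["ACGT", "GGCC", "AATT", "ACGA"]
hist = gc_histogram(reads)
```

Plotting `hist` for a real dataset would be expected to give a single bell-shaped peak. A second peak or a sharp spike is the signal to look at the overrepresented sequences table.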

8

Good per sequence GC content report

Normal, bell-shaped curve. GC content forms a smooth Gaussian distribution. Real sequencing datasets from unbiased libraries typically show a single symmetric peak.

Observed GC content matches the theoretical curve, meaning no contamination, no enrichment for GC-rich or AT-rich regions and no strong overrepresented sequences. No major primer or adapter influence.

The maximum of the peak is around the expected GC% for the organism (~40-50%). A peak shifted left or right could indicate possible contamination or library bias.

9

Adapter content

Plot shows cumulative % of reads with the different adapter sequences at each position. Once an adapter sequence is seen in a read it is counted as being present right through to the end of the read so the percentage increases with read length.
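
The cumulative counting rule can be sketched as follows, assuming equal-length reads and a single adapter matched exactly. The short adapter string `AGAT` is a toy placeholder rather than a real adapter, and counting from the first base of the match is a simplification for this example.

```python
def adapter_content(reads, adapter):
    """Cumulative % of reads in which `adapter` has been seen by each position."""
    length = max(len(r) for r in reads)
    seen = [0] * length
    for read in reads:
        start = read.find(adapter)
        if start != -1:
            # Once found, the adapter counts as present through to the end of the read.
            for pos in range(start, length):
                seen[pos] += 1
    return [100.0 * n / len(reads) for n in seen]

reads = ["ACGTACGTAC", "ACGTACAGAT", "ACAGATCGGA", "CCCCCCCCCC"]
curve = adapter_content(reads, "AGAT")
```

Because a read is never "uncounted" once its adapter has been seen, the curve can only stay flat or rise along the read.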

10

Overrepresented sequences

Displays the sequences that occur in more than 0.1% of the total number of sequences.

The table aids in identifying contamination (if %GC was not ideal, the table helps identify the source).

A normal high-throughput library will contain a diverse set of sequences, with no individual sequence making up more than a tiny fraction of the whole. Finding that a single sequence is very overrepresented in the set either means that it is highly biologically significant or indicates that the library is contaminated or not as diverse as you expected.
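
A simplified sketch of the 0.1% rule, assuming exact-match counting over all reads held in memory. The function name and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import islice, product

def overrepresented(reads, threshold_pct=0.1):
    """Sequences making up more than `threshold_pct` percent of all reads."""
    counts = Counter(reads)
    total = len(reads)
    hits = [
        (seq, n, 100.0 * n / total)
        for seq, n in counts.items()
        if 100.0 * n / total > threshold_pct
    ]
    # Most frequent sequence first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# 1995 distinct toy 6-mers plus one sequence repeated 5 times (2000 reads in total).
unique_reads = ["".join(p) for p in islice(product("ACGT", repeat=6), 1995)]
reads = unique_reads + ["GATCGGAAGAGC"] * 5
hits = overrepresented(reads)
```

Each distinct 6-mer accounts for 0.05% of the reads and stays below the threshold, while the repeated sequence reaches 0.25% and is reported.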

11

Quality profiling of raw sequencing data

When the median quality is below a Phred score of 20, we should consider trimming away bad-quality bases from the sequence. Quality control and preprocessing of raw FASTQ files is critical, especially for degraded samples, and involves removing adapter sequences, filtering low-quality or low-complexity reads, error correction, etc.

Sequence-matching-based tools such as Trimmomatic, Cutadapt and SOAPnuke can be employed as adapter trimmers and can also perform quality filtering.
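
To make the Phred 20 threshold concrete: a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means roughly a 1 in 100 chance that the base call is wrong. The sketch below shows a naive 3' quality trim and an exact-match adapter trim. It assumes Phred+33 encoding, uses hypothetical function names, and does not reproduce the algorithms used by Trimmomatic, Cutadapt or SOAPnuke.

```python
def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop trailing bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]

# The last two bases have scores 10 and 2, so they are removed.
trimmed = trim_3prime("ACGTACGT", "IIIII5+#")
# Toy adapter string, used only for illustration.
clipped = trim_adapter("ACGTAGATCGGAAG", "I" * 14, "AGATCGG")
```

Real trimmers tolerate mismatches and partial adapter matches at the read end, which this exact-match version does not.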
