Bioinformatics Lecture Review

flashcard set

Description and Tags

Comprehensive vocabulary flashcards covering bioinformatics topics including sequencing technologies, assembly, alignment algorithms, statistics, phylogeny, and transcriptomics.

Last updated 3:54 PM on 4/30/26

34 Terms

1

Illumina Sequencing Errors

A characteristic where read quality typically decreases towards the end of each sequence read.

2

Metatranscriptomics Strategy

Extracting RNA, fragmenting, sequencing, and using reads to assemble all transcripts in a sample to identify expressed genes, especially from unknown bacteria.

3

Nanopore Sequencing

A sequencing method recommended for marker genes like 16S rRNA because it provides long reads helpful for determining gene copy numbers in prokaryotic genomes.

4

Read Pair Calculation

To sequence an E. coli genome of 5 × 10^6 bp with paired-end 150 bp reads for a depth of 30, the required number of read pairs is 500,000: (5 × 10^6 bp × 30) / (2 × 150 bp).
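
The arithmetic on this card can be sketched in a few lines of Python. The function name and parameters are illustrative, and the sketch assumes every base of both reads in a pair counts toward depth.

```python
import math

def read_pairs_needed(genome_size_bp, read_length_bp, depth):
    """Read pairs needed to reach a target average sequencing depth."""
    bases_needed = genome_size_bp * depth
    bases_per_pair = 2 * read_length_bp  # a pair contributes two reads
    return math.ceil(bases_needed / bases_per_pair)

print(read_pairs_needed(5_000_000, 150, 30))  # 500000
```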

5

Read Coverage Probability

In a circular genome of 3,000,000 bp, the probability that a random read of length 150 bp covers a specific position is 150/3,000,000.
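
A small sketch of this probability follows. The second function is an extension not on the card: it assumes reads are placed independently and uniformly, and gives the chance that a position is missed by all reads. Function names are illustrative.

```python
def p_read_covers_position(read_length_bp, genome_size_bp):
    """On a circular genome, read_length of the genome_size possible
    start positions produce a read that covers a given site."""
    return read_length_bp / genome_size_bp

def p_position_uncovered(read_length_bp, genome_size_bp, n_reads):
    """Assumption: reads land independently and uniformly at random."""
    p = p_read_covers_position(read_length_bp, genome_size_bp)
    return (1 - p) ** n_reads

print(p_read_covers_position(150, 3_000_000))
print(p_position_uncovered(150, 3_000_000, 100_000))
```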

6

Assembly Mapping Disparity

If a region has double the expected mapping depth (e.g., 200 reads vs. an average of 100), it indicates the region occurs in twice as many copies in the sequenced genome as in the reference.

7

Contig

A segment of the genome that has been assembled from overlapping sequence reads.

8

N50 Value

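A statistical measure of genome assembly quality: the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. For contig lengths of 100, 200, 300, 400, 500, 600, and 700 (total 2,800), the N50 value is 500, because 700 + 600 + 500 = 1,800 is the first running total to reach 1,400.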
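
The N50 calculation can be sketched as below. The function name is illustrative, and the sketch uses the common definition in which contigs are summed from longest to shortest until half of the total length is reached.

```python
def n50(contig_lengths):
    """Contig length at which the running total (longest first)
    reaches at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")

print(n50([100, 200, 300, 400, 500, 600, 700]))  # 500
```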

9

Hash Table

A data structure that maps keys to stored elements through a hash function, optimized for the fastest possible retrieval of a stored element (near constant time on average).
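
As an illustration, Python's built-in dict is a hash table. The sketch below uses one to index the start positions of every k-mer in a sequence, a typical bioinformatics use; the function name and the example sequence are hypothetical.

```python
from collections import defaultdict

def build_kmer_index(sequence, k):
    """Map each k-mer to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index

index = build_kmer_index("ACGTACGT", 3)
print(index["ACG"])  # [0, 4]
```

Looking up a k-mer does not require scanning the sequence again, which is why such indexes are used to find seed matches quickly.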

10

Computational Complexity O(N^3)

An algorithm property where doubling the problem size N results in an eightfold (2^3) increase in processing time.
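
The scaling can be checked with a one-line sketch. It assumes running time is dominated by the leading polynomial term, and the function name is illustrative.

```python
def runtime_growth_factor(size_factor, exponent):
    """Factor by which running time grows when the problem size is
    multiplied by size_factor, for an O(N^exponent) algorithm."""
    return size_factor ** exponent

print(runtime_growth_factor(2, 3))  # 8
```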

11

Sensitivity (Homology Search)

The ratio of correctly identified homologs to the total number of true homologs in the database (e.g., 35/50).

12

Specificity (Homology Search)

The ratio of correctly identified non-homologs to the total number of non-homologs in the database (e.g., 945/950).
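
Sensitivity and specificity can be computed from the four outcome counts of a search. The sketch below assumes the two example ratios describe the same database of 1,000 sequences (50 homologs, 950 non-homologs), which the cards do not state explicitly.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of true homologs that the search found."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Fraction of non-homologs that the search correctly rejected."""
    return true_negatives / (true_negatives + false_positives)

print(sensitivity(35, 15))   # 35 of 50 homologs found
print(specificity(945, 5))   # 945 of 950 non-homologs rejected
```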

13

BLAST Word Length

A parameter where increasing the length results in fewer total hits within the database.
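
The effect of word length can be illustrated by counting exact word matches between two sequences. This is only a sketch of the seeding idea, not BLAST's implementation (protein BLAST, for instance, also scores neighborhood words); the sequences and the function name are hypothetical.

```python
def count_word_hits(query, subject, word_length):
    """Count position pairs (i, j) where query and subject share
    an identical word of the given length."""
    hits = 0
    for i in range(len(query) - word_length + 1):
        word = query[i:i + word_length]
        for j in range(len(subject) - word_length + 1):
            if subject[j:j + word_length] == word:
                hits += 1
    return hits

query = "ACGTACGTGA"         # hypothetical sequences
subject = "TTACGTACGAACGT"
for w in (2, 3, 4, 5):
    print(w, count_word_hits(query, subject, w))
```

Every match of length w + 1 contains a match of length w at the same positions, so the hit count can only stay the same or drop as the word length grows.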

14

Affine Gap Penalty

A scoring system that applies a higher penalty for initiating a gap than for extending an existing one.
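
A minimal sketch of an affine gap penalty follows. The penalty values (10 to open, 1 to extend) are hypothetical, and the sketch assumes the convention where the first gap position costs gap_open and each further position costs gap_extend; other conventions exist.

```python
def affine_gap_penalty(gap_length, gap_open=10, gap_extend=1):
    """Penalty for one contiguous gap of the given length."""
    if gap_length <= 0:
        return 0
    return gap_open + (gap_length - 1) * gap_extend

print(affine_gap_penalty(4))       # one gap of length 4: 13
print(4 * affine_gap_penalty(1))   # four separate gaps of length 1: 40
```

One long gap is penalized far less than several short ones, which matches the biological expectation that a single insertion or deletion event can span many residues.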

15

Extreme Value Distribution

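A statistical distribution used to model the scores of the best alignments expected by chance, for example between unrelated sequences in a database search; it is the basis for judging whether an observed alignment score is significant.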

16

Protein Sequence Identity

A measure of similarity that is discouraged for protein sequences because different amino acids have varying substitution score values.

17

Sum-of-Pairs Score

The total score of a multiple sequence alignment, calculated by summing the scores of the pairwise alignments between every pair of sequences in it.
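
The sum-of-pairs score can be sketched column by column. The scoring values (match +1, mismatch -1, gap -2, gap against gap 0) are hypothetical stand-ins for a real substitution matrix and gap model, and the example alignment is made up.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum the pairwise scores of every pair of rows, column by column."""
    score = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue            # gap against gap scores 0 here
            elif a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score

print(sum_of_pairs(["ACG-", "ACGT", "A-GT"]))  # 0
```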

18

Progressive Alignment Method

A multiple sequence alignment approach that adds sequences step by step in the order given by a guide tree; it is characterized by the inability to correct errors made in early steps.

19

Newick Format

A standard data format used to describe the topology and branch lengths of a phylogenetic tree.
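
For illustration, a Newick string nests clades in parentheses and attaches branch lengths after colons. The sketch below pulls out leaf names and their branch lengths with a regular expression. It is not a full parser: it assumes unquoted leaf names and unnamed internal nodes, and the example tree is hypothetical.

```python
import re

def leaf_branch_lengths(newick):
    """Return {leaf_name: branch_length} for simple Newick strings.
    Assumes unquoted names and unnamed internal nodes."""
    pairs = re.findall(r"([A-Za-z_]\w*):([0-9.]+)", newick)
    return {name: float(length) for name, length in pairs}

tree = "((A:0.1,B:0.2):0.3,C:0.4);"   # A and B are sisters; C is the outgroup
print(leaf_branch_lengths(tree))      # {'A': 0.1, 'B': 0.2, 'C': 0.4}
```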

20

PSSM Probabilities

For a protein pattern covering 6 positions, a Position-Specific Scoring Matrix requires 120 individual probabilities (20 amino acids × 6 positions).
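
The count of 120 can be seen by building the probability table for a 6-position motif. The aligned motif instances and the pseudocount of 1 are hypothetical, and a real PSSM would usually convert these probabilities to log-odds scores.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def position_probabilities(aligned_motifs, pseudocount=1.0):
    """One probability per amino acid per motif position."""
    matrix = []
    for pos in range(len(aligned_motifs[0])):
        column = [seq[pos] for seq in aligned_motifs]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        matrix.append({aa: (column.count(aa) + pseudocount) / total
                       for aa in AMINO_ACIDS})
    return matrix

motifs = ["GLCHLA", "GIHHLG", "GLKHLS"]   # hypothetical motif instances
matrix = position_probabilities(motifs)
print(sum(len(row) for row in matrix))    # 120
```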

21

PROSITE Model

A syntax for protein motifs; for example, the pattern G-[LI]-[CHK]-H-L-x-C(2)-F-[YR]-W describes specific conserved and variable residues.
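
One way to read such a pattern is to translate it into a regular expression. The sketch below covers only part of the PROSITE syntax (single residues, [..] alternatives, {..} exclusions, x for any residue, and (n) or (n,m) repeats) and ignores anchors; the function name and test sequences are hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        element = element.replace("x", ".").replace("X", ".")  # any residue
        element = element.replace("{", "[^").replace("}", "]")  # exclusions
        element = element.replace("(", "{").replace(")", "}")   # repeats
        parts.append(element)
    return "".join(parts)

regex = prosite_to_regex("G-[LI]-[CHK]-H-L-x-C(2)-F-[YR]-W")
print(regex)                                        # G[LI][CHK]HL.C{2}F[YR]W
print(bool(re.fullmatch(regex, "GLCHLACCFYW")))     # True
print(bool(re.fullmatch(regex, "GACHLACCFYW")))     # False
```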

22

PHI-BLAST

A variant of BLAST that utilizes a PROSITE-pattern during the database search.

23

PSI-BLAST

A variant of BLAST that creates a PSSM from hit sequences to perform iterative searches.

24

Gene Enrichment (Over-representation)

A statistical result indicating that a set of upregulated genes contains more genes related to a specific function (e.g., cold stress) than would be expected by chance.

25

Volcano Plot Outlier

A data point representing a gene with high fold-change but no statistical significance, often caused by high variance (spread) between samples within the same treatment group.

26

False Discovery Rate (q-value)

A method for correcting p-values; a q-value threshold of 0.05 implies that 5% of the significant genes are expected to be false positives.
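
The card does not say which correction is used; a common choice is the Benjamini-Hochberg procedure, sketched below under that assumption. The p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

p_values = [0.01, 0.04, 0.03, 0.20]   # hypothetical per-gene p-values
q_values = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in q_values]
print(q_values)
print(significant)
```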

27

Principal Component Analysis (PCA)

A technique used to identify outliers, groups, or gradients within transcriptomics data tables.

28

Principal Coordinate Analysis (PCoA)

A dimensionality reduction technique typically applied to distance tables rather than raw data tables.

29

Fisher's Exact Test

A test used to determine if the overlap between two groupings of the same genes is significantly larger than expected.
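
The one-sided version of the test can be written directly from the hypergeometric distribution, as sketched below. The same calculation underlies over-representation analysis. The function name, argument names, and gene counts are hypothetical.

```python
from math import comb

def fisher_exact_greater(overlap, size_a, size_b, total):
    """P(overlap >= observed) when group B (size_b genes) is drawn at
    random from `total` genes, of which size_a belong to group A."""
    max_overlap = min(size_a, size_b)
    favourable = sum(comb(size_a, k) * comb(total - size_a, size_b - k)
                     for k in range(overlap, max_overlap + 1))
    return favourable / comb(total, size_b)

# 10 genes in total, 3 in group A, 4 in group B, all 3 of A also in B
print(fisher_exact_greater(3, 3, 4, 10))
```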

30

Metabarcoding

The process of mapping the biological composition of an environment by sequencing specific marker genes.

31

Alpha Diversity vs. Beta Diversity

Alpha diversity refers to the diversity within a single sample, while Beta diversity measures the diversity difference between samples.
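
The card does not name specific measures. As an illustration, the sketch below uses the Shannon index for alpha diversity and Bray-Curtis dissimilarity for beta diversity, two widely used choices; the taxon counts are made up.

```python
from math import log

def shannon_index(counts):
    """Alpha diversity of one sample from its taxon counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

def bray_curtis(counts_a, counts_b):
    """Beta diversity: dissimilarity between two samples
    (0 = identical composition, 1 = no shared taxa)."""
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1 - 2 * shared / (sum(counts_a) + sum(counts_b))

sample_1 = [10, 10, 0]   # hypothetical counts for three taxa
sample_2 = [0, 10, 10]
print(shannon_index(sample_1))
print(bray_curtis(sample_1, sample_2))
```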

32

Maximum Likelihood (ML) Advantage

A phylogenetic reconstruction method that, compared to distance-based methods, uses the sequence data more effectively and incorporates explicit evolutionary models.

33

Taxonomic Classification

The process of recognizing a sequence variant in metagenomics and assigning it a scientific name.

34

BLAST Bit-score

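A raw alignment score normalized using the statistical parameters of the scoring table, so that scores from different scoring systems become comparable; it is used to calculate the E-value via a simple formula, E = m × n × 2^(-S'), where S' is the bit-score and m and n are the query and database lengths.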
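
The simple formula can be sketched as follows, assuming the standard relation E = m × n × 2^(-S') with S' the bit-score, m the query length, and n the database length. Real BLAST uses effective lengths that are somewhat shorter than the raw ones; the numbers below are hypothetical.

```python
def e_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least bit_score."""
    return query_length * database_length * 2 ** (-bit_score)

print(e_value(50, 300, 1_000_000_000))   # hypothetical search
```

Each additional bit halves the E-value, and doubling the database size doubles it.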