Bioinformatics Lecture Review

Description and Tags

Comprehensive vocabulary flashcards covering bioinformatics topics including sequencing technologies, assembly, alignment algorithms, statistics, phylogeny, and transcriptomics.

Last updated 3:54 PM on 4/30/26

34 Terms

New cards

Illumina Sequencing Errors

A characteristic of Illumina reads: base-call quality typically decreases towards the end of each sequence read.

Metatranscriptomics Strategy

Extracting RNA, fragmenting, sequencing, and using reads to assemble all transcripts in a sample to identify expressed genes, especially from unknown bacteria.

Nanopore Sequencing

A sequencing method recommended for marker genes like $16S\,rRNA$ because it provides long reads helpful for determining gene copy numbers in prokaryotic genomes.

Read Pair Calculation

To sequence an $E.\,coli$ genome of $5 \times 10^6\,bp$ with paired-end $150\,bp$ reads to a depth of $30$, the required number of read pairs is $500,000$, since $(30 \times 5 \times 10^6) / (2 \times 150) = 500,000$.
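
The arithmetic on this card can be checked with a minimal Python sketch. The function name `read_pairs_needed` is made up for illustration, and the calculation assumes every base of every read counts towards the depth.

```python
import math

def read_pairs_needed(genome_bp, read_bp, depth):
    """Read pairs needed so that total sequenced bases = depth * genome size."""
    bases_per_pair = 2 * read_bp  # a pair contributes two reads
    return math.ceil(depth * genome_bp / bases_per_pair)

# Values from the card: 5 Mbp genome, 150 bp paired-end reads, depth 30.
print(read_pairs_needed(genome_bp=5_000_000, read_bp=150, depth=30))  # 500000
```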

Read Coverage Probability

In a circular genome of $3,000,000\,bp$, the probability that a random read of length $150\,bp$ covers a specific position is $\frac{150}{3,000,000}$.
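
A small Python sketch of why the answer is read length over genome length, assuming reads start uniformly at random on the circle. The brute-force helper counts the start positions whose read covers a given site on a small toy genome; the last function extends the idea to many independent reads, which goes beyond what the card states. All function names are illustrative.

```python
def p_read_covers(genome_bp, read_bp):
    """Probability that one uniformly placed read covers a fixed position."""
    return read_bp / genome_bp

def count_covering_starts(genome_bp, read_bp, position):
    """Brute force: number of start positions whose read covers `position`."""
    return sum(
        1
        for start in range(genome_bp)
        if (position - start) % genome_bp < read_bp  # modulo handles wrap-around
    )

def p_uncovered(genome_bp, read_bp, n_reads):
    """Probability that none of n independent reads covers the position."""
    return (1 - p_read_covers(genome_bp, read_bp)) ** n_reads

print(p_read_covers(3_000_000, 150))
print(count_covering_starts(1000, 150, position=0))  # 150 of 1000 start positions
```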

Assembly Mapping Disparity

If a region has double the expected mapping depth (e.g., $200$ reads vs. an average of $100$), this suggests that the region occurs in twice as many copies in the sequenced genome as in the reference.

Contig

A segment of the genome that has been assembled from overlapping sequence reads.

N50 Value

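A statistical measure of genome assembly quality: the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. For contig lengths of $100$, $200$, $300$, $400$, $500$, $600$, and $700$ (total $2800$), the N50 value is $500$, because $700 + 600 + 500 = 1800 \geq 1400$.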
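
The N50 example can be reproduced with a short Python sketch that uses the common definition: sort contigs from longest to shortest and report the length at which the running sum first reaches half of the total. The function name `n50` is illustrative.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum (longest first)
    reaches at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")

print(n50([100, 200, 300, 400, 500, 600, 700]))  # 500
```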

Hash Table

A data structure that uses a hash function to map keys to storage locations, optimized for the fastest possible (on average constant-time) retrieval of a stored element.

Computational Complexity $O(N^3)$

An algorithm property where doubling the problem size $N$ results in an eightfold ($2^3$) increase in processing time.
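
As a toy illustration, the sketch below counts the steps of three nested loops, a stand-in for any $O(N^3)$ algorithm rather than a specific bioinformatics method, and compares the step counts for $N$ and $2N$.

```python
def cubic_steps(n):
    """Count the innermost iterations of three nested loops over n items."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            for _ in range(n):
                steps += 1
    return steps

# Doubling N from 10 to 20 multiplies the work by 2**3.
print(cubic_steps(20) / cubic_steps(10))  # 8.0
```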

Sensitivity (Homology Search)

The ratio of correctly identified homologs (true positives) to the total number of true homologs in the database (e.g., $35/50$).

Specificity (Homology Search)

The ratio of correctly identified non-homologs (true negatives) to the total number of non-homologs in the database (e.g., $945/950$).
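
Both ratios from the sensitivity and specificity cards can be computed with a minimal Python sketch. The counts below are derived from the card examples: $35$ of $50$ homologs found (so $15$ missed) and $945$ of $950$ non-homologs correctly rejected (so $5$ false hits).

```python
def sensitivity(true_pos, false_neg):
    """Fraction of true homologs that the search found."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of non-homologs that the search correctly rejected."""
    return true_neg / (true_neg + false_pos)

print(sensitivity(true_pos=35, false_neg=15))           # 0.7
print(round(specificity(true_neg=945, false_pos=5), 4))  # 0.9947
```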

BLAST Word Length

A parameter where increasing the length results in fewer total hits within the database.

Affine Gap Penalty

A scoring system that applies a higher penalty for initiating a gap than for extending an existing one.
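
A minimal sketch of an affine gap cost in Python. The penalty values ($10$ to open, $1$ to extend) are arbitrary illustrations, and the convention that the opening penalty already covers the first gap position is an assumption; some tools instead charge opening plus extension for the first position.

```python
def affine_gap_penalty(gap_length, gap_open=10, gap_extend=1):
    """Penalty for one gap: opening cost once, extension cost per extra position."""
    if gap_length <= 0:
        return 0
    return gap_open + (gap_length - 1) * gap_extend

# One long gap is much cheaper than the same number of separate short gaps.
print(affine_gap_penalty(5))      # 14
print(5 * affine_gap_penalty(1))  # 50
```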

Extreme Value Distribution

A statistical distribution used to model the score values of the best sequence alignment.

Protein Sequence Identity

A measure of similarity (the fraction of identical aligned residues) that is discouraged for protein sequences because it ignores that different amino acid substitutions have different substitution score values.

Sum-of-Pairs Score

The total score of a multiple sequence alignment, calculated by summing the scores of the pairwise alignments that it induces between all possible pairs of sequences.
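
A rough Python sketch of a sum-of-pairs score, computed column by column. The scoring values (match $+1$, mismatch $-1$, gap $-2$, gap against gap $0$) are illustrative assumptions; a real tool would use a substitution matrix and its own gap model.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum the pairwise scores over every column of an aligned set of sequences."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # a gap aligned to a gap scores nothing here
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total

# Columns score 3, -3 and -1 respectively.
print(sum_of_pairs(["ACG", "A-G", "ACT"]))  # -1
```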

Progressive Alignment Method

A multiple sequence alignment approach that adds sequences step by step in the order given by a guide tree, characterized by the inability to correct errors made in early steps.

Newick Format

A standard data format used to describe the topology and branch lengths of a phylogenetic tree.

PSSM Probabilities

For a protein pattern covering $6$ positions, a Position-Specific Scoring Matrix requires $120$ individual probabilities ($20 \text{ amino acids} \times 6 \text{ positions}$).

PROSITE Model

A syntax for protein motifs; for example, the pattern $G-[LI]-[CHK]-H-L-X-C(2)-F-[YR]-W$ describes specific conserved and variable residues.
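
To make the syntax concrete, the sketch below translates a PROSITE-style pattern into a Python regular expression. It assumes the usual conventions: `-` separates positions, `[...]` lists allowed residues, `{...}` lists excluded residues, `x` (written `X` on this card) matches any residue, `(n)` or `(n,m)` is a repeat count, and `<` and `>` anchor the sequence ends. The helper `prosite_to_regex` and the test sequences are made up for illustration.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".").replace("X", ".")   # any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence ends
        parts.append(element)
    return "".join(parts)

regex = prosite_to_regex("G-[LI]-[CHK]-H-L-X-C(2)-F-[YR]-W")
print(regex)  # G[LI][CHK]HL.C{2}F[YR]W
print(bool(re.search(regex, "MKGLCHLACCFYWTT")))  # True
print(bool(re.search(regex, "MKGLCHLACFYWTT")))   # False: only one C
```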

PHI-BLAST

A variant of BLAST that uses a PROSITE pattern during the database search.

PSI-BLAST

A variant of BLAST that creates a PSSM from hit sequences to perform iterative searches.

Gene Enrichment (Over-representation)

A statistical result indicating that a set of upregulated genes contains more genes related to a specific function (e.g., cold stress) than would be expected by chance.

Volcano Plot Outlier

A data point representing a gene with high fold-change but no statistical significance, often caused by high variance (spread) between samples within the same treatment group.

False Discovery Rate (q-value)

A method for correcting p-values for multiple testing; a q-value threshold of $0.05$ implies that about $5\%$ of the genes called significant are expected to be false positives.
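
One common way to obtain q-values is the Benjamini-Hochberg procedure; the card does not name a specific method, so treating it as the intended one is an assumption. The sketch below implements it in plain Python, and the four p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print([round(value, 4) for value in q])  # [0.04, 0.0533, 0.0533, 0.2]
```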

Principal Component Analysis (PCA)

A technique used to identify outliers, groups, or gradients within transcriptomics data tables.

Principal Coordinate Analysis (PCoA)

A dimensionality reduction technique typically applied to distance tables rather than raw data tables.

Fisher's Exact Test

A test used to determine if the overlap between two groupings of the same genes is significantly larger than expected.
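
For the overlap question, the one-sided Fisher's exact test reduces to a hypergeometric tail probability. The Python sketch below computes it with the standard library; the function name and the gene counts are invented for illustration.

```python
from math import comb

def fisher_overlap_pvalue(n_total, n_a, n_b, n_overlap):
    """One-sided p-value: chance that two gene sets of sizes n_a and n_b,
    drawn from n_total genes, share at least n_overlap genes."""
    denominator = comb(n_total, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_overlap, min(n_a, n_b) + 1)
    )
    return tail / denominator

# Toy numbers: 10 genes in total, two groups of 5 that overlap completely.
print(fisher_overlap_pvalue(n_total=10, n_a=5, n_b=5, n_overlap=5))  # 1/252, about 0.004
```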

Metabarcoding

The process of mapping the biological composition of an environment by sequencing specific marker genes.

Alpha Diversity vs. Beta Diversity

Alpha diversity refers to the diversity within a single sample, while beta diversity measures the difference in diversity between samples.

Maximum Likelihood (ML) Advantage

A phylogenetic reconstruction method that utilizes sequence data more effectively and incorporates evolutionary models compared to distance-based methods.

Taxonomic Classification

The process of recognizing a sequence variant in metagenomics and assigning it a scientific name.

BLAST Bit-score

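A raw alignment score normalized using the statistical parameters ($\lambda$ and $K$) of the scoring table, which makes scores comparable across scoring systems; it is used to calculate the E-value via a simple formula.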
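
The formulas in question are the standard Karlin-Altschul relations: bit-score $S' = (\lambda S - \ln K) / \ln 2$ and $E = m \cdot n \cdot 2^{-S'}$, where $S$ is the raw score and $m$ and $n$ are the query and database lengths. The Python sketch below uses illustrative function names and made-up example lengths; $\lambda$ and $K$ must come from the scoring system in use.

```python
import math

def bit_score_from_raw(raw_score, lam, k):
    """Normalize a raw score with the scoring system's lambda and K."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def evalue_from_bit_score(bit_score, query_len, db_len):
    """Expected number of chance hits with at least this bit-score."""
    return query_len * db_len * 2.0 ** (-bit_score)

# Each extra bit halves the E-value (lengths here are invented).
e40 = evalue_from_bit_score(40, query_len=250, db_len=1_000_000_000)
e41 = evalue_from_bit_score(41, query_len=250, db_len=1_000_000_000)
print(e40, e41)
```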