Bioinformatics final flash cards

Last updated 3:41 AM on 12/10/25

139 Terms

1

What is machine learning?

Making predictions or decisions from data by finding patterns in data rather than using a closed‑form mathematical formula.

2

Essence of machine learning

An underlying pattern exists that we cannot describe with a closed form; we use data (preferably lots) to learn it.

3

ML notation: x, y, X, Y

x = feature vector; y = target value; X = matrix of feature vectors; Y = vector of target values.

4

Unknown target function f

The ideal (unknown) mapping from X to Y that we aim to approximate with data.

5

Training samples

Observed historical pairs (x_i, y_i) used to learn a model.

6

Hypothesis set H

A set of candidate functions {h1, h2, ...} from which the learning algorithm selects a final hypothesis.

7

Learning algorithm A

The procedure that uses training samples to select a hypothesis g ∈ H.

8

Final hypothesis g

The model chosen by the learning algorithm to approximate the unknown target function.
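
As a rough illustration of how cards 3 to 8 fit together, the sketch below builds a tiny hypothesis set of threshold rules and lets a simple learning algorithm pick the final hypothesis g. The data, the threshold values, and every function name are made up for illustration and are not from the course material.

```python
# Toy illustration of the learning setup: training samples, hypothesis set H,
# learning algorithm A, and final hypothesis g. All values are made up.

# Training samples (x_i, y_i): one feature, binary target.
X = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
Y = [0, 0, 0, 1, 1, 1]


def make_hypothesis(threshold):
    """Return h(x) = 1 if x > threshold else 0."""
    return lambda x: 1 if x > threshold else 0


# Hypothesis set H: a handful of candidate threshold rules (hypothetical).
H = {t: make_hypothesis(t) for t in [0.5, 2.5, 4.5, 6.5]}


def training_error(h, X, Y):
    """Fraction of training samples that h misclassifies."""
    return sum(h(x) != y for x, y in zip(X, Y)) / len(X)


def learning_algorithm(H, X, Y):
    """A: pick the hypothesis in H with the lowest training error."""
    best_t = min(H, key=lambda t: training_error(H[t], X, Y))
    return best_t, H[best_t]


best_threshold, g = learning_algorithm(H, X, Y)
print(best_threshold, training_error(g, X, Y))
```

Here g only approximates the unknown target f: it is the best rule within H on these samples, which is not a guarantee about new data.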

9

Supervised learning

Learning from labeled data; includes classification and regression.

10

Unsupervised learning

Learning from unlabeled data; includes clustering, association rules, and dimension reduction.

11

Classification task

Predicting a discrete label or class for an input (e.g., recurrent cancer vs no cancer).

12

Regression task

Predicting a continuous numerical value or relationship between variables.

13

Clustering

Grouping similar unlabeled data points into clusters (e.g., cancer subtypes).

14

Association rules

Finding co‑occurrence or co‑regulation patterns (e.g., coexpressed genes).

15

Dimension reduction

Reducing the number of features while preserving important structure (e.g., PCA).

16

Structured data

Data that fits into rows and columns (ideal for many ML algorithms).

17

Semi‑structured data

Data with partial structure such as JSON, HTML, or XML.

18

Unstructured data

Data without inherent tabular structure (images, free text, audio); often used in deep learning.

19

Data wrangling (munging)

Cleaning, correcting, and handling missing or inaccurate data before modeling.

20

Integration (data)

Combining multiple data sources into a single dataset for analysis.

21

Reduction (feature)

Consolidating or removing attributes to reduce dimensionality before modeling.

22

Aggregation (data)

Collapsing data to a common level (e.g., averages or sums) for analysis.

23

Exploratory Data Analysis (EDA)

Descriptive statistics, visualization, transforms, and tests used to understand data before modeling.

24

Normalization and standardization

Transformations to put features on comparable scales (e.g., z‑score, min‑max).
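
The two transformations named on this card can be sketched in a few lines. The function names and the sample values are illustrative assumptions; the z-score here uses the population standard deviation, which is one of two common conventions.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize: subtract the mean, divide by the (population) std dev.

    Assumes the values are not all identical (std dev would be zero).
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Normalize to the range [0, 1]. Assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Made-up feature values (e.g., expression levels of one gene).
expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

standardized = z_score(expression)   # mean 0, std dev 1
normalized = min_max(expression)     # smallest value -> 0, largest -> 1
print(standardized)
print(normalized)
```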

25

Training / Validation / Testing split

Common split: ~70% training, 10% validation, 20% testing; train fits model, validation tunes hyperparameters, test evaluates final performance.
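
A minimal sketch of the 70/10/20 split, assuming the samples can be shuffled freely (no grouping or time ordering to respect). The function name, the default fractions as parameters, and the fixed seed are illustrative choices.

```python
import random


def train_val_test_split(samples, train_frac=0.7, val_frac=0.1, seed=0):
    """Shuffle the samples, then cut them into train / validation / test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains (~20%)
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The test portion is set aside and only used once, after hyperparameters have been chosen on the validation portion.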

26

Accuracy (definition)

Number of correct predictions divided by total predictions; how meaningful it is depends on label quality and class balance.

27

Gold standard data

High‑quality labeled data used as the reference for training and evaluation.

28

Accuracy paradox

High overall accuracy can be misleading when classes are imbalanced; a model can be accurate yet fail on the minority class.
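
A small made-up example shows the paradox numerically: a "model" that always predicts the majority class scores high accuracy while finding none of the minority cases. The label counts are invented for illustration.

```python
# 95 healthy samples (0) and 5 disease samples (1): made-up, imbalanced labels.
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100

# Accuracy = correct predictions / total predictions.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class = found positives / actual positives.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
minority_recall = true_positives / sum(t == 1 for t in y_true)

print(accuracy, minority_recall)
```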

29

Confusion matrix

Table summarizing true positives, false positives, true negatives, and false negatives (useful for classification evaluation).

30

Precision and recall

Precision = TP/(TP+FP); Recall = TP/(TP+FN); useful when class imbalance matters.
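
The formulas on this card follow directly from the four confusion-matrix counts. In the sketch below the labels are made up, the function names are illustrative, and returning 0.0 when a denominator is zero is just one possible convention.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, fp, tn, fn


def precision_recall(y_true, y_pred):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # convention if undefined
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # convention if undefined
    return precision, recall


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))
print(precision_recall(y_true, y_pred))
```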

31

ROC curve and AUC

ROC plots true positive rate vs false positive rate; AUC measures overall separability of classes.
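
One way to make "separability" concrete: AUC equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as half. The sketch below computes it by brute-force pair counting, which is fine for small examples but not efficient; the scores are made up and the function name is illustrative.

```python
def auc(y_true, scores):
    """AUC by pair counting: fraction of (positive, negative) pairs in which
    the positive has the higher score (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]   # made-up model scores
print(auc(y_true, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.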

32

Decision trees

Classification/regression algorithm that splits data by feature thresholds to form a tree of decisions.
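
To show what "splitting by a feature threshold" can look like, the sketch below searches for the single best threshold on one feature. It assumes Gini impurity as the split criterion, which is a common choice but is not named on the card; the data and function names are made up.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(xs, ys):
    """Try each observed value as a threshold (x <= t goes left) and return
    (threshold, weighted impurity) for the split with the lowest impurity."""
    best = None
    for t in sorted(set(xs))[:-1]:          # the largest value cannot split
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (t, score)
    return best


xs = [1, 2, 3, 6, 7, 8]
ys = [0, 0, 0, 1, 1, 1]
print(best_threshold(xs, ys))
```

A full tree would repeat this search recursively on each side of the split and across all features.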

33

Random forest

Ensemble of decision trees that reduces variance by averaging many trees trained on bootstrapped samples.

34

Gradient boosting machines

Ensemble method that builds models sequentially to correct errors of prior models (e.g., XGBoost, LightGBM).

35

k‑nearest neighbors (k‑NN)

Predicts label based on the labels of the k closest training examples in feature space.
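
A minimal k-NN sketch, assuming Euclidean distance and a simple majority vote (other distance measures and weighted votes are also used). The points, labels, and function name are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_labels = [label for _, label in distances[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Two made-up groups of points in a 2D feature space.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (2.0, 2.0)))
print(knn_predict(train_X, train_y, (6.0, 6.5)))
```

Because the prediction depends on raw distances, features usually need to be on comparable scales first.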

36

Logistic regression

Linear model for binary classification that outputs probabilities via the logistic function.
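
The sketch below shows only the prediction step: a linear score is passed through the logistic (sigmoid) function to give a probability. The weights and bias are hypothetical numbers chosen for the example, not fitted values, and the function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """P(y = 1 | x) for a linear model with the given weights and bias."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


# Hypothetical parameters (in practice these are learned from training data).
weights, bias = [0.8, -0.4], -1.0
p = predict_proba(weights, bias, [3.0, 1.0])   # linear score z = 1.0
label = 1 if p >= 0.5 else 0                   # 0.5 is a common cutoff
print(p, label)
```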

37

Naive Bayes

Probabilistic classifier assuming conditional independence of features given the class.

38

Support Vector Machine (SVM)

Finds a hyperplane that maximizes margin between classes; can use kernels for nonlinearity.

39

Neural networks

Models composed of layers of interconnected units (neurons) that learn hierarchical representations.

40

SVR (Support Vector Regression)

SVM variant for regression tasks.

41

K‑means clustering

Partitioning algorithm that assigns points to k clusters by minimizing within‑cluster variance.
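
A one-dimensional sketch of the usual assign-then-update loop (often called Lloyd's algorithm). It assumes fixed starting centroids and a fixed number of iterations rather than a convergence check; the points and names are made up.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: recompute centroids (keep the old one if a cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```

The result can depend on the starting centroids, which is why k-means is often run several times with different initializations.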

42

Hierarchical clustering

Builds nested clusters by merging or splitting clusters based on distance metrics.

43

Apriori algorithm

Classic algorithm for mining frequent itemsets and association rules.
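
The sketch below is not the full Apriori algorithm; it only computes the two quantities that frequent-itemset and rule mining are built on, support and confidence. The gene names and "transactions" (sets of genes observed together) are hypothetical.

```python
# Each transaction is a set of items seen together (hypothetical gene names).
transactions = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB"},
    {"geneA", "geneC"},
    {"geneB", "geneC"},
    {"geneA", "geneB", "geneC"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent:
    support(antecedent and consequent together) / support(antecedent)."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


print(support({"geneA", "geneB"}, transactions))
print(confidence({"geneA"}, {"geneB"}, transactions))
```

Apriori itself adds the pruning idea that an itemset can only be frequent if all of its subsets are frequent.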

44

FP‑tree (frequent pattern tree)

Data structure for efficient mining of frequent patterns and association rules.

45

Principal Component Analysis (PCA)

Linear dimension reduction technique that projects data onto orthogonal components capturing maximum variance.

46

Linear Discriminant Analysis (LDA)

Dimension reduction and classification method that maximizes class separability in projected space.

47

t‑SNE

Nonlinear dimension reduction technique for visualizing high‑dimensional data in 2D/3D.

48

Autoencoder

Neural network that learns compressed representations (encoding) and reconstructs inputs (decoding).

49

Data quality importance

Models can only be as good as the data they are trained on; poor labels or bias degrade performance.

50

Overfitting

Model fits training data too closely and fails to generalize to new data; regularization and validation help prevent it.

51

Underfitting

Model is too simple to capture underlying patterns; high bias leads to poor performance on both training and test data.

52

Cross‑validation

Method to estimate model performance by splitting data into multiple train/validation folds (e.g., k‑fold CV).
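
A sketch of how k-fold CV partitions the sample indices, assuming no shuffling and no stratification (both are common additions in practice). The function name is illustrative.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds.

    Every sample lands in exactly one validation fold; fold sizes differ
    by at most one when n_samples is not divisible by k.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(train_idx, val_idx)
```

The model is trained once per fold and the k validation scores are averaged to estimate performance.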

53

Feature engineering

Creating or transforming features to improve model performance (domain knowledge often helps).

54

Bias and variance tradeoff

Bias: error from wrong assumptions; Variance: error from sensitivity to training data; balance is key.

55

Regularization

Techniques (L1, L2) that penalize model complexity to reduce overfitting.
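
To make "penalize model complexity" concrete, the sketch below adds an L1 or L2 penalty on the weights to a mean-squared-error loss for a linear model. The loss choice, the strength parameter name `lam`, the data, and the function names are assumptions for illustration.

```python
def mse(weights, X, y):
    """Mean squared error of a linear model without an intercept."""
    preds = [sum(w * xi for w, xi in zip(weights, x)) for x in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def l1_penalty(weights):
    """L1: sum of absolute weights (tends to push some weights to zero)."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """L2: sum of squared weights (shrinks large weights)."""
    return sum(w * w for w in weights)


def regularized_loss(weights, X, y, lam, penalty):
    """Data-fit term plus lam times the complexity penalty."""
    return mse(weights, X, y) + lam * penalty(weights)


X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, -4.0]
weights = [3.0, -4.0]   # fits the made-up data exactly, so MSE is 0

print(regularized_loss(weights, X, y, lam=0.1, penalty=l1_penalty))
print(regularized_loss(weights, X, y, lam=0.1, penalty=l2_penalty))
```

Even with a perfect fit the loss is nonzero, so the optimizer is pushed toward smaller weights as `lam` grows.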

56

Hyperparameter tuning

Selecting model settings that are not learned during training (hyperparameters) using validation data or CV, e.g., by grid or random search.

57

Ensemble methods

Combine multiple models (bagging, boosting, stacking) to improve predictive performance.

58

Feature selection

Selecting a subset of relevant features to reduce dimensionality and improve model interpretability.

59

Imbalanced data strategies

Use resampling, class weighting, or specialized metrics when classes are imbalanced.
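
As one example of class weighting, the sketch below uses a common "balanced" heuristic: weight = n_samples / (n_classes * class_count), so rarer classes get larger weights. This is one possible scheme among several, and the label counts and function name are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}


labels = [0] * 90 + [1] * 10          # made-up imbalanced labels
weights = balanced_class_weights(labels)
print(weights)
```

With these weights each class contributes the same total weight to a weighted loss, regardless of how many samples it has.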

60

6 standard criteria of life

List: Cellular organization; Homeostasis; Metabolism; Response to stimuli; Reproduction; Adaptation and evolution.

61

Cellular organization

Organisms are composed of one or more highly organized cells.

62

Homeostasis

Organisms regulate and maintain internal conditions within narrow limits.

63

Metabolism

Organisms carry out complex chemical reactions to sustain life.

64

Response to stimuli

Organisms can detect and respond to changes in their environment.

65

Reproduction

Organisms produce new individuals of the same species (asexual or sexual).

66

Adaptation and evolution

Populations change over time through natural selection and genetic variation.

67

Prokaryotic vs Eukaryotic

Prokaryotes lack a nucleus (nucleoid instead); eukaryotes have a membrane‑bound nucleus.

68

Nucleoid

Region in prokaryotes containing the genome (not membrane‑bound).

69

Organelles

Membrane‑bound or specialized cellular compartments performing distinct functions.

70

Non‑organelle structures

Examples: cell membrane, cell wall, cytoplasm, flagella, pilus, ribosome.

71

Cell membrane

Phospholipid bilayer with embedded proteins, channels, and cholesterol that separates inside from outside.

72

Phospholipid bilayer

Two layers of phospholipids with hydrophilic heads and hydrophobic tails forming the membrane.

73

Integral vs peripheral proteins

Integral proteins span the membrane; peripheral proteins attach to the surface.

74

Cell wall

Flexible structural outer layer providing support and protection; composition varies by organism.

75

Cytoplasm

Liquid component inside the cell membrane containing organelles and molecules; ~80% water.

76

Flagellum

Hairlike appendage providing motility by whipping; structure varies across species.

77

Pilus

Short hairlike appendage used for adhesion, sensing, and cell recognition.

78

Ribosome

Complex of RNA and proteins that translates mRNA into polypeptides (protein synthesis).
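
From a bioinformatics angle, translation can be mimicked in code by reading an mRNA three bases at a time. The sketch below uses a deliberately partial codon table (only a few of the 64 standard codons) and a made-up sequence; it starts at the first AUG and stops at the first stop codon, and the function name is illustrative.

```python
# Partial codon table: a small subset of the standard genetic code.
# "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "GGC": "G", "GCU": "A", "UGG": "W", "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA.

    Codons missing from the partial table are reported as "X".
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""                       # no start codon found
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break                       # stop codon: release the peptide
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("GGAUGUUUGGCGCUUAAGG"))
```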

79

Cilia

Threadlike projections that can move cells or sense the environment; similar to pilus in function.

80

Chloroplast

Membrane‑bound organelle in plants and algae that performs photosynthesis and contains chlorophyll.

81

Cytoskeleton

Network of protein filaments (actin, microtubules, intermediate filaments) providing structure and transport.

82

Rough Endoplasmic Reticulum (RER)

Membrane network studded with ribosomes; major site of protein synthesis and folding.

83

Smooth Endoplasmic Reticulum (SER)

Membrane network involved in lipid and small molecule synthesis; lacks the attached ribosomes of the RER.

84

Golgi apparatus

Receives, sorts, modifies, and packages molecules for transport within or outside the cell.

85

Mitochondria

Membrane‑bound organelle that generates ATP; contains its own genome, which is typically maternally inherited.

86

Vesicles

Small membrane‑bound compartments (endosomes, lysosomes, peroxisomes, vacuoles) for transport and storage.

87

Endosome vs lysosome

Endosomes sort internalized material; lysosomes contain degradative enzymes for breakdown.

88

Peroxisome

Organelle involved in oxidative reactions and detoxification (e.g., hydrogen peroxide metabolism).

89

Exosome

Small extracellular vesicle involved in intercellular communication and transport of molecules.

90

Macromolecule

Definition: large biological molecules, mostly polymers of smaller subunits, such as nucleic acids, proteins, carbohydrates, and lipids.

91

Monomer vs polymer

Monomer = single subunit (e.g., nucleotide, amino acid); Polymer = chain of monomers (e.g., DNA, protein).

92

Water (biological role)

Polar solvent for biological systems; stabilizes charged species and buffers temperature changes.

93

Carbohydrates

Composed of C, H, O; primary energy source; monomer = monosaccharide (e.g., glucose).

94

Glucose (formula)

Common sugar used for energy: C6H12O6.

95

Disaccharide example

Lactose is a disaccharide composed of two monosaccharides (glucose and galactose).

96

Polysaccharides

Long chains of monosaccharides; examples include glycogen (animal energy storage) and cellulose (plant structural component).

97

Lipids

Largely nonpolar (hydrophobic or amphipathic) molecules such as fatty acids, phospholipids, and sterols; used for energy storage, membranes, and signaling.

98

Fatty acid

Structure: long hydrocarbon chain with a terminal carboxyl group; building block of many lipids.

99

Phospholipid

Structural lipid with a hydrophilic head and hydrophobic tails; forms cell membranes.

100

Sterols (cholesterol)

Rigid lipid molecules that modulate membrane fluidity and serve as hormone precursors.
