272 old exam short answer

Supervised vs Unsupervised Learning

Supervised uses labeled data for prediction; unsupervised finds patterns in unlabeled data.

Examples of Supervised Learning

Fraud detection and email spam classification; both learn from labeled input-output pairs.

Examples of Unsupervised Learning

Customer segmentation and anomaly detection; no predefined labels are available.

Merging vs Concatenating in pandas

Merging joins DataFrames on key columns; concatenating stacks them along an axis (rows or columns).
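A minimal pandas sketch of the difference; the table and column names (`customers`, `orders`, `id`, `amount`) are made up for illustration. `merge` matches rows by a shared key, while `concat` only stacks frames.

```python
import pandas as pd

# Hypothetical toy tables for illustration only.
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]})
orders = pd.DataFrame({"id": [1, 1, 3], "amount": [20, 35, 50]})

# merge: match rows on the shared "id" column (inner join by default,
# so customer 2, who has no orders, is dropped)
merged = pd.merge(customers, orders, on="id")

# concat: stack two frames with the same columns along axis 0 (rows)
jan = pd.DataFrame({"id": [1, 2], "amount": [10, 15]})
feb = pd.DataFrame({"id": [3], "amount": [40]})
stacked = pd.concat([jan, feb], axis=0, ignore_index=True)

print(merged)
print(stacked)
```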

When to use merge in pandas

When combining data from two sources with a common column (e.g., ID).

ANOVA Hypotheses

Null: all group means are equal; Alternative: at least one differs.
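A minimal sketch of the statistic behind these hypotheses, assuming a one-way ANOVA: F is the between-group mean square divided by the within-group mean square, and a large F is evidence against the null. The helper name `one_way_anova_f` is hypothetical; in practice `scipy.stats.f_oneway` returns both F and its p-value.

```python
def one_way_anova_f(*groups):
    """Hypothetical helper: one-way ANOVA F statistic for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: distance of each group mean from the grand mean
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean
    ssw = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    msb = ssb / (k - 1)          # df between = k - 1
    msw = ssw / (n_total - k)    # df within = N - k
    return msb / msw

print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 27.0
```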

When to use ANOVA vs t-test

Use ANOVA for 3+ groups; t-test for 2 groups.

What is an Outlier

A value far from other data points that can distort model accuracy.
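One common way to flag outliers is Tukey's rule: values more than 1.5 × IQR below Q1 or above Q3. The sketch below assumes that convention (the 1.5 multiplier is a rule of thumb, not a fixed standard); `iqr_outliers` is a hypothetical helper.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Hypothetical helper: return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

print(iqr_outliers([10, 12, 11, 13, 12, 100]))  # [100]
```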

Effects of Outliers on Models

Can skew regression lines, increase error, or influence clustering.
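A small illustration of the first effect, assuming ordinary least squares: changing a single point turns a slope of 2 into a slope of 10. The helper and data are hypothetical.

```python
def ols_slope(xs, ys):
    """Hypothetical helper: least-squares slope of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

xs = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]   # exactly y = 2x
dirty = [2, 4, 6, 8, 50]   # same data with one extreme value
print(ols_slope(xs, clean))  # 2.0
print(ols_slope(xs, dirty))  # 10.0
```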

SQL vs Text File

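SQL databases handle large structured data more efficiently than text files: they support indexes, filtering, joins, and aggregation without loading the whole file into memory.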

SQLite

Lightweight SQL database stored in a single file; great for local analysis.
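A minimal sketch using Python's built-in `sqlite3` module; the `sales` table and its rows are invented for illustration. Using `":memory:"` avoids creating a file; passing a path such as `data.db` would store the whole database in that single file.

```python
import sqlite3

# Hypothetical table and rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 30.0)],
)
# The aggregation runs inside the database engine, not in Python
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 150.0), ('west', 80.0)]
conn.close()
```

For analysis, `pandas.read_sql_query(query, conn)` can load the same query result straight into a DataFrame.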

Categorical Variable

Variable with non-numeric categories (e.g., color, gender, job title).

Encoding Categorical Variables

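Convert categories to numbers before regression/classification: one-hot encoding (one binary column per category) or label encoding (one integer per category). Label encoding implies an order, so one-hot is usually preferred for unordered categories.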
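A pure-Python sketch of both encodings, assuming categories are ordered alphabetically (an arbitrary choice); the helper names are hypothetical. In pandas, `pd.get_dummies` produces one-hot columns directly.

```python
def label_encode(values):
    """Hypothetical helper: map each category to an integer (sorted order)."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Hypothetical helper: one binary column per category (sorted order)."""
    categories = sorted(set(values))
    return [[1 if v == cat else 0 for cat in categories] for v in values], categories

colors = ["red", "green", "blue", "green"]
codes, mapping = label_encode(colors)
rows, columns = one_hot_encode(colors)
print(codes, mapping)   # [2, 1, 0, 1] {'blue': 0, 'green': 1, 'red': 2}
print(columns, rows)    # ['blue', 'green', 'red'] [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```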

Overfitting

Model memorizes training data but fails to generalize; common with small datasets.

Preventing Overfitting

Use cross-validation, regularization, pruning, or dropout.
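Cross-validation is easiest to see as index bookkeeping: each fold takes a turn as the held-out test set while the model trains on the rest. This is a minimal sketch without shuffling; `k_fold_indices` is a hypothetical helper, and `sklearn.model_selection.KFold` provides the same kind of splits with optional shuffling.

```python
def k_fold_indices(n, k):
    """Hypothetical helper: yield (train, test) index lists for k-fold CV, no shuffling."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

for train, test in k_fold_indices(10, 5):
    print("test fold:", test)
```

A large gap between training scores and held-out scores across the folds is the usual sign of overfitting.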

Principal Component Analysis (PCA)

Reduces dimensionality by projecting data onto uncorrelated principal components, ordered by how much variance each one captures.
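A minimal NumPy sketch of PCA via the eigen-decomposition of the covariance matrix (one of several equivalent approaches; SVD of the centered data is another). The function name and toy data are illustrative. Because the points lie exactly on a line, one component captures essentially all the variance.

```python
import numpy as np

def pca(X, n_components):
    """Hypothetical helper: project X (rows = samples) onto its top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # largest-variance components first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained_ratio = eigvals / eigvals.sum()
    scores = Xc @ eigvecs[:, :n_components]  # coordinates in the reduced space
    return scores, explained_ratio[:n_components]

X = [[1, 2], [2, 4], [3, 6], [4, 8]]  # points on the line y = 2x
scores, ratio = pca(X, n_components=1)
print(scores.shape, ratio)  # (4, 1) and a ratio of about 1.0
```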

Randomization in Ensembles

Training each model on a random sample of rows or features makes the models diverse; combining diverse models reduces variance and overfitting.

Random Forest

Ensemble of decision trees trained on bootstrapped data with random feature subsets.
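A sketch of the two sources of randomness, using Python's `random` module; the helper names are hypothetical and no trees are actually trained. Each tree gets a bootstrap sample of rows (drawn with replacement), and at each split only a random subset of features is considered. In scikit-learn these correspond to the `bootstrap` and `max_features` parameters of `RandomForestClassifier`.

```python
import random

def bootstrap_sample(n_rows, rng):
    """Hypothetical helper: draw n_rows row indices with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def random_feature_subset(n_features, k, rng):
    """Hypothetical helper: pick k distinct feature indices for one split."""
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(42)  # seeded for reproducibility
for tree in range(3):
    rows = bootstrap_sample(10, rng)
    feats = random_feature_subset(6, 2, rng)
    print(f"tree {tree}: {len(set(rows))} unique rows, first-split features {feats}")
```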

Classification vs Regression

Classification = categories; Regression = continuous values.

Time Series Components

Trend, seasonality, cyclical, and random components.

Autocorrelation

Correlation of a time series with a lagged version of itself.
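A sketch of the sample autocorrelation at lag k, assuming the usual estimator that divides by the total sum of squares; `autocorrelation` is a hypothetical helper. An alternating series is negatively correlated with itself at lag 1 and positively at lag 2.

```python
def autocorrelation(x, lag):
    """Hypothetical helper: sample autocorrelation of series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom

series = [1, -1, 1, -1]
print(autocorrelation(series, 1))  # -0.75
print(autocorrelation(series, 2))  # 0.5
```

pandas' `Series.autocorr(lag)` computes the Pearson correlation between the series and its shifted copy, which can differ slightly from this estimator on short series.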

Importance of Autocorrelation

Helps identify patterns and predict future values in time series.

R² (coefficient of determination)

Proportion of the variance in the target that the regression model explains: R² = 1 - SS_res / SS_tot. 1 is a perfect fit; 0 is no better than predicting the mean.
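A sketch of R² computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot the total sum of squares around the mean; `r_squared` is a hypothetical helper. R² can even be negative when predictions are worse than simply predicting the mean.

```python
def r_squared(y_true, y_pred):
    """Hypothetical helper: coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean) ** 2 for y in y_true)               # total variation
    return 1 - ss_res / ss_tot

y = [3.0, 5.0, 7.0, 9.0]
print(r_squared(y, [3.0, 5.0, 7.0, 9.0]))  # 1.0: perfect fit
print(r_squared(y, [6.0, 6.0, 6.0, 6.0]))  # 0.0: same as predicting the mean
```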

Use of R²

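Compares the fit of regression models on the same data; since R² never decreases when predictors are added, adjusted R² is used when models have different numbers of predictors.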
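When models with different numbers of predictors are compared, adjusted R² is a common alternative because plain R² never decreases as predictors are added. A sketch of the standard formula with n observations and p predictors (intercept not counted); the function name and numbers are illustrative.

```python
def adjusted_r_squared(r2, n, p):
    """Hypothetical helper: adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same raw R² of 0.90 on 21 observations, but more predictors are penalized
print(adjusted_r_squared(0.90, n=21, p=2))   # about 0.889
print(adjusted_r_squared(0.90, n=21, p=10))  # about 0.8
```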

Random Numbers in Data Analysis

Used in simulation, sampling, and randomized algorithms.
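A small sketch of random numbers in sampling: a seeded generator makes a random train/test split reproducible. `random_split` is a hypothetical helper built on Python's `random.Random`.

```python
import random

def random_split(items, test_fraction, seed):
    """Hypothetical helper: shuffle a copy of items and split it into train/test."""
    rng = random.Random(seed)   # fixed seed gives the same shuffle every run
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train_a, test_a = random_split(range(10), 0.3, seed=7)
train_b, test_b = random_split(range(10), 0.3, seed=7)
print(test_a == test_b)  # True: same seed, same split
```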

Monte Carlo Simulation

Method using repeated random sampling to model probabilistic systems.
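The classic Monte Carlo example, as a sketch: estimate π from the fraction of random points in the unit square that land inside the quarter circle. The helper name and sample size are illustrative.

```python
import random

def estimate_pi(n_samples, seed=0):
    """Hypothetical helper: Monte Carlo estimate of pi."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()  # uniform point in the unit square
        if x * x + y * y <= 1.0:           # point falls inside the quarter circle
            inside += 1
    # quarter-circle area / square area = pi / 4
    return 4 * inside / n_samples

print(estimate_pi(100_000))  # close to 3.14; error shrinks roughly as 1/sqrt(n)
```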

Gradient Descent Risk Conditions

If the learning rate is too high, updates can overshoot or diverge; if the function is not convex, gradient descent may settle in a local minimum instead of the global optimum.
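A sketch of the non-convex risk, using the hypothetical function `f(x) = x**4 - 3*x**2 + x`, which has a local minimum near x = 1.13 and a global minimum near x = -1.30. The same gradient descent settles in a different minimum depending on where it starts; the learning rate and step count are illustrative choices.

```python
def f(x):
    """Hypothetical non-convex function with two separate minima."""
    return x**4 - 3 * x**2 + x

def f_grad(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(grad, x0, learning_rate=0.01, steps=1000):
    """Plain gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

right = gradient_descent(f_grad, x0=2.0)   # ends near x = 1.13 (local minimum)
left = gradient_descent(f_grad, x0=-2.0)   # ends near x = -1.30 (global minimum)
print(right, f(right))
print(left, f(left))
```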

Learning Rate in Gradient Descent

Controls the step size of each weight update; too high may overshoot or diverge, too low converges slowly.
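A sketch on the convex function f(x) = x**2 (gradient 2x, minimum at 0), with learning rates chosen for illustration: a small rate creeps toward the minimum, a moderate one converges, and a rate above 1 overshoots further on every step and diverges. The helper names are hypothetical.

```python
def gradient_descent(grad, x0, learning_rate, steps):
    """Plain gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

def grad_square(x):
    """Gradient of the hypothetical objective f(x) = x**2."""
    return 2 * x

for lr in (0.001, 0.1, 1.1):  # too low, reasonable, too high
    print(lr, gradient_descent(grad_square, x0=5.0, learning_rate=lr, steps=100))
```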

Overfitting with Small Datasets

With few examples, a flexible model can memorize the training set, noise included, instead of learning patterns that generalize.

Null Hypothesis in ANOVA

Group means are equal.

Alternative Hypothesis in ANOVA

At least one group mean is different.

SQL Definition

Structured Query Language used to manage and query relational databases.

Purpose of PCA

Reduce data dimensions while retaining most of the variance.

Ensemble Learning

Combines the predictions of several models (e.g., by voting or averaging); randomization increases model diversity and helps reduce overfitting.

Encoding Example

Gender as label encoding (Male=1, Female=0) or as one-hot columns [Male, Female]: Male=[1,0], Female=[0,1].
