ECEN 250

Last updated 10:13 PM on 4/17/26

215 Terms

New cards

As engineers we design, build, and operate

Machines

New cards

Steps For a Machine

Take in input, Use a model, Produce Action

New cards

What is a model

A perceived state of the world that is improved over time.

New cards

What can we learn from the data

Structures, Parameters, Associations, Similarity

New cards

Explain Physics/Mathematical model

Mathematical/physics-based models are rigorously developed based on empirical evidence and understanding of causal relations.

New cards

Machine Learning

Derive the model from the DATA!

New cards

Unsupervised Learning

The machine learns parameters, structure, and relationships for the model directly from the data, without labeled training data.

New cards

Supervised Learning

The machine is given training sets: "labeled" data with expected outputs, used to train the model.

New cards

Examples of Supervised Learning

Email Spam Detection, House Price Prediction, Handwritten Digit Recognition

New cards

Examples of Unsupervised Learning

Grouping Music by Genre, Organizing Photos on Your Phone, News Article Grouping

New cards

Numeric Data

Quantitative, measurable; values are numbers. Eg. 0, 42, 3.1415, 1.602x10^-19

New cards

Categorical

Qualitative, recognizable; values are restricted to the possible values in a category and can be represented by a text value or a number.

New cards

Types Of Numeric Data

Discrete (1, 2, 3, 4, 5, 6, 7, 8, 9, 10).

Continuous (1, 1.1, 1.12234, 2.23434 ... 9.5, 9.6, 10).

New cards

Types Of Categorical Data

Ordinal (Monday, Friday)

Nominal (Fiat 500, Victor)

New cards

First rule of Machine learning

Do not alter the original data (make changes in a copy, not in the original df).

New cards

Dataframe structure

Instance or observation, index attribute, column attribute, datum, feature (LEARN HOW TO PLACE THEM).

New cards

Feature

column attribute + column data

New cards

Feature set or Dataset

set of features covering all attributes

New cards

Dimensionality

number of attributes

New cards

Missing data options

Remove entire feature, Remove an instance/observation

Fill missing datum with some value

New cards

Types of Filling technique for missing data

previous reading, zero, min(), max(), mean(), median(). MORE(regression, KNN)

New cards

The work of filling missing data

Imputation
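The filling options above can be sketched in plain Python. This is a minimal illustration, assuming missing values are stored as `None`; the function name `impute`, the strategy names, and the sample readings are made up for the example. It returns a new list, so the original data stays untouched.

```python
from statistics import mean, median

def impute(values, strategy="mean"):
    """Return a new list with None entries filled in; the input is not modified."""
    known = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(known)
    elif strategy == "median":
        fill = median(known)
    elif strategy == "zero":
        fill = 0
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

readings = [2.0, None, 4.0, 9.0]      # made-up sensor readings, one missing
filled = impute(readings, "mean")     # the gap is filled with the mean of 2, 4, 9
```

Regression-based or KNN-based imputation follows the same idea, but predicts the fill value from the other features instead of using one summary statistic.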

New cards

What is an outlier

An observation that "lies an abnormal distance from other values in a random sample from a population".

New cards

What to do with outliers

Outlier is not part of the population: remove.

Outlier is part of the population: keep.

New cards

For categorical attributes, to operate on the values in our machine learning models.

We convert them to numeric values.
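One possible way to do this conversion, shown as a sketch: ordinal categories can be mapped to their rank, and nominal categories can be turned into 0/1 indicator columns (often called one-hot encoding, a technique not named in these cards). The helper names and sample values are hypothetical.

```python
def ordinal_encode(values, order):
    """Map each category to its position in the given order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

def one_hot_encode(values):
    """Return the sorted category names and one 0/1 row per value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Ordinal: the order carries meaning, so ranks are a reasonable encoding.
sizes = ["small", "large", "medium"]
size_codes = ordinal_encode(sizes, ["small", "medium", "large"])

# Nominal: no order, so each category gets its own indicator column.
colors = ["red", "green", "red"]
color_categories, color_rows = one_hot_encode(colors)
```

Rank codes would suggest an ordering that nominal data does not have, which is why the two cases are handled differently here.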

New cards

Descriptive statistics

Understanding the characteristics of our data for greater insights.

New cards

What does our model predict based on?

Statistical characteristics of the existing data

New cards

Inferential statistics

Making predictions based on statistical data

New cards

Population

Set of all data in an area of interest.

New cards

Sample

Subset of a population.

New cards

Measures of frequency

Count: number of data entries.

Proportion: ratio of the number of observations of an event to the total number of observations.

Occurrence percent: proportion * 100
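A small sketch of the three frequency measures on a categorical column, using `collections.Counter` from the standard library. The labels are made-up data.

```python
from collections import Counter

labels = ["spam", "ham", "spam", "spam"]   # made-up categorical data

counts = Counter(labels)                   # count per category
total = len(labels)
proportions = {label: c / total for label, c in counts.items()}
percents = {label: 100 * p for label, p in proportions.items()}
```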

New cards

Measures of Central Tendency

Arithmetic Mean for Sample/Population, Geometric Mean, Harmonic Mean, Median, Mode. (KNOW ALL FORMULAS)

New cards

When to use Arithmetic Mean

An average of individual data

New cards

When to use Geometric Mean

Averaging data from exponential processes: population growth, disease infection.

New cards

When to use harmonic Mean

Averaging of flows : pipeline, volumetric flow, average resistance
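The measures of central tendency can be checked with Python's `statistics` module (`geometric_mean` needs Python 3.8 or later). The data below is made up, and the speed example is only one illustration of a rate-type quantity where the harmonic mean applies.

```python
from statistics import mean, geometric_mean, harmonic_mean, median, mode

values = [1, 2, 2, 3, 10]          # individual data points
growth_factors = [2.0, 8.0]        # multiplicative (exponential) growth per period
speeds_kmh = [40.0, 60.0]          # the same distance driven at two speeds

arithmetic = mean(values)                   # sum / count
geometric = geometric_mean(growth_factors)  # n-th root of the product, about 4
harmonic = harmonic_mean(speeds_kmh)        # n / sum(1/x), about 48
middle = median(values)                     # middle value of the sorted data
most_common = mode(values)                  # most frequent value
```

Note how the single large value 10 pulls the arithmetic mean above the median.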

New cards

Measures of Dispersion

Range: difference between max and min. Variance: spread of the data. Standard deviation: sqrt(variance). (KNOW ALL FORMULAS, MIGHT BE CALCULATIONS)
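For practicing the calculations, here is a minimal sketch of the three dispersion measures written out from their formulas. It assumes the usual convention of dividing by n - 1 for a sample and by n for a population; the function names and data are made up.

```python
from math import sqrt

def data_range(xs):
    """Range: max minus min."""
    return max(xs) - min(xs)

def variance(xs, sample=True):
    """Mean squared deviation from the mean (n - 1 for a sample, n for a population)."""
    n = len(xs)
    m = sum(xs) / n
    squared_devs = sum((x - m) ** 2 for x in xs)
    return squared_devs / (n - 1) if sample else squared_devs / n

def std_dev(xs, sample=True):
    """Standard deviation: square root of the variance."""
    return sqrt(variance(xs, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]                   # made-up data, mean is 5
spread = data_range(data)                         # 9 - 2
pop_variance = variance(data, sample=False)       # 32 / 8
pop_std = std_dev(data, sample=False)             # sqrt(4)
sample_variance = variance(data)                  # 32 / 7
```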

New cards

Measures of Position

kth-percentile Rank, Quartile Rank

New cards

Data Scaling

Normalization And Standardization. Should be done in a new df such as dfscaled.

New cards

What is normalization in data processing?

Values are shifted and rescaled so that they end up ranging between 0 and 1.(KNOW THE FORMULA)

New cards

What is another name for normalization?

Min-Max scaling.
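A minimal sketch of min-max scaling, x' = (x - min) / (max - min), in plain Python. The function name and the heights are made up; the result is a new list, in keeping with the rule of not altering the original data.

```python
def min_max_scale(xs):
    """Rescale values to the range [0, 1]; returns a new list."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("all values are equal, so the range is zero")
    return [(x - lo) / (hi - lo) for x in xs]

heights = [150, 160, 170, 190]            # made-up data
heights_scaled = min_max_scale(heights)   # smallest maps to 0, largest to 1
```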

New cards

What is Standardization in data scaling ?

The values are centered around the mean with a unit standard deviation.(KNOW THE FORMULA)

New cards

What happens to the mean during Standardization ?

It becomes zero !
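Standardization, z = (x - mean) / std, can be sketched the same way. This version assumes the population standard deviation (dividing by n); some tools divide by n - 1 instead. The function name and data are made up.

```python
from math import sqrt

def standardize(xs):
    """Center on the mean and divide by the population standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    std = sqrt(sum((x - m) ** 2 for x in xs) / n)
    if std == 0:
        raise ValueError("standard deviation is zero")
    return [(x - m) / std for x in xs]

temps = [10.0, 20.0, 30.0]                # made-up data
temps_scaled = standardize(temps)

# After scaling, the mean is (numerically) zero and the std is one.
scaled_mean = sum(temps_scaled) / len(temps_scaled)
scaled_std = sqrt(sum((z - scaled_mean) ** 2 for z in temps_scaled) / len(temps_scaled))
```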

New cards

When is Normalization useful ?

Normalization can be useful where distribution of the data is unknown and in algorithms that do not make assumptions of distribution of the data

New cards

When is Standardization useful ?

Standardization is well suited to data that is characterized by a Normal(aka Gaussian) distribution.

New cards

What is Univariate statistics ?

One variable: mean, median, mode, variance, std deviation. ex: Variance

New cards

What is Multivariate statistics?

More than 1 variable. Focuses on the relationships between variables ex: Covariance

New cards

What is Covariance (bivariate) ?

Measure of the relation between the variation of two variables(KNOW THE FORMULA)

New cards

How do we interpret Covariance ?

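cov(X,Y) > 0: positively correlated

cov(X,Y) < 0: inversely correlated

cov(X,Y) = 0: X and Y are uncorrelated (no linear relationship; this does not by itself prove independence)

Covariance can take any value from -infinity to +infinity.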
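A sketch of the covariance formula, cov(X,Y) = sum((x - mean_x)(y - mean_y)) / (n - 1) for a sample, with made-up data. The second data set shows a case where the covariance is zero even though y is fully determined by x (y = x squared), so zero covariance only rules out a linear relationship.

```python
def covariance(xs, ys, sample=True):
    """Average product of the deviations of x and y from their means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    total = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return total / (n - 1) if sample else total / n

hours = [1, 2, 3, 4]
scores = [2, 4, 6, 8]
cov_positive = covariance(hours, scores)               # > 0: they rise together
cov_negative = covariance(hours, scores[::-1])         # < 0: one rises, one falls

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]                              # dependent on x, but not linearly
cov_zero = covariance(xs, ys)
```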

New cards

What is Pearson Correlation (bivariate) ?

Measures both the strength and direction of a linear relationship (stays between -1 and 1 )
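Pearson correlation divides the covariance by the product of the two standard deviations, which is why it stays between -1 and 1 regardless of units. A minimal sketch with made-up data and a hypothetical function name:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance normalized by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4]
scores = [2, 4, 6, 8]

r_up = pearson_r(hours, scores)                            # perfect increasing line
r_down = pearson_r(hours, scores[::-1])                    # perfect decreasing line
r_rescaled = pearson_r(hours, [100 * s for s in scores])   # units do not matter
```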

New cards

Is variance multivariate or univariate?

Univariate

New cards

What's a good tool to visualize correlation ?

Seaborn pair-plots

New cards

What type of learning is Clustering ?

Unsupervised Learning

New cards

What is k-Means Clustering ?

Go watch a youtube video

New cards

In K-Means Clustering what do we alternate between ?

Assign data instances to closest mean and Reassign each mean to the average of its newly assigned points

New cards

When does K- Means clustering stop ?

When no points' assignments change.
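The alternation and the stopping rule can be sketched for one-dimensional data in plain Python. This is an illustrative toy, not a library implementation: the function name, the starting means, and the points are made up, and real data would use a distance over several features.

```python
def k_means_1d(points, initial_means, max_iter=100):
    """Toy k-means on 1-D data; k is the number of initial means."""
    means = list(initial_means)       # copy so the caller's list is not modified
    assignments = None
    for _ in range(max_iter):
        # Step 1: assign each point to the index of its closest mean.
        new_assignments = [
            min(range(len(means)), key=lambda j: abs(p - means[j]))
            for p in points
        ]
        # Stop when no point's assignment changed.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 2: move each mean to the average of its assigned points.
        for j in range(len(means)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:               # an empty cluster keeps its old mean
                means[j] = sum(members) / len(members)
    return means, assignments

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_means, final_assignments = k_means_1d(points, initial_means=[1.0, 2.0])
```

With these starting means the first pass puts almost everything in the second cluster; the second pass separates the two groups, and the third pass changes nothing, so the loop stops.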

New cards

What is K in k means clustering ?

The estimated number of clusters. Each cluster is represented by a point (its mean) placed at the estimated cluster center.

New cards

What are other clustering algorithms ?

(KNOW THE TABLE )

New cards

What is the elbow method ?

The Elbow Method is a technique used in unsupervised learning, especially in K-Means clustering, to determine the optimal number of clusters (K).

New cards

PROBABILITY

DID NOT DO CARDS NEED DEEP UNDERSTANDING

New cards

What is the main goal of linear regression?

Minimize estimation error, so that we can predict new values that follow the previously found trend.

New cards

What is Extrapolation/Interpolation?

Interpolation:

Estimating values within the range of known data points.

Extrapolation:

Estimating values outside the range of known data points.

New cards

What is Dependent variable ?

The variable you measure or try to predict.

New cards

What is Independent Variable ?

The variable you control, manipulate, or use to make predictions.

New cards

What are the causes of error in regression ?

Hidden features (y depends on more than x and the y-intercept, but our model does not include those features), observational error, statistical variation, physical noise.

New cards

What is parametric Machine Learning ?

Using training data to learn the parameters of a model with a fixed form, as in linear regression.

New cards

How to do linear regression on paper?

(LEARN AN EXAMPLE)

New cards

What do we change when we fit for minimal error in linear regression?

We find the coefficients that minimize the sum of the squares of the residuals.
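For a single feature, the coefficients that minimize the sum of squared residuals have a closed form: slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x. The sketch below applies it to a small made-up data set that is easy to check by hand; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def predict(x, slope, intercept):
    return slope * x + intercept

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]                      # made-up, noisy observations

slope, intercept = fit_line(xs, ys)       # by hand: sxy = 6, sxx = 10

# Residuals: observed minus predicted, on the training data.
residuals = [y - predict(x, slope, intercept) for x, y in zip(xs, ys)]
sum_squared_residuals = sum(r ** 2 for r in residuals)

inside = predict(2.5, slope, intercept)   # interpolation: within 1..5
outside = predict(10, slope, intercept)   # extrapolation: beyond the data
```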

New cards

What if linear regression is not ideal for our data?

Use a non-linear model or a higher degree linear model.

Use a piecewise-linear fit (make bins and apply regression on them).

New cards

What leads to measurement or observational error ?

Limited accuracy in instruments,

Faulty sensors,

Recording errors,

Noise & stochastic variation

New cards

What is a higher degree linear model ?

A polynomial with higher degree.

New cards

What is Mean squared Error ?

A metric for regression error. (KNOW THE FORMULA)
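The formula is MSE = (1/n) * sum((y_i - y_hat_i)^2), the average of the squared residuals. A minimal sketch with made-up values:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3, 5, 7]        # observed values (made up)
y_pred = [2, 5, 9]        # model predictions (made up)

mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 4) / 3
```

Squaring makes every residual count as positive and penalizes large misses more heavily than small ones.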

New cards

Definition of Residual ?

Deviation of the observed value from the predicted value of the measured quantity.

New cards

What kind of Model Errors can we get ?

Regression errors: residuals on the training data, Prediction errors: residuals on test data.

New cards

What is Generalization ?

How well a model predicts on data it has not been trained on.

New cards

Can you explain Bias Variance Tradeoff ?

Increasing model complexity makes the model more sensitive to small changes in the dataset, which lead to large changes in the parameters: variance increases and bias decreases.

New cards

Define Underfitting?

Bias is too high: the model does not correctly approximate the data. We need to increase the complexity (and accept more variance).

New cards

Define Overfitting ?

Variance is too high: the model is very sensitive to small changes in the dataset, so it follows the training data closely but makes major errors in predictions on new data.

New cards

Describe the bias variance tradeoff graph ?

(PUT THIS IN CHEAT SHEET)

New cards

What is regularization for ?

Prevent overfitting.

New cards

What is regularization using Ridge regression ?

Penalizes large weights (of features) by adding to the cost function (MSE) a fraction of the square of each weight. (Impacts all weights.)

New cards

What is regularization using Lasso regression ?

Drives the least important weights (of features) to zero (they can become exactly 0).

New cards

When to use Ridge regression ?

When all features are expected to matter.

New cards

When to use Lasso regression ?

When only some features matter.

New cards

What hyperparameter can we change for Lasso and Ridge?

Alpha hyperparameter controls how aggressively the cost function is modified by the regularization penalty
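Written out, the two regularized cost functions look as follows. This is one common form, given here as a study aid: w_j are the feature weights, p is the number of features, and alpha is the hyperparameter. Libraries differ in how they scale the MSE term and the penalty, so the exact constants are an assumption.

```latex
\begin{align*}
J_{\text{ridge}}(\mathbf{w}) &= \mathrm{MSE}(\mathbf{w}) + \alpha \sum_{j=1}^{p} w_j^{2} \\
J_{\text{lasso}}(\mathbf{w}) &= \mathrm{MSE}(\mathbf{w}) + \alpha \sum_{j=1}^{p} \lvert w_j \rvert
\end{align*}
```

With alpha = 0 both reduce to ordinary least squares; a larger alpha shrinks the weights more strongly.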

New cards

What is classification ?

Given features X, predict label (class) y

New cards

Explain K-nearest neighbors classification ?

Assign class based on the majority vote of the k-closest neighbors.(WATCH YOUTUBE VIDEO)
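A minimal sketch of the majority-vote idea in plain Python, assuming Euclidean distance (`math.dist`, Python 3.8 or later). The function name, points, and labels are made up for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k closest training points."""
    labeled = zip(train_points, train_labels)
    neighbors = sorted(labeled, key=lambda pl: dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["red", "red", "red", "green", "green", "green"]

near_reds = knn_predict(train_points, train_labels, (2, 2), k=3)
near_greens = knn_predict(train_points, train_labels, (6, 5), k=3)
```

There is no training step beyond storing the data, which is why training is fast and prediction is slow: every query is compared against all stored points.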

New cards

KNN is a non-parametric classifier ?

True

New cards

What does KNN classify from ?

Classification from similarity in features geometry.

New cards

What are the tradeoff when choosing K ?

Small k gives relevant neighbors, Large k gives smoother functions.

New cards

When to use KNN ?

Not too many dimensions , lots of training data.

New cards

What are the advantages and disadvantages of KNN?

Advantages:

Very fast at training

Learn complex functions

Disadvantages:

Slow at predicting on new data

Irrelevant features can confuse the classifier

New cards

What's the problem with accuracy for KNN?

Accuracy is not well suited to imbalanced classes. If we have many more reds than greens and predict all red, we will get high accuracy regardless.

New cards

Understand True/False/Positive/Negative ?

-True Positive

actual class = Positive;

predicted class = Positive

-True Negative

actual class = Negative

predicted class = Negative

-False Positive

actual class = Negative

predicted class = Positive

-False Negative

actual class = Positive

predicted class = Negative.

New cards

What is Precision ?

TP / (TP + FP) (Know by heart)

New cards

What is Recall ?

TP/ (TP + FN) (Know by heart)
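The sketch below counts TP, FP, TN, and FN from made-up labels (1 = positive) and applies the two formulas. The classes are deliberately imbalanced so that accuracy looks better than recall. Function names are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1      # predicted positive, actually positive
            else:
                fp += 1      # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1      # predicted negative, actually positive
            else:
                tn += 1      # predicted negative, actually negative
    return tp, fp, tn, fn

actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # 3 positives, 7 negatives
predicted = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

Here accuracy is 70% even though the model finds only one of the three actual positives.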

New cards

Situations when we want high recall ?

Cancer Detection, Credit Card Fraud Detection

-Better off making false positives than false negatives.

New cards

Situations when we want high Precision ?

Fake News Detection, Spam Detection

-You would rather miss some positives than flag some for no reason.

New cards

Know the Structure of the Confusion Matrix?

(MAYBE GO INTO SHEET)

New cards

What is feature selection ?

Decide which features to use in training.

New cards

What can provide feedback on feature importance ?

Random Forest