A subquery is a SELECT
inside another query — helps dynamically filter, insert, update, or delete.
You can use subqueries in:
SELECT
→ filter with values like MAX()
INSERT INTO ... SELECT
→ copy filtered data into another table
UPDATE
→ set values based on results of a subquery
DELETE
→ remove rows matching results of a subquery
Subqueries often work with IN
, =
, or EXISTS
.
Supervised Learning: labeled data → predict outputs (classification, regression)
Unsupervised Learning: unlabeled data → find patterns (clustering, dimensionality reduction)
Reinforcement Learning: agent learns by reward/penalty
Ranking: predict the best item/order (used in recommendations)
Recommendation Systems = personalized filtering for music, shopping, etc.
Features: properties/columns of data (quantitative or categorical)
Labels: target outcomes (what we’re trying to predict)
Examples: (X, y) pairs
Feature Vector: one sample's data (row in X)
Feature Matrix: all samples (full table of features = X)
Target Vector: column of labels (y)
x̄
(x-bar) = mean, not a vector
Derivative: measures rate of change (used in optimization)
Probability: likelihood of an outcome
Independent vs Conditional
Probability Distribution: how likely each outcome is
Gaussian (normal) vs Uniform
train_test_split()
(not tran_test_split
) = splits data for training and testing
scikit-learn models don’t accept NaNs
Import from sklearn.linear_model
, sklearn.model_selection
Abstract Class: can’t be instantiated, contains at least one abstract method
Abstract Method: no implementation; must be overridden by child class
Use @abstractmethod
from abc
module
Static Method (@staticmethod
): doesn’t use instance or class — utility method
Instance Method: regular method that acts on self
Concrete Method: has a full implementation (opposite of abstract)
.set_index()
and .reset_index()
help control DataFrame structure
drop=True
removes the index column after reset
.loc[]
is used for label-based subsetting (by index or column name)
df.groupby()
groups data by a column’s values — often used with aggregation
Subset: A slice of your dataset, filtered down (rows, columns, or both)
Pivot Table: Reshape data to highlight summaries (like Excel pivot tables)
x̄
is the mean, not a vector.
You wrote pd.subset()
— there’s no such method in Pandas.
Your INSERT subquery SQL was missing IN (...)
parentheses.
Still unsure about .loc[]
? Think of it as label-based filtering: df.loc[row_label, column_label]
.
Subqueries allow filtering or setting values dynamically.
ML uses data to predict labels (supervised) or uncover patterns (unsupervised).
Features = input data (X); labels = targets (y); examples = (X, y).
Abstract classes provide structure, not instantiation.
Pandas lets you transform and reshape data with indexing and grouping.
Probability distributions help model uncertainty.
Reinforcement learning involves trial and error in an environment.
Use .loc[]
for label-based filtering and .groupby()
to summarize data.
One hot encoding is essential for handling categorical data in ML models.
Pivot tables and box plots help find insights quickly in messy datasets.
Training a model on data with NaNs → model will error out.
Using WHERE
on an aggregate query → won’t work, must use HAVING
.
Using a static method where instance context is needed → leads to bugs.
Misspelling train_test_split
as tran_test_split
.
Forgetting @abstractmethod
decorator → abstract methods won’t behave correctly.
Thinking .subset()
exists in pandas — it doesn’t.
What’s the difference between a feature vector, feature matrix, and target vector?
When should you use a subquery in SQL?
What does @abstractmethod
do?
What is the difference between a static method and an instance method?
What does df.groupby()
do in pandas?
What happens if you try to fit a model with NaNs in scikit-learn?
Define supervised learning vs. unsupervised learning.
What’s the use of a pivot table?
Why is .loc[]
important in pandas?
What does one-hot encoding solve?
What is a subquery?
A SELECT query that is enclosed inside another query.
The inner SELECT query is usually used to determine the results of the outer SELECT query.
SELECT, INSERT, UPDATE, DELETE
MySQLWorkbench?
SELECT dept
FROM employees
WHERE salary = (SELECT MAX(salary) from employees);
INSERT: uses data returned from the subquery to insert into another table
INSERT INTO orders
SELECT product_id, product_name, sell_price
FROM products
WHERE product_id IN SELECT (product_id FROM projects WHERE sell_price > 1000);
UPDATE: Either a single or multiple columns can be updated when using a subquery with the update statement.
UPDATE employees
SET SALARY = SALARY * 0.35
WHERE age IN (SELECT age FROM employees_b WHERE age >= 27);
DELETE: DELETE from
DELETE FROM employees
WHERE age IN (SELECT age FROM employees_b WHERE age >= 27);
Terminology
Ranking: Optimizing machine learning models to rank candidates, such as music, articles, or products. Typically, the goal is to order the candidates such that the candidates which are most likely to be interacted with (purchased, viewed, liked, etc…) are above other candidates that aren’t as likely to be interacted with.
Supervised Learning: Optimizing machine learning models based on previously observed features and labels. Typically, the goal is to attach the most likely label to some provided features.
Unsupervised Learning: An approach within machine learning that takes in unlabeled examples and produces patterns from the provided data. Typically, the goal is to discover something previously unknown about the unlabeled examples.
Deep Learning: Optimizing neural networks, often with many hidden layers, to perform unsupervised or supervised learning.
Recommendation Systems: Systems with the goal of presenting an item to a user such that the user will most likely purchase, view, or like the recommended item. Items can take many forms, such as music, movies, products, etc… Also called a recommender system.
Feature: A set of quantities or properties describing an observation. They can be binary like “day” and “night”; categorical like “morning, “afternoon”, and “evening”; continuous like 3.141; or ordinal like “big”, “medium”, “large”, where the categories can be ordered.
Labels: Usually paired with a set of features for use in supervised learning. Can be discrete or continuous.
Examples: Pairs of features and labels
Dimensions: Here, the number of features associated with a particular example.
Vector: A feature vector, which is a list of features representing a particular example.
Matrix: An array of values, usually consisting of multiple rows and columns.
Matrix Transpose: An operator that flips a matrix over it’s diagonal.
Polynomial: A function with more than one variable / coefficient pair.
Derivative: Indicates how much the output of a function will change with respect to a change in it’s input.
Probability: How likely something is to occur. This can be independent, such as the roll of the dice, or conditional, such as drawing two cards subsequently our of a deck without replacement.
Probability Distribution: A function that takes in an outcome and outputs the probability of that particular outcome occurring.
Gaussian Distribution: A very common type of probability distribution which fits many real world observations; also called a normal distribution.
Uniform Distribution: A probability distribution in which earn outcome is equally likely; for example, rolling a normal six-sided die.
xbar: It’s a vector? I thought this meant it’s a mean?
Matrix are assigned to a capital letter
Abstract Classes: A class that contains at least one abstract method and is not meant to be instantiated. Abstract classes are meant to act as a parent or base class in the inheritance hierarchy. Typically abstract classes implement some funcitonality that can be used commonly by all child or subclasses.
A class that never has any instances of it.
Should not be instantiated.
A base class, superclass or parent class.
Can implement methods
Abstract Method: A method that is defined in an interface or abstract class and does not provide an implementation. Abstract methods are designed to be overridden by base or subclasses that extend the class or implement the interface they’re defined in.
required to be implemented by any class that inherits from this abstract class
What is @abstractmethod
?
Instance vs Static Methods
@staticmethod
df.set_index()
df.reset_index()
drop=True → Allows you to discard an index.
Indexes make subsetting cleaner
.loc[] is used for subsetting → accepts index values
df.sort_index()