data science
a set of fundamental principles that guide the extraction of knowledge from data
what are the pros of artificial neural networks
reduces the need for feature engineering
can fit any function (non-linear or otherwise)
extreme flexibility
state of the art performance across many domains
cons of artificial neural networks
‘black box’, bad interpretability
optimization is stochastic, solutions possibly unstable
negative side of using cumulative response / gains curve and lift curve
they are not robust to changes in class proportions
→ ROC curves are (but harder to interpret)
what do activation functions do in neural networks
they allow neural networks to model complex non-linear patterns
what type of activation functions exist, and when do we use them
ReLU : used most of the time
sigmoid activation function : for binary classification tasks
softmax activation function : for multinomial classification tasks
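A minimal NumPy sketch of these three activation functions (the function names and test values are illustrative, not from the card):

```python
import numpy as np

def relu(z):
    # ReLU: max(0, z), the default choice for hidden layers
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid: squashes z into (0, 1), used for binary classification outputs
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Softmax: turns a vector of scores into probabilities that sum to 1,
    # used for multinomial classification outputs
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-1.0, 0.5, 2.0])
print(relu(z), sigmoid(z), softmax(z))
```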
why is gradient descent used
to optimize the neural network weights
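A toy sketch of gradient descent on a single weight, minimizing a squared loss (the data, learning rate, and step count are illustrative assumptions):

```python
# Minimize loss(w) = (w * x - y)^2 for one training pair (x, y)
x, y = 2.0, 6.0          # toy data: the true weight is 3
w = 0.0                  # initial weight
lr = 0.05                # learning rate (step size)

for step in range(100):
    grad = 2 * (w * x - y) * x   # d(loss)/dw via the chain rule
    w -= lr * grad               # step against the gradient
print(w)                         # converges towards 3.0
```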
main takeaway from backpropagation
the order in which we choose to compute partial derivatives: from the back of the NN to the front
partial derivatives reappear due to the sequential dependency of layers and the chain rule → storing results allows for faster computations

picture : which machine learning method leads to the depicted decision regions
KNN (k=1)
The decision regions shown in the images are a classic example of a Voronoi Tessellation. Here is how we can tell:
Individual Influence: In the left plot, you can see that every single data point (training example) sits inside its own "cell" or "neighborhood" of a specific color. For example, the outlier red dot at (4, 6) creates a small red island in the middle of a green region.
Exact Boundary Fitting: The boundaries are composed of perpendicular bisectors between neighboring points of different classes. This happens when the algorithm classifies a new point based solely on the single closest training point.
Sensitivity to Noise: The "islands" of color (like the green dot surrounded by blue at height 8, width 5.2) indicate that the model is perfectly fitting the training data, which is a hallmark of $k=1$. If $k$ were higher (e.g., $k=5$), the boundaries would be much smoother and less influenced by single outliers.
why is logistic regression considered a linear classification method
because for a given cut-off, the decision boundary / surface is linear

Decision tree assignment decision: Majority voting vs Probability • Discuss using visualization
In a decision tree leaf, majority voting assigns every instance the single most frequent class of that leaf, giving one definitive label, whereas probability-based assignment reports the class proportions in the leaf as probability estimates. Visualizing the class distribution inside each leaf shows why this matters: two leaves can both vote for the positive class while one is 95% positive and the other only 55%, a difference the majority vote throws away.
why do we use information gain and not directly the improvement in accuracy when evaluating candidate splits in decision trees?
While accuracy only cares about the final "win/loss" at a node, Information Gain measures the reduction in uncertainty, which is a much more sensitive guide for building an effective tree.
what is a gradient
a vector that describes the derivative or rate of change of a function
components of the vector are the partial derivatives of the loss function with respect to the network parameters
points in the direction of greatest increase of a function
is zero at a local maximum or local minimum
data mining
is the extraction of knowledge from data, via technologies that incorporate these principles
churn
customers switching from one company to another
→ attracting new customers is much more expensive than retaining existing ones
what ensemble method can we use to reduce bias
boosting and stacking
what is the cross-entropy metric (log_loss in scikit-learn)
Cross-entropy is a metric used to measure the difference between two probability distributions: the true labels (what actually happened) and the predicted probabilities (what your model thinks will happen).
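A small scikit-learn example of log loss (the labels and predicted probabilities are made up for illustration):

```python
from sklearn.metrics import log_loss

y_true = [1, 0, 1, 1]            # what actually happened
y_prob = [0.9, 0.2, 0.7, 0.4]    # model's predicted probability of class 1

# Lower is better; confident wrong predictions are penalized heavily
print(log_loss(y_true, y_prob))
```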
what ensemble methods can we use to reduce variance
bagging methods, forest of random trees
polynomial regression
when you add powers of each feature as new features, then train a linear model on this extended set of features
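A sketch of this with scikit-learn, expanding features to degree 2 before fitting an ordinary linear model (the toy data and degree are illustrative assumptions):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=0.3, size=100)

# Expand features to [x, x^2], then fit a plain linear model on the extended set
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(X, y)
print(model.named_steps["linearregression"].coef_)  # roughly [1.0, 0.5]
```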
biased model
errors due to wrong assumptions inherent to the learning algorithm
(low model complexity)
variance in the model
errors due to sensitivity to fluctuations in the training dataset
→ high model complexity
bagging : bootstrap aggregation
use the same training algorithm for every predictor but train them on different random subsets of the training set
sample random subsets from the training set with replacement (each drawn instance is put back, so it can be drawn again)
lower variance compared to a single predictor trained on the original set
what happens to the slope of your ridge regression when you increase alpha
In Ridge regression, the parameter alpha acts as a "penalty" on the size of your coefficients (the slopes). As you increase alpha, the slopes of your regression line are forced to shrink toward zero.
When $\alpha$ is 0: It acts exactly like standard Linear Regression.
As $\alpha$ increases: The model is "punished" for having large $\beta$ (slope) values. To keep the total cost low, the model must choose smaller values for the slopes.
As $\alpha \to \infty$: The slopes get closer and closer to 0, eventually resulting in a horizontal line (though they never mathematically hit absolute zero in Ridge regression).
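A quick way to see this shrinkage with scikit-learn's Ridge, fitting the same toy data at increasing alpha values (the data and alpha grid are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=50)   # true slope = 3

for alpha in [0, 1, 10, 100, 1000]:
    slope = Ridge(alpha=alpha).fit(X, y).coef_[0]
    print(f"alpha={alpha:>5}: slope={slope:.3f}")    # slope shrinks toward 0
```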
DDD : data driven decision making
the practice of basing decisions on the analysis of data, rather than purely on intuition
how does a bagging ensemble of trees work
1. Bootstrapping (Data Sampling)
The algorithm creates multiple different versions of your original dataset. It does this using sampling with replacement.
If you have 100 rows of data, the algorithm might pick 100 rows at random to create "Bag 1."
Because it’s "with replacement," some rows will be picked twice, and others won't be picked at all.
This process is repeated for as many trees as you want in your ensemble (e.g., 100 bags for 100 trees).
2. Parallel Training (Building the Trees)
Each "Bag" of data is used to train its own individual decision tree.
These trees are typically grown deep and are not pruned.
Because each tree saw a slightly different version of the data, they will all develop different "opinions" or "biases."
Crucially, these trees are trained in parallel—they don't talk to or influence each other.
3. Aggregating (The Final Vote)
When you want to make a new prediction, you pass the input through every single tree in the ensemble.
For Classification: Each tree "votes" for a category. The class with the most votes wins.
For Regression: Each tree predicts a number. The final answer is the average of all those numbers.
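These three steps map onto scikit-learn's BaggingClassifier; a minimal sketch (the dataset and parameter choices are illustrative; the keyword is `estimator` in scikit-learn ≥ 1.2, `base_estimator` in older versions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 deep trees, each trained on its own bootstrap sample (sampling with replacement)
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=100,
                        bootstrap=True,
                        random_state=0)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))   # accuracy of the aggregated (majority-vote) prediction
```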
random forest
adds extra randomness compared to bagging
subsamples are drawn with replacement
and the splits are chosen from a random subset of the total number of features
→ idea : lots of very diverse high variance and low bias trees combine to generate a low bias low variance forest
data processing technologies
do not involve extracting knowledge or data-driven decision making
big data
means datasets that are too large for traditional processing systems, and therefore require new processing technologies
classification and class probability estimation
attempt to predict, for each individual in a population, which of a set of classes this individual belongs to
→ class probability estimation : a score representing the probability that that individual belongs to each class
what do we mean when we say that random forests make it easy to identify a feature's importance
by looking at how much the tree nodes that use that feature reduce impurity on average (across all trees in the forest)
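In scikit-learn this impurity-based measure is exposed as feature_importances_; a small sketch (the iris dataset and forest size are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Mean impurity decrease attributed to each feature, averaged over all trees
for name, imp in zip(load_iris().feature_names, forest.feature_importances_):
    print(f"{name}: {imp:.3f}")
```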
regression
attempts to estimate or predict, for each individual, the numerical value of some variable for that individual
what is boosting
any ensemble method that can combine several weak learners into a strong learner (reduce bias)
train predictors sequentially
iterative procedure where observations currently misclassified are made more important → removes errors due to the model’s bad assumptions
similarity matching
attempts to identify similar individuals based on data known about them
AdaBoost
a first base classifier (such as a decision tree) is trained and used to make predictions on the training set. The relative weight of misclassified training instances is then increased. A second classifier is trained using the updated weights, again makes predictions on the training set, and the weights are updated once more
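A minimal AdaBoost sketch with decision stumps as the base classifier (parameters and data are illustrative; the keyword is `estimator` in scikit-learn ≥ 1.2, `base_estimator` in older versions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Each new stump focuses more on the instances its predecessors misclassified
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100,
                         learning_rate=0.5,
                         random_state=0)
ada.fit(X, y)
print(ada.score(X, y))
```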
profiling
attempts to characterize the typical behavior of an individual, group, or population
stacking
combine predictions of different models
use a regression model to learn the optimal weight of each model’s prediction
ideally reduces variance and bias
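A sketch with scikit-learn's StackingClassifier, where a logistic regression learns how to weight the base models' predictions (the base-model choices and data are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

base_models = [("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
               ("svm", SVC(probability=True, random_state=0))]

# The final_estimator is trained on the base models' out-of-fold predictions
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression())
stack.fit(X, y)
print(stack.score(X, y))
```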
link-prediction
attempts to predict connections between data items, usually by suggesting that a link should exist, and possibly also estimating the strength of the link
data reduction
attempts to take a large set of data and replace it with a smaller set of data that contains much of the important information in the larger set
causal modeling
attempts to help us understand what events or actions influence others
unsupervised learning
when there is no target
supervised data mining problem
when there is a specific target defined
→ classification and regression
CRISP DM : business understanding
it is vital to understand the problem to be solved
CRISP DM : Data understanding
it is important to understand the strengths and limitations of the data, because there is rarely an exact match with the problem
CRISP DM : data preparation
analytic technologies often require data to be in a form different from how the data are provided naturally, and some conversion will be necessary
ex : converting data to tabular format, removing or inferring missing values, converting data to different types, and normalizing numerical values
CRISP DM : modeling
finding some sort of model or pattern capturing regularities in the data
CRISP DM : evaluation
assess the data mining results rigorously and gain confidence that they are valid and reliable before moving on
CRISP DM : deployment
the results are put into real use in order to realize some return on investment
on what does data mining focus
on automated search for knowledge, patterns, or regularities from data
query (not data mining)
is a specific request for a subset of data or for statistics about data
data warehouses (not data mining)
collect and coalesce data from across an enterprise, often from multiple transaction-processing systems, each with its own database
machine learning
is a subset of artificial intelligence, and is concerned with models for improving the knowledge or performance of an intelligent agent over time, in response to the agent’s experience in the world
what is information
a quantity that reduces uncertainty about something
what is a model
a simplified representation of reality created to serve a purpose;
it is simplified based on some assumptions about what is and is not important for the specific purpose
model induction
the creation of models from data
what do we mean when we say that we want to have groups as pure as possible
we want groups that are homogeneous with respect to their target variable
what is entropy
is a measure of disorder that can be applied to a set
disorder = how mixed (impure) the segment is with respect to the properties of interest
information gain IG
is used to measure how much an attribute improves (decreases) entropy over the whole segmentation it creates. Information gain measures the change in entropy due to any amount of new information being added
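A small sketch computing entropy and the information gain of a split, following these definitions (the example label arrays are made up):

```python
import numpy as np

def entropy(labels):
    # H(S) = -sum(p_i * log2(p_i)) over the class proportions in the segment
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

parent = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])   # 50/50 mix -> entropy 1.0
left   = np.array([1, 1, 1, 1, 0])                   # children after a candidate split
right  = np.array([1, 0, 0, 0, 0])

weights = np.array([len(left), len(right)]) / len(parent)
children_entropy = weights @ np.array([entropy(left), entropy(right)])
info_gain = entropy(parent) - children_entropy
print(info_gain)   # > 0 means the split reduced impurity
```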
what type of learning are decision trees
supervised segmentation, because each leaf contains a value for the target variable
what is the goal at each step with classification trees
is to select an attribute to partition the current group into subgroups that are as pure as possible with respect to the target variable
what is the goal of data mining
to tune the parameters so that the model fits the data as well as possible
→ parametric modeling
decision boundaries
boundaries that partition the instance space into similar regions
loss
a general term for error penalty
a loss function determines how much penalty should be assigned to an instance based on the error in the model’s predictive value
differences between classification trees and linear classifiers
DT use decision boundaries that are perpendicular to the instance-space axes, whereas the linear classifier can use decision boundaries of any direction or orientation
DT segment the instance space recursively, using a divide-and-conquer approach. A linear classifier places a single decision surface through the entire space. It is limited to a division into two segments
what type of tasks can we use neural networks
for classification and regression
overfitting the data
is the tendency of data mining procedures to tailor models to the training data, at the expense of generalization to previously unseen data points
generalization
is the property of a model or modeling process, whereby the model applies to data that were not used to build the model
fitting graph
shows the accuracy of a model as a function of complexity
generalization performance
by comparing the predicted values with the hidden true values
base rate
the percentage of the majority class
why is overfitting bad
as the model gets more complex, it is allowed to pick up harmful spurious correlations. These correlations are idiosyncrasies of the specific training set used and do not represent characteristics of the population in general
holdout data
data not used in building the model, but for which we do know the actual value of the target variable
cross-validation
is a more sophisticated holdout training and testing procedure. We would like not only a simple estimate of the generalization performance, but also some statistics on the estimated performance, such as the mean and the variance, so that we can understand how the performance is expected to vary across datasets
→ cross-validation computes its estimates over all the data by performing multiple splits and systematically swapping out samples for testing
how does cross-validation work
begins by splitting a labeled dataset into k partitions, called folds
it iterates training and testing k times, in a particular way
in each iteration of the cross-validation, a different fold is chosen as the test data
1. Split: Divide the 100 rows into 5 equal "folds" (20 rows each).
2. Iterate: Run 5 separate training loops. In each loop, use a different fold as the validation set and the remaining 4 folds (80 rows) as the training set.
3. Score: Calculate the performance (e.g., accuracy) for each of the 5 loops.
4. Average: Take the mean of those 5 scores to get a robust estimate of how the model performs on unseen data.
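The same workflow with scikit-learn's cross_val_score (the 100 rows and 5 folds mirror the example above; the estimator and generated data are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)  # 100 rows, as above

# Split into 5 folds, train/test 5 times, return the 5 accuracy scores
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores)          # one score per fold
print(scores.mean())   # robust estimate of generalization performance
```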
True or False: After performing K-fold Cross-Validation, we end up with K different models to use in production.
False. During the CV process, we do technically train K temporary models to evaluate the chosen hyperparameters.
The Final Step: Once we are satisfied with the results, we throw away those 5 models and train one final model on the entire dataset using the best parameters found during CV.
Key Insight: CV is for assessment, not for creating the final production model.
cross validation Why go through the trouble of training 5 times instead of just doing one simple Train/Test split?
1. Stability: It ensures every data point gets a chance to be in the "validation set," making the performance score less dependent on a "lucky" or "unlucky" split.
2. Data Efficiency: It is especially useful for smaller datasets where a single split might leave too little data for training.
learning curve
the generalization performance plotted against the amount of training data
generally : the more training data becomes available, the better the generalization performance (up to a certain point)
what techniques do DT use to avoid overfitting
1) stop growing the tree before it gets too complex
2) grow the tree until it is too large, then prune it back, reducing its size
nested holdout testing
the validation set is separate from the final test set, on which we are never going to make any modeling decisions
If you use the same validation set to tune multiple hyperparameters (e.g., first max_depth, then min_samples_leaf), what is the primary risk?
Overfitting to the validation set. By repeatedly "peeking" at the validation set to make decisions, the hyperparameters start to capture the specific noise and quirks of those specific validation samples rather than general patterns.
The validation score will become optimistic (too high), and the model will perform worse on truly unseen data
How do you safely tune multiple hyperparameters without "exhausting" your validation data?
Use Grid Search or Random Search with K-Fold Cross-Validation.
Instead of tuning one parameter and then another, you test a "grid" of combinations (e.g., Depth 3 + Leaf 5, Depth 5 + Leaf 2) simultaneously.
By using K-Fold, the "validation set" changes in every iteration, making it much harder for the model to overfit to a specific subset of data
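A sketch of this combined approach with GridSearchCV (the parameter grid mirrors the example combinations above; the data and estimator are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

param_grid = {"max_depth": [3, 5, 7],
              "min_samples_leaf": [2, 5, 10]}

# Every combination is scored with 5-fold CV, so no single validation set is reused
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```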
To ensure your tuning process hasn't overfitted, what is the final step in the workflow?
The Hold-out Test Set.
You must keep a completely separate "Test Set" that is never used during any part of the tuning or training process.
You only touch this set once at the very end to confirm that your tuned model actually works.
what is regularization
trying to optimize not just the fit to the data, but a combination of fit to the data and simplicity of the model
what do we get when we incorporate an L2 norm penalty into standard least squares linear regression
we get ridge regression
what do we get when we incorporate an L1 norm (sum of the absolute values) penalty into standard least squares linear regression
we get lasso
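A quick comparison of the two penalties on the same toy data, showing that the L1 penalty (Lasso) drives irrelevant coefficients exactly to zero while the L2 penalty (Ridge) only shrinks them (the data and alpha values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
# only the first two features matter; the other three are noise
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

print("Ridge:", Ridge(alpha=1.0).fit(X, y).coef_)   # small but non-zero everywhere
print("Lasso:", Lasso(alpha=0.5).fit(X, y).coef_)   # irrelevant coefficients become 0
```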
grid search
the general approach to optimizing the parameter values of a data mining procedure
what are the most-similar instances called
nearest neighbors
what is the result of having a high k (smoothness)
the higher k, the more the estimates are smoothed out among neighbors
what value of k (in kNN) results in more erratic boundaries
k = 1 (the 1-NN classifier)
it is more sensitive to outliers, because it retrieves only a single instance
→ tends to overfit a LOT
what happens when we use 1-NN (what type of model will you have)
we get an extremely complex model, which places complicated boundaries such that every training example will be in a region labeled by its own class
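A small scikit-learn comparison of k=1 against a larger k (the dataset, label noise, and k values are illustrative); 1-NN scores perfectly on its own training data but typically generalizes worse:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in (1, 15):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # k=1 memorizes the training set (train accuracy 1.0) but overfits;
    # a larger k smooths the boundary and usually generalizes better
    print(k, knn.score(X_train, y_train), knn.score(X_test, y_test))
```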
what might be a problem with K-NN
lack of an explicit, interpretable model
importance of feature selection → it is very sensitive to uninformative variables
clustering
the idea of finding natural groupings in the data
supervised modeling
involves discovering patterns to predict the value of a specific variable, based on data where we know the values of the target variable
unsupervised modeling
does not focus on a target variable
instead it looks for other sorts of regularities in a set of data
positive example
= one worthy of attention or alarm
a negative example is uninteresting / benign
accuracy
the proportion of correct decisions
what does a confusion matrix do
separates out the decisions made by the classifier, making explicit how one class is being confused for another
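A minimal scikit-learn example (the true and predicted labels are made up):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are the true classes, columns the predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```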
why can the class distribution be unbalanced / skewed
because the unusual / interesting class is rare in the general population
what is the base rate of a class
how well a classifier would perform by simply choosing the majority class for every instance
what problems exist regarding the use of accuracy as a metric
it makes no distinction between false positive and false negative errors → it makes the assumption that both errors are equally important
it is also misleading under an unbalanced class distribution