Nominal
data is a categorical type of data that represents labels or names without a specific order or ranking.
Ordinal
data is a categorical type of data that has a defined order or ranking, allowing for comparison of relative positions.
Categorical
data represents distinct categories or groups.
Numerical
data is a type of data that represents quantifiable values, allowing for mathematical calculations and comparisons.
Discrete
The difference between units on the scale is constant, but the variable can only take certain values (e.g., counts).
Interval
The difference between units on the scale is constant, but there is no true zero point; measures exact differences (e.g., temperature in °C).
Scatterplot / line plot / scatterplot matrix
Compare two variables (numerical/numerical)
Joint plot
Compare two variables (numerical / numerical data)
Bivariate Kernel Density Plot
Numerical / numerical data
Boxplot / violin plot
Categorical / numerical data
Heatmap
Categorical / categorical data
Probability Distribution
Assigns a probability between 0 and 1 to each possible outcome; the probabilities sum (or integrate) to 1.
Bernoulli Random Variable
A random variable with exactly two possible outcomes (success/failure), with success probability p.
Binomial Random Variable
Counts how many successes occur in n independent Bernoulli trials.
Central limit theorem
The distribution of the sample mean is approximately normal when the sample size is large.
- lets you standardize (normalize) the sample mean using its standard error
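A quick way to see the theorem in action is to simulate it; the exponential population and sample size below are made up for illustration.

```python
import numpy as np

# Hypothetical simulation: draw many samples from a skewed (exponential)
# population and look at the distribution of their means.
rng = np.random.default_rng(0)
samples = rng.exponential(scale=2.0, size=(10_000, 50))  # 10,000 samples of n = 50
sample_means = samples.mean(axis=1)

# By the CLT the means are roughly normal around the true mean (2.0),
# with spread close to the standard error sigma / sqrt(n) = 2 / sqrt(50).
print(sample_means.mean(), sample_means.std())
```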
Bootstrap
Sampling with replacement from the observed data
- provides consistency
- helps quantify errors (uncertainty) when making inferences
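A minimal sketch of a percentile bootstrap, assuming NumPy and a made-up sample; the 95% level and 5,000 resamples are arbitrary choices.

```python
import numpy as np

# Bootstrap a 95% confidence interval for the mean of a toy dataset.
rng = np.random.default_rng(0)
data = rng.normal(loc=10, scale=3, size=100)  # hypothetical sample

n_boot = 5_000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    resample = rng.choice(data, size=data.size, replace=True)  # sample WITH replacement
    boot_means[i] = resample.mean()

# Percentile bootstrap confidence interval
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% CI for the mean: ({lo:.2f}, {hi:.2f})")
```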
Confidence Intervals
A range of plausible values for a parameter that quantifies the variation (uncertainty) in a statistic.
Null Hypothesis
No effect or nothing of interest
Alternative Hypothesis
There is an effect
Test Statistic
A number computed from the data that measures the difference between what was observed and what the null hypothesis predicts.
Rejection Criterion
The threshold (e.g., significance level α) at which the null hypothesis is rejected.
Type I Error
Rejecting the null hypothesis when it is true (false positive)
Type II Error
Failing to reject the null hypothesis when it is false (false negative)
Hypothesis Testing
A framework for deciding between hypotheses by falsification: assume the null hypothesis and check whether the data are inconsistent with it.
P-Value
The probability of getting a test statistic at least as extreme as the one observed, given that the null hypothesis is true.
- the lower the p-value, the lower the risk of a Type I error when rejecting
Reject the null hypothesis when
p-value < 0.05 (the conventional significance level)
- rejecting at this level keeps the risk of making a Type I error low
Binomial Distribution
Probability distribution that describes the number of successes in a fixed number of independent trials of a binary experiment.
Gives the probability of k successes in n trials, each with success probability p.
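The corresponding textbook formula (the standard PMF, not quoted from the cards):

```latex
P(X = k) = \binom{n}{k} \, p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n
```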
t-test
Numerical vs. categorical (two categories)
Compares a numerical variable across a categorical variable with two values; answers whether the means of the two groups are different.
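A minimal sketch using scipy.stats.ttest_ind on made-up groups (the sizes and means are arbitrary):

```python
from scipy import stats
import numpy as np

# Do two groups have different means?
rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=40)
group_b = rng.normal(loc=5.5, scale=1.0, size=40)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)  # reject H0 of equal means if p_value < 0.05
```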
Kruskal-Wallis Test
Same idea as the t-test but with many categories.
Hypothesis test that compares a numerical variable across a categorical variable with multiple categories, but does not tell you which category differs.
- ranks all observations, sums the ranks per group, and checks whether the rank sums differ
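A minimal sketch using scipy.stats.kruskal on three made-up groups:

```python
from scipy import stats
import numpy as np

# Compare one numerical variable across three categories using ranks.
rng = np.random.default_rng(0)
g1 = rng.normal(5.0, 1.0, 30)
g2 = rng.normal(5.3, 1.0, 30)
g3 = rng.normal(6.0, 1.0, 30)

h_stat, p_value = stats.kruskal(g1, g2, g3)
print(h_stat, p_value)  # small p-value: at least one group differs (not which one)
```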
Pearson’s Correlation
Numerical vs Numerical
Measures the strength and direction of the linear relationship; answers whether two variables move together.
Between -1 and 1; 0 means no linear relationship.
Spearman’s Correlation
Rank-based correlation; used when the relationship is monotonic but not necessarily linear.
Correlation does not imply causation.
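A small sketch contrasting the two on a made-up monotonic but non-linear relationship:

```python
from scipy import stats
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.exp(x / 3) + rng.normal(0, 0.5, 200)

r_pearson, p_pearson = stats.pearsonr(x, y)     # strength of LINEAR association
r_spearman, p_spearman = stats.spearmanr(x, y)  # strength of MONOTONIC association
print(r_pearson, r_spearman)  # Spearman is typically higher here
```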
Chi-Squared (χ²) Test of Independence
Compares two categorical variables and measures whether there is dependence.
H0: independent, no association
Ha: dependent, there is an association
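A minimal sketch with a hypothetical 2×3 contingency table, using scipy.stats.chi2_contingency:

```python
from scipy import stats
import numpy as np

# Counts of two categorical variables cross-tabulated (made-up numbers).
table = np.array([[30, 14, 6],
                  [22, 18, 10]])

chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(chi2, p_value)  # small p-value: reject independence (H0)
```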
Family-Wise Error Rate
Probability of making at least one Type I error.
What does this probability answer?
It counts all the tests in the same statistical family together; for k independent tests at level α it equals 1 − (1 − α)^k.
Bonferroni Correction
Used when running k tests simultaneously
- rejects when p-value ≤ α/k
Adjusts for multiple comparisons to control the family-wise error rate.
Multiple Hypothesis Test
The more hypothesis tests are run, the more Type I errors accrue.
Using the Bonferroni correction helps reduce this risk by adjusting the threshold.
Alpha_new
The adjusted significance level for each individual test: α_new = α / k.
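A tiny sketch with hypothetical p-values showing the adjusted threshold α/k:

```python
# Bonferroni correction over made-up p-values.
p_values = [0.001, 0.020, 0.049, 0.300]  # hypothetical results of k = 4 tests
alpha = 0.05
alpha_new = alpha / len(p_values)        # 0.0125

for i, p in enumerate(p_values):
    print(f"test {i}: p = {p:.3f} -> reject H0: {p <= alpha_new}")
```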
Clustering
Unsupervised technique used to group similar data points.
K-Means
Distance-based clustering algorithm
Uses distance to measure intra-cluster "coherence"
Finds a local optimum
Clustering metric: within-cluster sum of squares (usage sketch after the pros/cons below)
Pros:
Simplicity
Scalability
Convergence
Cons:
Sensitive to outliers
Cluster shape
Choosing K values
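A minimal usage sketch on synthetic blobs (the data and k = 4 are assumptions for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])  # cluster assignment per point
print(km.inertia_)      # within-cluster sum of squares (the metric above)
```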
Convex
You can draw a straight line from any point A to any point B in the shape without going outside of it; K-means assumes roughly convex (circle-like) clusters.
Elbow
Find the optimal K value for K-means
Can be very hard to find the "elbow" when the curve is close to linear.
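A sketch of the values you would plot for the elbow: inertia (within-cluster sum of squares) for a range of k, on the same kind of synthetic blobs:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# The "elbow" is where the decrease in inertia levels off.
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))
```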
Silhouette Scores
Metric for evaluating any clustering (not only for choosing the best k for k-means). Returns the average of the silhouette coefficients over all samples.
-1: points are assigned to the wrong clusters
0: clusters overlap
1: strong cluster structure
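A minimal sketch computing the average silhouette coefficient with scikit-learn (synthetic data and k = 4 assumed):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))  # closer to 1 = stronger structure
```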
Hierarchical Clustering
Bottom-up method
Does not require the number of clusters k to run
Can be interpreted via a dendrogram ("tree-based" view)
Expensive in terms of compute and memory
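A minimal sketch of bottom-up (agglomerative) clustering with SciPy; the Ward linkage and the 3-cluster cut are illustrative choices:

```python
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=0)

Z = linkage(X, method="ward")                    # merge history behind the dendrogram
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 flat clusters
print(labels[:10])
# scipy.cluster.hierarchy.dendrogram(Z) would draw the tree (needs matplotlib)
```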
Curse of Dimensionality
High-dimensional data tends to be sparse and hard to analyze; more features make the model more complicated.
Principal Components Analysis (PCA)
Reduces dimensionality while preserving as much information (variance) as possible.
Goal: find a lower-dimensional subspace and project the data onto it.
Benefits: simplifies the data while losing as little information as possible, and helps with visualization.
The Process of PCA
1) Build the feature matrix
2) Standardize the data (PCA is sensitive to scale)
3) Compute the covariance matrix Σ
4) Find the eigenvalues and eigenvectors
5) Project the data onto the top eigenvectors
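A from-scratch sketch of these steps in NumPy, on made-up data; it also surfaces the eigenvalues and eigenvectors described in the next two cards:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                    # 1) feature matrix

X_std = (X - X.mean(axis=0)) / X.std(axis=0)     # 2) standardize (scale-sensitive)
cov = np.cov(X_std, rowvar=False)                # 3) covariance matrix Σ

eigenvalues, eigenvectors = np.linalg.eigh(cov)  # 4) eigen-decomposition
order = np.argsort(eigenvalues)[::-1]            #    sort by variance explained
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

X_proj = X_std @ eigenvectors[:, :2]             # 5) project onto top 2 components
print(eigenvalues / eigenvalues.sum())           # share of variance per direction
```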
Eigenvalues
How much variance (information) in each direction
Eigenvectors
The new axes or directions (the principal components)