data science

49 Terms

New cards

population

the set of all possible observations of interest to the problem at hand

New cards

sample

subset of the population containing the objects or outcomes that have actually been observed

New cards

parameter

a numerical value that describes a population (e.g. population mean, population standard deviation)

New cards

statistic

a numerical value that describes a sample (e.g. sample mean, sample standard deviation)

New cards

probability sampling

units are chosen by random selection

New cards

non-probability sampling

selection is not random, e.g. based on convenience

New cards

sampling with replacement

each data unit in the population can reappear in the sample

New cards

sampling without replacement

each data unit in the population can appear at most once in the sample

New cards

simple random sampling

every unit in the population has an equal chance of being selected

New cards

stratified sampling

divide the population into homogeneous groups (strata), then sample from each group

New cards

cluster sampling

divide the population into heterogeneous clusters, then randomly sample whole clusters

New cards

why sampling is important

generalise to the population

prevent bias

reduce computation cost

New cards

cross validation

a resampling technique used during the assessment phase to assess how well the model generalises to unseen data

New cards

how a cross validation works

split the data into k folds

train on k-1 folds

validate on the remaining fold

repeat k times, so each fold is used once for validation

average the performance over the k runs
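
The steps on this card can be sketched in plain Python. This is only an illustration: the function names `k_fold_indices` and `cross_validate` are made up here, and the "model" is assumed to be the simplest possible one (it predicts the mean of the training values), scored by mean squared error.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(ys, k=5, seed=0):
    """Average validation MSE of a mean-only model over k folds (toy example)."""
    folds = k_fold_indices(len(ys), k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on the other k-1 folds: the "model" is just the training mean.
        train = [ys[j] for f, fold in enumerate(folds) if f != i for j in fold]
        prediction = statistics.mean(train)
        # Validate on the held-out fold.
        errors = [(ys[j] - prediction) ** 2 for j in val_idx]
        scores.append(statistics.mean(errors))
    # Average the performance over the k runs.
    return statistics.mean(scores)

print(cross_validate([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 3.0, 6.0, 8.0, 1.0], k=5))
```

In practice a library routine would normally replace the hand-written split; the loop above is only meant to show the train / validate / repeat / average pattern.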

New cards

aspects of data quality

accuracy, completeness, consistency, timeliness, validity, uniqueness

New cards

variance

s² = {(x1 - mean)² + (x2 - mean)² + … + (xn - mean)²} / (n - 1)

New cards

standard deviation

square root of variance
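
A small worked example of the sample variance (dividing by n - 1) and the standard deviation. The data values are made up for illustration, and `sample_variance` is a hypothetical helper name.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]          # illustrative values, mean = 5
var = sample_variance(data)              # 32 / 7
std = math.sqrt(var)                     # standard deviation = sqrt(variance)

# The standard library gives the same results.
print(var, statistics.variance(data))
print(std, statistics.stdev(data))
```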

New cards

histogram

displays the frequency with which values occur in data

New cards

Lq = (n+3)/4 if n is odd

Lq = (n+2)/4 if n is even

New cards

X(Lq) if Lq is an integer

{X(Lq-0.5) + X(Lq+0.5)} / 2 if Lq is not an integer

New cards

If Lq is an integer: X(n+1-Lq)

If Lq is not an integer:

{X(n+1-Lq-0.5) + X(n+1-Lq+0.5)} / 2
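
A sketch of the quartile rules on the last three cards, assuming X(i) means the i-th smallest observation (counting from 1) and that the X(Lq) rule gives the lower quartile while the X(n+1-Lq) rule gives the upper quartile. The function name `quartiles` is made up for this example.

```python
def quartiles(values):
    """Return (lower quartile, upper quartile) using the Lq location rule."""
    xs = sorted(values)
    n = len(xs)
    # Location of the lower quartile as a 1-based position in the sorted data.
    lq = (n + 3) / 4 if n % 2 == 1 else (n + 2) / 4

    def at(pos):
        # Whole-number position: take that observation directly.
        if pos == int(pos):
            return xs[int(pos) - 1]
        # Half position: average the two neighbouring observations.
        lo = int(pos - 0.5)
        return (xs[lo - 1] + xs[lo]) / 2

    return at(lq), at(n + 1 - lq)

print(quartiles([1, 2, 3, 4, 5, 6, 7]))        # n odd:  Lq = 2.5
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # n odd:  Lq = 3
```

Other textbooks and software packages use slightly different quartile conventions, so results may differ from library defaults.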

New cards

fences

step = 1.5 × (Q3 - Q1)

UIF (upper inner fence) = Q3 + step

LIF (lower inner fence) = Q1 - step

New cards

Outlier formula

more than 1.5 × IQR below Q1 or above Q3 (outside the inner fences)
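
A minimal sketch of flagging outliers with the inner fences (step = 1.5 × (Q3 - Q1)). The quartiles are passed in as arguments, and the function names and data are illustrative only.

```python
def inner_fences(q1, q3):
    """Return (lower inner fence, upper inner fence)."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step

def find_outliers(values, q1, q3):
    """Values lying outside the inner fences."""
    lif, uif = inner_fences(q1, q3)
    return [x for x in values if x < lif or x > uif]

# Illustrative data with Q1 = 2.5 and Q3 = 5.5 assumed as given.
print(inner_fences(2.5, 5.5))
print(find_outliers([1, 2, 3, 4, 5, 6, 7, 30], 2.5, 5.5))
```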

New cards

what statistics to use on symmetrical data with no outliers

standard deviation, mean

New cards

what to use on skewed data

median, quartiles

New cards

sample space

set of all possible outcomes of an experiment

New cards

event

subset of the sample space

New cards

product rule

P(A and C) = P(A|C) × P(C), equivalently P(A|C) = P(A and C) / P(C)

New cards

addition rule

P(A or C) = P(A) + P(C) - P(A and C)
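
A quick check of the addition rule by counting outcomes for one roll of a fair die. The events A ("even") and C ("greater than 3") are chosen only for illustration; the same counts also give the conditional probability P(A|C) = P(A and C) / P(C).

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # even number
C = {4, 5, 6}   # greater than 3

# Addition rule: subtract the overlap so it is not counted twice.
p_a_or_c = prob(A) + prob(C) - prob(A & C)

# Conditional probability, and the product rule rebuilt from it.
p_a_given_c = prob(A & C) / prob(C)
p_a_and_c = p_a_given_c * prob(C)

print(p_a_or_c, p_a_given_c, p_a_and_c)
```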

New cards

system failure (mutually exclusive failure events)

P(system failure) = P(b1) + P(b2) + …

New cards

parallel systems (all components must fail)

P(system failure) = P(b1) × P(b2) × … (assuming components fail independently)
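
The two system-failure cards as a short sketch. The failure probabilities are made-up numbers; the sum assumes the failure events are mutually exclusive, and the product assumes the components fail independently.

```python
import math

def failure_mutually_exclusive(probs):
    """System fails if any one failure event occurs (events cannot overlap)."""
    return sum(probs)

def failure_parallel(probs):
    """Parallel system: it fails only if every component fails."""
    return math.prod(probs)

print(failure_mutually_exclusive([0.01, 0.02, 0.03]))
print(failure_parallel([0.1, 0.2]))
```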

New cards

probability density function(PDF)

shows where a continuous random variable is more or less likely to fall

New cards

cumulative distribution function

probability that the random variable is less than or equal to a certain value

New cards

binomial distribution

fixed number of trials

each trial has two possible outcomes

probability of success is the same for each trial

the variable counts the number of successes

New cards

uniform distribution

all outcomes are equally likely

can be discrete or continuous

e.g. rolling a fair die

New cards

poisson distribution

counts the number of events that occur within a fixed interval of time, space or area

events occur independently

average rate of occurrence is constant

New cards

exponential distribution

measuring the time or distance between events

events occur continuously and independently at a constant average rate

always positive values

New cards

binomial distribution mean and variance

mean= n*p

variance=n*p(1-p)

n=number of trials

p=probability of success in each trial
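
A numerical check that mean = n*p and variance = n*p(1-p), computed directly from the binomial probabilities. The values n = 10 and p = 0.3 are arbitrary example inputs.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))

print(mean, n * p)                 # both close to 3.0
print(variance, n * p * (1 - p))   # both close to 2.1
```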

New cards

uniform distribution mean and variance

mean=(a+b)/2

discrete (integers from a to b): variance = {(b-a+1)² - 1} / 12

continuous (X uniformly distributed between a and b): variance = (b-a)² / 12
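
A check of the discrete case for a fair die (a = 1, b = 6), computed from the definition with exact fractions. The helper name `discrete_uniform_stats` is made up for this example.

```python
from fractions import Fraction

def discrete_uniform_stats(a, b):
    """Mean and variance of the integers a..b, each equally likely."""
    values = range(a, b + 1)
    n = len(values)
    mean = Fraction(sum(values), n)
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, variance

mean, variance = discrete_uniform_stats(1, 6)
print(mean, Fraction(1 + 6, 2))                      # (a+b)/2
print(variance, Fraction((6 - 1 + 1) ** 2 - 1, 12))  # {(b-a+1)² - 1}/12
```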

New cards

Poisson mean and variance

mean and variance = lambda

lambda = average number of events per interval
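
A numerical check that the Poisson mean and variance both equal lambda, using the standard Poisson probability formula P(X = k) = e^(-lambda) × lambda^k / k!. The value lambda = 4 is an arbitrary example, and the sum is cut off at k = 59, where the remaining probability is negligible.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k events in one interval) for average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.0
ks = range(60)   # truncated support; the tail beyond this is negligible here
mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print(mean, variance)   # both close to lam
```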

New cards

exponential mean and variance

mean=1/lambda

variance=1/lambda²

lambda = events per unit time
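
A simulation sketch for the formulas above: waiting times are drawn with Python's `random.expovariate`, and the sample mean and variance are compared with 1/lambda and 1/lambda². The rate lambda = 2 and the sample size are arbitrary choices, and simulated values only approximate the theoretical ones.

```python
import random

def simulate_waiting_times(lam, n, seed=0):
    """Draw n exponential waiting times with rate lam (events per unit time)."""
    rng = random.Random(seed)
    return [rng.expovariate(lam) for _ in range(n)]

lam = 2.0
gaps = simulate_waiting_times(lam, 50_000)
sample_mean = sum(gaps) / len(gaps)
sample_var = sum((g - sample_mean) ** 2 for g in gaps) / len(gaps)

print(sample_mean, 1 / lam)       # roughly 0.5
print(sample_var, 1 / lam**2)     # roughly 0.25
```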

New cards

normal distribution

continuous data

bell shaped curve

values cluster around the mean

New cards

bayes theorem

P(A|B) = P(B|A) × P(A) / P(B)
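
A worked example of Bayes' theorem with made-up numbers: A = "has the condition" with P(A) = 0.01, B = "test is positive" with P(B|A) = 0.95 and P(B|not A) = 0.05. P(B) is obtained from the law of total probability.

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical inputs, chosen only for illustration.
p_a = 0.01
p_b_given_a = 0.95
p_b_given_not_a = 0.05

# Total probability of a positive test.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes(p_b_given_a, p_a, p_b)
print(round(posterior, 3))   # about 0.161
```

Even with a fairly accurate test, the posterior stays low here because the condition is rare.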

New cards

probability plots

graphical tool to determine if a set of empirical observations comes from a population with a given theoretical distribution

compare CDF calculated for sampled values with the theoretical CDF

scatter plot is made to compare values

New cards

estimate the CDF from the data with n observations

F(xi) = P(X <= xi) = (number of observations <= xi) / n
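
The empirical CDF formula as a one-line function. The data values are illustrative and `empirical_cdf` is a made-up name.

```python
def empirical_cdf(data, x):
    """Fraction of observations that are less than or equal to x."""
    return sum(1 for v in data if v <= x) / len(data)

data = [3, 1, 4, 1, 5]
for x in sorted(set(data)):
    print(x, empirical_cdf(data, x))
```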

New cards

sign test

non parametric

tests the median of a random sample, or the median of the differences between two paired random samples
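
A sketch of a two-sided sign test for a hypothesised median, under common textbook assumptions: observations equal to the hypothesised median are dropped, and under the null hypothesis each remaining observation is above or below it with probability 0.5. The function name and the data are made up for this example.

```python
from math import comb

def sign_test_p_value(values, median0):
    """Two-sided sign test p-value for H0: median = median0."""
    plus = sum(1 for v in values if v > median0)
    minus = sum(1 for v in values if v < median0)
    n = plus + minus                 # ties with median0 are dropped
    k = min(plus, minus)
    # P(at most k of n signs are the rarer sign) under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

sample = [12, 15, 11, 18, 14, 16, 13, 17, 19, 20]
print(sign_test_p_value(sample, 10))   # all ten values lie above 10
```

For two paired samples, the same function can be applied to the list of differences with `median0 = 0`.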

New cards

null hypothesis

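the default assumption being tested (e.g. no effect or no difference); it is taken to be correct unless the data provide strong evidence against it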

New cards

p<=0.05

reject the null hypothesis

New cards

p>0.05

fail to reject the null hypothesis
