data science

1

population

the set of all possible observations of interest to the problem at hand

2

sample

a subset of the population containing the objects or outcomes that have actually been observed

3

parameter

a numerical value that describes a population (e.g. population mean, standard deviation)

4

statistic

a numerical value that describes a sample (e.g. sample mean, standard deviation)

5

probability sampling

units are selected at random, so each has a known, non-zero chance of selection

6

non-probability sampling

units are selected non-randomly, e.g. based on convenience

7

sampling with replacement

each data unit in the population can appear more than once in the sample

8

sampling without replacement

each data unit in the population appears at most once in the sample
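
A minimal sketch of the two schemes using Python's standard `random` module; the 10-unit population and the seed are hypothetical, chosen only for illustration. `choices` draws with replacement, `sample` draws without replacement.

```python
import random

# Hypothetical population of 10 data units (IDs 0-9)
population = list(range(10))
rng = random.Random(42)  # fixed seed so the draws are reproducible

# With replacement: a unit may be drawn more than once
with_repl = rng.choices(population, k=8)

# Without replacement: each unit is drawn at most once (k <= population size)
without_repl = rng.sample(population, k=8)

print("with replacement:   ", with_repl)
print("without replacement:", without_repl)
```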

9

simple random sampling

every unit in the population has an equal chance of being selected

10

stratified sampling

divide the population into homogeneous groups (strata), then take a random sample from each
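
A small sketch of stratified sampling, assuming equal allocation (the same number of units drawn from every stratum; proportional allocation is another common choice). The function name `stratified_sample` and the `(id, region)` data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=0):
    """Group units into strata, then draw a simple random sample from each.

    Equal allocation per stratum is an assumption of this sketch.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    sample = []
    for label in sorted(strata):
        group = strata[label]
        sample.extend(rng.sample(group, min(n_per_stratum, len(group))))
    return sample


# Hypothetical data: (id, region) pairs; region is the stratifying variable
people = [(i, "north" if i % 3 == 0 else "south") for i in range(30)]
s = stratified_sample(people, lambda p: p[1], n_per_stratum=4)
print(s)
```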

11

cluster sampling

divide the population into heterogeneous clusters, then randomly select whole clusters to sample

12

why sampling is important

generalise findings to the population

prevent bias (when units are selected randomly)

reduce computational cost

13

cross-validation

a resampling technique used during the assessment phase to estimate how well a model generalises to unseen data

14

how k-fold cross-validation works

split the data into k folds

train on k-1 folds

validate on the remaining fold

repeat k times, holding out a different fold each time

average the performance across the k folds
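
The steps above can be sketched in plain Python. This is an illustration under stated assumptions: folds are contiguous (real workflows usually shuffle first), and the "model" `fit_mean` (predict the training mean) and the score `mse` are hypothetical stand-ins for a real learner and metric.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(y, k, fit, score):
    """Train on k-1 folds, validate on the held-out fold, repeat k times, average."""
    scores = []
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        valid = [y[i] for i in held_out]
        model = fit(train)                  # fit on the k-1 training folds
        scores.append(score(model, valid))  # evaluate on the held-out fold
    return sum(scores) / k, scores


# Hypothetical "model": always predict the mean of the training targets
def fit_mean(train):
    return sum(train) / len(train)


# Mean squared error of a constant prediction on the validation fold
def mse(prediction, valid):
    return sum((v - prediction) ** 2 for v in valid) / len(valid)


y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0, 8.0, 7.0, 9.0]
avg_mse, per_fold = cross_validate(y, k=5, fit=fit_mean, score=mse)
print(per_fold, avg_mse)
```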

15

aspects of data quality

accuracy, completeness, consistency, timeliness, validity, uniqueness

16

variance

sample variance: s² = Σ(xᵢ - x̄)² / (n - 1) = {(x₁ - x̄)² + (x₂ - x̄)² + … + (xₙ - x̄)²} / (n - 1), where x̄ is the sample mean

17

standard deviation

square root of variance
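
A short sketch of the sample variance (n - 1 denominator) and standard deviation, cross-checked against Python's `statistics.variance` and `statistics.stdev`, which also use n - 1. The helper name `sample_variance` and the data are hypothetical.

```python
import math
import statistics


def sample_variance(xs):
    """s^2 = sum((x_i - mean)^2) / (n - 1); needs at least two observations."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]
var = sample_variance(data)
sd = math.sqrt(var)  # standard deviation = square root of variance
print(var, sd)
# Cross-check with the standard library's sample variance and stdev
print(statistics.variance(data), statistics.stdev(data))
```

Here the mean is 5 and the squared deviations sum to 32, so s² = 32/7.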

18

histogram

displays how often values occur in the data by counting the observations that fall into each interval (bin)

19

Lq (position of the lower quartile in the ordered data)

(n+3)/4 if n is odd

(n+2)/4 if n is even

20

Q1

X(Lq) if Lq is an integer, where X(i) is the i-th smallest observation

{X(Lq-0.5) + X(Lq+0.5)} / 2 if Lq is not an integer

21

Q3

If Lq is an integer: X(n+1-Lq)

If Lq is not an integer:

{X(n+1-Lq-0.5) + X(n+1-Lq+0.5)} / 2
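
A sketch of the Lq rule for Q1 and Q3, reading Lq as a 1-based position in the sorted data. Lq always ends in .0 or .5 under these formulas, so a fractional position means averaging the two neighbours. The function names are hypothetical.

```python
def lq_position(n):
    """Depth of the lower quartile: (n+3)/4 if n is odd, (n+2)/4 if n is even."""
    return (n + 3) / 4 if n % 2 == 1 else (n + 2) / 4


def order_stat(xs_sorted, pos):
    """X(pos) with 1-based pos; averages the two neighbours when pos ends in .5."""
    if pos == int(pos):
        return xs_sorted[int(pos) - 1]
    lo, hi = int(pos - 0.5), int(pos + 0.5)
    return (xs_sorted[lo - 1] + xs_sorted[hi - 1]) / 2


def quartiles(xs):
    """Return (Q1, Q3) using the Lq rule."""
    s = sorted(xs)
    n = len(s)
    lq = lq_position(n)
    q1 = order_stat(s, lq)
    q3 = order_stat(s, n + 1 - lq)  # mirrored position counted from the top
    return q1, q3


print(quartiles([7, 1, 3, 2, 6, 5, 4]))  # n = 7, Lq = 2.5
print(quartiles(range(1, 9)))           # n = 8, Lq = 2.5
print(quartiles(range(1, 10)))          # n = 9, Lq = 3
```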

22

fences

step = 1.5(Q3-Q1), where Q3-Q1 is the interquartile range (IQR)

upper inner fence: UIF = Q3 + step

lower inner fence: LIF = Q1 - step

23

outlier rule

a value is a suspected outlier if it lies more than 1.5 × IQR below Q1 or above Q3, i.e. outside the inner fences
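
A minimal sketch of the fence rule, taking Q1 and Q3 as already computed. For the hypothetical data below, the Lq rule gives Q1 = 2.5 and Q3 = 6.5, so IQR = 4 and step = 6. The function names are hypothetical.

```python
def inner_fences(q1, q3):
    """Return (LIF, UIF) with step = 1.5 * IQR, where IQR = Q3 - Q1."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step


def outliers(xs, q1, q3):
    """Values outside the inner fences (more than 1.5*IQR beyond Q1 or Q3)."""
    lif, uif = inner_fences(q1, q3)
    return [x for x in xs if x < lif or x > uif]


data = [1, 2, 3, 4, 5, 6, 7, 20]
print(inner_fences(2.5, 6.5))      # (LIF, UIF)
print(outliers(data, 2.5, 6.5))    # values beyond the fences
```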

24

which statistics to use on symmetric data with no outliers

mean and standard deviation

25

which statistics to use on skewed data

median and quartiles (IQR), since they are resistant to skew and outliers

26

sample space

set of all possible outcomes of an experiment

27

event

a subset of the sample space

28

product rule

P(A and C) = P(A∣C) × P(C), equivalently P(A∣C) = P(A and C) / P(C)

29

addition rule

P(A or C) = P(A) + P(C) - P(A and C)
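
A small check of both rules by enumerating a sample space, using exact fractions. The experiment (two fair dice) and the events A ("first die is even") and C ("sum is at least 10") are hypothetical examples. P(A∣C) is counted directly within C, so the product-rule check is not circular.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice
omega = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) = (# outcomes in the event) / (# outcomes in the sample space)."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def A(w):
    return w[0] % 2 == 0      # first die is even


def C(w):
    return w[0] + w[1] >= 10  # sum is at least 10


p_and = prob(lambda w: A(w) and C(w))
p_or = prob(lambda w: A(w) or C(w))

# Addition rule: P(A or C) = P(A) + P(C) - P(A and C)
assert p_or == prob(A) + prob(C) - p_and

# Product rule: P(A and C) = P(A | C) * P(C), with P(A | C) counted inside C
in_c = [w for w in omega if C(w)]
p_a_given_c = Fraction(sum(1 for w in in_c if A(w)), len(in_c))
assert p_and == p_a_given_c * prob(C)

print(p_or, p_a_given_c)
```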

30

system failure (component failures mutually exclusive)

the system fails if any one component fails: P(system fails) = P(b1) + P(b2) + …

31

parallel system (fails only if all components fail)

P(system fails) = P(b1) × P(b2) × …, assuming components fail independently
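
A sketch of both failure formulas. The component probabilities are hypothetical; the first function assumes mutually exclusive failures, the second assumes independent failures, and both function names are made up for illustration.

```python
import math


def failure_mutually_exclusive(p_fail):
    """System fails if any component fails; with mutually exclusive
    failures the probabilities add: P = P(b1) + P(b2) + ..."""
    total = sum(p_fail)
    if total > 1:
        raise ValueError("mutually exclusive events cannot sum to more than 1")
    return total


def failure_parallel_independent(p_fail):
    """System fails only if all components fail; assuming independence,
    P = P(b1) * P(b2) * ..."""
    return math.prod(p_fail)


# Hypothetical component failure probabilities
probs = [0.01, 0.02, 0.05]
print(failure_mutually_exclusive(probs))
print(failure_parallel_independent(probs))
```

Redundancy (the parallel case) drives the failure probability down by multiplying small numbers together.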

32

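probability density function (PDF)

shows where a continuous random variable is more or less likely to fall; the probability of a range of values is the area under the curve over that range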

33

cumulative distribution function (CDF)

F(x) = P(X ≤ x): the probability that the random variable is less than or equal to a certain value x

34

binomial distribution

fixed number of trials

each trial has two possible outcomes (success or failure)

probability of success is the same for each trial, and trials are independent

you are counting the number of successes

35

uniform distribution

all outcomes are equally likely

can be discrete or continuous

e.g. rolling a fair die

36

Poisson distribution

counting the number of events that occur within a fixed interval of time, space or area

events occur independently

average rate of occurrence is constant

37

exponential distribution

measuring the time or distance between events

events occur continuously and independently at a constant average rate

takes only non-negative values

38

binomial distribution mean and variance

mean = n*p

variance = n*p(1-p)

n = number of trials

p = probability of success in each trial
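
A numerical check of mean = n*p and variance = n*p(1-p), computing both directly from the binomial probability mass function. The values n = 10 and p = 0.3 are hypothetical; `binomial_pmf` is a helper defined here, not a library function.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # hypothetical: 10 trials, success probability 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(mean, n * p)            # should agree up to floating-point rounding
print(var, n * p * (1 - p))   # should agree up to floating-point rounding
```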

39

uniform distribution mean and variance

mean=(a+b)/2

variance (discrete, on the integers a, a+1, …, b) = {(b-a+1)² - 1}/12

if X is continuously uniformly distributed between a and b, variance = (b-a)²/12
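
An exact check of the discrete formulas for a fair die (a = 1, b = 6), using fractions so there is no rounding. The helper `continuous_uniform_var` is a hypothetical name for the continuous formula.

```python
from fractions import Fraction

# Discrete uniform on the integers a..b: a fair die
a, b = 1, 6
values = range(a, b + 1)
k = len(values)
mean = Fraction(sum(values), k)
var = sum((Fraction(x) - mean) ** 2 for x in values) / k  # variance of the distribution

assert mean == Fraction(a + b, 2)
assert var == Fraction((b - a + 1) ** 2 - 1, 12)
print(mean, var)  # 7/2 35/12


def continuous_uniform_var(a, b):
    """Variance of a continuous uniform distribution on [a, b]: (b - a)^2 / 12."""
    return (b - a) ** 2 / 12
```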

40

Poisson mean and variance

mean = variance = lambda

lambda = average number of events per interval
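
A numerical check that the Poisson mean and variance both equal lambda, summing over a truncated support. The rate lambda = 4 and the cut-off at 60 are assumptions; for this lambda the probability mass beyond 60 is negligible.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) = e^(-lambda) * lambda^k / k!"""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 4.0           # hypothetical: on average 4 events per interval
ks = range(0, 60)   # truncated support; the tail beyond 60 is negligible here
pmf = [poisson_pmf(k, lam) for k in ks]
mean = sum(k * p for k, p in zip(ks, pmf))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, pmf))
print(mean, var)    # both approximately lambda
```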

41

exponential mean and variance

mean = 1/lambda

variance = 1/lambda²

lambda = rate, the average number of events per unit time
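
A simulation sketch: draw many exponential waiting times with Python's `random.expovariate` (which takes the rate lambda) and compare the sample mean and variance with 1/lambda and 1/lambda². The rate, seed and sample size are hypothetical choices.

```python
import random

lam = 2.0               # hypothetical rate: 2 events per unit time
rng = random.Random(0)  # fixed seed so the simulation is reproducible
gaps = [rng.expovariate(lam) for _ in range(200_000)]  # simulated waiting times

mean = sum(gaps) / len(gaps)
var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
print(mean, 1 / lam)       # sample mean vs theoretical 1/lambda
print(var, 1 / lam ** 2)   # sample variance vs theoretical 1/lambda^2
```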

42

normal distribution

continuous data

bell shaped curve

values cluster symmetrically around the mean

43

Bayes' theorem

P(A∣B) = P(B∣A)P(A)/P(B)
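
A worked example with hypothetical numbers for a diagnostic test (A = has the condition, B = tests positive). The denominator P(B) comes from the law of total probability, which is an extra step not stated on the card.

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical diagnostic test: A = has condition, B = test positive
p_a = 0.01              # prior P(A)
p_b_given_a = 0.95      # P(B|A), sensitivity
p_b_given_not_a = 0.05  # P(B|not A), false-positive rate

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
posterior = bayes(p_b_given_a, p_a, p_b)
print(round(posterior, 4))  # about 0.161
```

Even with a fairly accurate test, a low prior keeps P(A∣B) small.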

44

probability plots

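graphical tool to determine whether a set of empirical observations comes from a population with a specified theoretical distribution

compares the CDF estimated from the sample values with the theoretical CDF

a scatter plot of one against the other is made; points lying close to a straight line suggest a good fit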

45

estimate the CDF from the data with n observations

F(xᵢ) = P(X ≤ xᵢ) ≈ (number of observations ≤ xᵢ) / n
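
A sketch of the empirical CDF: sort once, then count observations ≤ x with a binary search (`bisect_right`). The helper name `ecdf` and the data are hypothetical.

```python
from bisect import bisect_right


def ecdf(data):
    """Return F(x) = (# observations <= x) / n as a function of x."""
    s = sorted(data)
    n = len(s)
    return lambda x: bisect_right(s, x) / n


F = ecdf([3, 1, 4, 1, 5, 9, 2, 6])
print(F(1), F(4), F(10))
```

For a probability plot, these values would be paired with the theoretical CDF at the same points.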

46

sign test

non-parametric

tests the median of a random sample, or the median of the difference between two paired samples
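
A sketch of a two-sided sign test for H0: median = m0. Under H0, each observation not equal to m0 lies above it with probability 0.5, so the number of "+" signs is Binomial(n, 0.5). Dropping ties and doubling the smaller tail are common conventions assumed here. The function name and data are hypothetical.

```python
from math import comb


def sign_test(xs, m0):
    """Two-sided sign test of H0: median = m0 (ties with m0 are dropped)."""
    diffs = [x - m0 for x in xs if x != m0]
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)
    k = min(plus, n - plus)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n  # P(X <= k), X ~ Bin(n, 0.5)
    return min(1.0, 2 * tail)


# Hypothetical data: 9 of 10 observations exceed the hypothesised median 50
data = [52, 55, 51, 58, 60, 49, 53, 57, 54, 56]
p = sign_test(data, 50)
print(p)
```

For paired samples, the same function can be applied to the differences with m0 = 0.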

47

null hypothesis

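the default assumption (e.g. no effect or no difference), taken to be correct unless the data provide strong evidence against it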

48

p ≤ 0.05

reject the null hypothesis (at the 5% significance level)

49

p > 0.05

fail to reject the null hypothesis