stats + mechanics


28 Terms

New cards

P (A | B)

P (A ∩ B) / P (B)
[P (B | A) x P (A)] / P (B)
P (A) , if independent
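
a quick check of the three forms above, as a minimal sketch. the probabilities are made up for illustration (they are not from the card), and exact fractions are used so the comparisons are exact.

```python
from fractions import Fraction

# Hypothetical probabilities, chosen only for illustration.
p_a = Fraction(1, 2)
p_b = Fraction(2, 5)
p_a_and_b = Fraction(1, 5)

# Definition: P(A | B) = P(A ∩ B) / P(B)
p_a_given_b = p_a_and_b / p_b

# Second form: P(A | B) = [P(B | A) x P(A)] / P(B)
p_b_given_a = p_a_and_b / p_a
p_a_given_b_second = p_b_given_a * p_a / p_b

# A and B are independent exactly when P(A ∩ B) = P(A) x P(B),
# and in that case P(A | B) is just P(A).
independent = p_a_and_b == p_a * p_b

print(p_a_given_b, p_a_given_b_second, independent)
```

here the events happen to be independent, so all three forms give 1/2.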


New cards

P (A ∪ B)

union: A or B or both

New cards

P (A ∩ B)

intersection: both A and B

New cards

mutually exclusive

P (A ∩ B) = 0

New cards

independent

P (A ∩ B) = P (A) x P (B)

New cards

addition law

P (A ∪ B) = [P (A) + P (B)] - P (A ∩ B)
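
the addition law can be checked by counting outcomes. a small sketch, assuming a fair six-sided die and two made-up events (neither is from the card):

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # fair six-sided die (assumed example)
A = {2, 4, 6}                 # "even"
B = {4, 5, 6}                 # "greater than 3"

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

left = prob(A | B)                         # P(A ∪ B), counted directly
right = prob(A) + prob(B) - prob(A & B)    # addition law

print(left, right)
```

subtracting P (A ∩ B) stops the outcomes 4 and 6 being counted twice.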

New cards

census

measures every member of a population

+ accurate result
- expensive, testing may destroy

New cards

sampling units

individuals of a population

New cards

sampling frame

list of sampling units

New cards

random sampling

simple random sampling
equal chance of being selected. uses a random number generator / lottery system
- + bias free
- - needs sampling frame
systematic sampling
take every kth unit, where k = population size / sample size. pick a random number between 1 and k to start
- + quick to start
- - needs sampling frame
stratified sampling
sample represents groups (strata) of a population. number from each stratum = (sample size / population size) x stratum size, picked randomly within the stratum
- + reflects population
- - population must be classified in strata
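
the systematic and stratified calculations above, as a rough sketch. the function names and the example numbers are hypothetical, and the sampling frame is assumed to be a simple list.

```python
import random

def systematic_sample(frame, sample_size, rng):
    """Take every kth unit, starting at a random position between 1 and k."""
    k = len(frame) // sample_size          # k = population size / sample size
    start = rng.randint(1, k)              # random start, 1 to k inclusive
    return frame[start - 1::k][:sample_size]

def stratified_sizes(strata_sizes, sample_size):
    """Number to pick from each stratum: (sample / population) x stratum size."""
    population = sum(strata_sizes.values())
    # Rounded to whole people; with awkward numbers the rounded sizes
    # may not add up to the sample size exactly.
    return {name: round(sample_size * size / population)
            for name, size in strata_sizes.items()}

frame = list(range(1, 101))                # made-up frame of 100 numbered units
print(systematic_sample(frame, 10, random.Random()))
print(stratified_sizes({"year 12": 120, "year 13": 80}, 20))
```

the units inside each stratum would then be picked by simple random sampling.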

New cards

non random sampling

quota sampling
like stratified, but strata filled up by the interviewer / researcher
- + no sampling frame
- - non random so potential bias
opportunity sampling
quota filled by who is available at the time
- + cheap and easy
- - unlikely to be representative, researcher bias

New cards

types of data

qualitative - non numerical

quantitative - numerical

New cards

large data set - stations in UK

camborne (coast, south)
hurn (coast, south)
heathrow (south)
leeming (north)
leuchars (coast, north)

coastal stations = windy, rainy

south = warmer, more hours in the day

recorded for 6 months only, may - october, in 1987 and 2015


New cards

large data set - international stations

perth (australia)
southern hemisphere so seasons switched
very hot in summer

beijing (china)
really hot and rainy in summer
really cold in winter
jacksonville (florida, usa)
very warm
prone to hurricanes


New cards

large data set - data

rainfall
‘tr’ means trace, treat it as 0 in calculations
n/a
not available, so can’t be used in a sample
cloud cover
oktas, discrete values of 0 - 8. measures how many eighths of the sky are covered
max gust
knots, 1 kn = 1.15 mph

New cards

sum of

New cards

measures of location (learn how to do on calc, come back)

measure of central tendency
- mean
  x̄ = Σx / n
  x̄ of grouped data = Σfx / Σf
- mode
- median
quartiles
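- for listed data -
  Q₁ is the (n / 4)th value
  Q₂ is the (n / 2)th value
  Q₃ is the (3n / 4)th value
  where n is the number of sampling units
  if the position is a decimal, round up
  if the position is whole, take the midpoint of that value and the next one
- for grouped data -
  Q₁ is the (n / 4)th value
  Q₂ is the (n / 2)th value
  Q₃ is the (3n / 4)th value
  percentiles , e.g., P₅₇ is the (0.57 x n)th value
  deciles , 10% chunks , e.g., D₃ = P₃₀ , the (0.3 x n)th value
  do not round the position. use linear interpolation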
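
the listed-data rule (round up a decimal position, take a midpoint for a whole position) as a minimal sketch. the function name and the data are made up for illustration.

```python
import math

def quartile_listed(data, fraction):
    """Quartile of listed data; fraction is 0.25, 0.5 or 0.75."""
    values = sorted(data)
    position = fraction * len(values)      # e.g. n / 4 for Q1
    if position != int(position):
        # Decimal position: round up and take that value (positions count from 1).
        return values[math.ceil(position) - 1]
    # Whole position: midpoint of that value and the next one.
    p = int(position)
    return (values[p - 1] + values[p]) / 2

eight = [2, 4, 5, 7, 8, 10, 13, 15]        # n = 8, whole positions
seven = [1, 3, 5, 7, 9, 11, 13]            # n = 7, decimal positions
print([quartile_listed(eight, f) for f in (0.25, 0.5, 0.75)])
print([quartile_listed(seven, f) for f in (0.25, 0.5, 0.75)])
```

a calculator or software may use a different quartile convention, so answers can differ slightly from this rule.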

New cards

linear interpolation

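for grouped data

use cumulative frequency to find the position of the quartile or percentile
find the class interval that contains that position
assume the values are evenly spread across the class: value = lower class boundary + [(position - cumulative frequency before the class) / class frequency] x class width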
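
linear interpolation on a grouped frequency table, as a sketch. it assumes the values are spread evenly inside each class; the class boundaries, frequencies and function name are made up for illustration.

```python
def interpolate_grouped(classes, fraction):
    """Estimate a quartile/percentile from (lower, upper, frequency) classes.

    fraction is 0.25 for Q1, 0.5 for the median, 0.57 for P57, and so on.
    """
    n = sum(freq for _, _, freq in classes)
    target = fraction * n                  # position, not rounded
    cumulative = 0                         # cumulative frequency before this class
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            # How far into the class the target position sits.
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("fraction must be between 0 and 1")

classes = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]   # hypothetical table, n = 20
print(interpolate_grouped(classes, 0.25),
      interpolate_grouped(classes, 0.5),
      interpolate_grouped(classes, 0.75))
```

for the median here: position 10 is 5 places into a class of 10, so halfway through 10 to 20, giving 15.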

New cards

measures of spread (learn how to do on calc, come back)

interquartile range
IQR = Q₃ - Q₁
+ ignores extremes
interpercentile range
IPR = difference between two percentiles, e.g., 10th to 90th IPR = P₉₀ - P₁₀
variance

σ² = Σ (x - x̄)² / n

or, the mean of the squares [(Σx²) / n] - the square of the mean [x̄²]

if it’s frequency data, then σ² = [(Σfx²) / Σf] - [(Σfx) / Σf]² where fx is data value x frequency. use class midpoints for x if the data is grouped / continuous

on calc, stats, put into list, calc, 1 var

standard deviation
how much data scatters around the mean on average
root of the sum of the squared deviations divided by number of values
σ = √{ (1 / n) x [(x₁ - x̄)² + (x₂ - x̄)² + … + (xₙ - x̄)²] }
each deviation (x - x̄) is negative if x is below the mean, positive if above the mean. squaring removes the sign
square root of variance equation
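
a sketch comparing the two variance formulas and the frequency-table version. the data and the frequency table are made up for illustration.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical listed data
n = len(data)
mean = sum(data) / n

# Definition: mean of the squared deviations from the mean.
var_definition = sum((x - mean) ** 2 for x in data) / n

# Shortcut: mean of the squares minus the square of the mean.
var_shortcut = sum(x * x for x in data) / n - mean ** 2

sd = math.sqrt(var_definition)             # standard deviation = root of variance

# Frequency data {value: frequency}: sum fx^2 / sum f - (sum fx / sum f)^2
table = {1: 2, 2: 3, 3: 5}
total_f = sum(table.values())
sum_fx = sum(f * x for x, f in table.items())
sum_fx2 = sum(f * x * x for x, f in table.items())
var_frequency = sum_fx2 / total_f - (sum_fx / total_f) ** 2

print(var_definition, var_shortcut, sd, var_frequency)
```

both formulas give a variance of 4 for the listed data, so the standard deviation is 2.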

New cards

coding

if y = ax + b,

then always ȳ = ax̄ + b
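
a quick numeric check of the coding rule, as a sketch with made-up data and made-up a and b. the standard deviation line uses the standard result σ_y = |a| σ_x, which is not on the card (adding b shifts the data but does not change the spread).

```python
import math
from statistics import mean, pstdev

x = [10, 20, 30, 40]          # hypothetical data
a, b = 2, 3                   # hypothetical coding y = ax + b
y = [a * value + b for value in x]

mean_matches = mean(y) == a * mean(x) + b
# pstdev is the population standard deviation (divide by n), as on these cards.
sd_matches = math.isclose(pstdev(y), abs(a) * pstdev(x))

print(mean(x), mean(y), mean_matches, sd_matches)
```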

New cards

representations of data

cumulative frequency
mark the quartiles on the y-axis and find the corresponding value on the x-axis

box plots
median, LQ, UQ, highest and lowest values, and outliers marked
can be used to compare location and spread of different data sets
can be made from a cumulative frequency graph
histograms
for continuous data
no gaps
frequency density = freq / class width
area = freq x k

compare = 1 measure of location, 1 measure of spread
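
the histogram rule (frequency density = freq / class width) as a small sketch. the classes are made up, and k = 1 is assumed so that area equals frequency.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 30, 20), (30, 40, 15)]

# Bar height = frequency density = frequency / class width.
densities = [freq / (upper - lower) for lower, upper, freq in classes]

# Bar area = density x width; with k = 1 this gives back the frequency.
areas = [density * (upper - lower)
         for density, (lower, upper, _) in zip(densities, classes)]

print(densities)
print(areas)
```

the middle class has the biggest frequency but not the tallest bar, because it is twice as wide.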


New cards

regression and correlation

product moment correlation coefficient (PMCC)
-1 ≤ r ≤ 1
measures the strength and direction (+ / -) of linear correlation

regression line
line of best fit, y = a + b x
a = y when x = 0
b = how much y changes when x increases by 1

interpolation = estimating inside the data range
more reliable
extrapolation = estimating outside the data range
not reliable

if y = ab^x , exponential
log y = log a + x log b
compare with Y = c + m X
(Y = log y, c = log a, m = log b, X = x)

if y = axⁿ, polynomial
log y = log a + n log x
compare with Y = c + m X
(Y = log y, c = log a, m = n, X = log x)
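
a sketch of both log transformations. the data is generated from made-up models (a = 3, b = 2 and a = 5, n = 2), and the regression line uses the standard least-squares formulas b = Sxy / Sxx and a = ȳ - b x̄, which are not written on the card.

```python
import math

def least_squares(xs, ys):
    """Return (a, b) for the least-squares regression line y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

xs = [1, 2, 3, 4, 5]

# Exponential model y = a b^x: plot log y against x.
ys_exp = [3 * 2 ** x for x in xs]
c, m = least_squares(xs, [math.log10(y) for y in ys_exp])
a_exp, b_exp = 10 ** c, 10 ** m            # c = log a, m = log b

# Power model y = a x^n: plot log y against log x.
ys_pow = [5 * x ** 2 for x in xs]
c, m = least_squares([math.log10(x) for x in xs],
                     [math.log10(y) for y in ys_pow])
a_pow, n_pow = 10 ** c, m                  # c = log a, m = n

print(a_exp, b_exp, a_pow, n_pow)
```

the recovered values should match the made-up models up to floating-point rounding.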

New cards

discrete uniform distribution

probabilities of outcomes are all equal

mean is in the middle, and is the same as the median, = (a + b) / 2

all probabilities add up to 1

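standard deviation = √[(b - a)² / 12] for a continuous uniform distribution on [a, b]. for a discrete uniform on the integers a to b it is √{[(b - a + 1)² - 1] / 12}

probability questions usually ask for an interval. all probabilities are the same, so find it as a ratio: (d - c) / (b - a) for a continuous interval from c to d, or (number of outcomes in the interval) / (total number of outcomes) for discrete values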
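
a check of the discrete case computed straight from the definitions, as a sketch. it assumes the outcomes are the integers a to b (a fair die here, a made-up example) and compares the variance with [(b - a + 1)² - 1] / 12.

```python
from fractions import Fraction

a, b = 1, 6                                # fair die (assumed example)
values = list(range(a, b + 1))
p = Fraction(1, len(values))               # every outcome equally likely

total = p * len(values)                    # probabilities add up to 1
mean = sum(p * x for x in values)          # should equal (a + b) / 2
variance = sum(p * x * x for x in values) - mean ** 2
discrete_formula = Fraction((b - a + 1) ** 2 - 1, 12)

# Probability of an interval = outcomes in the interval / total outcomes.
c, d = 2, 4
p_interval = Fraction(sum(1 for x in values if c <= x <= d), len(values))

print(total, mean, variance, discrete_formula, p_interval)
```

the standard deviation is the square root of the variance.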


New cards

hypothesis testing

New cards

binomial distribution

New cards

normal distribution

New cards

outliers

> Q₃ + k IQR

< Q₁ - k IQR
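
the outlier rule as a minimal sketch. k = 1.5 is only a commonly used default and is an assumption here (the question normally gives k); the data, quartiles and function name are made up.

```python
def find_outliers(data, q1, q3, k=1.5):
    """Values above Q3 + k x IQR or below Q1 - k x IQR."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

data = [0, 10, 12, 13, 15, 16, 40]         # hypothetical data
print(find_outliers(data, q1=10, q3=16))   # fences are 1 and 25
```

a value exactly on a fence is not an outlier, because the inequalities are strict.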

New cards

measures of central tendency on grouped data

mean

midpoints of class
[sum of midpoints x frequency of class] / total frequency

median

total frequency (n) / 2
cumulative frequency to find which class the n / 2 is in
linear interpolation

mode

class with the highest frequency
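
the grouped mean and modal class as a sketch, using a made-up table. picking the modal class by highest frequency assumes equal class widths.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]

total_frequency = sum(freq for _, _, freq in classes)

# Mean: sum of (class midpoint x class frequency) / total frequency.
grouped_mean = sum((lower + upper) / 2 * freq
                   for lower, upper, freq in classes) / total_frequency

# Mode: the class with the highest frequency.
modal_class = max(classes, key=lambda c: c[2])[:2]

print(grouped_mean, modal_class)
```

the mean is only an estimate, because every value in a class is treated as if it sat at the midpoint.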