stats + mechanics


28 Terms

1
New cards

P (A | B)

  • P (A ∩ B) / P (B)

  • [P (B | A) x P (A)] / P (B)

  • P (A) , if A and B are independent
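
A minimal sketch of the three forms above, using exact fractions. The probabilities are hypothetical numbers chosen only for illustration, and the function names `conditional` and `bayes` are not from the card.

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A ∩ B) / P(B); only defined when P(B) > 0."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_a_and_b / p_b

def bayes(p_b_given_a, p_a, p_b):
    """P(A | B) = [P(B | A) x P(A)] / P(B)."""
    return conditional(p_b_given_a * p_a, p_b)

# Hypothetical values (assumption): chosen so that A and B are independent.
p_a = Fraction(1, 2)
p_b = Fraction(2, 5)
p_a_and_b = Fraction(1, 5)  # equals P(A) x P(B)

p_a_given_b = conditional(p_a_and_b, p_b)  # (1/5) / (2/5) = 1/2
p_b_given_a = conditional(p_a_and_b, p_a)  # (1/5) / (1/2) = 2/5
via_bayes = bayes(p_b_given_a, p_a, p_b)   # same answer by the second form
```

Because these events are independent, `p_a_given_b` comes out equal to `p_a`, which is the third bullet.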

2
New cards

P (A ∪ B)

union: A or B or both occur

3
New cards

P (A ∩ B)

intersection: both A and B occur

4
New cards

mutually exclusive

P (A ∩ B) = 0

5
New cards

independent

P (A ∩ B) = P (A) x P (B)

6
New cards

addition law

P (A ∪ B) = [P (A) + P (B)] - P (A ∩ B)
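
A small worked check of the addition law, assuming a fair six-sided die (the die and the two events are an illustration, not part of the card). A is "even" and B is "greater than 4", so A ∩ B is just {6}.

```python
from fractions import Fraction

def union(p_a, p_b, p_a_and_b):
    """Addition law: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)."""
    return p_a + p_b - p_a_and_b

# Assumed example: fair die, A = {2, 4, 6}, B = {5, 6}, A ∩ B = {6}.
p_a = Fraction(3, 6)
p_b = Fraction(2, 6)
p_a_and_b = Fraction(1, 6)

p_union = union(p_a, p_b, p_a_and_b)         # 2/3, i.e. {2, 4, 5, 6}
is_mutually_exclusive = p_a_and_b == 0       # False: 6 is in both events
is_independent = p_a_and_b == p_a * p_b      # True: 1/6 == 1/2 x 1/3
```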

7
New cards

census

measures every member of a population

  • + accurate result

  • - expensive, testing may destroy

8
New cards

sampling units

individuals of a population

9
New cards

sampling frame

list of all the sampling units

10
New cards

random sampling

  1. simple random sampling

    every unit has an equal chance of being selected. uses a random number generator / lottery system

    • + bias free

    • - needs sampling frame

  2. systematic sampling

    take every kth unit, where k = population size / sample size. pick a random starting point between 1 and k

    • + quick to start

    • - needs sampling frame

  3. stratified sampling

    sample represents groups (strata) of a population. number taken from each stratum = (sample size / population size) x stratum size, and the units within each stratum are picked randomly

    • + reflects population

    • - population must be classified in strata
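
The three random methods can be sketched as below. This is only an illustration under stated assumptions: the sampling frame is a numbered list of 100 units, the strata names and sizes are made up, and the population size divides evenly by the sample size.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every unit in the frame has an equal chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Take every kth unit, starting at a random point between 1 and k."""
    k = len(frame) // n
    start = rng.randint(1, k)          # 1..k inclusive
    return frame[start - 1::k][:n]

def stratified_sample(strata, n, rng):
    """Take (sample size / population size) x stratum size from each stratum."""
    population = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        share = round(n / population * len(units))
        sample[name] = rng.sample(units, share)
    return sample

rng = random.Random(0)                  # fixed seed so the run is repeatable
frame = list(range(1, 101))             # hypothetical frame: units 1..100
srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
strata = {"year 12": list(range(60)), "year 13": list(range(60, 100))}
stratified = stratified_sample(strata, 10, rng)   # 6 from year 12, 4 from year 13
```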

11
New cards

non random sampling

  1. quota sampling

    like stratified, but each stratum's quota is filled by the interviewer / researcher choosing who to include

    • + no sampling frame

    • - non random so potential bias

  2. opportunity sampling

    sample filled by whoever is available at the time

    • + cheap and easy

    • - unlikely to be representative, researcher bias

12
New cards

types of data

qualitative - non numerical

quantitative - numerical

13
New cards

large data set - stations in UK

  1. camborne (coast, south)

  2. hurn (coast, south)

  3. heathrow (south)

  4. leeming (north)

  5. leuchars (coast, north)

coastal stations = windy, rainy

south = warmer, more hours in the day

recorded for 6 months only, may - october, in 1987 and 2015

14
New cards

large data set - international stations

  1. perth (australia)

    southern hemisphere so seasons switched

    very hot in summer

  2. beijing (china)

    really hot and rainy in summer

    really cold in winter

  3. jacksonville (florida, usa)

    very warm

    prone to hurricanes

15
New cards

large data set - data

  • rainfall

    ‘tr’ means trace, treat it as 0 in calculations

  • n/a

    not available, so can’t be used in a sample

  • cloud cover

    oktas, discrete values of 0 - 8. measures how many eighths of the sky are covered

  • max gust

    knots, 1 kn = 1.15 mph
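
A rough sketch of applying these rules before doing any calculation. The readings are made-up strings in the style of the data set, and `clean_rainfall` is a hypothetical helper name.

```python
KNOTS_TO_MPH = 1.15  # conversion factor quoted on the card

def clean_rainfall(raw):
    """Turn raw rainfall entries into numbers: 'tr' becomes 0, 'n/a' is dropped."""
    cleaned = []
    for value in raw:
        if value == "n/a":
            continue                 # not available, so leave it out of the sample
        if value == "tr":
            cleaned.append(0.0)      # trace counts as 0 in calculations
        else:
            cleaned.append(float(value))
    return cleaned

raw_rainfall = ["0.4", "tr", "n/a", "2.6", "tr"]    # hypothetical readings in mm
rainfall = clean_rainfall(raw_rainfall)              # [0.4, 0.0, 2.6, 0.0]
mean_rainfall = sum(rainfall) / len(rainfall)        # uses 4 values, not 5

max_gust_knots = 20                                  # hypothetical reading
max_gust_mph = max_gust_knots * KNOTS_TO_MPH         # about 23 mph
```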

16
New cards

Σ

sum of

17
New cards

measures of location (to do: learn how to do these on the calculator)

  1. measure of central tendency

    • mean

      x̄ = Σx / n

      x̄ of grouped data = Σfx / Σf

    • mode

    • median

  2. quartiles

    • for listed data -

      Q1 position = n / 4

      Q2 position = n / 2

      Q3 position = 3n / 4

      where n is the number of sampling units

      if the position is a decimal, round up and take that value

      if the position is whole, take the midpoint of that value and the next one

    • for grouped data -

      Q1 position = n / 4

      Q2 position = n / 2

      Q3 position = 3n / 4

      percentiles , e.g., 57th percentile position = 0.57 x n

      deciles , 10% chunks , e.g., D3 = P30 , position = 0.3 x n

      do not round the position. use linear interpolation
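
The listed-data rule (round a decimal position up, or take a midpoint when the position is whole) can be sketched like this. The data values are made up and `quartile` is a hypothetical helper, not a calculator function.

```python
import math

def quartile(sorted_data, q):
    """q-th quartile (q = 1, 2 or 3) of listed data, using the position q x n / 4."""
    n = len(sorted_data)
    if (q * n) % 4 == 0:
        # Whole position: midpoint of that value and the next one.
        pos = q * n // 4
        return (sorted_data[pos - 1] + sorted_data[pos]) / 2
    # Decimal position: round up and take that value.
    pos = math.ceil(q * n / 4)
    return sorted_data[pos - 1]

eight = [2, 4, 5, 7, 8, 10, 13, 15]    # n = 8, positions 2, 4, 6 are whole
seven = [2, 4, 5, 7, 8, 10, 13]        # n = 7, positions 1.75, 3.5, 5.25

eight_quartiles = [quartile(eight, q) for q in (1, 2, 3)]   # [4.5, 7.5, 11.5]
seven_quartiles = [quartile(seven, q) for q in (1, 2, 3)]   # [4, 7, 10]
```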

18
New cards

linear interpolation

for grouped data

  1. cumulative frequency for quartiles or percentiles

  2. find the class interval
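
After finding the class interval, the usual next step is to go the same fraction of the way through the class as the position is through that class's frequency. A minimal sketch, assuming classes are given as (lower bound, upper bound, frequency) and using a made-up table:

```python
def interpolate(classes, position):
    """Estimate the value at a cumulative-frequency position by linear interpolation."""
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= position:
            # Fraction of the way through this class, scaled by the class width.
            return lower + (position - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("position is larger than the total frequency")

# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 30, 8), (30, 40, 5)]
n = sum(freq for _, _, freq in classes)      # 30

q1 = interpolate(classes, n / 4)             # position 7.5, in the 10-20 class
median = interpolate(classes, n / 2)         # position 15, in the 10-20 class
q3 = interpolate(classes, 3 * n / 4)         # position 22.5, in the 20-30 class
```

For the median here, 5 values come before the 10-20 class, so the estimate is 10 + (15 - 5) / 12 x 10, about 18.3.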

19
New cards

measures of spread (to do: learn how to do these on the calculator)

  1. interquartile range

    IQR = Q3 - Q1

    + ignores extremes

  2. interpercentile range

    IPR = difference between two percentiles, e.g., P90 - P10

  3. variance

    σ² = Σ (x - x̄)² / n

    or, the mean of the squares [Σx² / n] - the square of the mean [x̄²]

    if it’s frequency data, then σ² = [Σfx² / Σf] - x̄², where x̄ = Σfx / Σf and f is the frequency of data value x. use midpoints if the data is grouped

    on calc: stats, put the data into a list, calc, 1 var

  4. standard deviation

    how much data scatters around the mean on average

    root of the sum of the squared deviations divided by number of values

    σ = √[ (1 / n) x ((x1 - x̄)² + (x2 - x̄)² + … + (xn - x̄)²) ]

    each deviation (x - x̄) is negative if x is below the mean, positive if above the mean

    σ = square root of the variance
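
A quick check that the definition, the "mean of the squares minus square of the mean" shortcut, and the frequency version all agree. The data set is made up, and the function names are hypothetical.

```python
import math

def variance(xs):
    """Definition: mean of the squared deviations from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def variance_shortcut(xs):
    """Mean of the squares minus the square of the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean ** 2

def variance_from_frequencies(values, freqs):
    """Frequency form: sum of f x squared over total frequency, minus mean squared."""
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total
    return sum(f * x * x for x, f in zip(values, freqs)) / total - mean ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical data, mean 5
var_definition = variance(data)               # 32 / 8 = 4
var_shortcut = variance_shortcut(data)        # 232 / 8 - 25 = 4
var_freq = variance_from_frequencies([2, 4, 5, 7, 9], [1, 3, 2, 1, 1])
std_dev = math.sqrt(var_definition)           # 2
```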

20
New cards

coding

if y = ax + b,

then always ȳ = ax̄ + b
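
A short numerical check of the coding rule, with made-up values for a, b and the data. Rearranging gives the mean of the original data back from the coded mean.

```python
def mean(values):
    return sum(values) / len(values)

a, b = 2, 3                          # hypothetical coding y = 2x + 3
xs = [1, 2, 3, 4, 10]                # hypothetical data, mean 4
ys = [a * x + b for x in xs]         # coded data: [5, 7, 9, 11, 23]

mean_x = mean(xs)                    # 4.0
mean_y = mean(ys)                    # 11.0, which is 2 x 4 + 3
decoded_mean_x = (mean_y - b) / a    # undo the coding to recover 4.0
```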

21
New cards

representations of data

  1. cumulative frequency

    mark the quartiles on the y-axis and find the corresponding value on the x-axis

  2. box plots

    median, LQ, UQ, highest and lowest values, and outliers marked

    can be used to compare location and spread of different data sets

    can be made from a cumulative frequency graph

  3. histograms

    for continuous data

    no gaps

    frequency density = freq / class width

    area of bar = k x freq, where k is a constant

compare = 1 measure of location, 1 measure of spread
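
For the histogram formulas, a small sketch with unequal class widths. The table is made up, and it assumes k = 1, so each bar's area equals its frequency.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 15, 10), (15, 20, 8), (20, 40, 4)]

# frequency density = frequency / class width (this is the bar height).
densities = [freq / (upper - lower) for lower, upper, freq in classes]

# bar area = height x width; with k = 1 this gives back the frequency.
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, classes)]
```

The widest class (20 to 40) has the lowest bar even though its frequency is not the smallest relative to its width in the raw table, which is why heights alone cannot be read as frequencies.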

22
New cards

regression and correlation

  • product moment correlation coefficient (PMCC)

    -1 ≤ r ≤ 1

    measures the strength and direction (positive or negative) of linear correlation

  • regression line

    line of best fit, y = a + b x

    a = y when x = 0

    b = how much y changes when x increases by 1

  • interpolation = estimating inside the data range

    more reliable

  • extrapolation = estimating outside the data range

    not reliable

  • if y = ab^x , exponential

    log y = log a + x log b

    Y = c + m x

    (Y = log y, c = log a, m = log b, x = x)

  • if y = ax^n , polynomial

    log y = log a + n log x

    Y = c + m X

    (Y = log y, c = log a, m = n, X = log x)
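
The two log transformations can be checked numerically. This is a sketch only: the data is generated from known values of a, b and n so the answer can be verified, and `least_squares` is a hand-written straight-line fit rather than a calculator or library routine.

```python
import math

def least_squares(xs, ys):
    """Fit y = c + m x and return (c, m)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    c = mean_y - m * mean_x
    return c, m

# Exponential, y = a b^x with a = 3, b = 2 (assumed): plot log y against x.
xs = [0, 1, 2, 3, 4]
ys = [3 * 2 ** x for x in xs]
c, m = least_squares(xs, [math.log10(y) for y in ys])
a_exp, b_exp = 10 ** c, 10 ** m          # c = log a, m = log b

# Polynomial, y = a x^n with a = 5, n = 3 (assumed): plot log y against log x.
xs2 = [1, 2, 3, 4, 5]
ys2 = [5 * x ** 3 for x in xs2]
c2, m2 = least_squares([math.log10(x) for x in xs2],
                       [math.log10(y) for y in ys2])
a_pow, n_pow = 10 ** c2, m2              # c = log a, m = n
```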

23
New cards

discrete uniform distribution

probabilities of outcomes are all equal

mean is in the middle, and is the same as the median, = (a + b) / 2

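all probabilities add up to 1

standard deviation = √[ (b - a)² / 12 ] (this form is for a continuous uniform distribution on [a, b])

probability questions are usually about finding an interval. all probabilities are the same, so find it as a ratio: P (c < X < d) = (d - c) / (b - a)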

24
New cards

hypothesis testing

25
New cards

binomial distribution

26
New cards

normal distribution

27
New cards

outliers

values greater than Q3 + k x IQR

values less than Q1 - k x IQR
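
A minimal sketch of the outlier check. The value k = 1.5 is an assumption (the question normally states k), and the data and quartiles are made up.

```python
def find_outliers(data, q1, q3, k=1.5):
    """Return values below Q1 - k x IQR or above Q3 + k x IQR (k = 1.5 assumed)."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

data = [1, 12, 14, 15, 16, 18, 19, 40]   # hypothetical data
q1, q3 = 13, 18.5                        # quartiles of this data, IQR = 5.5

outliers = find_outliers(data, q1, q3)   # fences are 4.75 and 26.75
```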

28
New cards

measures of central tendency on grouped data

mean

  1. midpoints of class

  2. [sum of midpoints x frequency of class] / total frequency

median

  1. total frequency (n) / 2

  2. cumulative frequency to find which class the n / 2 is in

  3. linear interpolation

mode

  1. class with the highest frequency
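
The three procedures can be sketched together on one made-up table. Classes are assumed to be (lower bound, upper bound, frequency) with equal widths, and the function names are hypothetical.

```python
def grouped_mean(classes):
    """Sum of midpoint x frequency, divided by total frequency."""
    total = sum(freq for _, _, freq in classes)
    return sum((lower + upper) / 2 * freq for lower, upper, freq in classes) / total

def grouped_median(classes):
    """Find the class containing position n / 2, then interpolate linearly."""
    target = sum(freq for _, _, freq in classes) / 2
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq

def modal_class(classes):
    """Class with the highest frequency."""
    lower, upper, _ = max(classes, key=lambda c: c[2])
    return (lower, upper)

classes = [(0, 10, 5), (10, 20, 12), (20, 30, 8), (30, 40, 5)]   # hypothetical

mean_estimate = grouped_mean(classes)       # 580 / 30
median_estimate = grouped_median(classes)   # 10 + (15 - 5) / 12 x 10
mode_class = modal_class(classes)           # (10, 20)
```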