Normal Applied Year 1 definitions

1

Population

Entire set of items in the group being studied

2

Census

Measuring every member of a population

  • Pro: accurate

  • Con: expensive

  • Con: some testing destroys the items

  • Con: time-consuming

  • Con: hard to process a large quantity of data

3

Sampling frame

A list of the sampling units, each individually named or numbered.

E.g. a database of ___ who/which ___

4

Sample

Subset of the population intended to represent the population.

  • Pro: less time-consuming and less expensive

  • Pro: less data to process

  • Con: not as accurate

  • Con: sample may not be large enough to give information on smaller subgroups

5

Sampling units

Individual units of a population

6

ALWAYS MENTION THE SAMPLING FRAME (OR THAT NONE IS NEEDED) FOR ANY SAMPLING TECHNIQUE!!!!

7

Simple random sampling (definition, pros, cons)

  • Every member of the population has an equal chance of being selected

  • In the sampling frame, each item has a unique identifying number

  • Use a random number generator or lottery sampling

  • Pro: free of bias

  • Pro: easy and cheap

  • Con: not suitable for a large population

  • Con: needs a sampling frame
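
A minimal Python sketch of simple random sampling, assuming a hypothetical sampling frame of 500 people with ID numbers 1 to 500. `random.sample` chooses without replacement, so every member of the frame has the same chance of being picked; the function name `simple_random_sample` is illustrative only.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Choose n distinct members of the sampling frame, each equally likely."""
    rng = random.Random(seed)  # seeded generator so the sample can be reproduced
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical sampling frame: 500 people with ID numbers 1-500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=1)
print(sorted(sample))
```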

8

Systematic sampling (definition, pros, cons)

  • Required elements chosen at regular intervals from an ordered list

  • Pick the first item randomly by choosing a random number between 1 and k

  • k = population size / sample size

  • Pro: quick to use

  • Pro: suitable for a large population

  • Con: needs a sampling frame

  • Con: patterns in the ordered list can bias the sample
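
The steps above can be sketched in Python as follows. This is an illustration, assuming a hypothetical ordered list of 100 items and a sample of 10; when population size / sample size is not a whole number, this sketch simply rounds k down, which is an assumption rather than a fixed rule.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Take every k-th item from an ordered sampling frame after a random start."""
    k = len(frame) // n  # interval k = population size / sample size (rounded down here)
    start = random.Random(seed).randint(1, k)  # random start between 1 and k inclusive
    # Positions start, start + k, start + 2k, ... (1-based), converted to 0-based indices
    return [frame[start - 1 + i * k] for i in range(n)]


# Hypothetical ordered list of 100 items, sample of 10, so k = 10
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```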

9

Stratified sampling

  • Population divided into mutually exclusive strata (groups) and a random sample taken from each

  • Each stratum's proportion of the sample is the same as its proportion of the population

  • Number sampled from a stratum = (number in stratum / number in population) × overall sample size

  • Pro: accurately represents the population structure

  • Con: the population must be classified into strata, so you need to know the population structure, which can be hard to determine clearly and time-consuming

  • Con: other cons same as simple random sampling
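
A minimal Python sketch of proportional stratified sampling, assuming a hypothetical population of 300 Year 12 and 200 Year 13 students and an overall sample of 50. Each stratum size follows the formula above, then a simple random sample is drawn within each stratum. Because each size is rounded separately, the sizes may not always add up exactly to the overall sample size.

```python
import random


def stratified_sample(strata, sample_size, seed=None):
    """Proportional stratified sample: a simple random sample from each stratum."""
    rng = random.Random(seed)
    population = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # (number in stratum / number in population) x overall sample size
        k = round(len(members) * sample_size / population)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population: 300 Year 12 and 200 Year 13 students, sample of 50
strata = {
    "Year 12": [f"Y12-{i}" for i in range(300)],
    "Year 13": [f"Y13-{i}" for i in range(200)],
}
result = stratified_sample(strata, 50, seed=2)
print({name: len(chosen) for name, chosen in result.items()})  # {'Year 12': 30, 'Year 13': 20}
```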

10

Why can different samples reach different conclusions?

Natural variation in the population: each sample contains different members.

11

What to comment on when asked about a claim

  • Whether the claim is about the mean, median or mode (the median is better when there are outliers; a claim about the median also means half the values are above it and half below)

  • Whether the data supports the company's claim

12

How to improve the reliability of a sample

  • Take a larger sample

  • Use simple random sampling

13

Quota sampling

  • Population divided into groups (strata) according to characteristics. The size (quota) of each group is set to reflect that group's proportion of the whole population.

  • The interviewer/researcher selects people until each quota is filled

  • Pro: no sampling frame needed

  • Pro: quick and easy, and a small sample can still be representative

  • Con: non-random, so potential bias

  • Con: dividing the population into groups can be time-consuming and expensive

14

Opportunity sampling

  • Sample taken from people who are available at the time of the study

  • Pro: easy to carry out

  • Pro: cheap

  • Con: unlikely to be representative

  • Con: highly dependent on the individual researcher, so open to bias

15

Qualitative

Non-numerical

16

Quantitative

Numerical. Discrete (specific values only) or continuous (any value in a range)

17

Key terms for intervals

  • Class intervals

  • Lower/upper class boundaries

  • Class width

  • Midpoint

18

UK stations from south to north and their weather conditions

Listed from south to north, the stations are in alphabetical order, except that Hurn and Heathrow are swapped. Stations in the south are warmer and get more sunshine.

  • Camborne - coastal, so windy and rainier

  • Hurn - coastal, so windy and rainier

  • Heathrow - warmest

  • Leeming - fairly warm but further north

  • Leuchars - coastal. Wettest, coldest, windiest and furthest north.

19

When did the Great Storm happen?

  • 15-16 October 1987. High wind speeds. Mostly affected the South East.

20

What period is the large data set recorded for?

May to October in 1987 and in 2015.

Only 6 months of each year, which is a disadvantage.

21

International stations and weather conditions

  • Perth, Australia - southern hemisphere, so when it is summer in the UK it is winter there. Very hot in summer. Rainfall ranges from 0 to very high

  • Beijing, China - very hot and rainy in summer, very cold in winter. Inland

  • Jacksonville, Florida, USA - very hot and humid, prone to hurricanes. Hurricanes in October 1987 and October 2015.

22

Special fact about May 2015

Windiest month

23

Special fact about May 1987

Lots of missing data

24

Rainfall ‘tr’ meaning

Trace. Treat as 0 in calculations.

25

N/A meaning

Reading not available, so it can't be used in the sample

26

Cloud cover special fact

  • Measured in oktas (eighths of the sky covered). Quantitative

  • Discrete values 0-8

27

Max. Gust special fact

  • Measured in knots

  • 1 kn = 1.15 mph

  • Integers only

28

Cardinal wind directions

Directions on a compass

29

Daily mean temp units

Degrees Celsius (°C)

30

Daily total rainfall units

mm

31

Daily mean pressure units

hPa (1 hPa = 100 Pa)

Integers only

32

Wind speed units

Beaufort scale descriptions: light, moderate, fresh, strong.

Qualitative.

33

Wind/gust direction

Bearings.

Multiples of 10 only

34

Daily total sunshine units

Hours.

35

Relative humidity units

%.

Integers only

36

Daily mean visibility units

Dm (decametres). 1 Dm = 10 m.

Round to nearest 100

37

As you move further north, what happens to the maximum hours of sunshine from May to October?

Increases

38

Also consider the sample size and geographical factors, not just the numerical values!!!

39

Humidity for fog

>95%

40

Outliers meaning

Unusual data: extreme values that are very different from the rest of the data

41

Anomalies meaning

Errors in the data

42

Mean (x bar)

Σx/n for listed data, or Σfx/Σf for frequency data
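
Both mean formulas can be checked with a short Python sketch; the data values and function names below are made up for illustration.

```python
def mean_listed(xs):
    """x-bar = sum of x / n"""
    return sum(xs) / len(xs)


def mean_frequency(values, freqs):
    """x-bar = sum of fx / sum of f"""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


print(mean_listed([2, 4, 4, 5, 10]))         # 25 / 5 = 5.0
print(mean_frequency([1, 2, 3], [4, 5, 1]))  # 17 / 10 = 1.7
```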

43

For listed data, Upper quartile

The (3n/4)th value

44

For listed data , Lower quartile

The (n/4)th value

45

For listed data, Median

The (n/2)th value

46

For listed data, what if the position for the upper quartile/median/lower quartile is a decimal?

Round up and use that value (e.g. n = 7: n/4 = 1.75, so use the 2nd value)

47

For listed data, what if the position for the upper quartile/median/lower quartile is a whole number?

Find the midpoint of that value and the next one (e.g. n = 8: n/4 = 2, so use the mean of the 2nd and 3rd values)
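
The round-up / midpoint rule for listed data can be sketched in Python. The data sets are hypothetical, and `Fraction` is used so that positions such as n/4 are exact and a whole number is detected reliably.

```python
import math
from fractions import Fraction


def listed_value(data, fraction):
    """Median/quartile of listed data using the round-up / midpoint rule."""
    xs = sorted(data)
    pos = len(xs) * Fraction(fraction)  # exact position, e.g. n/4
    if pos.denominator == 1:
        # Whole number: midpoint of that value and the next one
        i = int(pos)
        return (xs[i - 1] + xs[i]) / 2
    # Decimal: round up and take that value (positions are 1-based)
    return xs[math.ceil(pos) - 1]


odd = [3, 5, 6, 8, 9, 12, 14]    # n = 7
even = [1, 2, 3, 4, 5, 6, 7, 8]  # n = 8
for name, data in (("n = 7", odd), ("n = 8", even)):
    q1 = listed_value(data, Fraction(1, 4))
    q2 = listed_value(data, Fraction(1, 2))
    q3 = listed_value(data, Fraction(3, 4))
    print(name, q1, q2, q3)
```

For n = 7 this gives 5, 8 and 12 (positions 1.75, 3.5 and 5.25 rounded up); for n = 8 it gives 2.5, 4.5 and 6.5 (midpoints at positions 2, 4 and 6).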

48

Quartiles for grouped data

  • Lower: n/4

  • Upper: 3n/4

  • Median: n/2

  • DO NOT ROUND; USE LINEAR INTERPOLATION
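
Linear interpolation for grouped data can be sketched in Python as below. The classes and frequencies are hypothetical, and the sketch assumes each class is given by its true class boundaries and that values are spread evenly within each class.

```python
def grouped_value(classes, position):
    """Linear interpolation within grouped data.

    classes: list of (lower boundary, upper boundary, frequency), in order.
    position: n/4, n/2, 3n/4, etc. (not rounded).
    """
    cum = 0  # cumulative frequency before the current class
    for lower, upper, freq in classes:
        if freq > 0 and cum + freq >= position:
            # Fraction of the way through this class, scaled by the class width
            return lower + (position - cum) / freq * (upper - lower)
        cum += freq
    raise ValueError("position is larger than the total frequency")


# Hypothetical grouped data: 0-10 (f = 5), 10-20 (f = 15), 20-30 (f = 10)
classes = [(0, 10, 5), (10, 20, 15), (20, 30, 10)]
n = sum(f for _, _, f in classes)
print(grouped_value(classes, n / 4), grouped_value(classes, n / 2), grouped_value(classes, 3 * n / 4))
```

Here n = 30, so the median position is 15: it lies 10 of the 15 values into the 10-20 class, giving 10 + (10/15) × 10 ≈ 16.67.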

49

Percentiles

E.g. 57th percentile: P57 is the (0.57 × n)th value

50

Deciles

10% chunks

E.g. D3 is the (0.3 × n)th value

51

True class boundaries for ‘10-12’

9.5 ≤ x < 12.5

52

Interquartile range (is a measure of spread)

Upper quartile - lower quartile

Not affected by extreme values

53

Interpercentile range

E.g. 10th to 90th interpercentile range = P90 - P10

54

Variance (σ²)

MSMSM (mean of the squares minus the square of the mean)

σ² = Σx²/n - x̄² = Sxx/n, where Sxx = Σx² - (Σx)²/n

For grouped frequency data, use the class midpoints as the x values: σ² = Σfx²/Σf - x̄²
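
A Python sketch checking that the "mean of the squares minus the square of the mean" and Sxx/n forms give the same variance. The data values are made up, and the grouped version assumes class midpoints are used as x.

```python
import math


def variance_msmsm(xs):
    """Mean of the squares minus the square of the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean ** 2


def variance_sxx(xs):
    """Sxx / n, with Sxx = sum of x^2 - (sum of x)^2 / n."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    return sxx / n


def grouped_variance(midpoints, freqs):
    """Grouped data: use class midpoints as x, then sum fx^2 / sum f - mean^2."""
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / total
    return sum(f * x * x for x, f in zip(midpoints, freqs)) / total - mean ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_msmsm(data), variance_sxx(data), math.sqrt(variance_sxx(data)))  # 4.0 4.0 2.0
print(grouped_variance([5, 15, 25], [5, 15, 10]))
```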

55

Standard deviation (σ)

σ = √(variance)

For some grouped data questions, you can assume the data is evenly distributed within each class

56

Variance definition

Measure of spread that takes all values into account. Average squared distance from mean.

57

Standard deviation definition

Measure of how spread out the data is about the mean (the square root of the average squared distance from the mean). It has the same units as the data.

58

Coding: if y = ax + b

Data values are coded to make a new set of values that is easier to work with

  • Mean of y = a × (mean of x) + b

  • Standard deviation of y = |a| × (standard deviation of x)
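
A quick numerical check of the coding rules using Python's `statistics` module. The data and the coding y = -2x + 50 are hypothetical; a negative a is used to show that the spread scales by |a|, while adding b shifts the values without changing the spread. `pstdev` is the population standard deviation (dividing by n), matching σ.

```python
import math
import statistics

x = [12, 15, 11, 18, 14]
a, b = -2, 50  # hypothetical coding y = ax + b
y = [a * xi + b for xi in x]

mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
sd_x, sd_y = statistics.pstdev(x), statistics.pstdev(y)  # population sd (divide by n)

print(mean_y, a * mean_x + b)  # both 22.0
print(sd_y, abs(a) * sd_x)     # equal up to rounding; b does not affect the spread
```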

59

Cumulative frequency graphs and box plots

Plot each cumulative frequency against the upper class boundary (highest value) of its class

60

Standard way for finding outlier boundaries

  • Lower outlier: less than (lower quartile) - 1.5 × IQR, or

  • Upper outlier: greater than (upper quartile) + 1.5 × IQR

61

What is meant by cleaning the data

Removing anomalies from data

62

Histograms

Joining the midpoints of the tops of the bars forms a frequency polygon

63

When comparing data sets on histograms/ box plots, comment on:

  • Mean and standard deviation, OR

  • Median and interquartile range

If a data set contains extreme values, the median and IQR are more appropriate

64

Correlation

Describes the strength and direction of the linear relationship between 2 variables. The correlation coefficient lies between -1 and 1.

  • Strong negative

  • Weak negative

  • No/zero

  • Weak positive

  • Strong positive
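
One common correlation coefficient is the product moment correlation coefficient, r = Sxy / √(Sxx × Syy). A minimal Python sketch with made-up data shows that it stays between -1 and 1 and equals -1 for points on a straight line with negative gradient.

```python
import math


def pmcc(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy); always between -1 and 1."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxy / math.sqrt(sxx * syy)


print(pmcc([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))   # about 0.77: positive correlation
print(pmcc([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0: perfect negative correlation
```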

65

Bivariate data

Data which has pairs of values for 2 variables, plotted on scatter diagrams. E.g. pulse beats per minute

66

Which variable goes on the x-axis?

The independent (explanatory) variable

67

Causal relationship

If one variable causes a change in another. *Just because 2 variables show correlation doesn't mean they have a causal relationship. Use context and common sense to decide.

68

When can outliers be included in data for correlation, and when should they be excluded?

Include them if they are unlikely to be anomalies.

Exclude them if they may not be representative.

69

Regression line

Line of best fit: y = mx + c.

c is the value of y when x = 0

m is the rate of change of y with x: how much y increases/decreases for each increase of 1 in x

70

Least squares regression line

Straight line that minimises the sum of the squares of the vertical distances of each data point from the line
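
A minimal Python sketch of the least squares line y = mx + c, using the standard results m = Sxy/Sxx and c = ȳ - m x̄. The data is made up; the last line checks that another (arbitrarily chosen) line gives a larger sum of squared vertical distances.

```python
def least_squares_line(xs, ys):
    """Return (m, c) for y = mx + c, with m = Sxy / Sxx and c = y-bar - m * x-bar."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    m = sxy / sxx
    c = sum(ys) / n - m * sum(xs) / n
    return m, c


def sum_squared_residuals(xs, ys, m, c):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
m, c = least_squares_line(xs, ys)
print(m, c)  # about 0.6 and 2.2
# Another line, e.g. y = 0.7x + 2.0, has a larger sum of squares
print(sum_squared_residuals(xs, ys, m, c) < sum_squared_residuals(xs, ys, 0.7, 2.0))
```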

71

To improve a regression line or any model…

Increase the sample size and choose the sample randomly

72

Can you use a regression line equation to find values of x when given y?

No, because x is the independent variable: the regression line should only be used to predict y from x

73

Interpolation

Estimating within the range of the data. Generally reliable.

74

Extrapolation

Estimating outside the range of the data. Unreliable, as we don't know whether the relationship continues to be linear.

75

How to draw a linear graph for a non-linear relationship

Take logs, e.g. y = ax^n gives log y = log a + n log x, and y = ab^x gives log y = log a + x log b
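
A Python sketch of taking logs to linearise y = ax^n: plotting log y against log x gives a straight line with gradient n and intercept log a. The data is generated from a hypothetical y = 3x² so the recovered values can be checked; the straight line is fitted here by least squares, which is one possible choice.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x^n by drawing a straight line through (log x, log y).

    log y = log a + n log x, so the gradient is n and the intercept is log a.
    """
    X = [math.log10(x) for x in xs]
    Y = [math.log10(y) for y in ys]
    k = len(X)
    sxx = sum(v * v for v in X) - sum(X) ** 2 / k
    sxy = sum(u * v for u, v in zip(X, Y)) - sum(X) * sum(Y) / k
    n = sxy / sxx                        # gradient of the log-log line
    log_a = sum(Y) / k - n * sum(X) / k  # intercept
    return 10 ** log_a, n


# Hypothetical data generated from y = 3x^2
xs = [1, 2, 3, 4, 5]
ys = [3 * x ** 2 for x in xs]
a, n = fit_power_law(xs, ys)
print(round(a, 6), round(n, 6))  # 3.0 2.0
```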

76

Venn diagrams

Represents events graphically

Rectangle represents sample space

77

Experiment

Repeatable process that gives rise to a number of outcomes

78

Event

Collection of 1 or more outcomes

79

Sample space

Set of all possible outcomes. Could use a table, tree diagram or Venn diagram

81

Mutually exclusive events

When events have no outcomes in common. On a Venn diagram, they do not overlap.

P(A or B) = P(A) + P(B)

P(A and B) = 0

82

Independent events

When one event has no effect on another. You can't tell from a Venn diagram alone, so you must calculate to check whether events are independent:

P(A and B) = P(A) x P(B)
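
The independence and mutual exclusivity checks can be sketched in Python. The Venn diagram probabilities below are hypothetical; `Fraction` is used so that 0.4 × 0.5 compares exactly with 0.2 instead of suffering floating-point rounding.

```python
from fractions import Fraction


def independent(p_a, p_b, p_a_and_b):
    """A and B are independent if P(A and B) = P(A) x P(B)."""
    return p_a * p_b == p_a_and_b


def mutually_exclusive(p_a_and_b):
    """A and B are mutually exclusive if P(A and B) = 0."""
    return p_a_and_b == 0


# Hypothetical Venn diagram probabilities, as exact fractions
p_a, p_b = Fraction("0.4"), Fraction("0.5")
print(independent(p_a, p_b, Fraction("0.2")))  # True: 0.4 x 0.5 = 0.2
print(independent(p_a, p_b, Fraction("0.1")))  # False
print(mutually_exclusive(Fraction("0.1")))     # False: the events overlap
```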

84

Probability addition law

P(A or B) = P(A) + P(B) - P(A and B).

*When events are mutually exclusive, P(A and B) = 0; if P(A and B) is not 0, the events are not mutually exclusive

85

Discrete uniform distribution

Probabilities of all outcomes are equal

86

Sum of the probabilities of all possible outcomes of an experiment

1

87

Is the number of days in a given week a discrete random variable?

No. The number of days in a week is always 7 and is predetermined, so it is not random.

89

When can you model X with a binomial distribution X~B(n, p)?

FFIT

  • Fixed number of trials

  • Fixed probability of success

  • Independent trials

  • Two outcomes only

90

Binomial X~B(n,p) equation for P(X=r)

P(X = r) = (nCr) × p^r × (1 - p)^(n - r)
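
The binomial formula translates directly into Python using `math.comb` for nCr. The example X ~ B(10, 0.3) is hypothetical; the cumulative function adds up the same probabilities that cumulative binomial tables list.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def binomial_cdf(r, n, p):
    """P(X <= r), the value given in cumulative binomial tables."""
    return sum(binomial_pmf(k, n, p) for k in range(r + 1))


print(binomial_pmf(3, 10, 0.3))   # 120 x 0.3^3 x 0.7^7, about 0.2668
print(binomial_cdf(10, 10, 0.3))  # all outcomes, so 1 (up to rounding)
```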

91

Hypothesis

A statement about a population parameter.

92

Test statistic

Result of the experiment or the statistic calculated. E.g. no. of people saying they support a candidate

93

Null hypothesis

H0: what you assume to be true

94

Alternative hypothesis

H1: what would be true if H0 is wrong

95

Significance level

The given probability threshold (e.g. 5%) that decides whether a result is unlikely enough to reject H0

96

One tailed test

When H1: p>k or p<k

97

2 tailed test

When H1: p ≠ k. Halve the significance level for each tail

98

When to reject Ho

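If the probability of the observed value (or a more extreme one), assuming H0 is true, is less than the significance level, it is in the critical region, so there is sufficient evidence to reject H0

If this probability is greater than the significance level, it is not in the critical region, so there is insufficient evidence to reject H0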
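
A minimal Python sketch of a one-tailed binomial test, assuming a hypothetical test of H0: p = 0.3 against H1: p > 0.3 with X ~ B(20, 0.3), an observed value of 10 and a 5% significance level. For a two-tailed test, the same comparison would be made against half the significance level.

```python
from math import comb


def binomial_pmf(r, n, p):
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def tail_probability(x, n, p0, direction):
    """P(X >= x) for H1: p > p0, or P(X <= x) for H1: p < p0, assuming H0."""
    if direction == "greater":
        return sum(binomial_pmf(k, n, p0) for k in range(x, n + 1))
    return sum(binomial_pmf(k, n, p0) for k in range(0, x + 1))


def one_tailed_test(x, n, p0, alpha, direction):
    prob = tail_probability(x, n, p0, direction)
    return prob, prob < alpha  # True means reject H0


# Hypothetical test: H0: p = 0.3, H1: p > 0.3, observed x = 10, 5% level
prob, reject = one_tailed_test(10, 20, 0.3, 0.05, "greater")
print(round(prob, 4), reject)  # 0.048 True
```

Here P(X ≥ 10) ≈ 0.048 < 0.05, so there would be sufficient evidence to reject H0.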

99

Critical region

If the test statistic falls within this region, you reject the null hypothesis

100

Critical value

First value to fall inside critical region
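
Critical regions for a binomial test can be found by searching for the tail probability closest to, but not above, the significance level. This Python sketch assumes a hypothetical X ~ B(20, 0.3) under H0 and a 5% level; the critical value is the boundary value returned. For a two-tailed test, each tail would use half the significance level.

```python
from math import comb


def binomial_pmf(r, n, p):
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def upper_tail(c, n, p):
    """P(X >= c)"""
    return sum(binomial_pmf(k, n, p) for k in range(c, n + 1))


def lower_tail(c, n, p):
    """P(X <= c)"""
    return sum(binomial_pmf(k, n, p) for k in range(0, c + 1))


def upper_critical_value(n, p0, alpha):
    """Smallest c with P(X >= c) <= alpha; the critical region is X >= c."""
    for c in range(n + 1):
        if upper_tail(c, n, p0) <= alpha:
            return c
    return None  # no critical region at this significance level


def lower_critical_value(n, p0, alpha):
    """Largest c with P(X <= c) <= alpha; the critical region is X <= c."""
    for c in range(n, -1, -1):
        if lower_tail(c, n, p0) <= alpha:
            return c
    return None


# Hypothetical: X ~ B(20, 0.3) under H0, 5% significance level
print(upper_critical_value(20, 0.3, 0.05))  # 10, so critical region X >= 10
print(lower_critical_value(20, 0.3, 0.05))  # 2, so critical region X <= 2
```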