Maths - Statistics

146 Terms

1

Hypothesis

A statement about the value of a population parameter which can be tested.

2

4 rules of a binomial model

- Fixed number of trials
- Two possible outcomes for each trial (success/failure)
- Constant probability of success
- All trials are independent of each other

3

One-tailed test

A hypothesis test that has a critical region at one end of the distribution

4

Two-tailed test

A hypothesis test that has a critical region at both ends of the distribution

5

P-value

The probability of observing a test statistic at least as extreme as the value actually observed, assuming the null hypothesis is true.

6

Conditional probability

The probability of an event occurring given that another event is known to have occurred, written P(A|B) - the known event can change the probability of the other

7

Acceptance region

A region of the probability distribution which would lead to the null hypothesis being accepted if the test statistic falls within it.

8

Experiment

A repeatable process that gives rise to a number of outcomes

9

Binomial distribution equation

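P(X = r) = nCr × p^r × (1 - p)^(n - r), where n is the number of trials, r is the number of successes and p is the probability of success on each trial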
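
The formula can be checked numerically. The following is a minimal Python sketch, not part of the original card; the function name binomial_pmf and the coin-toss example are illustrative assumptions.

```python
from math import comb


def binomial_pmf(n: int, p: float, r: int) -> float:
    """Return P(X = r) for X ~ B(n, p) (hypothetical helper name)."""
    # nCr * p^r * (1 - p)^(n - r)
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Example: probability of exactly 5 heads in 10 tosses of a fair coin
print(binomial_pmf(10, 0.5, 5))  # 0.24609375
```

Summing binomial_pmf over r = 0, 1, ..., n should give 1, which is a quick way to sanity-check a hand calculation.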

10

Advantages and disadvantages of using a census

Advantages (1):
- Should give completely accurate result

Disadvantages (3):
- Time consuming, expensive
- Cannot be used when testing involves destruction of sampling units as all would be destroyed and none would be left to use
- Large volume of data to process

11

How to carry out simple random sampling? (4)

1. Set up sampling frame
2. Use an RNG or lottery sampling to select sampling units - ignore repeats / numbers not included in sampling frame
3. Each sampling unit has an equal chance of being selected
4. Repeat until you have desired sample size
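
The steps above can be sketched in Python. This is an illustration only: the sampling frame of 100 labelled units and the helper name simple_random_sample are assumptions, and random.sample already avoids repeats, so step 2's "ignore repeats" is handled by the library.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size distinct units, each equally likely (hypothetical helper)."""
    rng = random.Random(seed)  # a seed is used only to make the example repeatable
    return rng.sample(sampling_frame, sample_size)


# Step 1: set up a sampling frame of numbered units (made-up data)
frame = [f"unit_{i}" for i in range(1, 101)]

# Steps 2-4: select 10 distinct units at random
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```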

12

How does stratified sampling work? (3)

1. Population divided into mutually exclusive strata
2. Simple random sampling carried out in each group
3. Number sampled from each stratum = (stratum size / population size) × overall sample size
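
As a rough illustration, the number to sample from each stratum can be worked out as (stratum size / population size) × overall sample size. The sketch below uses made-up year-group sizes and a hypothetical helper name.

```python
def stratified_allocation(strata_sizes, sample_size):
    """Number of units to sample from each stratum (hypothetical helper)."""
    population = sum(strata_sizes.values())
    # Rounding can make the total differ slightly from sample_size in awkward cases
    return {
        name: round(size / population * sample_size)
        for name, size in strata_sizes.items()
    }


# Made-up example: 120 students in Year 12, 80 in Year 13, sample of 20
allocation = stratified_allocation({"Year 12": 120, "Year 13": 80}, 20)
print(allocation)  # {'Year 12': 12, 'Year 13': 8}
```

A simple random sample of the stated size would then be taken within each stratum.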

13

Advantages and disadvantages of stratified sampling

Advantages (2):
- Reflects population structure
- Guarantees proportional representation of groups within population

Disadvantages (2):
- Population must be clearly classified into distinct strata
- Selection within each stratum suffers from same disadvantages as simple random sampling

14

How does opportunity / convenience sampling work?

Sample is taken from people who meet criteria and are available at time of study

15

Advantages and disadvantages of opportunity sampling

Advantages (2):
- Easy to carry out
- Inexpensive

Disadvantages (2):
- Unlikely to provide a representative sample
- Highly dependent on individual researcher

16

What happens if the sample size increases?

Becomes more accurate BUT more resources are needed

17

How can you improve a survey using sampling?

Use a larger sample

18

How can you improve a sampling method?

- Use a larger sample size
- Interview at different (random) times of day and/or different locations

19

Qualitative variables / data

Variables / data associated with non-numerical observations (descriptive)

20

Continuous variable

A variable that can take any value in a given range

21

Discrete variable

A variable that can only take specific values in a given range

22

8 Weather stations in LDS

UK (S -> N):
- Camborne
- Hurn
- Heathrow
- Leeming
- Leuchars

OVERSEAS (S -> N):
- Perth
- Jacksonville
- Beijing

23

In the LDS, what happens to results recorded as 'tr' when calculating averages and why?

'tr' = trace, an amount less than 0.05 mm, which rounds to 0.0 mm to 1 d.p., so trace values are treated as 0

24

Where is Perth located and why is that significant?

Southern hemisphere - it experiences summer during the northern hemisphere's winter and vice versa

25

When working with the LDS, why might you be unable to calculate with the needed number of data points?

Some data is not available (n/a)

26

Daily mean temperature

The average of the hourly temperature readings during a 24-hour period (measured in °C)

27

Daily total rainfall

The amount of rainfall measured in a day - includes solid precipitation (e.g. snow and hail), which is melted before being measured (measured in mm)

28

What is daily total sunshine recorded to?

The nearest tenth of an hour (1 d.p.)

29

Daily mean wind direction and windspeed

The average of the wind direction and windspeed over 24 hours - windspeed is measured in knots and also categorised according to the Beaufort scale; wind direction is given as a bearing and as a cardinal (compass) direction relative to true north

30

What is daily maximum relative humidity given as?

It is given as a percentage of air saturation with water vapour (measured in %)

31

Class width

The difference between the upper and lower class boundaries

32

Measure of central tendency

A single value which describes the centre of the data

33

Mode / Modal class

The value / class that occurs the most often

34

Median (Q2)

The middle value when the data values are put in order

35

Mean formula

x̄ = Σx/n (sum of data values/number of data values)

36

Mean formula for data given in a frequency table

x̄ = Σxf/Σf (sum of each data value × its frequency / sum of frequencies)
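
A short Python sketch of this calculation, using a made-up frequency table; the helper name is an assumption for illustration.

```python
def mean_from_frequency_table(values, frequencies):
    """Mean = sum of (x * f) divided by sum of f (hypothetical helper)."""
    total_xf = sum(x * f for x, f in zip(values, frequencies))
    total_f = sum(frequencies)
    return total_xf / total_f


# Made-up table: values 0, 1, 2, 3 with frequencies 5, 8, 4, 3
# Sum of xf = 0 + 8 + 8 + 9 = 25, sum of f = 20
print(mean_from_frequency_table([0, 1, 2, 3], [5, 8, 4, 3]))  # 1.25
```

For grouped data, the same function could be called with class midpoints in place of the data values, which gives an estimate of the mean.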

37

To find the mean, the class containing the median and the modal class for continuous data, what data do you use?

The midpoint of each class interval (used as the representative value of x for that class when estimating the mean)

38

Upper quartile (Q3) and how to find it for DISCRETE data

Three quarters of the way through the data set - find 3/4 of n; if this is a whole number, Q3 is halfway between this data point and the one above; if not, round up and select this data point

39

What is a problem with using interpolation to estimate the value of the median/quartiles/percentiles?

Assumes that data values are evenly distributed within each class
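
To see where that assumption enters, here is a minimal Python sketch of linear interpolation within a class. The class boundaries, frequencies and helper name are made up, and the median position is taken as n/2, a common convention for grouped continuous data.

```python
def interpolate_in_class(lower, upper, cum_freq_before, class_freq, position):
    """Estimate the value at a given position inside a class (hypothetical helper).

    Assumes the class_freq values are spread evenly between lower and upper.
    """
    fraction_numerator = position - cum_freq_before
    return lower + fraction_numerator * (upper - lower) / class_freq


# Made-up example: n = 30, so the median position is n/2 = 15.
# 12 values lie below the class 20-30, which itself contains 10 values.
print(interpolate_in_class(20, 30, 12, 10, 15))  # 23.0
```

If the 10 values in the class were actually bunched near 30, the true median would be higher than 23, which is exactly the weakness the card describes.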

40

Range

The difference between the largest and smallest values in the data set

41

The interquartile range

The difference between the upper quartile and lower quartile (Q3 - Q1)

42

Using the range vs using the IQR?

Range: takes into account all data values BUT can be affected by extreme values
IQR: not affected by extreme values BUT only considers the spread of the middle 50% of the data

43

Units of variance

Units of data^2

44

Variance symbol

σ^2

45

Standard deviation symbol

σ

46

Variance

The average squared distance from the mean
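
A small Python sketch of this definition, dividing by n (the σ² form); the data set and function name are made up for illustration.

```python
from math import sqrt


def variance(data):
    """Average squared distance from the mean, dividing by n (hypothetical helper)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5
print(variance(data))        # 4.0
print(sqrt(variance(data)))  # 2.0 (the standard deviation)
```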

47

How do you find the median position for data in a list (not a table)?

(n+1)/2

48

If some data is coded by y = nx, what is the effect on the mean and standard deviation?

ȳ = n × x̄
σy = n × σx (use |n| if n is negative, as standard deviation cannot be negative)
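
The effect of coding can be checked on a small made-up data set. This sketch uses Python's statistics module (pstdev is the population standard deviation); the data and the multiplier 3 are assumptions.

```python
from statistics import mean, pstdev

x = [2, 4, 4, 4, 5, 5, 7, 9]     # made-up data
n = 3                            # multiplier in the coding y = nx
y = [n * value for value in x]   # coded data

# Both the mean and the standard deviation are scaled by n
print(mean(x), mean(y))
print(pstdev(x), pstdev(y))
```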

49

Cleaning the data

The process of removing anomalies from a data set

50

Why do you need to clean data sets? (2)

To remove anomalies, which are errors and are misleading to keep in the data OR to identify trace values and convert them to 0

51

>> / << meaning

Much greater than / much less than

52

How do you draw a box plot?

1. Plot the lowest value and highest value using a small line (either the lowest/highest values of data set or the boundary for outliers)
2. Plot Q1, Q2 and Q3 using long lines and connect with a box
3. Draw outliers with a cross

53

How do you draw a cumulative frequency diagram?

1. Draw cumulative frequency table - add up frequencies
2. Plot points at end-point of classes
3. Connect points with a curved line

54

How do you draw a histogram?

1. Vertical scale is frequency density (= frequency/class width)
2. Draw bars touching, with the width of each bar equal to its class width (so the area of each bar is proportional to the frequency)
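
Bar heights come from frequency density = frequency / class width. A minimal Python sketch with made-up classes of unequal width; the helper name is hypothetical.

```python
def frequency_densities(class_boundaries, frequencies):
    """Frequency density = frequency / class width for each class (hypothetical helper)."""
    return [
        freq / (upper - lower)
        for (lower, upper), freq in zip(class_boundaries, frequencies)
    ]


# Made-up classes 0-10, 10-15 and 15-30 with frequencies 20, 15 and 30
classes = [(0, 10), (10, 15), (15, 30)]
freqs = [20, 15, 30]
print(frequency_densities(classes, freqs))  # [2.0, 3.0, 2.0]
```

Note that the class with the largest frequency (30) does not have the tallest bar, because it is also the widest class.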

55

What happens when one variable increases for:
1. negatively correlated data?
2. positively correlated data?

1. The other decreases
2. The other also increases

56

When do two variables have a causal relationship and how do you determine if they do?

If a change in one variable causes a change in the other - determine this using the context of the question and common sense

57

Line of best fit

A line drawn on a scatter diagram that approximates the relationship between the variables

58

How do you write the equation of a regression line of y on x?

y = a + bx
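
The card does not say how a and b are found. One standard approach is least squares, where b = Sxy/Sxx and a = ȳ - b × x̄. The Python sketch below assumes that method and uses made-up data lying exactly on a line.

```python
def regression_line(xs, ys):
    """Least-squares regression line of y on x; returns (a, b) for y = a + bx.

    Hypothetical helper: assumes the least-squares method, which is not stated on the card.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # gradient: change in y per unit change in x
    a = mean_y - b * mean_x  # intercept: value of y when x = 0
    return a, b


# Made-up data lying exactly on y = 1 + 2x
a, b = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```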

59

For lines of regression, what does the coefficient of x (the gradient) tell you?

The change in y for each unit change in x

60

When can you not make predictions from a scatter diagram and line of regression?

1. If the x value is outside the range of the data, as this would require extrapolation
2. If you want to estimate the value of x given y - this needs the regression line of x on y, not y on x

61

For lines of regression, what does a (the y-intercept) tell you?

The value of y when x = 0

62

When can you draw a line of regression on a scatter graph?

When the points on a scatter graph lie close to a straight line

63

Will an estimation from a scatter diagram be reliable if it is outside the range of given data?

Reasonably reliable if close to range - unlikely to be reliable if well outside range

64

When carrying out a hypothesis test, what happens if the calculated probability is greater than the significance level?

Insufficient evidence to reject the null hypothesis (why? a result at least this extreme has more than e.g. a 5% chance of occurring if the null hypothesis is true, so it is not unusual enough)

65

When carrying out a hypothesis test, what happens if the calculated probability is less than the significance level?

Sufficient evidence to reject the null hypothesis (why? a result at least this extreme has less than e.g. a 5% chance of occurring if the null hypothesis is true)
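
As a worked illustration, here is a sketch of a one-tailed (lower tail) binomial test in Python. The scenario is made up: H0: p = 0.5, H1: p < 0.5, with 5 successes observed in 20 trials and a 5% significance level. The function name is hypothetical.

```python
from math import comb


def lower_tail_probability(n, p, observed):
    """P(X <= observed) for X ~ B(n, p) (hypothetical helper)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(observed + 1))


significance_level = 0.05
p_value = lower_tail_probability(20, 0.5, 5)  # roughly 0.0207
reject_h0 = p_value < significance_level

print(round(p_value, 4), reject_h0)
```

Here the calculated probability is below 5%, so there is sufficient evidence to reject H0 in this made-up example.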

66

When can you use a line of regression to estimate a value?

When you are predicting a value for y given x and when the x value falls within the given range (can use interpolation)

67

What is the effect of proximity to the sea on temperature and windspeed?

The closer to sea, the lower the temperature but the higher the windspeed

68

Sample space

The set of all possible outcomes

69

Probability distribution

A full description of the probabilities of any outcome in the sample space.

70

Test statistic

An observation or statistic calculated from a sample, used to test a hypothesis.

71

Critical region

The set of values for which the null hypothesis is rejected in a hypothesis test.

72

Critical value

The first value to fall inside a critical region

73

Sample

A selection of observations taken from a subset of the population used to find out information about the population as a whole

74

Actual significance level

The probability of incorrectly rejecting the null hypothesis
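
For a discrete distribution the actual significance level is usually a little below the stated one. The Python sketch below finds the lower-tail critical value and the actual significance level for a made-up test with X ~ B(20, 0.5) at the 5% level; the function name is hypothetical.

```python
from math import comb


def lower_tail_critical_value(n, p, significance):
    """Largest c with P(X <= c) <= significance, plus that probability.

    Hypothetical helper; returns (None, 0.0) if no critical region exists.
    """
    cumulative = 0.0
    critical = None
    for k in range(n + 1):
        next_cumulative = cumulative + comb(n, k) * p**k * (1 - p) ** (n - k)
        if next_cumulative > significance:
            break  # including k would push the tail probability over the limit
        cumulative = next_cumulative
        critical = k
    return critical, cumulative


critical_value, actual_level = lower_tail_critical_value(20, 0.5, 0.05)
print(critical_value, round(actual_level, 4))  # critical region is X <= 5
```

In this example the critical region is X ≤ 5 and the actual significance level is about 2.07%, since including X = 6 would take the tail probability above 5%.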

75

Parameter

A defining statistical characteristic of the population

76

Statistically independent events

When the occurrence of one event does not affect another

77

Uniform distribution

When the probability is the same for each outcome

78

Hypothesis test

A statistical test used to determine whether there is enough evidence in a sample of data to infer a certain condition is true for the whole population

79

Mutually exclusive

When events have no outcomes in common so cannot occur at the same time.

80

Event

A subset of the sample space

81

Population

The whole set of items that are of interest

82

Sampling unit

Each individual thing in the population that can be sampled

83

Sampling frame

A list of the sampling units where they are individually named or numbered

84

Census

Data collected from the entire population

85

Advantages and disadvantages of using a sample

Advantages (3):
- Cheaper
- Quicker
- Less data to process

Disadvantages (3):
- Data may not be accurate
- Data may not be large enough to represent small sub groups of population
- Different samples may lead to different conclusions due to natural variation

86

Advantages and disadvantages of simple random sampling

Advantages (3):
- Bias free / less bias
- Easy and cheap to implement
- Each number (and so, sample) has an equal chance of being selected

Disadvantages (2):
- Sampling frame needed
- Not suitable when population size is large

87

How to carry out systematic sampling? (2)

Required elements are chosen at regular intervals from an ordered list - the sampling frame must be in random order (e.g. not alphabetical), otherwise bias may be introduced

1. Calculate k (k = pop size/sample size)
2. Randomly select a number between 1 and k - select unit that is assigned this number
3. Select every kth sampling unit after this
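
The three steps can be sketched in Python as follows. The sampling frame of 100 numbered units and the helper name are assumptions, and the sketch assumes the population size is at least the sample size.

```python
import random


def systematic_sample(sampling_frame, sample_size, seed=None):
    """Take every kth unit after a random start (hypothetical helper)."""
    k = len(sampling_frame) // sample_size     # step 1: k = pop size / sample size
    rng = random.Random(seed)
    start = rng.randint(1, k)                  # step 2: random start between 1 and k
    positions = range(start, start + k * sample_size, k)  # step 3: every kth unit
    return [sampling_frame[pos - 1] for pos in positions]  # positions are 1-based


frame = list(range(1, 101))  # made-up frame: units numbered 1 to 100
sample = systematic_sample(frame, 10, seed=2)
print(sample)
```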

88

Advantages and disadvantages of systematic sampling

Advantages (2):
- Simple and quick to use
- Suitable for large samples/populations

Disadvantages (2):
- Sampling frame needed
- Can introduce bias if sampling frame not random

89

When is stratified sampling used?

Population is large + naturally divides into groups

90

How does quota sampling work? (3)

1. Population divided into groups/categories according to characteristic - size of each group determines proportion of sample that should have characteristic
2. Interviewer interviews people, assesses what group they fall into and selects them
3. Ignore people of a type where the quota is full

91

Advantages and disadvantages of quota sampling

Advantages (4):
- Allows small sample to still be representative of the whole population
- No sampling frame needed
- Quick, easy, inexpensive
- Allows for easy comparison between different groups in the population

Disadvantages (4):
- Non-random sampling can introduce bias
- Population must be divided into groups, which can be costly or inaccurate
- Increasing scope of study increases number of groups, so becomes more time consuming and expensive
- Non-responses are not recorded

92

Raw data

Unprocessed information

93

How do you compare data?

Compare mean and standard deviation or median and IQR - median and IQR are more appropriate if there are outliers

94

What does it mean if the sampling method used is not representative?

It's unlikely to reflect the characteristics of the whole population

95

Quantitative variables / data

Variables / data associated with numerical observations

96

4 Beaufort scale terms

1. 0 (calm): less than 1 knot
2. 1-3 (light): 1-10 knots
3. 4 (moderate): 11-16 knots
4. 5 (fresh): 17 to 21 knots

97

Daily maximum gust

The highest instantaneous windspeed recorded (measured in knots - direction also recorded)

98

What does it mean if daily maximum relative humidity is greater than 95%?

Misty and foggy conditions

99

In the LDS, which stations are coastal?

Camborne, Hurn, Leuchars, Jacksonville, Perth

100

In the LDS, which stations are inland?

Heathrow, Leeming, Beijing