Hypothesis
A statement about the value of a population parameter which can be tested.
4 rules of a binomial model
- Fixed number of trials
- Two possible outcomes (success/failure)
- Constant probability of success
- All trials are independent of each other
One-tailed test
A hypothesis test that has a critical region at one end of the distribution
Two-tailed test
A hypothesis test that has a critical region at both ends of the distribution
P-value
The probability of observing a test statistic at or beyond the stated value, assuming the null hypothesis to be true.
Conditional probability
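The probability of an event occurring given that another event has already occurred, written P(A|B). It is used when the known occurrence of one event affects the probability of another.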
Acceptance region
A region of the probability distribution which would lead to the null hypothesis being accepted if the test statistic falls within it.
Experiment
A repeatable process that gives rise to a number of outcomes
Binomial distribution equation
P(X = r) = nCr × p^r × (1 - p)^(n - r), where n = number of trials, p = probability of success, r = number of successes
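You can check the binomial formula numerically. Below is a minimal Python sketch. It uses the notation of the card (n trials, probability of success p, r successes). The name `binomial_pmf` is a hypothetical helper, and `math.comb` (Python 3.8+) computes nCr.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p), i.e. nCr x p^r x (1 - p)^(n - r)."""
    if not 0 <= r <= n:
        return 0.0  # impossible to get fewer than 0 or more than n successes
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Example: X ~ B(10, 0.3), probability of exactly 2 successes
print(round(binomial_pmf(2, 10, 0.3), 4))  # prints 0.2335
```

Summing `binomial_pmf(r, n, p)` over r = 0, ..., n gives 1. This is a quick check that the probabilities form a complete distribution.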
Advantages and disadvantages of using a census
Advantages (1):
- Should give completely accurate result
Disadvantages (3):
- Time consuming, expensive
- Cannot be used when testing involves destruction of sampling units as all would be destroyed and none would be left to use
- Large volume of data to process
How to carry out simple random sampling? (4)
1. Set up sampling frame
2. Use an RNG or lottery sampling to select sampling units - ignore repeats / numbers not included in sampling frame
3. Each sampling unit has an equal chance of being selected
4. Repeat until you have desired sample size
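The four steps above can be sketched in Python, under some assumptions. The sampling units are numbered 1 to N in a list, and random numbers with the same number of digits as N stand in for an RNG or random number table. The function name `simple_random_sample` is hypothetical.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Select sample_size units from a sampling frame, ignoring repeats and invalid numbers."""
    if sample_size > len(frame):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    digits = len(str(len(frame)))   # e.g. 80 units -> use 2-digit random numbers 00-99
    chosen = []                     # unit numbers (1..N) in the order they were drawn
    while len(chosen) < sample_size:
        number = rng.randint(0, 10**digits - 1)
        if number < 1 or number > len(frame):
            continue                # not in the sampling frame: ignore
        if number in chosen:
            continue                # repeat: ignore
        chosen.append(number)
    return [frame[i - 1] for i in chosen]

frame = [f"unit {i}" for i in range(1, 81)]
print(simple_random_sample(frame, 5, seed=1))
```

In practice, Python's `random.sample(frame, k)` does the same job in one call. The loop above just mirrors the by-hand method.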
How does stratified sampling work? (3)
1. Population divided into mutually exclusive strata
2. Simple random sampling carried out in each stratum
3. Number sampled from each stratum = (stratum size / population size) × sample size
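Stratified sampling can be sketched in Python, assuming each stratum is given as a list of its units. The function name `stratified_sample` and the year-group example are hypothetical. Each stratum's share is (stratum size / population size) × sample size. A simple random sample is then taken within each stratum.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """strata: dict mapping stratum name -> list of units in that stratum."""
    rng = random.Random(seed)
    population = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # number from this stratum = (stratum size / population size) x sample size
        n_stratum = round(len(units) * sample_size / population)
        # simple random sample within the stratum
        sample[name] = rng.sample(units, n_stratum)
    return sample

strata = {"Year 12": list(range(1, 121)), "Year 13": list(range(121, 201))}
result = stratified_sample(strata, 20, seed=0)
print({name: len(units) for name, units in result.items()})  # {'Year 12': 12, 'Year 13': 8}
```

Rounding each stratum separately can make the totals differ from the intended sample size by one or two. If that happens, you adjust by hand.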
Advantages and disadvantages of stratified sampling
Advantages (2):
- Reflects population structure
- Guarantees proportional representation of groups within population
Disadvantages (2):
- Population must be clearly classified into distinct strata
- Selection within each stratum suffers from same disadvantages as simple random sampling
How does opportunity / convenience sampling work?
Sample is taken from people who meet criteria and are available at time of study
Advantages and disadvantages of opportunity sampling
Advantages (2):
- Easy to carry out
- Inexpensive
Disadvantages (2):
- Unlikely to provide a representative sample
- Highly dependent on individual researcher
What happens if the sample size increases?
Becomes more accurate BUT more resources are needed
How can you improve a survey using sampling?
Use a larger sample
How can you improve a sampling method?
- Use a larger sample size
- Interview at different (random) times of day and/or different locations
Qualitative variables / data
Variables / data associated with non-numerical observations (descriptive)
Continuous variable
A variable that can take any value in a given range
Discrete variable
A variable that can only take specific values in a given range
8 Weather stations in LDS
UK (S -> N):
- Camborne
- Hurn
- Heathrow
- Leeming
- Leuchars
OVERSEAS (S -> N):
- Perth
- Jacksonville
- Beijing
In the LDS, what happens to results recorded as 'tr' when calculating averages and why?
'tr' = trace, meaning less than 0.05 mm of rainfall. This would round to 0.0 mm to 1 d.p., so trace values are treated as 0
Where is Perth located and why is that significant?
Southern hemisphere: it experiences summer during the northern hemisphere's winter and vice versa
When working with the LDS, why might you not have the required number of data points for a calculation?
Some data is not available (n/a)
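Both LDS rules (trace values count as 0, unavailable values are left out) can be applied before averaging. This is a minimal Python sketch. It assumes the readings are strings exactly as they appear in the spreadsheet ("tr", "n/a" or a number). The name `clean_rainfall` and the readings are hypothetical.

```python
def clean_rainfall(readings):
    """Convert LDS rainfall strings to numbers, ready for calculating an average."""
    cleaned = []
    for value in readings:
        value = value.strip().lower()
        if value == "n/a":
            continue                # not available: leave it out, so there are fewer data points
        if value == "tr":
            cleaned.append(0.0)     # trace (< 0.05 mm) rounds to 0.0 mm, so treat as 0
        else:
            cleaned.append(float(value))
    return cleaned

data = clean_rainfall(["2.4", "tr", "n/a", "0.0", "5.6"])
mean = sum(data) / len(data)        # divides by 4, not 5, because one value was n/a
print(data, mean)
```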
Daily mean temperature
The average of the hourly temperature readings during a 24-hour period (measured in °C)
Daily total rainfall
The amount of rainfall measured in a day, including solid precipitation (e.g. snow and hail), which is melted before being measured (measured in mm)
What is daily total sunshine recorded to?
The nearest tenth of an hour (1 d.p.)
Daily mean wind direction and windspeed
The average wind direction and windspeed over 24 hours. Windspeed is measured in knots and is also categorised using the Beaufort scale. Wind direction is given as a bearing and as a cardinal (compass) direction relative to true north.
What is daily maximum relative humidity given as?
It is given as a percentage of air saturation with water vapour (measured in %)
Class width
The difference between the upper and lower class boundaries
Measure of central tendency
A single value which describes the centre of the data
Mode / Modal class
The value / class that occurs the most often
Median (Q2)
The middle value when the data values are put in order
Mean formula
x̄ = Σx/n (sum of data values/number of data values)
Mean formula for data given in a frequency table
x̄ = Σxf/Σf (sum of each data value multiplied by its frequency / sum of frequencies)
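Here are the two mean formulas as a short Python sketch. The function names and data values are hypothetical examples.

```python
def mean(values):
    """x̄ = Σx / n"""
    return sum(values) / len(values)

def mean_from_frequency_table(values, frequencies):
    """x̄ = Σxf / Σf"""
    total = sum(x * f for x, f in zip(values, frequencies))
    return total / sum(frequencies)

print(mean([2, 4, 4, 5, 10]))                           # 25 / 5 = 5.0
print(mean_from_frequency_table([1, 2, 3], [4, 5, 1]))  # 17 / 10 = 1.7
```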
To estimate the mean for grouped continuous data, what value do you use for each class?
The midpoint of each class interval (the modal class and the class containing the median are found from the frequencies and cumulative frequencies instead)
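For grouped data, the individual values are unknown, so the mean can only be estimated. Each class is represented by its midpoint. Below is a minimal Python sketch; the classes (given as lower and upper boundaries) and frequencies are hypothetical.

```python
def estimate_grouped_mean(classes, frequencies):
    """Estimate x̄ = Σmf / Σf, where m is the midpoint of each class."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    return sum(m * f for m, f in zip(midpoints, frequencies)) / sum(frequencies)

classes = [(0, 10), (10, 20), (20, 40)]
frequencies = [5, 10, 5]
print(estimate_grouped_mean(classes, frequencies))  # (5*5 + 15*10 + 30*5) / 20 = 16.25
```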
Upper quartile (Q3) and how to find it for DISCRETE data
Three quarters of the way through the data set: find 3n/4. If it is a whole number, Q3 is halfway between this data point and the next one; if not, round up and select this data point
What is a problem with using interpolation to estimate the median/quartiles/percentiles?
Assumes that data values are evenly distributed within each class
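Below is a Python sketch of linear interpolation in a grouped frequency table. The assumption on this card is built in: values are spread evenly across the class. The sketch also assumes the position convention for grouped continuous data is n/2 for the median, n/4 for Q1, and so on. The function name `interpolate_percentile` and the example table are hypothetical.

```python
def interpolate_percentile(boundaries, frequencies, fraction):
    """
    boundaries:  class boundaries, e.g. [0, 10, 20, 40] for classes 0-10, 10-20, 20-40
    frequencies: frequency of each class
    fraction:    0.5 for the median, 0.25 for Q1, 0.75 for Q3, 0.9 for the 90th percentile
    """
    target = sum(frequencies) * fraction   # position in the data, e.g. n/2 for the median
    cumulative = 0
    for i, f in enumerate(frequencies):
        if f > 0 and cumulative + f >= target:
            lower, upper = boundaries[i], boundaries[i + 1]
            # assumes values are evenly distributed within this class
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("fraction must be between 0 and 1")

print(interpolate_percentile([0, 10, 20, 40], [5, 10, 5], 0.5))  # 10 + (10 - 5)/10 * 10 = 15.0
```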
Range
The difference between the largest and smallest values in the data set
The interquartile range
The difference between the upper quartile and lower quartile (Q3 - Q1)
Using the range vs using the IQR?
Range: takes into account all data values BUT can be affected by extreme values
IQR: not affected by extreme values BUT only considers the spread of the middle 50% of the data
Units of variance
The units of the data, squared (e.g. cm²)
Variance symbol
σ^2
Standard deviation symbol
σ
Variance
The average squared distance from the mean
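The "average squared distance from the mean" translates directly into σ² = Σ(x - x̄)²/n. This Python sketch assumes this course's definition, which divides by n. Python's `statistics.pvariance` uses the same convention. The data values are a hypothetical example.

```python
import math

def variance(values):
    """σ² = Σ(x - x̄)² / n: the average squared distance from the mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / n

def standard_deviation(values):
    """σ = square root of the variance (same units as the data)."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data), standard_deviation(data))  # 4.0 2.0
```

The equivalent form Σx²/n - x̄² gives the same result: 232/8 - 5² = 4.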
How do you find the median position for data in a list (not a table)?
(n+1)/2
If some data is coded by y = nx, what is the effect on the mean and standard deviation?
ȳ = n × x̄
σy = |n| × σx (i.e. n × σx when n > 0)
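This Python sketch checks the effect of coding y = nx with the standard library `statistics` module. `pstdev` is the population standard deviation, which divides by n. The data and the value n = 3 are hypothetical examples.

```python
import statistics

data_x = [2, 4, 4, 4, 5, 5, 7, 9]
n = 3
data_y = [n * x for x in data_x]  # coding: y = nx

# mean is multiplied by n; standard deviation is multiplied by |n|
print(statistics.mean(data_x), statistics.mean(data_y))
print(statistics.pstdev(data_x), statistics.pstdev(data_y))
```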
Cleaning the data
The process of removing anomalies from a data set
Why do you need to clean data sets? (2)
- To remove anomalies (errors), which would be misleading if kept in the data
- To identify trace values and convert them to 0
>> / << meaning
Much greater than / much less than
How do you draw a box plot?
1. Plot the lowest value and highest value using a small line (either the lowest/highest values of data set or the boundary for outliers)
2. Plot Q1, Q2 and Q3 using long lines and connect with a box
3. Draw outliers with a cross
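Below is a Python sketch of the values needed to draw a box plot. It assumes the common convention that an outlier lies more than 1.5 × IQR below Q1 or above Q3; exam questions normally state the rule to use. Quartiles are passed in, already found by hand. The function name `box_plot_values` and the data are hypothetical.

```python
def box_plot_values(sorted_data, q1, q2, q3, k=1.5):
    """Return whisker ends, quartiles and outliers, using fences at Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [x for x in sorted_data if x < lower_fence or x > upper_fence]  # drawn as crosses
    inside = [x for x in sorted_data if lower_fence <= x <= upper_fence]
    return {
        "whisker_low": min(inside),   # lowest value that is not an outlier
        "q1": q1, "q2": q2, "q3": q3,
        "whisker_high": max(inside),  # highest value that is not an outlier
        "outliers": outliers,
    }

data = [3, 5, 6, 7, 8, 9, 10, 25]
print(box_plot_values(data, 5.5, 7.5, 9.5))
```

This sketch draws whiskers to the lowest and highest non-outlier values. Drawing them to the outlier boundaries instead is the other option mentioned on this card.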
How do you draw a cumulative frequency diagram?
1. Draw cumulative frequency table - add up frequencies
2. Plot each cumulative frequency at the upper class boundary of its class
3. Connect points with a curved line
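This Python sketch builds the cumulative frequency table and the points to plot. It assumes the curve starts at the lower boundary of the first class with a cumulative frequency of 0. The class boundaries and frequencies are hypothetical.

```python
from itertools import accumulate

boundaries = [0, 10, 20, 30, 40]   # class boundaries for classes 0-10, 10-20, 20-30, 30-40
frequencies = [4, 9, 12, 5]

cumulative = list(accumulate(frequencies))  # running totals of the frequencies
# plot each cumulative frequency at the upper class boundary, starting from (0, 0)
points = [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))
print(points)  # [(0, 0), (10, 4), (20, 13), (30, 25), (40, 30)]
```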
How do you draw a histogram?
1. Vertical scale is frequency density (= frequency/class width)
2. Draw bars with no gaps between them; the width of each bar is the class width (so the area of each bar is proportional to the frequency)
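This Python sketch computes the heights of the bars. It uses frequency density = frequency / class width, and works for classes of unequal width. The function name `frequency_densities` and the example classes are hypothetical.

```python
def frequency_densities(classes, frequencies):
    """Bar heights for a histogram: frequency density = frequency / class width."""
    return [f / (upper - lower) for (lower, upper), f in zip(classes, frequencies)]

classes = [(0, 10), (10, 15), (15, 30)]
frequencies = [20, 15, 30]
print(frequency_densities(classes, frequencies))  # [2.0, 3.0, 2.0]
```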
What happens when one variable increases for:
1. negatively correlated data?
2. positively correlated data?
1. The other decreases
2. The other also increases
When do two variables have a causal relationship and how do you determine if they do?
If a change in one variable causes a change in the other - determine this using the context of the question and common sense
Line of best fit
A line drawn on a scatter diagram that approximates the relationship between the variables
How do you write the equation of a regression line of y on x?
y = a + bx
For lines of regression, what does the coefficient of x (the gradient) tell you?
The change in y for each unit change in x
When can you not make predictions from a scatter diagram and line of regression?
1. If the x value is outside the range of the data, as this would require extrapolation
2. If you want to estimate the value of x given y, as this needs the regression line of x on y, not y on x
For lines of regression, what does a (the y-intercept) tell you?
The value of y when x = 0
When can you draw a line of regression on a scatter graph?
When the points on a scatter graph lie close to a straight line
Will an estimation from a scatter diagram be reliable if it is outside the range of given data?
Reasonably reliable if close to range - unlikely to be reliable if well outside range
When carrying out a hypothesis test, what happens if the calculated probability is greater than the significance level?
Insufficient evidence to reject the null hypothesis (why? assuming the null hypothesis is true, there is more than e.g. a 5% chance of getting a result at least this extreme)
When carrying out a hypothesis test, what happens if the calculated probability is less than the significance level?
Sufficient evidence to reject the null hypothesis (why? assuming the null hypothesis is true, there is less than e.g. a 5% chance of getting a result at least this extreme)
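Below is a Python sketch of comparing the calculated probability with the significance level. It uses a one-tailed (lower tail) binomial test only; upper-tail and two-tailed tests are not covered. The example is hypothetical: H0: p = 0.3, H1: p < 0.3, n = 20, 2 successes observed, 5% level. The helper names are also hypothetical.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(x + 1))

def lower_tail_test(observed, n, p0, significance=0.05):
    """Return the p-value P(X <= observed | H0) and whether H0 is rejected."""
    p_value = binomial_cdf(observed, n, p0)
    return p_value, p_value < significance

p_value, reject = lower_tail_test(2, 20, 0.3)
print(round(p_value, 4), "reject H0" if reject else "do not reject H0")  # 0.0355 reject H0
```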
When can you use a line of regression to estimate a value?
When you are predicting a value for y given x and when the x value falls within the given range (can use interpolation)
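The cards do not say how a and b are found. This Python sketch assumes the standard least squares method: b = Sxy/Sxx and a = ȳ - b x̄. It also refuses to predict outside the data range, so only interpolation is allowed. The function names and data are hypothetical.

```python
def regression_y_on_x(xs, ys):
    """Least squares regression line of y on x: returns (a, b) for y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # gradient: change in y for each unit change in x
    a = y_bar - b * x_bar    # intercept: value of y when x = 0
    return a, b

def predict_y(a, b, x, xs):
    """Estimate y for a given x, only within the range of the data (interpolation)."""
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the range of the data: this would be extrapolation")
    return a + b * x

xs, ys = [1, 2, 3, 4, 5], [3, 5, 7, 9, 11]
a, b = regression_y_on_x(xs, ys)
print(a, b, predict_y(a, b, 2.5, xs))  # 1.0 2.0 6.0
```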
What is the effect of proximity to the sea on temperature and windspeed?
Coastal locations tend to have a smaller temperature range (cooler summers and milder winters) and higher windspeeds than inland locations
Sample space
The set of all possible outcomes
Probability distribution
A full description of the probabilities of any outcome in the sample space.
Test statistic
An observation or statistic calculated from a sample, used to test a hypothesis.
Critical region
The set of values for which the null hypothesis is rejected in a hypothesis test.
Critical value
The first value to fall inside a critical region
Sample
A selection of observations taken from a subset of the population used to find out information about the population as a whole
Actual significance level
The probability of incorrectly rejecting the null hypothesis
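A critical region, its critical value and the actual significance level can be found together. This Python sketch handles a one-tailed (lower tail) binomial test only. The example is hypothetical: H0: p = 0.3, H1: p < 0.3, n = 20, 5% level. The helper names are also hypothetical.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(x + 1))

def lower_critical_region(n, p0, significance=0.05):
    """Largest c with P(X <= c) <= significance; the critical region is X <= c."""
    critical_value = None
    for c in range(n + 1):
        if binomial_cdf(c, n, p0) <= significance:
            critical_value = c
        else:
            break
    if critical_value is None:
        return None, 0.0  # no critical region at this significance level
    # actual significance level = P(X in critical region | H0 true)
    return critical_value, binomial_cdf(critical_value, n, p0)

c, actual = lower_critical_region(20, 0.3)
print(f"Critical region: X <= {c}, actual significance level = {actual:.4f}")
# Critical region: X <= 2, actual significance level = 0.0355
```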
Parameter
A defining statistical characteristic of the population
Statistically independent events
When the occurrence of one event does not affect another
Uniform distribution
When the probability is the same for each outcome
Hypothesis test
A statistical test used to determine whether there is enough evidence in a sample of data to infer a certain condition is true for the whole population
Mutually exclusive
When events have no outcomes in common so cannot occur at the same time.
Event
A subset of the sample space
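The definitions above can be tested on a concrete sample space. This Python sketch uses the hypothetical example of rolling two fair dice, so all 36 outcomes are equally likely. It assumes the standard checks: independence means P(A ∩ B) = P(A) × P(B), and mutually exclusive means the events share no outcomes.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a subset of the sample space) with equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if o[0] == 6}      # first die shows 6
B = {o for o in sample_space if o[1] == 6}      # second die shows 6
C = {o for o in sample_space if sum(o) == 2}    # total of 2
D = {o for o in sample_space if sum(o) == 12}   # total of 12

print(prob(A & B) == prob(A) * prob(B))  # True: A and B are independent
print(len(C & D) == 0)                    # True: C and D are mutually exclusive
```

A and B are independent but not mutually exclusive, because both contain the outcome (6, 6).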
Population
The whole set of items that are of interest
Sample unit
Each individual thing in the population that can be sampled
Sampling frame
A list of the sampling units where they are individually named or numbered
Census
Data collected from the entire population
Advantages and disadvantages of using a sample
Advantages (3):
- Cheaper
- Quicker
- Less data to process
Disadvantages (3):
- Data may not be accurate
- Sample may not be large enough to represent small subgroups of the population
- Different samples may lead to different conclusions due to natural variation
Advantages and disadvantages of simple random sampling
Advantages (3):
- Bias free / less bias
- Easy and cheap to implement
- Each number (and so, sample) has an equal chance of being selected
Disadvantages (2):
- Sampling frame needed
- Not suitable when population size is large
How to carry out systematic sampling? (3)
Required elements are chosen at regular intervals from an ordered list. The sampling frame MUST be in a random order (e.g. not alphabetical), otherwise bias may be introduced
1. Calculate k (k = pop size/sample size)
2. Randomly select a number between 1 and k - select unit that is assigned this number
3. Select every kth sampling unit after this
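The three steps can be sketched in Python. The sketch assumes the sampling frame is already in a random order. When population size / sample size is not a whole number, it rounds k down, which is an assumption. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Select every kth unit from the sampling frame, starting at a random unit between 1 and k."""
    k = len(frame) // sample_size          # step 1: k = population size / sample size
    rng = random.Random(seed)
    start = rng.randint(1, k)              # step 2: random start between 1 and k
    # step 3: the start unit, then every kth unit after it
    return [frame[i - 1] for i in range(start, start + k * sample_size, k)]

frame = list(range(1, 101))                # 100 numbered sampling units
print(systematic_sample(frame, 10, seed=3))
```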
Advantages and disadvantages of systematic sampling
Advantages (2):
- Simple and quick to use
- Suitable for large samples/populations
Disadvantages (2):
- Sampling frame needed
- Can introduce bias if sampling frame not random
When is stratified sampling used?
Population is large + naturally divides into groups
How does quota sampling work? (3)
1. Population divided into groups/categories according to characteristic - size of each group determines proportion of sample that should have characteristic
2. Interviewer interviews people, assesses what group they fall into and selects them
3. Ignore people of a type where the quota is full
Advantages and disadvantages of quota sampling
Advantages (4):
- Allows small sample to still be representative of the whole population
- No sampling frame needed
- Quick, easy, inexpensive
- Allows for easy comparison between different groups in the population
Disadvantages (4):
- Non-random sampling can introduce bias
- Population must be divided into groups, which can be costly or inaccurate
- Increasing scope of study increases number of groups, so becomes more time consuming and expensive
- Non-responses are not recorded
Raw data
Unprocessed information
How do you compare data?
Compare mean and standard deviation or median and IQR - median and IQR are more appropriate if there are outliers
What does it mean if the sampling method used is not representative?
It's unlikely to reflect the characteristics of the whole population
Quantitative variables / data
Variables / data associated with numerical observations
4 Beaufort scale terms
1. 0 (calm): less than 1 knot
2. 1-3 (light): 1-10 knots
3. 4 (moderate): 11-16 knots
4. 5 (fresh): 17-21 knots
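Below is a Python sketch that maps a windspeed in knots to the four Beaufort terms listed above. It assumes windspeeds are whole numbers of knots, matching the ranges on this card. Speeds above 21 knots are outside the four terms and raise an error. The function name `beaufort_description` is hypothetical.

```python
def beaufort_description(knots):
    """Return the Beaufort number(s) and term for a windspeed in knots (0-21 only)."""
    if knots < 1:
        return "0 (calm)"
    if knots <= 10:
        return "1-3 (light)"
    if knots <= 16:
        return "4 (moderate)"
    if knots <= 21:
        return "5 (fresh)"
    raise ValueError("above 21 knots: not covered by these four terms")

print([beaufort_description(k) for k in (0, 7, 12, 19)])
```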
Daily maximum gust
The highest instantaneous windspeed recorded (measured in knots - direction also recorded)
What does it mean if daily maximum relative humidity is greater than 95%?
Misty and foggy conditions
In the LDS, which stations are coastal?
Camborne, Hurn, Leuchars, Jacksonville, Perth
In the LDS, which stations are inland?
Heathrow, Leeming, Beijing