Population
The whole set of items that are of interest.
Census
Observes or measures every member of a population
Sample
A selection of observations taken from a subset of the population which is used to find information about the population as a whole
Census advantages
It should give a completely accurate result
Census disadvantages
- Time consuming and expensive
- Cannot be used when the testing process destroys the item
- It is hard to process a large quantity of data
Sample advantages
- Less time consuming and expensive than a census
- Fewer people have to respond
- Less data to process than a census
Sample disadvantages
- The data may not be as accurate
- The sample may not be large enough to give information about small sub-groups of the population
Sampling units
Individual units of a population
Sampling frame
A list of the sampling units of a population, each individually named or numbered
The sample should be
representative of the population
Random sampling helps to remove
bias from a sample
Random sampling methods
- Simple random sampling
- Systematic sampling
- Stratified sampling
Simple Random Sampling
A simple random sample of size n is one where every sample of size n has an equal chance of being selected
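A minimal Python sketch of simple random sampling, assuming the sampling frame is held as a list of numbered units. `simple_random_sample` is a hypothetical helper; it relies on the standard library's `random.sample`, which picks distinct units so that every subset of size n is equally likely.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Pick n distinct units from the sampling frame; every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the example repeatable
    return rng.sample(frame, n)


# Hypothetical sampling frame: population members numbered 1 to 100
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```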
Systematic sampling
The required elements are chosen at regular intervals from an ordered list
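A sketch of systematic sampling in Python. It assumes the usual approach of taking the interval k = population size ÷ sample size (rounded down) and choosing the starting unit at random from the first k units; `systematic_sample` is a hypothetical helper name, and it assumes the sample size is no larger than the population.

```python
import random


def systematic_sample(ordered_frame, n, seed=None):
    """Take every k-th unit from an ordered list, starting at a random point in the first k units."""
    k = len(ordered_frame) // n               # sampling interval
    start = random.Random(seed).randrange(k)  # random index from 0 to k - 1
    return [ordered_frame[start + i * k] for i in range(n)]


frame = list(range(1, 101))  # hypothetical ordered list of 100 units
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```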
Stratified sampling
The population is divided into mutually exclusive strata and a random sample is taken from each, with the number sampled from each stratum proportional to the size of that stratum.
Stratified sampling equation
The number sampled in a stratum = (number in stratum/number in population) x overall sample size
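The stratified sampling equation applied to each stratum, as a short Python sketch. The strata names and sizes are made up for illustration, and rounding each result to the nearest whole number is an assumption (rounded values may not always add up exactly to the overall sample size).

```python
def stratum_sample_sizes(strata, sample_size):
    """Number sampled in each stratum = (number in stratum / number in population) x overall sample size."""
    population = sum(strata.values())
    return {name: round(count * sample_size / population) for name, count in strata.items()}


# Hypothetical population of 200 students split into two strata, overall sample size 40
sizes = stratum_sample_sizes({"Year 12": 120, "Year 13": 80}, 40)
print(sizes)  # {'Year 12': 24, 'Year 13': 16}
```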
Simple random sampling advantages
- Free of bias
- Easy and cheap to implement for small populations and small samples
- Each sampling unit has a known and equal chance of selection
Simple random sampling disadvantages
- Not suitable when the population size or the sample size is large
- A sampling frame is needed
Systematic sampling advantages
- Simple and quick
- Suitable for large samples and populations
Systematic sampling disadvantages
- A sampling frame is needed
- It can introduce bias if the sampling frame is not random
Stratified sampling advantages
- Sample accurately reflects the population structure
- Guarantees proportional representation of groups within a population
Stratified sampling disadvantages
- Population must be clearly classified into distinct strata
- Selection within each stratum suffers from the same disadvantages as simple random sampling
Non-random sampling methods
- Quota sampling
- Opportunity (convenience) sampling
Quota sampling
An interviewer or researcher selects a sample that reflects the characteristics of the whole population
Quota sampling advantages
- Allows a small sample to still be representative of the population
- No sampling frame required
- Quick, easy and inexpensive
- Allows for easy comparison between different groups within a population
Quota sampling disadvantages
- Non-random sampling can introduce bias
- Population must be divided into groups, which can be costly or inaccurate
- Increasing scope of study increases number of groups, which adds time and expense
- Non-responses are not recorded as such
Opportunity (convenience) sampling
The sample is taken from people who are available at the time the study is carried out and who fit the criteria you are looking for.
Opportunity (convenience) sampling advantages
- Easy to carry out
- Inexpensive
Opportunity (convenience) sampling disadvantages
- Unlikely to provide a representative sample
- Highly dependent on individual researcher
Quantitative
Variables or data associated with numerical observations, e.g. height
Qualitative
Variables or data associated with non-numerical observations, e.g. hair colour
Continuous
Any value in a given range
Discrete
Only specific values in a given range
Classes
When data is presented in a grouped frequency table, the specific data values are not shown. The groups are more commonly known as classes.
Class boundaries
The class boundaries tell you the maximum and minimum values that belong in each class
Midpoint
The midpoint is the average of the class boundaries
Class width
The class width is the difference between the class boundaries
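For a class whose boundaries are known, the midpoint and class width follow directly from the two cards above. A minimal sketch; `class_midpoint_and_width` is a hypothetical helper and the class 10 ≤ x < 20 is an illustrative example.

```python
def class_midpoint_and_width(lower_boundary, upper_boundary):
    """Midpoint = average of the class boundaries; width = difference between them."""
    midpoint = (lower_boundary + upper_boundary) / 2
    width = upper_boundary - lower_boundary
    return midpoint, width


# Illustrative class 10 <= x < 20
print(class_midpoint_and_width(10, 20))  # (15.0, 10)
```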
The large data set
The weather data provided covers two periods of time (May to October 1987 and May to October 2015) for five UK and three overseas weather stations.
Leuchars
UK, Northern, Coastal
Leeming
UK, Northern, Inland
Heathrow
UK, Southern, Inland
Hurn
UK, Southern, Coastal
Camborne
UK, Southern, Coastal
Jacksonville
Worldwide, Northern Hemisphere, Coastal
Beijing
Worldwide, Northern Hemisphere, Inland
Perth
Worldwide, Southern Hemisphere, Coastal
Daily mean temperature
in °C - this is the average of the hourly temperature readings during a 24-hour period.
Daily total rainfall
Includes solid precipitation such as snow and hail, which is melted before being included in any measurements.
tr
rainfall amounts less than 0.05 mm are recorded as 'tr' or 'trace'
Daily total sunshine
recorded to the nearest tenth of an hour
Daily mean wind direction
Mean wind directions are given as bearings and as cardinal directions.
Daily mean windspeed
in knots, averaged over 24 hours from midnight to midnight. The data for mean windspeed is also categorised according to Beaufort scale.
Beaufort scale
0: Calm, less than 1 knot
1-3: Light, 1 to 10 knots
4: Moderate, 11 to 16 knots
5: Fresh, 17 to 21 knots
Daily maximum gust
in knots - this is the highest instantaneous windspeed recorded. The direction from which the maximum gust was blowing is also recorded.
Knot
A knot (kn) is a 'nautical mile per hour'. 1 kn = 1.15 mph
Daily maximum relative humidity
Given as a percentage of air saturation with water vapour. Relative humidities above 95% give rise to misty and foggy conditions.
Daily mean cloud cover
Measured in oktas, i.e. eighths of the sky covered by cloud
Daily mean visibility
Measured in decametres (Dm). This is the greatest horizontal distance at which an object can be seen in daylight.
Daily mean pressure
Measured in hectopascals (hPa)
If you need to do calculations on the large data set in your exam,
the relevant extract from the data set will be provided
Countif
Command in a spreadsheet to work out the frequency in each class
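Outside a spreadsheet, the same class frequencies can be counted in a few lines of Python. This sketch assumes classes of the form lower ≤ x < upper defined by a list of boundaries; `class_frequencies` and the data are hypothetical.

```python
def class_frequencies(data, boundaries):
    """Count how many values fall in each class lower <= x < upper."""
    classes = zip(boundaries, boundaries[1:])
    return [sum(1 for x in data if lower <= x < upper) for lower, upper in classes]


data = [3, 7, 12, 15, 18, 25]  # illustrative values
print(class_frequencies(data, [0, 10, 20, 30]))  # [2, 3, 1]
```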
Measure of location
A single value which describes a position in a data set.
Measure of central tendency
A measure of location that describes the centre of the data.
Mode
The mode is the value or class that occurs most often
Median
The middle value when the data values are put in order
Mean
sum of data values over number of data values
X bar
represents the mean of the data
Frequency table mean
The sum of the products of the data values and their frequencies, divided by the sum of the frequencies: x̄ = Σfx/Σf
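The measures of central tendency above, sketched in Python. `statistics.mode`, `statistics.median` and `statistics.fmean` are from the standard library (`fmean` needs Python 3.8+, and from 3.8 `statistics.mode` returns the first mode found if there is a tie); `frequency_table_mean` is a hypothetical helper implementing Σfx/Σf, and the data are made up.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # illustrative data set

print(statistics.mode(data))    # 3, the value that occurs most often
print(statistics.median(data))  # 4.0, halfway between the middle values 3 and 5
print(statistics.fmean(data))   # 5.0, sum of data values / number of data values


def frequency_table_mean(values, frequencies):
    """Sum of (value x frequency) divided by the sum of the frequencies."""
    return sum(x * f for x, f in zip(values, frequencies)) / sum(frequencies)


print(frequency_table_mean([1, 2, 3], [2, 3, 5]))  # 2.3
```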
Quartiles
- Lower Quartile
- Upper Quartile
Percentiles
Split the data set into 100 parts.
Lower quartile
One quarter of the way through the data set
Upper quartile
Three quarters of the way through the data set
To find the lower quartile for discrete data
divide n by 4. If this is a whole number, the lower quartile is halfway between this data point and the one above. If it is not a whole number, round up and pick this data point.
To find the upper quartile of discrete data
find 3/4 of n. If this is a whole number, the upper quartile is halfway between this point and the one above. If it is not a whole number, round up and pick this data point.
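The two rules above can be combined into one Python function, since both locate the kn/4 point of the ordered data (k = 1 for the lower quartile, k = 3 for the upper quartile). A sketch with a hypothetical function name; positions are counted from 1, as in the rules.

```python
def discrete_quartile(data, k):
    """Quartile of discrete data: k = 1 gives Q1, k = 3 gives Q3."""
    values = sorted(data)
    n = len(values)
    if (k * n) % 4 == 0:
        # kn/4 is a whole number: go halfway between that data point and the one above
        position = k * n // 4
        return (values[position - 1] + values[position]) / 2
    # Not a whole number: round kn/4 up and pick that data point
    position = (k * n + 3) // 4
    return values[position - 1]


print(discrete_quartile([1, 2, 3, 4, 5, 6, 7, 8], 1))     # 2.5
print(discrete_quartile([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))  # 7
```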
Age is always rounded
down
Interpolation
When data are presented in a grouped frequency table you can use interpolation to estimate the median, quartiles and percentiles.
When you use interpolation you are assuming the data values are
evenly distributed within each class.
Quartiles for grouped continuous data
Q1 = n/4th data value
Q2 = n/2th data value
Q3 = 3n/4th data value
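A sketch of interpolation for grouped continuous data, assuming (as the card above says) that values are evenly distributed within each class. The class boundaries and frequencies are invented for the example; `interpolate` is a hypothetical helper that takes the position n/4, n/2 or 3n/4.

```python
def interpolate(boundaries, frequencies, position):
    """Estimate the value at a given position in grouped continuous data."""
    cumulative = 0
    for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
        if f > 0 and cumulative + f >= position:
            # Fraction of the way through this class, assuming an even spread
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("position is larger than the total frequency")


boundaries = [0, 10, 20, 30]
frequencies = [4, 8, 8]  # n = 20
n = sum(frequencies)
print(interpolate(boundaries, frequencies, n / 4))      # Q1 = 11.25
print(interpolate(boundaries, frequencies, n / 2))      # Q2 = 17.5
print(interpolate(boundaries, frequencies, 3 * n / 4))  # Q3 = 23.75
```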
For finding quartiles in an ungrouped frequency table
use the rules for discrete data
Range
Difference between the largest and smallest values in the data set
Other names for measures of spread
- Measures of dispersion
- Measures of variation
Interquartile range (IQR)
The difference between the upper quartile and the lower quartile
Q3 - Q1
Interpercentile range
The difference between the values for two given percentiles.
Variance
Each data point deviates from the mean by the amount x - x̄
Variance =
Σx²/n - (Σx/n)²
Sxx =
Σx² - (Σx)²/n
The standard deviation is
The square root of the variance
σ =
√(Σx²/n - (Σx/n)²)
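The variance, Sxx and standard deviation formulas above, sketched in Python with illustrative data. The function name is hypothetical; the variance here divides by n, so it equals Sxx/n.

```python
import math


def spread_summary(xs):
    """Return (Sxx, variance, standard deviation) using the Σx and Σx² formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    sxx = sum_x2 - sum_x ** 2 / n
    variance = sum_x2 / n - (sum_x / n) ** 2  # equivalently Sxx / n
    return sxx, variance, math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(spread_summary(data))  # (32.0, 4.0, 2.0)
```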
σ² in a frequency table
Σfx²/Σf - (Σfx/Σf)²
σ in a frequency table
√(Σfx²/Σf - (Σfx/Σf)²)
f in standard deviation
frequency for each group
Σf in standard deviation
total frequency
Calculate estimates for the variance and standard deviation of the data in a grouped frequency table using
the midpoint of each class interval
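For a grouped frequency table, the same calculation uses Σf, Σfx and Σfx², with each class represented by its midpoint. A minimal sketch; the classes 0 to 10, 10 to 20 and 20 to 30 and their frequencies are invented, and `frequency_table_spread` is a hypothetical name.

```python
import math


def frequency_table_spread(values, frequencies):
    """Return (variance, standard deviation), variance = Σfx²/Σf - (Σfx/Σf)²."""
    total = sum(frequencies)                                       # Σf
    sum_fx = sum(f * x for x, f in zip(values, frequencies))       # Σfx
    sum_fx2 = sum(f * x * x for x, f in zip(values, frequencies))  # Σfx²
    variance = sum_fx2 / total - (sum_fx / total) ** 2
    return variance, math.sqrt(variance)


# Classes 0-10, 10-20, 20-30 represented by their midpoints
midpoints = [5, 15, 25]
frequencies = [4, 8, 8]
variance, sd = frequency_table_spread(midpoints, frequencies)
print(variance)      # 56.0
print(round(sd, 3))  # 7.483
```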
Sxx
summary statistic
If data is coded using the formula y = (x - a)/b
- The mean of the coded data is given by ȳ = (x̄ - a)/b
- The standard deviation of the coded data is given by σy = σx/b, where σx is the standard deviation of the original data.
To find the original data from the coded data
rearrange the formulae:
- x̄ = bȳ + a
- σx = bσy
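A quick numerical check of the coding rules, using made-up data with a = 100 and b = 2. It uses the standard library's `statistics.fmean` and `statistics.pstdev` (the population standard deviation, dividing by n, matching σ above) and assumes b > 0.

```python
import statistics

x = [102, 104, 108, 110]  # illustrative original data
a, b = 100, 2             # coding y = (x - a) / b, assuming b > 0

y = [(value - a) / b for value in x]  # coded data: [1.0, 2.0, 4.0, 5.0]
mean_y = statistics.fmean(y)
sd_y = statistics.pstdev(y)

# Decode with x̄ = bȳ + a and σx = bσy
mean_x = b * mean_y + a
sd_x = b * sd_y

print(mean_x, statistics.fmean(x))                      # 106.0 106.0
print(round(sd_x, 6), round(statistics.pstdev(x), 6))  # 3.162278 3.162278
```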
Outlier
- Either greater than Q3 + k(Q3 - Q1)
- Or less than Q1 - k(Q3 - Q1)
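A sketch of the outlier rule in Python. The value of k comes from the question; k = 1.5 is used here only as a common illustrative choice, and the data and quartiles are made up. `find_outliers` is a hypothetical helper.

```python
def find_outliers(data, q1, q3, k=1.5):
    """Return values greater than Q3 + k(Q3 - Q1) or less than Q1 - k(Q3 - Q1)."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in data if x > upper_fence or x < lower_fence]


data = [1, 5, 6, 7, 8, 30]
print(find_outliers(data, q1=5, q3=8))  # [30]
```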
Cleaning the data
The process of removing anomalies from a data set.
Anomalies
An outlier that should be removed from the data because it is clearly an error and it would be misleading to keep it in.
Cumulative frequency diagram
If you are given data in a grouped frequency table, you are not able to find the exact values of the median and quartiles. The diagram can help you find estimates for the median, quartiles and percentiles.