Topic 1 and 2
What is a population
All members of the set that are being studied
What is a sample
A smaller subset of a population that is used to draw conclusions about the population
What does a census do
Measures every member of a population
What is raw data
Unprocessed information
Advantages of using a population
Accurate result
Disadvantages of using a population
Time consuming
Expensive
Hard to process that much data
If testing is destructive, every member of the population is destroyed
Advantages of using a sample
Faster
Cheaper
Less data to process
Fewer people have to respond
Disadvantages of using a sample
Not as accurate
Small sub-groups may not be properly represented
What is simple random sampling
Each member of the population has an equal probability of being included in the sample
What is a sample frame
A list of the members of a population, each identified by a name or number
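As a rough illustration of simple random sampling from a sampling frame, the sketch below draws members without replacement so that each one is equally likely to be chosen. The frame of 100 numbered students and the function name `simple_random_sample` are hypothetical, made up for this example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n members without replacement; every member of the frame is equally likely."""
    rng = random.Random(seed)  # seeded generator so the example is reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: 100 students, each identified by a number
frame = [f"student_{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```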
What is systematic sampling
Sample members are selected from a larger population according to a random starting point and a fixed sampling interval
How is a sample interval formed
By dividing the population size by the desired sample size
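A minimal sketch of systematic sampling, assuming the interval is rounded down to a whole number when the population size does not divide exactly by the sample size. The random starting point is chosen within the first interval, then every k-th member is taken. The function names and the frame of 200 numbered members are hypothetical.

```python
import random

def sampling_interval(population_size, sample_size):
    # k = population size / sample size, rounded down to a whole number of positions
    return population_size // sample_size

def systematic_sample(frame, n, seed=None):
    k = sampling_interval(len(frame), n)
    start = random.Random(seed).randrange(k)  # random starting point: position 0 .. k-1
    # Take every k-th member from the starting point onwards
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 201))  # hypothetical frame of 200 numbered members
print(sampling_interval(200, 20))  # 10
print(systematic_sample(frame, 20, seed=3))
```

Because `start` is at most k - 1 and n × k never exceeds the frame length, the last index always stays inside the frame.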
What is stratified sampling
The population is sorted into mutually exclusive groups and a proportional sample is taken from each group
How to find the number sampled in a group
(number in group ÷ number in population) × overall sample size
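The group formula above can be applied to every stratum at once. In this sketch the year groups and their counts are hypothetical; note that Python's `round()` rounds halves to even, and rounded group sizes will not always add up exactly to the overall sample size.

```python
def stratum_sizes(group_counts, sample_size):
    """Number to sample from each group: (number in group / number in population) x sample size."""
    population = sum(group_counts.values())
    return {name: round(count * sample_size / population)
            for name, count in group_counts.items()}

# Hypothetical population of 200 students split into two year groups, sample of 30
sizes = stratum_sizes({"Year 12": 80, "Year 13": 120}, 30)
print(sizes)  # {'Year 12': 12, 'Year 13': 18}
```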
What is opportunity sampling
Taking people from the population that are available at the time and willing to take part in the survey
Advantages of opportunity sampling
Easy to carry out
Cheap
Disadvantages of opportunity sampling
Unlikely to provide a representative sample
Dependent on the researcher
Advantages of simple random sampling
Free of bias
Simple and cheap for small samples and populations
Each sampling unit has an equal and known chance of being selected
Advantages of systematic sampling
Simple
Fast
Good for large samples and populations
Advantages of stratified sampling
Sample accurately represents population structure
Guarantees proportional representation of groups in a population
Disadvantages of stratified sampling
Population has to be clearly classified into distinct groups
Selection within each group requires a sampling frame and can be time consuming, disruptive and expensive if the groups are large
Disadvantages of systematic sampling
Sampling frame is needed
Can introduce bias if the sampling frame is not in a random order (for example, if it has a repeating pattern)
Disadvantages of simple random sampling
Sampling frame is needed
Not suitable for large populations, as it becomes time consuming, disruptive and expensive
What is bias
When something is not fair or representative
What will an ideal sample do
Be large enough
Represent the population
Be unbiased
Things to consider when reviewing data from samples
Sample size
Where the sample is taken
Time of day the sample was taken
What is the IQR in a box plot diagram
The length of the box, from the lower quartile (LQ) to the upper quartile (UQ)
What is the range in a box plot diagram
The distance from the end of one whisker to the end of the other
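To connect the box and whiskers to actual numbers, the sketch below computes the five values a box plot is drawn from, plus the IQR and range. It uses `statistics.quantiles`, whose default 'exclusive' method places quartiles at positions (n + 1)/4 and 3(n + 1)/4; other quartile conventions can give slightly different values. The data set and function name are hypothetical.

```python
import statistics

def box_plot_summary(data):
    values = sorted(data)
    # Default 'exclusive' method: quartiles at positions (n + 1)p in the ordered data
    lq, median, uq = statistics.quantiles(values, n=4)
    return {
        "min": values[0], "LQ": lq, "median": median, "UQ": uq, "max": values[-1],
        "IQR": uq - lq,                  # length of the box
        "range": values[-1] - values[0], # end of one whisker to the end of the other
    }

summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15])
print(summary)
```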
What does cumulative mean
A running total
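A cumulative frequency is the running total of the frequencies. The class frequencies below are hypothetical; `itertools.accumulate` adds each frequency to the total so far.

```python
from itertools import accumulate

frequencies = [3, 7, 12, 5, 3]           # hypothetical class frequencies
cumulative = list(accumulate(frequencies))
print(cumulative)  # [3, 10, 22, 27, 30]
```

The last cumulative value is always the total frequency.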
How to find the total population using histograms
Add together the areas of all the bars
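This sketch assumes the histogram's vertical axis is frequency density, so each bar's area (class width × frequency density) equals its frequency. The class boundaries and densities are hypothetical.

```python
def total_frequency(bars):
    """Sum of bar areas; each bar is (lower bound, upper bound, frequency density)."""
    return sum((upper - lower) * density for lower, upper, density in bars)

# Hypothetical histogram with unequal class widths
bars = [(0, 10, 1.5), (10, 20, 2.0), (20, 40, 0.75), (40, 60, 0.5)]
print(total_frequency(bars))  # 60.0
```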
What is Bi-variate data
Data consisting of pairs of values for two different variables
What do scatter diagrams represent
It is a visual representation of any relationship between two variables
What is correlation
A measure of how well two variables are related to each other
How is strong correlation represented
The points lie close to the regression line
How is weak correlation represented
The points do not lie close to the regression line
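One common way to put a number on how strong a correlation is uses the product moment correlation coefficient, r, which is not named on these cards: values near +1 or -1 mean the points lie close to a straight line, values near 0 mean they do not. A minimal sketch using the summary quantities Sxy, Sxx and Syy, with hypothetical data:

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]  # hypothetical paired data
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 3))  # 0.775: fairly strong positive correlation
```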
What is a regression line and what does it show
A line of best fit that models the relationship between two variables
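A regression line is usually found by least squares, a method these cards do not describe: it gives the line y = a + bx with b = Sxy / Sxx and a = ȳ - b x̄. A sketch with hypothetical data:

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the regression line y = a + b x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # gradient
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {a:.1f} + {b:.1f}x")  # y = 2.2 + 0.6x
```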
What is Interpolation
Using the regression line to estimate a value within the range of the known data
What is Extrapolation
Assuming the regression line holds for all values and using it to estimate an unknown value beyond the range of the known data
Does correlation imply causation?
No
What is a measure of central tendency
A single value that describes the center of a set of data
Advantages of using the mean
Every data value is used, so all values are taken into account
Advantages of using the mode
Can be used with data that is not numeric
Advantages of using the median
Not affected by extreme values so good for outliers
Disadvantages of using the mean
Affected by extreme values
Only useful with numeric data
Disadvantages of using the mode
May be no mode
May not represent the data well
There can be more than one mode
Disadvantages of using the median
Can take a long time to order all the data
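The advantages and disadvantages above can be checked with Python's `statistics` module. In this sketch the data sets are hypothetical: one extreme value drags the mean but leaves the median unchanged, and `multimode` shows that the mode works for non-numeric data and that a data set can have more than one mode.

```python
import statistics

data = [3, 4, 4, 5, 6]
with_outlier = [3, 4, 4, 5, 60]  # one extreme value replaces the 6

print(statistics.mean(data), statistics.mean(with_outlier))      # 4.4 15.2
print(statistics.median(data), statistics.median(with_outlier))  # 4 4

print(statistics.multimode(["red", "blue", "red"]))  # ['red']: mode of non-numeric data
print(statistics.multimode([1, 1, 2, 2, 3]))         # [1, 2]: more than one mode
```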
What is a measure of variation
A measure of the spread of data
What is variance
A statistic that measures how spread out the values in a set of data are from the mean: the mean of the squared distances of each value from the mean
What is standard deviation
The square root of the variance
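A worked sketch of variance and standard deviation, assuming the population form (dividing by n), which can be written as σ² = Σ(x - x̄)² / n or equivalently Σx² / n - x̄². Some courses and `statistics.variance` instead divide by n - 1 for a sample estimate. The data set is hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

variance = sum((x - mean) ** 2 for x in data) / n        # mean of squared deviations
variance_alt = sum(x * x for x in data) / n - mean ** 2  # same value via sum of x squared
std_dev = math.sqrt(variance)                            # square root of the variance

print(mean, variance, variance_alt, std_dev)  # 5.0 4.0 4.0 2.0
```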
What are summary statistics
Information that gives a brief description of the data
What does dealing with errors in data involve
Detecting and correcting or removing data with errors
What is an outlier
A value that does not follow the pattern of the data
How to spot an outlier
An outlier is any value that is smaller than LQ − 1.5 × IQR or larger than UQ + 1.5 × IQR
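The outlier rule above translates directly into code. This sketch assumes quartiles are found with `statistics.quantiles` using its default 'exclusive' method; a different quartile convention can move the fences slightly. The data set is hypothetical.

```python
import statistics

def find_outliers(data):
    lq, _, uq = statistics.quantiles(sorted(data), n=4)
    iqr = uq - lq
    lower_fence = lq - 1.5 * iqr
    upper_fence = uq + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 30]
print(find_outliers(data))  # [30]: LQ = 4, UQ = 10, so the fences are -5 and 19
```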
What does missing data do
Makes a sample less representative
Ways to deal with missing data
Delete the samples with missing data elements
Impute the value of missing data
Remove a variable
What does deleting missing data do
Creates a smaller sample size and may end up not representing the whole population
How to impute missing data
Substitute in data values from a similar sample
Use the mean from all other values for the same statistic
Use regression techniques to predict values based on the relationship between the variable and other variables
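Of the imputation methods above, mean imputation is the simplest to show. This sketch assumes missing values are stored as `None`; the helper name `impute_mean` and the data are hypothetical.

```python
import statistics

def impute_mean(values):
    """Replace each None with the mean of the values that are present."""
    known = [v for v in values if v is not None]
    fill = statistics.mean(known)
    return [fill if v is None else v for v in values]

readings = [12, None, 15, 9, None, 14]  # hypothetical data with two missing values
print(impute_mean(readings))  # [12, 12.5, 15, 9, 12.5, 14]
```

Mean imputation keeps the sample size, but it pulls the filled values towards the center and so makes the data look less spread out than it really is.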
When would you remove a variable
If a particular question has a high amount of missing data