82 Terms

1
New cards
Population
The whole set of items that are of interest
2
New cards
Census
Observes or measures every member of a population
3
New cards
Sample
A selection of observations taken from a subset of the population which is used to find out information about the population as a whole
4
New cards
Census - Adv & Disadv
Adv
- Completely accurate

Disadv
- Time consuming & expensive
- Cannot be used when the testing process destroys the item
- Hard to process large quantity of data
5
New cards
Sample - Adv & Disadv
Adv
- Less time consuming & less expensive than a census
- Fewer people have to respond
- Less data to process than in a census

Disadv
- Data may not be as accurate
- May not be large enough to give information about small subgroups of the population
6
New cards
Sampling units
Individual units of a population
7
New cards
Sampling frame
Sampling units of a population individually named or numbered to form a list
8
New cards
Simple random sampling
Number the list from 001 to ______
Select x random numbers using random number generator
Ignore repeats
Continue until you have x numbers
Select corresponding items from the data sheet
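The steps above can be sketched in Python. This is only an illustration: the frame of 100 items, the helper name simple_random_sample and the seed are assumptions made for the example, not part of the card.

```python
import random

def simple_random_sample(frame, x, seed=None):
    """Pick x distinct items from a numbered sampling frame."""
    if x > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)
    chosen = []                                  # item numbers selected so far
    while len(chosen) < x:                       # continue until you have x numbers
        number = rng.randint(1, len(frame))      # random number from 1 to N
        if number in chosen:
            continue                             # ignore repeats
        chosen.append(number)
    # Item number k sits at index k - 1 in the list
    return [frame[k - 1] for k in chosen]

# Hypothetical sampling frame numbered 001 to 100
frame = [f"item{n:03d}" for n in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=1)
```

In practice random.sample(frame, 10) does the same job in one call; the loop is written out here only to mirror the steps on the card.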
9
New cards
Systematic sampling
The required elements are chosen at regular intervals from an ordered list
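A minimal Python sketch of systematic sampling. It assumes the interval is the population size divided by the sample size (rounded down) and that the starting point is picked at random from the first interval; the function name and the example frame are made up for illustration.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every k-th item from an ordered list, starting at a random point."""
    if n < 1 or n > len(frame):
        raise ValueError("sample size must be between 1 and the size of the frame")
    k = len(frame) // n              # interval length = population size / sample size
    rng = random.Random(seed)
    start = rng.randrange(k)         # random start among the first k items (assumption)
    return [frame[start + i * k] for i in range(n)]

# Hypothetical ordered frame of 100 items, sample of 20, so every 5th item is taken
frame = list(range(1, 101))
sample = systematic_sample(frame, 20, seed=3)
```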
10
New cards
Stratified sampling
The population is divided into mutually exclusive strata and a random sample is taken from each
- The proportion of each stratum sampled should be the same
11
New cards
Stratified sampling formula
The number sampled in a stratum = (number in stratum / number in population) × overall sample size
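A short worked example of the formula in Python. The strata names and sizes are invented for illustration, and rounding to the nearest whole number is an assumption (rounded values may not always add up to the overall sample size).

```python
def stratum_sample_sizes(strata_sizes, sample_size):
    """Number sampled in each stratum = (stratum size / population size) * sample size."""
    population = sum(strata_sizes.values())
    return {
        name: round(size / population * sample_size)
        for name, size in strata_sizes.items()
    }

# Hypothetical population of 200 split into two strata, overall sample of 50
sizes = stratum_sample_sizes({"Year 12": 120, "Year 13": 80}, 50)
# Year 12: 120/200 * 50 = 30, Year 13: 80/200 * 50 = 20
```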
12
New cards
Simple random sampling - Adv & Disadv
Adv
- Free of bias
- Easy & cheap to implement for small populations and small samples
- Each sampling unit has a known and equal chance of selection

Disadv
- Not suitable when the population size or the sample size is large
- A sampling frame is needed
13
New cards
Systematic sampling - Adv & Disadv
Adv
- Simple and quick to use
- Suitable for large samples and large populations

Disadv
- A sampling frame is needed
- It can introduce bias if the sampling frame is not random
14
New cards
Stratified sampling - Adv & Disadv
Adv
- Sample accurately reflects the population structure
- Guarantees proportional representation of groups within a population

Disadv
- Population must be clearly classified into distinct strata
- Not suitable when the population size or the sample size is large
- A sampling frame is needed
15
New cards
Quota sampling
How many members of each group you wish to sample is decided in advance and opportunity sampling is used until you have a large enough sample for each group
16
New cards
Opportunity sampling
Consists of taking the sample from people who are available at the time the study is carried out and who fit the criteria you are looking for
17
New cards
Quantitative variable
Data associated with numerical observations
18
New cards
Qualitative variable
Data associated with non-numerical observations
19
New cards
Mode / Modal class
-Qualitative and quantitative data
-The value or class that occurs most often
-Not informative if each value occurs once
20
New cards
Median (Q2)
-((n+1)/2)th term
-The middle value when the data values are put in order
-Quantitative data
-Not affected by extreme values
21
New cards
Mean (x̄)
-Average of values
-Quantitative data
-Uses all data
-Affected by extreme values

x̄ = Σx / n
22
New cards
Mean (frequency table)
x̄ = Σxf / Σf
x = midpoint of each class interval
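As an illustration, the frequency-table mean can be computed like this in Python. The midpoints and frequencies are made-up values.

```python
def grouped_mean(midpoints, frequencies):
    """Mean from a frequency table: sum of x*f divided by sum of f."""
    total_xf = sum(x * f for x, f in zip(midpoints, frequencies))
    total_f = sum(frequencies)
    return total_xf / total_f

# Hypothetical classes 0-10, 10-20, 20-30 with midpoints 5, 15, 25
mean = grouped_mean([5, 15, 25], [2, 5, 3])
# Σxf = 10 + 75 + 75 = 160, Σf = 10, so the mean is 16
```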
23
New cards
Lower quartile
Is one-quarter of the way through the data set
24
New cards
Upper quartile
Is three-quarters of the way through the data set
25
New cards
Calculator
Menu
2
List 1 - Values
List 2 - Frequencies
F2 (CALC)
1VAR
26
New cards
Interpolation
Making predictions of the dependent variable within the range of the given data
27
New cards
Extrapolation
Making predictions of the dependent variable outside the range of the given values (not as accurate)
28
New cards
Range
The difference between the largest and smallest values in the data set
29
New cards
Interquartile range
The difference between the upper quartile and the lower quartile, Q₃ - Q₁
30
New cards
Interpercentile range
The difference between the values for two given percentiles
31
New cards
Variance
σ² = Σ(x - x̄)² / n
σ² = (Σx² / n) - (Σx / n)²

'the mean of the squares minus the square of the mean'
32
New cards
Standard deviation
Square root of the variance
σ = √(Σ(x - x̄)² / n)
σ = √((Σx² / n) - (Σx / n)²)
33
New cards
Variance (frequency table)
σ² = Σf(x - x̄)² / Σf = (Σfx² / Σf) - (Σfx / Σf)²
34
New cards
Standard deviation (frequency table)
σ = √(Σf(x - x̄)² / Σf) = √((Σfx² / Σf) - (Σfx / Σf)²)
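The two forms of the frequency-table formula give the same answer, which the Python sketch below checks on made-up data (midpoints 5, 15, 25 with frequencies 2, 5, 3). The function names are chosen for this example only.

```python
from math import sqrt

def variance_definition(xs, fs):
    """Variance as the sum of f(x - mean)^2 divided by the sum of f."""
    total_f = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / total_f
    return sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / total_f

def variance_shortcut(xs, fs):
    """Variance as mean of the squares minus the square of the mean."""
    total_f = sum(fs)
    mean_of_squares = sum(f * x ** 2 for x, f in zip(xs, fs)) / total_f
    mean = sum(f * x for x, f in zip(xs, fs)) / total_f
    return mean_of_squares - mean ** 2

xs, fs = [5, 15, 25], [2, 5, 3]
var_a = variance_definition(xs, fs)   # 490 / 10 = 49
var_b = variance_shortcut(xs, fs)     # 305 - 16^2 = 49
sd = sqrt(var_a)                      # 7
```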
35
New cards
Outlier
An extreme value that lies outside the overall pattern of the data

Greater than Q₃ : Q₃ + 1.5(Q₃ - Q₁)
Less than Q₁ : Q₁ - 1.5(Q₃ - Q₁)
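A small Python sketch of the outlier rule, assuming the quartiles have already been found. The quartile values and data points are invented for illustration.

```python
def outlier_bounds(q1, q3):
    """Return (lower, upper): values outside this interval count as outliers."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    lower, upper = outlier_bounds(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Hypothetical quartiles Q1 = 10 and Q3 = 18, so IQR = 8
lower, upper = outlier_bounds(10, 18)               # (-2.0, 30.0)
outliers = find_outliers([1, 12, 15, 31, -5], 10, 18)
```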
36
New cards
Keep Outlier
An outlier may reflect natural variation, in which case it is still a valid piece of data and should be kept

It may instead be the result of an error in measuring or recording the data, in which case it can be removed
37
New cards
Cleaning the data
Removing anomalies from a data set
38
New cards
Histogram
Can be used to represent grouped continuous data
- area of the bar is proportional to the frequency in each class
- Can be scaled
39
New cards
Histogram formulas
area of bar = k × frequency

frequency density = frequency / class width
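For example, frequency densities for unequal class widths could be worked out as follows in Python. The class boundaries and frequencies are made up.

```python
def frequency_densities(classes):
    """classes is a list of (lower bound, upper bound, frequency) tuples."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

# Hypothetical grouped data with class widths 10, 10 and 30
classes = [(0, 10, 5), (10, 20, 12), (20, 50, 15)]
densities = frequency_densities(classes)   # 5/10, 12/10, 15/30
```

Plotting frequency density as the bar height makes each bar's area equal to its frequency (k = 1 in the first formula).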
40
New cards
Frequency Polygon
Midpoint
Straight Line
41
New cards
Cumulative Frequency
Upper Limit
Curve
42
New cards
Histogram and Frequency Polygon
Join the middle of the top of each bar in the histogram to form a frequency polygon
43
New cards
Comparing data
Comment on:
- Interquartile range
(more/less spread out, so less/more consistent?)
- Median
(On average has a higher/lower ____)
- Outliers
- Positively/negatively skewed
44
New cards
Strong negative correlation

45
New cards
Weak negative correlation

46
New cards
Weak positive correlation

47
New cards
Strong positive correlation

48
New cards
Correlation
Describes the nature of the linear relationship between two variables
"With__outliers"
"The higher the ___ the higher/lower the ___ between ___ and ___"
49
New cards
Bivariate data
Data which has pairs of values for two variables
50
New cards
Regression line
The regression line of y on x is written in the form y = a + bx
y can be predicted from x
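The card does not say how a and b are found. A common approach is the least squares formulas b = Sxy / Sxx and a = ȳ - b x̄, which the Python sketch below assumes; the data is invented for illustration.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least squares line y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx           # gradient
    a = y_bar - b * x_bar     # y-intercept
    return a, b

# Made-up bivariate data
hours = [1, 2, 3, 4, 5]
marks = [3, 5, 7, 9, 11]
a, b = regression_line(hours, marks)   # a = 1, b = 2
predicted = a + b * 3.5                # 3.5 is inside the range of hours, so this is interpolation
```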
51
New cards
Regression line interpretation
y = a + bx
"If the (x in words) increases by 1 (unit on x-axis) then the (y in words) increases/decreases by (value of b, ignoring the sign) (unit on y-axis)"

"If the (x in words) is 0 (unit on x-axis) then the (y in words) is (value of a) (unit on y-axis)"
52
New cards
Dependent (response) Variable
Y-axis
Researcher measures variable
Predicted from the value on the x-axis
53
New cards
Independent (explanatory) Variable
X-axis
Researcher controls variable
54
New cards
Venn diagrams
Can be used to represent events graphically
- frequencies or probabilities can be placed in the regions of the Venn diagrams
55
New cards
Intersection
A & B (A ∩ B)
56
New cards
Union
A or B (A ∪ B)
57
New cards
Complement
Not A (A'): P(A') = 1 - P(A)
58
New cards
Mutually exclusive events
Both can't happen at the same time
P(A and B) = 0
P(A or B) = P(A) + P(B)
59
New cards
Independent events
When one event happens, it doesn't affect the probability of the other happening
P(A and B) = P(A) × P(B)
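The two tests (mutually exclusive and independent) can be checked numerically. A minimal Python sketch with made-up probabilities; the function names and the small tolerance for rounding error are assumptions of the example.

```python
from math import isclose

def are_mutually_exclusive(p_a_and_b):
    """Mutually exclusive when P(A and B) = 0."""
    return isclose(p_a_and_b, 0.0, abs_tol=1e-12)

def are_independent(p_a, p_b, p_a_and_b):
    """Independent when P(A and B) = P(A) * P(B)."""
    return isclose(p_a_and_b, p_a * p_b, abs_tol=1e-12)

# Hypothetical values: P(A) = 0.5, P(B) = 0.4, P(A and B) = 0.2
independent = are_independent(0.5, 0.4, 0.2)      # 0.5 * 0.4 = 0.2, so independent
exclusive = are_mutually_exclusive(0.2)           # P(A and B) is not 0, so not exclusive
```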
60
New cards
Random variable
A variable whose value depends on the outcome of a random event
61
New cards
Probability distribution
Shows all the values of a random variable (x) and their probabilities
62
New cards
Probability mass function
P(X = x)
63
New cards
Interval Length Equation
Number of items in the population ÷ sample size
64
New cards
Cluster Sampling
Split the population into clusters. Select a set amount of these clusters at random then take a simple random sample from each of these clusters
65
New cards
Cluster Sampling Adv & Disadv
Adv
-Easy to carry out
-Inexpensive
Disadv
-Bias
-Members of the population aren't equally likely to be selected, as the probability depends on cluster size (larger cluster, less likely)
-Population must be divided into clusters which can be costly
-Increasing scope of study increases clusters which adds time and expense
66
New cards
Box Plot
Median
LQ
UQ
Lowest value that isn't an outlier
Highest value that isn't an outlier
Outlier (x)
Skew
67
New cards
Discrete Data
Data that takes values which change in steps (e.g. shoe size)
68
New cards
Random Variable
Variable whose value is determined by chance
69
New cards
Binomial Distribution (Conditions)
1. Binary? Trials can be classified as success/failure
2. Independent? Trials must be independent.
3. Number? The number of trials (n) must be fixed in advance
4. Success? The probability of success (p) must be the same for each trial.
70
New cards
Binomial Probability Formula
P(X = x) = (nCx) p^x (1-p)^(n-x)
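The formula translates directly into Python using math.comb for nCx. The values n = 10, p = 0.5, x = 3 are chosen only as an example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# 10C3 = 120, so P(X = 3) = 120 * 0.5^10 = 0.1171875
prob = binomial_pmf(3, 10, 0.5)

# The probabilities over x = 0, 1, ..., n should add up to 1
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))
```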
71
New cards
Distribution of X
X~B(n,p)
p = probability of success
n = number of trials
72
New cards
Binomial mean
np
n = number of trials
p = probability of success
73
New cards
binomial standard deviation
square root of np(1-p)
74
New cards
Binomial variance
np(1-p)
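A quick Python illustration of the binomial mean, variance and standard deviation, using made-up values n = 20 and p = 0.25.

```python
from math import sqrt

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) for X ~ B(n, p)."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, sqrt(variance)

# mean = 20 * 0.25 = 5, variance = 5 * 0.75 = 3.75, sd = sqrt(3.75)
mean, variance, sd = binomial_summary(20, 0.25)
```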
75
New cards
Null Hypothesis (H0)
Hypothesis you assume to be correct
(H0 : p = ___)
76
New cards
Alternative hypothesis (H1) One tailed test
Tells you about the parameter if your assumption is shown to be wrong
(H1 : p < ___ or H1 : p > ___)
77
New cards
Reject null hypothesis
To carry out a hypothesis test, you assume the null hypothesis is true and calculate the probability of getting the observed result or a more extreme one. If this probability is less than the significance level, you reject the null hypothesis
78
New cards
significance level
Probability threshold
Usually 10%, 5% or 1%
79
New cards
critical region
the area in the tails of the comparison distribution in which the null hypothesis can be rejected
Found by working out how extreme the value must be before its probability falls below the significance level
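As a sketch, the lower-tail critical region for a binomial test can be found by adding up probabilities until the total is no longer below the significance level. The strict "less than" rule follows the cards; the function names and the values n = 10, p = 0.5, 5% level are assumptions made for this example.

```python
from math import comb

def binomial_cdf(c, n, p):
    """P(X <= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1))

def lower_critical_value(n, p, alpha):
    """Largest c with P(X <= c) < alpha, or None if there is no such c."""
    critical = None
    for c in range(n + 1):
        if binomial_cdf(c, n, p) < alpha:
            critical = c          # still inside the critical region
        else:
            break                 # cumulative probability has reached alpha
    return critical

# H0: p = 0.5, H1: p < 0.5, n = 10, 5% significance level
# P(X <= 1) = 11/1024 (about 0.0107), P(X <= 2) = 56/1024 (about 0.0547)
c = lower_critical_value(10, 0.5, 0.05)   # critical region is X <= 1
```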
80
New cards
Acceptance region
The region where we accept the null hypothesis
81
New cards
Test the claim
1. Define X
2. X~B(n,p)
3. State H0 and H1
4. Find P(X
82
New cards
Test the claim (Two tailed test)
1. Define X
2. X~B(n,p)
3. State H0 and H1
4. Find where the bias is (pn) > x/