82 Terms

1
Population
The whole set of items that are of interest
2
Census
Observes or measures every member of a population
3
Sample
A selection of observations taken from a subset of the population which is used to find out information about the population as a whole
4
Census - Adv & Disadv

Adv

  • Completely accurate

Disadv

  • Time consuming & expensive

  • Cannot be used when the testing process destroys the item

  • Hard to process large quantity of data

5
Sample - Adv & Disadv

Adv

  • Less time consuming & less expensive than a census

  • Fewer people have to respond

  • Less data to process than in a census

Disadv

  • Data may not be as accurate

  • May not be large enough to give information about small subgroups of the population

6
Sampling units
Individual units of a population
7
Sampling frame
Sampling units of a population individually named or numbered to form a list
8
Simple random sampling
Number every item in the list, starting at 001 and ending at the population size
Select x random numbers using random number generator
Ignore repeats
Continue until you have x numbers
Select corresponding items from the data sheet
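The steps above can be sketched in Python as follows. This is a minimal illustration rather than part of the original cards: the function name `simple_random_sample`, the optional `seed` parameter and the 100-item frame are assumptions made for the example.

```python
import random

def simple_random_sample(sampling_frame, x, seed=None):
    """Return x distinct items chosen by random item number, ignoring repeats."""
    if x > len(sampling_frame):
        raise ValueError("sample size cannot exceed the population size")
    rng = random.Random(seed)  # the seed only makes the run repeatable
    chosen = []
    while len(chosen) < x:  # continue until you have x numbers
        number = rng.randint(1, len(sampling_frame))  # item numbers run from 1 to N
        if number not in chosen:  # ignore repeats
            chosen.append(number)
    # select the corresponding items (item number k sits at list index k - 1)
    return [sampling_frame[k - 1] for k in chosen]

# hypothetical sampling frame of 100 numbered items
frame = [f"item {i:03d}" for i in range(1, 101)]
sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```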
9
Systematic sampling
The required elements are chosen at regular intervals from an ordered list
10
Stratified sampling
The population is divided into mutually exclusive strata and a random sample is taken from each
- The proportion of each stratum sampled should be the same
11
Stratified sampling formula
The number sampled in a stratum = (number in stratum / number in population) × overall sample size
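A short worked example of the formula, as a sketch. The strata names and sizes below are made up for illustration, and rounding to a whole number of people is an assumption the card does not state.

```python
def stratum_sample_size(number_in_stratum, number_in_population, overall_sample_size):
    """(number in stratum / number in population) x overall sample size."""
    return number_in_stratum * overall_sample_size / number_in_population

# hypothetical population split into two strata
strata = {"Year 12": 120, "Year 13": 80}
population = sum(strata.values())  # 200

# sample 50 people overall, rounding each stratum to a whole number
sizes = {name: round(stratum_sample_size(n, population, 50)) for name, n in strata.items()}
print(sizes)  # {'Year 12': 30, 'Year 13': 20}
```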
12
Simple random sampling - Adv & Disadv

Adv

  • Free of bias

  • Easy & cheap to implement for small populations and small samples

  • Each sampling unit has a known and equal chance of selection

Disadv

  • Not suitable when the population size or the sample size is large

  • A sampling frame is needed

13
Systematic sampling - Adv & Disadv

Adv

  • Simple and quick to use

  • Suitable for large samples and large populations

Disadv

  • A sampling frame is needed

  • It can introduce bias if the sampling frame is not random

14
Stratified sampling - Adv & Disadv

Adv

  • Sample accurately reflects the population structure

  • Guarantees proportional representation of groups within a population

Disadv

  • Population must be clearly classified into distinct strata

  • Not suitable when the population size or the sample size is large

  • A sampling frame is needed

15
Quota sampling
How many members of each group you wish to sample is decided in advance and opportunity sampling is used until you have a large enough sample for each group
16
Opportunity sampling
Consists of taking the sample from people who are available at the time the study is carried out and who fit the criteria you are looking for
17
Quantitative variable
Data associated with numerical observations
18
Qualitative variable
Data associated with non-numerical observations
19
Mode / Modal class
-Qualitative and quantitative data
-The value or class that occurs most often
-Not informative if each value occurs once
20
Median (Q2)
-((n+1)/2)th term
-The middle value when the data values are put in order
-Quantitative data
-Not affected by extreme values
21
Mean (x̄)
-Average of values
-Quantitative data
-Uses all data
-Affected by extreme values

x̄ = Σx / n
22
Mean (frequency table)
x̄ = Σxf / Σf
x = midpoint of each class interval (for grouped data)
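As a sketch of the formula for grouped data, the example below uses the midpoint of each class as x. The class boundaries and frequencies are invented for illustration.

```python
def grouped_mean(classes, frequencies):
    """Estimate the mean as sum(x * f) / sum(f), with x the class midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    return sum(x * f for x, f in zip(midpoints, frequencies)) / sum(frequencies)

# hypothetical grouped data: (lower, upper) class boundaries and their frequencies
classes = [(0, 10), (10, 20), (20, 30)]
frequencies = [2, 5, 3]

mean = grouped_mean(classes, frequencies)
print(mean)  # (5*2 + 15*5 + 25*3) / 10 = 16.0
```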
23
Lower quartile
Is one-quarter of the way through the data set
24
Upper quartile
Is three-quarters of the way through the data set
25
Calculator
Menu
2
List 1 - Values
List 2 - Frequencies
F2 (CALC)
1VAR
26
Interpolation
Making predictions of the dependent variable within the range of the given data
27
Extrapolation
Making predictions of the dependent variable outside the range of the given values (not as accurate)
28
Range
The difference between the largest and smallest values in the data set
29
Interquartile range
The difference between the upper quartile and the lower quartile, Q₃ - Q₁
30
Interpercentile range
The difference between the values for two given percentiles
31
Variance
σ² = Σ(x - x̄)² / n
σ² = (Σx² / n) - (Σx / n)²

'the mean of the squares minus the square of the mean'
32
Standard deviation
Square root of the variance
σ = √(Σ(x - x̄)² / n)
σ = √((Σx² / n) - (Σx / n)²)
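The two forms of the variance formula give the same answer, which the sketch below checks on a small made-up data set. The function names are illustrative only.

```python
import math

def variance_definition(xs):
    """Mean of the squared deviations from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def variance_shortcut(xs):
    """Mean of the squares minus the square of the mean."""
    n = len(xs)
    return sum(x ** 2 for x in xs) / n - (sum(xs) / n) ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

var_a = variance_definition(data)
var_b = variance_shortcut(data)
sd = math.sqrt(var_a)  # standard deviation is the square root of the variance
print(var_a, var_b, sd)  # 4.0 4.0 2.0
```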
33
Variance (frequency table)
σ² = Σf(x - x̄)² / Σf = (Σfx² / Σf) - (Σfx / Σf)²
34
Standard deviation (frequency table)
σ = √(Σf(x - x̄)² / Σf) = √((Σfx² / Σf) - (Σfx / Σf)²)
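A minimal sketch of the frequency-table versions, using invented values and frequencies. It evaluates the shortcut form (Σfx² / Σf) - (Σfx / Σf)² and takes the square root for the standard deviation.

```python
import math

def frequency_table_variance(values, frequencies):
    """(sum of f*x^2 / sum of f) minus (sum of f*x / sum of f) squared."""
    total_f = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(values, frequencies))
    sum_fx2 = sum(f * x ** 2 for x, f in zip(values, frequencies))
    return sum_fx2 / total_f - (sum_fx / total_f) ** 2

# hypothetical frequency table
values = [1, 2, 3]
frequencies = [2, 5, 3]

table_variance = frequency_table_variance(values, frequencies)
table_sd = math.sqrt(table_variance)
print(round(table_variance, 4), round(table_sd, 4))  # 0.49 0.7
```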
35
Outlier
An extreme value that lies outside the overall pattern of the data

Greater than Q₃ + 1.5(Q₃ - Q₁)
Less than Q₁ - 1.5(Q₃ - Q₁)
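The two limits can be applied as in the sketch below. The quartiles are passed in directly, since the cards do not fix a method for computing them, and the data values are made up.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) using 1.5 x the interquartile range."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Values below the lower limit or above the upper limit."""
    low, high = outlier_limits(q1, q3)
    return [x for x in data if x < low or x > high]

# hypothetical data with quartiles assumed to be Q1 = 10 and Q3 = 20
limits = outlier_limits(10, 20)
outliers = find_outliers([3, 12, 15, 18, 40], q1=10, q3=20)
print(limits, outliers)  # (-5.0, 35.0) [40]
```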
36
Keep Outlier
An outlier may reflect natural variation, in which case it is still a valid piece of data and should be kept

It may instead be the result of an error in measuring or recording the data
37
Cleaning the data
Removing anomalies from a data set
38
Histogram

Can be used to represent grouped continuous data

  • The area of each bar is proportional to the frequency of its class

  • Can be scaled

39
Histogram formulas
area of bar = k × frequency

frequency density = frequency / class width
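A small sketch of the frequency density calculation, with invented classes of unequal width. Multiplying each density by its class width recovers the frequency, which corresponds to k = 1 in the area formula.

```python
def frequency_densities(classes, frequencies):
    """frequency / class width for each (lower, upper) class."""
    return [f / (upper - lower) for (lower, upper), f in zip(classes, frequencies)]

# hypothetical grouped continuous data
classes = [(0, 10), (10, 15), (15, 30)]
frequencies = [20, 15, 30]

densities = frequency_densities(classes, frequencies)
# bar area = frequency density x class width
areas = [d * (upper - lower) for d, (lower, upper) in zip(densities, classes)]
print(densities)  # [2.0, 3.0, 2.0]
print(areas)      # [20.0, 15.0, 30.0]
```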
40
Frequency Polygon
Plot frequency against the midpoint of each class
Join the points with straight lines
41
Cumulative Frequency
Plot cumulative frequency against the upper limit of each class
Join the points with a smooth curve
42
Histogram and Frequency Polygon
Join the middle of the top of each bar in the histogram to form a frequency polygon
43
Comparing data

Comment on:

  • Interquartile range (spread: the data set with the smaller IQR is more consistent)

  • Median (on average, has a higher/lower ____)

  • Outliers

  • Skew (positively/negatively skewed)

44
Strong negative correlation

45
Weak negative correlation

46
Weak positive correlation

47
Strong positive correlation

48
Correlation
Describes the nature of the linear relationship between two variables
"With ___ outliers"
"The higher the ___, the higher/lower the ___ between ___ and ___"
49
Bivariate data
Data which has pairs of values for two variables
50
Regression line
The regression line of y on x is written in the form y = a + bx
y can be predicted from x
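The card gives only the form y = a + bx. As a sketch, the coefficients can be found with the standard least-squares formulas b = Sxy / Sxx and a = ȳ - b x̄, which are an addition here and not taken from the cards. The data points are made up.

```python
def regression_line(xs, ys):
    """Least-squares line of y on x; returns (a, b) for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b

# hypothetical bivariate data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

a, b = regression_line(xs, ys)
prediction = a + b * 2.5  # x = 2.5 is inside the data range, so this is interpolation
print(a, b, prediction)   # 1.0 2.0 6.0
```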
51
Regression line interpretation
y = a + bx
"If (x in words) increases by 1 (unit on x-axis), then (y in words) increases/decreases by (value of b, ignoring sign) (unit on y-axis)"

"If (x in words) is 0 (unit on x-axis), then (y in words) is (value of a) (unit on y-axis)"
52
Dependent (response) Variable
Y-axis
Researcher measures variable
Found from x-axis
53
Independent (explanatory) Variable
X-axis
Researcher controls variable
54
Venn diagrams
Can be used to represent events graphically
- frequencies or probabilities can be placed in the regions of the Venn diagrams
55
Intersection
A & B (A ∩ B)
56
Union
A or B (A ∪ B)
57
Complement
A' (not A): P(A') = 1 - P(A)
58
Mutually exclusive events
Both can't happen at the same time
P(A and B) = 0
P(A or B) = P(A) + P(B)
59
Independent events
When one event happens, it doesn't affect the probability of the other happening
P(A and B) = P(A) × P(B)
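The conditions for mutually exclusive and independent events can be checked on a small sample space, as in this sketch. The fair six-sided die and the event names are assumptions chosen for illustration, and outcomes are treated as equally likely.

```python
from fractions import Fraction

def prob(event, sample_space):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def mutually_exclusive(a, b):
    """True when A and B share no outcomes, so P(A and B) = 0."""
    return len(a & b) == 0

def independent(a, b, sample_space):
    """True when P(A and B) = P(A) x P(B)."""
    return prob(a & b, sample_space) == prob(a, sample_space) * prob(b, sample_space)

die = {1, 2, 3, 4, 5, 6}  # hypothetical fair die
even, odd, low = {2, 4, 6}, {1, 3, 5}, {1, 2}

print(mutually_exclusive(even, odd))  # True: a roll cannot be both even and odd
print(independent(even, low, die))    # True: 1/6 = 1/2 x 1/3
print(independent(even, odd, die))    # False: 0 is not 1/2 x 1/2
```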
60
Random variable
A variable whose value depends on the outcome of a random event
61
Probability distribution
Shows all the values of a variable (x) and their probabilities
62
Probability mass function
P(X = x)
63
Interval Length Equation
Number of items in the population ÷ sample size (the interval used in systematic sampling)
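A sketch of how the interval length is used in systematic sampling. Choosing a random starting point within the first interval is an assumption (a common convention that the cards do not state), and the function name and population are illustrative.

```python
import random

def systematic_sample(ordered_list, sample_size, seed=None):
    """Take every k-th item, where k = population size // sample size."""
    rng = random.Random(seed)  # the seed only makes the run repeatable
    interval = len(ordered_list) // sample_size
    start = rng.randint(1, interval)  # random starting item within the first interval
    return [ordered_list[start - 1 + i * interval] for i in range(sample_size)]

# hypothetical ordered population numbered 1 to 100, sample of 10, so interval = 10
population = list(range(1, 101))
picked = systematic_sample(population, 10, seed=3)
print(picked)
```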
64
Cluster Sampling
Split the population into clusters. Select a set number of these clusters at random, then take a simple random sample from each of these clusters
65
Cluster Sampling Adv & Disadv
Adv
-Easy to carry out
-Inexpensive
Disadv
-Can introduce bias
-Members of the population aren't equally likely to be selected, as the probability depends on cluster size (larger cluster: less likely)
-Population must be divided into clusters which can be costly
-Increasing scope of study increases clusters which adds time and expense
66
Box Plot
Median
LQ
UQ
Lowest value that isn't an outlier
Highest value that isn't an outlier
Outlier (x)
Skew
67
Discrete Data
Data that takes values which change in steps (e.g. shoe size)
68
Random Variable
Variable whose value is determined by chance
69
Binomial Distribution (Conditions)
  1. Binary? Trials can be classified as success/failure

  2. Independent? Trials must be independent.

  3. Number? The number of trials (n) must be fixed in advance

  4. Success? The probability of success (p) must be the same for each trial.

70
Binomial Probability Formula
P(X = x) = (nCx) × p^x × (1 - p)^(n - x)
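The formula can be evaluated directly, as in this sketch. `math.comb` supplies nCx; the function name `binomial_pmf` and the example values are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# hypothetical example: exactly 2 successes in 5 trials with p = 0.5
prob_two = binomial_pmf(2, 5, 0.5)
print(prob_two)  # 10 * 0.25 * 0.125 = 0.3125

# the probabilities over x = 0..n should add up to 1
total = sum(binomial_pmf(x, 5, 0.3) for x in range(6))
```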
71
Distribution of X
X ~ B(n, p)
p = probability of success
n = number of trials
72
Binomial mean
np
n = number of trials
p = probability of success
73
Binomial standard deviation
√(np(1 - p))
74
Binomial variance
np(1-p)
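The binomial mean, variance and standard deviation can be computed together, as in this small sketch. The values n = 20 and p = 0.25 are made up for illustration.

```python
import math

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) for X ~ B(n, p)."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, math.sqrt(variance)

# hypothetical example: 20 trials, probability of success 0.25
mean, variance, sd = binomial_summary(20, 0.25)
print(mean, variance)  # 5.0 3.75
```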
75
Null Hypothesis (H0)
Hypothesis you assume to be correct
(H0 : p = ___)
76
Alternative hypothesis (H1) One tailed test
Tells you about the parameter if your assumption is shown to be wrong
(H1 : p < ___ or H1 : p > ___)
77
Reject null hypothesis
To carry out a hypothesis test, assume the null hypothesis is true and find the probability of a result at least as extreme as the one observed. If this probability is less than the significance level, reject the null hypothesis
78
significance level
Probability threshold
Usually 10%, 5% or 1%
79
critical region
The area in the tails of the comparison distribution in which the null hypothesis can be rejected
i.e. how extreme the observed value must be before its probability falls below the significance level
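For a binomial test with a lower-tail alternative, the critical region can be found by searching for the largest count whose cumulative probability is still below the significance level. This is a sketch under that assumption; the function names and the B(20, 0.5) example are illustrative.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

def lower_critical_value(n, p0, significance):
    """Largest c with P(X <= c) < significance, or None if there is none."""
    critical = None
    for c in range(n + 1):
        if binomial_cdf(c, n, p0) < significance:
            critical = c
        else:
            break  # the cumulative probability only grows from here
    return critical

# hypothetical test: X ~ B(20, 0.5) under H0, 5% significance level
c = lower_critical_value(20, 0.5, 0.05)
print(c)  # 5, so the critical region is X <= 5
```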
80
Acceptance region
The region where we accept the null hypothesis
81
Test the claim
1. Define X
2. X ~ B(n, p)
3. State H0 and H1
4. Find P(X ≤ x) or P(X ≥ x), depending on H1, and compare it with the significance level
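The steps can be sketched for a one-tailed test. This example assumes a lower-tail alternative (H1: p < p0), so the probability found is P(X ≤ x); the numbers and function names are made up for illustration.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

def lower_tail_test(x, n, p0, significance):
    """Test H0: p = p0 against H1: p < p0 for an observed count x."""
    p_value = binomial_cdf(x, n, p0)  # computed assuming H0 is true
    return p_value, p_value < significance

# hypothetical data: 5 successes observed in 20 trials, H0: p = 0.5, 5% level
p_value, reject = lower_tail_test(x=5, n=20, p0=0.5, significance=0.05)
print(round(p_value, 4), reject)  # 0.0207 True
```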
82
Test the claim (Two tailed test)
1. Define X
2. X ~ B(n, p)
3. State H0 and H1
4. Find which tail the observed value x lies in by comparing it with the expected value np (np > x or np < x)
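A sketch of a two-tailed version. It compares the observed count with np to pick the tail, then compares that tail probability with half the significance level. Halving the significance level is a common convention and an assumption here, since the card does not state it; the numbers are made up.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def two_tailed_test(x, n, p0, significance):
    """Test H0: p = p0 against H1: p != p0 for an observed count x."""
    expected = n * p0
    if x < expected:
        tail = sum(binomial_pmf(k, n, p0) for k in range(0, x + 1))  # P(X <= x)
    else:
        tail = sum(binomial_pmf(k, n, p0) for k in range(x, n + 1))  # P(X >= x)
    return tail, tail < significance / 2  # half the level in each tail

# hypothetical data: 15 successes in 20 trials, H0: p = 0.5, 5% level
tail, reject = two_tailed_test(x=15, n=20, p0=0.5, significance=0.05)
print(round(tail, 4), reject)  # 0.0207 True
```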