stats

Description and Tags

Statistics

A-Level Statistics

Edexcel

12th

Last updated 2:11 AM on 2/16/23

82 Terms

New cards

Population

The whole set of items that are of interest

New cards

Census

Observes or measures every member of a population

New cards

Sample

A selection of observations taken from a subset of the population which is used to find out information about the population as a whole

New cards

Census - Adv & Disadv

Adv
- Completely accurate

Disadv
- Time consuming & expensive
- Cannot be used when the testing process destroys the item
- Hard to process large quantity of data

New cards

Sample - Adv & Disadv

Adv
- Less time consuming & less expensive than a census
- Fewer people have to respond
- Less data to process than in a census

Disadv
- Data may not be as accurate
- May not be large enough to give information about small subgroups of the population

New cards

Sampling units

Individual units of a population

New cards

Sampling frame

Sampling units of a population individually named or numbered to form a list

New cards

Simple random sampling

Number the list from 001 to ______
Select x random numbers using random number generator
Ignore repeats
Continue until you have x numbers
Select corresponding items from the data sheet
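
The steps above can be sketched in Python. This is only an illustration: the function name `simple_random_sample` and the optional `seed` argument are made up for the example, and Python's `random` module stands in for a calculator's random number generator.

```python
import random


def simple_random_sample(frame, x, seed=None):
    """Pick x distinct items from a numbered sampling frame (a list)."""
    if x > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # seed is optional, only for repeatable examples
    chosen = []  # positions already selected, numbered from 1
    while len(chosen) < x:  # continue until we have x numbers
        number = rng.randint(1, len(frame))  # random number from 1 to N inclusive
        if number in chosen:
            continue  # ignore repeats
        chosen.append(number)
    # select the corresponding items from the list
    return [frame[number - 1] for number in chosen]


students = [f"student {i:03d}" for i in range(1, 101)]
print(simple_random_sample(students, 5, seed=1))
```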

New cards

Systematic sampling

The required elements are chosen at regular intervals from an ordered list

New cards

Stratified sampling

The population is divided into mutually exclusive strata and a random sample is taken from each
- The proportion of each stratum sampled should be the same

New cards

Stratified sampling formula

The number sampled in a stratum = (number in stratum / number in population) × overall sample size
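
A short worked example of the formula, as a sketch. The function name and the year-group figures are invented for illustration; in practice each result is rounded to a whole number.

```python
def stratum_sample_sizes(stratum_counts, sample_size):
    """Apply (number in stratum / number in population) x overall sample size."""
    population = sum(stratum_counts.values())
    return {
        name: count * sample_size / population
        for name, count in stratum_counts.items()
    }


# Hypothetical school: 120 students in Year 12, 80 in Year 13, sample of 50
sizes = stratum_sample_sizes({"Year 12": 120, "Year 13": 80}, 50)
print(sizes)  # {'Year 12': 30.0, 'Year 13': 20.0}
```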

New cards

Simple random sampling - Adv & Disadv

Adv
- Free of bias
- Easy & cheap to implement for small populations and small samples
- Each sampling unit has a known and equal chance of selection

Disadv
- Not suitable when the population size or the sample size is large
- A sampling frame is needed

New cards

Systematic sampling - Adv & Disadv

Adv
- Simple and quick to use
- Suitable for large samples and large populations

Disadv
- A sampling frame is needed
- It can introduce bias if the sampling frame is not random

New cards

Stratified sampling - Adv & Disadv

Adv
- Sample accurately reflects the population structure
- Guarantees proportional representation of groups within a population

Disadv
- Population must be clearly classified into distinct strata
- Not suitable when the population size or the sample size is large
- A sampling frame is needed

New cards

Quota sampling

How many members of each group you wish to sample is decided in advance and opportunity sampling is used until you have a large enough sample for each group

New cards

Opportunity sampling

Consists of taking the sample from people who are available at the time the study is carried out and who fit the criteria you are looking for

New cards

Quantitative variable

Data associated with numerical observations

New cards

Qualitative variable

Data associated with non-numerical observations

New cards

Mode / Modal class

-Qualitative and quantitative data
-The value or class that occurs most often
-Not informative if each value occurs once

New cards

Median (Q2)

-((n+1)/2)th term
-The middle value when the data values are put in order
-Quantitative data
-Not affected by extreme values

New cards

Mean (x̄)

-Average of values
-Quantitative data
-Uses all data
-Affected by extreme values

x̄ = Σx / n

New cards

Mean (frequency table)

x̄ = Σxf / Σf
x = midpoint of each class interval
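
As a quick check of the formula, here is a minimal sketch with made-up class midpoints and frequencies (the function name is also just for illustration).

```python
def grouped_mean(midpoints, frequencies):
    """Estimate the mean from a frequency table: sum of xf divided by sum of f."""
    total_xf = sum(x * f for x, f in zip(midpoints, frequencies))
    return total_xf / sum(frequencies)


# Classes 0-10, 10-20, 20-30 have midpoints 5, 15, 25
print(grouped_mean([5, 15, 25], [2, 5, 3]))  # (10 + 75 + 75) / 10 = 16.0
```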

New cards

Lower quartile

Is one-quarter of the way through the data set

New cards

Upper quartile

Is three-quarters of the way through the data set

New cards

Calculator

Menu
2
List 1 - Values
List 2 - Frequencies
F2 (CALC)
1VAR

New cards

Interpolation

Making predictions of the dependent variable within the range of the given data

New cards

Extrapolation

Making predictions of the dependent variable outside the range of the given values (not as reliable)

New cards

Range

The difference between the largest and smallest values in the data set

New cards

Interquartile range

The difference between the upper quartile and the lower quartile, Q₃ - Q₁

New cards

Interpercentile range

The difference between the values for two given percentiles

New cards

Variance

σ² = Σ(x - x̄)² / n
σ² = (Σx² / n) - (Σx / n)²

'the mean of the squares minus the square of the mean'

New cards

Standard deviation

Square root of the variance
σ = √(Σ(x - x̄)² / n)
σ = √((Σx² / n) - (Σx / n)²)

New cards

Variance (frequency table)

σ² = Σf(x - x̄)² / Σf = (Σfx² / Σf) - (Σfx / Σf)²

New cards

Standard deviation (frequency table)

σ = √(Σf(x - x̄)² / Σf) = √((Σfx² / Σf) - (Σfx / Σf)²)
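
The frequency-table formulas can be checked with a small sketch using 'the mean of the squares minus the square of the mean'. The function names and the data are invented for illustration.

```python
import math


def grouped_variance(values, frequencies):
    """Variance from a frequency table: (sum fx^2 / sum f) - (sum fx / sum f)^2."""
    n = sum(frequencies)
    mean_of_squares = sum(f * x * x for x, f in zip(values, frequencies)) / n
    mean = sum(f * x for x, f in zip(values, frequencies)) / n
    return mean_of_squares - mean ** 2


def grouped_standard_deviation(values, frequencies):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(grouped_variance(values, frequencies))


# Midpoints 5, 15, 25 with frequencies 2, 5, 3:
# sum fx^2 = 3050, sum fx = 160, sum f = 10, so variance = 305 - 16^2 = 49
print(grouped_variance([5, 15, 25], [2, 5, 3]))            # 49.0
print(grouped_standard_deviation([5, 15, 25], [2, 5, 3]))  # 7.0
```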

New cards

Outlier

An extreme value that lies outside the overall pattern of the data

Upper outlier: greater than Q₃ + 1.5(Q₃ - Q₁)
Lower outlier: less than Q₁ - 1.5(Q₃ - Q₁)
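
A minimal sketch of the outlier limits, assuming the quartiles have already been found. The function names and the data set are made up for the example.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) using 1.5 x the interquartile range."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """List the values that lie outside the limits."""
    lower, upper = outlier_limits(q1, q3)
    return [x for x in data if x < lower or x > upper]


# With Q1 = 10 and Q3 = 20 the IQR is 10, so the limits are -5 and 35
print(outlier_limits(10, 20))                             # (-5.0, 35.0)
print(find_outliers([1, 12, 14, 15, 18, 40], 10, 20))     # [40]
```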

New cards

Keep Outlier

An outlier may indicate natural variation, in which case it is still a valid piece of data and should be kept

It may instead be the result of an error in measuring or recording the data

New cards

Cleaning the data

Removing anomalies from a data set

New cards

Histogram

Can be used to represent grouped continuous data
- area of the bar is proportional to the frequency in each class
- Can be scaled

New cards

Histogram formulas

area of bar = k × frequency

frequency density = frequency / class width
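
A one-line illustration of the frequency density formula, using invented figures. With this choice of bar height, the area of each bar (height × class width) equals the frequency, so k = 1.

```python
def frequency_density(frequency, class_width):
    """Height of a histogram bar: frequency divided by class width."""
    return frequency / class_width


# A class 10-15 (width 5) containing 20 items
height = frequency_density(20, 5)
print(height)      # 4.0
print(height * 5)  # area of the bar = 20.0, the frequency
```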

New cards

Frequency Polygon

Midpoint
Straight Line

New cards

Cumulative Frequency

Upper Limit
Curve

New cards

Histogram and Frequency Polygon

Join the middle of the top of each bar in the histogram to form a frequency polygon

New cards

Comparing data

Comment on:
- Interquartile range
(less/more precise?)
- Median
(On average has a higher/lower ____)
- Outliers
- Positively/Negatively skewed

New cards

Strong negative correlation

New cards

Weak negative correlation

New cards

Weak positive correlation

New cards

Strong positive correlation

New cards

Correlation

Describes the nature of the linear relationship between two variables
"With ___ outliers"
"The higher the ___, the higher/lower the ___ between ___ and ___"

New cards

Bivariate data

Data which has pairs of values for two variables

New cards

Regression line

The regression line of y on x is written in the form y = a + bx
Y can be predicted from X

New cards

Regression line interpretation

y = a + bx
"If (x in words) increases by 1 (unit on x-axis), then (y in words) increases/decreases by (value of b, ignoring the sign) (unit on y-axis)"

"If (x in words) is 0 (unit on x-axis), then (y in words) is (value of a) (unit on y-axis)"

New cards

Dependent (response) Variable

Y-axis
Researcher measures variable
Found from x-axis

New cards

Independent (explanatory) Variable

X-axis
Researcher controls variable

New cards

Venn diagrams

Can be used to represent events graphically
- frequencies or probabilities can be placed in the regions of the Venn diagrams

New cards

Intersection

A & B (A ∩ B)

New cards

Union

A or B (A ∪ B)

New cards

Complement

P(not A) = 1 - P(A), written A'

New cards

Mutually exclusive events

Both can't happen at the same time
P(A and B) = 0
P(A or B) = P(A) + P(B)

New cards

Independent events

When one event happens, it doesn't affect the probability of the other happening
P(A and B) = P(A) × P(B)
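
The multiplication rule gives a simple test for independence. The sketch below uses made-up probabilities and a hypothetical function name; `math.isclose` is used only to avoid floating-point rounding problems.

```python
import math


def are_independent(p_a, p_b, p_a_and_b):
    """Events are independent when P(A and B) equals P(A) x P(B)."""
    return math.isclose(p_a_and_b, p_a * p_b)


print(are_independent(0.5, 0.4, 0.2))  # True, since 0.5 x 0.4 = 0.2
print(are_independent(0.5, 0.4, 0.3))  # False
```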

New cards

Random variable

A variable whose value depends on the outcome of a random event

New cards

Probability distribution

Shows all the values of a random variable (x) and their probabilities

New cards

Probability mass function

P(X = x)

New cards

Interval Length Equation

Number of items in the population ÷ sample size
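
As a rough sketch, the interval length can be used to take a systematic sample. The function name, the rounding down of the interval and the random starting point within the first interval are assumptions for illustration and are not stated on the card.

```python
import random


def systematic_sample(ordered_list, sample_size, seed=None):
    """Take every k-th item, where k = population size // sample size."""
    if not 0 < sample_size <= len(ordered_list):
        raise ValueError("sample size must be between 1 and the population size")
    k = len(ordered_list) // sample_size  # interval length, rounded down
    start = random.Random(seed).randrange(k)  # random start in the first interval
    return [ordered_list[start + i * k] for i in range(sample_size)]


# 100 items and a sample of 20 gives an interval length of 5
print(systematic_sample(list(range(1, 101)), 20, seed=3))
```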

New cards

Cluster Sampling

Split the population into clusters. Select a set amount of these clusters at random then take a simple random sample from each of these clusters

New cards

Cluster Sampling Adv & Disadv

Adv
- Easy to carry out
- Inexpensive
Disadv
- Can introduce bias
- Members of the population aren't equally likely to be selected, as the probability depends on cluster size (larger cluster, less likely)
- Population must be divided into clusters, which can be costly
- Increasing the scope of the study increases the number of clusters, which adds time and expense

New cards

Box Plot

Median
LQ
UQ
Lowest value that isn't an outlier
Highest value that isn't an outlier
Outlier (x)
Skew

New cards

Discrete Data

Data that takes values which change in steps (e.g. shoe size)

New cards

Random Variable

Variable whose value is determined by chance

New cards

Binomial Distribution (Conditions)

1. Binary? Trials can be classified as success/failure
2. Independent? Trials must be independent.
3. Number? The number of trials (n) must be fixed in advance
4. Success? The probability of success (p) must be the same for each trial.

New cards

Binomial Probability Formula

P(X = x) = (nCx) × p^x × (1 - p)^(n - x)
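
A minimal sketch of the formula in Python, using `math.comb` for nCx. The function name and the coin example are invented for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Probability of exactly 3 heads in 10 tosses of a fair coin: 120 / 1024
print(binomial_pmf(3, 10, 0.5))  # 0.1171875
```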

New cards

Distribution of X

X ~ B(n, p)
p = probability of success
n = number of trials

New cards

Binomial mean

np
n = number of trials
p = probability of success

New cards

binomial standard deviation

square root of np(1-p)

New cards

Binomial variance

np(1-p)
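
The three binomial summary formulas (mean, variance, standard deviation) can be checked together. This is a sketch with a hypothetical function name and made-up values of n and p.

```python
import math


def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) for X ~ B(n, p)."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, math.sqrt(variance)


# X ~ B(20, 0.25): mean = 5, variance = 3.75
mean, variance, sd = binomial_summary(20, 0.25)
print(mean, variance, round(sd, 3))
```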

New cards

Null Hypothesis (H0)

Hypothesis you assume to be correct
(H0 : p = )

New cards

Alternative hypothesis (H1) One tailed test

Tells you about the parameter if your assumption is shown to be wrong
(H1 : p

New cards

Reject null hypothesis

To carry out a hypothesis test, assume the null hypothesis is true and find the probability of getting a result at least as extreme as the one observed. If this probability is less than the significance level, reject the null hypothesis

New cards

significance level

Probability threshold
Usually 10%, 5% or 1%

New cards

critical region

The area in the tail(s) of the distribution assumed under H0 in which the null hypothesis is rejected
The critical value is the first value at which the tail probability falls below the significance level
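
A sketch of finding a critical region for a binomial test. It assumes a one-tailed test in the lower tail and the rule 'probability less than the significance level'; the function names and the example X ~ B(10, 0.5) are invented for illustration.

```python
from math import comb


def binomial_cdf(c, n, p):
    """P(X <= c) for X ~ B(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(c + 1))


def lower_critical_value(n, p, alpha):
    """Largest c with P(X <= c) < alpha under H0, or None if there is none."""
    critical = None
    for c in range(n + 1):
        if binomial_cdf(c, n, p) < alpha:
            critical = c  # still below the significance level
        else:
            break  # the tail probability has reached the significance level
    return critical


# H0: p = 0.5 with n = 10 at the 5% level
# P(X <= 1) = 11/1024 (about 0.0107), P(X <= 2) = 56/1024 (about 0.0547)
print(lower_critical_value(10, 0.5, 0.05))  # 1, so the critical region is X <= 1
```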

New cards

Acceptance region

The region where we accept the null hypothesis

New cards

Test the claim

1. Define X
2. X ~ B(n, p)
3. State H0 and H1
4. Find P(X