stats

82 Terms

1

Population

The whole set of items that are of interest

2

Census

Observes or measures every member of a population

3

Sample

A selection of observations taken from a subset of the population which is used to find out information about the population as a whole

4

Census - Adv & Disadv

Adv

  • Completely accurate

Disadv

  • Time consuming & expensive

  • Cannot be used when the testing process destroys the item

  • Hard to process the large quantity of data

5

Sample - Adv & Disadv

Adv

  • Less time consuming & less expensive than a census

  • Fewer people have to respond

  • Less data to process than in a census

Disadv

  • Data may not be as accurate

  • May not be large enough to give information about small subgroups of the population

6

Sampling units

Individual units of a population

7

Sampling frame

Sampling units of a population individually named or numbered to form a list

8

Simple random sampling

  1. Number the list from 001 to ______

  2. Select x random numbers using a random number generator

  3. Ignore repeats

  4. Continue until you have x numbers

  5. Select the corresponding items from the data sheet
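
The procedure on this card can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `simple_random_sample`, the `seed` parameter and the 100-item frame are all assumptions made for the example.

```python
import random

def simple_random_sample(frame, x, seed=None):
    """Select x distinct items from a sampling frame numbered 1..N."""
    if x > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)          # seed is only for repeatable demos
    chosen = []                        # numbers picked so far
    while len(chosen) < x:             # continue until you have x numbers
        number = rng.randint(1, len(frame))
        if number in chosen:           # ignore repeats
            continue
        chosen.append(number)
    # item number k sits at index k - 1 in the list
    return [frame[number - 1] for number in chosen]

# Hypothetical frame of 100 items numbered 001 to 100
frame = [f"item {k:03d}" for k in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=1)
```

In everyday code `random.sample(frame, x)` does the same job in one call; the loop is written out here only to mirror the steps on the card.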

9

Systematic sampling

The required elements are chosen at regular intervals from an ordered list

10

Stratified sampling

The population is divided into mutually exclusive strata and a random sample is taken from each

  • The proportion of each stratum sampled should be the same

11

Stratified sampling formula

The number sampled in a stratum = (number in stratum / number in population) x overall sample size
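
A short worked example of this formula in Python. The strata names and sizes are made up, and rounding to the nearest whole number is an assumption of this sketch (the card does not say how to handle fractions, and rounding can make the total differ slightly from the overall sample size).

```python
def stratum_sample_sizes(strata_sizes, overall_sample_size):
    """Number sampled in each stratum =
    (number in stratum / number in population) x overall sample size."""
    population = sum(strata_sizes.values())
    return {
        name: round(size * overall_sample_size / population)
        for name, size in strata_sizes.items()
    }

# Hypothetical population of 200 split into two strata
strata = {"Year 12": 120, "Year 13": 80}
sizes = stratum_sample_sizes(strata, 40)   # 24 from Year 12, 16 from Year 13
```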

12

Simple random sampling - Adv & Disadv

Adv

  • Free of bias

  • Easy & cheap to implement for small populations and small samples

  • Each sampling unit has a known and equal chance of selection

Disadv

  • Not suitable when the population size or the sample size is large

  • A sampling frame is needed

13

Systematic sampling - Adv & Disadv

Adv

  • Simple and quick to use

  • Suitable for large samples and large populations

Disadv

  • A sampling frame is needed

  • It can introduce bias if the sampling frame is not random

14

Stratified sampling - Adv & Disadv

Adv

  • Sample accurately reflects the population structure

  • Guarantees proportional representation of groups within a population

Disadv

  • Population must be clearly classified into distinct strata

  • Not suitable when the population size or the sample size is large

  • A sampling frame is needed

15

Quota sampling

How many members of each group you wish to sample is decided in advance and opportunity sampling is used until you have a large enough sample for each group

16

Opportunity sampling

Consists of taking the sample from people who are available at the time the study is carried out and who fit the criteria you are looking for

17

Quantitative variable

Data associated with numerical observations

18

Qualitative variable

Data associated with non-numerical observations

19

Mode / Modal class

  • Qualitative and quantitative data

  • The value or class that occurs most often

  • Not informative if each value occurs only once

20

Median (Q2)

  • The ((n+1)/2)th term

  • The middle value when the data values are put in order

  • Quantitative data

  • Not affected by extreme values
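
The ((n+1)/2)th-term rule can be sketched in Python as follows. The function name `median_value` and the data are illustrative; taking the mean of the two middle values when n is even is the usual convention and is assumed here.

```python
def median_value(values):
    """Return the ((n + 1) / 2)th term of the ordered data."""
    ordered = sorted(values)           # put the data in order first
    n = len(ordered)
    if n == 0:
        raise ValueError("no data")
    if n % 2 == 1:
        # odd n: (n + 1) / 2 is a whole number, so take that term
        return ordered[(n + 1) // 2 - 1]
    # even n: (n + 1) / 2 falls halfway between two terms, so average them
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_case = median_value([7, 1, 3])           # 2nd term of 1, 3, 7
even_case = median_value([1, 2, 3, 100])     # halfway between 2 and 3
extreme_case = median_value([1, 2, 3, 1000]) # unchanged by the extreme value
```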

21

Mean (x̄)

  • Average of the values

  • Quantitative data

  • Uses all the data

  • Affected by extreme values

x̄= Σx / n

22

Mean (frequency table)

x̄ = Σxf / Σf

where x is the midpoint of each class interval
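
A small worked example of the grouped mean in Python. The class intervals and frequencies are invented for illustration.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]  # x values
freqs = [f for _, _, f in classes]

sum_xf = sum(x * f for x, f in zip(midpoints, freqs))  # Σxf = 20 + 150 + 150
sum_f = sum(freqs)                                     # Σf = 20
mean = sum_xf / sum_f                                  # 320 / 20 = 16.0
```

Because midpoints stand in for the real values, the result is an estimate of the mean rather than the exact value.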

23

Lower quartile

Is one-quarter of the way through the data set

24

Upper quartile

Is three-quarters of the way through the data set

25

Calculator

  1. Menu 2

  2. List 1: values

  3. List 2: frequencies

  4. F2 (CALC)

  5. 1VAR

26

Interpolation

Making predictions of the dependent variable within the range of the given data

27

Extrapolation

Making predictions of the dependent variable outside the range of the given values (not as accurate)

28

Range

The difference between the largest and smallest values in the data set

29

Interquartile range

The difference between the upper quartile and the lower quartile, Q₃ - Q₁

30

Interpercentile range

The difference between the values for two given percentiles

31

Variance

σ² = Σ(x - x̄)² / n

σ² = (Σx² / n) - (Σx / n)²

'the mean of the squares minus the square of the mean'

32

Standard deviation

Square root of the variance

σ = √(Σ(x - x̄)² / n)

σ = √((Σx² / n) - (Σx / n)²)
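
The two forms of the variance give the same answer, which a quick Python check on made-up data illustrates. The data set is an assumption chosen so the arithmetic comes out in whole numbers.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, n = 8
n = len(data)
mean = sum(data) / n              # 40 / 8 = 5.0

# Definition: mean of the squared deviations from the mean
variance_definition = sum((x - mean) ** 2 for x in data) / n    # 32 / 8

# Shortcut: mean of the squares minus the square of the mean
variance_shortcut = sum(x * x for x in data) / n - (sum(data) / n) ** 2  # 29 - 25

standard_deviation = math.sqrt(variance_definition)             # √4
```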

33

Variance (frequency table)

σ² = Σf(x - x̄)² / Σf = (Σfx² / Σf) - (Σfx / Σf)²

34

Standard deviation (frequency table)

σ = √(Σf(x - x̄)² / Σf) = √((Σfx² / Σf) - (Σfx / Σf)²)
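
A sketch of the frequency-table versions in Python, using an invented table. As a cross-check (an addition for this example, not part of the card), the table is expanded back into raw data and passed to the standard library's `statistics.pstdev`, which computes the population standard deviation.

```python
import math
import statistics

values = [1, 2, 3, 4]     # x, hypothetical
freqs = [2, 3, 3, 2]      # f, hypothetical

sum_f = sum(freqs)                                         # Σf
sum_fx = sum(f * x for x, f in zip(values, freqs))         # Σfx
sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))    # Σfx²

mean = sum_fx / sum_f
variance_definition = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / sum_f
variance_shortcut = sum_fx2 / sum_f - (sum_fx / sum_f) ** 2
standard_deviation = math.sqrt(variance_shortcut)

# Cross-check: write each value out f times and use the library function
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
library_sd = statistics.pstdev(expanded)
```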

35

Outlier

An extreme value that lies outside the overall pattern of the data

Either greater than Q₃ + 1.5(Q₃ - Q₁)

or less than Q₁ - 1.5(Q₃ - Q₁)
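
The 1.5 x IQR rule can be sketched in Python. The quartiles are passed in as given values, since the card does not fix a method for finding them; the function names and the data are assumptions for illustration.

```python
def outlier_fences(q1, q3):
    """Return the lower and upper limits beyond which a value is an outlier."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    lower, upper = outlier_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Hypothetical data with quartiles assumed to be Q1 = 13 and Q3 = 19
data = [3, 12, 14, 15, 15, 17, 18, 20, 41]
fences = outlier_fences(13, 19)          # IQR = 6, so limits are 4 and 28
outliers = find_outliers(data, 13, 19)   # 3 is below 4, 41 is above 28
```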

36

Keep Outlier

An outlier may indicate natural variation, in which case it is still a piece of data to keep

It may instead be the result of an error in measuring or recording the data

37

Cleaning the data

Removing anomalies from a data set

38

Histogram

Can be used to represent grouped continuous data

  • area of the bar is proportional to the frequency in each class

  • Can be scaled

39

Histogram formulas

area of bar = k x frequency

frequency density = frequency / class width
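
A worked example of these two formulas in Python, with invented classes of unequal width. Taking k = 1 (so that area equals frequency) is an assumption of this sketch.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 5, 10), (5, 10, 20), (10, 20, 30), (20, 40, 20)]

def frequency_density(lower, upper, frequency):
    """frequency density = frequency / class width"""
    return frequency / (upper - lower)

densities = [frequency_density(*c) for c in classes]

# Bar area = class width x frequency density; with k = 1 this is the frequency
areas = [(upper - lower) * d for (lower, upper, _), d in zip(classes, densities)]
```

The widest class (20 to 40) has the same frequency as the class 5 to 10 but a quarter of its bar height, because its width is four times as large.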

40

Frequency Polygon

Plot frequency against the midpoint of each class and join the points with straight lines

41

Cumulative Frequency

Plot cumulative frequency against the upper limit of each class and join the points with a curve

42

Histogram and Frequency Polygon

Join the middle of the top of each bar in the histogram to form a frequency polygon

43

Comparing data

Comment on:

  • Interquartile range (less/more precise?)

  • Median (on average has a higher/lower ____)

  • Outliers

  • Positively/negatively skewed

44

Strong negative correlation

45

Weak negative correlation

46

Weak positive correlation

47

Strong positive correlation

48

Correlation

Describes the nature of the linear relationship between two variables

"With ___ outliers"

"The higher the ___, the higher/lower the ___ between ___ and ___"

49

Bivariate data

Data which has pairs of values for two variables

50

Regression line

The regression line of y on x is written in the form y = a + bx

y can be predicted from x
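
Predicting y from x with a regression line can be sketched in Python. The coefficients a = 2.0 and b = 0.5 and the data range 10 to 50 are made up, and the function name `predict` is hypothetical. The sketch also labels each prediction as interpolation or extrapolation, depending on whether x lies inside the range of the data used to fit the line.

```python
def predict(a, b, x, x_min, x_max):
    """Return the predicted y = a + bx and whether x is inside the data range."""
    y = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

# Hypothetical line y = 2.0 + 0.5x fitted to data with x between 10 and 50
a, b = 2.0, 0.5
inside = predict(a, b, 30, 10, 50)    # x = 30 is within the data range
outside = predict(a, b, 80, 10, 50)   # x = 80 is outside it, so less reliable
```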

51

Regression line interpretation

y = a + bx

"If (x in words) increases by 1 (unit on x axis), then (y in words) increases/decreases by (value of b, ignoring the sign) (unit on y axis)"

"If (x in words) is 0 (unit on x axis), then (y in words) is (value of a) (unit on y axis)"

52

Dependent (response) Variable

  • Plotted on the y-axis

  • The variable the researcher measures

  • Found from the x-axis variable

53

Independent (explanatory) Variable

  • Plotted on the x-axis

  • The variable the researcher controls

54

Venn diagrams

Can be used to represent events graphically

  • frequencies or probabilities can be placed in the regions of the Venn diagrams

55

Intersection

A & B (A ∩ B)

56

Union

A or B (A ∪ B)

57

Complement

Not A, written A'

P(not A) = 1 - P(A)

58

Mutually exclusive events

Both can't happen at the same time

P(A and B) = 0

P(A or B) = P(A) + P(B)

59

Independent events

When one event happens, it doesn't affect the probability of the other happening

P(A and B) = P(A) x P(B)
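
The rules for mutually exclusive and independent events can be checked on a small sample space in Python. The fair six-sided die and the events chosen are assumptions made for the example; `Fraction` keeps the probabilities exact.

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # hypothetical sample space: one fair die

def prob(event):
    """Probability of an event (a set of outcomes), all equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

def mutually_exclusive(a, b):
    return prob(a & b) == 0                   # P(A and B) = 0

def independent(a, b):
    return prob(a & b) == prob(a) * prob(b)   # P(A and B) = P(A) x P(B)

even = {2, 4, 6}
odd = {1, 3, 5}
at_most_two = {1, 2}

exclusive_check = mutually_exclusive(even, odd)            # no shared outcomes
addition_check = prob(even | odd) == prob(even) + prob(odd)
independent_check = independent(even, at_most_two)         # 1/6 = 1/2 x 1/3
not_independent = independent(even, odd)                   # 0 is not 1/4
```

The last line shows that mutually exclusive events with non-zero probabilities are not independent.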

60

Random variable

A variable whose value depends on the outcome of a random event

61

Probability distribution

Shows all the values of a variable (x) and their probabilities

62

Probability mass function

P(X = x)

63

Interval Length Equation

Number of items in the population ÷ sample size
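
A sketch of how the interval length is used in systematic sampling, in Python. Choosing a random starting point within the first interval is an assumption of this example (the cards do not specify where to start), as are the function name and the population of 100.

```python
import random

def systematic_sample(ordered_list, sample_size, seed=None):
    """Take every k-th item, where k = population size // sample size."""
    if not 0 < sample_size <= len(ordered_list):
        raise ValueError("sample size must be between 1 and the population size")
    rng = random.Random(seed)                  # seed only for repeatable demos
    k = len(ordered_list) // sample_size       # interval length
    start = rng.randrange(k)                   # assumed: random start in first interval
    return [ordered_list[start + i * k] for i in range(sample_size)]

population = list(range(1, 101))               # hypothetical ordered list, 1 to 100
sample = systematic_sample(population, 20, seed=0)   # interval length 100 / 20 = 5
gaps = [b - a for a, b in zip(sample, sample[1:])]   # every gap equals the interval
```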

64

Cluster Sampling

Split the population into clusters. Select a set number of these clusters at random, then take a simple random sample from each of the selected clusters
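
The two stages can be sketched in Python. The clusters (five schools of 30 pupils), the function name `cluster_sample` and the `seed` parameter are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Pick n_clusters at random, then a simple random sample from each."""
    rng = random.Random(seed)                          # seed only for repeatable demos
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: choose clusters
    # stage 2: simple random sample within each chosen cluster
    return {name: rng.sample(clusters[name], per_cluster) for name in chosen}

# Hypothetical population: 5 schools (clusters) of 30 pupils each
clusters = {f"school {c}": [f"{c}{i}" for i in range(1, 31)] for c in "ABCDE"}
sample = cluster_sample(clusters, n_clusters=2, per_cluster=5, seed=3)
```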

65

Cluster Sampling Adv & Disadv

Adv

  • Easy to carry out

  • Inexpensive

Disadv

  • Bias

  • Members of the population aren't equally likely to be selected, as the probability depends on cluster size (the larger the cluster, the less likely)

  • The population must be divided into clusters, which can be costly

  • Increasing the scope of the study increases the number of clusters, which adds time and expense

66

Box Plot

  • Median

  • LQ

  • UQ

  • Lowest value that isn't an outlier

  • Highest value that isn't an outlier

  • Outlier (x)

  • Skew

67

Discrete Data

Data that takes values which change in steps (e.g. shoe size)

68

Random Variable

Variable whose value is determined by chance

69

Binomial Distribution (Conditions)

  1. Binary? Trials can be classified as success/failure

  2. Independent? Trials must be independent.

  3. Number? The number of trials (n) must be fixed in advance

  4. Success? The probability of success (p) must be the same for each trial.

70

Binomial Probability Formula

P(X = x) = (nCx) p^x (1 - p)^(n - x)
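
The formula translates directly into Python using `math.comb` for nCx (available from Python 3.8). The function name `binomial_pmf` and the example values n = 10, p = 0.5 are assumptions for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = (nCx) p^x (1 - p)^(n - x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical example: 10 fair coin flips, probability of exactly 3 heads
p_three_heads = binomial_pmf(3, 10, 0.5)     # 120 / 1024

# The probabilities over all possible values of x add up to 1
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))
```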

71

Distribution of X

X ~ B(n, p)

n = number of trials

p = probability of success

72

Binomial mean

np

n = number of trials

p = probability of success

73

Binomial standard deviation

√(np(1 - p))

74

Binomial variance

np(1-p)
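
The binomial mean, variance and standard deviation formulas can be checked numerically in Python against the probabilities themselves. The values n = 20 and p = 0.3 are made up for the example.

```python
from math import comb, sqrt

n, p = 20, 0.3                     # hypothetical X ~ B(20, 0.3)

mean = n * p                       # np
variance = n * p * (1 - p)         # np(1 - p)
standard_deviation = sqrt(variance)

# Cross-check from the distribution: mean = Σ x P(X = x),
# variance = Σ x² P(X = x) - mean²
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
mean_from_pmf = sum(x * px for x, px in enumerate(pmf))
variance_from_pmf = sum(x * x * px for x, px in enumerate(pmf)) - mean_from_pmf ** 2
```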

75

Null Hypothesis (H0)

Hypothesis you assume to be correct (H0 : p = )

76

Alternative hypothesis (H1) One tailed test

Tells you about the parameter if your assumption is shown to be wrong
(H1 : p
77

Reject null hypothesis

To carry out a hypothesis test, assume the null hypothesis is true and find how likely the observed result (or a more extreme one) would be. If this probability is less than the significance level, reject the null hypothesis

78

Significance level

The probability threshold for rejecting the null hypothesis

Usually 10%, 5% or 1%

79

Critical region

The area in the tails of the comparison distribution in which the null hypothesis can be rejected

How extreme the observed value has to be before its probability is below the significance level
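
A sketch of finding a lower-tail critical region for a binomial test in Python. The function names and the example (n = 20, p = 0.5 under H0, 5% significance level) are assumptions, and the strict "less than" comparison follows the wording of the Reject null hypothesis card.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

def lower_critical_value(n, p, significance):
    """Largest c with P(X <= c) < significance, or None if there is none.
    The critical region is then X <= c."""
    c = None
    for x in range(n + 1):
        if binomial_cdf(x, n, p) < significance:
            c = x
        else:
            break          # the cumulative probability only increases from here
    return c

# Hypothetical one-tailed test: H0 : p = 0.5 with n = 20 at the 5% level
c = lower_critical_value(20, 0.5, 0.05)
prob_in_region = binomial_cdf(c, 20, 0.5)       # P(X <= 5), about 0.0207
prob_one_more = binomial_cdf(c + 1, 20, 0.5)    # P(X <= 6), about 0.0577
```

Here the critical region is X ≤ 5, because including 6 would push the probability above 5%.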

80

Acceptance region

The region where we accept the null hypothesis

81

Test the claim

1. Define X
2. X ~ B(n, p)
3. State H0 and H1
4. Find P(X
82

Test the claim (Two tailed test)

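  1. Define X

  2. X ~ B(n, p)

  3. State H0 and H1

  4. Find which tail the observed value x is in by comparing it with the expected value np (x > np or x < np)

  5. Halve the significance level, then compare

  6. State whether you accept or reject H0

  7. Put the conclusion into context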
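
The two-tailed steps can be sketched in Python. The function name `two_tailed_binomial_test` and the example numbers are assumptions, and treating an observation equal to np as belonging to the upper tail is an arbitrary choice made for this sketch.

```python
from math import comb

def pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def two_tailed_binomial_test(x, n, p, significance):
    """Return (tail probability, reject H0?) for observed x under H0 : X ~ B(n, p)."""
    expected = n * p
    if x < expected:
        tail = sum(pmf(k, n, p) for k in range(0, x + 1))   # lower tail: P(X <= x)
    else:
        tail = sum(pmf(k, n, p) for k in range(x, n + 1))   # upper tail: P(X >= x)
    reject = tail < significance / 2       # compare with half the significance level
    return tail, reject

# Hypothetical test of H0 : p = 0.5 against H1 : p ≠ 0.5, n = 20, 5% level
tail_4, reject_4 = two_tailed_binomial_test(4, 20, 0.5, 0.05)   # about 0.0059
tail_7, reject_7 = two_tailed_binomial_test(7, 20, 0.5, 0.05)   # about 0.1316
```

With 4 successes the tail probability is below 2.5%, so H0 is rejected; with 7 it is not. The final step, putting the conclusion into context, still has to be written in words.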
