STAT1070: Statistics

74 Terms

1

Continuous variables

Numerical values that are measured rather than counted and can take on any value within a range e.g. percentages, fractions, times. Time and age are always continuous variables.

2

Discrete variables

Countable values that are counted, not measured, e.g. number of people (you can’t have half a person)

3

Nominal variables

Categorical variables that have no natural order. This also includes numerical data that acts as a label e.g. post codes, or coded variables or outcomes (e.g. 1 = yes, 2 = no)

4

Ordinal variables

Categorical variables with a natural order or scale but not with equal intervals e.g. grades.

5

Population

The entire group that the researcher is interested in and that is used to select the sample.

6

Sample

Selection/subset of data (ideally representative) from the population of interest

7

Parameters

Property of the population that you use the statistic to infer. μ (mean), X (observation), σ (SD), P (proportion) and N (size)

8

Statistics

Property of the sample that you use to infer parameters. x̄ (mean), n (size), x (observation), s (SD), p-hat (proportion).

9

Graph for ordinal data.

Bar charts (categories on x axis)

10

Graph for nominal data

Pareto chart (line = cumulative percentage) or bar chart for few categories

11

Graph for continuous data

Histogram or box plot

12

Graph for discrete data

Bar chart (few outcomes) or histogram (many outcomes or the data is sparse)

13

Graph for two categorical variables

Clustered bar chart (frequencies) or stacked bar chart (proportions), which makes relative differences easier to see

14

P-value

The probability of obtaining a result equal to or more extreme than the observed test statistic if the null hypothesis were true. A small p-value means data like these would be unlikely if only chance were at work; it is not the probability that the null hypothesis is true. The test statistic value measures how far the data are from what the null hypothesis predicts.
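
The definition above can be sketched in code. This is a minimal illustration only, assuming a two-sided test whose test statistic follows a standard normal distribution under the null hypothesis; the function name `two_sided_p_value` is made up for this example.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) when Z is standard normal under the null hypothesis."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# A test statistic of 1.96 gives a p-value of roughly 0.05.
p = two_sided_p_value(1.96)
print(round(p, 3))
```

A larger test statistic (further from the null) gives a smaller p-value.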

15

Correlation

A relationship between two or more things that is measured by the correlation coefficient, which ranges from -1 to +1: the magnitude (0 to 1) gives the strength and the sign (-/+) gives the direction of the relationship. It indicates the extent to which changes in one variable are related to changes in another.

16

Coefficient of determination (R²)

Measure of proportion of variance in the dependent variable that can be explained by the independent variable in a regression model. Represents the goodness-of-fit of the regression model and ranges from 0 to 1 with higher = better fit.

17

Inter Quartile Range (IQR)

Measures the spread of the middle half (50%) of your distribution, so it is not affected by outliers. Found as the difference between the third (upper) quartile and the first (lower) quartile (Q3 - Q1)

18

Large test statistic

It is less likely that your data could have occurred under the null hypothesis

19

Mean (measure of central tendency)

The average, found by adding all the data points together and dividing by the number of data points

20

Median (measure of central tendency)

The middle number of the data when sorted into ascending/descending order. Divides the data into two equal halves with 50% of observations being above and 50% being below the median.

21

Non-parametric test

Statistical test that doesn't rely on specific assumptions about distribution of data. Used when data doesn't meet assumptions required for parametric tests. Based on ranks or ordering of data and suitable for analysing categorical or ordinal data.

22

Power (1 − β)

Probability of correctly rejecting a false null hypothesis. Measures the ability of a statistical test/study to detect a true effect or relationship. Higher power = higher likelihood of detecting a true effect

23

Regression Slope (coefficient or beta coefficient)

Measure of the change in the dependent variable associated with a one-unit change in the independent variable in a regression model. Represents the slope of the regression line, so its sign gives the direction of the relationship and its size gives the amount of change per unit.
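
As a rough sketch of how the slope, the correlation coefficient and R² relate in simple (one-predictor) linear regression, the least-squares formulas can be computed by hand. The data and the names `hours` and `marks` are invented for illustration.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Return (slope, intercept, r, r_squared) for a least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx                    # change in y per one-unit change in x
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5         # correlation coefficient
    return slope, intercept, r, r ** 2   # r squared = coefficient of determination


hours = [1, 2, 3, 4, 5]        # hypothetical independent variable
marks = [52, 55, 61, 64, 68]   # hypothetical dependent variable
slope, intercept, r, r_squared = simple_linear_regression(hours, marks)
print(slope, intercept, round(r, 3), round(r_squared, 3))
```

Here the slope is positive, so the correlation is positive too, and R² is simply the square of r.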

24

Rules of probability

Addition rule = P(A or B) = P(A) + P(B) - P(A and B); Conditional rule = P(A|B) = P(A and B)/P(B)
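
A small worked example may help here. The sketch below applies both rules to one roll of a fair die; the choice of events A and B is arbitrary and only for illustration.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                   # event A: the roll is even
B = {4, 5, 6}                   # event B: the roll is greater than 3


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = prob(A) + prob(B) - prob(A & B)

# Conditional rule: P(A|B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

print(p_a_or_b, p_a_given_b)  # 2/3 2/3
```

Subtracting P(A and B) stops the outcomes 4 and 6, which are in both events, from being counted twice.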

25

Standard deviation

A measure of dispersion/spread of numerical data that quantifies the average amount of deviation of each data point from the mean. Higher SD = greater variability

26

Type I error (α)

Null hypothesis is rejected even though it is true (incorrect rejection of a true null hypothesis). Its probability (α, the significance level) is the chance of falsely concluding there is a relationship between variables when there is none.

27

Type II error (β)

Null hypothesis isn't rejected even though it is false (failure to reject a false null hypothesis). Its probability (β) is the chance of failing to detect a relationship between variables when there is one.

28

Z score

Quantifies distance between a data point and the mean of a data set to show you how many standard deviations the value is from the mean. Allows for standardisation.
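
In formula form this is z = (value − mean) / SD. A minimal sketch, with made-up numbers and a hypothetical helper name `z_score`:

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


# A mark of 85 when the mean is 70 and the SD is 10
z = z_score(85, 70, 10)
print(z)  # 1.5
```

A negative z score would mean the value is below the mean.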

29

Shape

Symmetrical (normal distribution), positive (right) skewed, negative (left) skewed, or uniform. Helps identify appropriate measure of centre/spread: symmetrical = mean and SD, skewed = median and IQR (+ mean and SD). Distributions can also be unimodal = one peak/mode or bi/multi modal.

30

Spread

Measured through range, variance, SD, and IQR.

31

Variance

The spread between numbers in a data set, calculated as the average squared distance of each number from the mean. The SD is the square root of the variance.
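
The link between variance and SD can be sketched as follows. This assumes the sample formulas, which divide by n − 1; the data are invented for illustration.

```python
from statistics import mean, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = mean(data)
squared_deviations = [(x - x_bar) ** 2 for x in data]

# Sample variance: sum of squared deviations divided by n - 1
sample_variance = sum(squared_deviations) / (len(data) - 1)

# Standard deviation: square root of the variance
sample_sd = sample_variance ** 0.5

# Cross-check against the standard library's sample formulas
print(sample_variance, variance(data))
print(sample_sd, stdev(data))
```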

32

Observational Study

Researchers observe participants with no manipulation of variables to assess the relationship/behaviour in natural setting

33

Experimental study

Uses random allocation of participants to establish cause-and-effect relationships AND manipulates/controls variables and measures outcome

34

Longitudinal study (observational)

Follows a group of participants over an extended period to examine changes/trends and help establish temporal precedence. Involves collecting data at multiple time points from the same participants

35

Cross sectional study (observational)

Where you collect data from participants at a single point in time, which provides a snapshot of the population's characteristics.

36

Cohort study (observational)

Group of people with a common characteristic are followed over time to find how many reach a certain health outcome of interest. Examines the relationship between exposure to certain factors and the development of outcomes/disease

37

Graphs for continuous (y) and categorical (x) variables

Side-by-side box plots (centre/spread) or vertically aligned histograms (shape)

38

Graph for two continuous variables

Scatterplots

39

Describing scatterplot relationships

Consider strength (strong or weak), linear or non linear, and positive or negative. Non-linear relationships examples: exponential patterns or v shapes (can comment on strength but not direction)

40

Outliers

Observations that deviate from the overall pattern of the distribution, caused by natural variation or measurement error. You should always try to explain outliers so that error can be ruled out.

41

1.5IQR rule

Suspected outliers are values at least 1.5 x IQR above Q3 or below Q1. Values below Q1 - 1.5IQR are low outliers (lower threshold) and values above Q3 + 1.5IQR are high outliers (upper threshold).
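
The rule can be sketched in a few lines. Note that quartile conventions differ between textbooks and software; the `method="inclusive"` option used here is just one choice and is an assumption of this example, as is the data.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

# Quartiles (Q1, median, Q3); the interpolation method is an assumption
q1, _median, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_threshold = q1 - 1.5 * iqr
upper_threshold = q3 + 1.5 * iqr

# Anything outside the thresholds is a suspected outlier
outliers = [x for x in data if x < lower_threshold or x > upper_threshold]
print(q1, q3, iqr, outliers)
```

With these numbers only the value 30 falls outside the thresholds.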

42

Independent (explanatory) variable

Manipulated/controlled and causes changes in the DV. It’s plotted on the x axis (horizontal)

43

Dependent (response) variable

Measured and records the outcome. It is dependent on the IV and plotted on the y axis (vertical)

44

How does the shape of distribution change the relationship between mean and median and why?

Symmetrical: mean = median, skewed left: mean < median, and skewed right: mean > median. This is because the mean is affected by outliers i.e. when skewed left there are more low value outliers that decrease the mean but when skewed right there are more high value outliers that increase the mean.
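
A quick numerical check of this pattern, using small invented data sets (the left-skewed set is just the mirror image of the right-skewed one):

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]        # one very high value
left_skewed = [21 - x for x in right_skewed]    # mirror image: one very low value

# The extreme value pulls the mean towards the tail; the median barely moves
print(mean(right_skewed), median(right_skewed))
print(mean(left_skewed), median(left_skewed))
```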

45

Graph for two continuous and one categorical variable

Scatterplot that has a key for the categorical data e.g. different colours or symbols for the different categorical levels.

46

Table to describe one categorical variable.

Includes raw (counts) and relative (proportions) frequencies and is the table version of a bar chart.

47

Table to describe two categorical variables

Contingency table/cross tabulation that combines two frequency tables to summarise the relationship between the two variables.

48

Bias

Related to the location of a statistic's sampling distribution compared to the location of the true parameter value. If the difference is 0 the statistic is unbiased. To reduce bias you use random sampling.

49

Precision

Related to the spread of sampling distribution i.e. less spread = more precise. You can improve precision by increasing the sample size.

50

Sampling error

The difference between statistic and parameter that is unavoidable but can be reduced in larger samples

51

Non-sampling error

Any error not caused by the act of sampling itself e.g. selection bias and measurement bias

52

Simpson's paradox

The direction of a relationship seen in the combined data reverses when the data are split into groups e.g. a linear relationship is positive overall but negative within each group (or vice versa)
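
A tiny invented data set shows the reversal. In this sketch each group has a negative least-squares slope, yet the combined data have a positive one; the helper `slope` and both groups are made up for illustration.

```python
from statistics import mean


def slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


group_a = [(1, 6), (2, 5), (3, 4)]      # y falls as x rises
group_b = [(6, 11), (7, 10), (8, 9)]    # y falls as x rises

print(slope(group_a), slope(group_b))   # both negative
print(slope(group_a + group_b))         # positive when combined
```

The reversal happens because group B sits higher on both x and y than group A, so the gap between groups dominates the combined trend.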

53

3 R’s of study design

Randomisation, replication and reducing variation (blocking)

54

Simple random sampling (probability)

Researchers randomly select members of the population with each member having an equal probability of being selected.

55

Stratified sampling (probability)

Divide the population into subgroups and randomly sample from each subgroup. This can reduce bias and increase precision.

56

Cluster sampling (probability)

Split population into groups then randomly select groups and test the entire group e.g. schools. It has the potential of bias and there is limited choice for subgroup representation.

57

Sequential/systematic sampling (non-probability)

Systematic selection of a sample. Uses a sampling interval determined by population size/desired sample e.g. select every 10th
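
Three of the sampling methods above (simple random, stratified and systematic) can be sketched with Python's `random` module. The population, the strata and the function names are all hypothetical, and starting the systematic sample at a random position is an assumption of this sketch.

```python
import random


def simple_random_sample(population, n, rng):
    """Every member has an equal chance of being selected."""
    return rng.sample(population, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Randomly sample the same number of members from each subgroup."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def systematic_sample(population, interval, rng):
    """Select every interval-th member, starting at a random position."""
    start = rng.randrange(interval)
    return population[start::interval]


rng = random.Random(1070)            # fixed seed so the run is repeatable
population = list(range(1, 101))     # hypothetical population of 100 IDs

srs = simple_random_sample(population, 10, rng)
strata = {"first_half": population[:50], "second_half": population[50:]}
strat = stratified_sample(strata, 5, rng)
syst = systematic_sample(population, 10, rng)   # interval = 100 / 10
print(srs, strat, syst, sep="\n")
```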

58

Convenience sampling (non-probability)

Sample readily available participants however it can cause highly biased data.

59

Snowball sampling (non-probability)

Sample by using one participant to find others e.g. “do you know anyone else who could participate in the study”

60

Line-intercept sampling (non-probability)

Line is chosen and any elements in that line form the sample e.g. flight patterns

61

Sensitivity

The probability that a test or measure gives a positive result when the condition/disease is present (true positive) = P(Positive test|Disease present)

62

Specificity

The probability that a test or measure gives a negative result when the condition/disease is absent (true negative) = P(Negative test|No disease)
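
Sensitivity and specificity are both conditional probabilities, so they can be estimated from counts of test results. A minimal sketch with invented counts and hypothetical function names:

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """P(Positive test | Disease present)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """P(Negative test | No disease)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical results: 100 people with the disease, 200 without
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=160, false_pos=40)
print(sens, spec)
```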

63

Probability notation

P(x) = probability of event x occurring, which is always between 0 and 1

P(xᶜ) = probability of the complementary event occurring i.e. x not occurring

64

Mutually exclusive events

When two (or more) events can’t occur at the same time e.g. roll a 2 and 3 on one die roll.

65

Collectively exhaustive events

Set of events that encompasses all possible outcomes e.g. 1, 2, 3, 4, 5, and 6 for a die roll.

66

Marginal probability

The probability of a single event occurring = P(A)

67

Joint probability

Probability of the intersection of two events = P(A ∩ B)

68

Union probability

The probability of A or B or both occurring = P(A ∪ B)

69

Conditional probability

The probability that A occurs given that B has already occurred = P(A|B)

70

Probability rules

Union rule (mutually exclusive events only): P(A ∪ B) = P(A) + P(B)

Addition rule: P(A or B) = P(A) + P(B) - P(A and B)

Conditional rule: P(A|B) = P(A ∩ B) divided by P(B)

Multiplication rule (rearrangement of conditional rule): P(A ∩ B) = P(A|B) x P(B)

71

Independent events

Whether event A happens or not has no effect on P(B). Determined by any of these equations (you only need to test one):

P(A ∩ B) = P(A) x P(B)

P(A|B) = P(A)

P(B|A) = P(B)
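
Independence can be checked from a table of counts by comparing the joint probability with the product of the marginal probabilities. The counts below are invented so that A and B come out independent.

```python
from fractions import Fraction

# Hypothetical contingency table of counts
counts = {
    ("A", "B"): 20,
    ("A", "not B"): 30,
    ("not A", "B"): 20,
    ("not A", "not B"): 30,
}
total = sum(counts.values())

# Marginal probabilities come from row/column totals
p_a = Fraction(counts[("A", "B")] + counts[("A", "not B")], total)
p_b = Fraction(counts[("A", "B")] + counts[("not A", "B")], total)

# Joint probability comes from a single cell
p_a_and_b = Fraction(counts[("A", "B")], total)

# Conditional probability from the conditional rule
p_a_given_b = p_a_and_b / p_b

independent = p_a_and_b == p_a * p_b
print(p_a, p_b, p_a_and_b, p_a_given_b, independent)
```

Because P(A ∩ B) equals P(A) x P(B) here, P(A|B) also equals P(A), which is why testing one equation is enough.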

72

Contingency tables

Useful for joint and marginal probabilities

73

Venn diagrams

More useful for graphical representations than calculations.

74

Tree diagrams

Useful for marginal and conditional probabilities
