Statistics - Unit 1

83 Terms

1
New cards

What is data?

Information about individuals or subjects in a population

2
New cards

What is a variable?

Any characteristic, numerical value, or quantity that can be measured or counted

3
New cards

What are two examples of variables?

Eye colour, height

4
New cards

What are the different types of data?

Qualitative (categorical) and quantitative (numerical)

5
New cards

What are the two types of quantitative/numerical data?

Discrete and continuous data

6
New cards

What is discrete data?

Countable data that only results in whole numbers

7
New cards

What is continuous data?

Measured data that can take any value within a range, including fractions and decimals

8
New cards

What is population?

The collection of individuals or subjects that are being studied by the researcher

9
New cards

What is a sample?

A subset of the population, ideally collected without bias so that it is highly representative of the total population

10
New cards

What is a census?

A collection of information from every individual or subject of the population

11
New cards

What is a parameter?

A numerical value or quantity measuring some aspect of the population

12
New cards

What is a statistic?

A numerical value or quantity measuring some aspect of a sample

13
New cards

What is distribution?

The pattern of variation in the data: which values occur and how often each occurs

14
New cards

What are outliers?

Data points that are much higher or much lower than the other data points

15
New cards

What is a frequency table?

A table listing each value (or class) alongside the number of times it occurs

16
New cards

What is a bar graph?

A visual display of data in which quantities are represented by bars of equal width

17
New cards

When are bar graphs used?

Bar graphs are used for discrete or qualitative (categorical) data

18
New cards

What is a histogram?

A graph in which adjacent bars show the frequency of data falling within each class interval (bin)

19
New cards

When is a histogram used?

A histogram is used with continuous data

20
New cards

How is range calculated?

Max value - min value

21
New cards

How is the number of classes chosen?

Based upon the grapher’s discretion, usually a minimum of 5 bins and a maximum of 15

22
New cards

How is class interval/bin width calculated?

Range/# of classes
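
The range and bin width formulas can be checked with a short Python sketch. The `heights` values and the choice of 5 classes are made up for illustration and are not from the card set.

```python
# Hypothetical continuous measurements (heights in cm), made up for illustration.
heights = [150.2, 153.8, 158.1, 161.4, 164.9, 167.3, 170.0, 172.6, 178.5, 180.2]

num_classes = 5  # chosen at the grapher's discretion (assumed value)

data_range = max(heights) - min(heights)  # range = max value - min value
bin_width = data_range / num_classes      # bin width = range / number of classes

print(f"range = {data_range:.1f}, bin width = {bin_width:.1f}")
```

In practice the bin width is often rounded up to a convenient number so that the classes cover every value.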

23
New cards

What is a class interval or bin?

A range of values used to group continuous data

24
New cards

What is a sampling technique?

A method of selecting a sample that will be representative of the overall population

25
New cards

What are the defining characteristics of simple random sampling?

  • Every member of the population has an equal chance of being selected

  • The selection of any particular individual does not affect the chance of any other individual being chosen
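
A minimal sketch of simple random sampling in Python, assuming a hypothetical population of 100 ID numbers and a sample of 10. `random.Random.sample` draws without replacement, so every group of 10 members is equally likely to be picked; the seed is fixed only so the sketch is repeatable.

```python
import random

population = list(range(1, 101))  # hypothetical population: ID numbers 1 to 100

rng = random.Random(42)               # fixed seed (assumed) so the run is repeatable
sample = rng.sample(population, 10)   # draw 10 distinct members at random

print(sample)
```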

26
New cards

What is the effectiveness of simple random sampling?

  • Reduction of sample bias

  • May not be representative of the population, but these deviations are due only to chance

27
New cards

What are the defining characteristics of stratified sampling?

  • The population is divided into groups of members who share common characteristics such as gender, age, education level, or geographic area; these groups are called strata

  • A stratified sample has the same proportion of members from each stratum as the population does

  • A simple random sample for the members of each stratum is taken
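
The three steps above can be sketched in Python. The population, the stratum names ("junior" and "senior"), and the sample size are hypothetical and chosen so the proportions come out as whole numbers.

```python
import random

# Hypothetical population of (member_id, stratum) pairs: 60 juniors and 40 seniors.
population = [(i, "junior") for i in range(60)] + [(i, "senior") for i in range(60, 100)]
sample_size = 10  # assumed total sample size

rng = random.Random(0)  # fixed seed (assumed) so the run is repeatable

# Step 1: divide the population into strata.
strata = {}
for member_id, stratum in population:
    strata.setdefault(stratum, []).append(member_id)

# Steps 2 and 3: take a simple random sample from each stratum,
# sized in proportion to that stratum's share of the population.
sample = {}
for stratum, members in strata.items():
    k = round(sample_size * len(members) / len(population))
    sample[stratum] = rng.sample(members, k)

print({stratum: len(chosen) for stratum, chosen in sample.items()})
```

When the proportions do not give whole numbers, the rounded stratum sizes may not add up exactly to the intended sample size and need adjusting by hand.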

28
New cards

What is the effectiveness of stratified sampling?

  • Ensures each subgroup within the population receives proper representation

  • Every member of the population must be classifiable into a stratum, so it cannot be used for every study

29
New cards

What are the defining characteristics of systematic sampling?

  • Used to sample a fixed percentage of the population

  • A random starting point is chosen, and every nth individual from that point on is selected, where:

    n = population size ÷ sample size
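
A minimal Python sketch of systematic sampling, assuming a hypothetical ordered list of 100 members and a sample of 10. The random start is drawn from the first interval so that stepping by n stays inside the list.

```python
import random

population = list(range(1, 101))  # hypothetical ordered list of 100 members
sample_size = 10                  # assumed sample size

n = len(population) // sample_size  # n = population size / sample size

rng = random.Random(1)        # fixed seed (assumed) so the run is repeatable
start = rng.randrange(n)      # random starting index within the first interval
sample = population[start::n] # take every nth member from the starting point

print(n, sample)
```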

30
New cards

What is the effectiveness of systematic sampling?

  • It is simple and is therefore popular among researchers

  • Low probability of contaminating data

  • If every nth data point shares a particular characteristic (a repeating pattern in the list), the sample may represent the population disproportionately

31
New cards

What are the defining characteristics of convenience sampling?

  • Made up of a conveniently available pool of respondents

  • Members are chosen based on proximity rather than population representation

32
New cards

What is the effectiveness of convenience sampling?

  • Commonly used as it is prompt, simple, and economical

  • Possibility of bias, as some groups will be over-represented while others will be under-represented

  • Since the selection is biased, there will be inaccuracies in the study

33
New cards

What is sampling bias?

Inconsistencies in studies caused by biased selection of samples

34
New cards

What are the defining characteristics of quota sampling?

  • Survey population is divided into mutually exclusive subgroups

  • Subgroups are selected with respect to known (non-random) features, traits, or interests

35
New cards

What is the effectiveness of quota sampling?

  • Inexpensive method of selecting a sample

  • Guarantees the inclusion of people you need

  • Participants are not randomly drawn and may have specific characteristics meaning it is impossible to know how well they represent the groups in a population

36
New cards

What is statistical bias?

Any factor, unintentional or deliberate, that favours certain outcomes or responses and so skews the results

37
New cards

What is cumulative frequency?

A running total of frequencies: the frequency of the current class added to the cumulative frequency of the previous class. The final cumulative frequency equals the total frequency

38
New cards

What is relative frequency?

The frequency of a class divided by the total frequency
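
Frequency, cumulative frequency, and relative frequency can be computed together in a short Python sketch. The `data` list (number of siblings reported by 10 students) is made up for illustration.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical discrete data: number of siblings reported by 10 students.
data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 1]

counts = Counter(data)
values = sorted(counts)
frequencies = [counts[v] for v in values]       # how often each value occurs
cumulative = list(accumulate(frequencies))      # running total of the frequencies
total = cumulative[-1]                          # final cumulative frequency = total frequency
relative = [f / total for f in frequencies]     # frequency / total frequency

for v, f, cf, rf in zip(values, frequencies, cumulative, relative):
    print(v, f, cf, f"{rf:.2f}")
```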

39
New cards

What is a measure of central tendency?

A measure of the location of the middle of a data set, with the purpose of describing a set of numerical data using a single value

40
New cards

What are the measures of central tendency for ungrouped data?

Mean, median, and mode

41
New cards

What is the mode?

The value(s) that occur(s) most often; there can be more than one mode depending on the distribution (e.g. bimodal distributions)

42
New cards

What is the mean?

The average of a set of values

43
New cards

How is mean calculated?

x̄ = (x₁ + x₂ + x₃ + … + xₙ) ÷ n

44
New cards

What measure of central tendency is the most common?

The mean

45
New cards

What is the median?

The middle value in a data distribution

46
New cards

What measure of central tendency do outliers impact?

Mean

47
New cards

What measure of central tendency should be used if outliers are present?

Median
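
The effect of an outlier on the mean and the median can be seen in a small Python sketch. The values are made up for illustration, with 60 playing the role of the outlier.

```python
import statistics

data = [4, 5, 5, 6, 7]        # hypothetical values with no outlier
with_outlier = data + [60]    # the same values plus one extreme value

# The mean is pulled strongly towards the outlier; the median barely moves.
print(statistics.mean(data), statistics.median(data))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```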

48
New cards

What measure of central tendency should be used if data is mostly symmetric?

Mean or median

49
New cards

What measure of central tendency should be used if frequency is important?

Mode

50
New cards

What measure of central tendency should be used if data is qualitative?

Mode

51
New cards

If a constant is added to each value in a data set, what is the impact on mean and standard deviation?

Mean would increase by the added value but standard deviation would not change

52
New cards

If a constant is multiplied by each value in a data set, what is the impact on mean and standard deviation?

Mean would be multiplied by the constant, and standard deviation would be multiplied by the absolute value of the constant
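
Both transformation rules can be checked numerically. This is a sketch with made-up data and a made-up constant `c`; `statistics.pstdev` is the population standard deviation, which divides by n.

```python
import statistics

data = [1, 2, 3, 4, 5]  # hypothetical values
c = 3                   # hypothetical constant

shifted = [x + c for x in data]  # constant added to each value
scaled = [x * c for x in data]   # each value multiplied by the constant

mean, sd = statistics.mean(data), statistics.pstdev(data)

# Adding c: the mean increases by c, the standard deviation is unchanged.
print(statistics.mean(shifted) - mean, statistics.pstdev(shifted) - sd)

# Multiplying by c: the mean and the standard deviation are both scaled.
print(statistics.mean(scaled) / mean, statistics.pstdev(scaled) / sd)
```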

53
New cards

What is a weighted mean?

A measure of central tendency that reflects the relative importance of data

54
New cards

What is the formula for weighted mean?

x̄ = (∑ f·x) ÷ n, where f is the frequency of each class and n = ∑ f is the total frequency

55
New cards

What does x represent in the formula for weighted mean?

The mid-interval value for a class interval
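
A minimal sketch of the weighted mean for grouped data, using the mid-interval value of each class as x. The class limits and frequencies are hypothetical.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency) for each class.
classes = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]

n = sum(f for _, _, f in classes)  # total frequency

# Sum of f * x, where x is the mid-interval value of the class.
weighted_sum = sum(f * (lower + upper) / 2 for lower, upper, f in classes)

mean_estimate = weighted_sum / n
print(mean_estimate)
```

The result is an estimate, since every observation in a class is treated as if it sat at the class midpoint.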

56
New cards

When is weighted mean used?

When a central tendency measurement is required for a set of grouped data

57
New cards

What is another term for a cumulative frequency graph?

An Ogive

58
New cards

How is a cumulative frequency graph built for ungrouped data?

Cumulative frequency is on the y axis and discrete data is on the x axis

59
New cards

How is a cumulative frequency graph built for grouped data?

Cumulative frequency is on the y axis and the upper class limit is on the x axis

60
New cards

What is a cumulative frequency graph used for?

To study the growth rate of data by showing the accumulation of frequency and to determine estimates of the percentiles and quartiles of the data

61
New cards

What are the features of the Ogive?

  • S-shape used to estimate some values

  • The median can be estimated by dividing the final cumulative frequency by 2 and reading off the corresponding x value

62
New cards

What are percentiles?

Separations of large ordered data into hundredths

63
New cards

What are quartiles?

Separations of large ordered data into quarters

64
New cards

What is the point showing lower quartile on an Ogive?

The x point when the cumulative frequency is (n + 1) ÷ 4

65
New cards

What is the point showing the median on an Ogive?

The x point when the cumulative frequency is (n + 1) ÷ 2

66
New cards

What is the point showing the upper quartile on an Ogive?

The x point when the cumulative frequency is 3 * (n + 1) ÷ 4

67
New cards

What are the points showing percentiles on an Ogive?

The x point when the cumulative frequency is p * (n + 1) ÷ 100

68
New cards

What is the formula for interquartile range (IQR)?

IQR = Q₃ - Q₁
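
For ungrouped data the same (n + 1) positions can be applied directly to the ordered values rather than read from a graph. The sketch below assumes Python's `statistics.quantiles` with `method="exclusive"`, which places the quartiles at positions based on (n + 1) ÷ 4; the data values are made up.

```python
import statistics

data = [12, 15, 17, 20, 22, 25, 30]  # hypothetical ordered data, n = 7

# With n = 7 the quartile positions are (n + 1)/4 = 2, (n + 1)/2 = 4, 3(n + 1)/4 = 6.
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1  # IQR = Q3 - Q1
print(q1, median, q3, iqr)
```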

69
New cards

What is a measure of spread?

A value describing how widely the data points are dispersed, for example how far they lie from the mean

70
New cards

Why is the measure of spread important?

It shows how well a mean represents the rest of the data

71
New cards

When is range used as a measure of spread?

When the sample sizes are small

72
New cards

What is variance?

A method of measuring spread by taking the average of the squared differences between each data point and the mean

73
New cards

What is the formula for variance (σ²)?

σ² = (∑ (x - x̄)²) ÷ n

74
New cards

What is standard deviation?

The square root of the variance, measuring the typical distance of each piece of data from the mean; the smaller the standard deviation, the more compact the data set

75
New cards

What is the formula for standard deviation (σ)?

σ = √((∑ (x - x̄)²) ÷ n)
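
The variance and standard deviation formulas can be followed term by term in a short Python sketch. The data values are made up for illustration, and the division is by n, matching the formulas on these cards.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

n = len(data)
mean = sum(data) / n

# Variance: average of the squared differences from the mean.
variance = sum((x - mean) ** 2 for x in data) / n

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)
```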

76
New cards

Why is standard deviation an approximation for grouped data?

Because when data are grouped and the midpoint of each class is used, the spread of observations within each interval is ignored, so the calculated standard deviation is only an estimate of the true value

77
New cards

What is a box and whisker plot?

A plot showing the lower extreme, lower quartile, median, upper quartile, and upper extreme of a data set, with a box spanning the lower to upper quartiles and whiskers extending to the extremes

78
New cards

What do pirates and global warming show?

Correlation ≠ causation (a correlation between two variables does not show that one causes the other)

79
New cards

What is the name for the values of r corresponding to correlations?

Pearson’s product-moment correlation coefficient

80
New cards

How do you draw a regression line?

  1. Plot the mean point (x̄, ȳ)

  2. Place the ruler on the mean point

  3. Align the ruler so that the number of data points above and below the line is roughly equal

  4. Draw the line

81
New cards

What is linear regression?

A method of modelling the relationship between two variables by finding a straight line of best fit

82
New cards

What are the two equations for lines of best fit?

y on x, y = ax + b
x on y, x = cy + d

83
New cards

How do you find a point (p, q) from a regression line?

  1. Find the mean of x

  2. Find the mean of y

  3. Draw two lines of best fit, one for y = ax + b and another for x = cy + d

  4. Find the point at which the line y = ax + b intersects the line x = cy + d

  5. The calculated (x,y) coordinate is equal to (p,q)
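
The steps above can be sketched numerically instead of graphically. This sketch assumes both lines are fitted by least squares, which the cards do not state, and uses made-up paired data; under that assumption the two lines cross at the point of means.

```python
# Hypothetical paired data, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n  # step 1: mean of x
mean_y = sum(ys) / n  # step 2: mean of y

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Step 3: least-squares lines (assumed fitting method).
a = sxy / sxx              # y on x: y = a*x + b
b = mean_y - a * mean_x
c = sxy / syy              # x on y: x = c*y + d
d = mean_x - c * mean_y

# Step 4: substitute x = c*y + d into y = a*x + b and solve for y.
q = (a * d + b) / (1 - a * c)
p = c * q + d

# Step 5: (p, q) is the intersection point.
print((p, q), (mean_x, mean_y))
```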