Unit 1 (STATS - 1000)


140 Terms

1

Statistics

Set of methods for obtaining, organizing, summarizing, presenting and analyzing data

2

Data

Comes from characteristics measured on individuals, or units

3

Individuals/Units

Nearly anything: people, animals, places, things, etc.

4

Observations

collected data values

5

Population

Totality of individuals about which we want information

6

Sample

Subset of the individuals in a population that we actually examine in order to gather information

7

Good sample

Representative of the population

8

Identifying the population that a sample represents

Replace the sample size with “all” (e.g., a sample of “500 Canadian adults” represents “all Canadian adults”)

9

Variable

characteristic or property of an individual.

10

Examples of possible variables

Lifespan of a light bulb, the number of heads in five tosses of a quarter, hair colour

11

Classifications of data

categorical and quantitative

12

Categorical data

  • Values of categorical (qualitative) variables.

  • These are variables that place individuals into one of several groups or categories.

13

Categorical variables (examples)

  • Eye colour

  • Favourite singer

  • Reason for taking STAT 1000

14

Categorical and ordinal

There is a meaningful, logical ordering to the values of a categorical variable (e.g., letter grades).

15

Categorical and nominal

There is no meaningful, logical ordering to the values of a categorical variable (e.g., eye colour).

16

Quantitative data

Values of quantitative variables

17

Quantitative variables

Variables that take numerical values for which arithmetic operations (such as subtracting, averaging, etc.) make sense (i.e., their results are meaningful).

18

Quantitative variables (examples)

  • Height

  • Volume of air in a balloon

  • Exam score

  • Time

19

Data distribution tells us:

What values a variable takes, and how often it takes these values

20

Bar Charts

Display the values of a categorical variable on one axis, and their frequencies on the other.

  • Bars don’t touch, because the categories are not continuous

21

Pie charts

visual representation of the relative frequency/proportion of the observed values for a categorical variable

22

Frequency distribution

count of how many of our data values fall into various predetermined classes or intervals

23

Frequency Distribution Example

[Image not reproduced]
24

Relative frequency distributions

Dividing the number of data values in each class by the total number of data values gives the relative frequency, or proportion, of individuals in each class

25

Proportions (relative frequency distributions)

Values between 0 and 1 that are decimal representations of fractions. You can convert proportions to percentages by multiplying by 100.
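A minimal Python sketch of the relative frequency idea, assuming the observations are stored as a plain list of category labels (the eye-colour data below are made up): each class count is divided by the total number of observations, and multiplying a proportion by 100 gives a percentage.

```python
from collections import Counter


def relative_frequencies(values):
    """Return {value: proportion} for a list of observations."""
    counts = Counter(values)   # frequency of each value
    n = len(values)            # total number of observations
    return {value: count / n for value, count in counts.items()}


# Hypothetical sample of eye colours
eye_colours = ["brown", "blue", "brown", "green", "brown",
               "blue", "brown", "hazel", "blue", "brown"]

props = relative_frequencies(eye_colours)
for colour, p in props.items():
    # proportion (between 0 and 1) and the equivalent percentage
    print(f"{colour}: {p:.2f} ({p * 100:.0f}%)")
```

The proportions always add up to 1 (up to rounding), just as the percentages add up to 100%.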

26

Relative frequency distribution Example

[Image not reproduced]
27

Frequency distribution (intervals)

  • We choose the intervals ourselves

28

Frequency distribution (interval rules)

  • Our first interval must include the lowest data value (called the minimum)

  • Our last interval must contain the highest data value (called the maximum)

  • All intervals should be of equal length

  • Each interval includes the left endpoint, but not the right
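The interval rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the scores are hypothetical, and the starting point (50) and interval width (10) are choices we make ourselves. Each interval includes its left endpoint but not its right, and intervals are added until the one containing the maximum is reached.

```python
def frequency_distribution(data, start, width):
    """Count observations in equal-length intervals [lower, lower + width).

    `start` must be <= min(data) so the first interval contains the minimum;
    intervals are added until the last one contains the maximum.
    """
    if start > min(data):
        raise ValueError("first interval must include the minimum")
    table = []
    lower = start
    while lower <= max(data):   # stop once the maximum has been covered
        upper = lower + width
        # left endpoint included, right endpoint excluded
        count = sum(1 for x in data if lower <= x < upper)
        table.append(((lower, upper), count))
        lower = upper
    return table


scores = [52, 67, 70, 79.5, 80, 85, 88, 91, 64, 73]   # hypothetical data
table = frequency_distribution(scores, start=50, width=10)
for (lo, hi), count in table:
    print(f"[{lo}, {hi}): {count}")
```

Note that 79.5 lands in [70, 80) and 80 lands in [80, 90), so every value falls into exactly one interval.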

29

Choosing the intervals (frequency distribution)

Make “nice choices” of intervals that summarize our data well. We’d typically use around 5 to 10 intervals in total.

30

Why can’t we use intervals with gaps between them, such as 70-79 and 80-89?

Because continuous variables can take decimal values

  • A value such as 79.5 would fall into neither 70-79 nor 80-89

31

Continuous variables

These are quantitative variables that can take any value within a given range.

32

Continuous variables (examples)

Test scores, age, height, distance

33

Discrete variables

These are quantitative variables that can only take a “countable” number of values, i.e., only specific, distinct values.

34

Discrete variables (examples)

  • The number of children in a family

  • The number of days of rain in a month

  • The number of books a person has read in their life

35

Histograms

A useful and commonly used display of continuous data

  • Graphical displays of the frequency (or relative frequency) of data values falling into each of several intervals.

  • Histograms are especially useful when we’re dealing with large data sets.

36

What type of data is used for a histogram

continuous, quantitative data

37

Why are there no spaces between the bars in a histogram?

Because the data are continuous: each interval ends where the next one begins.

38

What does the base of each bar in a histogram represent?

The length of its interval (all intervals have equal length)

39

What does the height of each bar in a histogram represent?

The frequency (or relative frequency) of the data values in that interval

40

Distribution shape (histogram)

A histogram can be used to characterize the shape of the data distribution

  • Symmetric

  • Skewed to the right

  • Skewed to the left

41

Symmetric shape (histogram)

The center divides the histogram into two approximate mirror images

42

Skewed to the right (Histogram)

longer tail on the right side

  • most of the data values are concentrated on the left

43

Skewed to the left (Histogram)

longer tail on the left side

  • most of the data values are concentrated on the right.

44

Distribution shape (!!WARNING!!)

Be careful interpreting the shape of a histogram if it’s displayed vertically!

  • When it is flipped to horizontal, the x-axis has to start at 0

45

Time series data

Values of a variable measured over time

46

How can you visually display time series data

time plots

47

What constitutes a Time Plot

Time is plotted on the x-axis, and variable values are plotted on the y-axis

48

How is data represented on a Time Plot

Data values are represented by points. We connect these points to better visualize the pattern/trend.

49

Seasonal variation (time plot) {example}

fluctuations in data values that occur at regular intervals due to seasonal factors, showing predictable changes at specific times of the year.

50

Numerical Summaries of Data

Two important features of a data set that we describe using numbers are its location and variability

51

Measures of Location

The location of our data is determined by where the center of our data falls. Measures of location include:

  • mean

  • median

  • mode

52

Mode

Most frequently observed data value

53

Can a data set have more than one mode?

Yes, it is possible (e.g., when two or more values are tied for the highest frequency)
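A short Python sketch that returns every value tied for the highest frequency, so a data set with two modes gives two answers. The function name `modes` is our own, and the data are hypothetical.

```python
from collections import Counter


def modes(data):
    """Return every value that has the highest frequency (there may be several)."""
    counts = Counter(data)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([2, 3, 3, 5, 7]))      # [3]     (one mode)
print(modes([1, 1, 2, 4, 4, 6]))   # [1, 4]  (two modes)
```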

54

Median value

“middle value” in an ordered data set.

  • Half of the data values are less than or equal to the median, and the other half of the data values are greater than or equal to the median.

55

What is the first step in finding the median?

Ensure the data set is ordered.

56

How do you find the median if n is odd?

Locate the middle value directly: it is in position (n + 1)/2 of the ordered data.

57

How do you find the median if n is even?

Take the average of the two middle values (in positions n/2 and n/2 + 1 of the ordered data).
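The median steps above, sketched in Python with a hypothetical helper named `median` (the standard library's `statistics.median` does the same job). Python lists are indexed from 0, so the value in position (n + 1)/2 of the ordered data sits at index n // 2.

```python
def median(data):
    """Order the data, then take the middle value (n odd)
    or the average of the two middle values (n even)."""
    ordered = sorted(data)   # step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the middle value, position (n + 1)/2 counting from 1
        return ordered[mid]
    # n even: average of positions n/2 and n/2 + 1 (counting from 1)
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5, 3, 9]))       # 5
print(median([7, 1, 5, 3, 9, 11]))   # 6.0
```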

58

Mean

The average of a data set, calculated by adding all values together and dividing by the number of values.
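As a formula, the mean of observations x1, x2, ..., xn is (x1 + x2 + ... + xn) / n. A one-line Python version, with a hypothetical helper name and made-up values:

```python
def mean(data):
    """Add all the values together, then divide by the number of values."""
    return sum(data) / len(data)


print(mean([4, 8, 6, 2]))   # (4 + 8 + 6 + 2) / 4 = 5.0
```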

59

What is an extreme value (or outlier)?

A data point that differs markedly from the other observations in a data set; it can distort numerical summaries such as the mean.

60

Which measure of center is resistant to outliers?

The median

61

Which measure of center is not resistant to outliers?

The mean

62

Is resistance to outliers a good thing?

Yes

  • It gives a more accurate representation of the center of the data when extreme values are present, making the analysis less sensitive to them.
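A quick illustration with Python's standard `statistics` module and made-up data: when the largest value is replaced by an extreme one (imagine 16 mistyped as 160), the mean jumps while the median does not move.

```python
from statistics import mean, median

data = [10, 12, 13, 15, 16]
with_outlier = [10, 12, 13, 15, 160]   # hypothetical recording error: 16 typed as 160

print(mean(data), median(data))                  # mean 13.2, median 13
print(mean(with_outlier), median(with_outlier))  # mean 42, median 13
```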

63

What is the advantage of the mean?

It takes all data points into account, providing a measure of central tendency that reflects the overall dataset.

64

Median as a Measure of Center

It is simple to visualize how the median measures the center of the data: it divides the data set in half

65

Mean as a Measure of Center

The mean is the “center of mass” or “balance point” of the data.

66

How do the mean and the median for a given data set compare?

It depends on the shape of the distribution:

  • In a symmetric distribution

  • In a skewed distribution

67

Symmetric distribution (given data set)

The mean and median are (approximately) equal

68

Skewed distribution (given data set)

The mean is pulled toward the tail

  • Right-skewed

  • Left-skewed

69

Right skewed (skewed distribution)

The mean is greater than the median

70

Left skewed (skewed distribution)

The mean is less than the median
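A small Python check that the mean is pulled toward the tail, using made-up data and the standard `statistics` module: the first list has a long right tail, and the second is its mirror image (each value subtracted from 20), which has a long left tail.

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 15]   # long tail to the right
left_skewed = [20 - x for x in right_skewed]     # mirror image: long tail to the left

print("right-skewed:", mean(right_skewed), median(right_skewed))  # mean 4.7 > median 3.0
print("left-skewed: ", mean(left_skewed), median(left_skewed))    # mean 15.3 < median 17.0
```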

71

Weighted mean

Sometimes when we’re calculating the mean, some data values are given more weight than others

  • This happens because some values are observed more frequently than others, or because some values are more “important” than others
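In symbols, with values x_i and weights w_i, the weighted mean is (sum of w_i * x_i) / (sum of w_i). A Python sketch with a hypothetical helper `weighted_mean`; the course-mark weights and the frequency example are made up for illustration.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per value")
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# "Importance" weights: hypothetical marks (%) for assignments, midterm, final, weighted 20/30/50
marks = [80, 70, 90]
weights = [20, 30, 50]
print(weighted_mean(marks, weights))   # (1600 + 2100 + 4500) / 100 = 82.0

# Frequency weights: the value 3 observed 4 times and the value 5 observed once
print(weighted_mean([3, 5], [4, 1]))   # same as the mean of [3, 3, 3, 3, 5], i.e. 3.4
```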

72

Variability

How spread out the data values are. We want to describe the variability of quantitative data numerically.

73

Difference

  • Both are approximately symmetric

  • The centers of the distributions are approximately equal

  • The variability/spread is different:

    • The distribution on top has higher variability than the distribution on the bottom (the data is more “spread out”)

74

Measures of variability

  • Range

  • Interquartile Range

75

Range

This is the difference between the greatest observation and the least observation

76

Range formula

R = maximum - minimum
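The range formula in Python (hypothetical helper name `data_range`, made-up scores). Adding a single extreme value changes the range dramatically, which is why the range is sensitive to extreme values.

```python
def data_range(data):
    """R = maximum - minimum."""
    return max(data) - min(data)


scores = [55, 62, 70, 71, 78, 84, 90]   # hypothetical data
print(data_range(scores))               # 90 - 55 = 35
print(data_range(scores + [5]))         # one extreme low value: 90 - 5 = 85
```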

77

Is the range affected by extreme values?

Yes, range is sensitive to extreme values.

78

How can outliers occur?

  • Errors in measurement

  • Legitimate observations, BUT we might not be interested in including these extreme values in our numerical summary of the data

79

Interquartile range

measures the length of the interval that covers the middle 50% of the ordered observations.

80

Does the interquartile range exclude outliers?

Yes. It depends only on the middle 50% of the ordered data, so extreme values do not affect it (the IQR is resistant to outliers).

81

What are the first and third quartiles?

The endpoints of the interval that covers the middle 50% of the ordered observations (the interval whose length is the IQR)

82

first quartile

  • Value where at least 25% of our observations are less than or equal to Q1

  • At least 75% of our observations are greater than or equal to Q1

83

Third quartile

  • Value where at least 75% of our observations are less than or equal to Q3

  • At least 25% of our observations are greater than or equal to Q3

84

How to find Q1 (first quartile)

Take the median of all the data values below the (data’s) median

  • Don’t include the median itself when counting (when n is odd)

85

How to find Q3 (third quartile)

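Take the median of all the data values above the (data’s) median

  • Don’t include the median itself; it can help to count in from the maximum of the data set, since Q3 sits as many positions from the maximum as Q1 does from the minimum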

86

How do you calculate the interquartile range?

Subtract Q1 from Q3: IQR = Q3 - Q1
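A Python sketch of the quartile method described on these cards (median of the lower half for Q1, median of the upper half for Q3, leaving out the median itself when n is odd), followed by IQR = Q3 - Q1. The helper names and data are hypothetical, and statistical software often uses other quartile conventions, so its results can differ slightly from this hand method.

```python
def median_sorted(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(data):
    """Return (Q1, Q3) by the halves method; assumes at least 2 observations."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # values before the median's position
    upper = ordered[(n + 1) // 2 :]    # values after the median's position
    return median_sorted(lower), median_sorted(upper)


data = [3, 7, 8, 5, 12, 14, 21, 13, 18]   # hypothetical data, n = 9
q1, q3 = quartiles(data)
print(q1, q3, q3 - q1)   # Q1, Q3 and IQR = Q3 - Q1
```

With n = 9 the ordered data are 3, 5, 7, 8, 12, 13, 14, 18, 21; the median 12 is left out, so Q1 is the median of 3, 5, 7, 8 and Q3 is the median of 13, 14, 18, 21.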

87

Percentiles

Percentiles are values that divide a data set into 100 equal parts, indicating the percentage of observations that fall at or below a particular value.

88

Percentile (class definition)

The p-th percentile of a data set is a value such that at least p% of the observations are less than or equal to it, and at least (100 - p)% of the observations are greater than or equal to it
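The class definition can be checked directly. This Python sketch (hypothetical helper `is_percentile`, made-up data) tests whether a given value satisfies both conditions; it compares counts instead of percentages so that no rounding is involved.

```python
def is_percentile(data, value, p):
    """True if `value` meets the p-th percentile definition for `data`:
    at least p% of observations are <= value and
    at least (100 - p)% of observations are >= value."""
    n = len(data)
    at_or_below = sum(1 for x in data if x <= value)
    at_or_above = sum(1 for x in data if x >= value)
    # at_or_below / n >= p / 100  is the same as  100 * at_or_below >= p * n
    return 100 * at_or_below >= p * n and 100 * at_or_above >= (100 - p) * n


data = [3, 5, 7, 8, 12, 13, 14, 18, 21]   # hypothetical data, n = 9
print(is_percentile(data, 12, 50))   # True: 5 of 9 are <= 12 and 5 of 9 are >= 12
print(is_percentile(data, 13, 50))   # False: only 4 of 9 are >= 13
print(is_percentile(data, 7, 25))    # True: 3 of 9 are <= 7 and 7 of 9 are >= 7
```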

89

What is the five-number summary

The five-number summary consists of five descriptive statistics that provide a quick overview of a dataset:

  • The minimum

  • first quartile (Q1)

  • median

  • third quartile (Q3)

  • maximum
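A Python sketch that computes the five-number summary, assuming the quartiles are found with the halves method from the Q1 and Q3 cards (helper names and data are hypothetical).

```python
def median_sorted(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum); assumes at least 2 observations."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = median_sorted(ordered[: n // 2])         # lower half (median left out if n is odd)
    q3 = median_sorted(ordered[(n + 1) // 2 :])   # upper half (median left out if n is odd)
    return ordered[0], q1, median_sorted(ordered), q3, ordered[-1]


summary = five_number_summary([3, 7, 8, 5, 12, 14, 21, 13, 18])
print(summary)   # (3, 6.0, 12, 16.0, 21)
```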

90

What does the five number summary divide the data into

The five-number summary divides the data into four intervals

  • Each interval contains about 25% of the observations

91

What does the Five number summary describe?

  • The center/location of our data

  • The spread/variability of our data

  • The shape of our data

92

Quantile boxplot

Uses the five-number summary to get a “picture” of our data

93

What does a quantile boxplot consist of

  • A number line at the bottom, drawn horizontally

  • A vertical line at the median

  • A box around the median that covers the IQR

  • Lines (called “whiskers”) that extend from the box out to the minimum and maximum

94

How do you know if the boxplot is skewed to the left

If the left whisker is longer than the right whisker, it indicates left skew.

95

How do you know if the boxplot is skewed to the right

If the right whisker is longer than the left, it indicates right skew.
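The whisker rule can be written as a tiny Python helper (the name `whisker_skew` and the example numbers are hypothetical). It only compares whisker lengths, so treat it as a rough visual guide rather than a formal test of skewness.

```python
def whisker_skew(minimum, q1, q3, maximum):
    """Compare the two whisker lengths of a boxplot (a rough guide only)."""
    left_whisker = q1 - minimum
    right_whisker = maximum - q3
    if right_whisker > left_whisker:
        return "right skew"
    if left_whisker > right_whisker:
        return "left skew"
    return "no skew suggested by the whiskers"


print(whisker_skew(3, 6.0, 16.0, 21))   # right whisker 5.0 > left whisker 3.0: right skew
print(whisker_skew(1, 9, 12, 14))       # left whisker 8 > right whisker 2: left skew
```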

96

Vertical boxplot

[Image not reproduced]
97

How do you know if the boxplot is skewed to the left (vertical)

The lower whisker is longer.

98

How do you know if the boxplot is skewed to the right (vertical)

The upper whisker is longer.

99

side-by-side boxplots

Comparative visual display of two or more boxplots to analyze differences in distributions.

100

The side-by-side boxplots below compare the height distributions for Toronto Blue Jays pitchers and players in other fielding positions: (example)

  • The median heights for pitchers and fielders are equal

  • The distribution for pitchers is skewed to the right and the distribution for fielders is skewed to the left

  • The IQRs for pitchers and fielders are equal, but the range for fielders is greater