STAT 200 pre-midterm

Last updated 6:14 PM on 5/18/26

31 Terms

1

what is a numerical vs categorical variable?

categorical = outcomes fall in different categories

numerical = outcomes can be measured on a numerical scale

  • numerical variables can be transformed into categorical (ex. age → age range)

  • subgroups of a categorical variable = levels (ex. variable = faculty, levels = science, arts, etc.)

2

what are the ways to summarize categorical variables?

  • frequency / relative frequency tables

  • contingency tables (two-way table)

  • graphical displays (bar charts, pie charts - based on frequencies)
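
A minimal sketch of a frequency and relative frequency table, assuming a made-up sample of 10 students' faculties (the data and variable names are hypothetical):

```python
from collections import Counter

# Hypothetical sample: faculty of 10 students (made-up data for illustration)
faculty = ["science", "arts", "science", "business", "arts",
           "science", "arts", "science", "business", "science"]

counts = Counter(faculty)   # frequency table: level -> count
n = len(faculty)

# Relative frequency = count / total number of observations
rel_freq = {level: count / n for level, count in counts.items()}

for level in counts:
    print(f"{level:10s} {counts[level]:3d} {rel_freq[level]:.2f}")
```

The relative frequencies always add up to 1, which is a quick check on the table.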

3

what are marginal distributions?

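  • distribution of one variable on its own, ignoring the other variable in the contingency table

  • found from the row totals or column totals (the margins) divided by the grand total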

4

what are conditional distributions and what are contingency tables used for?

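  • conditional distribution = distribution of one variable restricted to a single level of the other variable

  • set one level as the condition - its total is the denominator used for the relative frequencies

    • ex. distribution of place of residence among arts students - arts students is the condition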
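
A small sketch of marginal vs conditional distributions from a two-way table. The counts below are invented for illustration; the point is only which total goes in the denominator:

```python
# Hypothetical two-way table: faculty (rows) by place of residence (columns)
table = {
    "arts":    {"on campus": 30, "off campus": 70},
    "science": {"on campus": 50, "off campus": 50},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of faculty: row totals divided by the grand total
marginal_faculty = {
    fac: sum(row.values()) / grand_total for fac, row in table.items()
}

# Conditional distribution of residence given faculty = arts:
# the arts row total is the denominator, not the grand total
arts_total = sum(table["arts"].values())
residence_given_arts = {
    res: count / arts_total for res, count in table["arts"].items()
}

print(marginal_faculty)
print(residence_given_arts)
```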

5

what are the ways to summarize numerical data?

  • graphical displays (histograms, stem-and-leaf displays, boxplots)

  • shape of distribution of data

  • numerical summaries

6

how are histograms used for numerical data?

  • group the numerical values into intervals (bins)

    • count the frequency of values in each bin

  • advantage - helps us look at shape of distribution

    • modality, symmetry of distribution, presence of outliers

  • disadvantage - lose actual data points

7

what is modality?

  • number of peaks - unimodal, bimodal, multimodal

8

what are the different symmetries of distribution?

  • symmetric

  • skewed to the right / positively skewed

    • long right tail

  • skewed to the left / negatively skewed

    • long left tail

9

what are the numerical summaries for numerical data?

  • measures of center

    • mean, median

  • measures of spread

    • variance, standard deviation, interquartile range

  • percentiles (quantiles) / quartiles

  • 5-number summary

    • minimum, first quartile (Q1), second quartile (Q2), third quartile (Q3), maximum

10

how are stem-and-leaf displays used for numerical data?

  • split data into 2 parts

    • all except last digit of data = stem

    • last digit of data = leaf

  • list unique stems

  • list leaves in ascending order next to their stem

  • rotating the display on its side should match the histogram shape
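
The steps above can be sketched as follows, assuming made-up two-digit values so that the stem is the tens digit and the leaf is the ones digit:

```python
from collections import defaultdict

data = [12, 15, 21, 23, 23, 28, 31, 34, 47]  # hypothetical two-digit values

stems = defaultdict(list)
for value in sorted(data):            # sorting keeps leaves in ascending order
    stem, leaf = divmod(value, 10)    # stem = all but last digit, leaf = last digit
    stems[stem].append(leaf)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

Unlike a histogram, the display keeps every data point, since each value can be read back from its stem and leaf.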

11

how are boxplots used for numerical data?

makes use of 5 number summary

  • draw Q1 and Q3 → make box

  • find Q2 / median → draw line

  • find boundaries for outliers

    • LB = Q1 - 1.5(IQR)

    • UB = Q3 + 1.5(IQR)

  • draw boundary lines (whiskers) → line at value closest to boundary that is not an outlier

    • if no outliers → extend whiskers to min and max

  • outliers outside of boundaries marked by circles

  • draw min and max
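
A rough sketch of the outlier boundaries and whisker ends, using made-up data. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of Python's `statistics.quantiles`, which may not match the course's hand method on every dataset:

```python
import statistics

data = sorted([2, 4, 5, 7, 8, 9, 10, 12, 30])  # hypothetical data

# Assumption: "inclusive" quartile convention; other conventions can differ slightly
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_bound = q1 - 1.5 * iqr   # LB = Q1 - 1.5(IQR)
upper_bound = q3 + 1.5 * iqr   # UB = Q3 + 1.5(IQR)

# Points outside the boundaries are outliers
outliers = [x for x in data if x < lower_bound or x > upper_bound]

# Whiskers end at the most extreme values that are not outliers
inside = [x for x in data if lower_bound <= x <= upper_bound]
whisker_low, whisker_high = min(inside), max(inside)

print(q1, q2, q3, iqr)
print(lower_bound, upper_bound, outliers)
print(whisker_low, whisker_high)
```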

12

what is the mean and how to calculate?

  • the average of a dataset

  • sum of all observations / number of observations

13

what is the median and how to calculate?

  • exact middle value of a sorted dataset

  • if odd number of data points

    • = ((n+1) / 2)th data point

  • if even number of data points

    • = average of the (n/2)th and (n/2 + 1)th data points
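
The two cases can be sketched as a small function. The name `median` is just for illustration, and positions are shifted by one because Python lists are 0-indexed:

```python
def median(values):
    """Median of a list of numbers, following the odd/even rule."""
    ordered = sorted(values)   # positions only make sense on sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n+1)/2)th data point, which is index n // 2
        return ordered[mid]
    # even n: average of the (n/2)th and (n/2 + 1)th data points
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))       # odd number of data points
print(median([4, 1, 3, 2]))    # even number of data points
```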

14

what is variance and how to calculate?

  • measures the variation of the data around the mean

  • average of the squared deviations of values from the mean

    • sample variance: s² = Σ(yᵢ - ȳ)² / (n - 1)

15

what is standard deviation and how to calculate?

  • the square root of variance

  • s = sqrt(s²)
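
A worked sketch of variance and standard deviation on made-up data. It assumes the sample versions, which divide by n - 1:

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
n = len(data)

mean = sum(data) / n

# Squared deviation of each value from the mean
squared_devs = [(y - mean) ** 2 for y in data]

# Sample variance: sum of squared deviations divided by n - 1
variance = sum(squared_devs) / (n - 1)

# Standard deviation: square root of the variance
sd = math.sqrt(variance)

print(mean, variance, sd)
```

The standard deviation is in the same units as the data, while the variance is in squared units.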

16

what are percentiles / quartiles?

  • value below which a given percentage of the data points fall

    • quartile 1 = value in data set that has 25% of values below it

    • quartile 2 = 50%

    • quartile 3 = 75%

17

what is the interquartile range and when is it used?

  • difference between Q3 and Q1 (Q3 - Q1)

  • used when you have skewness or outliers

    • better than using standard deviation / variance for these conditions

18

how does shifting data affect measures of center / spread?

  • add a constant c to each observation in the data

  • any measure of center (median / mean) shifts by constant c

  • shifting the data does not change the spread (variance, SD, range, IQR)

19

how does scaling data affect measures of center / spread?

  • multiply each observation in the data by a positive constant c

  • measures of center (mean, median) and spread (SD, range, IQR) are multiplied by constant c

  • variance of the new data will be c² times the original variance
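
A quick numerical check of the shifting and scaling rules, using made-up data and an arbitrary constant c = 5:

```python
import math
import statistics

data = [3, 7, 8, 10, 12]   # hypothetical observations
c = 5                      # arbitrary positive constant

shifted = [y + c for y in data]   # add c to each observation
scaled = [y * c for y in data]    # multiply each observation by c

# Shifting: center moves by c, spread is unchanged
print(statistics.mean(data), statistics.mean(shifted))
print(statistics.stdev(data), statistics.stdev(shifted))

# Scaling: center and SD are multiplied by c, variance by c squared
print(statistics.mean(scaled))
print(statistics.stdev(scaled), statistics.variance(scaled))

shift_keeps_sd = math.isclose(statistics.stdev(shifted), statistics.stdev(data))
scale_sd_by_c = math.isclose(statistics.stdev(scaled), c * statistics.stdev(data))
scale_var_by_c2 = math.isclose(statistics.variance(scaled), c**2 * statistics.variance(data))
```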

20

when is standardizing data used?

  • to compare observations measured on different scales

    • ex. different currencies

  • to compare observations from two different distributions

    • ex. class averages across different semesters

21

what is a z-score and how to calculate?

  • z = (observation - mean) / SD

  • gives the distance between an observation and the mean in units equal to the standard deviation

    • the number of standard deviations that a value is above or below the mean

    • z = 0 → observation = mean
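
A minimal sketch of the z-score formula, used to compare two scores from different distributions. The exam means and SDs are invented, and `z_score` is a hypothetical helper name:

```python
def z_score(observation, mean, sd):
    """Number of standard deviations an observation is above or below the mean."""
    return (observation - mean) / sd


# Hypothetical: 82 on an exam with mean 70 and SD 8,
# vs 75 on an exam with mean 60 and SD 5
z_first = z_score(82, mean=70, sd=8)
z_second = z_score(75, mean=60, sd=5)

print(z_first, z_second)
```

The lower raw score has the higher z-score here, so it is the stronger result relative to its own distribution.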

22

what are characteristics of the normal model?

  • bell-shaped, unimodal

  • symmetric about the mean 𝜇

  • spread of distribution determined by the value of SD 𝝈

  • denoted by N(𝜇, 𝝈)

23

what are the terms used for population vs sample numerical summaries?

  • population numerical summaries = parameters

    • 𝜇 = mean, 𝝈² = variance, 𝝈 = SD

  • sample numerical summaries = statistics

    • ȳ = mean, s² = variance, s = SD

24

how are values from the normal model standardized?

  • calculate z-score

  • z-score follows the standard normal model with mean = 0 and SD = 1

25

what is the 68-95-99.7% rule?

  • Interval → % data falling in interval

  • Within 1 SD of mean = ~68%

  • Within 2 SD of mean = ~95%

  • Within 3 SD of mean = ~99.7%
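
The rule can be checked against the standard normal model with Python's `statistics.NormalDist`. This is only a sketch; the dictionary name `within` is an illustration choice:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # standard normal model: mean 0, SD 1

# Proportion within k SDs of the mean = P(-k < Z < k) = cdf(k) - cdf(-k)
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} SD of mean: {share:.4f}")
```

The printed values round to roughly 68%, 95%, and 99.7%, which is where the rule's name comes from.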

26

what is a scatterplot and when is it used?

  • helps visualize possible relationships between 2 quantitative variables

  • explanatory variable plotted on x-axis

  • response variable plotted on y-axis

    • explanatory variable is believed to have influence on the value of the response variable

27

what are the patterns of a scatterplot that must be described?

  • direction

    • positive → x and y values tend to go in the same direction

    • negative → x and y values tend to go in the opposite direction

  • form

    • linear vs non-linear

  • how scattered are the points?

    • strong relationship → points cluster tightly around the pattern (ex. close to a line)

    • weak / no relationship → points spread out / randomly scattered

  • any outliers?

    • any points outside of pattern seen

28

what is correlation and the correlation coefficient?

  • correlation refers to the degree of linear association between 2 quantitative variables x and y

  • correlation coefficient r is a measure of the strength of a linear association between 2 quantitative variables

29

what are the different types of correlation?

  • positive correlation = large values of x are linearly associated with large values of y

    • r = +1 gives perfect positive correlation

  • negative correlation = large values of x are linearly associated with small values of y

    • r = -1 gives perfect negative correlation

30

what are properties of the correlation coefficient r?

  • swapping x and y values does not affect the value of r

  • the value of r does not change if all values of x or y are shifted or scaled by a positive constant

  • r is sensitive to outliers and may not give a reliable measure of the strength of a linear relationship when outliers are present
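
A sketch that checks the first two properties numerically. The cards do not give a formula for r; this assumes the common definition as the sum of products of z-scores divided by n - 1, with made-up data and a hypothetical helper `correlation`:

```python
import math
import statistics


def correlation(xs, ys):
    """Correlation coefficient r: sum of z_x * z_y, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ]
    return sum(products) / (n - 1)


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
r_swapped = correlation(y, x)                 # swap x and y
r_shifted = correlation([v + 10 for v in x], y)   # shift x by a constant
r_scaled = correlation(x, [3 * v for v in y])     # scale y by a positive constant

print(r, r_swapped, r_shifted, r_scaled)
```

All four values agree up to floating-point rounding, since z-scores are unaffected by shifting or by scaling with a positive constant.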

31

how do association and causality differ?

  • the existence of a linear relationship between 2 variables x and y does not imply that an increase in one variable causes an increase or decrease in the other

    • association does not imply causation

  • there may be a lurking variable (third variable) that affects both x and y