Exploring Data

Description and Tags

Statistics

AP Statistics

11th

49 Terms

descriptive methods

the different methods for organizing and summarizing collected data, including tabular methods, graphical methods, and numerical methods

categorical/qualitative variables

variables that place the individuals being studied into one of several groups or categories

numerical/quantitative variables

variables whose values are numbers on which arithmetic operations, such as averaging, make sense

univariate data

data with only one measurement on each object

bivariate data

data with two measurements on each object

frequency

the number of times that an observation occurs, usually denoted with f

relative frequency

the ratio of the frequency (f) to the total number of observations (n), usually denoted by rf, and rf = f/n

cumulative frequency

the number of observations less than or equal to a specified value, usually denoted by cf

frequency distribution table

a table giving all possible values of a variable and their frequencies
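
As a rough illustration of frequency (f), relative frequency (rf = f/n), and cumulative frequency (cf), the sketch below builds a small frequency distribution table. The data set (sibling counts for 10 students) is hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample: number of siblings reported by 10 students.
data = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]

n = len(data)                      # total number of observations
counts = Counter(data)
values = sorted(counts)            # distinct values, smallest first

f = [counts[v] for v in values]    # frequency of each value
rf = [fi / n for fi in f]          # relative frequency, rf = f / n
cf = list(accumulate(f))           # cumulative frequency (running total of f)

print("value   f    rf   cf")
for v, fi, rfi, cfi in zip(values, f, rf, cf):
    print(f"{v:>5} {fi:>3} {rfi:>5.2f} {cfi:>4}")
```

The relative frequencies sum to 1, and the last cumulative frequency equals n.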

center of a distribution

the “typical” or central data point, measured in several ways, including mean, median, and mode

spread of a distribution

how far the data points are from the center, measured through the range, interquartile range, standard deviation, or variance

shape of a distribution

the overall pattern of a distribution, showing where most of the data lie. can be symmetric or skewed

symmetric distribution

when the left half of the distribution is approximately a mirror image of the right half, meaning that the data is spread out in the same way on both sides, with the same amount of data on both sides of the center

skewed distribution

when extreme values in only one direction cause one side of the distribution to have a longer tail. right-skewed if the tail is on the right, and left-skewed if the tail is on the left

outliers

observations that are surprisingly different from the rest of the data

stem in stemplot

the left-most part of each observation, usually every digit except the final one

leaf in stemplot

the remaining part of each observation after the stem is removed, usually the final digit
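
A minimal sketch of how stems and leaves could be split, assuming two-digit values so that the stem is the tens digit and the leaf is the ones digit. The scores are hypothetical.

```python
from collections import defaultdict

# Hypothetical two-digit test scores (illustrative data only).
scores = [67, 72, 75, 78, 81, 81, 84, 89, 93, 95]

leaves_by_stem = defaultdict(list)
for score in sorted(scores):
    stem, leaf = divmod(score, 10)  # stem = tens digit, leaf = ones digit
    leaves_by_stem[stem].append(leaf)

# List every stem from smallest to largest, even one with no leaves,
# so that gaps in the data remain visible.
lines = []
for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
    leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
    lines.append(f"{stem} | {leaves}")

print("\n".join(lines))
```

Each printed row shows one stem followed by its leaves in increasing order.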

percentage frequency/relative frequency

the frequency of an observation relative to the whole sample, expressed as a percentage: (f/n) × 100%

population

the entire group of individuals or things about which information is wanted

sample

the part of the population that is studied

mean

the arithmetic average of a data set: the sum of the measurements divided by the number of measurements. nonresistant and affected by extreme or outlier measurements. for a population, denoted by μ, and for a sample, denoted by x̄

median

the point that divides the measurements in half. resistant and not affected by extreme or outlier measurements, better to use for skewed data or data sets with outliers. sometimes denoted as M
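
To make "resistant" versus "nonresistant" concrete, the sketch below adds one extreme value to a small data set and compares how far the mean and the median move. The salary figures are hypothetical.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars.
salaries = [40, 42, 45, 47, 50]
with_outlier = salaries + [400]    # add one extreme measurement

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean more than doubles, while the median shifts by only one unit, which is why the median is preferred for skewed data or data with outliers.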

range

the difference between the largest and smallest measurements in a data set. nonresistant, since it depends only on the two extreme measurements

interquartile range (IQR)

the range of the middle 50% of the data, or the difference between the third and first quartiles. resistant and not affected by extreme or outlier measurements

standard deviation

a measure of variation that takes every measurement into account. nonresistant and affected by extreme or outlier measurements

variance

the square of the standard deviation
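
A short sketch of the relationship between variance and standard deviation, using Python's standard `statistics` module on a hypothetical data set. The cards do not distinguish the population versions (divide by n) from the sample versions (divide by n - 1); both are shown here as an added detail.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)    # sum of squared deviations divided by n
pop_sd = pstdev(data)        # square root of the population variance
samp_var = variance(data)    # sum of squared deviations divided by n - 1
samp_sd = stdev(data)        # square root of the sample variance

print("population:", pop_var, pop_sd)
print("sample:    ", samp_var, samp_sd)
```

In both cases the variance is the square of the corresponding standard deviation.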

percentiles

the values that divide an ordered set of data into 100 equal parts. the pth percentile is the value with p% of the observations at or below it

quartiles

the values that divide an ordered set of data into four equal parts: the 25th percentile (Q1), the 50th percentile (the median), and the 75th percentile (Q3)
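
Quartile conventions differ between textbooks and software. The sketch below assumes the median-of-halves convention (Q1 is the median of the lower half, Q3 the median of the upper half, with the middle value left out of both halves when n is odd) and then computes the IQR. The helper name `quartiles` and the data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    xs = sorted(values)
    half = len(xs) // 2     # size of each half; middle value excluded if n is odd
    lower = xs[:half]
    upper = xs[-half:]
    return median(lower), median(xs), median(upper)

# Hypothetical data set with n = 9.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1               # spread of the middle 50% of the data

print("Q1 =", q1, " median =", q2, " Q3 =", q3, " IQR =", iqr)
```

Other conventions, including the defaults in some statistical libraries, can give slightly different quartiles for the same data.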

standardized scores/z-scores

z = (observed measurement - mean) / standard deviation. gives the number of standard deviations an observation lies above or below the mean
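
The z-score formula can be sketched as follows, assuming the data are treated as a whole population so that the population standard deviation is used. The data set is hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical data set, treated here as an entire population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)        # population mean
sigma = pstdev(data)   # population standard deviation

# z = (observed measurement - mean) / standard deviation
z_scores = [(x - mu) / sigma for x in data]

for x, z in zip(data, z_scores):
    print(f"x = {x}  z = {z:+.2f}")
```

A negative z-score marks an observation below the mean and a positive one marks an observation above it. The z-scores of a data set always average to 0.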

linear regression

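a model that describes the linear relationship between two quantitative variables and is used to predict the response variable from the explanatory variable. the strength of the relationship is measured by the correlation coefficient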

Pearson’s correlation coefficient

a numerical summary measure of the linear dependence between two quantitative variables, always between -1 and 1. the sign gives the direction of the relationship, and the further the value is from 0, the stronger the linear relationship
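
One common way to compute the correlation coefficient is from the sums of products of deviations, sketched below. The function name `pearson_r` and the data (hours studied and quiz scores) are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Correlation coefficient from sums of products of deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = pearson_r(hours, score)
print(f"r = {r:.4f}")
```

Points lying exactly on a line with positive slope give r = 1, and points lying exactly on a line with negative slope give r = -1.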

scatterplot

a graphical display used to show the nature, degree, and direction of the relation between two quantitative variables x and y, where each point (x, y) is a pair of measurements on one object

linear regression model equation

Y = α + βX + ε, where Y is the response variable, X is the explanatory variable, α is the y-intercept, β is the slope, and ε is the random error

predicted value of y

ŷ = a + bx, where a and b are the sample estimates of the y-intercept α and the slope β

least-squares regression line

a line that minimizes the sum of the squares of the residuals, otherwise known as the line of best fit. the line always passes through the point (x̄, ȳ) and always has the slope b = r(s_y/s_x)
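
The two properties on this card (slope equal to r times s_y/s_x, and the line passing through the point of means) are enough to find the line. A minimal sketch on hypothetical data, using sample standard deviations:

```python
from statistics import mean, stdev

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)   # sample standard deviations

# Correlation coefficient from the sample covariance.
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
r = cov_xy / (s_x * s_y)

b = r * s_y / s_x               # slope
a = y_bar - b * x_bar           # intercept: forces the line through (x_bar, y_bar)

print(f"y-hat = {a:.2f} + {b:.2f}x")
```

Substituting the mean of x into the fitted equation returns the mean of y, which confirms that the line passes through the point of means.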

coefficient of determination

measures the proportion of variation in Y-values explained by the linear relation between X- and Y-values, often reported as a percent. denoted by R², which is equal to the square of the correlation coefficient. always between 0 and 1.
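
As a check on "explained variation", the sketch below computes R² two ways for a least-squares line on hypothetical data: as 1 minus the ratio of unexplained to total variation, and as the square of the correlation coefficient. The two agree.

```python
from math import sqrt
from statistics import mean

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)   # total variation in y

b = sxy / sxx                 # least-squares slope
a = y_bar - b * x_bar         # least-squares intercept

# Unexplained variation: sum of squared residuals.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared_from_variation = 1 - sse / syy
r = sxy / sqrt(sxx * syy)
r_squared_from_r = r ** 2

print(f"{r_squared_from_variation:.3f} {r_squared_from_r:.3f}")
```

Here about 60% of the variation in y is explained by the linear relation with x.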

random error

the difference between an observed value of Y and the value given by the regression model, denoted by ε. for a fitted line it is estimated by the residual, y - ŷ

influential observation

an observation that strongly affects a statistic. in regression, a point whose removal markedly changes the slope, intercept, or correlation

residual plot

a plot of the residuals (y - ŷ) versus the predicted values of Y, used to check whether a linear model is appropriate
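
The points of a residual plot are the pairs (predicted value, residual). The sketch below computes those pairs for a least-squares line on hypothetical data and prints them instead of drawing them, to avoid depending on a plotting library.

```python
from statistics import mean

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))       # least-squares slope
a = y_bar - b * x_bar                            # least-squares intercept

predicted = [a + b * xi for xi in x]             # y-hat for each x
residuals = [yi - yh for yi, yh in zip(y, predicted)]   # observed - predicted

# Each (predicted, residual) pair is one point of the residual plot.
for yh, e in zip(predicted, residuals):
    print(f"predicted = {yh:.1f}  residual = {e:+.1f}")
```

For a least-squares line the residuals sum to 0. A residual plot with no clear pattern suggests that a linear model is reasonable, while a curved pattern suggests that it is not.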

transformation

a function, such as a logarithm or square root, applied to one or both variables so that the relationship between them becomes approximately linear
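
As one illustration, taking the logarithm of y straightens an exponential pattern. The data below are hypothetical and generated exactly from y = 3 · 2^x, so the transformed points fall on a perfect line. Real data would only be approximately linear after the transformation.

```python
from math import log, sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Correlation coefficient, used here to gauge linearity."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical exponential growth: y = 3 * 2**x.
x = [0, 1, 2, 3, 4, 5]
y = [3 * 2 ** xi for xi in x]

log_y = [log(yi) for yi in y]   # transformation: replace y with ln(y)

r_raw = pearson_r(x, y)         # curved pattern, r below 1
r_log = pearson_r(x, log_y)     # ln(y) = ln(3) + x ln(2) is linear in x

print(f"r before: {r_raw:.4f}  r after: {r_log:.4f}")
```

After the transformation a least-squares line can be fitted to (x, ln y), and predictions can be converted back to the original scale by exponentiating.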