AP Statistics
Unit 1: Exploring One-Variable Data
Error or residual
e = y - ŷ: the observed value of Y for a given value of X minus the predicted value of Y for that value of X
Coefficient of determination
Measures the percent of the variation in Y-values explained by the linear relation between X- and Y-values
Statistics
The science of data
Descriptive methods
The different methods used to collect, summarize, and describe data
Categorical or Qualitative
Places the individual being studied into one of several groups
Numerical or Quantitative
Outcomes can be measured arithmetically
Univariate data
Taking only one measurement on each object
Bivariate data
Taking two measurements on each object
Tabular Methods
Frequency distribution table (it facilitates the analysis of patterns of variation among observed data)
n
Denotes the number of observations
Frequency (f)
Number of times that observation has occurred
Relative frequency
Ratio of the frequency to the total number of observations
Cumulative frequency
Gives the number of observations less than or equal to a specific value
Frequency distribution table
A table giving all possible values of a variable and their frequencies
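The frequency, relative frequency, and cumulative frequency defined above can be tabulated with a few lines of Python. This is only a minimal sketch: the data values and the function name `frequency_table` are made up for illustration.

```python
from collections import Counter

# Hypothetical observations, invented for this example.
data = [2, 3, 3, 5, 2, 3, 4, 5, 3, 2]


def frequency_table(values):
    """Return rows of (value, f, relative frequency, cumulative frequency)."""
    n = len(values)            # n denotes the number of observations
    counts = Counter(values)   # f for each distinct value
    rows = []
    cumulative = 0
    for value in sorted(counts):
        f = counts[value]
        cumulative += f        # observations less than or equal to this value
        rows.append((value, f, f / n, cumulative))
    return rows


for row in frequency_table(data):
    print(row)
```

The last cumulative frequency always equals n, which is a quick check that no observation was missed.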
Bar Charts
The length of the bar for each category is proportional to the number or percent of individuals in each category
Pie Chart
Categories of data are represented by wedges in a circle and are proportional in size to the percentage of individuals in each category
Segmented Bar Chart
Shows the distribution of each group as a single bar, placed along either the horizontal or vertical axis, with the bar divided into segments that give the relative frequency of each category within that group
Mosaic Plots
Stacked bar chart that shows percentages of data in groups
Center
Describes the "typical" or central data points
Spread
Describes how far the data points are from the center; it can be quantified through the range, standard deviation, or variance of a distribution
Shape
The shape of a distribution tells us where most of the data lies
Symmetrical Distribution
The data is spread out in the same way on both sides and there is the same amount of data on each side of the center
Skewed Distribution
A distribution in which extreme values in only one direction give one side a longer tail
Cluster sample
A sample in which the researcher first divides the population into sections (or clusters), then randomly selects some of those clusters and includes all members of the selected clusters
Outliers
An observation that is surprisingly different from the rest of the data
Stem-and-leaf graph or stemplot
A display that keeps the individual data values visible and makes it easy to compute the median and other quantiles
Dotplot
Best for small data sets, similar to histograms and bar plots
Histogram
A graphical representation in x-y form of the distribution of data in a data set, drawn as contiguous rectangles; x represents the data and y represents the frequency or relative frequency
Cumulative Frequency Charts
Charts of cumulative frequency: the frequency for a group plus the frequencies of all groups of smaller observations
Population
The entire group of individuals or things that we are interested in
Sample
The part of the population that is actually studied
Mean
The arithmetic mean, also known as the average
Population mean
Adding up all the values in the entire population and dividing by the number of values
Median
The point that divides the ordered measurements in half
Range
The difference between the largest and the smallest measurement in a data set
Interquartile range
The range of the middle 50% of the data: the difference between the third quartile and the first quartile
Standard deviation
A number that is equal to the square root of the variance and measures how far data values are from their mean
Variance
Average of the squared deviations from the mean
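The measures of center and spread above can be checked on a small data set with Python's standard `statistics` module. This is a sketch on made-up numbers. It assumes the population versions of variance and standard deviation (dividing by n), which matches the "average of the squared deviations" definition; the sample versions (`statistics.variance`, `statistics.stdev`) divide by n - 1 instead.

```python
import statistics

# Hypothetical measurements, invented for this example.
data = [4, 8, 6, 5, 3, 8, 9, 5]

mean = statistics.mean(data)                  # sum of values / number of values
median = statistics.median(data)              # middle of the ordered values
data_range = max(data) - min(data)            # largest minus smallest
variance = statistics.pvariance(data)         # average squared deviation from the mean
std_dev = statistics.pstdev(data)             # square root of the variance

print("mean:", mean)
print("median:", median)
print("range:", data_range)
print("variance:", variance)
print("standard deviation:", std_dev)
```

For this data the mean is 6, the median is 5.5, the range is 6, the variance is 4, and the standard deviation is 2.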
Percentiles
Divide a set of values into 100 equal parts
Quartiles
Divide a set of values into four equal parts by using the 25th, 50th, and 75th percentiles
Q1
25% of values are below and 75% of values are above
Q2
50% of the values are below and 50% of the values are above
Q3
75% of values are below and 25% of values are above
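Quartiles and the interquartile range can be computed with `statistics.quantiles` (Python 3.8 or later). Textbooks and software use slightly different conventions for locating quartiles, so the choice of `method="inclusive"` here is an assumption, and other conventions may give slightly different cut points. The data are made up for illustration.

```python
import statistics

# Hypothetical ordered data, invented for this example.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

print("Q1:", q1)
print("Q2 (median):", q2)
print("Q3:", q3)
print("IQR:", iqr)
```

With this convention the quartiles are 3, 5, and 7, so the middle 50% of the data spans an interquartile range of 4.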
Standardized scores or z-scores
Give the distance between a measurement and the mean in terms of the number of standard deviations
Negative z-score
Indicates that the measurement is smaller than the mean
Positive z-score
Indicates that the measurement is larger than the mean
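A z-score is the measurement minus the mean, divided by the standard deviation. The sketch below applies that to made-up data; the helper name `z_scores` is hypothetical, and it assumes the population standard deviation is the one intended.

```python
import statistics


def z_scores(values):
    """Return the z-score of each value: (value - mean) / standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(x - mu) / sigma for x in values]


# Hypothetical measurements with mean 6 and standard deviation 2.
data = [4, 8, 6, 5, 3, 8, 9, 5]

for value, z in zip(data, z_scores(data)):
    side = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"{value}: z = {z:+.2f} ({side} the mean)")
```

Here the value 9 has a z-score of 1.5 (one and a half standard deviations above the mean) and the value 3 has a z-score of -1.5.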
Box plots
A graph that gives a quick picture of the middle 50% of the data
Bivariate data
Data on two different variables collected from each item in a study
Linear Regression
If two different quantitative variables have a linear relation, then we can measure the strength of that relationship using this
Scatterplot
A graphical summary of bivariate data that plots one variable against the other
Shape
A scatterplot tells us whether the nature of the relation between the two variables is linear or nonlinear
Direction
The scatterplot will show whether the y-value increases or decreases as the x increases, or that it changes direction
Positive relation
Increasing or upward trend between two variables
Negative relation
Decreasing or downward trend between the two variables
Strength of relationship
If the trend of the data can be described with a line or a curve, then the spread of the data values around that line or curve describes the degree of the relation between the two variables
Correlation Coefficient
A numerical measure used to judge the strength of the linear relation between two variables
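The correlation coefficient r can be computed directly from the deviations of x and y from their means. The sketch below uses the standard Pearson formula on made-up data; the function name `correlation` is chosen here for illustration.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical bivariate data, invented for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(f"r = {r:.4f}")
```

For this data r is about 0.77, a positive relation. Values of r near 1 or -1 indicate a strong linear relation, and values near 0 indicate a weak one.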
Linear regression model
An equation that gives a straight-line relationship between two variables
Independent variable
x
Dependent variable
y
Slope
b
y-intercept
a
Predicted value
Computed using the estimated regression line; also known as "y hat" (ŷ)
Least squares regression line
The line that minimizes the sum of the squares of the residuals
Outliers
In regression, observed data points that are far from the least squares line
Influential points
observed data points that are far from the other observed data points in the horizontal direction
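The regression terms above (slope b, y-intercept a, predicted value, residual, and the least squares line) fit together as ŷ = a + bx. The sketch below computes the least squares line with the usual closed-form formulas, then the residuals (observed y minus predicted ŷ) and the fraction of variation in y explained by the line. The data and the function name `least_squares_line` are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y_hat = a + b * x that minimizes
    the sum of the squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # y-intercept
    return a, b


# Hypothetical bivariate data: x is the independent variable, y the dependent.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
predicted = [a + b * x for x in xs]                          # y hat
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]   # e = y - y hat

# Fraction of the variation in y explained by the line.
mean_y = sum(ys) / len(ys)
sse = sum(e ** 2 for e in residuals)
sst = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - sse / sst

print(f"y_hat = {a:.2f} + {b:.2f} x")
print("residuals:", [round(e, 2) for e in residuals])
print(f"r^2 = {r_squared:.2f}")
```

For this data the line is roughly ŷ = 2.2 + 0.6x, the residuals sum to zero, and about 60% of the variation in y is explained by the linear relation with x. A point with a large residual is a candidate outlier, and a point whose x-value lies far from the others is a candidate influential point.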