statistics
the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. Statistics also provides a measure of confidence in any conclusions
data
a fact or proposition used to draw a conclusion or make a decision. Data describe characteristics of an individual
population
the entire group of individuals to be studied
individual
a person or object that is a member of the population being studied.
sample
a subset of the population that is being studied
statistic
a numerical summary based on a sample
descriptive statistics
consists of organizing and summarizing data. Data are described through numerical summaries, tables, and graphs
inferential statistics
uses methods that take results from a sample, extends them to the population, and measures the reliability of the result
parameter
a numerical summary of a population
process of statistics
1. identify the research objective
2. collect the data needed to answer the question posed in step 1
3. describe the data
4. perform inference
variables
characteristics of the individuals within the population
qualitative or categorical variables
allow for classification of individuals based on some attribute or characteristic
quantitative variables
provide numerical measures of individuals. The values of a quantitative variable can be added or subtracted and provide meaningful results
discrete variable
a quantitative variable that has either a finite number of possible values or a countable number of possible values.
discrete variable characteristics
- countable (0, 1, 2, 3, ...)
- cannot take on every possible value between any two possible values
continuous variable
a quantitative variable that has an infinite number of possible values that it can take on and can be measured to any desired level of accuracy
raw data
data that is not organized
ways to organize data
tables
graphs
numerical summaries
frequency distribution
lists each category of data and the number of occurrences for each category of data
relative frequency
the proportion or percent of observations within a category
relative frequency formula
frequency / (sum of all frequencies)
relative frequency distribution
lists each category of data with the relative frequency
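The three cards above (frequency distribution, relative frequency formula, relative frequency distribution) can be sketched in Python. This is a minimal illustration: the color responses are hypothetical data, and `collections.Counter` is just one convenient way to tally categories.

```python
from collections import Counter

# Hypothetical qualitative data: favorite color reported by 10 individuals.
responses = ["red", "blue", "blue", "green", "red", "blue", "red", "blue", "green", "blue"]

# Frequency distribution: each category and its number of occurrences.
frequency = Counter(responses)

# Relative frequency = frequency / (sum of all frequencies).
total = sum(frequency.values())
relative_frequency = {category: count / total for category, count in frequency.items()}

# Print a small table: category, frequency, relative frequency.
for category, count in frequency.items():
    print(f"{category:<6} {count:>3} {relative_frequency[category]:.2f}")
```

The relative frequencies always add up to 1 (100%), apart from floating-point rounding.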
bar graph
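constructed by labelling each category of data on either the horizontal or vertical axis and the frequency or relative frequency on the other axis. Rectangles of equal width are drawn for each category, with heights equal to the frequency or relative frequency.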
pareto chart
bar graph where the bars are drawn in decreasing order of frequency or relative frequency
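To build a Pareto chart, you order the categories before drawing the bars. Here is a minimal sketch using hypothetical "reasons for arriving late" data. `Counter.most_common()` returns the categories from highest to lowest count, and ties keep the order in which they were first seen.

```python
from collections import Counter

# Hypothetical data: reasons given for arriving late.
reasons = ["traffic", "overslept", "traffic", "bus", "traffic", "overslept", "other"]

# Pareto chart order: categories sorted by decreasing frequency.
pareto_order = Counter(reasons).most_common()

# Text version of the chart: one bar of '#' characters per category.
for category, count in pareto_order:
    print(f"{category:<10} {'#' * count}")
```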
side-by-side bar graphs
used to compare data sets. Comparisons are made using relative frequencies to avoid confusion caused by different sample or population sizes
horizontal bar graphs
preferable when category names are lengthy
pie chart
a circle divided into sectors, each representing a category of data
histogram
constructed by drawing a rectangle for each class of data. The height of each rectangle is the frequency or relative frequency of the class. All rectangles have equal width, and adjacent bars touch
classes
categories into which data are grouped.
lower class limit
the smallest value within the class
upper class limit
the largest value within the class
class width
difference between consecutive lower class limits
determining class width
(largest data value - smallest data value) / number of classes, rounded up to a convenient number
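The class-width formula, together with the idea that consecutive lower class limits differ by the class width, can be sketched as follows. The data set and the choice of 5 classes are hypothetical. Rounding up with `math.ceil` is one simple way to pick a "convenient" width. If the division comes out exact, you may need to round up further so the largest value still falls into a class.

```python
import math

# Hypothetical integer data and a chosen number of classes.
data = [12, 15, 22, 27, 31, 38, 41, 44, 50, 58]
number_of_classes = 5

# Class width = (largest - smallest) / number of classes, rounded up.
raw_width = (max(data) - min(data)) / number_of_classes
class_width = math.ceil(raw_width)

# Consecutive lower class limits differ by the class width.
lower_limits = [min(data) + i * class_width for i in range(number_of_classes)]

# Frequency of each class: values from lower up to (but not including) lower + width.
class_counts = [sum(lower <= x < lower + class_width for x in data) for lower in lower_limits]

print(class_width, lower_limits, class_counts)
```

The class counts form the frequency distribution that a histogram would display.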
dot plot
drawn by placing each observation horizontally in increasing order and placing a dot above the observation each time it is observed
uniform distribution
the frequency of each value of the variable is evenly spread out across the values of the variable
bell-shaped distribution
the highest frequency occurs in the middle and frequencies tail off to the left and right of the middle
skewed right
the tail to the right of the peak is longer than the tail to the left of the peak
skewed left
tail to the left of the peak is longer than the tail to the right of the peak
time series data
data obtained when the value of a variable is measured at different points in time
time-series plot
obtained by plotting the time at which a variable is measured on the horizontal axis and the corresponding value of the variable on the vertical axis. Line segments then connect the points
arithmetic mean
computed by adding all the values of the variable in the data set and dividing by the number of observations
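Here is the arithmetic-mean definition applied to a small hypothetical data set.

```python
# Hypothetical data set of n = 5 observations.
data = [4, 8, 6, 5, 7]

# Arithmetic mean: add all values, then divide by the number of observations.
mean = sum(data) / len(data)
print(mean)  # 6.0
```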
population mean is a
parameter
median (M)
value that lies in the middle of the data when arranged in ascending order
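A minimal median function follows the definition directly: sort the data in ascending order, then take the middle value. When the number of observations is even, it uses the usual convention of averaging the two middle values.

```python
def median(values):
    """Middle value of the data once arranged in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middle value.
        return ordered[middle]
    # Even number of observations: the mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```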
resistant
a numerical summary is resistant if extreme values (very large or very small) relative to the data do not affect its value substantially. The median is resistant; the mean is not
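The sketch below shows resistance numerically. It adds one hypothetical extreme value to a small data set and compares how far the mean and the median move. It uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical data, and the same data with one extreme value added.
data = [10, 12, 13, 14, 16]
with_outlier = data + [100]

# The mean jumps from 13 to 27.5, while the median only moves from 13 to 13.5.
print(statistics.mean(data), statistics.median(data))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```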
mode
the most frequent observation of the variable that occurs in the data set
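A data set can have one mode, several modes, or no mode at all (when no observation occurs more than once). The hypothetical helper `modes` below returns a list so that it can cover all three cases.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent observation; return [] when no value repeats (no mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # No observation occurs more than once, so there is no mode.
        return []
    return [value for value, count in counts.items() if count == highest]

print(modes([3, 5, 5, 7, 9]))  # [5]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
print(modes([1, 2, 3]))        # []
```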
when is range used
on the news or when talking of housing prices
skewed left distribution
mean < median < mode. The mean is substantially smaller than the median
skewed right distribution
mean > median > mode. The mean is substantially larger than the median
population standard deviation
the square root of the sum of squared deviations about the population mean, divided by the number of observations in the population, N: sigma = sqrt( sum of (x - mu)^2 / N )
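Here is the population standard deviation formula applied to a hypothetical population of N = 4 values. The result is checked against `statistics.pstdev`, the standard library's population version. (`statistics.stdev` is the sample version, which divides by n - 1 instead of N.)

```python
import math
import statistics

# Hypothetical population of N = 4 values.
population = [2, 4, 4, 6]
N = len(population)
mu = sum(population) / N  # population mean

# sigma = sqrt( sum of (x - mu)^2 / N )
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)

print(sigma)                          # manual formula
print(statistics.pstdev(population))  # standard-library population SD
```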
standard deviation percentages
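34%, 13.5%, 2.35%, 0.15%: for a bell-shaped distribution, these are the percents of observations on one side of the mean that lie between the mean and 1 standard deviation, between 1 and 2, between 2 and 3, and beyond 3 standard deviations. Together they make up one half (50%); doubling gives 100%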
symmetric distribution
mean roughly equal to median.
no mode
no observation occurs more than once
range (R)
the difference between the largest data value and the smallest data value
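Here is the range definition applied to a hypothetical data set.

```python
data = [23, 5, 17, 42, 9]

# Range R = largest data value - smallest data value.
R = max(data) - min(data)
print(R)  # 37
```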
kth percentile (Pk)
a value such that k percent of the observations are less than or equal to the value
Interquartile Range (IQR)
the range of the middle 50% of the observations in a data set
fences
serve as cutoff points for determining outliers.
lower fence formula
Q1 - 1.5(IQR)
upper fence formula
Q3 + 1.5(IQR)
five-number summary
consists of the minimum (smallest data value), Q1, the median, Q3, and the maximum (largest data value)
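Quartiles, IQR, fences, and the five-number summary fit together in one short sketch. It assumes one common hand-calculation convention: Q1 is the median of the observations below the overall median, and Q3 is the median of those above it (the overall median is excluded when n is odd). Other conventions, including the default method of `statistics.quantiles`, can give slightly different quartiles. The data are hypothetical.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # observations below the median
    upper_half = ordered[(n + 1) // 2 :]  # observations above the median
    return (ordered[0], median(lower_half), median(ordered), median(upper_half), ordered[-1])

data = [3, 7, 8, 5, 12, 14, 21, 13, 18, 60]
minimum, q1, m, q3, maximum = five_number_summary(data)

# IQR and fences; values outside the fences are flagged as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print((minimum, q1, m, q3, maximum), iqr, (lower_fence, upper_fence), outliers)
```

In this data set the upper fence is 34.5, so 60 is flagged as an outlier.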
response variable (y)
variable whose value can be explained by the value of the explanatory or predictor variable
scatter diagram
graph that shows the relationship between two variables
variance
square of the standard deviation
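The sketch below checks that variance equals the standard deviation squared, for both the population versions (divide by N) and the sample versions (divide by n - 1). It uses functions from Python's `statistics` module on a hypothetical data set.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population versions divide by N; sample versions divide by n - 1.
pop_variance = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

# In both cases the variance equals the standard deviation squared.
print(math.isclose(pop_variance, pop_sd ** 2))        # True
print(math.isclose(sample_variance, sample_sd ** 2))  # True
```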
the empirical rule
For a bell-shaped distribution, about 68% of the data lies within 1 standard deviation of the mean.
About 95% of the data lies within 2 standard deviations of the mean.
About 99.7% of the data lies within 3 standard deviations of the mean.
Virtually all of the data lies within 4 standard deviations of the mean.
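The empirical-rule percentages follow from the one-sided band percentages (34%, 13.5%, 2.35%, 0.15%). This sketch rebuilds them and compares them with exact values for a normal distribution, computed with `statistics.NormalDist` (Python 3.8+). The normal model is an assumption here: it is the idealized bell shape that the rule approximates.

```python
from statistics import NormalDist

# Percent in each band on ONE side of the mean: mean to 1 SD, 1 to 2 SD, 2 to 3 SD, beyond 3 SD.
half_bands = [34, 13.5, 2.35, 0.15]

# The four bands make up one half (50%); doubling covers both sides.
total = 2 * sum(half_bands)

# Percent within k standard deviations of the mean, built from the bands.
rule = [round(2 * sum(half_bands[:k]), 2) for k in (1, 2, 3)]

# Exact percentages for a standard normal distribution (mean 0, SD 1).
z = NormalDist()
exact = [round(100 * (z.cdf(k) - z.cdf(-k)), 2) for k in (1, 2, 3, 4)]

print(round(total, 2), rule, exact)
```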
z-score
represents the distance that a data value is from the mean in terms of the number of standard deviations.
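A z-score is z = (x - mean) / standard deviation. A positive z means the value lies above the mean, and a negative z means it lies below. The helper name `z_score` and the numbers are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(85, 70, 10))  # 1.5
print(z_score(55, 70, 10))  # -1.5
```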
quartiles
divide data sets into fourths, or four equal parts
response variable
also called the dependent variable; it is plotted on the vertical axis of a scatter diagram
positively associated
whenever the value of one variable increases, the value of the other variable also increases
negatively associated
two variables are negatively associated if, whenever the value of one variable increases, the value of the other variable decreases
explanatory variable
independent variable, plotted on the horizontal axis
correlation coefficient
a measure used to describe the strength and direction of a relationship between variables whose data points lie on or near a line
positive correlation range
0 < r ≤ 1
negative correlation range
-1 ≤ r < 0
positive correlation
when one variable increases, the other also tends to increase; when one decreases, the other tends to decrease
negative correlation
when one variable increases, the other will decrease and vice versa
properties of linear correlation coefficient
1. r always lies between -1 and 1, inclusive
2. if r = 1, there is a perfect positive linear relation
3. if r = -1, there is a perfect negative linear relation
4. if r = 0, there is no LINEAR correlation (a nonlinear relation may still exist)
5. the closer r is to -1, the stronger the negative correlation
6. the closer r is to 1, the stronger the positive correlation
7. the correlation coefficient is NOT RESISTANT
8. r is a unit-less measure of association; the units of measure for x and y play no role in the interpretation of r
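Several of these properties can be checked with a small Pearson correlation function: r stays within [-1, 1], perfect lines give +1 or -1, and changing units leaves r unchanged. The height and weight data are hypothetical. Python 3.10+ also provides `statistics.correlation`, which computes the same r.

```python
import math

def correlation(x, y):
    """Linear (Pearson) correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

heights_in = [60, 62, 65, 68, 70]
weights_lb = [115, 120, 140, 155, 165]
r = correlation(heights_in, weights_lb)

# Converting inches to centimeters changes the units but not r.
heights_cm = [h * 2.54 for h in heights_in]
r_cm = correlation(heights_cm, weights_lb)

print(round(r, 4), round(r_cm, 4))
```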
least-squares regression
a method for finding the linear equation that best describes the relation between two variables: the line that minimizes the sum of the squared residuals
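Here is a minimal least-squares fit using the usual closed-form formulas: slope b1 = Sxy / Sxx and intercept b0 = mean(y) - b1 * mean(x), for the line y-hat = b1 x + b0. The data and the helper name `least_squares` are hypothetical. Python 3.10+ also offers `statistics.linear_regression`, which returns the slope and intercept.

```python
def least_squares(x, y):
    """Return (b1, b0) for the line y-hat = b1 * x + b0 that minimizes the sum of squared residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept: the line passes through (mean_x, mean_y)
    return b1, b0

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b0 = least_squares(x, y)
print(round(b1, 4), round(b0, 4))  # 0.6 2.2
```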
residual
the difference between an observed value of the response variable and the value predicted by the regression line
what does each point on the least-squares regression line represent
each point represents the predicted y-value at the corresponding value of x
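Predicted values and residuals can be computed together. This sketch assumes the fitted line y-hat = 0.6x + 2.2 for the small hypothetical data set shown. Each point on the line gives the predicted y at that x, and each residual is observed y minus predicted y.

```python
# Assumed least-squares line for this hypothetical data: y-hat = b1 * x + b0.
b1, b0 = 0.6, 2.2
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Predicted value of the response variable at each x (a point on the line).
predicted = [b1 * xi + b0 for xi in x]

# Residual = observed value - predicted value.
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

print([round(r, 2) for r in residuals])
```

For a least-squares line, the residuals add up to zero, apart from floating-point rounding.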