statistics
set of methods for obtaining, organizing, summarizing, presenting, and analyzing data.
data
characteristics measured on individuals or units
population
the entire group of individuals or units about which we want information
sample
subset of units in a population
variable
characteristic or property of an individual ex: TIME until a light bulb burns out, DISTANCE traveled
categorical data
values of a categorical variable, which places individuals into one of several groups ex: gender of a newborn, eye colour
qualitative
relating to or involving comparisons based on qualities rather than quantities
categorical
"types": names, symbols, things
categorical and ordinal
ordering makes sense for values of categorical variables
categorical and nominal
ordering does not make sense for the values of the categorical variable
quantitative data
values of quantitative variables, for which adding and averaging make sense ex: height, exam scores, volume, sums
distribution of data
tells us what values a variable takes and how often it takes these values; the VALUES don't have to be quantitative
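A distribution can be tabulated by counting how often each value occurs. A minimal Python sketch, assuming a made-up list of eye-colour observations (the data and variable names are hypothetical):

```python
from collections import Counter

# Hypothetical eye-colour observations (made up for illustration).
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]

# The distribution: each value the variable takes and how often it occurs.
distribution = Counter(eye_colours)

# List the values from most to least frequent.
for value, count in distribution.most_common():
    print(value, count)
```

The values here are categorical, which shows that a distribution does not need quantitative values.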
bar charts
display the variable's values on one axis and frequencies on the other
spaces between the bars imply no continuity
used for categorical variables; display categorical data
bar and pie charts
categorical data
stemplots, histogram, timeplot
quantitative data
pie charts
visual representation of the relative frequency of each category
minimum
the smallest value in the data set
maximum
the largest value in the data set
frequency distribution
count of how many of our data values fall into various predetermined classes or intervals
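Counting values into classes can be sketched in Python. This is only an illustration: the exam scores and the class boundaries below are made up, and the choice to include the lower boundary but not the upper one is an assumption.

```python
# Hypothetical exam scores (made up for illustration).
scores = [52, 67, 71, 74, 78, 81, 85, 88, 93, 97]

# Predetermined classes as (lower, upper) pairs.
# Assumption: the lower boundary is included, the upper boundary is not.
classes = [(50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]

# Count how many scores fall into each class.
frequencies = {
    (low, high): sum(1 for s in scores if low <= s < high)
    for low, high in classes
}

for (low, high), count in frequencies.items():
    print(f"{low} to under {high}: {count}")
```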
continuous variable
can take any value within a given range
type of quantitative variable ex: weight
discrete variable
can take only a countable number of values ex: # of children in a family
measured only in certain #'s ex: number of pets
relative frequency or proportion
found by dividing the number of data values in each class by the total number of data values (the sum of all class counts)
proportion
values between 0 and 1, inclusive
decimal representation of fractions
the proportions of all intervals must add up to 1
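A short Python sketch of turning class counts into proportions, assuming made-up counts (the class labels and numbers are hypothetical):

```python
# Hypothetical class counts (made up for illustration).
frequencies = {"50-59": 1, "60-69": 1, "70-79": 3, "80-89": 3, "90-99": 2}

# Total number of data values (the sum of all class counts).
total = sum(frequencies.values())

# Relative frequency = class count / total number of data values.
proportions = {label: count / total for label, count in frequencies.items()}

for label, p in proportions.items():
    print(label, p)

# The proportions should add up to 1 (up to floating-point rounding).
print(sum(proportions.values()))
```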
histogram
useful for a large amount of quantitative data
a form of bar graph with no spaces between the bars
the touching bars reflect continuity
symmetric data
data are symmetric if a vertical line through the centre of the histogram divides it into two approximate mirror images
time plots
used for plotting time series data,
values measured over time
time plotted on x-axis
variable values plotted on y-axis, represented by points that are connected to show a trend
measure of centre
mode, median, mean
measure of spread
range
standard deviation
location and variability
are the two important features of a data set
mode
most frequently observed value
median
middle value of the ordered data set; with an even number of values, the average of the two middle values
mean
average: the sum of the data values divided by the number of values
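The three measures of centre can be computed with Python's standard `statistics` module. A minimal sketch, assuming a small made-up data set:

```python
import statistics

# Hypothetical data set (made up for illustration).
data = [2, 3, 3, 5, 7]

mode = statistics.mode(data)      # most frequently observed value
median = statistics.median(data)  # middle value of the ordered data
mean = statistics.mean(data)      # sum of the values / number of values

print(mode, median, mean)
```

For this data set the mode and median are both 3, and the mean is 20 / 5 = 4.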
outliers
extreme values that lie far from the rest of the data; they have little or no effect on the MEDIAN, which is "resistant" to the effect of outliers
mean, range, standard deviation
affected by outliers
median, mode
resistant to outliers
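The effect of an outlier on the mean and the median can be checked with a small Python sketch. The data are made up, and 200 is added as a hypothetical outlier:

```python
import statistics

# Hypothetical data set (made up for illustration).
data = [10, 12, 13, 15, 20]

# The same data with one extreme value added.
with_outlier = data + [200]

# Mean: 70 / 5 = 14 without the outlier, 270 / 6 = 45 with it.
print(statistics.mean(data), statistics.mean(with_outlier))

# Median: 13 without the outlier, (13 + 15) / 2 = 14 with it.
print(statistics.median(data), statistics.median(with_outlier))
```

The mean jumps from 14 to 45, while the median only moves from 13 to 14.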
mean and median
are equal in a symmetric distribution
weighted mean
instead of each data point contributing equally to the final average, some data points contribute more than others
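A weighted mean multiplies each value by its weight, adds the products, and divides by the sum of the weights. A minimal Python sketch, assuming made-up course marks and weights (all names and numbers are hypothetical):

```python
# Hypothetical marks and their weights (made up for illustration).
marks = [80, 70, 90]
weights = [2, 3, 5]

# Weighted mean = sum(value * weight) / sum(weights).
weighted_mean = sum(m * w for m, w in zip(marks, weights)) / sum(weights)

# Ordinary mean for comparison: every mark counts equally.
plain_mean = sum(marks) / len(marks)

print(weighted_mean, plain_mean)
```

Here the weighted mean is (160 + 210 + 450) / 10 = 82, while the ordinary mean is 80, because the mark of 90 carries the largest weight.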
variability
the extent to which data points within a data set differ from each other or from the mean
range
the difference between the highest and lowest values.
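The range, along with the standard deviation listed under measure of spread, can be computed in Python. A minimal sketch on made-up data; using the sample standard deviation (dividing by n - 1) is an assumption, since these notes do not say which version is meant:

```python
import statistics

# Hypothetical data set (made up for illustration).
data = [4, 8, 6, 5, 7]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Sample standard deviation (assumption: the n - 1 version).
sample_sd = statistics.stdev(data)

print(data_range, sample_sd)
```

For this data the range is 8 - 4 = 4. The squared deviations from the mean of 6 add up to 10, so the sample variance is 10 / 4 = 2.5 and the standard deviation is its square root.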
interquartile range
measures the length of the interval that covers the middle 50 percent of the ordered observations (Q3 - Q1)
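Quartile conventions vary between textbooks and software. The Python sketch below assumes the "median of each half" convention, which may differ from the one used in a given course; the function name `iqr` and the data are hypothetical.

```python
import statistics

def iqr(values):
    """Interquartile range using the median-of-halves convention (an assumption).

    Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # lower half of the ordered observations
    upper = ordered[-half:]   # upper half; the median is left out when the count is odd
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return q3 - q1

# Hypothetical data set (made up for illustration).
data = [1, 3, 4, 6, 7, 8, 10, 12]
print(iqr(data))
```

For this data Q1 is 3.5 and Q3 is 9, so the interquartile range is 5.5.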
percentiles
describe how a score compares to other scores from the same set ex: a score at the 90th percentile is higher than about 90 percent of the scores
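One way to place a score among the others is its percentile rank. The Python sketch below assumes the "percent of scores at or below" convention (other conventions exist); the function name `percentile_rank` and the scores are hypothetical.

```python
def percentile_rank(scores, score):
    """Percent of scores at or below the given score (one common convention)."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

# Hypothetical exam scores (made up for illustration).
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

print(percentile_rank(scores, 80))
```

Six of the ten scores are at or below 80, so under this convention a score of 80 has a percentile rank of 60.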