oh dear
what is data
recorded observations within a certain context → “numbers in context”
what is statistics
the art and science of collecting and analyzing observations
what are variables
characteristics of people or things. have variability (change from one person to the next).
what is population of interest
Prompt: 125 colleges in the US with football teams
Population of interest: All colleges in the US with football teams
“the bigger picture,” entire data set
what is a sample?
collection of data, data set, subset of population
Prompt: 125 colleges in the US with football teams
Sample: 125 colleges in the US with football teams
what are quantitative / numerical variables
describe quantities of the objects of interest
exceptions are numerical identifiers that only label things instead of measuring them, e.g. telephone numbers, student IDs, zip codes
what are qualitative / categorical variables
describe qualities of the objects of interest
answer is a word or label or category
includes labels like telephone numbers, student IDs, zip codes
how do you code categorical data?
use 0 and 1
0 means no and 1 means yes
sum of the 1s = number of “yes” answers
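Here is a minimal Python sketch of the 0/1 coding above. The list `answers` is made-up data for illustration.

```python
# Hypothetical answers to a yes/no question such as "Do you smoke?"
answers = ["yes", "no", "no", "yes", "yes", "no"]

# Code each answer as 1 for "yes" and 0 for "no"
coded = [1 if answer == "yes" else 0 for answer in answers]

# Adding up the 1s gives the number of "yes" answers
num_yes = sum(coded)

print(coded)    # [1, 0, 0, 1, 1, 0]
print(num_yes)  # 3
```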
what is stacked data?
preferred format (treatment | observation)
treatments are in one column and observations are in the other
each row is one subject, so the treatment and the observation in the same row belong together
what is unstacked data?
each column represents a variable from a different group
(treatment A | treatment B | treatment C)
(observation A | observation B | observation C)
values side by side in the same row come from different subjects, so they are not connected
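To show how the two layouts relate, this small Python sketch turns unstacked data (one list per treatment) into stacked (treatment, observation) rows. The group names and numbers are invented for illustration.

```python
# Hypothetical unstacked data: one column (list) per treatment group
unstacked = {
    "A": [12, 15, 11],
    "B": [14, 18],
    "C": [9, 10, 13],
}

# Stacked form: each row pairs a treatment label with one observation
stacked = [
    (treatment, observation)
    for treatment, values in unstacked.items()
    for observation in values
]

for row in stacked:
    print(row)
```

The groups don't need the same number of observations. In the stacked layout every subject simply gets its own row.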
what are 8 questions to ask for context?
what are the objects of interest?
what variables were measured?
how were the variables measured?
what are the units of measurement?
who collected the data?
how did they collect the data?
where were the data collected?
why did they collect the data?
what is a two-way table?
table that organizes 2 categorical variables in rows and columns and shows how many times each combination of categories occurs
e.g.
 | play an instrument | do not play an instrument
play a sport | count | count
do not play a sport | count | count
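A two-way table is just a count of each combination of two categories. Below is a rough Python sketch using `collections.Counter` on made-up student records (the data and short labels are hypothetical).

```python
from collections import Counter

# Hypothetical students: (sport category, instrument category)
students = [
    ("sport", "instrument"),
    ("sport", "no instrument"),
    ("no sport", "instrument"),
    ("sport", "instrument"),
    ("no sport", "no instrument"),
    ("no sport", "no instrument"),
]

# Count how many times each combination of the two categories occurs
counts = Counter(students)

rows = ["sport", "no sport"]
cols = ["instrument", "no instrument"]

# Print the table: header row, then one row per sport category
print("", *cols, sep=" | ")
for r in rows:
    print(r, *(counts[(r, c)] for c in cols), sep=" | ")
```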
what is special about two-way tables?
they let us look for associations between the two variables by comparing the combinations. BUT compare percentages, not raw counts, because the groups usually differ in size. e.g. 2 out of 30 is about 6.7% and 100 out of 1000 is 10%, so the raw counts 2 and 100 are not on the same scale
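Here is the 2/30 vs. 100/1000 arithmetic as a quick Python check. The `percent` helper is just an illustrative name.

```python
def percent(count, total):
    """Convert a count into a percentage of its group's total."""
    return 100 * count / total

# Raw counts from groups of very different sizes
small_group = percent(2, 30)      # 2 "yes" out of 30
large_group = percent(100, 1000)  # 100 "yes" out of 1000

print(round(small_group, 1))  # 6.7
print(round(large_group, 1))  # 10.0
```

The raw counts (2 vs. 100) make the second group look 50 times bigger. The percentages (6.7% vs. 10%) show the real difference is much smaller.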
what is a treatment group?
group that receives the treatment of interest or has the characteristic of interest (e.g. yes, they smoke)
the variable that records which group a subject is in (treatment or control) is always categorical
what is a control group?
group that does not receive the treatment of interest. also known as the comparison group. (e.g. no, they don’t smoke)
difference between observational studies and controlled experiments?
in observational studies, subjects are put into treatment or control by their own choice.
in controlled experiments, researchers randomly assign subjects to the control or treatment group and then record differences.
only through controlled experiments can you establish causality.
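Random assignment can be sketched in Python by shuffling the subjects and splitting the list in half. This is a minimal illustration under assumptions: the subject IDs and the `randomly_assign` helper are hypothetical, and the split is into two equal groups.

```python
import random

def randomly_assign(subjects, seed=None):
    """Randomly split subjects into equal-sized treatment and control groups."""
    rng = random.Random(seed)   # seed only makes the example repeatable
    shuffled = subjects[:]      # copy so the original list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject IDs
subjects = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]

treatment, control = randomly_assign(subjects, seed=1)
print("treatment:", treatment)
print("control:  ", control)
```

Here chance decides who ends up in each group, not the subjects. That is what separates a controlled experiment from an observational study.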
what is an anecdote?
someone’s story about their own experience.
CANNOT establish causality OR EVEN an association. basically useless in terms of statistics. they don’t even have a comparison group or control for the placebo effect.
what are the 4 gold standards for experiments?
sample sizes are large enough to account for variability (>10 usually)
assignment to treatment or control group is random and is controlled by the researcher
a placebo is used if appropriate
the study is blind or double-blind
blind study = participants don’t know whether they are in the control group or treatment group
double-blind study = person administering treatment also doesn’t know who’s in the control group or treatment group. PREFERRED!
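what are graphical and numerical summaries?
graphical: pictures of the distribution (e.g. dotplots, histograms) that show its center, spread, and shape and make outliers easy to spot
numerical: numbers that describe features of the distribution, such as its center and spread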
what is distribution of a data set
list that records values observed (x-axis) and frequency of values (y-axis)
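A distribution can be listed as each observed value with its frequency. This is a short Python sketch with made-up data.

```python
from collections import Counter

# Hypothetical data: number of pets owned by 10 people
data = [0, 1, 1, 2, 0, 1, 3, 2, 1, 0]

# The distribution: each observed value (x-axis) with its frequency (y-axis)
distribution = sorted(Counter(data).items())

for value, frequency in distribution:
    print(value, frequency)
```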
what are 2 graphs for numerical data
dotplots
histograms
what summary do you want to see for a distribution
symmetry? most common value? outliers?
dotplot and its pros/cons
each value marked by a dot over a number line
pros: shows individual data values, easy to spot outliers, describes distribution visually
cons: not as common as histograms and other graphs. not good when data has many individual values
how do you make a histogram?
histograms group observations into intervals called bins or classes (long thin rectangles)
place numerical variable on x-axis, frequencies on y-axis
divide x-axis into intervals of equal width to form bins
count how many observations fall into each bin (height)
draw a vertical bar above each bin representing each frequency. these bars MUST touch for numerical data!!!! (when they don’t touch that’s a bar chart and it’s for categorical data ONLY)
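The counting steps above can be sketched in Python as follows. Assumptions: each bin includes its left edge and excludes its right edge (a common convention, but software differs), values outside the bins are ignored, and the scores are made up.

```python
def histogram_counts(data, start, width, num_bins):
    """Count observations in equal-width bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_bins
    for x in data:
        k = int((x - start) // width)  # index of the bin x falls into
        if 0 <= k < num_bins:          # skip values outside the bins
            counts[k] += 1
    return counts

# Hypothetical exam scores
scores = [55, 62, 67, 71, 74, 78, 79, 83, 88, 91, 95]
counts = histogram_counts(scores, start=50, width=10, num_bins=5)

# Sideways text "histogram": one row per bin, one # per observation
for k, c in enumerate(counts):
    low = 50 + k * 10
    print(f"{low}-{low + 10}: {'#' * c}")
```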
histogram pros/cons
pros: good for large data sets, easy to spot outliers, describes distribution visually, compact display, flexible in defining intervals
cons: lose individual data values, requires more work to create, distribution changes shape as width of bins changes
what is a relative frequency histogram?
divide frequency by sample size (frequency/sample size)
relative frequencies (proportions) on y-axis, numerical variable on x-axis
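Turning bin frequencies into relative frequencies is a single division per bin. Here is a small Python sketch with hypothetical bin counts.

```python
# Hypothetical bin frequencies from a histogram
frequencies = [1, 2, 4, 2, 2]
n = sum(frequencies)  # sample size

# Relative frequency = frequency / sample size
relative = [f / n for f in frequencies]

print([round(r, 3) for r in relative])
```

The relative frequencies always add up to 1 (up to rounding), since every observation falls into exactly one bin.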
how do you create a histogram on the ti-84?
STAT → edit → 1: Edit → enter data set in L1
2nd → [Y=] (STAT PLOT) → 1: Plot1 → turn it On → for Type, choose the histogram icon (top row, far right) → make sure Xlist is set to L1 → ZOOM → 9: ZoomStat → use TRACE and the arrow keys to move around the graph
what are 3 aspects of data distribution for a numerical variable?
shape: symmetric or skewed, how many bumps or mounds, outliers
center: typical value
horizontal spread: close together (low variability) or spread out (high variability)
how do you know what kind of variability something has
in histogram, one bar > 50% of other bar’s height, or 2 bars both cross 50% frequency = high variability
if one bar sticks out by a lot = low variability
wide histogram = high variation, narrow histogram = low variation
what kinds of distributions are symmetric?
bell-shaped, uniform, and bimodal when the two mounds mirror each other
what are the two types of skew
skewed right: the tail on the right end is longer (stretches out toward larger values)
skewed left: the tail on the left end is longer (stretches out toward smaller values)
what are mounds?
the same thing as mode (most common number in set of data, greatest frequency)
unimodal has 1 mode
bimodal has 2 modes (equal or very close frequency)
multimodal has >2 modes (w/same frequency)
uniform has no modes (graph completely flat)
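Here is a rough Python sketch that labels modes the way these notes do. It is a simplification under assumptions: it only counts exact ties in raw values (not "very close" frequencies on a graph), and it treats the case where every value is equally common as uniform with no mode. The `modes` helper is hypothetical.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s), or [] if every value is equally common."""
    counts = Counter(data)
    highest = max(counts.values())
    if all(c == highest for c in counts.values()):
        return []  # flat (uniform): no value stands out
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3]))     # [2]     unimodal
print(modes([1, 1, 2, 3, 3]))  # [1, 3]  bimodal
print(modes([1, 2, 3, 4]))     # []      uniform
```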
what are the 5 distribution shapes?
bell-shaped / normal
skewed left
skewed right
bimodal
uniform
characteristics of the normal distribution
symmetric, unimodal, and bell-shaped
what are the 2 graphs for categorical data
bar charts, pie charts
what is a bar chart
1 bar for each observed category
bars don’t touch
horizontal (x-axis) = each category of the variable, vertical (y-axis) = frequency
can put the variables (categories) in whatever order you want
what is a pareto chart
bar chart except you specifically order the categories from highest frequency to lowest frequency (tallest → smallest)
“informative” way to organize categorical data
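Pareto ordering is just sorting categories by frequency, highest first. This minimal Python sketch uses `Counter.most_common()` on made-up responses. Ties keep the order in which the categories first appear.

```python
from collections import Counter

# Hypothetical responses: favorite fruit
responses = ["apple", "banana", "apple", "cherry", "banana", "apple", "date"]

# Pareto order: categories from highest to lowest frequency
pareto = Counter(responses).most_common()

# Sideways text bar chart, tallest bar first
for category, freq in pareto:
    print(f"{category:<7}{'#' * freq}")
```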
what is the difference between bar charts and histograms?
in bar charts, the categories can go in any order, but in histograms the bars have to be in numerical order and every bin must have the same width
in bar charts there are gaps between the bars but in a histogram the bars must touch
what is a pie chart and what is it used for
gives each category a “slice” of the whole whose size is proportional to the category’s frequency (if it’s higher frequency it gets a bigger chunk)
used to display how much of a share each category has of the whole. NOT commonly used by statisticians.
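"Proportional to frequency" means each slice's angle is its category's share of 360 degrees. Here is a quick Python check with hypothetical frequencies.

```python
# Hypothetical category frequencies: how students get to school
frequencies = {"car": 12, "bus": 6, "bike": 4, "walk": 2}
total = sum(frequencies.values())

# Slice angle = (frequency / total) * 360 degrees
angles = {cat: 360 * f / total for cat, f in frequencies.items()}

for cat, angle in angles.items():
    print(f"{cat}: {angle:.0f} degrees")
```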
what are 2 aspects of data distribution for categorical variables?
mode (center)
variability
what characteristics may cause graphs to be misleading?
frequency scale (y-axis) doesn’t begin at 0 to create an illusion of greater differences
use symbols other than bars that hide or accentuate the real differences
use unequal width bars or force perspective (like tilting a pie chart to make one chunk look bigger)
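To see why a y-axis that doesn't start at 0 is misleading, compare the true ratio of two frequencies with the ratio of the bar heights as drawn. This is a small Python sketch. The numbers and the `apparent_ratio` helper are made up for illustration.

```python
def apparent_ratio(a, b, axis_start):
    """Ratio of drawn bar heights when the y-axis starts at axis_start instead of 0."""
    return (b - axis_start) / (a - axis_start)

a, b = 50, 55  # hypothetical frequencies of two categories

print(b / a)                     # 1.1  true ratio: only 10% bigger
print(apparent_ratio(a, b, 45))  # 2.0  drawn bar looks twice as tall
```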