stats ch. 1 and 2

what is data

recorded observations within a certain context → “numbers in context”

what is statistics

the art and science of collecting and analyzing observations

what are variables

characteristics of people or things. they have variability (values change from one person or thing to the next).

what is population of interest

Prompt: 125 colleges in the US with football teams

Population of interest: All colleges in the US with football teams

“the bigger picture”: the entire group you want to learn about, not just the data you collected

what is a sample?

collection of data, data set, subset of population

Prompt: 125 colleges in the US with football teams

Sample: 125 colleges in the US with football teams

what are quantitative / numerical variables

describe quantities of the objects of interest

exceptions are numerical identifiers that only label things instead of measuring them. eg. telephone numbers, student IDs, zip codes

what are qualitative / categorical variables

describe qualities of the objects of interest

answer is a word or label or category

includes labels like telephone numbers, student IDs, zip codes

how do you code categorical data?

use 0 and 1

0 means no and 1 means yes

sum of the 1s = number of “yes” answers
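a minimal sketch of the 0/1 coding above. the survey answers are made up for illustration.

```python
# Hypothetical survey answers to a yes/no question (made-up data).
answers = ["yes", "no", "yes", "yes", "no"]

# Code each answer: 1 means yes, 0 means no.
coded = [1 if a == "yes" else 0 for a in answers]

# Summing the 1s counts the "yes" answers.
yes_count = sum(coded)
print(coded, yes_count)
```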

what is stacked data?

preferred format (treatment | observation)

treatments are in one column and observations are in the other

values in the same row are connected (read across →): each row is one observation

what is unstacked data?

each column represents a variable from a different group

(treatment A | treatment B | treatment C)

(observation A | observation B | observation C)

values in the same row are not connected (read across →): each column stands on its own
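a rough sketch of turning unstacked data into stacked data. the treatment names (A, B) and the values are hypothetical.

```python
# Unstacked: one column per group (hypothetical treatment names and values).
unstacked = {
    "A": [5, 7],
    "B": [6, 9],
}

# Stacked: one row per observation, written as (treatment, observation).
stacked = [
    (treatment, observation)
    for treatment, observations in unstacked.items()
    for observation in observations
]

for row in stacked:
    print(row)
```

in the stacked version every row pairs one treatment with one observation, which is why the rows carry meaning there.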

what are 8 questions to ask for context?

what are the objects of interest?
what variables were measured?
how were the variables measured?
what are the units of measurement?
who collected the data?
how did they collect the data?
where were the data collected?
why did they collect the data?

what is a two-way table?

table that organizes 2 categorical variables in rows and columns and shows how many times each combination of categories occurs

eg.

( | play an instrument | do not play an instrument)

(play a sport | number | number)

(do not play a sport | number | number)
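a small sketch of how the counts in a two-way table could be tallied. the student answers are made up, and using Counter is just one convenient way to count combinations.

```python
from collections import Counter

# Hypothetical (plays a sport, plays an instrument) answers, one pair per student.
students = [
    ("yes", "yes"), ("yes", "no"), ("no", "yes"),
    ("yes", "no"), ("no", "no"),
]

# Count how many times each combination of categories occurs.
table = Counter(students)

# Print one row per sport category, one column per instrument category.
for sport in ("yes", "no"):
    row = [table[(sport, instrument)] for instrument in ("yes", "no")]
    print("sport =", sport, row)
```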

what is special about two-way tables?

they allow us to observe associations between the two variables. BUT make sure you compare percentages, not raw counts. this is because raw counts from groups of different sizes are not on the same scale (eg. 2/30 ≈ 6.7% vs. 100/1000 = 10%)
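a quick sketch of why percentages are used, with the two fractions from this card. the variable names are only for illustration.

```python
# The two groups from the card: 2 out of 30 and 100 out of 1000.
small_group_percent = 100 * 2 / 30
large_group_percent = 100 * 100 / 1000

# As percentages the two groups can be compared directly.
print(round(small_group_percent, 1), round(large_group_percent, 1))
```

the raw counts (2 vs. 100) look wildly different, but the percentages are much closer.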

what is a treatment group?

group that receives the treatment of interest or has the characteristic of interest (eg. yes they smoke)

the treatment variable is always categorical (eg. smokes: yes / no)

what is a control group?

group that does not receive the treatment of interest. also known as comparison group. (eg. no, doesn’t smoke)

difference between observational studies and controlled experiments?

in observational studies, subjects are put into treatment or control by their own choice.

in controlled experiments, researchers randomly assign subjects to the control or treatment group and then record differences.

only through controlled experiments can you establish causality.
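a minimal sketch of random assignment in a controlled experiment. the subject IDs, the group size of 20, and the even split are assumptions for illustration.

```python
import random

# Hypothetical subject IDs; the researcher, not the subject, decides the groups.
subjects = list(range(1, 21))

random.seed(0)  # fixed seed only so this sketch is repeatable
random.shuffle(subjects)

# First half of the shuffled list gets the treatment, the rest is the control.
half = len(subjects) // 2
treatment = subjects[:half]
control = subjects[half:]

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```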

what is an anecdote?

someone’s story about their own experience.

CANNOT establish causality OR EVEN an association. basically useless in terms of statistics. they don’t even have a comparison group or control for the placebo effect.

what are the 4 gold standards for experiments?

sample sizes are large enough to account for variability (>10 usually)
assignment to treatment or control group is random and is controlled by the researcher
a placebo is used if appropriate
the study is blind or double-blind

blind study = participants don’t know whether they are in the control group or treatment group

double-blind study = person administering treatment also doesn’t know who’s in the control group or treatment group. PREFERRED!

what are graphical and numerical summaries?

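graphical: pictures of the data (eg. dotplots, histograms) that show the center, spread, and shape

numerical: numbers that describe the data (eg. typical value, amount of variability, outliers)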

what is distribution of a data set

list that records values observed (x-axis) and frequency of values (y-axis)
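a small sketch of a distribution as a list of values and their frequencies. the data set is made up.

```python
from collections import Counter

# Hypothetical data set.
data = [2, 3, 3, 5, 3, 2, 4]

# The distribution pairs each observed value with its frequency.
distribution = Counter(data)

for value in sorted(distribution):
    print(value, distribution[value])
```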

what are 2 graphs for numerical data

dotplots
histograms

what summary do you want to see for a distribution

symmetry? most common value? outliers?

dotplot and its pros/cons

each value marked by a dot over a number line

pros: shows individual data values, easy to spot outliers, describes distribution visually

cons: not as common as histograms and other graphs. not good when data has many individual values
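a rough text version of a dotplot, turned sideways so each value gets its own line. the data are hypothetical, and a real dotplot would stack the dots above a horizontal number line.

```python
from collections import Counter

# Hypothetical data set.
data = [1, 2, 2, 3, 3, 3, 5]
counts = Counter(data)

# One line per position on the number line, one dot per observation.
lines = []
for value in range(min(data), max(data) + 1):
    lines.append(f"{value}: " + "." * counts[value])

print("\n".join(lines))
```

the lone dot at 5, separated from the rest by the empty line at 4, shows how a possible outlier stands out.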

how do you make a histogram?

histograms group observations into intervals called bins or classes (long thin rectangles)

place numerical variable on x-axis, frequencies on y-axis
divide x-axis into intervals of equal width to form bins
count how many observations fall into each bin (height)
draw a vertical bar above each bin representing each frequency. these bars MUST touch for numerical data!!!! (when they don’t touch that’s a bar chart and it’s for categorical data ONLY)
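the steps above can be sketched roughly like this. the data, the starting point, and the bin width are assumptions, and putting a boundary value into the higher bin is one common convention, not the only one.

```python
# Hypothetical data set; the bin start and width are assumptions for this sketch.
data = [1, 2, 2, 3, 5, 6, 6, 7, 9, 10]
start = 0
bin_width = 5
num_bins = 3  # bins cover 0-5, 5-10, 10-15

# Count how many observations fall into each bin.
# Assumed convention: a value on a boundary (eg. 5) goes into the higher bin.
counts = [0] * num_bins
for x in data:
    counts[(x - start) // bin_width] += 1

# Draw one bar per bin; bins sit side by side with no gaps in between.
for i, count in enumerate(counts):
    low = start + i * bin_width
    print(f"{low:>2} to {low + bin_width:<2} | " + "#" * count)
```

changing bin_width changes the counts, which is why the shape of a histogram depends on the bins you pick.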

histogram pros/cons

pros: good for large data sets, easy to spot outliers, describes distribution visually, compact display, flexible in defining intervals

cons: lose individual data values, requires more work to create, distribution changes shape as width of bins changes

what is a relative frequency histogram?

divide frequency by sample size (frequency/sample size)

relative frequencies (proportions) on y-axis, numerical variable on x-axis
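a minimal sketch of the frequency / sample size calculation. the bin counts are made up.

```python
# Hypothetical bin frequencies from a histogram.
frequencies = [4, 5, 1]
sample_size = sum(frequencies)

# Relative frequency = frequency / sample size, one per bin.
relative = [f / sample_size for f in frequencies]
print(relative)
```

the relative frequencies are proportions, so they add up to 1 (up to rounding).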

how do you create a histogram on the ti-84?

STAT → edit → 1: Edit → enter data set in L1

2nd → [Y=] AKA STAT PLOT → 1: Plot1 → turn On → for Type pick the histogram icon (top right corner) → make sure Xlist is set to L1 → ZOOM → 9: ZoomStat → use TRACE and arrow keys to navigate graph

what are 3 aspects of data distribution for a numerical variable?

shape: symmetric or skewed, how many bumps or mounds, outliers

center: typical value

horizontal spread: close together (low variability) or spread out (high variability)

how do you know what kind of variability something has

in histogram, one bar > 50% of other bar’s height, or 2 bars both cross 50% frequency = high variability

if one bar sticks out by a lot = low variability

wide histogram = high variation, narrow histogram = low variation

what kinds of distributions are symmetric?

bell-shaped, uniform, bimodal (when the two mounds mirror each other)

what are the two types of skew

skewed right: tail on the right end is longer

skewed left: tail on the left end is longer

what are mounds?

the same thing as mode (most common number in set of data, greatest frequency)

unimodal has 1 mode

bimodal has 2 modes (equal or very close frequency)

multimodal has >2 modes (w/same frequency)

uniform has no modes (graph completely flat)
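a small sketch of finding modes with Python's statistics.multimode (Python 3.8+). the data are hypothetical. note the function only catches exact ties in frequency, so "very close" frequencies on a graph still need a judgment call.

```python
from statistics import multimode

# Hypothetical data set where 2 and 5 are tied for the greatest frequency.
data = [1, 2, 2, 3, 5, 5]

# multimode returns every value tied for the highest frequency.
modes = multimode(data)
print(modes, "-> number of modes:", len(modes))
```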

what are the 5 distribution shapes?

bell-shaped / normal
skewed left
skewed right
bimodal
uniform

characteristics of the normal distribution

symmetric, unimodal, and bell-shaped

what are the 2 graphs for categorical data

bar charts, pie charts

what is a bar chart

1 bar for each observed category

bars don’t touch

horizontal (x-axis) = each category of the variable, vertical (y-axis) = frequency

can put the variables (categories) in whatever order you want

what is a pareto chart

bar chart except you specifically order the categories from highest frequency to lowest frequency (tallest → shortest)

“informative” way to organize categorical data
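a minimal sketch of the ordering a pareto chart uses. the categories and answers are made up.

```python
from collections import Counter

# Hypothetical answers to "how do you get to school?"
responses = ["bus", "car", "car", "walk", "car", "bus"]

# most_common() lists categories from highest to lowest frequency.
ordered = Counter(responses).most_common()

for category, frequency in ordered:
    print(f"{category:<5}" + "#" * frequency)
```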

what is the difference between bar charts and histograms?

in bar charts, the order and width of the bars don’t matter, but in histograms the bars have to be in numerical order and must all be the same width

in bar charts there are gaps between the bars but in a histogram the bars must touch

what is a pie chart and what is it used for

gives each category a “slice” of the whole whose size is proportional to the category’s frequency (if it’s higher frequency it gets a bigger chunk)

used to display how much of a share each category has of the whole. NOT commonly used by statisticians.

what are 2 aspects of data distribution for categorical variables?

mode (center)
variability

what characteristics may cause graphs to be misleading?

frequency scale (y-axis) doesn’t begin at 0 to create an illusion of greater differences

use symbols other than bars that hide or accentuate the real differences

use unequal-width bars or forced perspective (like tilting a pie chart to make one chunk look bigger)