Ch1 - exploring data

Data Analysis

the process of organizing, displaying, summarizing, and asking questions about data


individuals

objects described by a set of data


variables

any characteristic of an individual


Categorical Variable

places individuals into one of several groups or categories

examples: zip codes, age groups (20s, 30s)
displays: bar graphs, pie charts

can still be summarized with % (the percent of individuals in each category)


Quantitative Variable

takes numerical values for which it makes sense to find an average

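example: age of a high schooler (it makes sense to find the average age)
displays: dot plots, stem & leaf plots, histograms

different from a %: the percent of individuals in a category comes from categorical data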


in a given scenario, each variable is either

categorical or quantitative


pictographs/bar graphs

visual representations of data using pictures or bars to show frequency or quantity.

side-by-side bar graphs
segmented bar graphs
mosaic plot


side-by-side bar graphs

show the (relative) frequencies of categorical data, with the bars for each group placed side by side

use counts only when the group sizes are the same (e.g., two classes of 24); otherwise use relative frequencies


segmented bar graph

shows the (relative) frequencies of categorical data, with the categories stacked into one bar per group

use counts only when the group sizes are the same (e.g., two classes of 24); otherwise use relative frequencies


mosaic plot

shows the (relative) frequencies of categorical data, with rectangle areas proportional to the frequencies of the categories


cherrypicking

picking specific data points to show a desired pattern, or changing a graph's scale to exaggerate or hide a trend


Bar Graphs show

categorical data (also shown by pie charts)

as frequencies (if the sample sizes are the same)
or as relative frequencies (if the sample sizes differ)


frequency

for categorical data = the count (how many) of observations in each category or class


relative frequency

for categorical data = the % of observations in each category or class
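
A minimal Python sketch of frequency vs. relative frequency, assuming the data arrive as a plain list of category labels; the survey responses below are made up for illustration.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return {category: proportion} for a list of categorical observations."""
    counts = Counter(observations)  # frequency: the count in each category
    n = len(observations)
    return {cat: count / n for cat, count in counts.items()}

# Hypothetical survey: how each student gets to school (made-up data)
responses = ["bus", "car", "walk", "car", "bus", "car", "bike", "car"]
freq = Counter(responses)
rel = relative_frequencies(responses)
for cat in freq:
    print(f"{cat:5} count={freq[cat]}  relative={rel[cat]:.1%}")
```

The relative frequencies always add to 1 (100%), which is what lets groups of different sizes be compared fairly.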


Two Way Tables

show the frequency counts for each combination of categories.

show marginal distribution
show conditional distribution
can show association


marginal distribution

the distribution of one variable among all individuals described by the table (uses the row or column totals)
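
A small Python sketch of a marginal distribution, assuming the two-way table is stored as a dict of dicts (rows, then columns). The favorite core subject vs. favorite elective counts are hypothetical, chosen only to echo the association example later in this set.

```python
# Hypothetical two-way table: rows = favorite core subject, columns = favorite elective.
# The counts are made up for illustration.
table = {
    "Math":    {"Art": 10, "Music": 20},
    "English": {"Art": 25, "Music": 5},
}

def marginal_distribution(table, by="row"):
    """Proportion of ALL individuals in each row (or column) category, from the totals."""
    totals = {}
    for row, cols in table.items():
        for col, count in cols.items():
            key = row if by == "row" else col
            totals[key] = totals.get(key, 0) + count
    grand_total = sum(totals.values())
    return {k: v / grand_total for k, v in totals.items()}

print(marginal_distribution(table, by="row"))  # favorite core subject
print(marginal_distribution(table, by="col"))  # favorite elective
```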


margin

the row or column totals of a two-way table; a marginal distribution describes a whole variable, NOT a single value


conditional distribution

describes the values of one variable among only the individuals who have a specific value of another variable (given as proportions, like a probability)
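
A Python sketch of conditional distributions, assuming the same dict-of-dicts layout; the counts are hypothetical. Each row is divided by its own row total, not the grand total.

```python
# Hypothetical two-way table: rows = favorite core subject, columns = favorite elective.
table = {
    "Math":    {"Art": 10, "Music": 20},
    "English": {"Art": 25, "Music": 5},
}

def conditional_distribution(table, given_row):
    """Distribution of the column variable among only the individuals in one row category."""
    row = table[given_row]
    row_total = sum(row.values())
    return {col: count / row_total for col, count in row.items()}

for subject in table:
    dist = conditional_distribution(table, subject)
    print(subject, {k: round(v, 3) for k, v in dist.items()})
```

With these made-up counts the two conditional distributions differ a lot (Math fans mostly pick Music, English fans mostly pick Art), which is the pattern the association card describes.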


association

knowing the value of one variable helps predict the value of the other.

Since knowing the favorite core subject will help us predict favorite elective, we say there is an association between the two variables.

is stronger when the conditional distributions differ more from group to group (e.g., between two classes)


distribution

the way in which values of a variable are spread across a range.

example: all the values the zoologist records for body temperature, and how many bears have each value

can be marginal or conditional


dotplot

display of quantitative data

each data value is shown as a dot above its location on the number line

continuous x-axis
consistent scale
label units
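
A rough text-only dotplot in Python, assuming small integer data; it is a sketch of the idea, not a substitute for a real graph. The axis includes every integer from min to max (continuous axis, consistent scale), and the units label is printed separately. The pets data are made up.

```python
from collections import Counter

def text_dotplot(values):
    """Text dotplot for integer data: one '*' per value, stacked above its
    position on a number line that includes every integer from min to max."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)  # continuous axis, no skipped values
    width = max(len(str(x)) for x in axis) + 1  # same column width for every tick
    rows = []
    for level in range(max(counts.values()), 0, -1):  # build the top row first
        cells = ("*" if counts[x] >= level else "" for x in axis)
        rows.append("".join(c.rjust(width) for c in cells))
    rows.append("".join(str(x).rjust(width) for x in axis))  # the number line
    return "\n".join(rows)

# Hypothetical data: number of pets owned by 10 students
pets = [0, 1, 1, 2, 2, 2, 3, 5, 1, 0]
print(text_dotplot(pets))
print("number of pets per student")  # label units
```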


stem & leaf plot

display of quantitative data

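the stems must be consecutive (include empty stems), with at least 5 stems
leaves are single digits
include a key based on the 1st data item (e.g., 3 | 4 = 34)
does not work well with big data sets

if splitting stems, split each stem equally by the leaf's ones digit: 0-1, 2-3, etc.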
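
A Python sketch of a basic (unsplit) stem & leaf plot, assuming non-negative whole numbers where the stem is everything but the last digit and the leaf is the ones digit. Empty stems are kept so the stems stay consecutive, and the key uses the first (smallest) data item. The quiz scores are made up.

```python
def stem_and_leaf(data):
    """Text stem & leaf plot for non-negative integers: stem = x // 10, leaf = ones digit."""
    data = sorted(data)
    stems = {}
    for x in data:
        stems.setdefault(x // 10, []).append(x % 10)  # leaves come out in order
    lines = []
    for stem in range(data[0] // 10, data[-1] // 10 + 1):  # consecutive stems, even empty ones
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    first = data[0]
    lines.append(f"Key: {first // 10} | {first % 10} = {first}")
    return "\n".join(lines)

# Hypothetical quiz scores
scores = [52, 67, 71, 74, 74, 80, 85, 88, 93, 99, 63]
print(stem_and_leaf(scores))
```

Split stems are not handled here; they would need each stem listed twice (or five times) with the leaves divided equally.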


Histogram

display of quantitative data

divide the range of the data into classes of equal width
find the count (frequency) or % (relative frequency) of individuals in each class
label & scale the axes, then draw the histogram; adjacent bars touch unless a class contains no individuals
GIVE A KEY: state whether the classes are (a, b] or [a, b)
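
The counting steps above can be sketched in Python, assuming numeric data that are all at or above a chosen starting value and classes of the form [a, b) (left endpoint included). The data, start, and width are hypothetical.

```python
import math

def histogram_counts(data, start, width):
    """Count values in equal-width classes [start + k*width, start + (k+1)*width).
    Assumes every value is >= start."""
    n_classes = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_classes
    for x in data:
        counts[int((x - start) // width)] += 1
    classes = [(start + k * width, start + (k + 1) * width) for k in range(n_classes)]
    return classes, counts

data = [3, 7, 10, 12, 15, 18, 20, 21, 29]  # made-up values
classes, counts = histogram_counts(data, start=0, width=10)
n = len(data)
for (a, b), c in zip(classes, counts):
    print(f"[{a}, {b})  count={c}  relative={c / n:.0%}")
```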


vocab to describe histogram

left bound graph
right bound graph
skyscraper graph
pancake graph


left bound graph

histogram whose classes include the left endpoint but not the right: [left, right)


right bound graph

histogram whose classes include the right endpoint but not the left: (left, right]
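
A Python sketch of why the key matters: a value sitting exactly on a class boundary lands in different classes depending on whether the graph is left bound or right bound. The function name and the `closed` parameter are hypothetical, chosen for illustration.

```python
import math

def class_index(x, start, width, closed="left"):
    """Index k of the equal-width class containing x.
    closed='left'  -> classes are [start + k*width, start + (k+1)*width)
    closed='right' -> classes are (start + k*width, start + (k+1)*width]"""
    if closed == "left":
        return math.floor((x - start) / width)
    return math.ceil((x - start) / width) - 1

print(class_index(20, start=0, width=10, closed="left"))   # 2: the class [20, 30)
print(class_index(20, start=0, width=10, closed="right"))  # 1: the class (10, 20]
```

Values that are not on a boundary (like 15) land in the same class either way.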


skyscraper graph

histogram with too few classes (a few tall bars)


pancake graph

histogram with too many classes (many short, flat bars)


Histogram on calc

Enter data in a list: STAT, EDIT
Make the histogram: 2nd, Y= (STAT PLOT)
- press 1 and turn on the desired plot
press ZOOM, 9 (ZoomStat)
adjust the window
press GRAPH (do not press ZOOM again)

press TRACE to see 5# list


5 descriptors of a graph

don't forget your SOCCS

shape
outliers → “appears to be”
center → median or mean (the “middle value”)
context → units
spread → range (max − min), the interval [min, max], IQR, standard deviation
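
A Python sketch of two of the spread measures listed above, the range and the sample standard deviation. It assumes the usual sample formula that divides by n − 1 (the one the standard library's `statistics.stdev` uses); the data are hypothetical.

```python
import math
import statistics

def sample_sd(data):
    """Sample standard deviation: sqrt(sum of (x - mean)^2 / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

data = [9, 12, 15, 15, 18, 24]  # made-up values
print("min, max:", min(data), max(data))
print("range:", max(data) - min(data))
print("sd:", round(sample_sd(data), 3))
print("statistics.stdev:", round(statistics.stdev(data), 3))  # should agree
```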

New cards

describing shape

roughly symmetric
uniform → bars are roughly the same height across the whole graph
right-skewed/left-skewed → named for the direction of the longer tail

AND

unimodal → /\
bimodal → /\/\
multimodal


formula for sample mean

“x bar” (x̄) = (sum of all data values) / (number of values, n)
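
The formula translates directly into Python; the ages are hypothetical.

```python
def sample_mean(data):
    """x-bar = (sum of all data values) / (number of values)."""
    return sum(data) / len(data)

ages = [15, 16, 16, 17, 18]  # made-up ages of five high schoolers
print(sample_mean(ages))  # 82 / 5 = 16.4
```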


Describing center

The measures that describe the center and spread of the data (5 pts):

the median: middle #, resistant to outliers
the mean: average, affected by outliers
quartiles: the values that split the ordered data into quarters (25% each)
spread: [9, 24] (min to max), or the range: 24 − 9 = 15
IQR: Q3 − Q1, the range of the middle 50%
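
A Python sketch of the five-number summary, range, and IQR. It assumes the by-hand quartile rule (Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd); other software may use a different quartile rule and give slightly different values. The data are hypothetical but match the [9, 24] example.

```python
def median(sorted_vals):
    """Middle value of an already-sorted list (average of the two middle values if n is even)."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(data):
    """Min, Q1, median, Q3, max using medians of the lower and upper halves."""
    s = sorted(data)
    n = len(s)
    lower = s[: n // 2]        # values below the median
    upper = s[(n + 1) // 2 :]  # values above the median
    return s[0], median(lower), median(s), median(upper), s[-1]

data = [9, 11, 12, 14, 15, 17, 20, 22, 24]  # made-up values
mn, q1, med, q3, mx = five_number_summary(data)
print("five-number summary:", mn, q1, med, q3, mx)
print("range:", mx - mn, " IQR:", q3 - q1)
```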


the median

the middle value when the data are in order; resistant to outliers


the mean

the average; affected (pulled) by outliers


should we use the mean or the median?

when the graph is roughly symmetric: use the mean and standard deviation
when the graph is skewed: use the median and IQR, because the mean is pulled toward the tail (the skew)
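
A Python sketch of why the median is preferred for skewed data, using a hypothetical right-skewed data set (most values small, a long right tail). It compares the mean and median, then drops the largest value to show which measure moves more.

```python
import statistics

# Hypothetical right-skewed data (made-up incomes, in thousands of dollars)
incomes = [20, 22, 25, 25, 28, 30, 32, 35, 90, 150]
mean = statistics.mean(incomes)
median = statistics.median(incomes)
print(f"mean = {mean}, median = {median}")  # the mean is pulled toward the right tail

without_max = incomes[:-1]  # drop the single largest value (150)
print("without 150:", statistics.mean(without_max), statistics.median(without_max))
```

Removing one extreme value shifts the mean by roughly 11.6 here but the median by only 1, which is what "resistant to outliers" means in practice.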