Ch1 - exploring data

44 Terms

1

Data Analysis

is the process of organizing/showing/summarizing/asking questions about data

2

individuals

objects described by a set of data

3

variables

any characteristic of an individual

4

Categorical Variable

places individuals into one of several groups or categories

  • zip codes

  • age groups (20s, 30s)

  • displayed with bar graphs, pie charts

can still be summarized with percents (relative frequencies)

5

Quantitative Variable

takes numerical values for which it makes sense to find an average

  • avg age of high schoolers

  • displayed with dot plots, stem & leaf plots, histograms

not the same as a percent (percents can summarize categorical data too)

6

in a scenario data can either be

categorical or quantitative

7

pictographs/bar graphs

visual representations of data using pictures or bars to show frequency or quantity.

  • side-by-side bar graphs

  • segmented bar graphs

  • mosaic plot

8

side-by-side bar graphs

show the (relative) frequencies of categorical data for two or more groups, with the groups' bars placed side by side

  • only use counts when the sample sizes are the same (two classes of 24); otherwise use relative frequencies

9

segmented bar graph

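shows the (relative) frequencies of categorical data for two or more groups, with each group's categories stacked in a single bar

  • only use counts when the sample sizes are the same (two classes of 24); otherwise use relative frequencies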

10

mosaic plot

shows the (relative) frequencies of categorical data, with areas that are proportional to the frequencies of the categories.

11

cherrypicking

choosing only the data points, or changing the axis scaling, so that a graph shows the desired result

12

Bar Graphs show

categorical data (also shown by pie charts)

  • in frequency (if same sample size)

  • and in relative frequency (if diff sample sizes)

13

frequency

for categorical data = the count (how many) of observations in each category or class

14

relative frequency

for categorical data = the % of observations in each category or class
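
The two definitions above (frequency and relative frequency) can be sketched in a few lines of Python. This is only an illustration: the `responses` list is made-up data, not something from these notes.

```python
from collections import Counter

# Hypothetical responses to "favorite elective?" (made-up data for illustration).
responses = ["art", "band", "art", "drama", "band", "art", "art", "band"]

# Frequency: the count of observations in each category.
frequency = Counter(responses)

# Relative frequency: each count divided by the total, written as a percent.
total = len(responses)
relative_frequency = {category: 100 * count / total
                      for category, count in frequency.items()}

for category in sorted(frequency):
    print(f"{category}: {frequency[category]} ({relative_frequency[category]:.1f}%)")
```

The relative frequencies always add up to 100%, which is why they are the safer choice when comparing groups of different sizes.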

15

Two Way Tables

show the frequency counts for each combination of categories.

  • show marginal distribution

  • show conditional distribution

  • can show association

16

marginal distribution

is the distribution of values of that variable among all individuals described by the table (total)

17

margin

the row totals or column totals of a two-way table; each margin describes one whole variable, NOT a single value

18

conditional distribution

describes the values of that variable among individuals who have a specific value of another variable (probability)
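
As a rough sketch of how marginal and conditional distributions come out of a two-way table, here is a small Python example. The counts and the category names (`math`, `english`, `art`, `band`) are invented for illustration only.

```python
# Hypothetical two-way table of counts (made-up numbers):
# rows = favorite core subject, columns = favorite elective.
table = {
    "math":    {"art": 5,  "band": 15},
    "english": {"art": 18, "band": 12},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of core subject: row totals over the grand total.
marginal_core = {subject: sum(row.values()) / grand_total
                 for subject, row in table.items()}

# Conditional distribution of elective given each core subject:
# each cell divided by its own row total.
conditional_elective = {
    subject: {elective: count / sum(row.values())
              for elective, count in row.items()}
    for subject, row in table.items()
}

print(marginal_core)
print(conditional_elective)
```

In this made-up table, 25% of the math group picks art compared with 60% of the english group. Conditional distributions that differ like this are what suggest an association between the two variables.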

19

association

knowing the value of one variable helps predict the value of the other.

  • Since knowing the favorite core subject will help us predict favorite elective, we say there is an association between the two variables.

the association is stronger when the conditional distributions of the groups differ more

20

distribution

the way in which values of a variable are spread across a range.

  • All the values that the zoologist records for body temperature and how many individual bears have each value.

marginal or conditional

21

dotplot

display of quantitative data

each data value is shown as a dot above its location on the number line

  • continuous x-axis

  • consistent scale

  • label units

22

stem & leaf plot

display of quantitative data

  • the stems must be consecutive with none skipped; use at least 5 stems

  • leaves are a single digit

  • include a key for the 1st data item

  • does not work well w big sets

if you split stems, split them equally by the ones place (the leaf digit): leaves 0-4 and 5-9, or 0-1, 2-3, etc.
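
The rules above can be sketched in Python for two-digit data, where the tens digit is the stem and the ones digit is the leaf. The `data` list is made up, and this sketch does not handle split stems.

```python
from collections import defaultdict

# Hypothetical two-digit data values (made up for illustration).
data = [12, 15, 21, 24, 24, 30, 47, 52, 58]

leaves_by_stem = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)  # tens digit is the stem, ones digit is the leaf
    leaves_by_stem[stem].append(leaf)

# List every stem from smallest to largest, even ones with no leaves,
# so gaps in the data stay visible.
lines = []
for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
    leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
    lines.append(f"{stem} | {leaves}")

print("\n".join(lines))
print("Key: 1 | 2 means 12")
```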

23

Histogram

display of quantitative data

  • divide range of data into classes of equal width

  • Find the count (frequency) or % (relative frequency) of individuals in each class

  • label & scale axes & draw histogram. adjacent bars touch unless a class contains no individuals

  • GIVE A KEY stating which endpoint each class includes: (left, right] or [left, right)
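
The class-building steps above can be sketched as follows. The data, the class width of 10, and the starting value of 0 are all assumptions chosen for illustration, and the sketch uses the [left, right) convention.

```python
# Hypothetical data values (made up for illustration).
data = [3, 7, 10, 12, 18, 20, 20, 25, 31, 39]

class_width = 10
lower_start = 0
num_classes = 4  # classes [0,10), [10,20), [20,30), [30,40)

# Left-inclusive classes [left, right): a value equal to a boundary
# (such as 10 or 20) is counted in the class that starts there.
counts = [0] * num_classes
for value in data:
    index = (value - lower_start) // class_width
    counts[index] += 1

# Relative frequency of each class, as a percent.
relative = [100 * count / len(data) for count in counts]

for i, count in enumerate(counts):
    left = lower_start + i * class_width
    right = left + class_width
    print(f"[{left}, {right}): {count} ({relative[i]:.0f}%)")
```

With the (left, right] convention instead, the values 10 and 20 would each move down one class, which is why the key matters.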

24

vocab to describe histogram

  • left bound graph

  • right bound graph

  • skyscraper graph

  • pancake graph

25

left bound graph

histogram whose classes include the left endpoint but not the right: [left, right)

26

right bound graph

histogram whose classes include the right endpoint but not the left: (left, right]

27

skyscraper graph

too few classes in histogram graph

28

pancake graph

too many classes in histogram graph

29

Histogram on calc

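  • Enter the data in a list: STAT, EDIT

  • Make the histogram: 2nd, Y= (STAT PLOT)

    • press 1, turn on the desired plot

  • press ZOOM 9 (ZoomStat)

  • adjust the WINDOW

  • press GRAPH (do not press ZOOM again)

press TRACE to read values off the graph (class bounds and counts for a histogram; the five-number summary for a boxplot)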

30

5 descriptors of a graph

don't forget your SOCCS

  • shape

  • outliers → “appears to be”

  • center → median or mean → “middle value”

  • context → units

  • spread → range (max - min), min and max, IQR, standard deviation

31

describing shape

  • rough symmetry

  • uniform → height is roughly same for whole graph

  • right-skewed/left-skewed → named for the direction of the graph's longer tail

AND

  • unimodal → /\

  • bimodal → /\/\

  • multimodal

32

formula for sample mean

x̄ (“x bar”) = (sum of all data values) / (number of values)

33

Describing center

Five measures used to summarize a distribution (center and spread)

  1. the median: middle #, resistant to outliers

  2. the mean: avg #, is impacted by outliers

  3. Quartiles: the 25% marks

  4. Spread: the min and max, e.g. [9,24], or the range (max - min), here 15

  5. IQR: measures the range of the middle 50%

34

the median

middle # in order, resistant to outliers

35

the mean

avg #, is impacted by outliers
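
A quick way to see "resistant" versus "impacted by outliers" is to add one extreme value to a small data set and recompute both measures. The numbers below are made up for illustration.

```python
from statistics import mean, median

# Hypothetical data set (made up for illustration).
data = [9, 12, 15, 18, 24]

# Same data with one extreme value added.
with_outlier = data + [90]

print("mean:", mean(data), "median:", median(data))
print("mean:", mean(with_outlier), "median:", median(with_outlier))
```

Here the mean jumps from 15.6 to 28, while the median only moves from 15 to 16.5.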

36

should we use mean or median

  • when the graph is approx symmetric: use the mean (center) and standard deviation (spread)

  • when the graph is skewed: use the median (center) and IQR (spread), bc the mean is pulled toward the skew

37

Quartiles

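the values that divide the ordered data into quarters (25% each)

  • 1st quartile = median of lower half

  • 3rd quartile = median of upper half

(if the number of values is odd, leave the median out of both halves; if it is even, the median falls between two values, so every value goes into one of the halves)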

38

IQR

measures the range of the middle 50%

  • IQR = Q3-Q1
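
A minimal sketch of the median-of-halves method for Q1 and Q3, followed by IQR = Q3 - Q1. The helper name `quartiles` and the data are made up for illustration, and the sketch assumes at least two values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median position
    upper = ordered[half + n % 2:]    # skips the middle value when n is odd
    return median(lower), median(ordered), median(upper)

# Hypothetical data set (made up for illustration).
data = [9, 12, 14, 15, 18, 20, 24]
q1, med, q3 = quartiles(data)
iqr = q3 - q1
print(q1, med, q3, iqr)
```

Calculators and software packages often use other quartile conventions, so their Q1 and Q3 can differ slightly from this hand method.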

39

graphically representing the center (box plot/box & whisker plot)

box plot/box & whisker plot

  • box shows the middle 50% (from Q1 to Q3)

  • line inside the box is the median

  • whiskers extend to the smallest and largest values that are not outliers; outliers are marked with *

40

is it an outlier

an observation that falls more than 1.5 x IQR outside the quartiles (beyond a boundary)

  • upper boundary: Q3 + 1.5(IQR)

  • lower boundary: Q1 - 1.5(IQR)

a value exactly on a boundary is not an outlier
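
The 1.5 x IQR rule can be sketched as a pair of small functions. The function names and the quartile values are made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) boundaries from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    """True only when the value falls strictly outside the boundaries."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper

# Hypothetical quartiles (made up for illustration): Q1 = 12, Q3 = 20, so IQR = 8.
lower, upper = outlier_fences(12, 20)
print(lower, upper)
print(is_outlier(32, 12, 20))  # exactly on the upper boundary
print(is_outlier(33, 12, 20))  # beyond the upper boundary
```

With these quartiles the boundaries are 0 and 32, so 32 is not flagged but 33 is.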

41

standard deviation

measures the typical (roughly avg) distance of the observations from the mean (should only be used when the mean is the measure of center); always > or = 0

  • calculated by taking the square root of the variance: Sx = square root of (Sx²)

  • Sx has the same units as the og observations

is not resistant - very impacted by skew + outliers bc the deviations are squared (skew/outliers make it larger)

42

Sx = √(variance) = standard deviation

43

variance

how spread out the data is: roughly the average squared deviation from the mean

  • written Sx²

is not resistant - impacted by skew and outliers in the data set (more skew/outliers = greater spread)
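
A short sketch of how the variance and standard deviation relate, computed by hand on made-up data. It assumes the usual sample formulas, which divide by n - 1 rather than n.

```python
from math import sqrt

# Hypothetical data set (made up for illustration).
data = [2, 4, 4, 6, 9]

n = len(data)
x_bar = sum(data) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Sample standard deviation: square root of the variance, in the data's own units.
std_dev = sqrt(variance)

print(x_bar, variance, std_dev)
```

For this data the mean is 5, the variance is 7, and the standard deviation is the square root of 7, about 2.65.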

44

measures of center vs variability

center: median, mean

variability (spread): range, IQR, standard deviation, variance
