Unit 4: Exploring Data


101 Terms

1
Mosaic Plots
________: Stacked bar chart that shows percentages of data in groups.
2
Box plots
________: A graph that gives a quick picture of the middle 50% of the data.
3
Outliers
________: An observation that is surprisingly different from the rest of the data.
4
Bivariate data
________: Taking two measurements on each object (Ex.
5
Dotplot
________: Best for small data sets, similar to histograms and bar plots.
6
Numerical
________ or Quantitative: Outcomes can be measured arithmetically.
7
Sample
________: The part of the population that is actually studied.
8
Quartiles
________: Divide a set of values into four equal parts by using the 25th, 50th, and 75th percentiles.
9
Q1
________: 25% of values are below and 75% of values are above.
10
Correlation Coefficient
________: Numerical measures used to judge the relation between two variables.
11
standard deviation
Can be quantified through the range, ________, or variance of a distribution.
12
Q2
________: 50% of the values are below and 50% of the values are above.
13
Spread
________: Describes how far the data points are from the center.
14
Univariate data
________: Taking only one measurement on each object (Ex.
15
Histogram
________: A graphical representation in the x-y form of the distribution of data in a data set; x represents the data and y represents the frequency or relative frequency.
16
Shape
________: Distribution can tell us where most of the data is.
17
Categorical
________ or Qualitative: Places the individual being studied into one of several groups.
18
Error
________ or residual: e = y - ŷ = (observed value of Y for a given value of X) - (predicted value of Y for that value of X).
19
Population
________: The entire group of individuals or things that we are interested in.
20
Range
________: The difference between the largest and the smallest measurement in a data set.
21
graph
The ________ consists of contiguous rectangles.
22
Scatterplot
________: Graphical summary of the relationship between two quantitative variables, with each observation plotted as an (x, y) point.
23
Linear regression model
________: An equation that gives a straight-line relationship between two variables.
24
Direction
________: The scatterplot will show whether the y-value increases or decreases as x increases, or that it changes ________.
25
Positive z score
________: Indicates that the measurement is larger than the mean.
26
Linear Regression
________: If two different quantitative variables have a linear relation, then we can measure the strength of that relationship using this.
27
Statistics
________: The science of data.
28
Stem
________-and-leaf graph or stemplot: A display that makes it easy to compute the median and other quantiles.
29
Positive relation
________: Increasing or upward trend between two variables.
30
Tabular Methods
________: Frequency distribution table (it facilitates the analysis of patterns of variation among observed data)
31
regression line
Predicted value: computed using the estimated ________ and is also known as "y hat."
32
Coefficient of determination
________: Measures the percent of the variation in Y-values explained by the linear relation between X- and Y-values.
33
Descriptive methods
________: The different methods used to organize, summarize, and describe data.
34
Population mean
________: Adding up all the values in the entire population and dividing by the number of values.
35
Frequency
________ (f): Number of times that observation has occurred.
36
Bar Charts
________: The length of the bar for each category is proportional to the number or percent of individuals in each category.
37
Cumulative Frequency Charts
________: Frequency for that group plus the frequencies of all groups of smaller observations.
38
Statistics
The science of data
39
Descriptive methods
The different methods used to organize, summarize, and describe data
40
Categorical or Qualitative
Places the individual being studied into one of several groups
41
Numerical or Quantitative
Outcomes can be measured arithmetically
42
Univariate data
Taking only one measurement on each object (Ex
43
Bivariate data
Taking two measurements on each object (Ex
44
Tabular Methods
Frequency distribution table (it facilitates the analysis of patterns of variation among observed data)
45
n
Denotes the number of observations
46
Frequency (f)
Number of times that observation has occurred
47
Relative frequency
Ratio of the frequency to the total number of observations
48
Cumulative frequency
Gives the number of observations less than or equal to a specific value
49
Frequency distribution table
A table giving all possible values of a variable and their frequencies
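As a rough illustration of the frequency, relative frequency, and cumulative frequency cards above, the following Python sketch builds a small frequency distribution table. The data values and the helper name `frequency_table` are made up for this example and are not part of the original card set.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(observations):
    """Return rows of (value, frequency, relative frequency, cumulative frequency)."""
    counts = Counter(observations)       # f: how many times each value occurs
    n = len(observations)                # n: total number of observations
    values = sorted(counts)              # every distinct value, in increasing order
    freqs = [counts[v] for v in values]
    cumulative = list(accumulate(freqs)) # observations less than or equal to each value
    return [(v, f, f / n, c) for v, f, c in zip(values, freqs, cumulative)]


# Hypothetical data set of n = 10 observations.
data = [2, 3, 3, 5, 2, 3, 5, 5, 5, 1]
table = frequency_table(data)

for value, f, rel, cum in table:
    print(value, f, rel, cum)
```

The last cumulative frequency always equals n, and the relative frequencies add up to 1, which is a quick way to check a table built by hand.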
50
Bar Charts
The length of the bar for each category is proportional to the number or percent of individuals in each category
51
Pie Chart
Categories of data are represented by wedges in a circle and are proportional in size to the percentage of individuals in each category
52
Segmented Bar Chart
Takes the distribution from each group and arranges them along either the horizontal or vertical axis and shows the relative frequency of each group represented in one bar for each group
53
Mosaic Plots
Stacked bar chart that shows percentages of data in groups
54
Center
Describes the "typical" or central data points
55
Spread
Describes how far the data points are from the center
56
Shape
Distribution can tell us where most of the data is
57
Symmetrical Distribution
The data is spread out in the same way on both sides and there is the same amount of data on each side of the center
58
Skewed Distribution
A distribution in which extreme values in only one direction give one side a longer tail
59
Cluster sample
A sample in which the researcher first divides the population into sections (or clusters), and then randomly selects all members from some of those clusters
60
Outliers
An observation that is surprisingly different from the rest of the data
61
Stem-and-leaf graph or stemplot
A display that makes it easy to compute the median and other quantiles
62
Dotplot
Best for small data sets, similar to histograms and bar plots
63
Histogram
a graphical representation in the x-y form of the distribution of data in a data set; x represents the data and y represents the frequency or relative frequency
64
Cumulative Frequency Charts
Frequency for that group plus the frequencies of all groups of smaller observations
65
Population
The entire group of individuals or things that we are interested in
66
Sample
The part of the population that is actually studied
67
Mean
The arithmetic mean, also known as the average: the sum of the values divided by the number of values
68
Population mean
Adding up all the values in the entire population and dividing by the number of values
69
Median
Point that divides the measurements in half
70
Range
The difference between the largest and the smallest measurement in a data set
71
Interquartile range
The range of the middle 50% of the data, the difference between the third quartile and the first quartile
72
Standard deviation
A number that is equal to the square root of the variance and measures how far data values are from their mean
73
Variance
Average of the squared deviations from the mean
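The range, variance, and standard deviation cards can be tied together with a short worked computation. This Python sketch uses a made-up data set and the population form of the variance (dividing by n), which matches the "average of the squared deviations" wording; many courses also use a sample variance that divides by n - 1, which gives a slightly larger number.

```python
import math
import statistics

# Hypothetical data set, chosen so the arithmetic comes out evenly.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                          # center of the data
squared_deviations = [(x - mean) ** 2 for x in data]  # squared distance of each value from the mean
variance = sum(squared_deviations) / len(data)        # population variance: average squared deviation
std_dev = math.sqrt(variance)                         # standard deviation: square root of the variance
data_range = max(data) - min(data)                    # largest minus smallest measurement

print(mean, variance, std_dev, data_range)

# The standard library computes the same population quantities.
print(statistics.pvariance(data), statistics.pstdev(data))
```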
74
Percentiles
Percentiles divide a set of values into 100 equal parts
75
Quartiles
Divide a set of values into four equal parts by using the 25th, 50th, and 75th percentiles
76
Q1
25% of values are below and 75% of values are above
77
Q2
50% of the values are below and 50% of the values are above
78
Q3
75% of values are below and 25% of values are above
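The quartile cards (Q1, Q2, Q3) and the interquartile range card can be checked on a small example. The sketch below uses Python's `statistics.quantiles` (available from Python 3.8) with its default method on a made-up data set. Textbooks and calculators use slightly different quartile conventions, so results for other data sets may differ a little from a hand computation.

```python
import statistics

# Hypothetical ordered data set with 11 values.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1  # interquartile range: spread of the middle 50% of the data

print(q1, q2, q3, iqr)
print(q2 == statistics.median(data))  # Q2 is the median
```

For this data set the result agrees with the "median of each half" method: the lower half 1 to 5 has median 3, and the upper half 7 to 11 has median 9.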
79
Standardized scores or z-scores
Gives the distance between the measurements and the mean in terms of the number of standard deviations
80
Negative z-score
Indicates that the measurement is smaller than the mean
81
Positive z-score
Indicates that the measurement is larger than the mean
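The z-score cards describe the formula z = (x - mean) / standard deviation. A minimal Python sketch, assuming a made-up data set and the population standard deviation (the helper name `z_score` is hypothetical):

```python
import statistics


def z_score(x, mean, std_dev):
    """Distance of x from the mean, in units of standard deviations."""
    return (x - mean) / std_dev


# Hypothetical data set with mean 5 and population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)

print(z_score(9, mu, sigma))  # positive: 9 is larger than the mean
print(z_score(2, mu, sigma))  # negative: 2 is smaller than the mean
print(z_score(5, mu, sigma))  # zero: 5 equals the mean
```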
82
Box plots
a graph that gives a quick picture of the middle 50% of the data
83
Bivariate data
Data on two different variables collected from each item in a study
84
Linear Regression
If two different quantitative variables have a linear relation, then we can measure the strength of that relationship using this
85
Scatterplot
Graphical summary of the relationship between two quantitative variables, with each observation plotted as an (x, y) point
86
Shape
A scatterplot tells us whether the nature of the relation between the two variables is linear or nonlinear
87
Direction
The scatterplot will show whether the y-value increases or decreases as the x increases, or that it changes direction
88
Positive relation
Increasing or upward trend between two variables
89
Negative relation
Decreasing or downward trend between the two variables
90
Strength of relationship
If the trend of the data can be described with a line or a curve, then the spread of the data values around that line or curve describes the degree of the relation between the two variables
91
Correlation Coefficient
Numerical measures used to judge the relation between two variables
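The card does not say which correlation coefficient is meant. Assuming it is the usual Pearson correlation coefficient r for linear relations, a minimal Python sketch looks like this (the data and the helper name `pearson_r` are made up for illustration):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations (assumed meaning of r)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # how x and y vary together
    s_xx = sum((x - x_bar) ** 2 for x in xs)                       # variation in x
    s_yy = sum((y - y_bar) ** 2 for y in ys)                       # variation in y
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical bivariate data: one (x, y) pair per object.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(r)  # positive value: upward trend between x and y
```

r always lies between -1 and 1. Values near 1 go with a strong positive relation, values near -1 with a strong negative relation, and values near 0 with a weak linear relation.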
92
Linear regression model
Is an equation that gives a straight-line relationship between two variables
93
Independent variable
x
94
Dependent variable
y
95
Slope
b
96
y-intercept
a
97
Predicted value
computed using the estimated regression line and is also known as "y hat"
98
Least squares regression line
line that minimizes the sum of the squares of the residuals
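The regression cards (slope b, y-intercept a, predicted value, residual, coefficient of determination) fit together in one small computation. The sketch below assumes the line is written as ŷ = a + bx and uses the standard least squares formulas. The data and the helper name `least_squares_line` are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y_hat = a + b*x that minimizes the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # y-intercept: the line passes through (x_bar, y_bar)
    return a, b


# Hypothetical data: x is the independent variable, y the dependent variable.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
predicted = [a + b * x for x in xs]                         # y hat for each x
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # e = y - y_hat

# Coefficient of determination: share of the variation in y explained by the line.
y_bar = sum(ys) / len(ys)
sse = sum(e ** 2 for e in residuals)
sst = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - sse / sst

print(a, b, r_squared)
```

For a least squares line with an intercept, the residuals add up to zero (up to rounding), which makes a handy check on a hand calculation.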
99
Outliers
Observed data points that are far from the least squares line
100
Influential points
observed data points that are far from the other observed data points in the horizontal direction