Statistics Vocabulary Ch. 1-3

94 Terms

1
Data
Collections of observations, such as measurements, genders, or survey responses
2
Statistics
The science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data
3
Population
the complete collection of all measurements or data that are being considered
4
Census
the collection of data from every member of the population
5
Sample
Subcollection of members selected from a population
6
Voluntary Response Sample
one in which the respondents themselves decide whether to be included
7
Parameter
a numerical measurement describing some characteristic of a population
8
Statistic
a numerical measurement describing some characteristic of a sample
9
Quantitative Data
Data consisting of numbers representing counts or measurements
10
Qualitative (Categorical) Data
Data consisting of names or labels (not numbers that represent counts or measurements)
11
Discrete Data
result when the data values are quantitative and the number of values is finite or "countable"
12
Continuous Data
result from infinitely many possible quantitative values, where the collection of values is not countable
13
Nominal Level of Measurement
characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high)
14
Ordinal Level of Measurement
data that can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless
15
Interval Level of Measurement
Data that can be arranged in order, and differences between data values can be found and are meaningful. Data at the _____ Level does NOT have a natural zero starting point at which none of the quantity is present.
16
Ratio Level of Measurement
Data that can be arranged in order, differences can be found and are meaningful, and there IS a natural zero starting point
17
Big Data
Data sets so large and so complex that their analysis is beyond the capabilities of traditional software tools. Analysis of _____ may require software simultaneously running in parallel on many different computers
18
Data Science
Involves applications of statistics, computer science, and software engineering, along with some other relevant fields (such as sociology or finance).
19
Missing Completely at Random
A data value is missing completely at random if the likelihood of its being missing is independent of its value or any of the other values in the data set. That is, any data value is just as likely to be missing as any other data value.
20
Missing Not at Random
A data value is missing not at random if the missing value is related to the reason that it is missing.
21
Placebo
A harmless and ineffective pill, medicine, or procedure sometimes used for psychological benefit or sometimes used by researchers for comparison to other treatments
22
Experiment
in an experiment, we apply some treatment and then observe its effects on the individuals. (These individuals are referred to as experimental units, and are often called subjects when they are people.)
23
Observational Study
we observe and measure specific characteristics, but we don't attempt to modify the individuals being studied
24
Replication
Repetition of an experiment on more than one individual
25
Blinding
In effect when the subject doesn't know whether he or she is receiving a treatment or a placebo
26
Randomization
Used when individuals are assigned to different groups through a process of random selection
27
Double Blinding
the act of blinding both the subjects of an experiment and the researchers who work with the subjects.
28
Confounding
occurs when we can see some effect, but we cannot identify the specific factor that caused it.
29
Simple Random Sample
A sample of size n selected from the population in such a way that each possible sample of size n has an equal chance of being selected.
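A minimal Python sketch of drawing a simple random sample. The population of 100 ID numbers, the function name `simple_random_sample`, and the `seed` parameter are illustrative assumptions, not part of the card.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement.

    random.sample makes every possible subset of size n equally likely,
    which is the defining property of a simple random sample.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)

population = list(range(1, 101))  # hypothetical population: ID numbers 1..100
sample = simple_random_sample(population, 5, seed=1)
print(sample)
```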
30
Random Sample
has a weaker requirement (as compared to a simple random sample) that all members of the population have the same chance of being selected
31
Systematic Sampling
we select some starting point and then select every kth (such as every 50th) element in the population
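The every-kth rule can be sketched in a few lines of Python. The card only says "some starting point"; choosing that start at random among the first k elements is an assumption made here, and `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a starting index among the first k elements, then take every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumption: the starting point is chosen at random
    return population[start::k]

population = list(range(1, 501))  # hypothetical population of 500 elements
sample = systematic_sample(population, 50, seed=0)
print(sample)  # 10 elements, each 50 positions apart
```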
32
Convenience Sampling
we simply use data that are very easy to get
33
Stratified Sampling
we subdivide the population into at least two different subgroups (or strata) so that subjects within the same subgroup share the same characteristics (such as gender). Then we draw a sample from each subgroup
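One way to sketch stratified sampling in Python: group the records by a shared characteristic, then draw a random sample from every group. The records, the class-year labels, and the name `stratified_sample` are made up for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_stratum, seed=None):
    """Split records into strata by key(record), then sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    # Every stratum contributes to the final sample.
    return {label: rng.sample(members, per_stratum)
            for label, members in strata.items()}

# Hypothetical records: (student id, class year)
records = [(i, "freshman" if i % 2 else "senior") for i in range(40)]
sample = stratified_sample(records, key=lambda r: r[1], per_stratum=3, seed=0)
print(sample)
```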
34
Cluster Sampling
we first divide the population area into sections (or clusters). Then we randomly select some of those clusters and choose all the members from those selected clusters.
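For contrast with stratified sampling, here is a hedged Python sketch of cluster sampling: only some clusters are chosen, but every member of a chosen cluster is included. The cluster names and members are invented, and `cluster_sample` is a hypothetical helper.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters and keep ALL members of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical clusters, e.g. city blocks and their households
clusters = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1"],
}
chosen, members = cluster_sample(clusters, 2, seed=0)
print(chosen, members)
```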
35
Cross-Sectional Study
data are observed, measured, and collected at one point in time
36
Retrospective Study
data are collected from a past time period by going back in time (through examination of records, interviews, and so on)
37
Prospective (Longitudinal) Study
data are collected in the future from groups that share common factors
38
Sampling Error
occurs when the sample has been selected with a random method, but there is a discrepancy between a sample result and the true population result; such an error results from chance sample fluctuations
39
Non-Sampling Error
the result of human error, including such factors as wrong data entries, computing errors, questions with biased wording, false data provided by respondents, forming biased conclusions, or applying statistical methods that are not appropriate for the circumstances
40
Nonrandom Sampling Error
the result of using a sampling method that is not random, such as using a convenience sample or a voluntary response sample
41
Statistically significant result
one that is very unlikely to occur by chance
42
Lower Class Limit
The smallest value that can belong to a class (the beginning value of the class).
43
Upper Class Limit
The largest value that can belong to a class (the end value of the class).
44
Class Boundaries
the numbers used to separate the classes, but without the gaps created by class limits. (The numbers between classes. Ex. class 10-19: boundaries = 9.5 and 19.5)
45
Class Midpoint
the values in the middle of the classes: (Lower Class Limit + Upper Class Limit) / 2
46
Class Width
the difference between two consecutive lower class limits
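The class boundary, midpoint, and width cards can be checked with a short Python calculation. The classes 10-19, 20-29, 30-39, 40-49 extend the 10-19 example from the class boundaries card; the remaining classes are assumed for illustration.

```python
# Hypothetical classes 10-19, 20-29, 30-39, 40-49
lower_limits = [10, 20, 30, 40]
upper_limits = [19, 29, 39, 49]

# Class width: difference between two consecutive lower class limits
class_width = lower_limits[1] - lower_limits[0]

# Class boundaries split the gap between one class's upper limit
# and the next class's lower limit down the middle.
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [u + gap / 2 for u in upper_limits]

# Class midpoint: (lower class limit + upper class limit) / 2
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

print(class_width)  # 10
print(boundaries)   # [9.5, 19.5, 29.5, 39.5, 49.5]
print(midpoints)    # [14.5, 24.5, 34.5, 44.5]
```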
47
Frequency Table (Distribution)
shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency) of data values in each of them.
48
Relative Frequency Distribution
A variation of the basic frequency distribution in which each class frequency is replaced by a relative frequency (a proportion) or a percentage.
49
Cumulative Frequency Distribution
A variation of the basic frequency distribution, in which the frequency for each class is the sum of the frequencies for that class and all previous classes.
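A small Python sketch of the three frequency distributions (basic, relative, and cumulative), built from the same made-up data values and assumed classes.

```python
from itertools import accumulate

data = [12, 15, 21, 24, 27, 33, 35, 38, 39, 41]      # hypothetical data values
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]   # (lower limit, upper limit)

# Frequency: number of data values falling in each class
frequencies = [sum(lo <= x <= hi for x in data) for lo, hi in classes]

# Relative frequency: class frequency divided by the total number of values
relative = [f / len(data) for f in frequencies]

# Cumulative frequency: running total of this class and all previous classes
cumulative = list(accumulate(frequencies))

print(frequencies)  # [2, 3, 4, 1]
print(relative)     # [0.2, 0.3, 0.4, 0.1]
print(cumulative)   # [2, 5, 9, 10]
```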
50
Histogram
A graph used to show the frequency distribution of the data values of one variable. (A bar graph whose bars touch; each bar spans the boundaries of its class.)
51
Relative Frequency Histogram
A histogram whose vertical scale shows relative frequencies (percentages or proportions) instead of counts
52
Normal Distribution
a distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. (Bell Shaped)
53
Skewed Right Distribution
a distribution that is not symmetrical and extends to one side more than to the other. The tail is on the right side
54
Skewed Left Distribution
a distribution that is not symmetrical and extends to one side more than to the other. The tail is on the left side.
55
Uniform Distribution
a type of distribution in which all different possible values occur with approximately the same frequency, so the heights of the bars in the histogram are approximately uniform
56
Dotplot
a graph of quantitative data in which each data value is plotted as a point (or dot) above a horizontal scale of values. Dots representing equal values are stacked.
57
Stem-and-Leaf Plot
represents quantitative data by separating each value into two parts: the stem (such as the leftmost digit, the tens) and the leaf (such as the rightmost digit, the ones). The original data set can be reconstructed from the graph.
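A minimal Python sketch of a stem-and-leaf plot, assuming nonnegative two-digit integers so that the tens digit is the stem and the ones digit is the leaf. The data and the name `stem_and_leaf` are illustrative. The last lines show that the original values can be rebuilt from the plot.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (ones digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [38, 23, 47, 31, 25, 38, 42]  # hypothetical two-digit values
plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")

# Rebuild the (sorted) data set from the plot
rebuilt = [10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves]
```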
58
Time-Series Graph
a graph of time-series data, which are quantitative data that have been collected at different points in time, such as monthly or yearly.
59
Bar Graph
uses bars of equal width to show frequencies of categories of categorical (or qualitative) data. Typically has spaces between bars
60
Pareto Chart
a bar graph for categorical data, with the added stipulation that the bars are arranged in descending order according to frequencies, so the bars decrease in height from left to right. (NO spaces between bars)
61
Pie Chart
a very common graph that depicts categorical data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category.
62
Frequency Polygon
uses line segments connected to points located directly above class midpoint values. A frequency polygon is very similar to a histogram, but a frequency polygon uses line segments instead of bars.
63
Relative Frequency Polygon
uses line segments connected to points located directly above class midpoint values but uses relative frequencies (proportions or percentages) for the vertical scale instead.
64
Pictographs
Drawings of objects. Data that are one-dimensional in nature (such as budget amounts) are often depicted with two-dimensional objects (such as dollar bills) or three-dimensional objects (such as stacks of dollar bills). By using pictographs, artists can create false impressions that grossly distort differences, because doubling each side of a two-dimensional object quadruples its area and doubling each side of a three-dimensional object multiplies its volume by eight.
65
Correlation
a relationship that exists between two variables when the values of one variable are somehow associated with the values of the other variable.
66
Linear Correlation
exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line.
67
Scatter Plot
is a plot of paired (x, y) quantitative data with a horizontal x-axis and a vertical y-axis. The horizontal axis is used for the first variable (x), and the vertical axis is used for the second variable (y).
68
Linear Correlation Coefficient
is denoted by r, and it measures the strength of the linear association between two variables.
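The card does not give a formula for r. The Python sketch below uses the standard Pearson formula, the sum of products of deviations divided by the square root of the product of the sums of squared deviations. The paired data and the name `linear_correlation` are made up for illustration.

```python
from math import sqrt

def linear_correlation(xs, ys):
    """Pearson linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]   # hypothetical paired data
ys = [2, 4, 5, 4, 5]
r = linear_correlation(xs, ys)
print(round(r, 3))  # 0.775

# Points lying exactly on a rising straight line give r = 1
r_perfect = linear_correlation(xs, [2 * x + 1 for x in xs])
```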
69
P-Value
is the probability, assuming there really is no linear correlation between the two variables, of getting paired sample data with a linear correlation coefficient r that is at least as extreme as the one obtained from the paired sample data.
70
Regression Line
is the straight line that "best" fits the scatterplot of the data.
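The card leaves "best" undefined. A common criterion, assumed here, is least squares: choose the slope and intercept that minimize the sum of squared vertical distances from the points to the line. A minimal Python sketch with made-up data and a hypothetical helper `regression_line`:

```python
def regression_line(xs, ys):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept

xs = [1, 2, 3, 4, 5]   # hypothetical paired data
ys = [2, 4, 5, 4, 5]
slope, intercept = regression_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
```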
71
Descriptive Statistics
summarize or describe relevant characteristics of data
72
Inferential Statistics
used to make inferences or generalizations about a population
73
Measure of Center
a value at the center or middle of a data set. Common measures of center are the mean, median, mode, and midrange
74
Mean (or arithmetic mean)
of a set of data is the measure of center found by adding all of the data values and dividing the total by the number of data values. Also known as the average
75
Resistant
A statistic is resistant if the presence of extreme values (outliers) does not cause it to change very much
76
Median
of a data set is the measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.
77
Mode
of a data set is the value(s) that occur(s) with the greatest frequency.
78
Bimodal
When two data values occur with the same greatest frequency, each one is a mode
79
Multimodal
When more than two data values occur with the same greatest frequency, each is a mode
80
No mode
When no data value is repeated
81
Midrange
of a data set is the measure of center that is the value midway between the maximum and minimum values in the original data set. It is found by adding the maximum data value to the minimum data value and then dividing the sum by 2
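The four measures of center can be computed with Python's standard library. The data values are made up, and `modes` is a hypothetical helper that follows the card convention of reporting no mode when no value is repeated.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Return the value(s) with the greatest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no data value is repeated: no mode
    return sorted(v for v, c in counts.items() if c == top)

data = [2, 3, 3, 5, 7, 10]  # hypothetical data values

print(mean(data))                    # 5   (30 / 6)
print(median(data))                  # 4.0 (midway between 3 and 5)
print(modes(data))                   # [3]
print((max(data) + min(data)) / 2)   # 6.0 (midrange)

print(modes([1, 1, 2, 2, 3]))        # [1, 2] bimodal
print(modes([1, 2, 3]))              # []     no mode
```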
82
Variation
Describes the spread of data by finding values of range, variance, and standard deviation
83
Range
of a set of data values is the difference between the maximum data value and the minimum data value.
84
Standard Deviation
is a measure of how much data values deviate away from the mean. Sample = s, Population = σ.
85
Biased Estimator
The sample standard deviation s is a biased estimator of the population standard deviation σ, which means that values of s do not tend to center around the value of σ.
86
Unbiased Estimator
The sample variance s^2 is an unbiased estimator of the population variance σ^2, which means that values of s^2 tend to center around the value of σ^2 instead of systematically tending to overestimate or underestimate σ^2
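The difference between a biased and an unbiased estimator can be seen by listing every possible sample. The Python sketch below assumes a tiny made-up population of three values and enumerates all samples of size 2 drawn with replacement: the average of the sample variances equals the population variance, while the average of the sample standard deviations falls below the population standard deviation.

```python
from itertools import product
from statistics import mean, pstdev, pvariance, stdev, variance

population = [4, 5, 9]  # hypothetical population

# All 9 equally likely samples of size 2, drawn with replacement
samples = list(product(population, repeat=2))

mean_s2 = mean([variance(s) for s in samples])  # average sample variance
mean_s = mean([stdev(s) for s in samples])      # average sample standard deviation

print(mean_s2, pvariance(population))  # both about 4.667: s^2 centers on sigma^2
print(mean_s, pstdev(population))      # about 1.571 vs 2.160: s tends to underestimate sigma
```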
87
Range Rule of Thumb
Subtract the smallest value in a dataset from the largest and divide the result by four to estimate the standard deviation.
88
Variance
of a set of values is a measure of variation equal to the square of the standard deviation.
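The range, standard deviation, variance, and range rule of thumb can be compared side by side in Python. The data values are made up and are treated as a sample, so `stdev` and `variance` divide by n - 1.

```python
from statistics import stdev, variance

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample

data_range = max(data) - min(data)   # 10 - 2 = 8
s = stdev(data)                      # sample standard deviation
s_squared = variance(data)           # sample variance = s squared
rule_of_thumb = data_range / 4       # rough estimate of s

print(data_range)       # 8
print(round(s, 3))      # 3.033
print(s_squared)        # 9.2
print(rule_of_thumb)    # 2.0
```

The rule-of-thumb estimate (2.0) is only a rough guide to the computed value of s (about 3.03).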
89
Coefficient of Variation (or CV)
for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean
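A short Python sketch of the coefficient of variation, computed as the standard deviation divided by the mean and expressed as a percent. The height and weight samples are invented; the point is that CV allows spread to be compared across different units.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample CV as a percent: (s / mean) * 100."""
    return stdev(values) / mean(values) * 100

heights_cm = [160, 170, 180]  # hypothetical sample: mean 170, s = 10
weights_kg = [50, 70, 90]     # hypothetical sample: mean 70,  s = 20

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(round(cv_heights, 1))  # 5.9
print(round(cv_weights, 1))  # 28.6
```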
90
Z-Score (or standard score or standardized value)
is the number of standard deviations that a given value x is above or below the mean
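As a worked example in Python, a z-score is the value minus the mean, divided by the standard deviation. The mean of 100 and standard deviation of 15 are assumed values chosen for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_high = z_score(130, mean=100, sd=15)  # two standard deviations above the mean
z_low = z_score(85, mean=100, sd=15)    # one standard deviation below the mean
print(z_high, z_low)  # 2.0 -1.0
```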
91
Percentiles
are measures of location, denoted P1, P2, ..., P99, which divide a set of data into 100 groups with about 1% of the values in each group
92
Quartiles
are measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group.
93
Boxplot (or box-and-whisker diagram)
is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile Q1, the median, and the third quartile Q3
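The five values a boxplot is drawn from (minimum, Q1, median, Q3, maximum) can be sketched in Python. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" method of `statistics.quantiles`, so results may differ slightly from a hand calculation that follows another rule. The data and the name `five_number_summary` are illustrative.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    # Assumption: the 'inclusive' quartile method; other methods can give other values.
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

data = [7, 1, 9, 3, 5, 2, 8, 4, 6]  # hypothetical data values
summary = five_number_summary(data)
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```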
94
Skewed
A distribution of data is skewed if it is not symmetric and extends more to one side than to the other.