Statistics Vocabulary Ch. 1-3


Description and Tags

Statistics

94 Terms

1

Data

Collections of observations, such as measurements, genders, or survey responses

2

Statistics

The science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data

3

Population

the complete collection of all measurements or data that are being considered

4

Census

the collection of data from every member of the population

5

Sample

Subcollection of members selected from a population

6

Voluntary Response Sample

one in which the respondents themselves decide whether to be included

7

Parameter

a numerical measurement describing some characteristic of a population

8

Statistic

a numerical measurement describing some characteristic of a sample

9

Quantitative Data

Data consisting of numbers representing counts or measurements

10

Qualitative (or Categorical) Data

Data consisting of names or labels (not numbers that represent counts or measurements)

11

Discrete Data

result when the data values are quantitative and the number of values is finite or "countable"

12

Continuous Data

result from infinitely many possible quantitative values, where the collection of values is not countable

13

Nominal Level of Measurement

characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high)

14

Ordinal Level of Measurement

data that can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless

15

Interval Level of Measurement

Data that can be arranged in order, and differences between data values can be found and are meaningful. Data at the _____ Level does NOT have a natural zero starting point at which none of the quantity is present.

16

Ratio Level of Measurement

Data that can be arranged in order, differences can be found and are meaningful, and there IS a natural zero starting point

17

Big Data

Data sets that are too large and so complex that their analysis is beyond the capabilities of traditional software tools. Analysis of _____ may require software simultaneously running in parallel on many different computers

18

Data Science

Involves applications of statistics, computer science, and software engineering, along with some other relevant fields (such as sociology or finance).

19

Missing Completely at Random

A data value is missing completely at random if the likelihood of its being missing is independent of its value or any of the other values in the data set. That is, any data value is just as likely to be missing as any other data value.

20

Missing Not at Random

A data value is missing not at random if the missing value is related to the reason that it is missing.

21

Placebo

A harmless and ineffective pill, medicine, or procedure sometimes used for psychological benefit or sometimes used by researchers for comparison to other treatments

22

Experiment

in an experiment, we apply some treatment and then proceed to observe its effects on the individuals. (these individuals are referred to as experimental units, and often called subjects when they are people)

23

Observational Study

in an observational study, we observe and measure specific characteristics, but we don't attempt to modify the individuals being studied

24

Replication

Repetition of an experiment on more than one individual

25

Blinding

Used when the subject doesn't know whether he or she is receiving a treatment or a placebo

26

Placebo Effect

Occurs when an untreated subject reports an improvement in symptoms. (The reported improvement could be real or imagined.)

27

Double Blinding

the act of blinding both the subjects of an experiment and the researchers who work with the subjects.

28

Confounding

occurs when we can see some effect, but we cannot identify the specific factor that caused it.

29

Simple Random Sample

A sample of size n selected from the population in such a way that each possible sample of size n has an equal chance of being selected.

30

Random Sample

has a weaker requirement (as compared to a simple random sample) that all members of the population have the same chance of being selected

31

Systematic Sampling

we select some starting point and then select every kth (such as every 50th) element in the population
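
The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `systematic_sample` and its parameters are hypothetical, and the starting point is assumed to fall within the first k elements.

```python
import random

def systematic_sample(population, k, start=None):
    """Return every k-th element, beginning at 0-based index `start`."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if start is None:
        # Assumption: the starting point is chosen at random among the first k elements.
        start = random.randrange(k)
    return population[start::k]

# Example: 20 numbered items, every 5th one, starting from the third item.
print(systematic_sample(list(range(1, 21)), k=5, start=2))  # [3, 8, 13, 18]
```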

32

Convenience Sampling

we simply use data that are very easy to get

33

Stratified Sampling

we subdivide the population into at least two different subgroups (or strata) so that subjects within the same subgroup share the same characteristics (such as gender). Then we draw a sample from each subgroup
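
A rough Python sketch of the two steps described above (form the strata, then sample within each one). The names `stratified_sample`, `stratum_of`, and `n_per_stratum` are hypothetical, the list of people is made-up illustration data, and drawing a simple random sample of equal size from each stratum is an assumption; other allocation schemes exist.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, n_per_stratum, seed=None):
    """Group subjects by stratum, then draw a random sample from each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    # Sample within every stratum (never more than the stratum contains).
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical subjects tagged with the characteristic that defines the strata.
people = [("Ann", "F"), ("Bea", "F"), ("Cat", "F"),
          ("Dan", "M"), ("Eli", "M"), ("Fay", "F")]
picked = stratified_sample(people, stratum_of=lambda p: p[1], n_per_stratum=2, seed=1)
print(picked)
```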

34

Cluster Sampling

we first divide the population area into sections (or clusters). Then we randomly select some of those clusters and choose all the members from those selected clusters.
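
For contrast with stratified sampling, here is a small Python sketch of the cluster approach: whole clusters are chosen at random, and every member of a chosen cluster is kept. The function name `cluster_sample` and the precinct data are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population divided into four sections.
precincts = {
    "north": ["a", "b", "c"],
    "south": ["d", "e"],
    "east": ["f"],
    "west": ["g", "h"],
}
sample = cluster_sample(precincts, n_clusters=2, seed=0)
print(sample)
```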

35

Cross-Sectional Study

data are observed, measured, and collected at one point in time

36

Retrospective Study

data are collected from a past time period by going back in time (through examination of records, interviews, and so on)

37

Prospective (or Longitudinal) Study

data are collected in the future from groups that share common factors

38

Sampling Error

occurs when the sample has been selected with a random method, but there is a discrepancy between a sample result and the true population result; such an error results from chance sample fluctuations

39

Non-Sampling Error

the result of human error, including such factors as wrong data entries, computing errors, questions with biased wording, false data provided by respondents, forming biased conclusions, or applying statistical methods that are not appropriate for the circumstances

40

Nonrandom Sampling Error

the result of using a sampling method that is not random, such as using a convenience sample or a voluntary response sample

41

Statistically significant result

one that is very unlikely to occur by chance

42

Lower Class Limit

The smallest number that can belong to a class (the beginning value of the class).

43

Upper Class Limit

The largest number that can belong to a class (the end value of the class).

44

Class Boundaries

the numbers used to separate the classes, but without the gaps created by class limits. (Example: when the classes are 10-19, 20-29, and so on, the class 10-19 has boundaries 9.5 and 19.5.)

45

Class Midpoint

the values in the middle of the classes: (Lower Class Limit + Upper Class Limit) / 2

46

Class Width

the difference between two consecutive lower class limits
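
The class vocabulary above (limits, boundaries, midpoints, width) can be checked with a short Python sketch. It assumes at least two classes listed in increasing order with a constant gap between adjacent classes; the function name `class_summary` is hypothetical.

```python
def class_summary(limits):
    """limits: list of (lower, upper) class limits, in increasing order."""
    gap = limits[1][0] - limits[0][1]    # gap between adjacent classes
    width = limits[1][0] - limits[0][0]  # difference of consecutive lower limits
    # Boundaries sit halfway across each gap, so they leave no holes.
    boundaries = [limits[0][0] - gap / 2] + [upper + gap / 2 for _, upper in limits]
    midpoints = [(lower + upper) / 2 for lower, upper in limits]
    return {"width": width, "boundaries": boundaries, "midpoints": midpoints}

summary = class_summary([(10, 19), (20, 29), (30, 39)])
print(summary["width"])       # 10
print(summary["boundaries"])  # [9.5, 19.5, 29.5, 39.5]
print(summary["midpoints"])   # [14.5, 24.5, 34.5]
```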

47

Frequency Table (Distribution)

shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency) of data values in each of them.

48

Relative Frequency Distribution

A variation of the basic frequency distribution in which each class frequency is replaced by a relative frequency (a proportion) or a percentage: the class frequency divided by the sum of all frequencies.

49

Cumulative Frequency Distribution

A variation of the basic frequency distribution, in which the frequency for each class is the sum of the frequencies for that class and all previous classes.
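
To see how the frequency, relative frequency, and cumulative frequency columns relate, here is a small Python sketch. The scores and class limits are made-up illustration data, and `frequency_table` is a hypothetical helper name.

```python
from itertools import accumulate

def frequency_table(values, limits):
    """Count values per class, then derive relative and cumulative frequencies."""
    freqs = [sum(lower <= v <= upper for v in values) for lower, upper in limits]
    total = sum(freqs)
    relative = [f / total for f in freqs]  # proportions that sum to 1
    cumulative = list(accumulate(freqs))   # running totals of the frequencies
    return freqs, relative, cumulative

scores = [12, 15, 21, 22, 27, 28, 29, 31, 35, 38]
limits = [(10, 19), (20, 29), (30, 39)]
freqs, relative, cumulative = frequency_table(scores, limits)
print(freqs)       # [2, 5, 3]
print(relative)    # [0.2, 0.5, 0.3]
print(cumulative)  # [2, 7, 10]
```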

50

Histogram

A graph that shows the frequency distribution of one quantitative variable. It consists of bars of equal width drawn adjacent to each other (without gaps), with each bar spanning the boundaries of its class.

51

Relative Frequency Histogram

A histogram whose vertical scale shows relative frequencies (proportions or percentages) instead of actual frequencies

52

Normal Distribution

a distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. (Bell Shaped)

53

Skewed Right Distribution

a distribution that is not symmetrical and extends to one side more than to the other. The tail is on the right side

54

Skewed Left Distribution

a distribution that is not symmetrical and extends to one side more than to the other. The tail is on the left side.

55

Uniform Distribution

a type of distribution in which all different possible values occur with approximately the same frequency, so the heights of the bars in the histogram are approximately uniform

56

Dotplot

a graph of quantitative data in which each data value is plotted as a point (or dot) above a horizontal scale of values. Dots representing equal values are stacked.

57

Stem-and-Leaf Plot

represents quantitative data by separating each value into two parts: the stem (such as the leftmost digit, the tens) and the leaf (such as the rightmost digit, the ones). The original data set can be reconstructed from the graph.

58

Time-Series Graph

a graph of time-series data, which are quantitative data that have been collected at different points in time, such as monthly or yearly.

59

Bar Graph

uses bars of equal width to show frequencies of categories of categorical (or qualitative) data. Typically has spaces between bars

60

Pareto Chart

a bar graph for categorical data, with the added stipulation that the bars are arranged in descending order according to frequencies, so the bars decrease in height from left to right. (NO spaces between bars)

61

Pie Chart

a very common graph that depicts categorical data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category.

62

Frequency Polygon

uses line segments connected to points located directly above class midpoint values. A frequency polygon is very similar to a histogram, but a frequency polygon uses line segments instead of bars.

63

Relative Frequency Polygon

uses line segments connected to points located directly above class midpoint values but uses relative frequencies (proportions or percentages) for the vertical scale instead.

64

Pictographs

Drawings of objects. Data that are one-dimensional in nature (such as budget amounts) are often depicted with two-dimensional objects (such as dollar bills) or three-dimensional objects (such as stacks of dollar bills). By using pictographs, artists can create false impressions that grossly distort differences, because doubling each side of a square quadruples its area and doubling each side of a cube increases its volume eightfold.

65

Correlation

a relationship that exists between two variables when the values of one variable are somehow associated with the values of the other variable.

66

Linear Correlation

exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line.

67

Scatter Plot

is a plot of paired (x, y) quantitative data with a horizontal x-axis and a vertical y-axis. The horizontal axis is used for the first variable (x), and the vertical axis is used for the second variable (y).

68

Linear Correlation Coefficient

is denoted by r, and it measures the strength of the linear association between two variables.
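
As a rough illustration of what r measures, the sketch below computes it from paired data using the standard product-moment formula (sum of cross-deviations divided by the square root of the product of the sums of squared deviations). The function name `linear_correlation` and the sample pairs are hypothetical.

```python
from math import sqrt

def linear_correlation(xs, ys):
    """Linear correlation coefficient r for paired (x, y) data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Points lying exactly on a rising line give r = 1; on a falling line, r = -1.
r_up = linear_correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_down = linear_correlation([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

Values of r near 0 suggest little or no linear association, even when some other (nonlinear) relationship exists.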

69

P-Value

is the probability, assuming there really is no linear correlation between the two variables, of getting paired sample data with a linear correlation coefficient r that is at least as extreme as the one obtained from the paired sample data.

70

Regression Line

is the straight line that "best" fits the scatterplot of the data.
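
The card leaves "best" undefined. A minimal sketch, assuming the usual least-squares criterion (the line that minimizes the sum of squared vertical distances from the points), might look like this. The function name `regression_line` and the data are hypothetical.

```python
def regression_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return intercept, slope

intercept, slope = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)  # 1.0 2.0
```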

71

Descriptive Statistics

summarize or describe relevant characteristics of data

72

Inferential Statistics

used to make inferences or generalizations about a population

73

Measure of Center

A value at the center or middle of a data set. Common measures of center are the mean, median, mode, and midrange.

74

Mean (or arithmetic mean)

of a set of data is the measure of center found by adding all of the data values and dividing the total by the number of data values. Also known as the average

75

Resistant

A statistic is resistant if the presence of extreme values (outliers) does not cause it to change very much

76

Median

of a data set is the measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.

77

Mode

of a data set is the value(s) that occur(s) with the greatest frequency.

78

Bimodal

When two data values occur with the same greatest frequency, each one is a mode

79

Multimodal

When more than two data values occur with the same greatest frequency, each is a mode

80

No mode

When no data value is repeated

81

Midrange

of a data set is the measure of center that is the value midway between the maximum and minimum values in the original data set. It is found by adding the maximum data value to the minimum data value and then dividing the sum by 2
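
The four measures of center defined in the preceding cards can be compared side by side with Python's standard `statistics` module. The helper name `measures_of_center` and the data values are hypothetical; the single large value (40) is included to show which measures are resistant.

```python
import statistics

def measures_of_center(values):
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every value tied for the greatest frequency.
        # When no value repeats it returns all values, which these cards
        # would describe as "no mode".
        "modes": statistics.multimode(values),
        "midrange": (max(values) + min(values)) / 2,
    }

result = measures_of_center([2, 3, 3, 5, 7, 7, 40])
print(result)
```

Here the data set is bimodal (3 and 7), the median stays at 5, and the mean and midrange are both pulled upward by the outlier.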

82

Variation

Describes the spread of data by finding values of range, variance, and standard deviation

83

Range

of a set of data values is the difference between the maximum data value and the minimum data value.

84

Standard Deviation

A measure of how much data values deviate from the mean. The sample standard deviation is denoted s; the population standard deviation is denoted σ.

85

Biased Estimator

The sample standard deviation s is a biased estimator of the population standard deviation σ, which means that values of s do not tend to center around the value of σ.

86

Unbiased Estimator

The sample variance s^2 is an unbiased estimator of the population variance σ^2, which means that values of s^2 tend to center around the value of σ^2 instead of systematically tending to overestimate or underestimate σ^2
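
The difference between a biased and an unbiased estimator can be checked by brute force on a tiny population. The sketch below uses a hypothetical three-value population, lists every possible sample of size 2 drawn with replacement, and averages the sample variances and sample standard deviations (both computed with the n - 1 divisor).

```python
from itertools import product
from statistics import mean, pstdev, pvariance, stdev, variance

population = [2, 3, 10]  # hypothetical tiny population
samples = list(product(population, repeat=2))  # all 9 samples of size 2

mean_of_s2 = mean([variance(s) for s in samples])  # average sample variance
mean_of_s = mean([stdev(s) for s in samples])      # average sample standard deviation

print(mean_of_s2, pvariance(population))  # these agree: s^2 targets sigma^2
print(mean_of_s, pstdev(population))      # these differ: s tends to fall below sigma
```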

87

Range Rule of Thumb

Subtract the smallest value in a dataset from the largest and divide the result by four to estimate the standard deviation.

88

Variance

of a set of values is a measure of variation equal to the square of the standard deviation.
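
The measures of variation in the preceding cards (range, standard deviation, variance, and the range rule of thumb) can be computed together. This sketch treats the data as a sample, so it uses the n - 1 divisor; `variation_summary` and the data values are hypothetical.

```python
import statistics

def variation_summary(sample):
    data_range = max(sample) - min(sample)
    return {
        "range": data_range,
        "s": statistics.stdev(sample),            # sample standard deviation
        "variance": statistics.variance(sample),  # equals s squared
        "range_rule_estimate": data_range / 4,    # rough estimate of s
    }

summary = variation_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```

The range rule estimate (1.75) is only a crude approximation of the computed s (about 2.14), which is what a rule of thumb is expected to deliver.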

89

Coefficient of Variation (or CV)

for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean
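
In symbols, the definition above is CV = (s / x̄) · 100% for a sample. A minimal Python sketch, with a hypothetical function name and made-up data:

```python
import statistics

def coefficient_of_variation(sample):
    """Sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100

cv = coefficient_of_variation([10, 20, 30])
print(cv)  # mean is 20 and s is 10, so the CV is 50 percent
```

Because the CV has no units, it allows the variation of data sets measured on different scales to be compared.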

90

Z-Score (or standard score or standardized value)

is the number of standard deviations that a given value x is above or below the mean
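
As a formula, z = (x - mean) / standard deviation. The short sketch below uses hypothetical values (a mean of 100 and a standard deviation of 15) to show positive and negative scores.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

print(z_score(130, mean=100, std_dev=15))  # 2.0
print(z_score(85, mean=100, std_dev=15))   # -1.0
```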

91

Percentiles

are measures of location, denoted P1, P2, ..., P99, which divide a set of data into 100 groups with about 1% of the values in each group

92

Quartiles

are measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group.
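
There is no single agreed procedure for computing percentiles and quartiles, and different textbooks and software packages can give slightly different answers. The sketch below follows one common textbook convention as an assumption: sort the data, compute the locator L = (k / 100) · n, and either round L up or, when L is a whole number, take the value midway between the Lth and the next value. The function name `percentile` is hypothetical.

```python
def percentile(values, k):
    """k-th percentile for an integer k from 1 to 99, using the locator L = (k/100) * n."""
    data = sorted(values)
    n = len(data)
    if (k * n) % 100 == 0:
        pos = k * n // 100  # L is a whole number
        return (data[pos - 1] + data[pos]) / 2  # midway between L-th and next value
    pos = -(-k * n // 100)  # L is not whole: round it up
    return data[pos - 1]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
q1, q2, q3 = percentile(data, 25), percentile(data, 50), percentile(data, 75)
print(q1, q2, q3)  # 3 5.5 8
```

Under this convention the quartiles are simply Q1 = P25, Q2 = P50 (the median), and Q3 = P75.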

93

Boxplot (or box-and-whisker diagram)

is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile Q1, the median, and the third quartile Q3
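
The five values a boxplot is drawn from (minimum, Q1, median, Q3, maximum) can be gathered with the standard library. This is only a sketch: `five_number_summary` is a hypothetical helper, and `statistics.quantiles` uses its own default interpolation rule, so its quartiles may differ slightly from those found by hand with a textbook procedure.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a data set."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return min(values), q1, median, q3, max(values)

low, q1, med, q3, high = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(low, q1, med, q3, high)
```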

94

Skewed

A distribution of data is skewed if it is not symmetric and extends more to one side than to the other.
