Statistics Vocabulary Ch. 1-3

94 Terms

1

Data

Collections of observations, such as measurements, genders, or survey responses

2

Statistics

The science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data

3

Population

the complete collection of all measurements or data that are being considered

4

Census

the collection of data from every member of the population

5

Sample

Subcollection of members selected from a population

6

Voluntary Response Sample

one in which the respondents themselves decide whether to be included

7

Parameter

a numerical measurement describing some characteristic of a population

8

Statistic

a numerical measurement describing some characteristic of a sample

9

Quantitative Data

Data consisting of numbers representing counts or measurements

10

Qualitative (Categorical) Data

Data consisting of names or labels (not numbers that represent counts or measurements)

11

Discrete Data

result when the data values are quantitative and the number of values is finite or "countable"

12

Continuous Data

result from infinitely many possible quantitative values, where the collection of values is not countable

13

Nominal Level of Measurement

characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high)

14

Ordinal Level of Measurement

data that can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless

15

Interval Level of Measurement

Data that can be arranged in order, and differences between data values can be found and are meaningful. Data at the _ level do NOT have a natural zero starting point at which none of the quantity is present.

16

Ratio Level of Measurement

Data that can be arranged in order, differences can be found and are meaningful, and there IS a natural zero starting point

17

Big Data

Data sets that are so large and so complex that their analysis is beyond the capabilities of traditional software tools. Analysis of _ may require software running simultaneously in parallel on many different computers

18

Data Science

Involves applications of statistics, computer science, and software engineering, along with some other relevant fields (such as sociology or finance).

19

Missing Completely at Random

A data value is missing completely at random if the likelihood of its being missing is independent of its value or any of the other values in the data set. That is, any data value is just as likely to be missing as any other data value.

20

Missing Not at Random

A data value is missing not at random if the missing value is related to the reason that it is missing.

21

Placebo

A harmless and ineffective pill, medicine, or procedure sometimes used for psychological benefit or sometimes used by researchers for comparison to other treatments

22

Experiment

In an experiment, we apply some treatment and then observe its effects on the individuals. (These individuals are referred to as experimental units, and they are often called subjects when they are people.)

23

Observational Study

observe and measure specific characteristics, but we don't attempt to modify the individuals being studied

24

Replication

Repetition of an experiment on more than one individual

25

Blinding

Used when the subject doesn't know whether he or she is receiving a treatment or a placebo

26

Randomization

Used when individuals are assigned to different groups through a process of random selection

27

Double Blinding

the act of blinding both the subjects of an experiment and the researchers who work with the subjects.

28

Confounding

occurs when we can see some effect, but we cannot identify the specific factor that caused it.

29

Simple Random Sample

A sample of size n selected from the population in such a way that each possible sample of size n has an equal chance of being selected.

30

Random Sample

has a weaker requirement (as compared to a simple random sample) that all members of the population have the same chance of being selected

31

Systematic Sampling

we select some starting point and then select every kth (such as every 50th) element in the population

32

Convenience Sampling

we simply use data that are very easy to get

33

Stratified Sampling

we subdivide the population into at least two different subgroups (or strata) so that subjects within the same subgroup share the same characteristics (such as gender). Then we draw a sample from each subgroup

34

Cluster Sampling

we first divide the population area into sections (or clusters). Then we randomly select some of those clusters and choose all the members from those selected clusters.
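
The four sampling methods above (simple random, systematic, stratified, and cluster) can be sketched in Python as follows. This is only an illustration under made-up assumptions: the population of ID numbers 1 to 100, the odd/even strata, the five clusters of 20 members, and all function names are hypothetical and not part of the card set.

```python
import random

def simple_random_sample(population, n, rng):
    # Draw n distinct members; every possible sample of size n is equally likely.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Pick a random starting point among the first k members, then take every kth member.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    # Group members into strata, then draw a simple random sample from each stratum.
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    return [m for group in strata.values() for m in rng.sample(group, n_per_stratum)]

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters and keep every member of the chosen clusters.
    chosen = rng.sample(clusters, n_clusters)
    return [m for cluster in chosen for m in cluster]

rng = random.Random(0)                      # fixed seed so the sketch is repeatable
population = list(range(1, 101))            # hypothetical population: IDs 1..100
srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(population, lambda m: m % 2, 5, rng)
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
clustered = cluster_sample(clusters, 2, rng)
```

Note the contrast between the last two functions: stratified sampling takes some members from every subgroup, while cluster sampling takes every member from some subgroups.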

35

Cross-Sectional Study

data are observed, measured, and collected at one point in time

36

Retrospective Study

data are collected from a past time period by going back in time (through examination of records, interviews, and so on)

37

Prospective (Longitudinal) Study

data are collected in the future from groups that share common factors

38

Sampling Error

occurs when the sample has been selected with a random method, but there is a discrepancy between a sample result and the true population result; such an error results from chance sample fluctuations

39

Non-Sampling Error

the result of human error, including such factors as wrong data entries, computing errors, questions with biased wording, false data provided by respondents, forming biased conclusions, or applying statistical methods that are not appropriate for the circumstances

40

Nonrandom Sampling Error

the result of using a sampling method that is not random, such as using a convenience sample or a voluntary response sample

41

Statistically significant result

one that is very unlikely to occur by chance

42

Lower Class Limit

The smallest number that can belong to a class (the beginning value of the class).

43

Upper Class Limits

The largest number that can belong to a class (the end value of the class).

44

Class Boundaries

the numbers used to separate the classes, but without the gaps created by class limits. (Each boundary lies midway between the upper limit of one class and the lower limit of the next. Example: for the class 10-19, the boundaries are 9.5 and 19.5.)

45

Class Midpoint

the values in the middle of the classes. Midpoint = (Lower Class Limit + Upper Class Limit) / 2

46

Class Width

the difference between two consecutive lower class limits
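
As a worked example of the class-limit terms above, the short Python sketch below computes the width, boundaries, and midpoints for three classes. The class 10-19 comes from the Class Boundaries card; the neighboring classes 20-29 and 30-39 and the variable names are assumptions added for illustration.

```python
# Hypothetical classes given as (lower class limit, upper class limit).
classes = [(10, 19), (20, 29), (30, 39)]

lower_limits = [lo for lo, hi in classes]
upper_limits = [hi for lo, hi in classes]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Gap between the upper limit of one class and the lower limit of the next.
gap = lower_limits[1] - upper_limits[0]

# Class boundaries sit halfway across each gap.
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]

# Class midpoint: (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

print(class_width)  # 10
print(boundaries)   # [9.5, 19.5, 29.5, 39.5]
print(midpoints)    # [14.5, 24.5, 34.5]
```

The class width is 10, not 19 - 10 = 9, because it is measured between consecutive lower limits rather than within a single class.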

47

Frequency Table (Distribution)

shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency) of data values in each of them.

48

Relative Frequency Distribution

A variation of the basic frequency distribution in which each class frequency is replaced by a relative frequency (a proportion or a percentage), found by dividing the class frequency by the sum of all frequencies.

49

Cumulative Frequency Distribution

A variation of the basic frequency distribution, in which the frequency for each class is the sum of the frequencies for that class and all previous classes.
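
The three kinds of frequency distribution (basic, relative, and cumulative) can be built from the same counts. The sketch below assumes a small made-up data set and the hypothetical classes 10-19, 20-29, and 30-39; none of these values come from the card set.

```python
from itertools import accumulate

# Hypothetical data values and classes, chosen only for illustration.
data = [12, 15, 21, 22, 22, 27, 31, 35, 38, 38]
classes = [(10, 19), (20, 29), (30, 39)]

# Frequency: number of data values that fall in each class.
frequencies = [sum(lo <= x <= hi for x in data) for lo, hi in classes]

# Relative frequency: class frequency divided by the sum of all frequencies.
total = sum(frequencies)
relative_frequencies = [f / total for f in frequencies]

# Cumulative frequency: running total of this class and all previous classes.
cumulative_frequencies = list(accumulate(frequencies))

print(frequencies)             # [2, 4, 4]
print(relative_frequencies)    # [0.2, 0.4, 0.4]
print(cumulative_frequencies)  # [2, 6, 10]
```

The last cumulative frequency always equals the total number of data values, which makes a quick check on the table.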

50

Histogram

A graph used to show the frequency distribution of one quantitative variable. (The bars are of equal width and touch each other; each bar spans the boundaries of its class.)

51

Relative Frequency Histogram

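A histogram with the same shape and horizontal scale as a regular histogram, but whose vertical scale uses relative frequencies (percentages or proportions) instead of actual frequencies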

52

Normal Distribution

a distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. (Bell Shaped)

53

Skewed Right Distribution

a distribution that is not symmetrical and extends to one side more than to the other. The tail is on the right side

54

Skewed Left Distribution

a distribution that is not symmetrical and extends to one side more than to the other. The tail is on the left side.

55

Uniform Distribution

a type of distribution in which all different possible values occur with approximately the same frequency, so the heights of the bars in the histogram are approximately uniform

56

Dotplot

a graph of quantitative data in which each data value is plotted as a point (or dot) above a horizontal scale of values. Dots representing equal values are stacked.

57

Stem-and-Leaf Plot

represents quantitative data by separating each value into two parts: the stem (such as the leftmost digit, the tens) and the leaf (such as the rightmost digit, the ones). The original data values can be reconstructed from the plot.

58

Time-Series Graph

a graph of time-series data, which are quantitative data that have been collected at different points in time, such as monthly or yearly.

59

Bar Graph

uses bars of equal width to show frequencies of categories of categorical (or qualitative) data. Typically has spaces between bars

60

Pareto Chart

a bar graph for categorical data, with the added stipulation that the bars are arranged in descending order according to frequencies, so the bars decrease in height from left to right. (NO spaces between bars)

61

Pie Chart

a very common graph that depicts categorical data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category.

62

Frequency Polygon

uses line segments connected to points located directly above class midpoint values. A frequency polygon is very similar to a histogram, but a frequency polygon uses line segments instead of bars.

63

Relative Frequency Polygon

uses line segments connected to points located directly above class midpoint values but uses relative frequencies (proportions or percentages) for the vertical scale instead.

64

Pictographs

Drawings of objects. Data that are one-dimensional in nature (such as budget amounts) are often depicted with two-dimensional objects (such as dollar bills) or three-dimensional objects (such as stacks of dollar bills). By using pictographs, artists can create false impressions that grossly distort differences by using these simple principles of basic geometry.

65

Correlation

a relationship that exists between two variables when the values of one variable are somehow associated with the values of the other variable.

66

Linear Correlation

exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line.

67

Scatter Plot

is a plot of paired (x, y) quantitative data with a horizontal x-axis and a vertical y-axis. The horizontal axis is used for the first variable (x), and the vertical axis is used for the second variable (y).

68

Linear Correlation Coefficient

is denoted by r, and it measures the strength of the linear association between two variables.
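
The card does not give a formula for r. A common way to compute it, shown here as a sketch, is the sum of the products of deviations from the means divided by the square root of the product of the two sums of squared deviations. The function name and the paired data are hypothetical.

```python
import math

def linear_correlation(xs, ys):
    # r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = linear_correlation(xs, ys)           # about 0.8: fairly strong positive association
r_perfect = linear_correlation(xs, [2, 4, 6, 8, 10])  # points on a rising line give r = 1
```

Values of r always fall between -1 and 1; values near either end indicate a strong linear association, and values near 0 indicate little or none.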

69

P-Value

is the probability of getting paired sample data with a linear correlation coefficient r that is at least as extreme as the one obtained from the paired sample data, assuming that there really is no linear correlation between the two variables.

70

Regression Line

is the straight line that "best" fits the scatterplot of the data.
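
The card leaves "best" undefined. The sketch below assumes the usual least-squares criterion, in which the line minimizes the sum of squared vertical distances from the points to the line. The function name and the paired data are hypothetical.

```python
def least_squares_line(xs, ys):
    # Slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2);
    # the intercept is chosen so the line passes through (mean_x, mean_y).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
slope, intercept = least_squares_line(xs, ys)   # about 0.8 and 0.6
predicted_at_6 = slope * 6 + intercept          # about 5.4
```

For these points the fitted line is approximately y = 0.6 + 0.8x.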

71

Descriptive Statistics

summarize or describe relevant characteristics of data

72

Inferential Statistics

used to make inferences or generalizations about a population

73

Measure of Center

a value at the center or middle of a data set, such as the mean, median, mode, or midrange

74

Mean (or Arithmetic Mean)

of a set of data is the measure of center found by adding all of the data values and dividing the total by the number of data values. Also known as the average

75

Resistant

if the presence of extreme values (outliers) does not cause it to change very much

76

Median

of a data set is the measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.

77

Mode

of a data set is the value(s) that occur(s) with the greatest frequency.

78

Bimodal

When two data values occur with the same greatest frequency, each one is a mode

79

Multimodal

When more than two data values occur with the same greatest frequency, each is a mode

80

No mode

When no data value is repeated

81

Midrange

of a data set is the measure of center that is the value midway between the maximum and minimum values in the original data set. It is found by adding the maximum data value to the minimum data value and then dividing the sum by 2
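
The four measures of center defined above can be computed with Python's standard `statistics` module, as in the sketch below. The data values are made up for illustration. The second data set replaces the largest value with an outlier to show what "resistant" means in practice.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)                 # (2+3+3+5+7+10) / 6 = 5
median = statistics.median(data)             # average of the two middle values: (3+5)/2 = 4
modes = statistics.multimode(data)           # every value tied for greatest frequency: [3]
midrange = (max(data) + min(data)) / 2       # (10 + 2) / 2 = 6

# Replace the largest value with an extreme outlier.
with_outlier = [2, 3, 3, 5, 7, 100]
mean_with_outlier = statistics.mean(with_outlier)      # jumps to 20
median_with_outlier = statistics.median(with_outlier)  # stays at 4
```

The median is unchanged by the outlier, so it is resistant. The mean changes a great deal, so it is not.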

82

Variation

Describes the spread of data by finding values of range, variance, and standard deviation

83

Range

of a set of data values is the difference between the maximum data value and the minimum data value.

84

Standard Deviation

A measure of how much data values deviate from the mean. It is denoted s for a sample and σ for a population.

85

Biased Estimator

The sample standard deviation s is a biased estimator of the population standard deviation σ, which means that values of s do not tend to center around the value of σ.

86

Unbiased Estimator

The sample variance s^2 is an unbiased estimator of the population variance σ^2, which means that values of s^2 tend to center around the value of σ^2 instead of systematically tending to overestimate or underestimate σ^2
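
The difference between a biased and an unbiased estimator can be checked by brute force on a tiny population. The sketch below assumes a made-up population {1, 2, 5} and lists every possible sample of size 2 drawn with replacement; both the population and the sampling scheme are assumptions chosen to keep the enumeration small.

```python
import itertools
import statistics

# Hypothetical population, chosen only for illustration.
population = [1, 2, 5]
sigma_squared = statistics.pvariance(population)   # population variance, 26/9
sigma = statistics.pstdev(population)              # population standard deviation

# All 9 equally likely samples of size 2, drawn with replacement.
samples = list(itertools.product(population, repeat=2))

# Average the sample variance s^2 and the sample standard deviation s over all samples.
mean_of_s_squared = sum(statistics.variance(s) for s in samples) / len(samples)
mean_of_s = sum(statistics.stdev(s) for s in samples) / len(samples)

print(mean_of_s_squared, sigma_squared)  # the two values agree: s^2 targets sigma^2
print(mean_of_s, sigma)                  # mean of s is smaller: s tends to underestimate sigma
```

In this example the average of s^2 matches σ^2, while the average of s falls below σ.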

87

Range Rule of Thumb

Subtract the smallest value in a dataset from the largest and divide the result by four to estimate the standard deviation.

88

Variance

of a set of values is a measure of variation equal to the square of the standard deviation.

89

Coefficient of Variation (or CV)

for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean
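
The measures of variation defined above (range, standard deviation, variance, range rule of thumb, and coefficient of variation) are computed in the sketch below for one made-up data set, using Python's standard `statistics` module. The data values and variable names are assumptions for illustration.

```python
import statistics

# Hypothetical data set, chosen only for illustration (its mean is 5).
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)            # 9 - 2 = 7
s = statistics.stdev(data)                    # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(data)               # population standard deviation (divides by n)
sample_variance = statistics.variance(data)   # equals s squared
range_rule_estimate = data_range / 4          # rough estimate of the standard deviation
cv_percent = s / statistics.mean(data) * 100  # standard deviation relative to the mean, in %

print(round(s, 3), sigma, range_rule_estimate)
```

Treating the values as a sample gives s of roughly 2.14. Treating them as a whole population gives σ = 2. The range rule of thumb gives 1.75, which is a rough estimate rather than an exact value.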

90

Z-Score (or standard score or standardized value)

is the number of standard deviations that a given value x is above or below the mean
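
In formula form, the definition above is z = (x - mean) / standard deviation. The sketch below applies it with a hypothetical mean of 5 and standard deviation of 2; the function name and numbers are assumptions for illustration.

```python
def z_score(x, mean, standard_deviation):
    # Number of standard deviations that x lies above (+) or below (-) the mean.
    return (x - mean) / standard_deviation

# Hypothetical values: mean 5, standard deviation 2.
z_above = z_score(9, 5, 2)   # 2.0: two standard deviations above the mean
z_below = z_score(2, 5, 2)   # -1.5: one and a half standard deviations below the mean
```

A positive z-score means the value is above the mean, and a negative z-score means it is below.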

91

Percentile

are measures of location, denoted P1, P2, ..., P99, which divide a set of data into 100 groups with about 1% of the values in each group

92

Quartiles

are measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group.

93

Boxplot (or box-and-whisker diagram)

is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile Q1, the median, and the third quartile Q3
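
The five values a boxplot is drawn from (minimum, Q1, median, Q3, maximum) can be computed as in the sketch below. It uses `statistics.quantiles` from Python's standard library with its default method. Textbooks and software packages use slightly different conventions for locating quartiles, so results for other data sets may differ from a hand calculation. The data values are made up for illustration.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the data into four groups and returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# The five-number summary that a boxplot displays.
five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)  # (1, 3.0, 6.0, 9.0, 11)
```

The box in the boxplot spans Q1 to Q3, with a line at the median (Q2), and the whiskers extend to the minimum and maximum.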

94

Skewed

if the spread of data is not symmetric and extends more to one side than to the other.