1

statistics

the science of using an assortment of methods to systematically collect, organize, summarize, analyze, and interpret information

2

descriptive statistics

tabular, graphical, or numerical summaries of data for a particular group

3

statistical inference

using data collected from a sample in order to make estimates and test hypotheses about the characteristics of a larger population

4

population

the set of all elements of interest in a particular study

5

census

collecting data for the entire population

6

sample

subset of the population

7

sample survey

collecting data for a sample

8

data

information that we collect and analyze

9

data set

all of the data that is collected for a study

10

elements

the subjects of a study; entities on which data is collected

11

variable

attribute of the subjects/elements we are interested in studying

12

observation

set of all measurements collected for one subject/element

13

total data values

elements x variables

14

measurement scale

the nature of the values that are assigned to variables

15

nominal

categorical variables; does not indicate ranking (ex. gender, zip code)

16

ordinal

ranked data; distances between ranks are not equal or not known (ex. socioeconomic class, TRACE scale)

17

interval

always numeric; distances between adjacent values are equal; no absolute zero (0 doesn’t mean “absence of”) (ex. temperature, dress size)

18

ratio

lowest value is always zero; has an absolute zero (0 means “absence of”) (ex. age)

19

qualitative

nominal or ordinal; use words (or rank number) to describe subjects/elements

20

quantitative

interval or ratio; use numbers to describe subjects/elements

21

continuous

variables can take on values between whole numbers (fractions/decimals)

22

discrete

usually only take on whole number values (except shoe size)

23

experiment

a variable is specifically manipulated by the researcher

24

constant

a characteristic of elements/subjects that does not vary from one subject to the next

25

control variable

held constant in a research study by observing only one of its levels

26

observational research

levels of independent variable already exist (ex. gender, age); cannot make causal statements; looks for relationships between some set of variables

27

cross-sectional studies

provide a “snapshot” of different groups at one point in time

28

time series studies

longitudinal; use data that are collected on the same subjects/elements over several points in time; observe changes over time

29

effects

changes in data patterns

30

cyclical effect

any usual/consistent variation in daily, weekly, monthly, or annual data not related to change in season

31

seasonal effect

change in data that can be explained by/attributed to annual calendar-related events

32

irregular effect

any change in the data that is not related to a regular cycle or season; caused by unusual events

33

business analytics

the use of data, technology, and statistical analysis to answer questions

34

descriptive analytics

use of data to understand past and current business performance

35

predictive analytics

use of historical data to identify patterns or relationships and to make predictions about what will happen in the future

36

prescriptive analytics

identify the best alternatives to minimize or maximize some objective

37

parameters

Greek letters representing descriptive measures of a population

38

sample statistics

Roman letters representing descriptive measures of a sample

39

random sample

each member of population has an equal and independent chance of being included in the sample

40

sampling bias

when a sample is collected in a way that results in some members of the population being more or less likely to be included than others

41

raw data

data that has not been organized or summarized in any way

42

frequency distribution

the list of all frequencies for all categories

43

relative frequency distribution

the proportion of the observations that belong to a category: f/n

44

percent frequency distribution

the percentage of the observations that belong to a category: (f/n) x 100
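
The three frequency cards above can be sketched in a few lines of Python. This is only an illustration: the `responses` list is made-up data, and the variable names are assumptions rather than anything from the cards.

```python
from collections import Counter

# Hypothetical raw data: one qualitative value per observation.
responses = ["yes", "no", "yes", "maybe", "yes", "no"]

n = len(responses)                 # number of observations
frequency = Counter(responses)     # f: count for each category

# Relative frequency: f / n for each category.
relative = {category: f / n for category, f in frequency.items()}

# Percent frequency: (f / n) x 100 for each category.
percent = {category: 100 * f / n for category, f in frequency.items()}

for category in frequency:
    print(category, frequency[category], relative[category], percent[category])
```

The relative frequencies should add up to 1 and the percent frequencies to 100, which is a quick check on any hand-built table.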

45

class intervals

data divided into sets with equal widths

46

class midpoint

the value half way between the upper and lower limit of an interval

47

cumulative frequency distribution

total number of items that have values less than or equal to the upper limit of each class
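
A minimal sketch of class intervals, class midpoints, and a cumulative frequency distribution. The data values and the four equal-width classes (1-10, 11-20, 21-30, 31-40) are assumptions chosen for illustration.

```python
from itertools import accumulate

# Hypothetical quantitative data (whole numbers).
values = [3, 7, 12, 15, 18, 21, 22, 27, 31, 38]

# Assumed class intervals as (lower limit, upper limit), all the same width.
classes = [(1, 10), (11, 20), (21, 30), (31, 40)]

# Frequency: how many values fall inside each class (limits included).
counts = [sum(lo <= v <= hi for v in values) for lo, hi in classes]

# Class midpoint: halfway between the lower and upper limit.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Cumulative frequency: items with values <= the upper limit of each class.
cumulative = list(accumulate(counts))

for (lo, hi), f, mid, cum in zip(classes, counts, midpoints, cumulative):
    print(f"{lo}-{hi}: f={f}, midpoint={mid}, cumulative={cum}")
```

The last cumulative frequency always equals the total number of observations.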

48

data visualization

the process of displaying data meaningfully in order to improve decision-making

49

dashboard

visually summarizes key business information

50

line charts

display data over time

51

pie chart

used to display relative frequency or percentage distributions

52

bar chart

used to visually present qualitative data; separated bars

53

histograms

display frequency distributions of quantitative variables; no spaces between bars

54

frequency polygon

points used to depict frequency for each class interval

55

scatter plot

displays relationship between two quantitative variables

56

trendline

depicts general direction of the relationship between variables

57

positive relationship

as x increases, y increases

58

negative relationship

as x increases, y decreases

59

symmetrical distribution

similar on both sides

60

negatively skewed distribution

skewed left; most data fall at upper end

61

positively skewed distribution

skewed right; most data fall at lower end

62

skewness

the measure of the asymmetry (lack of symmetry) of a data distribution

63

kurtosis

the measure of how peaked or flat a data distribution is relative to a normal distribution

64

excess kurtosis

kurtosis - 3

65

mesokurtic

kurtosis = 3, excess = 0

66

leptokurtic

kurtosis > 3, excess > 0

67

platykurtic

kurtosis < 3, excess < 0

68

mildly skewed rule of thumb

skewness between -.5 and +.5

69

moderately skewed rule of thumb

skewness between -.5 and -1 or between +.5 and +1

70

highly skewed rule of thumb

skewness less than -1 or greater than +1
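
The skewness and kurtosis cards above can be tried out with a short sketch. It assumes the simple moment-based (population) formulas; spreadsheet and statistics packages often apply sample-size adjustments, so their values can differ somewhat. The function names and the data are hypothetical.

```python
def shape_measures(values):
    """Return (skewness, kurtosis, excess kurtosis) from central moments."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    m2 = sum(d ** 2 for d in deviations) / n   # second central moment
    m3 = sum(d ** 3 for d in deviations) / n   # third central moment
    m4 = sum(d ** 4 for d in deviations) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis, kurtosis - 3


def describe_skew(skewness):
    """Apply the rule-of-thumb cutoffs of 0.5 and 1 to the size of the skew."""
    size = abs(skewness)
    if size <= 0.5:
        return "mildly skewed"
    if size <= 1:
        return "moderately skewed"
    return "highly skewed"


def describe_kurtosis(excess):
    """Label the distribution by the sign of its excess kurtosis."""
    if excess > 0:
        return "leptokurtic"
    if excess < 0:
        return "platykurtic"
    return "mesokurtic"


data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample
skew, kurt, excess = shape_measures(data)
print(skew, describe_skew(skew))          # positive value: skewed right
print(kurt, excess, describe_kurtosis(excess))
```

The sign of the skewness gives the direction (positive is skewed right, negative is skewed left), and its size gives the rule-of-thumb category.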

71

standard error calculation

plus or minus 3 times the standard error of skewness

72

mode

value with highest frequency of occurrence; unaffected by outliers

73

median

the middlemost value when arranged in ascending order; not affected by outliers

74

median index

(n+1)/2
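
A small sketch of how the median index is used. The position (n+1)/2 is 1-based; when it lands between two positions (even n), the sketch averages the two middle values, which is the usual convention. The function name is hypothetical.

```python
def median_by_index(values):
    """Find the median using the 1-based position (n + 1) / 2."""
    ordered = sorted(values)          # arrange in ascending order first
    n = len(ordered)
    position = (n + 1) / 2            # 1-based median index
    if position.is_integer():
        # Odd n: the index points at a single middle value.
        return ordered[int(position) - 1]
    # Even n: the index falls halfway between two values; average them.
    below = ordered[int(position) - 1]
    above = ordered[int(position)]
    return (below + above) / 2


print(median_by_index([7, 1, 3]))       # index 2 -> middle value
print(median_by_index([4, 1, 3, 2]))    # index 2.5 -> average of 2nd and 3rd
```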

75

mean

best measure for normal data; can be pulled in direction of outliers

76

positive skew central tendencies

mode < median < mean

77

negative skew central tendencies

mean < median < mode

78

mean of the means

if groups are the same size use ___

79

weighted mean

used to calculate the mean of two or more groups when their sample sizes are not equal
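
A hedged sketch comparing the weighted mean with the mean of the means. The group means and group sizes are invented, and `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(group_means, group_sizes):
    """Combine group means, weighting each one by its group size."""
    total = sum(m * n for m, n in zip(group_means, group_sizes))
    return total / sum(group_sizes)


means = [80, 90]    # hypothetical group means
sizes = [10, 30]    # unequal group sizes

mean_of_means = sum(means) / len(means)       # ignores group size
combined = weighted_mean(means, sizes)        # accounts for group size

print(mean_of_means, combined)
```

With unequal groups the two results differ, because the larger group pulls the weighted mean toward its own mean. With equal group sizes they match, which is why the mean of the means is acceptable in that case.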

80

percentiles

divide rank-ordered data into 100 equal parts

81

quartiles

divide rank-ordered data into 4 equal parts

82

measures of variability

describe the spread of the data around the center

83

range

largest value - smallest value; sensitive to outliers

84

Interquartile Range

Q3-Q1; middle half of data

85

box and whisker plot

shows the center, the spread, and outliers of a data distribution

86

five number summary

smallest value (within inner fences), Q1, Q2, Q3, largest value (within inner fences)

87

inner fences

lower: Q1 - (1.5 x IQR); upper: Q3 + (1.5 x IQR)

88

outer fences

lower: Q1 - (3 x IQR); upper: Q3 + (3 x IQR)
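
The IQR and fence cards above can be sketched as follows. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" method of Python's `statistics.quantiles`, so hand-computed quartiles may not match exactly. The data set is made up.

```python
import statistics

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

# Quartiles (assumption: "inclusive" interpolation method).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                                   # spread of the middle half

inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer_fences = (q1 - 3 * iqr, q3 + 3 * iqr)

# Values outside the inner fences are candidate outliers;
# values outside the outer fences are extreme outliers.
outliers = [x for x in data if x < inner_fences[0] or x > inner_fences[1]]
extreme = [x for x in data if x < outer_fences[0] or x > outer_fences[1]]

# Whiskers of a box plot stop at the most extreme values inside the inner fences.
inside = [x for x in data if inner_fences[0] <= x <= inner_fences[1]]
five_number_summary = (min(inside), q1, q2, q3, max(inside))

print(inner_fences, outer_fences, outliers, extreme, five_number_summary)
```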

89

middle 50% of data positively skewed

if median in box is on the left side; the ___

90

outer 50% of data positively skewed

if longest whisker is to the right of the box; the ___

91

deviation score

the distance from any score in the data to the mean of the distribution

92

zero

deviation scores add up to ___

93

sum of squares

sum of squared deviation scores; total variation in the data set

94

population variance

sum of squares divided by N; average variability

95

sample variance

sum of squares divided by n-1; average variability

96

standard deviation

square root of variance; average deviation around the mean of a distribution
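
A minimal sketch tying together deviation scores, sum of squares, variance, and standard deviation. The data are invented; treating the same numbers first as a whole population (divide by N) and then as a sample (divide by n-1) is only for comparison.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up scores
n = len(data)
mean = sum(data) / n

# Deviation score: each score minus the mean. These always sum to zero.
deviations = [x - mean for x in data]
deviation_total = sum(deviations)

# Sum of squares: total variation in the data set.
sum_of_squares = sum(d ** 2 for d in deviations)

population_variance = sum_of_squares / n          # divide by N
sample_variance = sum_of_squares / (n - 1)        # divide by n - 1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(deviation_total, sum_of_squares)
print(population_variance, population_sd)
print(sample_variance, sample_sd)
```

Dividing by n-1 gives a slightly larger value than dividing by N; the n-1 here is the degrees of freedom of the sample.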

97

degrees of freedom

the number of scores that are free to vary

98

empirical rule

applies to normal data; about 68% of values fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations

99

Chebyshev’s theorem

applies to any data set (including skewed data); at least 1 - 1/k^2 of the data lie within k standard deviations of the mean (k > 1)
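
As an illustration, Chebyshev's bound of at least 1 - 1/k^2 can be computed and checked against a skewed data set. The helper names and the data are hypothetical; the bound is a minimum, so the observed proportion is usually higher.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def proportion_within(values, k):
    """Observed proportion of values within k population SDs of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)


skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 20]   # made-up, positively skewed

for k in (1.5, 2, 3):
    print(k, chebyshev_lower_bound(k), proportion_within(skewed, k))
```

For k = 2 the guaranteed minimum is 75%, compared with roughly 95% under the empirical rule for normal data.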

100

coefficient of variation

how large the standard deviation is relative to the mean
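
A short sketch of the coefficient of variation, assuming the common form (standard deviation / mean) x 100 and the sample standard deviation. The two data sets and the function name are made up for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.fmean(values) * 100


# Hypothetical data on very different scales.
prices_a = [10, 12, 11, 13, 14]
prices_b = [1000, 1200, 1100, 1300, 1400]

cv_a = coefficient_of_variation(prices_a)
cv_b = coefficient_of_variation(prices_b)
print(cv_a, cv_b)
```

Because the result is relative to the mean, it allows spread to be compared across data sets measured on different scales; here the second set is the first multiplied by 100, so the two coefficients match even though the standard deviations differ.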

