Analytics
the scientific process of transforming data into insight for making better decisions
Descriptive Analytics
Analytical techniques that describe what has happened in the past.
Predictive Analytics
Analytical techniques that use models constructed from past data to predict the future or assess the impact of one variable on another.
Prescriptive Analytics
Analytical techniques that yield a course of action.
Big Data
A set of data that cannot be managed, processed, or analyzed with commonly available software in a reasonable amount of time. Characterized by the four V's: volume, velocity, variety, and veracity.
Data Mining
The process of using procedures from statistics and computer science to extract useful information from extremely large databases.
Frequency Distribution
A tabular summary of data showing the number (frequency) of observations in each of several nonoverlapping categories or classes.
Relative Frequency Distribution
A tabular summary of data showing the fraction or proportion of observations in each of several nonoverlapping categories or classes.
Percent Frequency Distribution
A tabular summary of data showing the percentage of observations in each of several nonoverlapping classes.
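The three distributions above differ only in how each class count is scaled. A minimal Python sketch, using a made-up categorical sample (the data and variable names are illustrative assumptions, not part of the original cards):

```python
from collections import Counter

# Hypothetical categorical sample (illustrative data only).
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Coke", "Sprite", "Pepsi", "Coke"]

n = len(purchases)

# Frequency distribution: number of observations in each class.
frequency = Counter(purchases)

# Relative frequency distribution: fraction of observations in each class.
relative = {label: count / n for label, count in frequency.items()}

# Percent frequency distribution: relative frequency expressed as a percentage.
percent = {label: 100 * count / n for label, count in frequency.items()}

for label in frequency:
    print(label, frequency[label], relative[label], percent[label])
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick sanity check on any such table.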
Bar Chart
A graphical device for depicting categorical data that have been summarized in a frequency, relative frequency, or percent frequency distribution.
Pie Chart
A graphical device for presenting data summaries based on subdivision of a circle into sectors that correspond to the relative frequency for each class.
Class Midpoint
the value halfway between the lower and upper class limits
Dot Plot
a graphical device that summarizes data by the number of dots above each data value on the horizontal axis
Histogram
A graphical display of a frequency distribution, relative frequency distribution, or percent frequency distribution of quantitative data constructed by placing the class intervals on the horizontal axis and the frequencies, relative frequencies, or percent frequencies on the vertical axis.
Cumulative Frequency Distribution
A tabular summary of quantitative data showing the number of data values that are less than or equal to the upper class limit of each class.
Cumulative Relative Frequency Distribution
A tabular summary of quantitative data showing the fraction or proportion of data values that are less than or equal to the upper class limit of each class.
Cumulative Percent Frequency Distribution
A tabular summary of quantitative data showing the percentage of data values that are less than or equal to the upper class limit of each class.
Stem-and-leaf Display
a graphical display used to show simultaneously the rank order and shape of a distribution of data
Crosstabulation
A tabular summary of data for two variables. The classes for one variable are represented by the rows; the classes for the other variable are represented by the columns.
Simpson's Paradox
Conclusions drawn from two or more separate crosstabulations that can be reversed when the data are aggregated into a single crosstabulation.
Scatter Diagram
A graphical display of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
Trendline
a line that provides an approximation of the relationship between two variables
Side-by-side Bar Chart
a graphical display for depicting multiple bar charts on the same display
Stacked Bar Chart
A bar chart in which each bar is broken into rectangular segments of a different color showing the relative frequency of each class in a manner similar to a pie chart.
Mean
a measure of central location computed by summing the data values and dividing by the number of observations.
Excel: Mean
=AVERAGE()
Median
A measure of central location provided by the value in the middle when the data are arranged in ascending order.
Excel: median
=MEDIAN()
Mode
A measure of location, defined as the value that occurs with greatest frequency.
Excel: mode
=MODE.SNGL()
Weighted Mean
the mean obtained by assigning each observation a weight that reflects its importance
Geometric Mean
A measure of location that is calculated by finding the nth root of the product of n values.
Excel: geometric mean
=GEOMEAN()
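The weighted mean and geometric mean have no single-cell illustration above, so here is a short Python sketch of both. The function names are hypothetical, and the geometric mean version assumes all values are positive:

```python
import math

def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def geometric_mean(values):
    """nth root of the product of n positive values (compare =GEOMEAN())."""
    return math.prod(values) ** (1 / len(values))

# Usage with illustrative numbers.
print(weighted_mean([10, 20], [1, 3]))   # (10*1 + 20*3) / 4 = 17.5
print(geometric_mean([2, 8]))            # square root of 16
```

The geometric mean is typically applied to growth factors (for example 1.10 for a 10% gain), not to raw percentages.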
Percentile
A value that provides information about how the data are spread over the interval from the smallest to the largest value.
Excel: Percentile
=PERCENTILE.EXC(A1:AX, k), where X is the last row of column A and k is the desired percentile expressed as a decimal strictly between 0 and 1 (for example, 0.9 for the 90th percentile).
Quartiles
The 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and third quartile, respectively. The quartiles can be used to divide a data set into four parts, with each part containing approximately 25% of the data.
Excel: Quartiles
=QUARTILE.EXC(A1:AX, quart), where X is the last row of column A and quart is 1, 2, or 3 for the first, second, or third quartile.
Range
A measure of variability, defined to be the largest value minus the smallest value.
Excel: Range
=MAX(A1:AX)-MIN(A1:AX)
Interquartile Range (IQR)
A measure of variability, defined to be the difference between the third and first quartiles.
Excel: IQR
=QUARTILE.EXC(A1:AX, 3) - QUARTILE.EXC(A1:AX, 1)
Variance
A measure of variability based on the squared deviations of the data values about the mean
Excel: Variance
=VAR.S()
Standard Deviation
A measure of variability computed by taking the positive square root of the variance.
Excel: Standard Deviation
=STDEV.S()
Coefficient of Variation
A measure of relative variability computed by dividing the standard deviation by the mean and multiplying by 100.
Excel: Coefficient of Variation
=STDEV.S()/AVERAGE()*100
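The variance, standard deviation, and coefficient of variation build on each other, which a short Python sketch makes visible. The sample below is illustrative, and the standard-library functions used here apply the sample (n - 1) divisor, as VAR.S and STDEV.S do:

```python
import statistics

# Hypothetical sample of five observations (illustrative data only).
data = [46, 54, 42, 46, 32]

mean = statistics.mean(data)            # compare =AVERAGE()
variance = statistics.variance(data)    # sample variance, compare =VAR.S()
std_dev = statistics.stdev(data)        # sample standard deviation, compare =STDEV.S()
cv = std_dev / mean * 100               # coefficient of variation, in percent

print(mean, variance, std_dev, round(cv, 1))
```

For this sample the squared deviations sum to 256, so the variance is 256 / 4 = 64, the standard deviation is 8, and the coefficient of variation is about 18.2%.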
Skewness
A measure of the shape of a data distribution. Data skewed to the left result in negative skewness; a symmetric data distribution results in zero skewness; and data skewed to the right result in positive skewness.
Z-score
A value computed by dividing the deviation about the mean by the standard deviation s. Also referred to as a standardized value; it denotes the number of standard deviations a data value is from the mean.
Chebyshev's Theorem
A theorem that can be used to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean.
~ At least 75% of the data values must be within z=2 SD's of the mean
~ At least 89% of the data values must be within z=3 SD's of the mean
~ At least 94% of the data values must be within z=4 SD's of the mean
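The three percentages above all come from the same bound, 1 - 1/z^2, which holds for any z greater than 1. A small Python sketch (the function names are hypothetical) reproduces them and also shows the z-score calculation:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

def chebyshev_lower_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem is only informative for z > 1")
    return 1 - 1 / z**2

for z in (2, 3, 4):
    # Prints 75.0%, 88.9%, and 93.8%, which the cards above round to 75%, 89%, 94%.
    print(z, f"{chebyshev_lower_bound(z):.1%}")
```

Unlike the empirical rule, this bound makes no assumption about the shape of the distribution, which is why its percentages are lower.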
Empirical Rule
A rule that can be used to compute the percentage of data values that must be within one, two, and three standard deviations of the mean for data that exhibit a bell-shaped distribution.
Outliers
An unusually small or unusually large data value.
Excel: Outliers
=QUARTILE.EXC(A1:AX, 1) - 1.5 * (QUARTILE.EXC(A1:AX, 3) - QUARTILE.EXC(A1:AX, 1))
and
=QUARTILE.EXC(A1:AX, 3) + 1.5 * (QUARTILE.EXC(A1:AX, 3) - QUARTILE.EXC(A1:AX, 1))
Values below the first limit or above the second are flagged as outliers.
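The same 1.5 × IQR limits can be sketched in Python. This assumes that `statistics.quantiles` with `method="exclusive"` uses the same (n + 1)-based interpolation as QUARTILE.EXC; the function names and the data are illustrative:

```python
import statistics

def iqr_limits(data):
    """Lower and upper limits: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data):
    """Values that fall outside the 1.5 * IQR limits."""
    lower, upper = iqr_limits(data)
    return [x for x in data if x < lower or x > upper]

# Hypothetical sample with one unusually large value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 100]
print(iqr_limits(sample))
print(find_outliers(sample))
```

For this sample Q1 = 3.25 and Q3 = 9.75, so the limits are -6.5 and 19.5 and only the value 100 is flagged.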
Covariance
A measure of the linear association between two variables. A positive value indicates a positive relationship; a negative value indicates a negative relationship.
Correlation Coefficient
a measure of linear association between two variables that takes on values between -1 and +1.
Excel: Correlation Coefficient
=CORREL(array1, array2)
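The correlation coefficient is the covariance rescaled by the two standard deviations, which is why it always falls between -1 and +1. A minimal Python sketch of the sample versions (function names are hypothetical; the data are illustrative):

```python
import math

def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]                   # perfectly linear in x
print(sample_covariance(x, y))     # positive: the variables move together
print(correlation(x, y))           # close to +1 for a perfect positive linear relationship
```

The covariance changes with the units of measurement, while the correlation coefficient does not, so the latter is easier to compare across data sets.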
Experiment
a process that generates well-defined outcomes
Sample Space
the set of all experimental outcomes
Sample Point
An element of the sample space; it represents an experimental outcome.
Multiple-step Experiments
An experiment that can be described as a sequence of k steps. If step i has n_i possible outcomes, the total number of experimental outcomes is (n1)(n2)...(nk).
Tree Diagram
a graphical representation that helps in visualizing a multiple-step experiment
Combinations
In an experiment we may be interested in determining the number of ways n objects may be selected from among N objects without regard to the order in which the n objects are selected.
Excel: Combinations
=COMBIN()
Factorial
The product of a positive integer and all the positive integers below it; denoted n!. By definition, 0! = 1.
Excel: Factorial
=FACT()
Permutations
In an experiment we may be interested in determining the number of ways n objects may be selected from among N objects WHEN the order in which the n objects are selected is important.
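The counting rules above (multiple-step experiments, factorials, combinations, permutations) can be checked with Python's standard library. The coin-toss and 5-choose-2 numbers are illustrative:

```python
import math
from itertools import product

# Multiple-step experiment: tossing a coin twice gives (2)(2) = 4 outcomes.
two_tosses = list(product("HT", repeat=2))
n_outcomes = len(two_tosses)

n_factorial = math.factorial(5)     # compare =FACT(5): 5 * 4 * 3 * 2 * 1
n_combinations = math.comb(5, 2)    # compare =COMBIN(5, 2): order is ignored
n_permutations = math.perm(5, 2)    # order matters

print(n_outcomes, n_factorial, n_combinations, n_permutations)
```

Each combination of 2 objects can be ordered in 2! = 2 ways, so the permutation count here is exactly twice the combination count.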
Basic requirements for assigning probabilities:
1. The probability assigned to each experimental outcome must be between 0 and 1
2. The sum of the probabilities for all experimental outcomes must equal 1
Classical Method
a method of assigning probabilities that is appropriate when all the experimental outcomes are equally likely. If n experimental outcomes are possible, a probability of 1/n is assigned to each experimental outcome.
Relative Frequency Method
a method of assigning probabilities that is appropriate when data are available to estimate the proportion of the time the experimental outcome will occur if the experiment is repeated a large number of times
Subjective Method
a method of assigning probabilities on the basis of judgment
Complement of A
the event consisting of all sample points that are not in A; denoted A^c
Venn Diagram
a graphical representation for showing symbolically the sample space and operations involving events in which the sample space is represented by a rectangle and events are represented as circles within the sample space
Union of A and B
the event containing all sample points belonging to A or B or both. Denoted A ∪ B.
Intersection of A and B
The event containing the sample points belonging to both A and B. The intersection is denoted A ∩ B.
Addition Law
a probability law used to compute the probability of the union of two events.
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Mutually Exclusive Events
Events that have no sample points in common; that is, A ∩ B is empty and P(A ∩ B) = 0.
Conditional Probability
the probability of an event given that another event already occurred.
P(A | B) = P(A ∩ B) / P(B)
Joint Probabilities
The probability of two events both occurring; that is, the probability of the intersection of two events.
Marginal Probabilities
the values in the margins of a joint probability table that provide the probabilities of each event separately
Independent Events
Two events A and B where P(A | B) = P(A) or P(B | A) = P(B); that is, the events have no influence on each other.
Multiplication Law
a probability law used to compute the probability of the intersection of two events. It is P(A ∩ B) = P(B)P(A | B) or P(A ∩ B) = P(A)P(B | A). For independent events it reduces to P(A ∩ B) = P(A)P(B).
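The addition law, conditional probability, and multiplication law can be tied together with one small numerical example. The probabilities below are made up for illustration, chosen so that A and B happen to be independent:

```python
import math

# Hypothetical marginal and joint probabilities (illustrative numbers only).
p_a = 0.30
p_b = 0.40
p_a_and_b = 0.12

# Addition law: P(A or B) = P(A) + P(B) - P(A and B).
p_a_or_b = p_a + p_b - p_a_and_b

# Conditional probability: P(A | B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

# Multiplication law: P(A and B) = P(B) * P(A | B).
p_joint_check = p_b * p_a_given_b

# Independence: P(A | B) equals P(A), compared with a floating-point tolerance.
independent = math.isclose(p_a_given_b, p_a)

print(p_a_or_b, p_a_given_b, p_joint_check, independent)
```

If the joint probability were anything other than P(A)P(B) = 0.12, the final check would report that the events are not independent.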
Random Variable
a numerical description of the outcome of an experiment
Discrete Random Variable
a random variable that may assume either a finite number of values or an infinite sequence of values
Continuous Random Variable
a random variable that may assume any numerical value in an interval or collection of intervals
Probability Distribution
A description of how the probabilities are distributed over the values of a random variable.
Probability Function
a function, denoted by f(x), that provides the probability that x assumes a particular value for a discrete random variable
Empirical Discrete Distribution
a discrete probability distribution for which the relative frequency method is used to assign the probabilities
Discrete Uniform Probability Distribution
a probability distribution for which each possible value of the random variable has the same probability
Expected Value
A measure of the central location, or mean, of a random variable.
Variance
A measure of the variability, or dispersion, of a random variable.
Excel: Variance
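=SUMPRODUCT(A1:AX, B1:BX), assuming column A holds the squared deviations (x - μ)^2 and column B holds the probabilities f(x). If column A holds the raw values x instead, the same formula returns the expected value, not the variance.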
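For a discrete random variable, the expected value is the probability-weighted sum of the values, and the variance is the probability-weighted sum of the squared deviations from that expected value. A minimal Python sketch with a made-up distribution (values and probabilities are illustrative assumptions):

```python
import math

# Hypothetical discrete distribution: values x and probabilities f(x).
x_values = [0, 1, 2, 3]
probabilities = [0.1, 0.3, 0.4, 0.2]

# The probabilities of a valid distribution must sum to 1.
assert math.isclose(sum(probabilities), 1.0)

# Expected value: sum of x * f(x).
expected_value = sum(x * p for x, p in zip(x_values, probabilities))

# Variance: sum of (x - mu)^2 * f(x).
variance = sum((x - expected_value) ** 2 * p
               for x, p in zip(x_values, probabilities))

std_dev = math.sqrt(variance)
print(expected_value, variance, std_dev)
```

Here the expected value is 1.7, the variance is 0.81, and the standard deviation is 0.9, up to floating-point rounding.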
Bivariate Probability Distribution
A probability distribution involving two random variables. It provides a probability for each pair of values that may occur for the two random variables.
Binomial Experiment
An experiment with the following characteristics:
(1) The experiment consists of a fixed number of trials denoted by n
(2) each trial has only two possible outcomes, a success or a failure
(3) the probability of success and the probability of failure are constant throughout the experiment
(4) each trial is independent of any other trial in the experiment.
Binomial Probability Function
the function used to compute binomial probabilities
Excel: Binomial Probability Function
=BINOM.DIST(X, N, P, TRUE/FALSE cumulative)
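The binomial probability function counts the number of ways to place x successes among n trials and multiplies by the probability of any one such sequence. A short Python sketch follows; the function names are hypothetical, and the cumulative version corresponds to passing TRUE as the last argument of BINOM.DIST:

```python
import math

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(at most x successes): sum of the pmf from 0 through x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Illustrative case: 3 trials, success probability 0.3.
print(binomial_pmf(2, 3, 0.3))   # 3 * 0.3^2 * 0.7, approximately 0.189
print(binomial_cdf(2, 3, 0.3))   # everything except three successes
```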
Poisson Probability Distribution
a probability distribution showing the probability of x occurrences of an event over a specified interval of time or space
Excel: Poisson Probability Distribution
=POISSON.DIST(x, mean, TRUE/FALSE cumulative)
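The Poisson probability function depends only on the mean number of occurrences in the interval. A minimal Python sketch (the function name is hypothetical; the mean of 10 is an illustrative value):

```python
import math

def poisson_pmf(x, mean):
    """P(exactly x occurrences) when the expected number of occurrences is `mean`."""
    return mean**x * math.exp(-mean) / math.factorial(x)

# Illustrative case: an average of 10 occurrences per interval.
print(round(poisson_pmf(5, 10), 4))   # probability of exactly 5 occurrences
```

Summing `poisson_pmf` from 0 through x gives the cumulative probability, which is what the TRUE option in the Excel function returns.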
Hypergeometric probability distribution
a probability distribution showing the probability of x successes in n trials from a population with r successes and N-r failures
Hypergeometric probability function
the function used to compute hypergeometric probabilities
Excel: Hypergeometric probability function
=HYPGEOM.DIST(x, n, r, N, TRUE/FALSE cumulative)
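The hypergeometric probability function can be written directly in terms of combinations, using the same symbols as the cards above: x successes in n trials from a population of N elements, r of which are successes. A small Python sketch (the function name and example numbers are illustrative):

```python
import math

def hypergeom_pmf(x, n, r, N):
    """P(x successes in n draws without replacement, given r successes among N)."""
    if x < 0 or x > n:
        return 0.0
    # Ways to choose x successes, times ways to choose n - x failures,
    # divided by all ways to choose n elements from the population.
    return math.comb(r, x) * math.comb(N - r, n - x) / math.comb(N, n)

# Illustrative case: draw 3 from 12 elements, 5 of which are successes.
print(round(hypergeom_pmf(1, 3, 5, 12), 4))   # 5 * 21 / 220
```

Unlike the binomial case, the trials here are not independent, because each draw changes the composition of what remains.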