the science of collecting, presenting, analyzing, and interpreting data
Data
facts and information of interest
Analytics
the process of using data to make decisions; descriptive, predictive, and prescriptive
Statistical Inference
uses sample data and probability to draw general conclusions about a population
Data set
a collection of information that can be grouped by variables
Elements
the units on which data is collected
Variables
general characteristics of interest
Numerical (or Quantitative) variables
variables that are measurable and can be expressed as numbers
Categorical (or Qualitative) variables
variables that represent categories or groups
Nominal
a categorical variable whose values are labels or names only, with no natural order. Example: favourite ice cream
Ordinal
a categorical variable whose labels or names can also be ordered. Example: preference for mint ice cream (dislike, somewhat like, love)
Interval
a numerical variable with a fixed unit of measure, so the difference between (or sum of) two data values is meaningful; zero is a value on the scale, not the absence of the quantity. Example: the temperature of ice cream from different freezers
Ratio
a numerical variable with all the characteristics of interval data for which the product or quotient of two data values also makes sense; zero means none of the quantity exists. Example: the number of ounces of ice cream in containers of different sizes
Available Data
Data that were produced in the past may be a cost-effective way to help answer a present question
Internal data
available data from inside the organization, such as personnel files, cash-flow reports, and inventory records
External data
available data from outside the organization; sources may be free (library, internet, researchers) or charge a fee (Bloomberg, Nielsen Co.)
Newly Produced Data
Some questions may require data to be produced. Newly produced data can be obtained through the design of observational or experimental studies.
Observational Study
Elements are observed and variables of interest are measured. An observation is the set of measurements for a single element
Experimental Study
A treatment is deliberately imposed on the elements and their responses are measured
Cross-Sectional
data from an observational study or experiment collected at a single point in time
Time Series
data from an observational study or experiment collected over several time periods
Population
the collection of ALL elements of interest; the group about which we want to draw a conclusion
Sample
a subset/part of the population used to gather information; gives insight about the population
Census
an attempt to gather information from every element in the population; usually impossible or too expensive. Example: the census conducted by the US government
Parameter
a number that describes a population; a fixed amount that is usually unknown (unless you have a census). Example: the mean height of Purdue students
Statistic
a number that describes a sample; an amount that can vary depending on which sample is collected; estimates a parameter. Example: the mean height of a sample of 350 Purdue students
Conclusion
We can use the sample proportion to estimate the population proportion
Ordered List
The data values arranged in order; shows the least and greatest data values
Frequency Table
Shows the total counts for particular intervals
Relative Frequency Table
Shows the total count in a particular interval relative to the entire set of data; usually represented as a percent
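A minimal sketch of a frequency table and a relative frequency table, assuming a small made-up data set in which each distinct value is treated as its own interval (the data and variable names are illustrative only):

```python
from collections import Counter

# Hypothetical data: number of scoops ordered by 10 customers.
scoops = [1, 2, 2, 3, 1, 2, 3, 3, 2, 1]

# Frequency table: how many times each value occurs.
counts = Counter(scoops)

# Relative frequency table: each count divided by the total number of values.
n = len(scoops)
rel_freq = {value: count / n for value, count in counts.items()}

for value in sorted(counts):
    print(value, counts[value], f"{rel_freq[value]:.0%}")
```

The relative frequencies always sum to 1 (100%), which is a quick check on the table.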
Histogram
A display of a frequency or relative frequency table
Mean
the sum of all data values divided by the number of data values
Median
the midpoint of an ordered data set; ½ of the observations are below it and ½ are above it
Mode
the data value (or values) that occurs most often
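A short sketch of the three measures of center using Python's standard `statistics` module, assuming a made-up data set (the numbers are illustrative only):

```python
import statistics

# Hypothetical data set, already in order for readability.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)        # sum of values / number of values = 42 / 7
median = statistics.median(data)    # middle value of the ordered data
modes = statistics.multimode(data)  # list of the most frequent value(s)

print(mean, median, modes)
```

`multimode` returns a list because a data set can have more than one mode.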
Range
max - min, where max is the maximum (largest) data value and min is the minimum (smallest) data value; the range is a single number
Sample Variance
a measure of how far, on average, data values are from the mean; the sum of the squared deviations from the mean divided by n - 1
Sample Standard Deviation
Square root of sample variance
Interquartile Range
Q3 - Q1, where Q1 is the first quartile and Q3 is the third quartile
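The measures of spread above can be sketched with the standard `statistics` module. The data set is made up, and the quartiles use the module's default method; textbooks and software use slightly different quartile conventions, so other tools may give different values of Q1 and Q3 for the same data.

```python
import statistics

# Hypothetical data set with mean 6.
data = [2, 4, 4, 5, 7, 9, 11]

data_range = max(data) - min(data)      # max - min, a single number
sample_var = statistics.variance(data)  # squared deviations summed, divided by n - 1
sample_sd = statistics.stdev(data)      # square root of the sample variance

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(data_range, sample_var, sample_sd, iqr)
```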
Categorical Variable
Qualitative data represented by categories.
Numerical Variable
Quantitative data represented by numbers.
Bar Graph
Visual representation of categorical data.
Pie Chart
Circular chart showing proportions of categories.
Stemplot
Graph displaying quantitative data in stems and leaves.
Shape of Distribution
Describes the form of data distribution (bell/skewed).
Unusual Data
Identifies gaps or extreme values in data.
Mean (x̄)
Average value of a data set.
Median (M)
Middle value when data is ordered.
Spread
Variability or consistency of data values.
Standard Deviation (s)
Measures data variability around the mean.
Population Variance
Variance calculated for an entire population.
Interquartile Range (IQR)
Difference between Q3 and Q1.
Quartiles
Values dividing data into four equal parts.
Percentile
Value below which a given percentage of the data falls.
5-Number Summary
Minimum, Q1, Median, Q3, Maximum values.
Boxplot
Graphical summary of the 5-number summary.
Modified Boxplot
Boxplot that highlights outliers.
Outlier
Data point significantly different from others.
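A sketch of the 5-number summary and of flagging outliers as a modified boxplot would. The cards do not say how outliers are identified; this example assumes the common 1.5 × IQR convention, and the data set is made up.

```python
import statistics

# Hypothetical data set with one extreme value.
data = [2, 4, 4, 5, 7, 9, 30]

# Quartiles using the statistics module's default method (an assumption;
# other quartile conventions exist).
q1, median, q3 = statistics.quantiles(data, n=4)
five_number = (min(data), q1, median, q3, max(data))

# Assumed convention: values more than 1.5 * IQR beyond the quartiles
# are flagged as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(five_number, outliers)
```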
Z-score
Standardized score indicating distance from mean.
Empirical Rule
For bell-shaped distributions, about 68% of the data fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.
Skewness
Measure of asymmetry in data distribution.
Kurtosis
Measure of how heavy the tails of a data distribution are; often described as its peakedness.
Descriptive Statistics
Summarizes and describes characteristics of data.
z-scores
A measure of the number of standard deviations a data value is from the mean.
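A small sketch of the z-score calculation, assuming a made-up sample and using the sample mean and sample standard deviation (the helper name `z_score` is hypothetical):

```python
import statistics


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Hypothetical sample with mean 6 and sample variance 10.
data = [2, 4, 4, 5, 7, 9, 11]
mean = statistics.mean(data)
sd = statistics.stdev(data)

z = z_score(11, mean, sd)  # positive: 11 is above the mean
print(round(z, 2))
```

A negative z-score means the value is below the mean, and a z-score of 0 means the value equals the mean.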
Random Experiment
A random process that generates well-defined outcomes.
Sample space, S
The set of all possible outcomes (or sample points).
Event
A collection of one or more outcomes (or sample points).
Probability
A measure of the likelihood that an event will occur.
Classical (Theoretical) Method
Assumes all outcomes are equally likely.
Relative Frequency Method
Conduct many trials of a random experiment to estimate probabilities.
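A sketch of the relative frequency method: simulate many trials and use the proportion of trials in which the event occurs as the estimate. The experiment (a fair coin flip), the function name, and the seed are assumptions made for illustration.

```python
import random


def estimate_probability(trials, seed=0):
    """Estimate P(heads) as the fraction of simulated flips that land heads."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials


estimate = estimate_probability(10_000)
print(estimate)  # expected to be close to the theoretical value 0.5
```

More trials generally bring the estimate closer to the theoretical probability.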
Counting Rule
If a random experiment consists of a sequence of k steps with n1 outcomes for the 1st step, n2 outcomes for the 2nd step, ..., nk outcomes for the kth step, then there are (n1)(n2)...(nk) total outcomes for the experiment.
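The counting rule can be checked by listing every outcome. This sketch assumes a made-up three-step experiment (pick a flavor, a container, and a topping) and compares the product (n1)(n2)(n3) with the number of outcomes actually enumerated.

```python
import itertools
import math

# Hypothetical 3-step experiment: 3 x 2 x 4 choices.
flavors = ["vanilla", "chocolate", "mint"]
containers = ["cone", "cup"]
toppings = ["none", "sprinkles", "fudge", "nuts"]
steps = [flavors, containers, toppings]

# Counting rule: multiply the number of outcomes at each step.
total_by_rule = math.prod(len(step) for step in steps)

# Direct enumeration of every (flavor, container, topping) outcome.
outcomes = list(itertools.product(*steps))

print(total_by_rule, len(outcomes))
```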
Factorial
A notation for showing a special product, n! = n × (n - 1) × (n - 2) × ... × 1.
Permutations
The arrangements of n things taken r at a time, where order matters.
Combinations
The selection of n things taken r at a time, where order does not matter.
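Factorials, permutations, and combinations are available in Python's standard `math` module (Python 3.8 or later for `perm` and `comb`). A short sketch with n = 5 and r = 2, values chosen only for illustration:

```python
import math

n, r = 5, 2

n_factorial = math.factorial(n)  # 5 x 4 x 3 x 2 x 1
permutations = math.perm(n, r)   # ordered arrangements: n! / (n - r)!
combinations = math.comb(n, r)   # unordered selections: n! / (r! (n - r)!)

print(n_factorial, permutations, combinations)
```

There are fewer combinations than permutations because each selection of r items can be arranged in r! different orders.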
Complement of Event A
The set of all outcomes in S that are not outcomes of event A.
Union of A and B
The set of all outcomes that belong to A or B or both, denoted A ∪ B.
Intersection of A and B
The set of all outcomes that belong to both A and B, denoted A ∩ B.
Venn Diagrams
Visual displays of the relationship of the outcomes of combined events and the sample space.
Addition Law for 2 events
P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
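A sketch that checks the addition law on a small example, assuming one roll of a fair six-sided die with equally likely outcomes (the events A and B are made up for illustration):

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # the roll is even
B = {5, 6}     # the roll is greater than 4


def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))


union_direct = prob(A | B)                      # P(A or B) from the union set
union_by_law = prob(A) + prob(B) - prob(A & B)  # addition law

print(union_direct, union_by_law)
```

Subtracting P(A ∩ B) corrects for the outcome 6, which would otherwise be counted in both P(A) and P(B).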
Mutually Exclusive Events
Two events that do not share any outcomes and cannot occur at the same time.
Complementary events
Two events that share no outcomes and together contain every outcome in the sample space, so exactly one of them must occur; P(A) + P(not A) = 1.
Probability Statement
When all outcomes are equally likely, P(A) = the number of favorable outcomes / TOTAL number of outcomes.
P(A)
The probability of event A occurring.
P(A and B)
The probability that both events A and B occur.
P(A or B)
The probability that event A occurs, event B occurs, or both occur.
P(not A)
The probability that event A does not occur.
P(G)
Probability that a home has a garage.
P(P)
Probability that a home has a swimming pool.
P(G and P)
Probability that a home has both a garage and a swimming pool.
P(G or P)
Probability that a home has a garage, a swimming pool, or both.
N
The total number of employees or items in a sample.
P(L)
Probability that an employee completed their work late.
P(D)
Probability that an employee's work was defective.
P(L and D)
Probability that an employee's work was both late and defective.
P(L or D)
Probability that an employee's work was late, defective, or both.
Joint Distribution
Probability distribution of two events occurring together.
Marginal Distribution
Distribution of a single event from joint distribution.
Joint Probability
Probability of two events happening simultaneously.
Marginal Probability
Probability of a single event occurring.
Conditional Probability
Probability of an event given another event has occurred.
Multiplication Law
P(A and B) = P(A) * P(B|A).
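A sketch tying together marginal, joint, and conditional probability and the multiplication law, using the late (L) and defective (D) events from the cards above. The counts are made up for illustration; the cards do not give any numbers.

```python
from fractions import Fraction

# Hypothetical counts for N = 100 employees.
N = 100
late = 20                # work completed late
defective = 15           # work was defective
late_and_defective = 5   # work was both late and defective

# Marginal probabilities: one event at a time.
p_L = Fraction(late, N)
p_D = Fraction(defective, N)

# Joint probability: both events together.
p_L_and_D = Fraction(late_and_defective, N)

# Conditional probability of D given L: P(D|L) = P(L and D) / P(L).
p_D_given_L = p_L_and_D / p_L

# Multiplication law: P(L and D) = P(L) * P(D|L).
p_by_multiplication = p_L * p_D_given_L

# Addition law: P(L or D) = P(L) + P(D) - P(L and D).
p_L_or_D = p_L + p_D - p_L_and_D

print(p_D_given_L, p_by_multiplication, p_L_or_D)
```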