Population
The set of all items of interest in a statistical problem
Ex: Houses in Roanoke
Parameter
A descriptive measure of a population
Ex: Mean (average) appraised value of all houses
Sample
A set of items drawn from a population
Ex: 100 randomly selected houses
Statistic
A descriptive measure of a sample
Ex: Mean appraised value of selected homes
Statistical Inference
The process of making an estimate, prediction, or decision based upon sample data
Qualitative
Categorical
Ex: Brand names
Quantitative
Numbers
Ex: Number of bedrooms
Cross-Sectional
Observations in a sample that are collected at the same time
Time Series
Data that is collected at different points in time
N
Size of population
n
Size of sample
μ
Population mean
x̄
Sample mean
σ²
Population variance
σ
Population standard deviation
s²
Sample variance
s
Sample standard deviation
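The notation above can be tied to a quick calculation. This is a minimal sketch using Python's standard `statistics` module; the appraised values are made up for illustration and are not from the original cards.

```python
import statistics

# Hypothetical appraised values (in $1000s) for five houses.
values = [200, 220, 250, 270, 310]

n = len(values)                         # n: size of the sample
x_bar = statistics.mean(values)         # x̄: sample mean
s2 = statistics.variance(values)        # s²: sample variance (divides by n - 1)
s = statistics.stdev(values)            # s: sample standard deviation
sigma2 = statistics.pvariance(values)   # σ²: treats the data as the whole population (divides by N)
sigma = statistics.pstdev(values)       # σ: population standard deviation

print(n, x_bar, s2, sigma2)
```

The only difference between the sample and population versions is the divisor, so s² is always a little larger than σ² for the same numbers.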
Measures of Central Tendency
Mean, Median, Mode
Mean
Average
Median
The middle number
Percentiles
Values that divide ordered data into 100 equal parts (steps of 1%)
Ex: A score in the 95th percentile is ≥ 95% of the other scores in the group. It does not mean you scored 95%
Mode
Most frequently occurring number
CAN BE BIMODAL OR MULTIMODAL, DO NOT BE FOOLED
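As a small illustration of the three measures of central tendency, here is a sketch with Python's `statistics` module. The bedroom counts are invented; `multimode` is used because it returns every mode when the data is bimodal or multimodal.

```python
import statistics

# Hypothetical number of bedrooms in six houses.
bedrooms = [2, 3, 3, 4, 4, 5]

mean = statistics.mean(bedrooms)        # sum divided by count
median = statistics.median(bedrooms)    # middle value (average of the two middle values here)
modes = statistics.multimode(bedrooms)  # all most-frequent values, in order of first appearance

print(mean, median, modes)
```

Here both 3 and 4 appear twice, so the data is bimodal and `modes` holds two values.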
Measures of Dispersion
Range, Standard Deviation, and Variance
Dispersion is also known as
The spread or range of variability
Range
Highest value minus lowest value
A normal bell curve means
Mean = Median = Mode, tails are asymptotic, skewness is 0, excess kurtosis is 0 (kurtosis is 3)
Asymptotic
The tails get closer and closer to the horizontal axis (x-axis) but never reach it
Standard Deviation
A measure of the typical distance of observations from the mean; the positive square root of the variance
What percent of cases fall within one standard deviation from the mean?
About 68% (for a normal distribution)
What percent of cases fall within two standard deviations from the mean?
About 95% (for a normal distribution)
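For a normal distribution, these shares can be computed rather than memorized. The sketch below uses `statistics.NormalDist` from the Python standard library and assumes a standard normal curve (mean 0, standard deviation 1).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area under the curve between -k and +k standard deviations.
within_1 = z.cdf(1) - z.cdf(-1)
within_2 = z.cdf(2) - z.cdf(-2)

print(f"within 1 sd: {within_1:.1%}")
print(f"within 2 sd: {within_2:.1%}")
```

The results are roughly 68% and 95%, and they apply only when the data is approximately normal.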
Shape of Data
Skewness and Kurtosis
Skewness
Measures the asymmetry of data
Positive Skew
Right skewed/longer right tail
Negative Skew
Left skewed/longer left tail
Kurtosis
Measures how heavy the tails of a distribution are (often described as its peakedness)
High Kurtosis
Leptokurtic
Data has more outliers
Low Kurtosis
Platykurtic
Data has fewer outliers; values are spread out more evenly
Normal Kurtosis
Mesokurtic, data has normal distribution with a moderate number of outliers
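Skewness and kurtosis can both be computed from the moments of the data. The function below is a rough sketch using the simple population-moment formulas; the name `shape_of_data` and the data sets are hypothetical, and statistical packages often apply small-sample corrections that give slightly different numbers.

```python
def shape_of_data(data):
    """Return (skewness, excess kurtosis) using population moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3           # 0 for a normal distribution
    return skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 10]   # one large value stretches the right tail

print(shape_of_data(symmetric))
print(shape_of_data(right_skewed))
```

The symmetric data has zero skewness and negative excess kurtosis (platykurtic), while the second data set has positive skewness because of its long right tail.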
What is the goal of graphing?
Presentation of descriptive statistics
Presentation of evidence
Some people understand better with visual aids
Provides a sense of the underlying data generating process (scatter-plots)
Statistical Studies
Observational, experimental
Observational
No attempt is made to control or influence the variables of interest
Experimental
Conducted under controlled conditions; generally provides more information than data obtained from existing sources or observational studies
Considerations on Data Acquisition
Time requirement
Cost of acquisition
Data errors
Analytics
The scientific process of transforming data into insight for making better decisions
Descriptive Analytics
What has happened in the past
Predictive Analytics
Models constructed from past data to predict the future or to assess the impact of one variable on another
Prescriptive Analytics
Models that yield the best course of action to take
The three V’s
Volume, Velocity, Variety
Volume
The amount of available data
Velocity
The speed at which data is collected and processed
Variety
Different data types
Unethical Behavior in Statistical Study
Improper sampling
Inappropriate analysis of the data
Development of misleading graphs
Use of inappropriate summary statistics
Biased interpretation of the statistical results
Quartile
Q1 = 25th percentile
Q2 = 50th percentile
Q3 = 75th percentile
Interquartile Range
Q3 − Q1, the spread of the middle 50% of the data
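Quartiles and the interquartile range can be sketched with `statistics.quantiles`. The data is made up, and the function's default "exclusive" method is assumed; textbooks and spreadsheets use several slightly different quartile conventions, so results can differ on small data sets.

```python
import statistics

# Hypothetical ordered observations.
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 splits the data into four equal parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

Q2 is the same value as the median, which is a quick check on any quartile calculation.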
Variance
The average squared difference between the value of each observation and the mean (divide by N for a population, by n − 1 for a sample)
Coefficient of Variation
Measures how large the standard deviation is relative to the mean; usually expressed as a percentage: (standard deviation / mean) × 100%
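A short worked example of the coefficient of variation, using invented numbers and the sample standard deviation:

```python
import statistics

# Hypothetical sample.
data = [10, 12, 14]

mean = statistics.mean(data)
s = statistics.stdev(data)   # sample standard deviation
cv = s / mean * 100          # coefficient of variation, as a percentage

print(f"CV = {cv:.1f}%")
```

Because it is a ratio, the coefficient of variation lets you compare variability between data sets measured in different units or on very different scales.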
Covariance
Measures how two variables change together in a linear way; the sign gives the direction of the relationship
Correlation Coefficient
A measure of the linear relationship between x and y, ranging from −1 to +1, that is not affected by the units of measurement. CORRELATION IS NOT CAUSATION
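The difference between covariance and correlation shows up when the units change. The sketch below computes both by hand from sample formulas; the helper names and the data are hypothetical.

```python
from math import sqrt

def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    s_x = sqrt(sample_covariance(x, x))
    s_y = sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
y_rescaled = [v * 1000 for v in y]   # same data in different units

print(sample_covariance(x, y), correlation(x, y))
print(sample_covariance(x, y_rescaled), correlation(x, y_rescaled))
```

Rescaling y multiplies the covariance by 1000 but leaves the correlation coefficient unchanged.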
The hypothesis test will be two tailed if
The null uses = and the alternative uses ≠
The hypothesis test will be one tailed if
The null uses ≤ or ≥ and the alternative uses > or <
Type I Error
Rejecting null when it is true, false positive
Level of significance
The probability of making a Type I Error
Type II Error
Failing to reject null when it is false, false negative
p-value
The probability of observing results at least as extreme as the sample, assuming null is true
p is less than or equal to α
Reject null
p is greater than α
Fail to reject null
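The decision rule can be written as a one-line function. This is only a sketch; the function name `decide` and the default significance level of 0.05 are assumptions for illustration.

```python
def decide(p_value, alpha=0.05):
    """Compare the p-value with the level of significance."""
    return "reject null" if p_value <= alpha else "fail to reject null"

print(decide(0.03))               # p below the significance level
print(decide(0.20))               # p above the significance level
print(decide(0.03, alpha=0.01))   # stricter significance level
```

Note that the same p-value can lead to different decisions depending on the level of significance chosen before the test.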
Confidence Interval
A range of values that is used to estimate an unknown population parameter
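As a rough illustration, a 95% confidence interval for a population mean can be built from a sample mean and standard error. The sketch assumes a large sample so that a normal (z) critical value is reasonable; with small samples a t critical value would normally be used instead. All numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary statistics: mean appraised value in $1000s.
x_bar, s, n = 250.0, 40.0, 100
confidence = 0.95

# Critical value that leaves (1 - confidence) / 2 in each tail.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * s / sqrt(n)          # critical value times standard error
lower, upper = x_bar - margin, x_bar + margin

print(f"{confidence:.0%} CI: ({lower:.2f}, {upper:.2f})")
```

A higher confidence level or a smaller sample widens the interval.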
t-test
Compares the means of two sets of data (or a sample mean against a hypothesized value) and tests whether the difference is statistically significant
t is
an observed difference / standard error
α = 0.05
5% chance of making a Type I Error when null is true, 95% confidence
α = 0.01
1% chance of making a Type I Error when null is true, 99% confidence
The type of tail is determined by
Alternative hypothesis
t equation
t = (x̄ − μ) / (s / √n), where s is the sample standard deviation and n is the sample size
Degrees of Freedom (df)
Measures how much independent information is available to estimate variability
df is generally
n - 1
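The t statistic and its degrees of freedom can be worked through for a one-sample test. This is a sketch with invented data and an invented hypothesized mean `mu_0`; it computes only the test statistic, not the p-value.

```python
import statistics
from math import sqrt

# Hypothetical sample and hypothesized population mean.
sample = [52, 48, 50, 54, 46]
mu_0 = 47

n = len(sample)
x_bar = statistics.mean(sample)        # sample mean
s = statistics.stdev(sample)           # sample standard deviation
standard_error = s / sqrt(n)
t = (x_bar - mu_0) / standard_error    # observed difference / standard error
df = n - 1                             # degrees of freedom

print(f"t = {t:.3f}, df = {df}")
```

The resulting t would then be compared against a t distribution with `df` degrees of freedom to obtain the p-value.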
Random/Simple Random Sampling
Every member of the population has an equal chance of being selected; unbiased and most widely used
Stratified Sampling
Population is divided into groups (strata) and a random sample is drawn from each, which ensures all groups are represented
Quota
Non-probability (non-random) method based on convenience; quick and cheap, guarantees set proportions of each group
Purposive
Participants with specific traits or expertise, informative but subjective (non-probability)
Cluster
Population is divided into clusters and whole clusters are randomly selected; efficient for large, spread-out populations
Systematic
Regular interval selection, simple to apply, selecting every kth individual
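Three of the probability sampling methods above can be sketched with Python's `random` module. The population of 100 house IDs, the north/south strata, and the 10% sampling rate are all assumptions made for illustration.

```python
import random

rng = random.Random(42)              # fixed seed so the sketch is repeatable
population = list(range(1, 101))     # hypothetical IDs for 100 houses
sample_size = 10

# Simple random sampling: every house has an equal chance of selection.
simple = rng.sample(population, sample_size)

# Systematic sampling: random start, then every kth house.
k = len(population) // sample_size
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: draw 10% at random from each group (stratum).
strata = {"north": population[:60], "south": population[60:]}
stratified = {
    name: rng.sample(group, len(group) // 10)
    for name, group in strata.items()
}

print(simple)
print(systematic)
print(stratified)
```

The stratified sample keeps the 60/40 split of the population, which simple random sampling does not guarantee.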
Sampling
The process of selecting a subset of individuals or items from a population to estimate characteristics of the whole; studying the entire population is usually impractical and costly