ISDS 361A MT1 Review
CHAPTER 1 — FOUNDATIONS
What is Statistics?
Statistics: Refers to a method for extracting information from data.
Business Analytics: Involves the analysis of data to make informed decisions.
Important note from Professor: "You will be asked this."
Population vs Sample
Population: The entire group of interest, described by parameters:
Mean: ( \mu )
Standard Deviation: ( \sigma )
Proportion: ( p )
Sample: A subset of the population, characterized by statistics:
Sample Mean: ( \bar{x} )
Sample Standard Deviation: ( s )
Sample Proportion: ( \hat{p} )
Descriptive Statistics: Deals with summarizing and describing the features of a dataset.
Inferential Statistics: Involves making predictions or inferences about a population based on a sample.
Example of Population and Sample
Given a question: "A survey of 300 CSUF students found that 60% support the stadium."
Population: All CSUF students
Sample: 300 students surveyed
Statistic: 60% (proportion supporting the stadium)
Parameter: True percentage of all students who support the stadium
Random vs Non-Random Sampling
Random Sample: Each member of the population has an equal chance of being selected.
Non-Random Sampling: Refers to sampling methods that do not provide equal chances of selection, introducing biases.
Types of Bias from Non-Random Sampling:
Selection Bias: Occurs when the sample is not representative of the population.
Non-response Bias: Occurs when individuals selected to participate cannot or do not respond.
Important note: "Convenient sampling = selection bias."
CHAPTER 3 — DESCRIPTIVE STATISTICS
Important note from Professor: "Everything in Chapter 3 will be on exam."
1) Shape of Data
Symmetric Distribution: Mean = Median.
Boxplot: Displays the distribution of data based on a five-number summary (minimum, Q1, median, Q3, maximum).
Skewness
Skew Right (Positive):
Mean > Median
Tail on the right, often caused by extreme high values (large outliers).
Example Data: 20, 25, 30, 35, 500 → Mean > Median → right skew.
Skew Left (Negative):
Mean < Median
Tail on the left often caused by extreme low values.
Key Point from Professor: "If mean and median are identical, then the distribution is symmetric." This will be on the exam.
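The mean-versus-median comparison above can be checked quickly in code. This is a minimal sketch using the example data from these notes; the variable names and the three shape labels are illustrative choices, and the comparison is only the rule of thumb from this section, not a formal skewness test.

```python
from statistics import mean, median

# Example data from the notes: one very large value pulls the mean up.
data = [20, 25, 30, 35, 500]

data_mean = mean(data)      # sum of values / number of values
data_median = median(data)  # middle value of the sorted data

# Rule of thumb from this section: compare mean and median.
if data_mean > data_median:
    shape = "right-skewed"
elif data_mean < data_median:
    shape = "left-skewed"
else:
    shape = "symmetric (by the mean = median rule)"

print(f"mean={data_mean}, median={data_median}, shape={shape}")
```

The single value 500 moves the mean far above the median, which is why the median is the preferred measure of location for skewed data.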
2) Boxplot & Outliers
Outlier Detection Formula:
Lower Limit (LL): ( Q1 - 1.5 \times IQR )
Upper Limit (UL): ( Q3 + 1.5 \times IQR )
If a value is less than LL or greater than UL, it is considered an outlier.
Example Calculation:
Given: ( Q1=70, Q3=80, IQR=10 )
UL Calculation: ( UL = Q3 + 1.5 \times IQR = 80 + 1.5 \times 10 = 95 )
A value of 100 is greater than UL = 95, so it is an outlier.
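The outlier check above can be sketched as a small helper. The function names `outlier_limits` and `is_outlier` are hypothetical, chosen for illustration; the numbers are the ones from the example.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, lower, upper):
    """A value is an outlier if it falls outside [lower, upper]."""
    return x < lower or x > upper


# Example from the notes: Q1 = 70, Q3 = 80, so IQR = 10.
lower, upper = outlier_limits(70, 80)
print(lower, upper)                   # lower and upper limits
print(is_outlier(100, lower, upper))  # 100 is above the upper limit
print(is_outlier(90, lower, upper))   # 90 is inside the limits
```

The same helper also gives the lower limit, which the example does not compute: 70 - 15 = 55.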
3) Measures of Location
Mean: Average of values.
Median: The middle value; preferred for skewed data.
Mode: The most frequently occurring value; a dataset can have more than one mode (two modes = bimodal).
4) Measures of Variability
Range: ( \text{max} - \text{min} )
Interquartile Range (IQR): ( Q3 - Q1 )
Variance (Excel functions):
Population Variance: ( \text{VAR.P} )
Sample Variance: ( \text{VAR.S} )
Standard Deviation (Excel functions):
Population Standard Deviation: ( \text{STDEV.P} )
Sample Standard Deviation: ( \text{STDEV.S} )
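For practice outside Excel, the population/sample distinction can be illustrated with Python's standard `statistics` module, whose `pvariance`, `variance`, `pstdev`, and `stdev` functions play the same roles as the four Excel functions above. The `scores` list is made-up data for illustration only, not from the course.

```python
from statistics import pvariance, variance, pstdev, stdev

# Hypothetical data, chosen so the arithmetic is easy to follow.
scores = [70, 75, 80, 85, 90]  # mean = 80, squared deviations sum to 250

pop_var = pvariance(scores)   # divides by n      (like VAR.P)
samp_var = variance(scores)   # divides by n - 1  (like VAR.S)
pop_sd = pstdev(scores)       # square root of population variance (like STDEV.P)
samp_sd = stdev(scores)       # square root of sample variance (like STDEV.S)

print(pop_var, samp_var)
print(round(pop_sd, 3), round(samp_sd, 3))
```

The sample versions divide by n - 1 instead of n, so they are always a little larger than the population versions on the same data.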
Coefficient of Variation (CV): Defined as ( CV = \frac{SD}{Mean} ); useful when comparing the variability between different datasets.
Example:
Data Set 1: ( \mu=20,000, \sigma=5,000 \Rightarrow CV=0.25 )
Data Set 2: ( \mu=30,000, \sigma=7,000 \Rightarrow CV=0.233 )
Conclusion: Data Set 1 is more variable relative to its mean (0.25 > 0.233), even though Data Set 2 has the larger standard deviation.
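The CV comparison can be reproduced in a few lines. This is a sketch with a hypothetical helper name, `coefficient_of_variation`, using the two data sets from the example.

```python
def coefficient_of_variation(sd, mean):
    """CV = standard deviation / mean (a unitless measure of relative spread)."""
    return sd / mean


cv1 = coefficient_of_variation(5_000, 20_000)  # Data Set 1
cv2 = coefficient_of_variation(7_000, 30_000)  # Data Set 2

print(round(cv1, 3), round(cv2, 3))
more_variable = "Data Set 1" if cv1 > cv2 else "Data Set 2"
print(more_variable, "has more relative variability")
```

Because CV divides out the mean, it allows a fair comparison between data sets measured on different scales.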
5) Z-Score
Formula: ( z = \frac{x - \mu}{\sigma} )
Interpretation:
Positive Z-Score: Above average value.
Negative Z-Score: Below average value.
Z-Score of Zero: Average value.
Example Calculation:
Given: ( \mu=75, \sigma=5, x=90 )
Z-Score Calculation: ( z = \frac{90 - 75}{5} = 3 ), so ( x = 90 ) is 3 standard deviations above the mean.
Important note: "This will stay with you."
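The z-score formula translates directly into code. A minimal sketch follows, with a hypothetical function name `z_score`; the first call uses the numbers from the example, and the other two are added only to show the negative and zero cases.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


print(z_score(90, 75, 5))  # example from the notes: above average
print(z_score(70, 75, 5))  # below average, so the z-score is negative
print(z_score(75, 75, 5))  # exactly average, so the z-score is zero
```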
6) Empirical Rule
Applies only to symmetric, bell-shaped (approximately normal) data:
( \mu \pm 1\sigma ): Approximately 68% of data
( \mu \pm 2\sigma ): Approximately 95% of data
( \mu \pm 3\sigma ): Approximately 99.7% of data (nearly all)
Important note: "Only for symmetrical data."
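The empirical-rule percentages can be checked against an exact normal curve. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper name `within_k_sd` is hypothetical. The third value comes out near 99.7%, which is why it is often rounded to "approximately 100%".

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def within_k_sd(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
```

These values hold for the normal curve specifically, which is the reason the rule should not be applied to skewed data.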
7) Chebyshev’s Theorem
Applies to all datasets (symmetric or non-symmetric):
Formula: Percent of data within ( z ) standard deviations of the mean ≥ ( 1 - \frac{1}{z^2} ), for ( z > 1 )
Example Calculation:
Using ( z=2 ): ( 1 - \frac{1}{4} = 0.75 ), so at least 75% of the data lies within 2 standard deviations of the mean.
Important note: "You will see this on Cengage and exam."
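Chebyshev's bound is a one-line calculation. The sketch below uses a hypothetical function name, `chebyshev_min_fraction`, and rejects values of z that are 1 or less because the bound says nothing useful there.

```python
def chebyshev_min_fraction(z):
    """Minimum fraction of any dataset within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem is only informative for z > 1")
    return 1 - 1 / z**2


print(chebyshev_min_fraction(2))            # example from the notes
print(round(chebyshev_min_fraction(3), 4))  # a second value for comparison
```

Note that this is a minimum ("at least"), so for z = 2 it guarantees 75% where the empirical rule gives about 95% for bell-shaped data.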
CHAPTER 6 — NORMAL DISTRIBUTION
Normal Distribution Characteristics
Symmetric Shape: Mean = Median = Mode.
Random Variable: Denoted as ( X )
CORE EXCEL COMMANDS for Normal Distribution
Left-tail Probability:
( \text{NORM.DIST}(x, \mu, \sigma, \text{TRUE}) )
Example: Find ( P(X<70) )
Right-tail Probability:
( 1 - \text{NORM.DIST}(x, \mu, \sigma, \text{TRUE}) )
Example: Find ( P(X>90) )
Probability Between Two Values:
( \text{NORM.DIST}(\text{high}, \mu, \sigma, \text{TRUE}) - \text{NORM.DIST}(\text{low}, \mu, \sigma, \text{TRUE}) )
Inverse Functions (Find X given a percentile):
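Bottom ( p\%): ( \text{NORM.INV}(p, \mu, \sigma) ), with ( p ) entered as a decimal (e.g., 0.10 for 10%)
Top ( p\%): ( \text{NORM.INV}(1 - p, \mu, \sigma) ), which returns the cutoff value that the top ( p\% ) lies above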
Important note from Professor: Repeated emphasis on the usefulness of these Excel commands when working with normal distributions.
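To check Excel answers independently, the same four calculations can be sketched in Python. This assumes Python 3.8 or later for `statistics.NormalDist`, where `cdf` plays the role of NORM.DIST with TRUE and `inv_cdf` plays the role of NORM.INV. The notes do not give a mean or standard deviation for the ( P(X<70) ) and ( P(X>90) ) examples, so the values ( \mu=75, \sigma=5 ) and the 10% percentile below are assumptions borrowed for illustration.

```python
from statistics import NormalDist

# Assumed parameters for illustration only (not given in the notes).
X = NormalDist(mu=75, sigma=5)

left_tail = X.cdf(70)             # P(X < 70), like NORM.DIST(70, 75, 5, TRUE)
right_tail = 1 - X.cdf(90)        # P(X > 90), like 1 - NORM.DIST(90, 75, 5, TRUE)
between = X.cdf(90) - X.cdf(70)   # P(70 < X < 90): high minus low

bottom_10_cutoff = X.inv_cdf(0.10)   # like NORM.INV(0.10, 75, 5)
top_10_cutoff = X.inv_cdf(1 - 0.10)  # like NORM.INV(0.90, 75, 5)

print(round(left_tail, 4), round(right_tail, 4), round(between, 4))
print(round(bottom_10_cutoff, 2), round(top_10_cutoff, 2))
```

The three probabilities add up to 1, which is a quick sanity check when splitting a normal curve into left tail, middle, and right tail.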