Untitled Notes

Statistics: Refers to a method for extracting information from data.
Business Analytics: Involves the analysis of data to make informed decisions.
Important note from Professor: "You will be asked this."

Population: The entire group of interest, described by parameters:
- Mean: ( \mu )
- Standard Deviation: ( \sigma )
- Proportion: ( p )
Sample: A subset of the population, characterized by statistics:
- Sample Mean: ( \bar{x} )
- Sample Standard Deviation: ( s )
- Sample Proportion: ( \hat{p} )
Descriptive Statistics: Deals with summarizing and describing the features of a dataset.
Inferential Statistics: Involves making predictions or inferences about a population based on a sample.

Given a question: "A survey of 300 CSUF students found 60% support stadium."
- Population: All CSUF students
- Sample: 300 students surveyed
- Statistic: 60% (proportion supporting the stadium)
- Parameter: True percentage of all students who support the stadium

Random Sample: Each member of the population has an equal chance of being selected.
Non-Random Sampling: Sampling methods that do not give every member of the population an equal chance of selection, which introduces bias.
Types of Bias from Non-Random Sampling:
- Selection Bias: Occurs when the sample is not representative of the population.
- Non-response Bias: Occurs when individuals selected to participate cannot or do not respond.
Important note: "Convenient sampling = selection bias."
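
The idea of a random sample can be sketched in a few lines of Python. This is only an illustration: the population size of 40,000 and the seed are assumptions chosen for the example, not figures from the lecture. Every ID has the same chance of being drawn, and no ID is drawn twice.

```python
import random

# Hypothetical population: student IDs 1..40000 (the size is an assumption).
population = list(range(1, 40001))

# Fixed seed so the sketch gives the same sample each run.
rng = random.Random(42)

# Simple random sample of 300 students, drawn without replacement.
sample = rng.sample(population, k=300)

print(len(sample))        # number of students selected
print(len(set(sample)))   # same number: no student appears twice
```

A convenience sample, by contrast, would be something like `population[:300]`, where only the first students on the list can ever be chosen.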

Symmetric Distribution: Mean = Median.
Boxplot: Displays the distribution of data based on a five-number summary (minimum, Q1, median, Q3, maximum).

Skew Right (Positive):
- Mean > Median
- Tail on the right, often caused by extreme high values (large outliers).
- Example Data: 20, 25, 30, 35, 500 → Mean > Median → right skew.
Skew Left (Negative):
- Mean < Median
- Tail on the left often caused by extreme low values.
Key Point from Professor: "If mean and median are identical, then the distribution is symmetric." This will be on the exam.
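
The mean-versus-median comparison above can be checked with a short Python sketch. The function name `skew_direction` and the left-skew data are made up for illustration; the right-skew data is the example from these notes.

```python
from statistics import mean, median

def skew_direction(values):
    """Compare mean and median to suggest the direction of skew."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"
    if m < med:
        return "left"
    return "mean equals median"

# Example data from the notes: one very large value pulls the mean up.
right_data = [20, 25, 30, 35, 500]
print(mean(right_data), median(right_data))  # 122 30
print(skew_direction(right_data))            # right

# Hypothetical data with one very small value pulling the mean down.
left_data = [5, 70, 75, 80, 85]
print(mean(left_data), median(left_data))    # 63 75
print(skew_direction(left_data))             # left
```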

Outlier Detection Formula:
- Lower Limit (LL): ( Q1 - 1.5 \times IQR )
- Upper Limit (UL): ( Q3 + 1.5 \times IQR )
- If a value is less than LL or greater than UL, it is considered an outlier.
Example Calculation:
- Given: ( Q1=70, Q3=80, IQR=10 )
- LL Calculation: ( LL = 70 - 1.5 \times 10 = 55 )
- UL Calculation: ( UL = 80 + 1.5 \times 10 = 95 )
- Value 100 is an outlier because ( 100 > 95 ).
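
The same rule can be written as a small Python sketch. The function names `outlier_limits` and `is_outlier` are made up for this example; the numbers are the ones from the calculation above.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(x, q1, q3):
    """True if x falls below the lower limit or above the upper limit."""
    ll, ul = outlier_limits(q1, q3)
    return x < ll or x > ul

ll, ul = outlier_limits(70, 80)
print(ll, ul)                   # 55.0 95.0
print(is_outlier(100, 70, 80))  # True
print(is_outlier(90, 70, 80))   # False
```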

Range: ( \text{max} - \text{min} )
Interquartile Range (IQR): ( Q3 - Q1 )
Variance:
- Population Variance: ( \text{VAR.P} )
- Sample Variance: ( \text{VAR.S} )
Standard Deviation:
- Population Standard Deviation: ( \text{STDEV.P} )
- Sample Standard Deviation: ( \text{STDEV.S} )
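
For comparison outside Excel, Python's standard `statistics` module has functions that play the same roles. The data set below is hypothetical and chosen so the population results come out to round numbers. The pairing with the Excel names is a rough guide, not something stated in the lecture.

```python
from statistics import pvariance, variance, pstdev, stdev

# Hypothetical data set, chosen so the population variance is a whole number.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)  # Range = max - min
pop_var = pvariance(data)           # plays the role of VAR.P (divide by n)
samp_var = variance(data)           # plays the role of VAR.S (divide by n - 1)
pop_sd = pstdev(data)               # plays the role of STDEV.P
samp_sd = stdev(data)               # plays the role of STDEV.S

print(data_range)          # 7
print(pop_var, pop_sd)     # 4 2.0
print(round(samp_var, 3), round(samp_sd, 3))
```

The sample versions are always a little larger than the population versions for the same data, because they divide by n - 1 instead of n.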
Coefficient of Variation (CV): Defined as ( CV = \frac{SD}{Mean} ); useful when comparing the variability between different datasets.
- Example:
- Data Set 1: ( \mu=20,000, \sigma=5,000 \Rightarrow CV=0.25 )
- Data Set 2: ( \mu=30,000, \sigma=7,000 \Rightarrow CV=0.233 )
- Conclusion: Data Set 1 is more variable relative to its mean, since ( 0.25 > 0.233 ), even though its standard deviation is smaller.
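
The comparison can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is made up for this example; the inputs are the two data sets from the notes.

```python
def coefficient_of_variation(sd, mean):
    """CV = standard deviation divided by mean."""
    return sd / mean

cv1 = coefficient_of_variation(5_000, 20_000)
cv2 = coefficient_of_variation(7_000, 30_000)

print(cv1)            # 0.25
print(round(cv2, 3))  # 0.233
print("Data Set 1" if cv1 > cv2 else "Data Set 2", "is more variable")
```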

Z-Score Formula: ( z = \frac{x - \mu}{\sigma} )
Interpretation:
- Positive Z-Score: Above average value.
- Negative Z-Score: Below average value.
- Z-Score of Zero: Average value.
Example Calculation:
- Given: ( \mu=75, \sigma=5, x=90 )
- Z-Score Calculation: ( z = \frac{90 - 75}{5} = 3 ), so the value is 3 standard deviations above the mean.
Important note: "This will stay with you."
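
As a quick check of the calculation above, here is a minimal Python sketch. The function name `z_score` is made up for this example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

print(z_score(90, 75, 5))  # 3.0  (above average)
print(z_score(75, 75, 5))  # 0.0  (exactly average)
print(z_score(70, 75, 5))  # -1.0 (below average)
```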

Empirical Rule (applies only to symmetric, bell-shaped data):
- ( \mu \pm 1\sigma ): Approximately 68% of data
- ( \mu \pm 2\sigma ): Approximately 95% of data
- ( \mu \pm 3\sigma ): Approximately 99.7% of data (nearly 100%)
Important note: "Only for symmetrical data."
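
These percentages can be checked against a normal distribution with Python's standard `statistics.NormalDist`. This is a sketch that assumes the data is exactly normal; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```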

Left-tail Probability:
- ( \text{NORM.DIST}(x, \mu, \sigma, \text{TRUE}) )
- Example: Find ( P(X<70) )
Right-tail Probability:
- ( 1 - \text{NORM.DIST}(x, \mu, \sigma, \text{TRUE}) )
- Example: Find ( P(X>90) )
Probability Between Two Values:
- ( \text{NORM.DIST}(\text{high}, \mu, \sigma, \text{TRUE}) - \text{NORM.DIST}(\text{low}, \mu, \sigma, \text{TRUE}) )
Inverse Functions (Find X given a percentile):
- Bottom ( p\% ): ( \text{NORM.INV}(p, \mu, \sigma) ), with ( p ) entered as a decimal (for example, 0.10 for 10%)
- Top ( p\% ): ( \text{NORM.INV}(1 - p, \mu, \sigma) )
Important note from Professor: Repeated emphasis on the usefulness of these Excel commands when working with normal distributions.
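
For practice outside Excel, the same calculations can be sketched with Python's standard `statistics.NormalDist`. The mean of 75 and standard deviation of 5 are assumed values borrowed from the z-score example, and the 10% cutoffs are chosen only for illustration.

```python
from statistics import NormalDist

# Assumed example distribution: mean 75, standard deviation 5.
dist = NormalDist(mu=75, sigma=5)

# Left tail, like NORM.DIST(70, 75, 5, TRUE): P(X < 70)
left_tail = dist.cdf(70)

# Right tail, like 1 - NORM.DIST(90, 75, 5, TRUE): P(X > 90)
right_tail = 1 - dist.cdf(90)

# Between two values: P(70 < X < 80)
between = dist.cdf(80) - dist.cdf(70)

# Inverse, like NORM.INV(0.10, 75, 5): cutoff for the bottom 10%
bottom_10 = dist.inv_cdf(0.10)

# Inverse, like NORM.INV(1 - 0.10, 75, 5): cutoff for the top 10%
top_10 = dist.inv_cdf(1 - 0.10)

print(round(left_tail, 4))   # 0.1587
print(round(right_tail, 4))  # 0.0013
print(round(between, 4))     # 0.6827
print(round(bottom_10, 2))   # 68.59
print(round(top_10, 2))      # 81.41
```

Note that `cdf` always gives the area to the left, just like NORM.DIST with TRUE, so right-tail and between-values probabilities are built by subtraction.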