ISDS 361A MT1 Review

CHAPTER 1 — FOUNDATIONS

What is Statistics?
  • Statistics: A method for extracting information from data.

  • Business Analytics: Involves the analysis of data to make informed decisions.

  • Important note from Professor: "You will be asked this."

Population vs Sample
  • Population: The entire group of interest, described by parameters:

    • Mean: ( \mu )

    • Standard Deviation: ( \sigma )

    • Proportion: ( p )

  • Sample: A subset of the population, characterized by statistics:

    • Sample Mean: ( \bar{x} )

    • Sample Standard Deviation: ( s )

    • Sample Proportion: ( \hat{p} )

  • Descriptive Statistics: Deals with summarizing and describing the features of a dataset.

  • Inferential Statistics: Involves making predictions or inferences about a population based on a sample.

Example of Population and Sample
  • Given a question: "A survey of 300 CSUF students found 60% support stadium."

    • Population: All CSUF students

    • Sample: 300 students surveyed

    • Statistic: 60% (proportion supporting the stadium)

    • Parameter: True percentage of all students who support the stadium

Random vs Non-Random Sampling
  • Random Sample: Each member of the population has an equal chance of being selected.

  • Non-Random Sampling: Refers to sampling methods that do not provide equal chances of selection, introducing biases.

  • Types of Bias from Non-Random Sampling:

    • Selection Bias: Occurs when the sample is not representative of the population.

    • Non-response Bias: Occurs when individuals selected to participate cannot or do not respond.

  • Important note: "Convenience sampling = selection bias."

CHAPTER 3 — DESCRIPTIVE STATISTICS

  • Important note from Professor: "Everything in Chapter 3 will be on exam."

1) Shape of Data
  • Symmetric Distribution: Mean = Median.

  • Boxplot: Displays the distribution of data based on a five-number summary (minimum, Q1, median, Q3, maximum).

Skewness
  • Skew Right (Positive):

    • Mean > Median

    • Tail on the right, often caused by extreme high values.

    • Example Data: 20, 25, 30, 35, 500 → Mean > Median → right skew.

  • Skew Left (Negative):

    • Mean < Median

    • Tail on the left often caused by extreme low values.

  • Key Point from Professor: "If mean and median are identical, then the distribution is symmetric." This will be on the exam.
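
A minimal sketch of the mean vs. median comparison, using the example data from the notes. The variable names are illustrative, and the rule coded here is the classroom rule of thumb stated above, not a formal skewness test.

```python
from statistics import mean, median

data = [20, 25, 30, 35, 500]  # example data from the notes

m = mean(data)      # pulled upward by the value 500
med = median(data)  # middle value, not affected by 500

# Classroom rule of thumb: compare mean and median
if m > med:
    shape = "right skew"
elif m < med:
    shape = "left skew"
else:
    shape = "symmetric"

print(m, med, shape)
```

The single large value moves the mean far above the median, which is why the median is preferred for skewed data.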

2) Boxplot & Outliers
  • Outlier Detection Formula:

    • Lower Limit (LL): ( Q1 - 1.5 \times IQR )

    • Upper Limit (UL): ( Q3 + 1.5 \times IQR )

    • If a value is less than LL or greater than UL, it is considered an outlier.

  • Example Calculation:

    • Given: ( Q1=70, Q3=80, IQR=10 )

    • UL Calculation: ( UL = 80 + 1.5 \times 10 = 95 )

    • A value of 100 is greater than 95, so it is an outlier.
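
The outlier limits can be checked with a short sketch. The function name `outlier_limits` is hypothetical; the inputs are the quartiles from the example above.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = outlier_limits(70, 80)

value = 100
is_outlier = value < lower or value > upper

print(lower, upper, is_outlier)
```

With Q1 = 70 and Q3 = 80, the lower limit works out to 55, so any value below 55 would also be flagged.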

3) Measures of Location
  • Mean: Average of values.

  • Median: The middle value; preferred for skewed data.

  • Mode: The most frequently occurring value; a dataset can have more than one mode (e.g., bimodal).

4) Measures of Variability
  • Range: ( \text{max} - \text{min} )

  • Interquartile Range (IQR): ( Q3 - Q1 )

  • Variance:

    • Population Variance: ( \text{VAR.P} )

    • Sample Variance: ( \text{VAR.S} )

  • Standard Deviation:

    • Population Standard Deviation: ( \text{STDEV.P} )

    • Sample Standard Deviation: ( \text{STDEV.S} )

  • Coefficient of Variation (CV): Defined as ( CV = \frac{SD}{Mean} ); useful when comparing the variability between different datasets.

    • Example:

    • Data Set 1: ( \mu=20,000, \sigma=5,000 \Rightarrow CV=0.25 )

    • Data Set 2: ( \mu=30,000, \sigma=7,000 \Rightarrow CV=0.233 )

    • Conclusion: Data Set 1 is more variable relative to its mean (0.25 > 0.233), even though its standard deviation is smaller.
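
A small sketch of the CV comparison, using the two data sets from the example. The helper name `coefficient_of_variation` is an assumption for illustration.

```python
def coefficient_of_variation(mean, sd):
    """CV = SD / Mean, a unit-free measure of relative variability."""
    return sd / mean

cv1 = coefficient_of_variation(20_000, 5_000)
cv2 = coefficient_of_variation(30_000, 7_000)

# The data set with the larger CV is more variable relative to its mean
more_variable = "Data Set 1" if cv1 > cv2 else "Data Set 2"

print(round(cv1, 3), round(cv2, 3), more_variable)
```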

5) Z-Score
  • Formula: ( z = \frac{x - \mu}{\sigma} )

  • Interpretation:

    • Positive Z-Score: Above average value.

    • Negative Z-Score: Below average value.

    • Z-Score of Zero: Average value.

  • Example Calculation:

    • Given: ( \mu=75, \sigma=5, x=90 )

    • Z-Score Calculation: ( z = \frac{90 - 75}{5} = 3 ), so the value is 3 standard deviations above the mean.

  • Important note: "This will stay with you."
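
The z-score formula translates directly into code. This is a minimal sketch with a hypothetical helper name, using the numbers from the example above.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

z = z_score(90, 75, 5)

if z > 0:
    position = "above average"
elif z < 0:
    position = "below average"
else:
    position = "average"

print(z, position)
```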

6) Empirical Rule
  • Applies only to symmetric, bell-shaped data:

    • ( \mu \pm 1\sigma ): Approximately 68% of data

    • ( \mu \pm 2\sigma ): Approximately 95% of data

    • ( \mu \pm 3\sigma ): Approximately 99.7% of data (nearly all)

  • Important note: "Only for symmetrical data."
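
As a rough check, the percentages in the Empirical Rule can be reproduced from a normal curve. This sketch assumes the data follow a normal distribution and uses `NormalDist` from Python's standard library; it is not part of the course's Excel workflow.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of data within k standard deviations of the mean
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
```

The computed shares round to the familiar 68%, 95%, and 99.7%.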

7) Chebyshev’s Theorem
  • Applies to all datasets (symmetric or non-symmetric):

  • Formula: At least ( 1 - \frac{1}{z^2} ) of the data lies within ( z ) standard deviations of the mean, for ( z > 1 ).

  • Example Calculation:

    • Using ( z=2 ): ( 1 - \frac{1}{4} = 0.75 ), so at least 75% of the data lies within 2 standard deviations of the mean.

  • Important note: "You will see this on Cengage and exam."
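
Chebyshev's bound is simple to compute for any z greater than 1. The function name below is hypothetical; the z = 2 case matches the example above.

```python
def chebyshev_min_share(z):
    """Minimum share of data within z standard deviations (any distribution)."""
    if z <= 1:
        raise ValueError("The bound is only informative for z > 1")
    return 1 - 1 / z**2

share_2 = chebyshev_min_share(2)
share_3 = chebyshev_min_share(3)

print(share_2, round(share_3, 4))
```

Note that these are minimums: for bell-shaped data, the Empirical Rule gives larger percentages at the same z.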

CHAPTER 6 — NORMAL DISTRIBUTION

Normal Distribution Characteristics
  • Symmetric Shape: Mean = Median = Mode.

  • Random Variable: Denoted as ( X )

CORE EXCEL COMMANDS for Normal Distribution
  • Left-tail Probability:

    • ( \text{NORM.DIST}(x, \mu, \sigma, \text{TRUE}) )

    • Example: Find ( P(X<70) )

  • Right-tail Probability:

    • ( 1 - \text{NORM.DIST}(x, \mu, \sigma, \text{TRUE}) )

    • Example: Find ( P(X>90) )

  • Probability Between Two Values:

    • ( \text{NORM.DIST}(\text{high}, \mu, \sigma, \text{TRUE}) - \text{NORM.DIST}(\text{low}, \mu, \sigma, \text{TRUE}) )

  • Inverse Functions (Find X given a percentile):

    • Bottom ( p\% ): ( \text{NORM.INV}(p, \mu, \sigma) ), with ( p ) entered as a decimal (e.g., 0.10 for 10%)

    • Top ( p\% ): ( \text{NORM.INV}(1 - p, \mu, \sigma) )

  • Important note from Professor: These Excel commands were emphasized repeatedly for normal distribution problems.
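
For checking answers outside Excel, the same calculations can be sketched with `NormalDist` from Python's standard library. The mean of 80 and standard deviation of 10 are hypothetical values chosen only for illustration, since the notes do not give them for the P(X<70) and P(X>90) examples.

```python
from statistics import NormalDist

# Hypothetical parameters, not from the notes
mu, sigma = 80, 10
X = NormalDist(mu, sigma)

left_tail = X.cdf(70)            # like NORM.DIST(70, mu, sigma, TRUE)
right_tail = 1 - X.cdf(90)       # like 1 - NORM.DIST(90, mu, sigma, TRUE)
between = X.cdf(90) - X.cdf(70)  # P(70 < X < 90)

p = 0.10                         # percentile as a decimal
bottom_cutoff = X.inv_cdf(p)     # like NORM.INV(p, mu, sigma)
top_cutoff = X.inv_cdf(1 - p)    # like NORM.INV(1 - p, mu, sigma)

print(round(left_tail, 4), round(right_tail, 4), round(between, 4))
print(round(bottom_cutoff, 2), round(top_cutoff, 2))
```

Here `cdf` plays the role of NORM.DIST with the cumulative argument set to TRUE, and `inv_cdf` plays the role of NORM.INV.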