Chapter 1 – Intro to Statistics & Data

Data, Statistics, and Business Analytics

  • Data are “facts & figures”—numerical values or qualitative labels—that describe objects, people, events, or transactions.
  • Modern reality: datasets are frequently huge (“Big Data”) and stored/processed on computers or in the cloud.
  • Statistics is “a way to get information from data.” It involves:
    • Collecting raw observations.
    • Organising & cleaning them.
    • Visualising & summarising them.
    • Extracting insights that support decision-making.
  • Motivating questions and examples raised in lecture:
    • Effect of advertising investment on subsequent sales.
    • Relationship between shelf location and cereal sales.
    • Delivery/logistics example: UPS tracks weight, destination, cost for every package—massive operational dataset.
  • Distinction between data and information:
    • Data = raw, unprocessed input.
    • Information = the “bigger picture” understanding produced after statistical processing.

Business Analytics

  • Definition (must memorise / underline):
    • Business analytics is the scientific process of transforming data into insights for making better business decisions.
  • Emphasis of course: analytics for business & economic decision-making.
  • Links to prior learning:
    • Builds on statistics, computer science, domain knowledge.
    • Ethical importance: data-driven decisions can affect pricing, job allocations, credit, etc.; requires responsible use.

Three Types of Data (Measurement Scales)

  1. Nominal
    • Qualitative / categorical; pure labels or names.
    • No arithmetic operations are meaningful (cannot add, multiply, etc.).
    • Examples: gender, product ID, cereal flavour.
  2. Ordinal
    • Still categorical but ordered / ranked.
    • Example scale: Excellent > Good > Fair > Poor.
    • Arithmetic still meaningless, yet order conveys preference/intensity.
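  3. Interval (also called “quantitative” or “numerical”; some texts split this into separate interval and ratio scales)
    • Numeric, with equal intervals between points; arithmetic operations are meaningful.
    • Can compute means and differences; ratios are meaningful only when the scale has a true zero (e.g. income or weight, but not temperature in °C).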
    • Examples: age, income, weight, price.
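
A minimal sketch of which summaries suit each scale, using only Python's standard library. All the response values below are invented for illustration; they are not data from the lecture.

```python
from statistics import mean, mode

# Hypothetical survey responses, invented purely for illustration.
flavours = ["honey", "plain", "honey", "cocoa", "honey"]   # nominal
ratings = ["Good", "Excellent", "Fair", "Good", "Poor"]    # ordinal
ages = [19, 23, 31, 27, 45]                                # interval

# Nominal: only counting is meaningful, so the mode is the natural summary.
most_common_flavour = mode(flavours)

# Ordinal: map labels to ranks so the order (not the distance) can be used.
rank = {"Poor": 1, "Fair": 2, "Good": 3, "Excellent": 4}
sorted_ratings = sorted(ratings, key=rank.get)
median_rating = sorted_ratings[len(sorted_ratings) // 2]

# Interval: arithmetic is meaningful, so the mean is a valid summary.
mean_age = mean(ages)

print(most_common_flavour, median_rating, mean_age)
```

The rank numbers 1 to 4 only encode order; averaging them would treat the gap between Poor and Fair as equal to the gap between Good and Excellent, which an ordinal scale does not guarantee.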

Branches of Statistics

  • Descriptive Statistics
    • Summarise important aspects of a dataset (centre, spread, shape, patterns).
    • Tools include frequency tables, charts, and measures such as the mean $\bar{x}$, median, mode, and variance $s^2$.
    • Focus for first chapters/weeks.
  • Inferential Statistics
    • Go beyond available data to draw conclusions about a population.
    • Use probability theory, confidence intervals, hypothesis tests.
    • Example hypothesis from lecture: “Average American Idol viewer age $= 23$.” Test $H_0: \mu = 23$ vs $H_A: \mu > 23$ using a sample of size $n = 500$.
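
The descriptive measures named above can be computed with Python's built-in `statistics` module. This is a small sketch; the five ages are hypothetical values chosen so the arithmetic is easy to check by hand.

```python
import statistics

# Hypothetical ages, invented for illustration (not lecture data).
ages = [19, 21, 21, 24, 30]

x_bar = statistics.mean(ages)       # centre: arithmetic mean
med = statistics.median(ages)       # centre: middle value of the sorted data
mo = statistics.mode(ages)          # most frequent value
s2 = statistics.variance(ages)      # spread: sample variance, divides by n - 1

print(x_bar, med, mo, s2)
```

Note that `statistics.variance` is the sample variance $s^2$ (divisor $n - 1$); the population version with divisor $n$ is `statistics.pvariance`.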

Core Terminology

  • Population
    • Entire set of items/individuals of interest (e.g. all California residents, all American Idol viewers).
  • Parameter
    • Numerical characteristic of a population (e.g. true mean age $\mu$, true proportion $p$ supporting an issue).
  • Sample
    • Subset of the population selected for analysis (e.g. 5,000 California residents surveyed).
  • Statistic
    • Numerical measure computed from a sample (e.g. sample mean $\bar{x}$, sample proportion $\hat{p}$).
  • Relationship mnemonic:
    • “Statistic is to Sample as Parameter is to Population.”
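
To make the four terms concrete, here is a small simulation sketch. The population is synthetic (random ages between 18 and 80, an assumption made only so that the true parameter can be computed); in practice the parameter is usually unknown and only the statistic is observed.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

# Population: every individual of interest (synthetic ages, hypothetical).
population = [random.randint(18, 80) for _ in range(10_000)]

# Parameter: a numerical characteristic of the whole population.
mu = mean(population)

# Sample: a subset of the population selected for analysis.
sample = random.sample(population, 500)

# Statistic: the same kind of measure, computed from the sample only.
x_bar = mean(sample)

print(f"parameter mu = {mu:.2f}, statistic x_bar = {x_bar:.2f}")
```

The statistic will generally be close to, but not equal to, the parameter; that gap is what inferential statistics quantifies.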

Example Analyses Discussed

Example 1: California Economy Sentiment

  • Sample: 5,000 California residents (randomly selected).
  • Reported result: more than 55 % have a positive view.
  • Because the researcher generalises to all Californians, this is inferential statistics.
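
One way to see what generalising to all Californians involves is a confidence interval for the population proportion. The sketch below uses the large-sample normal approximation and assumes a sample proportion of exactly 0.55 and a 95 % confidence level; the notes only report "more than 55 %", so both values are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

n = 5000            # sample size from the notes
p_hat = 0.55        # assumed value; the notes only say "more than 55 %"
confidence = 0.95   # assumed confidence level

# Critical value of the standard normal distribution (about 1.96 for 95 %).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Standard error of a sample proportion under simple random sampling.
se = sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - z * se, p_hat + z * se
print(f"{lower:.3f} to {upper:.3f}")
```

Under these assumptions the interval runs from roughly 0.536 to 0.564, so the sample supports the claim that a majority of the population holds a positive view.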

Example 2: CSUF Students & Statistics Enthusiasm

  • Sample: 5,000 CSUF students; 80 % excited about statistics.
  • Statement limited to sample—no population claim—thus descriptive statistics.

Example 3: American Idol Viewer Age

  • Population: All viewers of American Idol.
  • Variable: Age (interval/quantitative).
  • Sample: 500 viewers.
  • Statistic: Sample mean age $\bar{x}_{\text{sample}}$.
  • Inference goal: Decide whether the population mean age $\mu$ differs from $23$ (producer’s hypothesis: $\mu > 23$).
  • Form of the hypothesis test (covered later in the course):
    • $H_0: \mu = 23$; $H_A: \mu \neq 23$ (or $\mu > 23$).
    • Decision at $\alpha = 0.05$ (“5 % level of risk”).
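
A rough sketch of how this test could be carried out once the data are in hand. The hypothesised mean, sample size, and significance level come from the notes; the sample mean `x_bar` and sample standard deviation `s` are hypothetical, since the notes give no results. With $n = 500$ the normal (z) approximation is used here in place of the t distribution.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 23        # hypothesised mean under H0 (from the notes)
n = 500          # sample size (from the notes)
alpha = 0.05     # significance level (from the notes)
x_bar = 24.1     # hypothetical sample mean, invented for illustration
s = 9.0          # hypothetical sample standard deviation, also invented

se = s / sqrt(n)                  # standard error of the sample mean
z = (x_bar - mu_0) / se           # standardised test statistic

std_normal = NormalDist()
p_one_sided = 1 - std_normal.cdf(z)             # H_A: mu > 23
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # H_A: mu != 23

reject_one_sided = p_one_sided < alpha
reject_two_sided = p_two_sided < alpha
print(f"z = {z:.2f}, reject H0 (one-sided): {reject_one_sided}")
```

With these made-up numbers the test statistic is about 2.73, and $H_0$ would be rejected at the 5 % level under either alternative. Different sample values could of course lead to the opposite decision.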

Goals & Best Practices in Data Collection

  • Main goal: Obtain a representative sample mirroring population characteristics.
  • Most common method: Simple Random Sampling (SRS), in which every possible sample of a given size is equally likely to be chosen, so every unit has the same selection probability.
  • Pitfalls/ethical notes:
    • Bias arises if the sampling frame is incomplete or response rates differ across groups.
    • Privacy and informed consent when gathering personal data.
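
Simple random sampling can be sketched with `random.sample`, which draws without replacement and gives every unit in the list the same chance of selection. The sampling frame below (1,000 student IDs with a class-level label) is hypothetical and exists only to show how a sample share can be compared with the population share.

```python
import random

random.seed(42)  # fixed seed so the draw is reproducible

# Hypothetical sampling frame: (student_id, level); every 4th ID is a grad.
frame = [(i, "grad" if i % 4 == 0 else "undergrad") for i in range(1000)]

# Simple random sample of 100 units, drawn without replacement.
sample = random.sample(frame, k=100)

def grad_share(units):
    """Proportion of units labelled 'grad'."""
    return sum(1 for _, level in units if level == "grad") / len(units)

pop_share = grad_share(frame)      # known here only because the frame is synthetic
sample_share = grad_share(sample)  # what a researcher would actually observe

print(f"population: {pop_share:.2f}, sample: {sample_share:.2f}")
```

If the frame omitted part of the population, the draw would still be random but the sample could no longer be representative, which is the frame-coverage pitfall noted above.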

Road-Map of the Course (as highlighted)

  1. Intro & foundational definitions (this lecture).
  2. Descriptive statistics: tables, charts, numerical summaries.
  3. Inferential statistics: estimation & hypothesis testing.
  4. Business applications / analytics cases.

Key Take-aways for Exams & Practice

  • Memorise core definitions (population, parameter, sample, statistic, business analytics).
  • Be able to classify data scales (nominal, ordinal, interval).
  • Distinguish descriptive vs inferential tasks given a scenario.
  • Remember mnemonic: Statistic $\rightarrow$ Sample, Parameter $\rightarrow$ Population.
  • Understand role of randomness in producing representative samples.
  • Recognise real-world relevance: advertising ROI, shelf placement, logistics optimisation.