Chapter 1 – Intro to Statistics & Data

Data are “facts & figures”—numerical values or qualitative labels—that describe objects, people, events, or transactions.
Modern reality: datasets are frequently huge (“Big Data”) and stored/processed on computers or in the cloud.
Statistics is “a way to get information from data.” It involves:
- Collecting raw observations.
- Organising & cleaning them.
- Visualising & summarising them.
- Extracting insights that support decision-making.
Practical motivating questions raised in lecture:
- Effect of advertising investment on subsequent sales.
- Relationship between shelf location and cereal sales.
- Delivery/logistics example: UPS tracks weight, destination, cost for every package—massive operational dataset.
Distinction between data and information:
- Data = raw, unprocessed input.
- Information = the “bigger picture” understanding produced after statistical processing.

Definition (must memorise / underline):
- Business analytics is the scientific process of transforming data into insights for making better business decisions.
Emphasis of course: analytics for business & economic decision-making.
Links to prior learning:
- Builds on statistics, computer science, domain knowledge.
- Ethical importance: data-driven decisions can affect pricing, job allocations, credit, etc.; requires responsible use.

Nominal
- Qualitative / categorical; pure labels or names.
- No arithmetic operations are meaningful (cannot add, multiply, etc.).
- Examples: gender, product ID, cereal flavour.
Ordinal
- Still categorical but ordered / ranked.
- Example scale: Excellent > Good > Fair > Poor.
- Arithmetic still meaningless, yet order conveys preference/intensity.
Interval (often called “ratio” or “quantitative” in some texts)
- Numeric, equal intervals between points; full arithmetic operations are valid.
- Can compute means and differences; ratios are also meaningful when the scale has a true zero, as in the examples below.
- Examples: age, income, weight, price.
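A minimal Python sketch of which summaries each scale permits. The flavour names, ratings, and prices are invented toy data, not from the lecture, and the rank mapping is an assumption made for illustration:

```python
from statistics import mean, median_low, mode

# Hypothetical toy data, invented for illustration only.
flavours = ["honey", "plain", "honey", "cocoa", "honey"]   # nominal
ratings = ["Good", "Fair", "Excellent", "Good", "Poor"]    # ordinal
prices = [3.50, 4.25, 3.75, 5.00, 4.00]                    # interval

# Nominal: only counting is meaningful, so the mode is the natural summary.
most_common_flavour = mode(flavours)

# Ordinal: map labels to ranks so they can be ordered. The ranks carry
# order only, so a median is defensible but a mean of ranks is not.
rank = {"Poor": 1, "Fair": 2, "Good": 3, "Excellent": 4}
label = {r: name for name, r in rank.items()}
# median_low always returns an actual data value, so it maps back to a label.
median_rating = label[median_low(rank[r] for r in ratings)]

# Interval: arithmetic is meaningful, so the mean is a valid summary.
mean_price = mean(prices)

print(most_common_flavour, median_rating, mean_price)
```

The point of the sketch is the choice of summary, not the numbers: mode for labels, median for ranks, mean for quantities.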

Descriptive Statistics
- Summarise important aspects of a dataset (centre, spread, shape, patterns).
- Tools include frequency tables, charts, and measures such as the mean $\bar{x}$, median, mode, and variance $s^2$.
- Focus for first chapters/weeks.
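The measures listed above can be sketched with Python's standard library. The ages below are hypothetical values chosen only to keep the arithmetic easy to check by hand:

```python
from collections import Counter
from statistics import mean, median, mode, variance

# Hypothetical ages, invented for illustration.
ages = [19, 21, 21, 23, 24, 26, 34]

x_bar = mean(ages)       # sample mean
med = median(ages)       # middle value of the sorted data
mo = mode(ages)          # most frequent value
s2 = variance(ages)      # sample variance s^2 (divides by n - 1)
freq = Counter(ages)     # frequency table: value -> count

print(x_bar, med, mo, round(s2, 2))
print(sorted(freq.items()))
```

Note that `statistics.variance` is the sample variance; `statistics.pvariance` would divide by $n$ instead and applies when the data are the whole population.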
Inferential Statistics
- Go beyond available data to draw conclusions about a population.
- Use probability theory, confidence intervals, hypothesis tests.
- Example hypothesis from lecture: “Average American Idol viewer age $=23$.” Test $H_0: \mu=23$ vs $H_A: \mu>23$ using a sample of $n=500$.

Population
- Entire set of items/individuals of interest (e.g. all California residents, all American Idol viewers).
Parameter
- Numerical characteristic of a population (e.g. true mean age $\mu$, true proportion $p$ supporting an issue).
Sample
- Subset of the population selected for analysis (e.g. 5,000 California residents surveyed).
Statistic
- Numerical measure computed from a sample (e.g. sample mean $\bar{x}$, sample proportion $\hat{p}$).
Relationship mnemonic:
- “Statistic is to Sample as Parameter is to Population.”
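A small simulation can make the parameter/statistic distinction concrete. This is only a sketch: the population is synthetic (uniform random ages between 18 and 80, an assumption for illustration), and in practice $\mu$ is unknown, which is why a sample is taken at all.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic population of 100,000 ages (hypothetical, not real data).
population = [random.randint(18, 80) for _ in range(100_000)]
mu = mean(population)        # parameter: computed from every member

# Sample of 500 drawn without replacement.
sample = random.sample(population, 500)
x_bar = mean(sample)         # statistic: computed from the sample only

print(f"parameter mu    = {mu:.2f}")
print(f"statistic x_bar = {x_bar:.2f}")
```

The statistic varies from sample to sample while the parameter stays fixed; inferential statistics quantifies how far apart the two are likely to be.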

Sample: 5,000 California residents (randomly selected).
Reported result: more than 55% have a positive view.
Because the researcher generalises this result to all Californians, this is inferential statistics.

Sample: 5,000 CSUF students; 80% are excited about statistics.
The statement is limited to the sample and makes no population claim, so this is descriptive statistics.

Population: All viewers of American Idol.
Variable: Age (interval/quantitative).
Sample: 500 viewers.
Statistic: Sample mean age $\bar{x}$.
Inference goal: Decide whether population mean age $\mu$ differs from $23$ (producer’s hypothesis: $\mu>23$).
Form of the hypothesis test (covered later in the course):
- $H_0: \mu=23$; $H_A: \mu \neq 23$ (or $\mu>23$).
- Decision at significance level $\alpha=0.05$ ("5% level of risk", i.e. a 5% chance of rejecting $H_0$ when it is true).
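As a preview, the one-sided version of this test can be sketched as below. Everything here is an assumption for illustration: the 500 ages are synthetic (drawn from a normal distribution with invented mean and spread), and the sketch uses a large-sample $z$ approximation, which may differ from the exact procedure taught later in the course.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(42)  # fixed seed so the sketch is reproducible

# Synthetic sample of n = 500 viewer ages (NOT real survey data).
n = 500
ages = [random.gauss(24, 8) for _ in range(n)]

mu_0 = 23      # value of mu under H_0
alpha = 0.05   # significance level

x_bar = mean(ages)   # sample mean
s = stdev(ages)      # sample standard deviation

# Test statistic: distance of x_bar from mu_0 in standard-error units.
z = (x_bar - mu_0) / (s / sqrt(n))

# Upper-tail p-value for H_A: mu > 23, using the standard normal.
p_value = 1 - NormalDist().cdf(z)
reject_h0 = p_value < alpha

print(f"x_bar = {x_bar:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H_0" if reject_h0 else "Do not reject H_0")
```

For the two-sided alternative $H_A: \mu \neq 23$, the p-value would instead be `2 * (1 - NormalDist().cdf(abs(z)))`.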

Main goal: Obtain a representative sample mirroring population characteristics.
Most common method: Simple Random Sampling (SRS), in which every unit in the population has an equal chance of selection (more precisely, every possible sample of size $n$ is equally likely to be chosen).
Pitfalls/ethical notes:
- Bias arises if the sampling frame is incomplete or if response rates differ across groups.
- Privacy and informed consent when gathering personal data.
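A rough sketch of simple random sampling and of the incomplete-frame pitfall. The population, the two groups "A" and "B", and the helper `share_of_a` are all hypothetical names invented for this illustration:

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 units: the first 3,000 belong to
# group "A", the remaining 7,000 to group "B".
population = [(i, "A" if i < 3000 else "B") for i in range(10_000)]

def share_of_a(units):
    """Proportion of units that belong to group A."""
    return sum(1 for _, group in units if group == "A") / len(units)

# SRS from a complete frame: random.sample draws without replacement,
# and every subset of size 500 is equally likely.
srs = random.sample(population, 500)

# Incomplete frame: group A is missing, so no sample drawn from it
# can represent the population, however random the draw.
incomplete_frame = [u for u in population if u[1] == "B"]
biased = random.sample(incomplete_frame, 500)

print("population share of A:       ", share_of_a(population))
print("SRS share of A:              ", share_of_a(srs))
print("incomplete-frame share of A: ", share_of_a(biased))
```

Randomness protects against bias only within the frame being sampled; it cannot repair a frame that leaves part of the population out.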

Memorise core definitions (population, parameter, sample, statistic, business analytics).
Be able to classify data scales (nominal, ordinal, interval).
Distinguish descriptive vs inferential tasks given a scenario.
Remember mnemonic: $\text{Statistic} \rightarrow \text{Sample},\ \text{Parameter} \rightarrow \text{Population}$.
Understand role of randomness in producing representative samples.
Recognise real-world relevance: advertising ROI, shelf placement, logistics optimisation.