Chapter 1 – Intro to Statistics & Data
Data, Statistics, and Business Analytics
- Data are “facts & figures”—numerical values or qualitative labels—that describe objects, people, events, or transactions.
- Modern reality: datasets are frequently huge (“Big Data”) and stored/processed on computers or in the cloud.
- Statistics is “a way to get information from data.” It involves:
- Collecting raw observations.
- Organising & cleaning them.
- Visualising & summarising them.
- Extracting insights that support decision-making.
- Practical motivation questions raised in lecture:
- Effect of advertising investment on subsequent sales.
- Relationship between shelf location and cereal sales.
- Delivery/logistics example: UPS tracks the weight, destination, and cost of every package, producing a massive operational dataset.
- Distinction between data and information:
- Data = raw, unprocessed input.
- Information = the “bigger picture” understanding produced after statistical processing.
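The step from raw data to information can be sketched in a few lines of Python. The package records below are invented for illustration (they are not UPS data), and dropping incomplete rows is only one possible cleaning rule, chosen here for simplicity.

```python
from statistics import mean

# Hypothetical raw package records: (weight in kg, destination); None marks a missing value.
raw_records = [
    (2.5, "CA"), (None, "NY"), (1.0, "CA"), (4.0, "TX"), (2.5, "NY"), (3.0, None),
]

def clean(records):
    """Drop any record with a missing field (one simple cleaning rule among many)."""
    return [r for r in records if None not in r]

def summarise(records):
    """Turn cleaned rows into information: count and mean weight per destination."""
    by_dest = {}
    for weight, dest in records:
        by_dest.setdefault(dest, []).append(weight)
    return {dest: (len(ws), mean(ws)) for dest, ws in sorted(by_dest.items())}

cleaned = clean(raw_records)   # data: raw, unprocessed rows
info = summarise(cleaned)      # information: the "bigger picture"
print(info)  # {'CA': (2, 1.75), 'NY': (1, 2.5), 'TX': (1, 4.0)}
```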
Business Analytics
- Definition (must memorise / underline):
- Business analytics is the scientific process of transforming data into insights for making better business decisions.
- Course emphasis: analytics for business and economic decision-making.
- Links to prior learning:
- Builds on statistics, computer science, domain knowledge.
- Ethical importance: data-driven decisions can affect pricing, job allocations, credit, etc.; requires responsible use.
Three Types of Data (Measurement Scales)
- Nominal
- Qualitative / categorical; pure labels or names.
- No arithmetic operations are meaningful (cannot add, multiply, etc.).
- Examples: gender, product ID, cereal flavour.
- Ordinal
- Still categorical but ordered / ranked.
- Example scale: Excellent > Good > Fair > Poor.
- Arithmetic still meaningless, yet order conveys preference/intensity.
- Interval (also called quantitative or numerical data; some texts split this category into separate interval and ratio scales)
- Numeric values with equal intervals between points; all arithmetic operations are valid.
- Can compute means, differences, ratios, etc.
- Examples: age, income, weight, price.
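The measurement scale decides which summary is meaningful: the mode for nominal data, the median (via ranks) for ordinal data, and the mean for interval data. A minimal sketch with Python's standard-library `statistics` module, assuming hypothetical flavours, ratings, and ages:

```python
from statistics import mean, median_low, mode

# Nominal: pure labels, so only counting-based summaries (e.g. the mode) make sense.
flavours = ["corn", "bran", "corn", "oat", "corn"]   # hypothetical cereal flavours
most_common_flavour = mode(flavours)

# Ordinal: categories have an order, so map them to ranks before finding a median.
RATING_ORDER = ["Poor", "Fair", "Good", "Excellent"]     # lowest to highest
ratings = ["Good", "Excellent", "Fair", "Good", "Poor"]  # hypothetical survey answers
ranks = [RATING_ORDER.index(r) for r in ratings]
# median_low always returns an actual rank, so the result maps back to a category.
median_rating = RATING_ORDER[median_low(ranks)]

# Interval: real numbers with equal spacing, so means and differences are valid.
ages = [25, 31, 19, 40]   # hypothetical ages in years
average_age = mean(ages)
age_range = max(ages) - min(ages)

print(most_common_flavour, median_rating, average_age, age_range)  # corn Good 28.75 21
```

Computing `mean(ranks)` would run without error, but the result has no clear meaning because the gaps between "Poor", "Fair", "Good", and "Excellent" are not known to be equal.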
Branches of Statistics
- Descriptive Statistics
- Summarise important aspects of a dataset (centre, spread, shape, patterns).
- Tools include frequency tables, charts, and measures such as the mean $\bar{x}$, median, mode, and variance $s^2$.
- Focus for first chapters/weeks.
- Inferential Statistics
- Go beyond available data to draw conclusions about a population.
- Use probability theory, confidence intervals, hypothesis tests.
- Example hypothesis from lecture: “The average American Idol viewer is 23 years old.” Test $H_0: \mu = 23$ vs. $H_A: \mu > 23$ using a sample of $n = 500$.
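The descriptive measures named above can be computed directly with Python's standard-library `statistics` module. The data values are hypothetical. Note that `variance` divides by $n - 1$ (the sample variance $s^2$), while `pvariance` divides by $n$ and is shown only for contrast.

```python
from statistics import mean, median, mode, pvariance, variance

data = [3, 5, 5, 7, 10]   # hypothetical sample values

x_bar = mean(data)        # centre: arithmetic mean
med = median(data)        # centre: middle value when sorted
most_frequent = mode(data)
s2 = variance(data)       # sample variance s^2, divides by n - 1
sigma2 = pvariance(data)  # population variance, divides by n (for contrast)

print(x_bar, med, most_frequent, s2, sigma2)  # 6 5 5 7 5.6
```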
Core Terminology
- Population
- Entire set of items/individuals of interest (e.g., all California residents, all American Idol viewers).
- Parameter
- Numerical characteristic of a population (e.g., the true mean age $\mu$, the true proportion $p$ supporting an issue).
- Sample
- Subset of the population selected for analysis (e.g., 5,000 California residents surveyed).
- Statistic
- Numerical measure computed from a sample (e.g., the sample mean $\bar{x}$, the sample proportion $\hat{p}$).
- Relationship mnemonic:
- “Statistic is to Sample as Parameter is to Population.”
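The mnemonic can be made concrete with a small simulation: the parameter $\mu$ is computed from the whole population, and the statistic $\bar{x}$ from a random sample of it. The population of ages is simulated (uniform between 15 and 60), not real viewer data.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population: ages of 100,000 viewers (simulated, not real data).
population = [random.randint(15, 60) for _ in range(100_000)]
mu = mean(population)        # parameter: describes the whole population

sample = random.sample(population, 500)  # simple random sample, n = 500
x_bar = mean(sample)         # statistic: computed from the sample only

print(f"parameter mu = {mu:.2f}, statistic x_bar = {x_bar:.2f}")
```

In practice $\mu$ is unknown; the simulation computes it only to show that $\bar{x}$ lands close to it.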
Example Analyses Discussed
Example 1: California Economy Sentiment
- Sample: 5,000 California residents (randomly selected).
- Reported result: more than 55% have a positive view.
- Because the researcher generalises from the sample to all Californians, this is inferential statistics.
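Generalising to all Californians means the sample proportion comes with uncertainty. A sketch of the usual normal-approximation 95% confidence interval $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$, assuming the reported figure is exactly $\hat{p} = 0.55$ (the lecture only says "more than 55%"):

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.55   # assumption: sample proportion taken as exactly 55%
n = 5_000      # sample size from Example 1

z = NormalDist().inv_cdf(0.975)             # about 1.96 for a 95% interval
margin = z * sqrt(p_hat * (1 - p_hat) / n)  # margin of error, about 0.014
lower, upper = p_hat - margin, p_hat + margin
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")  # 95% CI for p: (0.536, 0.564)
```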
Example 2: CSUF Students & Statistics Enthusiasm
- Sample: 5,000 CSUF students; 80% are excited about statistics.
- The statement is limited to the sample (no claim about the population), so this is descriptive statistics.
Example 3: American Idol Viewer Age
- Population: All viewers of American Idol.
- Variable: Age (interval/quantitative).
- Sample: 500 viewers.
- Statistic: sample mean age $\bar{x}$.
- Inference goal: decide whether the population mean age $\mu$ differs from 23 (the producer’s hypothesis is $\mu > 23$).
- Hypothesis test form (formalised later in the course):
- $H_0: \mu = 23$; $H_A: \mu \neq 23$ (or the one-sided $H_A: \mu > 23$).
- Decision at significance level $\alpha = 0.05$ (a "5% level of risk").
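A sketch of the one-sided test $H_0: \mu = 23$ vs. $H_A: \mu > 23$ at $\alpha = 0.05$. The 500 ages are simulated because the lecture gives no data. A t-test is the textbook choice when $\sigma$ is unknown; here the standard normal is used as a large-sample approximation (with $n = 500$ the two give nearly identical p-values, and Python's standard library has no t distribution).

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(42)
# Hypothetical sample of 500 viewer ages (simulated; no real data given in the lecture).
ages = [random.gauss(24.0, 6.0) for _ in range(500)]

mu_0 = 23.0    # value of mu under H0
alpha = 0.05   # significance level ("5% level of risk")

n = len(ages)
x_bar = mean(ages)
s = stdev(ages)                          # sample standard deviation (n - 1 divisor)
test_stat = (x_bar - mu_0) / (s / sqrt(n))
p_value = 1 - NormalDist().cdf(test_stat)  # one-sided: H_A is mu > 23

reject_h0 = p_value < alpha
print(f"x_bar={x_bar:.2f}, stat={test_stat:.2f}, p={p_value:.4f}, reject H0: {reject_h0}")
```

For the two-sided alternative $H_A: \mu \neq 23$, the p-value would instead be `2 * (1 - NormalDist().cdf(abs(test_stat)))`.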
Goals & Best Practices in Data Collection
- Main goal: Obtain a representative sample mirroring population characteristics.
- Most common method: Simple Random Sampling (SRS), in which every possible sample of a given size has the same chance of being selected, so every member of the population has an equal chance of inclusion.
- Pitfalls/ethical notes:
- Bias arises if the sampling frame is incomplete or if response rates differ across groups.
- Privacy and informed consent when gathering personal data.
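Simple random sampling and the incomplete-frame pitfall can both be sketched with Python's `random.sample`, which draws without replacement so that every subset of size `k` is equally likely. The student IDs and frame sizes are hypothetical.

```python
import random

# Hypothetical sampling frame: IDs for 40,000 students (stand-in for a real roster).
frame = [f"student_{i:05d}" for i in range(40_000)]

rng = random.Random(2024)         # seeded generator so the draw can be reproduced
srs = rng.sample(frame, k=5_000)  # without replacement; every 5,000-subset equally likely

# Pitfall: if the frame is incomplete (here, the last 10,000 IDs are missing),
# the excluded students have zero chance of selection, so the sample can be biased.
incomplete_frame = frame[:30_000]
biased_draw = rng.sample(incomplete_frame, k=5_000)
never_selectable = set(frame) - set(incomplete_frame)
```

Randomisation removes selection bias only within the frame; it cannot repair a frame that leaves part of the population out.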
Road-Map of the Course (as highlighted)
- Intro & foundational definitions (this lecture).
- Descriptive statistics: tables, charts, numerical summaries.
- Inferential statistics: estimation & hypothesis testing.
- Business applications / analytics cases.
Key Take-aways for Exams & Practice
- Memorise core definitions (population, parameter, sample, statistic, business analytics).
- Be able to classify data scales (nominal, ordinal, interval).
- Distinguish descriptive vs inferential tasks given a scenario.
- Remember mnemonic: Statistic→Sample, Parameter→Population.
- Understand role of randomness in producing representative samples.
- Recognise real-world relevance: advertising ROI, shelf placement, logistics optimisation.