Study Notes: Statistics (COM 508)

What is Statistics?

The term “statistics” can refer to numerical facts such as averages, medians, percentages, and maximums that help us understand business and economic situations.
Statistics can also refer to the art and science of collecting, analyzing, presenting, and interpreting data.
Statistics aims to explain the patterns and correlations that appear within a data set and to identify the factors that may drive cause-and-effect relationships.

Importance of Statistics

The use of statistics applies to almost every industry and occupation.
Professionals and individuals can use basic principles of statistics to improve understanding of variables and relationships present in businesses, finances, and the world.

Basic Concepts in Statistics

Data and Data Sets
- Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.
- All data collected in a study are referred to as the data set for the study.
Elements, Variables, and Observations (definitions)
- Elements: the entities on which data are collected.
- Variable: a characteristic of interest for the elements.
- Observation: the set of measurements obtained for a particular element.
- A data set with n elements contains n observations.
- The total number of data values in a complete data set is the number of elements multiplied by the number of variables: $\text{Total values} = n \times k$ where k is the number of variables.
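
As a rough illustration of these definitions, the sketch below stores a small data set as a list of records. The student labels, variable names, and values are hypothetical, invented only for this example.

```python
# Hypothetical data set: each dict is one observation,
# i.e. the set of measurements for one element (a student).
data_set = [
    {"student": "A", "age": 19, "year_level": 1, "gpa": 3.2},
    {"student": "B", "age": 21, "year_level": 3, "gpa": 2.9},
    {"student": "C", "age": 20, "year_level": 2, "gpa": 3.6},
]

# The "student" key identifies the element; the other keys are variables.
variables = [key for key in data_set[0] if key != "student"]

n = len(data_set)       # number of elements (and of observations)
k = len(variables)      # number of variables
total_values = n * k    # total data values in the complete data set

print(f"n = {n}, k = {k}, total values = {total_values}")
```

With three elements and three variables, the complete data set holds nine data values.
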
Population vs Sample
- Population: the collection of all the elements under consideration in a statistical study; the universe of the study; contains all elements from a data set; the measurable characteristic is called a PARAMETER.
- Sample: a subset of the population from which information is collected; contains part of the population; the measurable characteristic is called a STATISTIC.
- Inference is drawn on the population from the sample.
Parameters vs Statistics
- Parameters: a summary measure describing a population (e.g., $\mu, \sigma^2$ ).
- Statistics: a summary measure describing a sample (e.g., $\bar{X}, s^2$ ).
- Note: the mean values of two or more samples drawn from the same population will not necessarily be equal.
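
The note on sample means can be checked with a small simulation. This is only a sketch: the population is simulated, and its size, the sample size, and the seed are arbitrary assumptions rather than values from the course.

```python
import random
import statistics

rng = random.Random(508)  # fixed seed so the run is repeatable (arbitrary choice)

# Simulated population of 1,000 values; its mean plays the role of the parameter mu.
population = [rng.gauss(50, 10) for _ in range(1000)]
mu = statistics.mean(population)

# Two samples of size 30 from the same population, each drawn without replacement.
sample_a = rng.sample(population, 30)
sample_b = rng.sample(population, 30)

x_bar_a = statistics.mean(sample_a)  # statistic computed from sample A
x_bar_b = statistics.mean(sample_b)  # statistic computed from sample B

print(f"mu = {mu:.2f}, x_bar_a = {x_bar_a:.2f}, x_bar_b = {x_bar_b:.2f}")
```

The two sample means will usually differ from each other and from `mu`, which is the sampling variability the note describes.
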
Descriptive vs Inferential Statistics
- Descriptive statistics: summarize the data collected (e.g., averages, charts, measures of variability).
- Inferential statistics: draw conclusions or make predictions about population characteristics from sample information; includes estimation and hypothesis testing.
- Examples:
- Inferential: Estimate the population mean using the sample mean.
- Inferential: Test the claim that the population mean equals a specified value.
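
The contrast between the two branches can be sketched in a few lines. The sample values and the hypothesized mean `mu_0` are hypothetical, and the one-sample t statistic is an assumption about which test would apply, since these notes do not name a specific test.

```python
import math
import statistics

# Hypothetical sample of five measurements.
sample = [48, 52, 50, 47, 53]
n = len(sample)

# Descriptive: summarize the sample itself.
x_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)

# Inferential (estimation): use x_bar as a point estimate of the population mean.
mu_estimate = x_bar

# Inferential (testing): test statistic for the claim that the population mean is mu_0.
mu_0 = 49                         # hypothesized value, chosen for illustration
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

print(f"x_bar = {x_bar}, s = {s:.3f}, t = {t_stat:.3f}")
```

Turning the test statistic into a decision also requires a reference distribution and a significance level, which this sketch leaves out.
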
Steps in Statistical Inquiry
- Problem identification & hypothesis formulation
- Research design formulation
- Data collection
- Data coding
- Data processing and analysis
- Results interpretation / drawing conclusions

Data and Data Sets (Key Concepts)

Data are facts and figures; a data set is all the data collected in a study.
Elements, Variables, and Observations (revisited)
- An observation is the set of measurements obtained for a single element, one value for each variable.
Cross-check: data types (see later) and measurement scales.

Types of Data by Nature

Quantitative data (numerical): any attribute measured in numbers (e.g., height, weight).
Qualitative data (categorical): observations in the form of labeling of a characteristic (e.g., sex, color).

Types of Variables

Categorical/Qualitative Variable:
- Values represent categories (nominal, ordinal).
Numerical/Quantitative Variable:
- Values represent quantities and can be further categorized as:
- Discrete: numerical values arising from counting (e.g., number of children).
- Continuous: numerical responses arising from measurement (e.g., time, height).
Important note: Continuous data can be reported as discrete values (e.g., time in seconds), but are treated as continuous in analysis.

Continuous vs Discrete Data

Continuous data: can take any value within a continuum (finite or infinite interval).
- Examples: passing rate, percentage of retention, time, distance, speed.
Discrete data: arise from counting; take on whole, distinct values.
- Examples: number of students, tuition amount (rounded), number of products sold.

Measurement Scales (Primary Considerations in Selecting Techniques)

Categorical variables (qualitative):
- Nominal: two or more categories with no intrinsic ordering; categories are distinct, non-overlapping, and exhaustive.
- Ordinal: categories can be ranked/ordered; differences between categories are not necessarily equal.
Numerical/Quantitative variables:
- Interval: measurable; zero point is arbitrary (zero does not imply absence of the characteristic).
- Ratio: true zero point; zero indicates absence of the characteristic.
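
One way to see why the scale matters is to look at which summaries remain meaningful on each one. The sketch below follows a common textbook convention (mode for nominal, median for ordinal, differences for interval, ratios for ratio); all values are hypothetical.

```python
import statistics

# Nominal: categories only, so the mode is a meaningful summary.
majors = ["Marketing", "Finance", "Marketing", "Accounting"]
most_common_major = statistics.mode(majors)

# Ordinal: categories have an order, so the median is also meaningful.
year_levels = [1, 2, 2, 3, 4]
median_year = statistics.median(year_levels)

# Interval: differences are meaningful, but ratios are not (zero is arbitrary).
celsius = [10.0, 20.0]
difference_c = celsius[1] - celsius[0]
# 20 degrees C is not "twice as hot" as 10 degrees C:
# the ratio changes once the same readings are expressed in kelvin.
kelvin = [c + 273.15 for c in celsius]
ratio_celsius = celsius[1] / celsius[0]
ratio_kelvin = kelvin[1] / kelvin[0]

# Ratio: a true zero makes ratios meaningful.
revenue = [100.0, 200.0]
revenue_ratio = revenue[1] / revenue[0]

print(most_common_major, median_year, difference_c)
print(f"{ratio_celsius:.3f} vs {ratio_kelvin:.3f}; revenue ratio {revenue_ratio:.1f}")
```

The Celsius ratio of 2 drops to about 1.035 in kelvin, while the revenue ratio stays at 2 whatever currency unit is used.
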

Variable Types in Survey Contexts (Examples and Classification)

Example questions to determine variable and scale (note the typical classifications):
- Sex (Female/Male) — Nominal (categorical, no order).
- Age (as of last birthday) — Ratio (zero means no age).
- Current major of study — Nominal.
- Year level — Ordinal.
- GPA — Interval (0.0–4.0 scale; zero does not imply absence of academic ability).
- Student status (Regular/Irregular) — Nominal.
- Sibling studying in UST — Nominal (Yes/No).
- Number of family alumni — Ratio (count).
- Are you a registered voter? — Nominal.
- Perceived fear about COVID-19 (1–10) — Ordinal (Likert-type scale).
Another set of statements (relating to intrinsic rewards):
- Part II uses a frequency-based response scale for the intrinsic-rewards statements; such scales are typically treated as ordinal.
Measurement scale identifications (additional examples):
- a. Customer Satisfaction Ratings — Ordinal (rating scale).
- b. Department Names — Nominal.
- c. Temperature in Office Spaces — Interval.
- d. Job Applicant Priority — Ordinal.
- e. Product Categories — Nominal.
- f. Socio-Economic Status — Ordinal.
- g. Customer Feedback Scores — Ordinal.
- h. Total revenue earned — Ratio.
- i. Amount of Sales — Ratio.

Types and Classifications of Variables

Manifest Variable vs Latent Variable
- Manifest Variable: directly measurable with a single question; observed/observable.
- Latent Variable: cannot be measured directly; measured with several indicators; latent variable models use manifest variables to infer existence of latent constructs.
Exogenous vs Endogenous Variables
- Exogenous (Independent) Variable: causes fluctuations in other observable or latent variables; acts as the predictor (the cause).
- Endogenous (Dependent) Variable: influenced by exogenous variables; acts as the outcome or criterion (the effect).
Diagrammatic mapping (conceptual):
- Exogenous → Endogenous (Endogenous is influenced by Exogenous).
Example mappings (from the slides):
- Advertising spending → Sales revenue (exogenous influences endogenous).
- Emotional intelligence → Self-esteem (exogenous influences endogenous).
- Job-skill mismatch → Intent to leave (exogenous influences endogenous).
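
The first mapping can be sketched as a simple linear regression of the endogenous variable on the exogenous one. The figures are hypothetical and were constructed to lie exactly on a line, and ordinary least squares is an assumed choice of method, not one named in the slides.

```python
# Hypothetical figures, constructed to lie exactly on a line for readability.
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]   # exogenous (independent) variable
sales = [12.0, 14.0, 16.0, 18.0, 20.0]    # endogenous (dependent) variable

n = len(advertising)
mean_x = sum(advertising) / n
mean_y = sum(sales) / n

# Ordinary least squares with one predictor: slope = S_xy / S_xx.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(advertising, sales))
s_xx = sum((x - mean_x) ** 2 for x in advertising)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(f"sales = {intercept:.1f} + {slope:.1f} * advertising")
```

A fitted slope describes an association in the data; whether it reflects cause and effect depends on the study design.
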

Data Collection and Source Classification

Data Classification by Source
- Primary data: data collected directly by the researcher for the study at hand; collection methods include observation, survey, and experiment.
- Secondary data: data originally collected by someone else and obtained from existing sources (e.g., publications, reports).
Primary vs Secondary data examples
- Primary data: customer satisfaction surveys (direct data collection from customers).
- Secondary data: industry reports, market analyses from published sources.

Data Collection Methods and Data Types

Primary data collection methods: observation, survey, experiment.
Secondary data sources: print or electronic compilations, reports, government data, etc.

Cross-Sectional vs Time Series Data

Cross-Sectional Data: collected at roughly the same point in time across different subjects/items.
- Example: Per capita GDP, credit ratings for 60 WTO nations at the same time.
Time Series Data: collected over several time periods for the same subject.
- Example: U.S. average price per gallon of gasoline from 2012 to 2018.
Use cases:
- Cross-sectional: snapshot of many subjects at one time.
- Time series: trends and patterns over time; forecasting.
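
The difference shows up in how the data are laid out. In the sketch below the subjects, years, and values are hypothetical placeholders, not actual GDP or gasoline figures.

```python
# Cross-sectional: many subjects, one point in time (hypothetical values).
cross_section = {
    "Country A": 12_000,
    "Country B": 45_000,
    "Country C": 8_500,
}
highest = max(cross_section, key=cross_section.get)

# Time series: one subject, several periods in order (hypothetical values).
time_series = [
    (2016, 2.10),
    (2017, 2.40),
    (2018, 2.70),
]

# Period-over-period change, a simple first look at the trend.
changes = [
    (later_year, round(later - earlier, 2))
    for (_, earlier), (later_year, later) in zip(time_series, time_series[1:])
]

print(highest)
print(changes)
```

The cross-section supports comparisons across subjects at one time, while the time series supports questions about change from one period to the next.
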

3 Types of Analytics: Descriptive, Predictive, Prescriptive

Analytics: transforming data into insights for decision making.
Descriptive analytics: describes what happened in the past or present.
Predictive analytics: uses past data to predict future values and to identify how one variable influences another.
Prescriptive analytics: provides best-course-of-action recommendations to optimize outcomes.

Examples of Descriptive, Predictive, and Prescriptive Analysis

Descriptive Analysis
- Problem Situation: summarize total sales by region/product/time, identify trends, calculate averages; visualize trends.
- Actions: compute totals, means, standard deviations; create charts/graphs.
Predictive Analysis
- Problem Situation: forecast next quarter sales to plan production; identify drivers like seasonality and promotions.
- Actions: build models to forecast; quantify impact of predictors.
Prescriptive Analysis
- Problem Situation: optimize inventory to minimize costs while avoiding stockouts.
- Actions: use optimization algorithms to set reorder points, supplier mix, storage strategies.
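
The descriptive actions listed above (totals, means, standard deviations) can be sketched as follows. The sales records are hypothetical, and the sketch covers only the descriptive case, not the predictive or prescriptive ones.

```python
import statistics
from collections import defaultdict

# Hypothetical sales records: (region, amount).
sales = [
    ("North", 120.0),
    ("South", 80.0),
    ("North", 100.0),
    ("South", 60.0),
]

# Descriptive step 1: total sales by region.
totals = defaultdict(float)
for region, amount in sales:
    totals[region] += amount

# Descriptive step 2: overall mean and sample standard deviation.
amounts = [amount for _, amount in sales]
mean_sale = statistics.mean(amounts)
sd_sale = statistics.stdev(amounts)

print(dict(totals))
print(f"mean = {mean_sale:.2f}, sd = {sd_sale:.2f}")
```
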

Practical Applications and Formulas

Descriptive statistics often start with a data set and compute measures such as the mean.
- Sample mean: $\bar{x}$
- Population mean: $\mu$
- Sample variance: $s^2$
- Population variance: $\sigma^2$
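
For reference, the standard textbook definitions behind these four symbols are given below. Here $n$ is the sample size and $N$ is the population size; $N$ is notation introduced for this summary and does not appear elsewhere in these notes.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \mu = \frac{1}{N}\sum_{i=1}^{N} x_i
$$

$$
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
$$

The sample variance divides by $n - 1$ rather than $n$, while the population variance divides by $N$.
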
Relationship between population and sample summaries:
- Parameters describe the population; statistics describe the sample.
- Inference uses sample data to draw conclusions about population characteristics.
Data values in a complete data set: $\text{Total values} = n \times k$ where k is the number of variables.

Quick Reference (Key Definitions)

Population: All elements under study; universe of analysis.
Sample: Subset of the population used for analysis.
Parameter: A numerical summary of a population.
Statistic: A numerical summary of a sample.
Observation: The set of measurements obtained for a particular element, one value for each variable.
Element: The entity from which data are collected.
Variable: A characteristic of interest for the elements.

Connections to Prior and Real-World Relevance

Statistical thinking underpins decision making in business, marketing, operations, and economics.
Understanding types of data and measurement scales helps in selecting appropriate analyses and interpreting results.
Recognizing when to use descriptive vs inferential methods guides study design and interpretation of findings.
Time series and cross-sectional analyses support forecasting and market research, enabling better strategic planning.

Practical Considerations and Ethics (Implicit)

Ethical use of data involves collecting data responsibly, protecting privacy, and reporting results accurately.
When using sample statistics to draw conclusions about populations, acknowledge uncertainty and sampling variability.
The choice between measurement scales affects the type of analyses that are valid and the interpretations that can be made.