Study Notes: Statistics (COM 508)

What is Statistics?

  • The term “statistics” can refer to numerical facts such as averages, medians, percentages, and maximums that help us understand business and economic situations.

  • Statistics can also refer to the art and science of collecting, analyzing, presenting, and interpreting data.

  • Statistics aims to explain patterns and correlations within a data set and to help identify the factors behind cause-and-effect relationships.

Importance of Statistics

  • The use of statistics applies to almost every industry and occupation.

  • Professionals and individuals can use basic statistical principles to better understand the variables and relationships present in business, finance, and the wider world.

Basic Concepts in Statistics

  • Data and Data Sets

    • Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.

    • All data collected in a study are referred to as the data set for the study.

  • Elements, Variables, and Observations (definitions)

    • Elements: the entities on which data are collected.

    • Variable: a characteristic of interest for the elements.

    • Observation: the set of measurements obtained for a particular element.

    • A data set with n elements contains n observations.

    • The total number of data values in a complete data set is $\text{Total values} = n \times k$, where $n$ is the number of elements and $k$ is the number of variables.

  • Population vs Sample

    • Population: the collection of all elements under consideration in a statistical study (the universe of the study); a measurable characteristic of a population is called a PARAMETER.

    • Sample: a subset of the population from which information is collected; a measurable characteristic of a sample is called a STATISTIC.

    • Inferences about the population are drawn from the sample.

  • Parameters vs Statistics

    • Parameters: a summary measure describing a population (e.g., $\mu$, $\sigma^2$).

    • Statistics: a summary measure describing a sample (e.g., $\bar{X}$, $s^2$).

    • Note: the mean values of two or more samples drawn from the same population will not necessarily be equal.

  • Descriptive vs Inferential Statistics

    • Descriptive statistics: summarize the data collected (e.g., averages, charts, measures of variability).

    • Inferential statistics: draw conclusions or make predictions about population characteristics from sample information; includes estimation and hypothesis testing.

    • Examples:

      • Inferential: Estimate the population mean using the sample mean.

      • Inferential: Test the claim that the population mean equals a specified value.

  • Steps in Statistical Inquiry

    • Problem identification & hypothesis formulation

    • Research design formulation

    • Data collection

    • Data coding

    • Data processing and analysis

    • Results interpretation / drawing conclusions
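The two inferential examples listed under Descriptive vs Inferential Statistics (estimating the population mean, and testing a claim about it) can be sketched in Python. This is a minimal sketch. It assumes a hypothetical sample of 10 weekly sales figures, a claimed population mean of 50, a 5% significance level, and a two-sided t critical value of 2.262 for 9 degrees of freedom taken from a standard t table. None of these values come from the notes.

```python
import math
import statistics

# Hypothetical sample: weekly units sold at 10 randomly selected stores.
sample = [52, 48, 55, 50, 47, 53, 51, 49, 54, 51]
n = len(sample)

# Estimation: the sample mean (a statistic) estimates the population mean (a parameter).
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)   # sample standard deviation, n - 1 in the denominator
se = s / math.sqrt(n)          # standard error of the sample mean

# Assumed two-sided t critical value for alpha = 0.05 and n - 1 = 9 degrees of freedom.
T_CRIT = 2.262

# 95% confidence interval for the population mean.
ci_low, ci_high = x_bar - T_CRIT * se, x_bar + T_CRIT * se

# Hypothesis test of the claim H0: mu = 50 against H1: mu != 50 (one-sample t-test).
mu_0 = 50
t_stat = (x_bar - mu_0) / se
reject_h0 = abs(t_stat) > T_CRIT

print(f"Point estimate of mu: {x_bar:.2f}")
print(f"95% CI for mu: ({ci_low:.2f}, {ci_high:.2f})")
print(f"t = {t_stat:.3f}; reject H0: {reject_h0}")
```

With these numbers the sample mean is 51 and the t statistic is about 1.22. Because this is below 2.262, the sample does not provide enough evidence to reject the claim that the population mean is 50.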

Data and Data Sets (Key Concepts)

  • Data are facts and figures; a data set is all the data collected in a study.

  • Elements, Variables, and Observations (revisited)

    • An observation is the set of measurements for a single element across all variables; a single element's value on one variable is one data value.

  • Related topics: data types and measurement scales (covered below).
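The relationship between elements, variables, observations, and the total count of data values ($n \times k$) can be illustrated with a small data set in Python. The four students and their values below are hypothetical.

```python
# Hypothetical data set: each key is an element (a student); each value is that
# element's observation, i.e., its set of measurements on every variable.
data_set = {
    "Student A": {"year_level": 1, "gpa": 3.2, "age": 18},
    "Student B": {"year_level": 2, "gpa": 3.6, "age": 19},
    "Student C": {"year_level": 2, "gpa": 2.9, "age": 20},
    "Student D": {"year_level": 4, "gpa": 3.8, "age": 21},
}

variables = ["year_level", "gpa", "age"]
n = len(data_set)    # number of elements = number of observations
k = len(variables)   # number of variables

# Count every individual data value; in a complete data set this equals n * k.
total_values = sum(len(observation) for observation in data_set.values())

print(f"n = {n}, k = {k}, total data values = {total_values}")
print("Observation for Student C:", data_set["Student C"])
```

Here 4 elements and 3 variables give 4 observations and 12 data values.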

Types of Data by Nature

  • Quantitative data (numerical): any attribute measured in numbers (e.g., height, weight).

  • Qualitative data (categorical): observations that label or categorize a characteristic (e.g., sex, color).

Types of Variables

  • Categorical/Qualitative Variable:

    • Values represent categories (nominal, ordinal).

  • Numerical/Quantitative Variable:

    • Values represent quantities and can be further categorized as:

      • Discrete: numerical values arising from counting (e.g., number of children).

      • Continuous: numerical responses arising from measurement (e.g., time, height).

  • Important note: continuous data are often recorded as rounded values (e.g., time to the nearest second), but the variable is still treated as continuous in analysis.

Continuous vs Discrete Data

  • Continuous data: can take any value within a continuum (finite or infinite interval).

    • Examples: passing rate, percentage of retention, time, distance, speed.

  • Discrete data: arise from counting; take on whole, distinct values.

    • Examples: number of students, tuition amount (rounded), number of products sold.

Measurement Scales (Primary Considerations in Selecting Techniques)

  • Categorical variables (qualitative):

    • Nominal: two or more categories with no intrinsic ordering; categories are distinct, non-overlapping, and exhaustive.

    • Ordinal: categories can be ranked/ordered; differences between categories are not necessarily equal.

  • Numerical/Quantitative variables:

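    • Interval: equal differences between values are meaningful, but the zero point is arbitrary (zero does not imply absence of the characteristic), so ratios are not meaningful.

    • Ratio: has a true zero point, where zero indicates absence of the characteristic; both differences and ratios are meaningful.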
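The short Python sketch below shows why the interval/ratio distinction matters: ratios are meaningful only when zero is a true zero. The temperatures, revenue figures, and year levels are hypothetical. The table of permissible summaries follows a common textbook convention and is not stated in these notes.

```python
import statistics

# Common convention (assumed): summaries that are meaningful at each scale.
PERMISSIBLE_SUMMARIES = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean", "differences"],
    "ratio": ["mode", "median", "mean", "differences", "ratios"],
}


def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15


# Interval: 20 C is not "twice as hot" as 10 C, because 0 C is an arbitrary zero.
naive_ratio = 20 / 10
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Ratio: revenue has a true zero, so 200,000 really is twice 100,000.
revenue_ratio = 200_000 / 100_000

# Ordinal: year levels can be ranked, so the median is meaningful.
year_levels = [1, 2, 2, 3, 4, 4, 4]
median_year = statistics.median(year_levels)

print(f"Naive Celsius ratio: {naive_ratio:.2f}, Kelvin ratio: {kelvin_ratio:.3f}")
print(f"Revenue ratio: {revenue_ratio:.1f}")
print(f"Median year level: {median_year}")
print("Permissible for ordinal data:", PERMISSIBLE_SUMMARIES["ordinal"])
```

The naive Celsius ratio is 2.00. On the Kelvin scale, which has a true zero, the ratio is only about 1.035.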

Variable Types in Survey Contexts (Examples and Classification)

  • Example questions to determine variable and scale (note the typical classifications):

    • Sex (Female/Male) — Nominal (categorical, no order).

    • Age (as of last birthday) — Ratio (zero means no age).

    • Current major of study — Nominal.

    • Year level — Ordinal.

    • GPA — Interval (0.0–4.0 scale; zero does not imply absence of academic ability).

    • Student status (Regular/Irregular) — Nominal.

    • Sibling studying in UST — Nominal (Yes/No).

    • Number of family alumni — Ratio (count).

    • Are you a registered voter? — Nominal.

    • Perceived fear about COVID-19 (1–10) — Ordinal (Likert-type scale).

  • Another set of statements (relating to intrinsic rewards):

    • Part II uses a frequency-based response scale for intrinsic rewards; frequency-based scales like this are ordinal.

  • Measurement scale identifications (additional examples):

    • a. Customer Satisfaction Ratings — Ordinal (rating scale).

    • b. Department Names — Nominal.

    • c. Temperature in Office Spaces — Interval.

    • d. Job Applicant Priority — Ordinal.

    • e. Product Categories — Nominal.

    • f. Socio-Economic Status — Ordinal.

    • g. Customer Feedback Scores — Ordinal.

    • h. Total revenue earned — Ratio.

    • i. Amount of Sales — Ratio.

Types and Classifications of Variables

  • Manifest Variable vs Latent Variable

    • Manifest Variable: directly measurable with a single question; observed/observable.

    • Latent Variable: cannot be measured directly and is instead measured through several indicators; latent variable models use manifest variables to infer the existence of latent constructs.

  • Exogenous vs Endogenous Variables

    • Exogenous (Independent) Variable: causes fluctuations in other observed or latent variables; also called the predictor, its role is the cause.

    • Endogenous (Dependent) Variable: influenced by exogenous variables; also called the outcome or criterion, its role is the effect.

  • Diagrammatic mapping (conceptual):

    • Exogenous → Endogenous (Endogenous is influenced by Exogenous).

  • Example mappings (from the slides):

    • Advertising spending → Sales revenue (exogenous influences endogenous).

    • Emotional intelligence → Self-esteem (exogenous influences endogenous).

    • Job-skill mismatch → Intent to leave (exogenous influences endogenous).

Data Collection and Source Classification

  • Data Classification by Source

    • Primary data: data collected first-hand by the researcher; collection methods include observation, surveys, and experiments.

    • Secondary data: data collected from other sources (e.g., publications, reports).

  • Primary vs Secondary data examples

    • Primary data: customer satisfaction surveys (direct data collection from customers).

    • Secondary data: industry reports, market analyses from published sources.

Data Collection Methods and Data Types

  • Primary data collection methods: observation, survey, experiment.

  • Secondary data sources: print or electronic compilations, reports, government data, etc.

Cross-Sectional vs Time Series Data

  • Cross-Sectional Data: collected at roughly the same point in time across different subjects/items.

    • Example: per capita GDP and credit ratings for 60 WTO member nations at the same point in time.

  • Time Series Data: collected over several time periods for the same subject.

    • Example: U.S. average price per gallon of gasoline from 2012 to 2018.

  • Use cases:

    • Cross-sectional: snapshot of many subjects at one time.

    • Time series: trends and patterns over time; forecasting.
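The structural difference between the two kinds of data can be sketched in Python. Cross-sectional data are keyed by subject at one point in time, while time series data are keyed by time period for one subject. All numbers below are hypothetical placeholders, not actual GDP or gasoline-price figures.

```python
# Cross-sectional: many subjects (countries) observed at the same point in time.
per_capita_gdp = {"Country A": 12_500, "Country B": 48_000, "Country C": 7_300}

# Time series: one subject (an average price per gallon) observed over several years.
avg_gas_price = {2014: 3.00, 2015: 2.50, 2016: 2.20, 2017: 2.40, 2018: 2.80}

# Cross-sectional question: how do the subjects compare at one moment?
highest = max(per_capita_gdp, key=per_capita_gdp.get)

# Time series question: how does the subject change from one period to the next?
years = sorted(avg_gas_price)
year_over_year = {
    year: round(avg_gas_price[year] - avg_gas_price[prev], 2)
    for prev, year in zip(years, years[1:])
}

print("Highest per capita GDP:", highest)
print("Year-over-year change in price:", year_over_year)
```

The cross-sectional snapshot supports comparisons between subjects. The time series supports the trend and forecasting questions described above.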

3 Types of Analytics: Descriptive, Predictive, Prescriptive

  • Analytics: transforming data into insights for decision making.

  • Descriptive analytics: describes what happened in the past or present.

  • Predictive analytics: uses past data to predict future values and to identify which factors influence a given variable.

  • Prescriptive analytics: provides best-course-of-action recommendations to optimize outcomes.

Examples of Descriptive, Predictive, and Prescriptive Analysis

  • Descriptive Analysis

    • Problem Situation: summarize total sales by region/product/time, identify trends, calculate averages; visualize trends.

    • Actions: compute totals, means, standard deviations; create charts/graphs.

  • Predictive Analysis

    • Problem Situation: forecast next quarter sales to plan production; identify drivers like seasonality and promotions.

    • Actions: build models to forecast; quantify impact of predictors.

  • Prescriptive Analysis

    • Problem Situation: optimize inventory to minimize costs while avoiding stockouts.

    • Actions: use optimization algorithms to set reorder points, supplier mix, storage strategies.
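The three analysis types can be chained on one small sales-and-inventory problem. The Python sketch below is illustrative only. The quarterly sales, the choice of a least-squares linear trend as the forecasting model, the ±10% demand scenarios, and the per-unit holding and stockout costs are all assumptions made for this example. Real prescriptive work would typically use a proper optimization model rather than a search over a few candidates.

```python
import statistics

# Hypothetical quarterly unit sales for the last eight quarters.
sales = [120, 132, 128, 140, 151, 147, 160, 168]

# Descriptive: summarize what happened.
mean_sales = statistics.mean(sales)
sd_sales = statistics.stdev(sales)

# Predictive: fit a least-squares straight-line trend and forecast quarter 9.
quarters = list(range(1, len(sales) + 1))
q_bar = statistics.mean(quarters)
slope = (
    sum((q - q_bar) * (y - mean_sales) for q, y in zip(quarters, sales))
    / sum((q - q_bar) ** 2 for q in quarters)
)
intercept = mean_sales - slope * q_bar
forecast = intercept + slope * (len(sales) + 1)

# Prescriptive: pick the order quantity with the lowest expected cost.
HOLDING_COST = 2.0   # assumed cost per unsold unit
STOCKOUT_COST = 5.0  # assumed cost per unit of unmet demand
scenarios = [0.9 * forecast, forecast, 1.1 * forecast]  # assumed, equally likely


def expected_cost(order_qty: float) -> float:
    """Average holding plus stockout cost over the demand scenarios."""
    costs = [
        HOLDING_COST * max(order_qty - demand, 0)
        + STOCKOUT_COST * max(demand - order_qty, 0)
        for demand in scenarios
    ]
    return sum(costs) / len(costs)


candidates = range(140, 201, 5)  # order quantities to compare
best_qty = min(candidates, key=expected_cost)

print(f"Descriptive: mean = {mean_sales:.1f}, sd = {sd_sales:.1f}")
print(f"Predictive: next-quarter forecast = {forecast:.1f}")
print(f"Prescriptive: order {best_qty} units")
```

With these inputs the forecast is about 172.4 units, and the lowest-cost order among the candidates is 190 units. Stockouts are assumed to cost more per unit than holding stock, so the recommendation sits above the forecast.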

Practical Applications and Formulas

  • Descriptive statistics often start with a data set and compute measures such as the mean.

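    • Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

    • Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$

    • Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$

    • Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$

    • Here $n$ is the sample size and $N$ is the population size.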

  • Relationship between population and sample summaries:

    • Parameters describe the population; statistics describe the sample.

    • Inference uses sample data to draw conclusions about population characteristics.

  • Data values in a complete data set: $\text{Total values} = n \times k$, where $n$ is the number of elements and $k$ is the number of variables.
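The sample and population means and variances listed above can be computed with Python's standard `statistics` module. `variance` divides the squared deviations by $n - 1$ (sample variance), and `pvariance` divides by $N$ (population variance). The data values are hypothetical. Treating the same list first as a sample and then as a whole population shows how the two variance formulas differ.

```python
import statistics

values = [4, 8, 6, 5, 7]  # hypothetical data values

# Sample mean x_bar: sum of the values divided by n.
x_bar = statistics.mean(values)

# Sample variance s^2: squared deviations from the mean divided by n - 1.
s2 = statistics.variance(values)

# If the same values were the whole population, mu uses the same averaging
# formula (divided by N), but sigma^2 divides the squared deviations by N.
mu = statistics.mean(values)
sigma2 = statistics.pvariance(values)

# Manual check of both variance formulas.
squared_devs = [(x - x_bar) ** 2 for x in values]
manual_s2 = sum(squared_devs) / (len(values) - 1)
manual_sigma2 = sum(squared_devs) / len(values)

print(f"x_bar = {x_bar}, s^2 = {s2}, mu = {mu}, sigma^2 = {sigma2}")
print(f"manual s^2 = {manual_s2}, manual sigma^2 = {manual_sigma2}")
```

The squared deviations sum to 10. This gives $s^2 = 10/4 = 2.5$ but $\sigma^2 = 10/5 = 2$.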

Quick Reference (Key Definitions)

  • Population: All elements under study; universe of analysis.

  • Sample: Subset of the population used for analysis.

  • Parameter: A numerical summary of a population.

  • Statistic: A numerical summary of a sample.

  • Observation: The set of measurements obtained for a particular element across all variables.

  • Element: The entity from which data are collected.

  • Variable: A characteristic of interest for the elements.

Connections to Prior and Real-World Relevance

  • Statistical thinking underpins decision making in business, marketing, operations, and economics.

  • Understanding types of data and measurement scales helps in selecting appropriate analyses and interpreting results.

  • Recognizing when to use descriptive vs inferential methods guides study design and interpretation of findings.

  • Time series and cross-sectional analyses support forecasting and market research, enabling better strategic planning.

Practical Considerations and Ethics (Implicit)

  • Ethical use of data involves collecting data responsibly, protecting privacy, and reporting results accurately.

  • When using sample statistics to draw conclusions about populations, acknowledge uncertainty and sampling variability.

  • The measurement scale of a variable determines which analyses are valid and which interpretations can be made.
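Sampling variability means that the means of different samples drawn from the same population need not be equal. A small Python simulation demonstrates this. The population of the integers 1 to 100, the sample size of 10, and the fixed random seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(508)                    # fixed seed so the run is reproducible
population = list(range(1, 101))    # hypothetical population of N = 100 values
mu = statistics.mean(population)    # population parameter: 50.5

# Draw five samples of size n = 10 (each drawn without replacement).
sample_means = [statistics.mean(random.sample(population, 10)) for _ in range(5)]

for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i}: mean = {m:.1f} (population mean = {mu})")
```

Each sample mean is a statistic that estimates the same parameter $\mu$. The sample means scatter around it rather than matching it exactly, which is why conclusions drawn from samples should acknowledge uncertainty.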