Statistics Notes

Introduction to Statistics

  • Studying statistics requires maintaining context to avoid losing sight of the bigger picture.

Everyday Use of Statistics

  • Assessing student performance in exams.
  • Comparing population growth rates between cities.
  • Determining the safest means of transportation.
  • Determining whether the spread of a disease is an epidemic.
  • Determining insurance premiums based on driving records.
  • Predicting election results using opinion polls.
  • Measuring customer satisfaction to improve product quality.
  • Determining staffing needs for supplying goods and services.
  • Identifying peak times for customer service demand.
  • Finding the usual wait time to see a doctor.
  • Identifying hotspots of criminal activity.
  • Forecasting weather for the next few days.
  • Tracking trends in stock market or revenue performance.
  • Determining the number of operators needed at various times to handle calls.
  • Analyzing player performance in hockey and basketball games.

Top 10 Uses of Statistics

  1. Predictions: Statistics helps in making predictions about future events.
  2. Forecasts: Used in weather forecasting and governmental planning.
  3. Preparedness: Helpful in emergency preparedness.
  4. Testing: Important for quality testing in various areas.
  5. Political: Crucial in political campaigns.
  6. Medical: Plays an important role in the medical field.
  7. Financial: The financial market relies heavily on statistics.
  8. Insurance: Used extensively in the insurance industry to determine premiums and assess risk.
  9. Consumer: Widely used in the consumer goods industry.
  10. Sports: Used to analyze and improve player and team performance.

Statistical Thinking

  • Humans constantly observe and process information from their surroundings.
  • People engage in statistical thinking, knowingly or unknowingly, in daily activities.
  • Statistical awareness is an important part of daily life.
  • Observed events tend to follow patterns or regularities.
  • Investigating these patterns helps in understanding their effects.
  • Data collection is needed to investigate patterns through measurement, counting, or quantification.
  • Measurements enable data analysis to identify connections, linkages, similarities, and differences.
  • Understanding data allows for the application and generalization of knowledge to a wider context.
  • Strive to avoid errors in generalization by collecting more data for higher confidence.
  • Limited data reduces confidence in inferences, while large amounts of data increase confidence.
  • 100% certainty in generalizing population characteristics is unattainable.
  • Weigh chances or probabilities to assess how accurate a generalization is likely to be; this determines the degree of confidence in it.
  • Statistical thinking summarizes experiences to understand essential features.
  • Use summaries to estimate or predict future outcomes.
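
The claim that more data increases confidence can be checked with a small simulation. This is a sketch under stated assumptions: the population is a hypothetical list of the integers 1 to 10,000, and `spread_of_sample_means` is a helper name introduced here for illustration. It draws many random samples of a given size and measures how much their means vary; a smaller spread means a single sample is a more dependable guide to the population.

```python
import random
from statistics import fmean, stdev


def spread_of_sample_means(population, n, trials=500, seed=0):
    """Standard deviation of the means of `trials` random samples of size n.

    A smaller value means sample means cluster more tightly around the
    population mean, i.e. an inference from one sample is more reliable.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    means = [fmean(rng.sample(population, n)) for _ in range(trials)]
    return stdev(means)


# Hypothetical population: the integers 1..10,000
population = list(range(1, 10_001))

for n in (10, 100, 1000):
    spread = spread_of_sample_means(population, n)
    print(f"n = {n:>4}: spread of sample means = {spread:.1f}")
```

The spread shrinks roughly in proportion to 1/√n, so ten times more data gives a bit more than three times less variation in the estimate.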

Statistical Thinking - Time Value

  • Statistics helps explore the nature of variations over time.
  • Describe patterns of change.
  • Understand factors responsible for variations.
  • Predict possible future outcomes.
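
Describing a pattern of change usually starts by smoothing out random variation. One common technique, chosen here for illustration rather than taken from the notes, is a moving average. The sketch below assumes hypothetical monthly call volumes; `moving_average` and `describe_change` are illustrative helper names.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values to smooth random variation."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def describe_change(smoothed):
    """Label the overall direction of a smoothed series."""
    change = smoothed[-1] - smoothed[0]
    if change > 0:
        return "increasing"
    if change < 0:
        return "decreasing"
    return "flat"


# Hypothetical monthly call volumes with noisy month-to-month swings
calls = [120, 135, 118, 140, 150, 138, 160, 172, 158]
smoothed = moving_average(calls, 3)
print(smoothed)
print(describe_change(smoothed))
```

The raw series goes up and down from month to month, while the smoothed series rises steadily, which makes the underlying trend easier to see and to extend into a prediction.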

Statistical Thinking - Key Processes

  • Variation: Observe random variations of events.
  • Pattern: Explore regularities in variations.
  • Trend: Summarize variations to detect trends.
  • Connect: Link variations to causes.
  • Generalize: Draw reasonable conclusions and apply knowledge.
  • Probability: Assign probability to measure confidence in conclusions.

What is Statistics?

  • Statistics is the science of collecting, organizing, analyzing, presenting, and interpreting data to make informed decisions.
  • It generates summary information and inferences about processes.
  • Uses facts and figures to measure, assess, analyze, manage, and improve decisions.
  • The goal is to base decisions on credible, data-driven evidence; this approach is known as evidence-based management.
  • Statistics:
    • Generates knowledge and useful intelligence.
    • Monitors business processes.
    • Manages situations or phenomena.
    • Evaluates and assesses options.
    • Mitigates risks or adverse effects.
    • Improves business operations.
    • Predicts outcomes and forecasts future trends.

Statistics as a Science

  • Statistical analysis follows a scientific cycle.
    • Formulate a hypothesis.
    • Collect sample data.
    • Analyze the data.
    • Interpret the results to draw conclusions about the hypothesis.
  • Statistics is a science because it provides proven methods for systematic examination.
  • Its methods are based on mathematical theorems proven through logic.

Scientific Cycle

  • Theory/Hypothesis: Explores the relationship between events or phenomena to understand their connections.
  • Data Collection: Involves designing effective ways to obtain data from the field to represent the entire population.
  • Data Analysis: Uses statistical methods to describe and analyze data, drawing conclusions about the validity of the hypothesis.
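
The cycle can be walked through end to end with a small example. This sketch assumes a hypothesis that a coin is fair, hypothetical data of 60 heads in 100 flips, a conventional 5% significance level, and a one-proportion z-test as the analysis method (one of several tests that could be used); `one_proportion_z_test` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: true proportion == p0 (normal approximation)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error assuming H0 is true
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# 1. Hypothesis: the coin is fair (p0 = 0.5)
# 2. Data collection: hypothetical result of 100 flips, 60 heads
heads, flips = 60, 100
# 3. Analysis
z, p_value = one_proportion_z_test(heads, flips, p0=0.5)
# 4. Interpretation at a 5% significance level
alpha = 0.05
conclusion = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}: {conclusion}")
```

Here z is 2.0 and the p-value is about 0.046, so at the 5% level the data count as evidence against the fair-coin hypothesis; a stricter level such as 1% would lead to the opposite conclusion.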

Why Study Statistics?

  • To Conduct Research: Use statistical methods to produce scientific research, management reports, and academic journal articles, contributing to scientific knowledge.
  • To Intelligently Read and Interpret Reports: Acquire numerical literacy skills to critically evaluate research articles, academic journals, and reports for sound decision-making.
  • To Further Develop Critical and Analytical Thinking Skills: Strengthen critical thinking and logical reasoning to better understand issues.
  • To Become an Informed Consumer of Information: Critically evaluate information to avoid being misled by misuse of statistics.

Data to Decision

  • Statistics is about finding meaning behind numbers.
  • Data: Isolated, raw, and unprocessed facts and figures without context (descriptive).
  • Information: Organized, structured data providing meaning by exploring relationships (diagnostic).
  • Knowledge: Learning derived from information through logical reasoning, revealing trends and patterns (predictive).
  • Wisdom: Insights and actions obtained from knowledge to guide the best course of action for solving complex issues (prescriptive).
  • A statistical process generally requires evidence to support an argument, obtained through data gathering and statistical analysis.
  • Statistical analysis helps discover facts, information, knowledge, and critical insights for informed decisions.

Data to Decisions Breakdown

  1. Data:
    • Raw signals that, on their own, convey no understanding.
    • What? Reveals relationships.
  2. Information:
    • Useful, organized, structured, given context.
    • Why? Reveals patterns.
  3. Knowledge:
    • Contextual learning, given meaning.
    • What is Best? Reveals principles.
  4. Wisdom:
    • Understanding, given insight, actionable.
    • What Action? Reveals direction.
  5. Decisions:
    • Change, movement, given purpose.
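
The data-to-decision chain can be illustrated as a tiny pipeline. This is a sketch under assumptions: the wait times are hypothetical, the function names are illustrative, and the wisdom and decision steps are collapsed into one rule with an assumed 30-minute target.

```python
from collections import defaultdict
from statistics import fmean

# Data: raw, context-free records (hypothetical clinic wait times in minutes)
raw_records = [("Mon", 35), ("Mon", 40), ("Tue", 20),
               ("Tue", 25), ("Wed", 22), ("Wed", 18)]


def to_information(records):
    """Information: organize raw records into the average wait per day."""
    by_day = defaultdict(list)
    for day, minutes in records:
        by_day[day].append(minutes)
    return {day: fmean(values) for day, values in by_day.items()}


def to_knowledge(avg_by_day):
    """Knowledge: identify the pattern, here the day with the longest wait."""
    return max(avg_by_day, key=avg_by_day.get)


def to_decision(avg_by_day, target_minutes=30):
    """Decision: flag days whose average wait exceeds an assumed target."""
    return [day for day, avg in avg_by_day.items() if avg > target_minutes]


info = to_information(raw_records)
print(info)                # average wait per day
print(to_knowledge(info))  # busiest day
print(to_decision(info))   # days that may need extra staff
```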

Applied Statistical Research

  1. Exploratory Research:
    • Used to find out if a particular issue is becoming a growing concern in a community.
    • Uses facts and figures to identify and explore a problem.
    • Examples:
      • Is teenage pregnancy a growing societal problem?
      • Is school bullying a serious social problem?
      • Is mental health a serious issue facing the youth?
      • Is a neighborhood becoming a crime hotspot?
      • Are the police engaging in racial profiling?
  2. Descriptive Research:
    • Used to describe a prevailing issue of interest.
    • Provides a method of measuring and describing phenomena.
    • Examples:
      • Monitor the teenage pregnancy rate for the past 5 years.
      • How many students have become victims of school bullying in the past 5 years?
      • What is the statistical trend in mental health cases among the youth population?
      • Track the crime rates of the neighborhood for the past 5 years.
      • What is the racial breakdown of police stops?
  3. Explanatory Research:
    • Used to identify causes and effects of phenomena, to explain and predict how one phenomenon will change in response to a change in another.
    • Examples:
      • What factors are related to teenage pregnancy?
      • Does poor parenting lead to school bullying?
      • What factors are contributing to rising mental health problems among the youth?
      • Are gang activities a growing contributor to crime in the neighborhood?
      • Is a lack of diversity affecting the police's relationship with the community?
  4. Evaluation Research:
    • Statistics is used to assess the effectiveness of policies and programs, either by comparing outcomes against a baseline or by comparing outcomes after an intervention with outcomes before it.
    • Examples:
      • Do after-school programs reduce teenage delinquency?
      • Did red-light cameras reduce red-light traffic violations?
      • How effective are early intervention programs for the youth population?
      • Did placing police on school campuses reduce school violence?
      • Has community policing improved the police's relationship with the community?
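
The simplest form of evaluation research compares an outcome before and after a program. The sketch below assumes hypothetical monthly red-light violation counts at one intersection; `percent_change` is an illustrative name.

```python
from statistics import fmean


def percent_change(before, after):
    """Percent change in the average outcome from the pre- to post-program period."""
    baseline = fmean(before)
    return (fmean(after) - baseline) / baseline * 100


# Hypothetical monthly red-light violations at one intersection
before_cameras = [120, 110, 130]
after_cameras = [90, 95, 85]
print(f"{percent_change(before_cameras, after_cameras):.1f}%")
```

A before/after comparison alone cannot rule out other explanations, such as a citywide drop in violations over the same period; stronger evaluations also compare against similar sites without the program.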

Sampling and Inference

  • Population: The complete collection of all objects or persons of interest; a numerical measure of a population characteristic is called a parameter.
  • Sample: A subset of data drawn from a population for study; a numerical measure of a sample characteristic is called a statistic.
  • Sampling: Process of collecting data from the field to represent an entire population.
    • Needed because investigating every member is impractical, expensive, and time-consuming.
  • Inference: Process of making decisions or conclusions about the population based on sample evidence.
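
The difference between a parameter and a statistic can be shown with a simulated population. This is a sketch under assumptions: the population of 10,000 ages is randomly generated and hypothetical, and a simple random sample of 100 is used.

```python
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed for reproducibility

# Hypothetical population: ages of 10,000 residents
population = [rng.randint(18, 80) for _ in range(10_000)]
parameter = fmean(population)         # population mean (a parameter)

sample = rng.sample(population, 100)  # simple random sample of 100 people
statistic = fmean(sample)             # sample mean (a statistic)

print(f"parameter = {parameter:.2f}, statistic = {statistic:.2f}")
```

In practice the parameter is unknown; the point of inference is to use the statistic, which can be computed, to say something about the parameter, which cannot.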

Types of Statistics: Descriptive vs. Inferential

  • Descriptive Statistics:
    • Used to describe, organize, and summarize the data at hand, such as information collected about an entire population.
    • Example: 90% satisfaction of all customers.
  • Inferential Statistics:
    • Used to generalize about a population based on a sample of data.
    • Example: 90% satisfaction of a sample of 50 customers --> 90% satisfaction of all customers.
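
Generalizing from 50 customers to all customers should carry a margin of error. This sketch assumes 45 of the 50 sampled customers were satisfied (90%) and uses the normal-approximation (Wald) confidence interval, one common choice; `proportion_ci` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Wald (normal-approximation) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - margin, p_hat + margin)


# 45 of 50 sampled customers satisfied -> 90% in the sample
p_hat, (low, high) = proportion_ci(45, 50)
print(f"sample: {p_hat:.0%}, 95% CI for all customers: {low:.1%} to {high:.1%}")
```

The interval runs from roughly 81.7% to 98.3%, so the inferential statement is "about 90%, give or take 8 points", not exactly 90%. With only 5 dissatisfied customers in the sample the Wald interval is rough; intervals such as the Wilson interval behave better for small samples or proportions near 0 or 1.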

Descriptive Statistics

  • Methods used to summarize, describe, and present data in a meaningful way.
  • Includes tabular, graphical, and numerical descriptions.
  • Organize data using tables (e.g., frequency table), graphs (pie chart and bar graph), and calculate quantities (e.g., mean, median, and mode).
  • Effective way of describing the characteristics of a sample or observed data.
  • Does not allow making conclusions beyond the analyzed data; uses a deductive (top-down) approach.
  • Statistical information in newspapers, magazines, and company reports is typically summarized and presented in this format.
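
The tabular and numerical descriptions mentioned above can be produced with Python's standard library. This sketch assumes a hypothetical set of eight exam scores; it builds a frequency table and computes the mean, median, and mode.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical exam scores for eight students
scores = [72, 85, 85, 90, 64, 85, 78, 90]

# Tabular description: frequency table, highest score first
frequency = Counter(scores)
for score in sorted(frequency, reverse=True):
    print(f"{score:>3} | {frequency[score]}")

# Numerical description: measures of central tendency
print("mean:  ", mean(scores))
print("median:", median(scores))
print("mode:  ", mode(scores))
```

These numbers describe only the eight scores analyzed; they say nothing about students outside this group.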

Inferential Statistics

  • Since studying an entire population is often impossible, data is collected from a sample.
  • Sampling involves collecting sample data from the field to represent a population, which is often large, expensive, and time-consuming to study.
  • After analyzing sample data, results are inferred back to the population for estimates, predictions, or conclusions about population characteristics.
  • Statistical inference is the process of making decisions or conclusions about the population based on sample evidence.
  • Researchers generalize from the sample to make conclusions beyond the sample data.
  • Examples include hypothesis testing, probability, interval estimation, and regression; uses an inductive (bottom-up) approach.

Example of Inferential Statistics

  • Estimating the average income of residents in Toronto.
  • Studying the income of the entire Toronto population (about 2.5 million) is time-consuming and expensive.
  • A survey of 200 residents shows an average income of $43,000.
  • Inference: The average income of all Toronto residents is estimated to be about $43,000.
  • Population: All elements of interest in a study (parameter).
  • Sample: A subset of the population (statistic).
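
An inference like the Toronto estimate is normally reported with a confidence interval. The survey data are not available, so this sketch simulates 200 hypothetical incomes; `mean_ci` is an illustrative name, and it uses the large-sample z value (for small samples a t value would be more appropriate).

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def mean_ci(sample, confidence=0.95):
    """Large-sample (z-based) confidence interval for a population mean."""
    n = len(sample)
    x_bar = fmean(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / sqrt(n)
    return x_bar, (x_bar - margin, x_bar + margin)


# Simulated stand-in for the survey: 200 hypothetical annual incomes
rng = random.Random(1)
incomes = [rng.gauss(43_000, 12_000) for _ in range(200)]

x_bar, (low, high) = mean_ci(incomes)
print(f"sample mean ${x_bar:,.0f}; 95% CI ${low:,.0f} to ${high:,.0f}")
```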

Descriptive vs. Inferential Statistics - Comparison Chart

Feature        | Descriptive Statistics                           | Inferential Statistics
---------------|--------------------------------------------------|---------------------------------------------------------------
Use            | Describe data                                    | Generalize data
Process        | Summarizing large datasets                       | Generalizing from a small sample to a broader population
Focus          | Organizing, analyzing, and presenting data       | Comparing, testing, and predicting future values of the data
Result Display | Graphs, charts, tables, etc.                     | Probability scores
Conclusion     | Describes the known data                         | Draws conclusions beyond the available data
Tools          | Central tendency, dispersion, skewness, etc.     | Hypothesis tests, confidence intervals, regression analysis, etc.

Statistical Reasoning

Inductive Reasoning

  • Infer the characteristics of the population based on what you know about the sample (inferential statistics).
  • Moves from specific observations to a generalization.
  • Begin with research questions and collection of empirical data, which are used to generate hypotheses and theory.
  • Conclusions are probabilistic (weak or strong).
  • Relies on evidence rather than established facts.
  • Example: The quiz was easy; therefore, the final exam will be easy.
  • Sample → Population

Deductive Reasoning

  • Deduce the characteristics of the sample based on existing theory about the population (descriptive statistics).
  • Moves from a general statement to a specific conclusion.
  • Begin with a theory-driven hypothesis, which guides data collection and analysis.
  • Conclusions are certain rather than probabilistic (the argument is either valid or invalid).
  • Starts from true facts (premises) to reach a conclusion that must follow from them.
  • Example: All students in the class play guitar. Jane is a student in the class, so Jane plays guitar.
  • Sample ← Population

Statistical Variables

  • Independent Variable:
    • The cause or the explanatory variable.
    • Explains the existence of the dependent variable.
    • Influences the dependent variable.
    • Represented by “X”, conventionally plotted on the horizontal axis of a chart.
  • Dependent Variable:
    • The effect or the response variable.
    • Responds to the effect of the independent variable.
    • Influenced by the independent variable.
    • Represented by “Y”, conventionally plotted on the vertical axis of a chart.
  • Example: In a relationship between income and expense, your level of expense depends on your level of income; expense is a dependent variable, and income is an independent variable.
  • In an experiment, the independent variable is controlled or manipulated by the researcher, while the dependent variable changes in response to it.
  • A quantity in a study can be a constant or a variable: a constant has a fixed value, while a variable takes different values across observations.

Big Data & Data Analytics

  • Vast amounts of data are available for analysis due to social media and interactive technology.
  • Organizations collect large amounts of data daily, referred to as “Big Data”.

Six Defining Terms of Big Data

  1. Volume: The vast amount of available data (e.g., terabytes, yottabytes).
  2. Variety: Many different forms of data (e.g., text, audio, video, touch-screen).
  3. Velocity: The high speed and frequency at which data is generated and processed.
  4. Veracity: The trustworthiness of data, which is threatened by growing abnormalities, biases, and noise.
  5. Value: The growing value of data to improve decision-making in organizations.
  6. Volatility: The rapidly changing and unpredictable flow of data (e.g., social media).

  • Many organizations emphasize using data to drive business operations.
  • Data alone is of little value unless it yields insights that inform action.
  • Leveraging huge datasets is crucial, making data a critical asset.
  • Organizations implement data warehousing to capture, store, and maintain large volumes of data.
  • Data warehousing makes it possible to store, retrieve, and process extremely large quantities of data in seconds.
  • Warehoused data is queried and explored using data mining techniques.
  • Data mining involves using statistics, mathematics, AI, machine learning, and computers to extract useful patterns from data for better decision-making.
  • A dramatic increase in available data, cost-effective storage, faster processing, and recognition of the value of data have made data analytics increasingly prominent.
  • Data Analytics: A scientific process of transforming data into critical insight and actionable intelligence to make better-informed decisions.

Four Types of Data Analytics

  1. Descriptive Analytics:
    • Looks at past data to describe what happened.
  2. Diagnostic Analytics:
    • Examines past data to understand why something happened (root cause analysis).
  3. Predictive Analytics:
    • Uses past data to forecast what will happen in the future, informing which actions are needed.
  4. Prescriptive Analytics:
    • Recommends what actions to take to affect those outcomes.

Descriptive Analytics
  • Involves using analytical tools to describe past events, signaling what is right or wrong without explaining why.
  • Summarizes findings using data visualization and tabulation methods like data queries, dashboards, descriptive statistics, graphs, and tables.
  • For instance, using data to describe the crime rate over the past five years.

Diagnostic Analytics
  • Involves using analytical tools to understand why something happened in the past.
  • Also known as root cause analysis, it seeks to understand the root causes of events.
  • Helpful in determining factors and events that contributed to the outcome by making connections and establishing correlations; it suggests possible causal relationships but offers no predictive or prescriptive insight.
  • Techniques include data mining, correlation, probabilities, and sensitivity analysis.
  • For instance, using data to investigate why the crime rate has increased over the past five years.
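
Correlation is one of the diagnostic techniques listed above. This sketch computes the Pearson correlation coefficient by hand on hypothetical yearly figures for one neighborhood; `pearson_r` and the unemployment variable are illustrative assumptions, not part of the notes.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical yearly figures for one neighborhood
unemployment_rate = [5.0, 5.5, 6.1, 6.8, 7.2]  # percent
crime_rate = [40, 42, 47, 50, 53]              # incidents per 10,000 residents
print(f"r = {pearson_r(unemployment_rate, crime_rate):.2f}")
```

A value near +1 flags a strong linear association worth investigating, but correlation on its own does not establish that one variable causes the other.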

Predictive Analytics
  • Involves using analytical tools to predict future events.
  • Uses analytical models constructed from past data to predict future outcomes or assess the impact of variables.
  • Uses findings from descriptive and diagnostic analytics to predict future trends and relies on machine learning.
  • For instance, using methods such as mathematical models, linear regression, time-series, and forecasting models to predict future crime rates.
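
Linear regression on time is one of the simplest forecasting models mentioned above. This sketch fits an ordinary least-squares trend line to hypothetical crime rates for five years and extends it one year ahead; `fit_linear_trend` is an illustrative name.

```python
def fit_linear_trend(t, y):
    """Ordinary least-squares fit of y = intercept + slope * t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    slope = (sum((a - mt) * (b - my) for a, b in zip(t, y))
             / sum((a - mt) ** 2 for a in t))
    intercept = my - slope * mt
    return intercept, slope


# Hypothetical crime rates for years 1-5 (incidents per 10,000 residents)
years = [1, 2, 3, 4, 5]
rates = [50, 52, 55, 57, 61]

intercept, slope = fit_linear_trend(years, rates)
forecast_year_6 = intercept + slope * 6
print(f"trend: {slope:.2f} per year; forecast for year 6: {forecast_year_6:.1f}")
```

The fitted trend is 2.7 per year, giving a forecast of 63.1 for year 6. A straight line assumes the past trend continues, so forecasts further ahead should be treated with increasing caution.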

Prescriptive Analytics
  • Involves using a set of analytical techniques that yield a best course of action, recommending actions to affect outcomes.
  • Aims to prescribe actions to eliminate future problems or take advantage of trends.
  • Suggests favorable outcomes for specified actions and various actions to achieve particular outcomes.
  • For instance, using optimization models to generate deployment solutions that minimize costs while maximizing efficiencies in reducing crime.
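
Prescriptive analytics searches over possible actions for the best one. As a stand-in for a real optimization model (which would typically use a linear or integer programming solver), this sketch exhaustively checks every way to assign 4 extra officers to three districts. The expected-reduction table and `best_deployment` are hypothetical.

```python
from itertools import product

# Hypothetical expected crime reduction (incidents per month) in each district
# when 0, 1, 2, or 3 extra officers are assigned (diminishing returns)
reduction = {
    "A": [0, 10, 16, 19],
    "B": [0, 8, 13, 17],
    "C": [0, 6, 9, 12],
}


def best_deployment(reduction, officers):
    """Exhaustively search all allocations of `officers`; return the best one."""
    districts = list(reduction)
    max_per_district = len(next(iter(reduction.values()))) - 1
    best_plan, best_value = None, float("-inf")
    for plan in product(range(max_per_district + 1), repeat=len(districts)):
        if sum(plan) != officers:
            continue  # a plan must use exactly the available officers
        value = sum(reduction[d][k] for d, k in zip(districts, plan))
        if value > best_value:
            best_plan, best_value = dict(zip(districts, plan)), value
    return best_plan, best_value


plan, value = best_deployment(reduction, officers=4)
print(plan, value)
```

Brute force is fine for a toy problem, but the number of plans grows exponentially with the number of districts, which is why real deployments rely on dedicated optimization solvers.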

Technology and Statistics

  • Given the volume of data, processing speed, and complex calculations, sophisticated calculators and software applications are used for statistical analysis.
  • Examples of graphing and statistical software applications are MS Excel, SPSS, SAS, Python, and R.
  • Software applications and calculators make statistical calculations easier, faster, and more accurate.
  • Focus is now placed on understanding statistical concepts, appropriate techniques, and interpretation of statistical results rather than regurgitating formulas.
  • MS Excel is widely used due to its ease of use and prevalence.
  • Excel has built-in statistical functions, and its Analysis ToolPak add-in provides additional statistical analysis tools.

Ethical Considerations

  • It is important to be fair, thorough, objective, and neutral when collecting and analyzing data.

Avoid Unethical Behaviors

  • Improper or biased sampling.
  • Over-generalization based on small/unrepresentative sampling.
  • Considering only data points that reinforce a particular theory.
  • Inappropriate analysis of data.
  • Development of misleading graphs.
  • Fudging data or creating false data.
  • Inappropriate summary statistics.
  • Running multiple tests until the desired result is obtained.
  • Biased interpretation of statistical results.

  • The American Statistical Association developed the “Ethical Guidelines For Statistical Practice” to help practitioners make ethical decisions and assist students in performing responsible statistical work.
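
Running many tests until one "works" is misleading because false positives accumulate. For independent tests where every null hypothesis is actually true, the chance of at least one false positive is 1 − (1 − α)^k. The sketch below computes this; `familywise_error_rate` is an illustrative name and independence of the tests is an assumption.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent tests when
    every null hypothesis is true: 1 - (1 - alpha) ** num_tests."""
    return 1 - (1 - alpha) ** num_tests


for k in (1, 5, 20):
    print(f"{k:>2} tests at alpha = 0.05: {familywise_error_rate(0.05, k):.1%}")
```

At α = 0.05, twenty tests give about a 64% chance of at least one spurious "significant" result. Reporting every test run, or tightening the threshold (for example the Bonferroni correction, α/k), guards against this.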