Statistics Unit 1

Introduction to Statistics

  • Statistics: A set of procedures that help to draw conclusions or make decisions under uncertainty based on data.

  • Applications: Analyzing policy effects, salary differences, capital investment decisions, and insurance premium assessments.

Why Study Statistics?

Three Parts of Statistics

  • Descriptive Statistics: Summarizes or describes the information in a data set.

  • Probability: Analyzes laws of chance to manage uncertainty.

  • Statistical Inference: Procedures for drawing conclusions or making decisions under uncertainty using data.

Basic Statistical Concepts

  • Population: Set of all elements to study (e.g., people, cars).

  • Sample: Subset of the population observed for analysis.

  • Parameter: Numerical measure of a population characteristic.

  • Statistic: Numerical measure obtained from a sample.

Example of Basic Concepts

  • Population: All adult Spanish males.

  • Sample: 500 selected adult Spanish males.

  • Parameter: Average height of all adult Spanish males.

  • Statistic: Average height of the selected 500 males.

  • Samples should be representative; adequate sampling procedures are studied separately.
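
The distinction between a parameter and a statistic can be sketched in Python. This is only an illustration: the population below is simulated (the mean of 175 cm and spread of 7 cm are assumed values, not real data on Spanish males), and simple random sampling is just one possible sampling procedure.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated heights in cm (not real data).
population = [random.gauss(175, 7) for _ in range(100_000)]

# Parameter: a numerical measure computed from every element of the population.
parameter = statistics.mean(population)

# Sample: 500 elements drawn at random, without replacement.
sample = random.sample(population, 500)

# Statistic: the same measure, computed from the sample only.
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

In practice the parameter is usually unknown, and the statistic is what is used to estimate it.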

Classification of Variables

Types of Variables

  • Categorical: Outcomes are categories.

    • Nominal: No intrinsic order (e.g., gender).

    • Ordinal: Ordered categories (e.g., satisfaction levels).

  • Numerical: Outcomes are numbers.

    • Discrete: Finite or countably infinite outcomes.

    • Continuous: Can take any value within an interval, so the possible outcomes are uncountably infinite (e.g., weight, height).

Examples of Variable Classification

  • Gender: Nominal categorical.

  • Marital Status: Nominal categorical.

  • Body Mass Index: Continuous numerical.

  • Visits to Doctor: Discrete numerical.

  • Satisfaction Level: Ordinal categorical.

Transforming Variables

  • Categorical to Numerical: Assigning numbers to categories (e.g., satisfaction).

  • Caution is required when interpreting results after the transformation, since the assigned numbers may reflect only order, not real magnitudes.
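
A minimal sketch of such a transformation, assuming a hypothetical three-level satisfaction scale coded 1 to 3. The coding scheme and the responses are invented for illustration.

```python
import statistics

# Hypothetical coding scheme: the numbers preserve the order of the categories only.
codes = {"low": 1, "medium": 2, "high": 3}

# Hypothetical survey responses.
responses = ["high", "low", "medium", "high", "high", "low", "medium"]
encoded = [codes[r] for r in responses]

# The median relies only on order, so it remains meaningful for ordinal data.
median_code = statistics.median(encoded)

# The mean treats the gaps between categories as equal, which the coding does not guarantee.
mean_code = statistics.mean(encoded)

print(encoded, median_code, round(mean_code, 2))
```

A different but equally valid coding (say 1, 2, 10) would leave the median category unchanged but shift the mean, which is why the mean of coded categories should be read with care.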

Graphical Representation of Categorical Variables

Frequency and Relative Frequency

  • Frequency: Count of observations in a category.

  • Relative Frequency: Proportion of observations in a category.
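
As a quick sketch, both quantities can be computed with the standard library. The marital-status observations below are made up for the example.

```python
from collections import Counter

# Hypothetical observations of a categorical variable.
observations = ["single", "married", "married", "divorced",
                "single", "married", "widowed", "married"]

# Frequency: number of observations in each category.
frequency = Counter(observations)

# Relative frequency: frequency divided by the total number of observations.
n = len(observations)
relative_frequency = {category: count / n for category, count in frequency.items()}

for category, count in frequency.items():
    print(f"{category:<9} {count}  {relative_frequency[category]:.3f}")
```

The relative frequencies always add up to 1, which makes them easier to compare across data sets of different sizes.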

Graphical Tools

  • Pie Chart: Circular representation of categorical data.

  • Bar Chart: Vertical bars represent the frequency of each category; arranging the bars from most to least frequent gives a Pareto chart.

Examples

  • Lifestyle survey results can be presented in tables and pie/bar charts for easy visualization of categorical data.
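
The ordering used in a Pareto chart can be sketched without a plotting library by printing a text version of the bars. The survey answers below are hypothetical.

```python
from collections import Counter

# Hypothetical answers to a lifestyle-survey question about exercise.
answers = ["walk", "gym", "walk", "none", "walk",
           "gym", "cycle", "walk", "none", "gym"]

# most_common() returns (category, frequency) pairs sorted from most to least frequent.
pareto_order = Counter(answers).most_common()

# Text bar chart: one '#' per observation.
for category, count in pareto_order:
    print(f"{category:<6} {'#' * count} {count}")
```

A plotting library would draw the same information as vertical bars; the sorted order is what makes the chart a Pareto-style display.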

Time Series Data Graphs

  • For numerical variables over time, use a two-dimensional graph: time on the horizontal axis, the variable on the vertical.

Graphs for Numerical Variables

Data Grouping

  • Define the number of classes and their lengths for grouping continuous variables.

  • Example guidelines for defining class intervals.
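
A minimal sketch of grouping a continuous variable into equal-length classes. The data, the number of classes (4), and the bounds (50 to 90) are all assumptions chosen by hand for the example; the notes do not prescribe a specific rule here.

```python
# Hypothetical measurements of a continuous variable.
data = [52.1, 60.4, 61.0, 65.3, 67.8, 70.2, 71.5, 74.9, 80.0, 88.6]

k = 4                      # assumed number of classes
low, high = 50, 90         # assumed bounds covering all observations
width = (high - low) / k   # class length: 10.0

# Classes are [50, 60), [60, 70), [70, 80), [80, 90]; the last class includes its upper bound.
counts = [0] * k
for x in data:
    index = min(int((x - low) // width), k - 1)
    counts[index] += 1

for i, count in enumerate(counts):
    start = low + i * width
    print(f"[{start:.0f}, {start + width:.0f})  {count}")
```

Too few classes hide the shape of the data, and too many leave most classes nearly empty, so the choice of `k` is a judgment call.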

Histograms

  • Vertical bars show the frequency of each class; the histogram reveals how the observations are distributed, including the range between the minimum and maximum and the overall shape (e.g., roughly uniform or normal).

  • Cumulative Relative Frequency Polygon: Plots the cumulative relative frequency at each class boundary, showing the proportion of observations at or below a given value.
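
The points of a cumulative relative frequency polygon can be computed as follows. The class bounds and counts are hypothetical values chosen for illustration.

```python
from itertools import accumulate

# Hypothetical grouped data: upper bound and frequency of each class.
lower_bound = 50
upper_bounds = [60, 70, 80, 90]
counts = [1, 4, 3, 2]

n = sum(counts)

# Running total of frequencies, then divided by n to get proportions.
cumulative = list(accumulate(counts))
cumulative_relative = [c / n for c in cumulative]

# Polygon points: start at (lower bound, 0), then one point per class upper bound.
points = [(lower_bound, 0.0)] + list(zip(upper_bounds, cumulative_relative))
print(points)
```

Joining these points with straight lines gives the polygon; the last point always has height 1.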

Scatter Plots and Relationships

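  • A scatter plot represents the relationship between two numerical variables.

  • The pattern of the points indicates the direction of the correlation: an upward trend suggests a positive relationship, a downward trend a negative one.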

  • Outliers can significantly affect statistical analysis.
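
The effect of an outlier can be sketched with the Pearson correlation coefficient, used here as one common measure of linear association (the notes themselves do not name a specific measure). The data points and the helper function `pearson` are made up for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a clear upward trend.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
r_clean = pearson(x, y)

# The same data with a single extreme observation added.
r_outlier = pearson(x + [7], y + [-20.0])

print(f"without outlier: {r_clean:.3f}")
print(f"with outlier:    {r_outlier:.3f}")
```

In this example one added point is enough to flip the sign of the coefficient, which is why scatter plots are worth inspecting before summarizing a relationship with a single number.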

Contingency Tables and Cluster Bar Charts

  • Analyze two categorical variables with joint frequencies organized in a contingency table.

  • Cluster bar charts display these frequencies visually; plotting relative frequencies instead of raw counts makes disparities between groups of different sizes easier to judge accurately.
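
A minimal sketch of a contingency table built from paired observations, with relative frequencies computed within each group. The groups "A" and "B" and the yes/no answers are hypothetical.

```python
from collections import Counter

# Hypothetical paired observations: (group, answer).
pairs = [("A", "yes"), ("A", "yes"), ("A", "no"), ("A", "yes"),
         ("B", "yes"), ("B", "no")]

# Joint frequencies: one count per (group, answer) cell of the table.
joint = Counter(pairs)

# Row totals: number of observations in each group.
row_totals = Counter(group for group, _ in pairs)

# Relative frequency of each answer within its group.
row_relative = {(group, answer): count / row_totals[group]
                for (group, answer), count in joint.items()}

for (group, answer), count in sorted(joint.items()):
    print(f"{group} {answer:<3} count={count} share={row_relative[(group, answer)]:.2f}")
```

Here group A has three times as many "yes" answers as group B in raw counts, but the within-group shares (0.75 versus 0.50) show a smaller gap, because group A is twice as large.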