Statistics Unit 1
Introduction to Statistics
Statistics: A set of procedures that help to draw conclusions or make decisions under uncertainty based on data.
Applications: Analyzing policy effects, salary differences, capital investment decisions, and insurance premium assessments.
Why Study Statistics?
Three Parts of Statistics
Descriptive Statistics: Summarizes and describes the information contained in a data set.
Probability: Studies the laws of chance in order to quantify uncertainty.
Statistical Inference: Procedures to make conclusions or decisions under uncertainty using data.
Basic Statistical Concepts
Population: The set of all elements under study (e.g., people, cars).
Sample: Subset of the population observed for analysis.
Parameter: Numerical measure of a population characteristic.
Statistic: Numerical measure obtained from a sample.
Example of Basic Concepts
Population: All adult Spanish males.
Sample: 500 selected adult Spanish males.
Parameter: Average height of all adult Spanish males.
Statistic: Average height of the selected 500 males.
Samples should be representative; adequate sampling procedures are studied separately.
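A minimal sketch of the parameter/statistic distinction, using simulated data. The population of heights below is invented for illustration (normal with mean 175 cm and standard deviation 7 cm are assumptions, not real figures), and simple random sampling is only one possible sampling procedure.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated heights in cm (invented values).
population = [random.gauss(175, 7) for _ in range(100_000)]

# Parameter: a numerical measure computed from the whole population.
parameter = statistics.mean(population)

# Sample: 500 elements drawn without replacement (simple random sampling).
sample = random.sample(population, 500)

# Statistic: the same measure, computed from the sample only.
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

In practice the parameter is unknown, because the whole population is rarely observed; the statistic is what is actually computed and used to estimate it.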
Classification of Variables
Types of Variables
Categorical: Outcomes are categories.
Nominal: No intrinsic order (e.g., gender).
Ordinal: Ordered categories (e.g., satisfaction levels).
Numerical: Outcomes are numbers.
Discrete: Finite or countably infinite outcomes.
Continuous: Uncountably many possible outcomes; the variable can take any value in an interval (e.g., weight, height).
Examples of Variable Classification
Gender: Nominal categorical.
Marital Status: Nominal categorical.
Body Mass Index: Continuous numerical.
Visits to Doctor: Discrete numerical.
Satisfaction Level: Ordinal categorical.
Transforming Variables
Categorical to Numerical: Assigning numbers to categories (e.g., satisfaction).
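Caution is required when interpreting results after the transformation: the assigned numbers are arbitrary codes, so operations such as averaging may not be meaningful.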
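A small sketch of coding an ordinal variable as numbers. The category labels, the codes 1 to 4, and the responses are all hypothetical choices made for illustration.

```python
import statistics

# Hypothetical coding scheme: the numbers only preserve the order of the categories.
codes = {"dissatisfied": 1, "neutral": 2, "satisfied": 3, "very satisfied": 4}

# Invented survey responses.
responses = ["dissatisfied", "satisfied", "satisfied",
             "neutral", "very satisfied", "satisfied"]
encoded = [codes[r] for r in responses]

# The median relies only on the ordering, so it is meaningful for ordinal data.
median_code = statistics.median_low(encoded)

# The mean assumes equal spacing between categories, which the coding does not guarantee.
mean_code = statistics.mean(encoded)

print(encoded, median_code, round(mean_code, 2))
```

A different but equally valid coding (say 1, 2, 5, 10) would leave the median category unchanged while changing the mean, which is why the mean should be read with care here.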
Graphical Representation of Categorical Variables
Frequency and Relative Frequency
Frequency: Count of observations in a category.
Relative Frequency: Proportion of observations in a category.
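Both quantities can be computed in a few lines. The observations below are invented; the variable (marital status) is borrowed from the classification examples.

```python
from collections import Counter

# Invented observations of a nominal variable.
observations = ["single", "married", "married", "divorced", "single", "married"]

# Frequency: number of observations in each category.
frequency = Counter(observations)

# Relative frequency: frequency divided by the total number of observations.
n = len(observations)
relative_frequency = {category: count / n for category, count in frequency.items()}

for category, count in frequency.items():
    print(f"{category:<9} {count}  {relative_frequency[category]:.2f}")
```

The relative frequencies always add up to 1, which makes them comparable across data sets of different sizes.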
Graphical Tools
Pie Chart: Circular representation of categorical data.
Bar Chart: The height of each vertical bar represents the frequency of a category; when the bars are arranged from most to least frequent, the chart is called a Pareto chart.
Examples
Lifestyle survey results can be presented in tables and pie/bar charts for easy visualization of categorical data.
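A text-only sketch of the Pareto ordering, drawn horizontally for simplicity. The survey answers are invented, and a real chart would normally be produced with a plotting library rather than with printed characters.

```python
from collections import Counter

# Invented answers to a hypothetical survey question on means of transport.
answers = ["walk", "car", "car", "bus", "car", "bus", "bike", "car"]

counts = Counter(answers)

# most_common() returns (category, frequency) pairs sorted from most to least frequent.
pareto_order = counts.most_common()

# One row of '#' characters per category, longest bar first.
for category, count in pareto_order:
    print(f"{category:<5} {'#' * count} {count}")
```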
Time Series Data Graphs
For a numerical variable observed over time, use a two-dimensional graph with time on the horizontal axis and the variable on the vertical axis.
Graphs for Numerical Variables
Data Grouping
Choose the number of classes and their widths in order to group the values of a continuous variable; the classes must not overlap and must cover every observation.
Example guidelines for defining class intervals.
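The grouping step can be sketched as follows. The data are invented, and both the choice of 5 classes and the use of equal-width classes are assumptions made for this example, not a rule taken from these notes.

```python
# Invented observations of a numerical variable.
values = [12, 15, 21, 22, 25, 28, 30, 31, 33, 37, 40, 44, 45, 52, 62]

k = 5  # number of classes (assumed for illustration)
lo, hi = min(values), max(values)
width = (hi - lo) / k  # equal-width classes

# Classes are closed on the left and open on the right,
# except the last one, which also includes the maximum.
class_counts = [0] * k
for v in values:
    index = min(int((v - lo) / width), k - 1)
    class_counts[index] += 1

for i, count in enumerate(class_counts):
    left = lo + i * width
    right = left + width
    closing = "]" if i == k - 1 else ")"
    print(f"[{left:.0f}, {right:.0f}{closing}: {count}")
```

Every observation falls in exactly one class, so the class frequencies add up to the number of observations.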
Histograms
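Vertical bars show the frequency of each class; the histogram visualizes how the observations are distributed (their range from minimum to maximum and the overall shape, e.g., roughly uniform or bell-shaped like a normal distribution).
Cumulative Relative Frequency Polygon: Plots the cumulative relative frequency against the upper limit of each class, showing the proportion of observations at or below a given value.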
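A sketch of how the points of a cumulative relative frequency polygon could be obtained from grouped data. The class limits and frequencies are invented for illustration.

```python
from itertools import accumulate

# Invented grouped data: upper limit and frequency of each class.
upper_limits = [22, 32, 42, 52, 62]
class_counts = [3, 5, 3, 2, 2]

n = sum(class_counts)

# Running total of the class frequencies.
cumulative = list(accumulate(class_counts))

# Proportion of observations at or below each upper limit.
cumulative_relative = [c / n for c in cumulative]

# Each (upper limit, proportion) pair is one point of the polygon.
for limit, proportion in zip(upper_limits, cumulative_relative):
    print(f"<= {limit}: {proportion:.3f}")
```

The sequence never decreases and ends at 1, since every observation lies at or below the upper limit of the last class.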
Scatter Plots and Relationships
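A scatter plot represents the relationship between two numerical variables, with one variable on each axis.
The pattern of the point cloud indicates the direction of the correlation: points rising from left to right suggest a positive relationship, points falling suggest a negative one.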
Outliers can significantly affect statistical analysis.
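To illustrate the effect of an outlier, the sketch below computes the Pearson correlation coefficient with and without one extreme point. The coefficient itself is not defined in these notes and is brought in here only as one common numerical summary of direction and strength; the data are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data with an almost perfectly linear, increasing relationship.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
r_clean = pearson(x, y)

# Same data plus a single outlier far below the trend.
r_outlier = pearson(x + [7], y + [0.0])

print(f"without outlier: {r_clean:.3f}")
print(f"with outlier:    {r_outlier:.3f}")
```

A single observation is enough to pull the coefficient from close to 1 down to a weak positive value, which is why it is worth inspecting the scatter plot before summarizing.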
Contingency Tables and Cluster Bar Charts
Analyze two categorical variables with joint frequencies organized in a contingency table.
Cluster bar charts display these frequencies visually, with one group of bars per category; when the groups differ in size, plotting relative frequencies within each group instead of raw counts makes the differences easier to compare fairly.
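A sketch of a contingency table built from paired observations, together with the relative frequencies within each group. The two variables (a group label and a yes/no answer) and all counts are invented for illustration.

```python
from collections import Counter

# Invented paired observations: (group, answer).
pairs = (
    [("A", "yes")] * 3 + [("A", "no")] * 1
    + [("B", "yes")] * 4 + [("B", "no")] * 4
)

# Joint frequencies: one count per combination of categories.
joint = Counter(pairs)

# Row totals: number of observations in each group.
row_totals = Counter(group for group, _ in pairs)

# Relative frequency of each answer within its group.
within_group = {
    (group, answer): count / row_totals[group]
    for (group, answer), count in joint.items()
}

for (group, answer), count in joint.items():
    share = within_group[(group, answer)]
    print(f"group {group}, {answer:<3}: {count} ({share:.2f})")
```

Group B has more "yes" answers in absolute terms (4 against 3), but the share of "yes" is higher in group A (0.75 against 0.50). Bars based on raw counts would hide this, because the groups have different sizes.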