4.1 Statistics and Sampling
Statistics Overview
Origin: Derived from Latin "statisticum collegium" (council of state) and Italian "statista" (statesman).
Historical Context: The field began as the analysis of demographic and economic data about the state and was originally termed political arithmetic.
Evolution: Expanded in the 1800s to encompass data collection, summary, and analysis of any kind, integrating with probability for statistical inference.
Definition: Statistics is the science of designing studies, collecting data, and modeling/analyzing it for decision-making and scientific discovery when information is limited and variable.
Types of Statistics
Descriptive Statistics:
Describes the main features of a data set.
Includes methods for collecting, organizing, summarizing, and presenting data.
Examples: Bar Graphs, Histograms, Mean (Arithmetic, Weighted), Median, Mode, Variability (Range, Variance, Standard Deviation).
Inferential Statistics:
Involves making inferences about a population based on sample data.
Methods include generalization from samples, estimations, hypothesis tests, correlation and regression analyses, and predictive modeling.
Population and Sample
Population (N): The totality of observations or elements under study; N denotes its size.
Sample (n): A subset of the population selected for the study and used to draw conclusions about the whole population; n denotes its size.
Variables
Definition: A variable is a characteristic whose value (a quantity or a quality) differs from one unit to another.
Types of Variables:
Qualitative: Categories based on characteristics (e.g., sex, religion).
Quantitative:
Discrete: Countable numbers (e.g., number of students).
Continuous: Measurable values that can take on any value within a range (e.g., weight, height).
Levels of Measurement (NOIR)
Nominal: Categorical variables without natural ordering (e.g., gender).
Ordinal: Categorical variables with a ranking (e.g., education level).
Interval: Numerical variables with equal spacing between values but no true zero point, so differences are meaningful and ratios are not (e.g., temperature in Celsius).
Ratio: Numerical variables with a true zero, so ratios are meaningful (e.g., income).
Sampling Techniques
Sampling Definition: The act of selecting a representative part of a population.
Purpose of Sampling: To draw conclusions about a population's characteristics from a sample, without examining every member.
Reasons for Sampling:
Economy: Reduces costs and resources.
Timeliness: Quicker data collection and analysis.
Large Population Sizes: Practical to sample rather than attempt total enumeration.
Inaccessibility: Some populations are unreachable, thus requiring sampling of accessible subsets.
Census vs. Sampling
Census: Complete collection of demographic, economic, and social data for the entire population.
Essential Features of a Census:
Individual enumeration, defined territory, regular intervals, inclusive coverage.
Basic Steps in Sampling
Determine the population.
Select an appropriate sampling design.
Determine sample size.
Obtain the sample.
Sampling Designs
Probability Sampling: Every member of the population has a known, nonzero chance of selection.
Simple Random Sampling: Every unit has an equal chance of selection, and every possible sample of size n is equally likely, which avoids selection bias.
Systematic Sampling: Selects every kth member after a random start, where the interval k is typically N/n.
Stratified Sampling: Divides the population into non-overlapping strata and randomly samples from each stratum.
Cluster Sampling: Divides the population into clusters, randomly selects some clusters, and samples the units within them.
Non-Probability Sampling: Selection probabilities are unknown, and some members may have no chance of being included.
Purposive Sampling: Selection based on researcher judgment.
Convenience Sampling: Selecting readily available subjects.
Snowball Sampling: Existing respondents recruit future subjects from among their acquaintances.
Quota Sampling: Fills preset quotas for key segments of the population, usually in proportion to their share, using non-random selection within each segment.
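The four probability designs listed above (simple random, systematic, stratified, cluster) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the population of 100 unit IDs, the two strata, the ten clusters, and all function names are hypothetical, and the stratified version assumes proportional allocation.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every unit is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Take every k-th unit after a random start, with k = N // n."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Randomly sample the same fraction from each stratum (assumed proportional allocation)."""
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(42)           # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical population of 100 unit IDs
strata = {"first_half": population[:50], "second_half": population[50:]}
clusters = {f"block_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 0.1, rng)
clus = cluster_sample(clusters, 2, rng)
```

Each call returns a list of unit IDs. The stratified sample always contains five units from each half, while the cluster sample contains all ten units from each of two randomly chosen blocks.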
Sample Size Determination
An appropriate sample size is important for obtaining reliable results. Common formulas:
Yamane’s Formula: n = N / (1 + N e^2), where N is the population size and e is the margin of error (sampling error).
Cochran’s Formula: n0 = z^2 p (1 - p) / e^2, where z is the critical value for the chosen confidence level, p is the estimated proportion, and e is the desired margin of error (precision).
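As a worked illustration of the two formulas, the sketch below computes Yamane's n = N / (1 + N e^2) and Cochran's n0 = z^2 p (1 - p) / e^2, rounding up to whole respondents. The function names, the example values (N = 1000, e = 0.05, 95% confidence, p = 0.5), and the optional finite population correction n = n0 / (1 + (n0 - 1) / N) are assumptions chosen for the example, not values from these notes.

```python
import math
from statistics import NormalDist

def yamane_sample_size(population_size, margin_of_error):
    """Yamane: n = N / (1 + N * e^2), rounded up."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)

def cochran_sample_size(margin_of_error, confidence=0.95, p=0.5, population_size=None):
    """Cochran: n0 = z^2 * p * (1 - p) / e^2, rounded up.

    If population_size is given, apply the finite population correction
    n = n0 / (1 + (n0 - 1) / N) before rounding.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)

yamane_n = yamane_sample_size(1000, 0.05)                          # 286
cochran_n = cochran_sample_size(0.05)                              # 385
cochran_fpc_n = cochran_sample_size(0.05, population_size=1000)    # 278
```

Using p = 0.5 is the conservative choice when the true proportion is unknown, because it maximizes p (1 - p) and therefore the required sample size.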
Measures of Central Tendency
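Arithmetic Mean: Sum of all values divided by the number of values.
Weighted Mean: Average in which each value is multiplied by a weight reflecting its importance, with the total divided by the sum of the weights.
Median: Middle value when the data are ordered; for an even number of values, the average of the two middle values.
Mode: Most frequent value in a dataset; a dataset can have more than one mode or none.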
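The four measures above can be checked quickly with Python's standard `statistics` module. The scores, weights, and the `weighted_mean` helper are hypothetical and serve only to illustrate the definitions.

```python
import statistics

def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

scores = [70, 85, 85, 90, 100]             # hypothetical exam scores

mean_score = statistics.mean(scores)       # (70 + 85 + 85 + 90 + 100) / 5 = 86
median_score = statistics.median(scores)   # middle of the ordered values = 85
mode_score = statistics.mode(scores)       # most frequent value = 85

# Hypothetical case: the third score counts twice as much as the others.
weighted = weighted_mean([80, 90, 100], [1, 1, 2])  # (80 + 90 + 200) / 4 = 92.5
```

Here the mean is pulled slightly above the median by the single score of 100, which shows how the mean is more sensitive to extreme values.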
Measures of Dispersion
Range: Difference between highest and lowest values.
Variance: Average of the squared deviations of data points from the mean.
Standard Deviation: Square root of the variance; indicates the typical spread of data points around the mean, in the same units as the data.
Coefficient of Variation: Standard deviation expressed as a percentage of the mean.
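The dispersion measures above can be sketched with the standard `statistics` module. The data set is hypothetical. These notes do not say whether the population or the sample formula is intended, so both variances are shown, and the coefficient of variation here assumes the population standard deviation.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

data_range = max(data) - min(data)                # 9 - 2 = 7

# Population formulas divide by N; sample formulas divide by n - 1.
population_variance = statistics.pvariance(data)  # 32 / 8, i.e. 4
sample_variance = statistics.variance(data)       # 32 / 7, about 4.57
population_sd = statistics.pstdev(data)           # square root of 4, i.e. 2

# Coefficient of variation: standard deviation as a percentage of the mean.
cv_percent = 100 * population_sd / statistics.mean(data)  # 100 * 2 / 5 = 40
```

Because the coefficient of variation is unit-free, it can compare the spread of variables measured on different scales, provided the mean is not zero.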