4.1 Statistics and Sampling

Statistics Overview

  • Origin: Derived from Latin "statisticum collegium" (council of state) and Italian "statista" (statesman).

  • Historical Context: Early statistics analyzed demographic and economic data for the state and was originally termed political arithmetic.

  • Evolution: In the 1800s the field expanded to cover the collection, summary, and analysis of data of any kind, and it integrated with probability theory to support statistical inference.

  • Definition: Statistics is the science of designing studies, collecting data, and modeling/analyzing it for decision-making and scientific discovery when information is limited and variable.

Types of Statistics

  • Descriptive Statistics:

    • Describes the main features of a data set.

    • Includes methods for collecting, organizing, summarizing, and presenting data.

    • Examples: Bar Graphs, Histograms, Mean (Arithmetic, Weighted), Median, Mode, Variability (Range, Variance, Standard Deviation).

  • Inferential Statistics:

    • Involves making inferences about a population based on sample data.

    • Methods include generalization from samples, estimations, hypothesis tests, correlation and regression analyses, and predictive modeling.

Population and Sample

  • Population (N): The totality of observations or elements of interest in a study; N denotes its size.

  • Sample (n): A subset of the population selected for the study and used to draw conclusions about the whole population; n denotes its size.

Variables

  • Definition: A variable is a characteristic that can take different values (quantities or qualities) from one observed unit to another.

  • Types of Variables:

    • Qualitative: Categories based on characteristics (e.g., sex, religion).

    • Quantitative:

      • Discrete: Countable numbers (e.g., number of students).

      • Continuous: Measurable values that can take on any value within a range (e.g., weight, height).

Levels of Measurement (NOIR)

  1. Nominal: Categorical variables without natural ordering (e.g., gender).

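  2. Ordinal: Categorical variables with a ranking but no fixed distance between ranks (e.g., education level).

  3. Interval: Numerical variables with equal, meaningful differences between values but no true zero point, so ratios are not meaningful (e.g., temperature in Celsius).

  4. Ratio: Numerical variables with a true zero, so both differences and ratios are meaningful (e.g., income).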
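
The interval/ratio distinction can be checked numerically. The following is a minimal Python sketch, assuming two made-up temperature readings; the variable names and the helper celsius_to_kelvin are illustrative only. Differences survive a change of unit from Celsius to Kelvin, while ratios do not.

```python
# Illustrative readings (made-up values), in degrees Celsius.
t1_c, t2_c = 10.0, 20.0


def celsius_to_kelvin(c):
    """Convert to Kelvin, which has a true zero (a ratio scale)."""
    return c + 273.15


t1_k, t2_k = celsius_to_kelvin(t1_c), celsius_to_kelvin(t2_c)

# Differences are meaningful on an interval scale: the gap is the same
# in both units (up to floating-point rounding).
diff_c = t2_c - t1_c
diff_k = t2_k - t1_k

# Ratios are not meaningful on an interval scale: 20 C is "twice" 10 C
# only because of where the Celsius zero happens to sit.
ratio_c = t2_c / t1_c
ratio_k = t2_k / t1_k

print("difference:", diff_c, "C vs", round(diff_k, 2), "K")
print("ratio:", ratio_c, "in Celsius vs", round(ratio_k, 3), "in Kelvin")
```

The Celsius ratio is 2, but the same two temperatures in Kelvin have a ratio close to 1.035, which is why "twice as hot" is not a valid statement on an interval scale.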

Sampling Techniques

  • Sampling Definition: The act of selecting a part of a population that is intended to represent the whole.

  • Purpose of Sampling: To draw conclusions about a population's characteristics from a sample, without observing every member.

  • Reasons for Sampling:

    • Economy: Reduces costs and resources.

    • Timeliness: Quicker data collection and analysis.

    • Large Population Sizes: Sampling is more practical than attempting total enumeration.

    • Inaccessibility: Some parts of a population are unreachable, so only accessible subsets can be sampled.

Census vs. Sampling

  • Census: Complete collection of demographic, economic, and social data for the entire population, in contrast to sampling, which observes only part of it.

  • Essential Features of a Census:

    • Individual enumeration, defined territory, regular intervals, inclusive coverage.

Basic Steps in Sampling

  1. Determine the population.

  2. Select an appropriate sampling design.

  3. Determine the sample size.

  4. Obtain the sample.

Sampling Designs

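  • Probability Sampling: Every member of the population has a known, nonzero chance of selection.

    • Simple Random Sampling: Units are drawn purely by chance, so every unit has an equal chance of selection and every possible sample of the same size is equally likely.

    • Systematic Sampling: Selects every kth member of an ordered list, where the interval k is roughly N/n, starting from a randomly chosen point within the first interval.

    • Stratified Sampling: Divides the population into non-overlapping strata and randomly samples from each stratum.

    • Cluster Sampling: Divides the population into clusters, randomly selects some of the clusters, and samples the units within the selected clusters.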

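  • Non-Probability Sampling: Selection does not rely on chance, so the probability of selecting each member is unknown and some members may have no chance of being included.

    • Purposive Sampling: Selection based on the researcher's judgment.

    • Convenience Sampling: Selecting readily available subjects.

    • Snowball Sampling: Existing respondents recruit future subjects from among their acquaintances.

    • Quota Sampling: Non-randomly selects subjects until a preset quota is filled for each key segment, often in proportion to that segment's share of the population.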
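
The four probability designs above can be sketched in Python using only the standard library. This is a minimal illustration, not a survey-grade implementation: the function names, the fixed seed, the unit IDs 1..100, and the "urban"/"rural" and "block_*" labels are all hypothetical, and the stratified version assumes the same sample size in every stratum.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every unit is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick a random start in the first interval, then every k-th unit."""
    k = len(population) // n      # sampling interval, roughly N / n
    start = rng.randrange(k)      # random start in [0, k)
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size from every stratum."""
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(42)               # fixed seed so runs are repeatable
population = list(range(1, 101))      # hypothetical unit IDs 1..100

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)

# Hypothetical strata: the first 60 units are "urban", the rest "rural".
strata = {"urban": population[:60], "rural": population[60:]}
strat = stratified_sample(strata, 5, rng)

# Hypothetical clusters: ten blocks of ten consecutive units each.
clusters = {f"block_{b}": population[b * 10:(b + 1) * 10] for b in range(10)}
clus = cluster_sample(clusters, 2, rng)

print("simple random:", sorted(srs))
print("systematic:   ", sys_sample)
print("stratified:   ", strat)
print("cluster:      ", clus)
```

Note the structural difference between the last two designs: stratified sampling takes some units from every group, while cluster sampling takes every unit from some groups.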

Sample Size Determination

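  • An appropriate sample size is needed to obtain reliable results. Common formulas:

    • Yamane’s Formula: n = N / (1 + N e^2), where N is the population size and e is the sampling error (margin of error) expressed as a proportion.

    • Cochran’s Formula: n0 = z^2 p (1 - p) / e^2, where z is the z-value for the desired confidence level, p is the estimated population proportion, and e is the desired precision (margin of error).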
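
Both formulas are easy to evaluate directly. The sketch below assumes the standard forms n = N / (1 + N e^2) for Yamane and n0 = z^2 p (1 - p) / e^2 for Cochran, plus the usual finite population correction n = n0 / (1 + (n0 - 1) / N), which these notes do not state. The function names and the example inputs (N = 10,000, e = 0.05, z = 1.96, p = 0.5) are illustrative choices, and results are rounded up to whole respondents.

```python
import math


def yamane_sample_size(N, e):
    """Yamane: n = N / (1 + N * e^2), rounded up to a whole unit."""
    return math.ceil(N / (1 + N * e ** 2))


def cochran_sample_size(z, p, e, N=None):
    """Cochran: n0 = z^2 * p * (1 - p) / e^2, rounded up.

    If a population size N is given, apply the finite population
    correction n = n0 / (1 + (n0 - 1) / N) before rounding.
    """
    n0 = (z ** 2) * p * (1 - p) / e ** 2
    if N is not None:
        n0 = n0 / (1 + (n0 - 1) / N)
    return math.ceil(n0)


# Illustrative inputs: population of 10,000, 5% margin of error,
# 95% confidence (z = 1.96), and p = 0.5 (the most conservative choice).
n_yamane = yamane_sample_size(10_000, 0.05)
n_cochran = cochran_sample_size(1.96, 0.5, 0.05)
n_cochran_fpc = cochran_sample_size(1.96, 0.5, 0.05, N=10_000)

print("Yamane:", n_yamane)                    # 385
print("Cochran (large population):", n_cochran)   # 385
print("Cochran with correction:", n_cochran_fpc)  # 370
```

Using p = 0.5 maximizes p (1 - p), so it gives the largest required sample when the true proportion is unknown.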

Measures of Central Tendency

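  • Arithmetic Mean: Sum of all values divided by the number of values.

  • Weighted Mean: Mean in which each value is multiplied by a weight reflecting its importance; the weighted sum is divided by the sum of the weights.

  • Median: Middle value when the data are ordered; with an even number of values, the average of the two middle values.

  • Mode: Most frequent value in a dataset; a dataset can have more than one mode.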
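
These measures can be computed with Python's standard statistics module. The sketch below uses made-up scores and grades; weighted_mean is a small helper written for this example (the weights here are hypothetical course units).

```python
import statistics

scores = [70, 85, 85, 90, 100]   # made-up exam scores

mean = statistics.mean(scores)       # (70 + 85 + 85 + 90 + 100) / 5
median = statistics.median(scores)   # middle value of the sorted list
mode = statistics.mode(scores)       # most frequent value

# With an even number of values, the median averages the two middle ones.
median_even = statistics.median([1, 2, 3, 4])

# A dataset can have several modes; multimode returns all of them.
modes = statistics.multimode([1, 1, 2, 2, 3])


def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


grades = [80, 90, 70]   # made-up grades
units = [3, 2, 5]       # hypothetical weights (course units)
wm = weighted_mean(grades, units)    # (240 + 180 + 350) / 10

print(mean, median, mode)    # 86 85 85
print(median_even, modes)    # 2.5 [1, 2]
print(wm)                    # 77.0
```

The weighted mean (77.0) is lower than the plain mean of the three grades (80) because the lowest grade carries the largest weight.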

Measures of Dispersion

  • Range: Difference between highest and lowest values.

  • Variance: Average of the squared deviations of the data points from the mean.

  • Standard Deviation: Square root of the variance; indicates the typical spread of data points around the mean, in the same units as the data.

  • Coefficient of Variation: Standard deviation expressed as a percentage of the mean.
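
A short Python sketch of these measures, using the standard statistics module and a made-up dataset. One detail goes beyond these notes and is flagged as an assumption: the population variance divides the sum of squared deviations by N, while the sample variance divides by n - 1. The coefficient of variation below uses the sample standard deviation.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values with mean 5

data_range = max(data) - min(data)    # highest minus lowest

# Population formulas divide the squared deviations by N.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample formulas divide by n - 1 instead.
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

# Coefficient of variation: standard deviation as a percentage of the mean.
cv = samp_sd / statistics.mean(data) * 100

print("range:", data_range)                                   # 7
print("population variance / sd:", pop_var, pop_sd)           # 4 2.0
print("sample variance / sd:", round(samp_var, 3), round(samp_sd, 3))
print("CV (%):", round(cv, 2))
```

The squared deviations here sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7, which is slightly larger.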