Research & Sampling Design – Comprehensive Study Notes

Meaning and Scope of Research Design

  • "Research Design" = overall blueprint for a study.
    • Arranges the “what, where, when, how much & by what means” of data collection/analysis.
    • Formal definition: “arrangement of conditions for collection & analysis of data in a manner that combines relevance to research purpose with economy in procedure.”
  • Covers researcher actions from framing the hypothesis → operationalising variables → collecting & analysing data → writing final report.

Typical Design‐Decision Questions

  • What is the study about & why is it being done?
  • Where & when will it be carried out (time horizon, geographic setting)?
  • What data are required & where can they be found?
  • What period(s) will be covered?
  • What sample design will be followed?
  • Which data-collection techniques will be used?
  • How will data be analysed? In what reporting style?

Four Sub-Designs Contained in the Master Design

  1. Sampling design – procedure for selecting study elements.
  2. Observational (data) design – conditions under which observations are made (who, when, where, with what instrument).
  3. Statistical design – how many observations, which analyses.
  4. Operational design – field and administrative procedures that implement the other three.

Need / Importance of Research Design

  • Ensures smooth, efficient conduct → maximises information while minimising time, effort & cost.
  • Provides firm foundation for reliability & validity of results.
  • Serves as architectural “blue-print,” analogous to a house plan before construction.
  • Specific benefits:
    • Reduces inaccuracy & bias; improves efficiency & reliability.
    • Minimises waste of resources; guides resource allocation.
    • Clarifies requirements for hypothesis testing & data needs.
    • Communicates overview to other experts; keeps project on course.

Features of a Good Design

  • Flexible, appropriate, efficient, economical.
  • Minimises bias & experimental error; maximises reliability.
  • Generates maximal information; allows examination of multiple aspects of problem.
  • Suitability is context-specific: one design rarely fits all problems.

Criteria normally considered:

  • Means of obtaining information.
  • Skill/availability of researcher & staff.
  • Objectives & nature of problem.
  • Time & money constraints.

Key Concepts in Research Design

  • Variable – measurable concept; can be continuous (age, income) or discrete (number of children).
  • Independent Variable (IV) – antecedent; manipulated/predictor.
  • Dependent Variable (DV) – outcome/consequence affected by IV.
    • Example: Age → Height (Age = IV, Height = DV).
  • Extraneous Variable – IV not related to study purpose but influences DV → introduces experimental error.
  • Control – design strategies to minimise extraneous influence.
  • Confounded Relationship – DV is influenced by extraneous variables, so the IV–DV relation is clouded.
  • Research Hypothesis – predictive statement linking at least one IV to one DV; tested via scientific method (e.g., “e-Learning enhances teaching–learning experience”).
  • Hypothesis-Testing Research
    • Experimental: IVs actively manipulated.
    • Non-experimental: IVs not manipulated (ex-post-facto, survey, etc.).
  • Experimental & Control Groups – experimental receives novel treatment; control receives usual conditions.
  • Treatments – specific conditions applied to groups.
  • Experiment – process of testing statistical hypothesis.
    • Absolute vs comparative experiments.
  • Experimental Unit – basic plot/subject on which treatment is applied.

Traditional Categories of Research Design

| Objective | Appropriate Design | Typical Key Words |
| --- | --- | --- |
| Gain background, define terms, clarify problems, set priorities | Exploratory | Qualitative, flexible, informal |
| Describe who, what, where, when, how | Descriptive / Diagnostic | Cross-sectional, longitudinal, survey |
| Determine causality, test “if-then”, evaluate effects | Causal | Experiment, manipulation, control |

Exploratory Research

  • Unstructured, informal; undertaken when little is known.
  • Uses: secondary data analysis, experience surveys, case analysis, focus groups, projective techniques.

Descriptive Research

  • Answers who/what/where/when/how (not why).
  • Cross-sectional studies → one-time measurement; sample surveys often online.
  • Longitudinal studies → repeated measures; panels or repeated independent samples.

Causal Research (Experiments)

  • Explains phenomena via conditional statements “If X then Y.”
  • Relies on manipulation of IVs & control of extraneous variables.
  • Two settings: Laboratory (high control, artificial) vs Field (natural, realistic).
  • Symbols:
    • O = observation/measurement of the DV
    • X = manipulation of the IV (exposure to treatment)
    • R = random assignment
    • E = experimental effect
  • Simple experimental patterns:
    • After-Only: X O1
    • One-Group Before–After: O1 X O2
    • Before–After with Control:
      • Experimental: O1 X O2
      • Control: O3 O4, with effect E = (O2 − O1) − (O4 − O3)
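The before–after-with-control effect formula can be worked through numerically. The scores below are hypothetical illustrative numbers, not taken from the notes:

```python
# Hypothetical before/after scores for the two groups
O1, O2 = 60.0, 72.0   # experimental group: before and after treatment X
O3, O4 = 61.0, 64.0   # control group: before and after (no treatment)

# Net experimental effect: change in the treated group minus the
# change that occurred anyway in the control group.
E = (O2 - O1) - (O4 - O3)
print(E)  # 9.0
```

Subtracting the control group's change removes trends (maturation, history) that would affect both groups regardless of treatment.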

Internal vs External Validity

  • Internal – observed DV change really due to IV.
  • External – results generalise to real-world.

R. A. Fisher’s Three Principles of Experimental Design

  1. Replication – repeat treatments to estimate error & increase accuracy.
  2. Randomisation – assign treatments randomly to protect against extraneous influences.
  3. Local Control (Blocking) – deliberately vary known nuisance factors so their variation can be measured & removed.

Informal Experimental Designs (no specific statistical layout)

  • Before-After without Control.
  • After-Only with Control.
  • Before-After with Control.

Formal Experimental Designs

  1. Completely Randomised Design (CRD) – uses replication & randomisation; analysed by one-way ANOVA.
  2. Randomised Block Design (RBD) – adds blocking (local control); analysed by two-way ANOVA.
  3. Latin Square (LS) Design – controls two blocking factors (rows & columns); each treatment appears once per row & column.
  4. Factorial Designs – study two or more factors simultaneously.
    • Simple 2 × 2 factorial.
    • Multifactor (e.g., 2 × 2 × 2) – cells labelled Cell 1 … Cell 8.
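The one-way ANOVA used to analyse a CRD can be sketched with a stdlib-only F-ratio computation; the treatment data in the usage line are made up for illustration:

```python
def one_way_anova_f(groups):
    """F ratio for a completely randomised design (one-way ANOVA).

    groups: list of lists of observations, one list per treatment.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-treatments sum of squares (k - 1 degrees of freedom)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-treatments (error) sum of squares (n_total - k degrees of freedom)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Three hypothetical treatments, four replicates each
f = one_way_anova_f([[20, 21, 23, 22], [25, 27, 26, 28], [20, 19, 22, 21]])
```

A large F (relative to the F distribution with k − 1 and n − k degrees of freedom) indicates that between-treatment variation exceeds what random error alone would produce.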

Sampling Design: Definitions & Options

  • Sample Design – definite plan for obtaining sample from a population; specifies selection technique.
  • Census – complete enumeration of every element (e.g., Indian Census every 10 yrs).
    • Advantages: intensive detail; high accuracy.
    • Disadvantages: cost, time, impractical for large populations.
  • Sample Survey – study subset; conclusions generalised to population.
    • Advantages: economical, quicker, indispensable, checks on census.

The Sampling Design Process

  1. Define the population (geography, demographics, usage, awareness, etc.).
  2. Choose / construct the sampling frame (list of elements).
  3. Select sampling technique(s) – probability or non-probability.
  4. Determine sample size.
  5. Execute sampling & ensure adherence to plan.

Criteria for Selecting a Sampling Procedure

  • Balance two costs:
    1. Data-collection cost.
    2. Cost of incorrect inference.
  • Consider potential systematic bias & sampling error.
  • Major sources of systematic bias:
    • Inappropriate frame, defective measuring device, non-response, observation effects, natural reporting bias.
  • Sampling error is random; it falls in proportion to 1/√n.
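The 1/√n relationship means quadrupling the sample size only halves the sampling error. A quick sketch, using an assumed population standard deviation:

```python
import math

sigma = 10.0  # assumed population standard deviation (illustrative)
# Standard error of the mean falls as 1 / sqrt(n)
errors = {n: sigma / math.sqrt(n) for n in (100, 400, 1600)}
print(errors)  # {100: 1.0, 400: 0.5, 1600: 0.25}
```

This diminishing return is why the cost of data collection must be balanced against the cost of incorrect inference rather than simply maximising n.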

Characteristics of a Good Sample Design

  • Produces truly representative sample.
  • Small sampling error.
  • Viable within budget & logistics.
  • Controls systematic bias effectively.
  • Allows results to generalise to population with reasonable confidence.

Classification of Sampling Techniques

Probability Sampling (each element known, non-zero chance)

  • Simple Random Sampling (SRS)
    • Probability of selection = n/N, where N = population size and n = sample size.
    • Procedure: number the elements 1…N, then generate n random numbers.
  • Systematic Sampling
    • Skip interval k = N/n; pick a random start r ∈ [1, k], then select elements r, r + k, r + 2k, …
  • Stratified Sampling
    • Divide population into homogeneous strata; draw SRS within each.
    • Sample size per stratum n_h may be proportionate (n_h = n × N_h / N) or disproportionate.
  • Cluster Sampling
    • Divide into heterogeneous clusters; randomly select clusters, then either take all units (one-stage) or sample within (two-stage/multistage).
    • Area Sampling – clusters are geographic areas (blocks, districts, etc.).
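The first three probability techniques can be sketched with the stdlib `random` module. The population, strata labels, and sizes below are assumptions for illustration:

```python
import random

random.seed(42)  # reproducible draws
population = list(range(1, 101))   # elements numbered 1..N, here N = 100
N, n = len(population), 10

# Simple random sampling: each element has selection probability n/N
srs = random.sample(population, n)

# Systematic sampling: skip interval k = N/n, random start r in [1, k]
k = N // n
r = random.randint(1, k)
systematic = population[r - 1::k]

# Stratified (proportionate): SRS of n_h = n * N_h / N within each stratum
strata = {"urban": population[:40], "rural": population[40:]}  # assumed strata
stratified = []
for units in strata.values():
    n_h = round(n * len(units) / N)
    stratified.extend(random.sample(units, n_h))
```

Note how the stratified draw guarantees 4 urban and 6 rural units, mirroring the 40/60 population split, whereas SRS could by chance over-represent either stratum.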
Stratified vs Cluster (Quick Contrast)

| Feature | Stratified | Cluster |
| --- | --- | --- |
| Subdivision | Few strata, many elements each | Many clusters, few elements each |
| Within group | Homogeneous | Heterogeneous |
| Between groups | Heterogeneous | Homogeneous |
| Selection | Sample elements within each stratum | Sample whole clusters |
| Goal | Increase precision (↓ error) | Increase efficiency (↓ cost) |

Non-Probability Sampling (non-random selection)

  • Convenience – easy access.
  • Judgment (Purposive) – researcher selects what seems representative.
  • Quota – sample mirrors population on selected control characteristics (gender, age, etc.). Example calculation for 3,000 respondents with sex/age/education proportions.
  • Snowball – initial respondents refer others; useful for rare traits.
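A quota allocation proportions the sample to the population on the chosen control characteristics. The notes mention a 3,000-respondent example; the sex proportions below are assumed for illustration, not taken from the notes:

```python
# Hypothetical quota allocation for 3,000 respondents on one control
# characteristic (sex); the shares 0.52/0.48 are assumed values.
total = 3000
sex_shares = {"male": 0.52, "female": 0.48}
quotas = {group: round(total * share) for group, share in sex_shares.items()}
print(quotas)  # {'male': 1560, 'female': 1440}
```

Crossing further characteristics (age, education) multiplies the shares per cell the same way, which is why quota samples mirror the population on those controls while remaining non-random within each cell.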
Choosing Probability vs Non-Probability

| Factor | Favors Non-Probability | Favors Probability |
| --- | --- | --- |
| Research nature | Exploratory | Conclusive |
| Sampling vs non-sampling error | Non-sampling bigger | Sampling bigger |
| Population variability | Homogeneous | Heterogeneous |
| Statistical analysis | Unfavorable | Favorable |
| Operational ease/cost | Favorable | Unfavorable |

Strengths & Weaknesses Summary (Selected Techniques)

  • Probability → results projectable, sampling error computable; but higher cost & time.
  • Non-probability → cheaper, quicker; but unknown sampling error, limited generalisability.

Sampling & Non-Sampling Errors

  • Sampling Error – due to studying a sample, not the whole population; SE ↓ when n ↑.
  • Non-Sampling Error – any other error (coverage, non-response, measurement, processing).
    • Group A: Preparation errors (e.g., inadequate frame).
    • Group B: Data-collection errors (interviewer bias, respondent misreporting).
    • Group C: Processing errors (editing, coding, analysis mistakes).

Graphical relation: as sample size increases, sampling error ↓ but non-sampling error may ↑ after a point.

Sample Size Determination

Why Determine Sample Size?

  • Ensure sufficient power/precision without wasting resources.
  • Too small → study cannot detect true effects; too large → unnecessary cost, may lose accuracy via non-sampling errors.

Statistical Concepts

  • Random Error ↔ precision (reliability); reduced by larger n.
  • Systematic Error (Bias) ↔ accuracy (validity); reduced by better design.
  • Null (H0) vs Alternative (H1) Hypothesis
    • Type I Error (α) – rejecting a true H0; commonly α = 0.05.
    • Type II Error (β) – failing to reject a false H0.
    • Power = 1 − β – probability of detecting a true effect.
  • Effect Size – magnitude of difference/association to be detected.

Basic Formula: Large Populations (>10,000)

n = Z² × p × q / d²

  • Z = z-value for the desired confidence level (1.96 for 95%).
  • p = expected proportion possessing the attribute; if unknown, use 0.5.
  • q = 1 − p.
  • d = acceptable margin of error (precision), e.g., 0.05.

Example: p = 0.5, Z = 1.96, d = 0.05 ⇒ n = 384.
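The large-population formula can be checked directly (a minimal sketch; the function name is just for illustration):

```python
def sample_size(p, d, z=1.96):
    """n = Z^2 * p * q / d^2, for large populations (N > 10,000)."""
    q = 1 - p
    return z ** 2 * p * q / d ** 2

# p = 0.5, Z = 1.96, d = 0.05 gives 384.16, conventionally rounded to 384
print(round(sample_size(p=0.5, d=0.05)))  # 384
```

Using p = 0.5 maximises p × q and therefore gives the most conservative (largest) n when the true proportion is unknown.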

Finite Population Correction (<10,000)

n_f = n / (1 + n/N)

  • where N = population size and n = the uncorrected sample size.

Example: n = 400, N = 1,000 ⇒ n_f = 400 / 1.4 ≈ 286.
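The finite population correction in code (a minimal sketch; the function name is illustrative):

```python
def corrected_size(n, N):
    """Finite population correction: n_f = n / (1 + n/N)."""
    return n / (1 + n / N)

# n = 400, N = 1,000 gives 400 / 1.4 = 285.7..., i.e. about 286
print(round(corrected_size(400, 1000)))  # 286
```

The correction matters most when n is a large fraction of N; for N much larger than n, n_f ≈ n and the adjustment is negligible.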

Comparing Two Equal Groups (Proportions)

n′ = 2 × Z² × p × q / d²

  • If expecting p = 0.40 and wanting to detect a difference of d = 0.10 at 95% confidence:
    n′ = (2 × 1.96² × 0.4 × 0.6) / 0.1² ≈ 184 per group.
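The two-group worked example, verified in code (a minimal sketch; the function name is illustrative):

```python
def per_group_size(p, d, z=1.96):
    """n' = 2 * Z^2 * p * q / d^2, sample size required per group."""
    q = 1 - p
    return 2 * z ** 2 * p * q / d ** 2

# p = 0.40, d = 0.10 at 95% confidence gives about 184 per group
print(round(per_group_size(p=0.40, d=0.10)))  # 184
```

Note the factor of 2 relative to the single-sample formula: each of the two groups contributes its own sampling variance to the comparison.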

Relation of Sample Size, Error & Power

  • n ∝ (Z_α + Z_β)² / (effect size)² (generic principle).
  • Larger n → higher power and narrower confidence intervals.

Practical Tools for Sample Size & Power

  • Ready-made tables (e.g., incidence rate table with relative precision).
  • Nomograms (graphical; need control % & desired % change).
  • Software: Epi-Info, nQuery, Power & Precision, Sample, STATA, SPSS.

Summary & Practical Implications

  • A research or sampling design is the strategic plan that binds objectives, data needs, collection methods, analysis, cost & time.
  • Good design and sampling choices minimise both bias & variance, ensuring results are valid, reliable, precise and economical.
  • Understanding experimental layouts, probability vs non-probability sampling, and error structures is essential before fieldwork begins.
  • Sample-size calculations anchor the study’s statistical validity; they hinge on effect size, desired confidence, allowable error, power, and population size.
  • Ethical & practical stakes: over- or under-sizing wastes resources, compromises findings, or burdens participants unnecessarily.