Research & Sampling Design – Comprehensive Study Notes

Meaning and Scope of Research Design

  • "Research Design" = overall blueprint for a study.
    • Arranges the “what, where, when, how much & by what means” of data collection/analysis.
    • Formal definition: “arrangement of conditions for collection & analysis of data in a manner that combines relevance to research purpose with economy in procedure.”
  • Covers researcher actions from framing the hypothesis → operationalising variables → collecting & analysing data → writing final report.

Typical Design‐Decision Questions

  • What is the study about & why is it being done?
  • Where & when will it be carried out (time horizon, geographic setting)?
  • What data are required & where can they be found?
  • What period(s) will be covered?
  • What sample design will be followed?
  • Which data-collection techniques will be used?
  • How will data be analysed? In what reporting style?

Four Sub-Designs Contained in the Master Design

  1. Sampling design – procedure for selecting study elements.
  2. Observational (data) design – conditions under which observations are made (who, when, where, with what instrument).
  3. Statistical design – how many observations, which analyses.
  4. Operational design – field and administrative procedures that implement the other three.

Need / Importance of Research Design

  • Ensures smooth, efficient conduct → maximises information while minimising time, effort & cost.
  • Provides firm foundation for reliability & validity of results.
  • Serves as architectural “blue-print,” analogous to a house plan before construction.
  • Specific benefits:
    • Reduces inaccuracy & bias; improves efficiency & reliability.
    • Minimises waste of resources; guides resource allocation.
    • Clarifies requirements for hypothesis testing & data needs.
    • Communicates overview to other experts; keeps project on course.

Features of a Good Design

  • Flexible, appropriate, efficient, economical.
  • Minimises bias & experimental error; maximises reliability.
  • Generates maximal information; allows examination of multiple aspects of problem.
  • Suitability is context-specific: one design rarely fits all problems.

Criteria normally considered:

  • Means of obtaining information.
  • Skill/availability of researcher & staff.
  • Objectives & nature of problem.
  • Time & money constraints.

Key Concepts in Research Design

  • Variable – measurable concept; can be continuous (age, income) or discrete (number of children).
  • Independent Variable (IV) – antecedent; manipulated/predictor.
  • Dependent Variable (DV) – outcome/consequence affected by IV.
    • Example: Age → Height (Age = IV, Height = DV).
  • Extraneous Variable – IV not related to study purpose but influences DV → introduces experimental error.
  • Control – design strategies to minimise extraneous influence.
  • Confounded Relationship – DV is influenced by extraneous variables, so the IV–DV relation is clouded.
  • Research Hypothesis – predictive statement linking at least one IV to one DV; tested via scientific method (e.g., “e-Learning enhances teaching–learning experience”).
  • Hypothesis-Testing Research
    • Experimental: IVs actively manipulated.
    • Non-experimental: IVs not manipulated (ex-post-facto, survey, etc.).
  • Experimental & Control Groups – experimental receives novel treatment; control receives usual conditions.
  • Treatments – specific conditions applied to groups.
  • Experiment – process of testing statistical hypothesis.
    • Absolute vs comparative experiments.
  • Experimental Unit – basic plot/subject on which treatment is applied.

Traditional Categories of Research Design

| Objective | Appropriate Design | Typical Key Words |
| --- | --- | --- |
| Gain background, define terms, clarify problems, set priorities | Exploratory | Qualitative, flexible, informal |
| Describe who, what, where, when, how | Descriptive / Diagnostic | Cross-sectional, longitudinal, survey |
| Determine causality, test “if-then”, evaluate effects | Causal | Experiment, manipulation, control |

Exploratory Research

  • Unstructured, informal; undertaken when little is known.
  • Uses: secondary data analysis, experience surveys, case analysis, focus groups, projective techniques.

Descriptive Research

  • Answers who/what/where/when/how (not why).
  • Cross-sectional studies → one-time measurement; sample surveys often online.
  • Longitudinal studies → repeated measures; panels or repeated independent samples.

Causal Research (Experiments)

  • Explains phenomena via conditional statements “If X then Y.”
  • Relies on manipulation of IVs & control of extraneous variables.
  • Two settings: Laboratory (high control, artificial) vs Field (natural, realistic).
  • Symbols:
    • O = observation/measurement of the DV
    • X = manipulation of the IV (exposure to treatment)
    • R = random assignment
    • E = experimental effect
  • Simple experimental patterns:
    • After-Only: X O1
    • One-Group Before–After: O1 X O2
    • Before–After with Control:
      • Experimental: O1 X O2
      • Control: O3 O4, with effect E = (O2 − O1) − (O4 − O3)
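The before–after-with-control effect formula can be worked through numerically. The scores below are hypothetical illustrative numbers, not taken from the notes:

```python
# Hypothetical before/after scores for the two groups
O1, O2 = 60.0, 72.0   # experimental group: before and after treatment X
O3, O4 = 61.0, 64.0   # control group: before and after (no treatment)

# Net experimental effect: change in the treated group minus the
# change that occurred anyway in the control group.
E = (O2 - O1) - (O4 - O3)
print(E)  # 9.0
```

Subtracting the control group's change removes trends (maturation, history) that would affect both groups regardless of treatment.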

Internal vs External Validity

  • Internal – observed DV change really due to IV.
  • External – results generalise to real-world.

R. A. Fisher’s Three Principles of Experimental Design

  1. Replication – repeat treatments to estimate error & increase accuracy.
  2. Randomisation – assign treatments randomly to protect against extraneous influences.
  3. Local Control (Blocking) – deliberately vary known nuisance factors so their variation can be measured & removed.

Informal Experimental Designs (no specific statistical layout)

  • Before-After without Control.
  • After-Only with Control.
  • Before-After with Control.

Formal Experimental Designs

  1. Completely Randomised Design (CRD) – uses replication & randomisation; analysed by one-way ANOVA.
  2. Randomised Block Design (RBD) – adds blocking (local control); analysed by two-way ANOVA.
  3. Latin Square (LS) Design – controls two blocking factors (rows & columns); each treatment appears once per row & column.
  4. Factorial Designs – study two or more factors simultaneously.
    • Simple 2 × 2 factorial.
    • Multifactor (e.g., 2 × 2 × 2) – cells labelled Cell 1 … Cell 8.
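The one-way ANOVA used to analyse a CRD can be sketched with a stdlib-only F-ratio computation; the treatment data in the usage line are made up for illustration:

```python
def one_way_anova_f(groups):
    """F ratio for a completely randomised design (one-way ANOVA).

    groups: list of lists of observations, one list per treatment.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-treatments sum of squares (k - 1 degrees of freedom)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-treatments (error) sum of squares (n_total - k degrees of freedom)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Three hypothetical treatments, four replicates each
f = one_way_anova_f([[20, 21, 23, 22], [25, 27, 26, 28], [20, 19, 22, 21]])
```

A large F (relative to the F distribution with k − 1 and n − k degrees of freedom) indicates that between-treatment variation exceeds what random error alone would produce.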

Sampling Design: Definitions & Options

  • Sample Design – definite plan for obtaining sample from a population; specifies selection technique.
  • Census – complete enumeration of every element (e.g., Indian Census every 10 yrs).
    • Advantages: intensive detail; high accuracy.
    • Disadvantages: cost, time, impractical for large populations.
  • Sample Survey – study subset; conclusions generalised to population.
    • Advantages: economical, quicker, indispensable, checks on census.

The Sampling Design Process

  1. Define the population (geography, demographics, usage, awareness, etc.).
  2. Choose / construct the sampling frame (list of elements).
  3. Select sampling technique(s) – probability or non-probability.
  4. Determine sample size.
  5. Execute sampling & ensure adherence to plan.

Criteria for Selecting a Sampling Procedure

  • Balance two costs:
    1. Data-collection cost.
    2. Cost of incorrect inference.
  • Consider potential systematic bias & sampling error.
  • Major sources of systematic bias:
    • Inappropriate frame, defective measuring device, non-response, observation effects, natural reporting bias.
  • Sampling error is random; it falls in proportion to 1/√n.
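The 1/√n relationship means quadrupling the sample size only halves the sampling error. A quick sketch, using an assumed population standard deviation:

```python
import math

sigma = 10.0  # assumed population standard deviation (illustrative)
# Standard error of the mean falls as 1 / sqrt(n)
errors = {n: sigma / math.sqrt(n) for n in (100, 400, 1600)}
print(errors)  # {100: 1.0, 400: 0.5, 1600: 0.25}
```

This diminishing return is why the cost of data collection must be balanced against the cost of incorrect inference rather than simply maximising n.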

Characteristics of a Good Sample Design

  • Produces truly representative sample.
  • Small sampling error.
  • Viable within budget & logistics.
  • Controls systematic bias effectively.
  • Allows results to generalise to population with reasonable confidence.

Classification of Sampling Techniques

Probability Sampling (each element known, non-zero chance)

  • Simple Random Sampling (SRS)
    • Probability of selection = n/N, where N = population size and n = sample size.
    • Procedure: number the elements 1…N, then generate n random numbers.
  • Systematic Sampling
    • Skip interval k = N/n; pick a random start r ∈ [1, k], then select elements r, r + k, r + 2k, …
  • Stratified Sampling
    • Divide population into homogeneous strata; draw SRS within each.
    • Sample size per stratum n_h may be proportionate (n_h = n × N_h / N) or disproportionate.
  • Cluster Sampling
    • Divide into heterogeneous clusters; randomly select clusters, then either take all units (one-stage) or sample within (two-stage/multistage).
    • Area Sampling – clusters are geographic areas (blocks, districts, etc.).
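The first three probability techniques can be sketched with the stdlib `random` module. The population, strata labels, and sizes below are assumptions for illustration:

```python
import random

random.seed(42)  # reproducible draws
population = list(range(1, 101))   # elements numbered 1..N, here N = 100
N, n = len(population), 10

# Simple random sampling: each element has selection probability n/N
srs = random.sample(population, n)

# Systematic sampling: skip interval k = N/n, random start r in [1, k]
k = N // n
r = random.randint(1, k)
systematic = population[r - 1::k]

# Stratified (proportionate): SRS of n_h = n * N_h / N within each stratum
strata = {"urban": population[:40], "rural": population[40:]}  # assumed strata
stratified = []
for units in strata.values():
    n_h = round(n * len(units) / N)
    stratified.extend(random.sample(units, n_h))
```

Note how the stratified draw guarantees 4 urban and 6 rural units, mirroring the 40/60 population split, whereas SRS could by chance over-represent either stratum.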
Stratified vs Cluster (Quick Contrast)

| Feature | Stratified | Cluster |
| --- | --- | --- |
| Subdivision | Few strata, many elements each | Many clusters, few elements each |
| Within group | Homogeneous | Heterogeneous |
| Between groups | Heterogeneous | Homogeneous |
| Selection | Sample elements within each stratum | Sample whole clusters |
| Goal | Increase precision (↓ error) | Increase efficiency (↓ cost) |

Non-Probability Sampling (non-random selection)

  • Convenience – easy access.
  • Judgment (Purposive) – researcher selects what seems representative.
  • Quota – sample mirrors population on selected control characteristics (gender, age, etc.). Example calculation for 3,000 respondents with sex/age/education proportions.
  • Snowball – initial respondents refer others; useful for rare traits.
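A quota allocation proportions the sample to the population on the chosen control characteristics. The notes mention a 3,000-respondent example; the sex proportions below are assumed for illustration, not taken from the notes:

```python
# Hypothetical quota allocation for 3,000 respondents on one control
# characteristic (sex); the shares 0.52/0.48 are assumed values.
total = 3000
sex_shares = {"male": 0.52, "female": 0.48}
quotas = {group: round(total * share) for group, share in sex_shares.items()}
print(quotas)  # {'male': 1560, 'female': 1440}
```

Crossing further characteristics (age, education) multiplies the shares per cell the same way, which is why quota samples mirror the population on those controls while remaining non-random within each cell.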
Choosing Probability vs Non-Probability

| Factor | Favors Non-Probability | Favors Probability |
| --- | --- | --- |
| Research nature | Exploratory | Conclusive |
| Sampling vs non-sampling error | Non-sampling bigger | Sampling bigger |
| Population variability | Homogeneous | Heterogeneous |
| Statistical analysis | Unfavorable | Favorable |
| Operational ease/cost | Favorable | Unfavorable |

Strengths & Weaknesses Summary (Selected Techniques)

  • Probability → results projectable, sampling error computable; but higher cost & time.
  • Non-probability → cheaper, quicker; but unknown sampling error, limited generalisability.

Sampling & Non-Sampling Errors

  • Sampling Error – due to studying a sample, not the whole population; SE ↓ when n ↑.
  • Non-Sampling Error – any other error (coverage, non-response, measurement, processing).
    • Group A: Preparation errors (e.g., inadequate frame).
    • Group B: Data-collection errors (interviewer bias, respondent misreporting).
    • Group C: Processing errors (editing, coding, analysis mistakes).

Graphical relation: as sample size increases, sampling error ↓ but non-sampling error may ↑ after a point.

Sample Size Determination

Why Determine Sample Size?

  • Ensure sufficient power/precision without wasting resources.
  • Too small → study cannot detect true effects; too large → unnecessary cost, may lose accuracy via non-sampling errors.

Statistical Concepts

  • Random Error ↔ precision (reliability); reduced by larger n.
  • Systematic Error (Bias) ↔ accuracy (validity); reduced by better design.
  • Null (H0) vs Alternative (H1) Hypothesis
    • Type I Error (α) – rejecting a true H0; commonly α = 0.05.
    • Type II Error (β) – failing to reject a false H0.
    • Power = 1 − β – probability of detecting a true effect.
  • Effect Size – magnitude of difference/association to be detected.

Basic Formula: Large Populations (>10,000)

n = Z² × p × q / d²

  • Z = z-value for the desired confidence level (1.96 for 95%).
  • p = expected proportion possessing the attribute; if unknown, use 0.5.
  • q = 1 − p.
  • d = acceptable margin of error (precision), e.g., 0.05.

Example: p = 0.5, Z = 1.96, d = 0.05 ⇒ n = 384.
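The large-population formula can be checked directly (a minimal sketch; the function name is just for illustration):

```python
def sample_size(p, d, z=1.96):
    """n = Z^2 * p * q / d^2, for large populations (N > 10,000)."""
    q = 1 - p
    return z ** 2 * p * q / d ** 2

# p = 0.5, Z = 1.96, d = 0.05 gives 384.16, conventionally rounded to 384
print(round(sample_size(p=0.5, d=0.05)))  # 384
```

Using p = 0.5 maximises p × q and therefore gives the most conservative (largest) n when the true proportion is unknown.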

Finite Population Correction (<10,000)

n_f = n / (1 + n/N)

  • where N = population size and n = the uncorrected sample size.

Example: n = 400, N = 1,000 ⇒ n_f = 400 / 1.4 ≈ 286.
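The finite population correction in code (a minimal sketch; the function name is illustrative):

```python
def corrected_size(n, N):
    """Finite population correction: n_f = n / (1 + n/N)."""
    return n / (1 + n / N)

# n = 400, N = 1,000 gives 400 / 1.4 = 285.7..., i.e. about 286
print(round(corrected_size(400, 1000)))  # 286
```

The correction matters most when n is a large fraction of N; for N much larger than n, n_f ≈ n and the adjustment is negligible.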

Comparing Two Equal Groups (Proportions)

n′ = 2 × Z² × p × q / d²

  • If expecting p = 0.40 and wanting to detect a difference of d = 0.10 at 95% confidence:
    n′ = (2 × 1.96² × 0.4 × 0.6) / 0.1² ≈ 184 per group.
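The two-group worked example, verified in code (a minimal sketch; the function name is illustrative):

```python
def per_group_size(p, d, z=1.96):
    """n' = 2 * Z^2 * p * q / d^2, sample size required per group."""
    q = 1 - p
    return 2 * z ** 2 * p * q / d ** 2

# p = 0.40, d = 0.10 at 95% confidence gives about 184 per group
print(round(per_group_size(p=0.40, d=0.10)))  # 184
```

Note the factor of 2 relative to the single-sample formula: each of the two groups contributes its own sampling variance to the comparison.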

Relation of Sample Size, Error & Power

  • n ∝ (Z_α + Z_β)² / (effect size)² (generic principle).
  • Larger n → higher power and narrower confidence intervals.

Practical Tools for Sample Size & Power

  • Ready-made tables (e.g., incidence rate table with relative precision).
  • Nomograms (graphical; need control % & desired % change).
  • Software: Epi-Info, nQuery, Power & Precision, Sample, STATA, SPSS.

Summary & Practical Implications

  • A research or sampling design is the strategic plan that binds objectives, data needs, collection methods, analysis, cost & time.
  • Good design and sampling choices minimise both bias & variance, ensuring results are valid, reliable, precise and economical.
  • Understanding experimental layouts, probability vs non-probability sampling, and error structures is essential before fieldwork begins.
  • Sample-size calculations anchor the study’s statistical validity; they hinge on effect size, desired confidence, allowable error, power, and population size.
  • Ethical & practical stakes: over- or under-sizing wastes resources, compromises findings, or burdens participants unnecessarily.