Comprehensive Study Notes on Sample and Research Design

Introduction to Sampling and Census Inquiry

All items in any field of inquiry constitute a "universe" or a "population."
A census inquiry is defined as a complete enumeration of all items in the population. Characteristics of a census inquiry include:
- All items are covered, leaving no element of chance.
- It provides the highest level of accuracy.
- It involves a massive expenditure of time, money, and energy.
- It is difficult to adopt for large fields of inquiry due to resource requirements.
- It is often beyond the reach of ordinary researchers and is typically reserved for government institutions (e.g., the national population census carried out once every decade).
A sample is a part of the total population that is selected and studied to obtain sufficiently accurate results about the whole.
The sampling technique is the specific process of selecting these samples.
A sample must be a representative "miniature cross-section" of the total population.
A sample design is a pre-determined plan for selecting a sample and deciding the sample size before any data collection begins.

The Necessity and Advantages of Sampling

Resource Limitations: Constraints in time, finance, and manpower make studying an entire population difficult.
Destructive Testing: In scenarios like testing the breaking strength of materials, the test subjects must be destroyed. A census would result in the complete destruction of all materials; therefore, sampling is mandatory.
Speed: Sampling provides much quicker results than a census. This matters most when information must be available soon after the need for it is recognized.
Infinite Populations: Sampling is the only feasible process if the population is infinite.
Quality of Study: Some argue that sample-based studies are of higher quality than census studies. This is attributed to the possibility of better interviewing techniques, more thorough investigations of missing or suspicious information, better supervision, and superior data processing compared to the logistics of total coverage.

Fundamental Steps in Developing a Sample Design

Type of Universe: The researcher must define if the universe is finite (certain number of items, e.g., city population, factory workers) or infinite (uncountable items, e.g., listeners of a radio program, stars in the sky).
Source List (Sampling Frame): This is the list from which the sample is drawn. For a finite universe, it contains names of all items. It must be comprehensive, correct, reliable, appropriate, and representative.
Size of Sample: This refers to the total number of items selected. It should be "optimum", that is, neither excessively large nor too small. Factors influencing size include:
- Population variance.
- Population size.
- The parameter of interest.
- Budgetary constraints.
Parameters of Interest: The specific population parameters that are the focus of the research must be identified.
Budgetary Constraint: Practical cost considerations impact both the size and the type of sample, sometimes necessitating non-probability samples.
Sampling Procedure: The researcher must choose a technique that, for a given size and cost, offers the smallest sampling error.
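To show how population variance and population size affect sample size, the sketch below uses the common normal-approximation formula for estimating a mean, $n_0 = z^2\sigma^2/e^2$, followed by the finite population correction $n = n_0 / (1 + (n_0 - 1)/N)$. The formula, the function name `sample_size_for_mean`, and the input values are assumptions for illustration. They are not taken from these notes.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95, population_size=None):
    """Sample size needed to estimate a population mean within +/- margin.

    Assumes a normal approximation and a known (or pilot-estimated) population
    standard deviation sigma. population_size=None treats the universe as infinite.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n0 = (z * sigma / margin) ** 2                       # infinite-population size
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)       # finite population correction
    return math.ceil(n0)

print(sample_size_for_mean(sigma=10, margin=2))                       # 97
print(sample_size_for_mean(sigma=10, margin=2, population_size=500))  # 81
print(sample_size_for_mean(sigma=20, margin=2))                       # 385
```

Doubling the population standard deviation roughly quadruples the required size, while a small finite universe reduces it. This reflects the factors listed above. Budget then sets a practical upper limit on whatever the formula suggests.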

Errors and Inaccuracy in Sampling: Systematic Bias and Sampling Error

Sampling analysis involves two costs: the cost of data collection and the cost of incorrect inference.
Systematic Bias: This results from errors in sampling procedures and cannot be reduced by increasing sample size. Causes include:
- Inappropriate Sampling Frame: A biased representation of the universe.
- Defective Measuring Device: Biased questionnaires, biased interviewers, or faulty physical measuring tools.
- Non-respondents: Failure to obtain data from all individuals initially included in the sample.
- Indeterminacy Principle: Individuals acting differently when being observed (e.g., workers slowing down during a work study to influence piece-work quotas).
- Natural Bias in Reporting: Discrepancies such as downward bias in income reported to tax departments versus upward bias in reporting to social organizations.
Sampling Errors: Random variations in sample estimates around true population parameters.
- They are compensatory (expected value is zero) because they are equally likely to be positive or negative.
- Error magnitude decreases as sample size increases and is smaller in homogeneous populations.
- Precision of the sampling plan is the measurement of sampling error. Precision is improved by increasing sample size (to a limit) or, more effectively, by selecting a better sampling design.
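The contrast between systematic bias and sampling error can be illustrated with a small simulation. It draws repeated samples from a hypothetical normal population (mean 50, standard deviation 10, both assumed values). A constant offset `bias` stands in for a defective measuring device. The spread of the sample means, which is the sampling error, should shrink as the sample grows. The average error introduced by the bias should not.

```python
import random
import statistics

TRUE_MEAN, TRUE_SD = 50.0, 10.0  # hypothetical population parameters

def estimate_errors(n, bias=0.0, reps=500, seed=0):
    """Draw `reps` samples of size n; return (average error, spread) of the sample mean.

    `bias` is a constant offset added to every observation, standing in for a
    systematic error such as a miscalibrated measuring device.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(TRUE_MEAN, TRUE_SD) + bias for _ in range(n)]
        estimates.append(statistics.fmean(sample))
    avg_error = statistics.fmean(estimates) - TRUE_MEAN  # systematic component
    spread = statistics.stdev(estimates)                 # sampling error (precision)
    return avg_error, spread

for n in (10, 100, 1000):
    unbiased = estimate_errors(n)
    biased = estimate_errors(n, bias=2.0)
    print(f"n={n:4d}  unbiased error={unbiased[0]:+.2f}  "
          f"biased error={biased[0]:+.2f}  spread={biased[1]:.2f}")
```

The spread should fall roughly as $10/\sqrt{n}$. The biased error should stay close to +2 for every $n$, because a larger sample cannot remove systematic bias.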

Characteristics of an Effective Sample Design

It must result in a truly representative sample.
It must result in a small sampling error.
It must be viable within the available funds.
It must enable better control of systematic bias.
It must allow results to be applied to the universe with a reasonable level of confidence.

Probability Sampling Techniques

Probability sampling (random or chance sampling) is based on the concept of random selection.
Simple Random Sampling: Every unit in the universe has an equal and independent chance of being included.
- Requires a complete, up-to-date sampling frame.
- Most reliable for homogeneous populations.
- Defined as a sample where each of the $\binom{N}{n}$ possible samples has the same probability, $\frac{1}{\binom{N}{n}}$, of being selected.
- Example: If $N=4$ (elements a, b, c, d) and sample size $n=2$, there are $\binom{4}{2} = 6$ possible samples (ab, ac, ad, bc, bd, cd), each with a $\frac{1}{6}$ probability.
- Methods: Lottery method or random number tables.
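- Hypothetical Lottery Logic: When drawing 2 elements from 4, consider a particular sample such as ab. The probability that the first draw yields one of its two elements (a or b) is $\frac{2}{4}$. The probability that the second draw then yields the remaining element is $\frac{1}{3}$. The joint probability is $\frac{2}{4} \times \frac{1}{3} = \frac{1}{6}$, which matches the definition above.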
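The $N=4$, $n=2$ example can be checked in a few lines. The sketch enumerates every possible sample, compares $1/\binom{N}{n}$ with the lottery product, and then uses Python's `random.sample` (draws without replacement) as a stand-in for the lottery method. The seed and number of draws are arbitrary choices made for illustration.

```python
import itertools
import random
from collections import Counter
from fractions import Fraction
from math import comb

population = ["a", "b", "c", "d"]  # N = 4
n = 2

# All possible samples of size n, ignoring order: C(4, 2) = 6
samples = list(itertools.combinations(population, n))
print(len(samples), comb(len(population), n))  # 6 6

# Each sample has probability 1 / C(N, n) ...
p_each = Fraction(1, comb(len(population), n))
# ... which matches the lottery reasoning: 2/4 on the first draw, 1/3 on the second
p_lottery = Fraction(2, 4) * Fraction(1, 3)
print(p_each, p_lottery)  # 1/6 1/6

# Empirical check: repeated draws without replacement
rng = random.Random(42)
draws = 60_000
counts = Counter(tuple(sorted(rng.sample(population, n))) for _ in range(draws))
for s in samples:
    print("".join(s), round(counts[s] / draws, 3))  # each should be near 0.167
```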
Systematic Sampling: Selecting every $i^{th}$ item on a list.
- Skip Interval $(I) = \frac{\text{Population Size (P)}}{\text{Sample Size (S)}}$.
- Start point is a random number between 1 and $I$.
- Example: For a population of 100 and a sample of 20, $I = \frac{100}{20} = 5$. If the start is 2, the sample includes 2, 7, 12, 17, 22, …, 97.
- Advantage: Even spread over the population; low cost.
- Disadvantage: Can produce a skewed, inefficient sample if the list has a hidden periodicity that coincides with the skip interval.
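A minimal sketch of systematic selection follows. It assumes the population size is an exact multiple of the sample size, so $I$ is a whole number. It returns 1-based list positions. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Return 1-based list positions of a systematic sample.

    Assumes population_size is an exact multiple of sample_size, so the
    skip interval I = P / S is a whole number.
    """
    interval = population_size // sample_size  # skip interval I
    if start is None:
        start = rng.randint(1, interval)       # random start between 1 and I
    return [start + k * interval for k in range(sample_size)]

positions = systematic_sample(100, 20, start=2)
print(positions[:5], positions[-1])  # [2, 7, 12, 17, 22] 97
```

Consider a hypothetical list where every fifth entry shares some feature, such as every fifth house being a corner house. With this list, an interval of 5 would pick the same kind of unit every time, which is the periodicity problem described above.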
Stratified Sampling: Used for non-homogeneous populations. The population is divided into subpopulations (strata) that are internally homogeneous and externally heterogeneous.
- Forming Strata: Based on common characteristics, personal judgment, or pilot studies.
- Proportional Allocation: Sample size for stratum $i$ is found by $n \times P_i$ (where $P_i$ is the proportion of the population in that stratum).
- Example: $n=30$, $N=800$. Stratum 1 ($N_1=400$) sample is $30 \times \frac{400}{800} = 15$. Stratum 2 ($N_2=240$) sample is $30 \times \frac{240}{800} = 9$. Stratum 3 ($N_3=160$) sample is $30 \times \frac{160}{800} = 6$.
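Proportional allocation followed by simple random sampling within each stratum can be sketched as below. The stratum labels and unit lists are hypothetical stand-ins for the three strata in the example. Rounding to whole units is an added assumption. In general the rounded sizes may not sum exactly to $n$ and need a manual adjustment, although here they divide evenly.

```python
import random

def proportional_allocation(n, stratum_sizes):
    """n_i = n * P_i with P_i = N_i / N, rounded to whole units."""
    N = sum(stratum_sizes)
    return [round(n * N_i / N) for N_i in stratum_sizes]

def stratified_sample(strata, n, rng=random):
    """Draw a simple random sample of the allocated size from each stratum.

    `strata` maps a (hypothetical) stratum label to its list of units.
    """
    sizes = proportional_allocation(n, [len(units) for units in strata.values()])
    return {label: rng.sample(units, k)
            for (label, units), k in zip(strata.items(), sizes)}

print(proportional_allocation(30, [400, 240, 160]))  # [15, 9, 6]

strata = {"stratum 1": list(range(400)),
          "stratum 2": list(range(400, 640)),
          "stratum 3": list(range(640, 800))}
sample = stratified_sample(strata, 30, rng=random.Random(7))
print({label: len(units) for label, units in sample.items()})
# {'stratum 1': 15, 'stratum 2': 9, 'stratum 3': 6}
```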
Cluster Sampling: Dividing the total area into smaller non-overlapping areas (clusters) and randomly selecting whole clusters to study.
- Advantage: Reduces cost by concentrating the survey.
- Disadvantage: For the same sample size, it is less precise than simple random sampling, because units within a cluster tend to resemble one another (internal cluster homogeneity).
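The key difference from stratified sampling is that whole clusters are chosen at random and then studied in full. The sketch below uses a hypothetical area of 10 blocks with 20 households each. The block and household labels are invented for illustration.

```python
import random

def cluster_sample(clusters, num_clusters, rng=random):
    """Randomly choose whole clusters and keep every unit in each chosen cluster.

    `clusters` maps a cluster id (e.g. a hypothetical city block) to its units.
    """
    chosen = rng.sample(sorted(clusters), num_clusters)  # random selection of clusters
    return {cid: list(clusters[cid]) for cid in chosen}   # each studied in toto

# Hypothetical area: 10 blocks with 20 households each
blocks = {f"block {b}": [f"hh {b}-{h}" for h in range(20)] for b in range(10)}
sample = cluster_sample(blocks, 3, rng=random.Random(3))
print(sorted(sample), sum(len(v) for v in sample.values()))  # three blocks, 60 households
```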

Comparison and Details of Probability Sampling Designs

Comparison of Stratified vs. Cluster Sampling:
- Stratification: Subgroups have many elements; focuses on within-subgroup homogeneity and between-subgroup heterogeneity; elements are randomly chosen from within each subgroup.
- Cluster: Subgroups have few elements; focuses on within-subgroup heterogeneity and between-subgroup homogeneity (though often the reverse occurs); entire subgroups are chosen randomly and studied in toto.
Comparison Table of Main Designs:
- Simple Random: Individually sampled; equal chance; Disadv: expensive, requires full listing.
- Systematic: Initial member selected determines the rest; Disadv: the sample can be skewed if the list has periodicity.
- Stratified: Sampled individually within subpopulations; all strata represented; Disadv: expensive to create.
- Cluster: Clusters selected; all members of selected clusters included; Disadv: lower statistical efficiency (more error).
Numerical Example Table (Selecting 8 stores from 320 stores in 10 cities by systematic selection over cumulative store totals, with an interval of 40 and start point 8; each city's expected number of selections is therefore proportional to its number of stores):
- City 1 (35 stores, Cum: 35): Sample 8
- City 2 (17 stores, Cum: 52): Sample 48
- City 3 (10 stores, Cum: 62): none
- City 4 (32 stores, Cum: 94): Sample 88
- City 5 (80 stores, Cum: 174): Sample 128, 168
- City 6 (18 stores, Cum: 192): none
- City 7 (26 stores, Cum: 218): Sample 208
- City 8 (19 stores, Cum: 237): none
- City 9 (26 stores, Cum: 263): Sample 248
- City 10 (57 stores, Cum: 320): Sample 288
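The table can be reproduced by generating the systematic positions (start 8, interval 40) and locating each position in the running cumulative totals. A position belongs to the first city whose cumulative total is at least that position. The use of `bisect` for this lookup is an implementation choice of this sketch.

```python
import bisect
from itertools import accumulate

store_counts = [35, 17, 10, 32, 80, 18, 26, 19, 26, 57]  # cities 1..10
cumulative = list(accumulate(store_counts))              # 35, 52, 62, ..., 320

interval, start, n = 40, 8, 8
positions = [start + k * interval for k in range(n)]     # 8, 48, ..., 288

selected = {}
for p in positions:
    city = bisect.bisect_left(cumulative, p) + 1         # first cumulative total >= p
    selected.setdefault(city, []).append(p)
print(selected)
# {1: [8], 2: [48], 4: [88], 5: [128, 168], 7: [208], 9: [248], 10: [288]}
```

City 5, with 80 stores (twice the interval), receives two selections. Cities whose store counts are small relative to the interval may receive none.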

Non-Probability Sampling Techniques

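In non-probability sampling, items are selected deliberately by the researcher's judgment rather than by chance. Because selection probabilities are unknown, sampling error cannot be estimated, and conclusions cannot be generalized to the population with a known level of confidence.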
Judgment (Purposive) Sampling: Researcher uses personal judgment to select experts or representative individuals.
Convenience (Accidental) Sampling: Selecting the most accessible population (e.g., friends or colleagues) to save time/resources.
Quota Sampling: Selecting a predetermined number of individuals from specific groups (age, gender, etc.). Useful for capturing rare characteristics.
Referral (Snowball) Sampling: Respondents provide names and addresses of other potential participants, who are then contacted in turn.
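Quota sampling can be sketched as filling fixed quotas from respondents in the order they become available. The respondent list, the quota sizes, and the `quota_sample` function are hypothetical. The result depends on arrival order rather than chance, which is what makes the method non-probability.

```python
def quota_sample(arrivals, quotas, group_of):
    """Accept respondents in arrival order until every group's quota is met."""
    filled = {group: [] for group in quotas}
    for person in arrivals:
        group = group_of(person)
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # all quotas met; later arrivals are never considered
    return filled

# Hypothetical respondents as (name, gender) pairs, in order of arrival
arrivals = [("p1", "F"), ("p2", "F"), ("p3", "F"), ("p4", "M"),
            ("p5", "F"), ("p6", "M"), ("p7", "M")]
result = quota_sample(arrivals, {"F": 2, "M": 2}, group_of=lambda p: p[1])
print({g: [name for name, _ in people] for g, people in result.items()})
# {'F': ['p1', 'p2'], 'M': ['p4', 'p6']}
```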

Introduction to Research Design

Research Design: A set of methods and procedures for collecting and analyzing measures of the variables specified in the research problem. It is the framework for finding answers to research questions.
Design Components: Study type (Descriptive, Correlational, Semi-experimental, Experimental, Review, Meta-analytic), hypotheses, variables, experimental design, data collection methods, and statistical analysis plans.
Fixed vs. Flexible Designs:
- Fixed Designs: Fixed before data collection; usually quantitative and theory-driven.
- Flexible Designs: More freedom during collection; used when variables are not quantitatively measurable (e.g., culture) or when theory is unavailable beforehand.

Typologies of Research Design: Experimental and Case Study Methods

Experimental Designs: Active manipulation of variables and random assignment to study behavioral changes or outcomes.
- Variables must be operationalized (defined in measurable terms), and suitable statistical methods must be selected, before the experiment is run.
- Application of Power Analysis: Determining sample size required to find an effect while considering Type I and Type II error probabilities.
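A power analysis for a simple two-group comparison of means can be sketched with the normal approximation $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2 / d^2$ per group. Here $\alpha$ is the Type I error rate, $\beta$ is the Type II error rate (power $= 1 - \beta$), and $d$ is the standardized effect size. The choice of design, the formula, and the default values are assumptions for illustration.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-group comparison of means.

    effect_size is the standardized difference d = (mu1 - mu2) / sigma.
    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / d^2.
    """
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2
    return math.ceil(n)

print(n_per_group(0.5))  # 63
print(n_per_group(0.8))  # 25
```

An exact t-test calculation gives a slightly larger figure (about 64 per group for $d = 0.5$), so this is best treated as a first approximation. Smaller effects, stricter $\alpha$, or higher power all increase the required size.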
Case Study Method: Intensive investigation of a single unit (individual, institution, system, community) to locate factors accounting for behavioral patterns.
- Characteristics: Qualitative approach; intensive detail over breadth; holistic analysis of causal interrelationships; direct observation of behavioral patterns; useful for generating hypotheses.
- Phases of Case Study:
1. Recognition/determination of status.
2. Data collection and examination of history.
3. Diagnosis/identification of causal factors.
4. Application of remedial treatments.
5. Follow-up program for effectiveness.

Specialized Research Designs

Ethnographic Study: Focuses on groups, organizations, or cultures; the researcher spends significant time within the community being studied.
Cross-sectional Design: Measures differences among various subjects at a single point in time. Because it has no time dimension and no intervention, it allows only a passive approach to causal inference.
Exploratory Design: Conducted when a research problem has few or no earlier studies to draw on. Goals include:
- Gaining familiarity with details and settings.
- Developing tentative theories/hypotheses.
- Determining study feasibility.
- Refining issues for systematic investigation.
Longitudinal Study: Follows the same sample over time with repeated observations to track changes and establish magnitude/direction of causal relationships. Also called a panel study.
Action Research: An iterative cycle of exploration, planning an interventionary strategy, executing the action, and observing results. The process repeats until a solution or sufficient understanding is reached. It is a cyclical protocol intended to foster deep understanding of situations.