Survey Precision and Statistical Inference Notes

Overview and Institutional Context

These study notes are based on the course materials for "Survey Precision" presented by Dr. Atoui Saida for the academic year 2025-2026 at Setif 1 University - Ferhat ABBAS (جامعة سطيف 1 - فرحات عباس). The primary focus of the material is statistical inference and the determination of precision when analyzing both qualitative and quantitative variables within a population.

Introduction to Statistical Inference

Statistical Inference is defined as the process of inferring unknown characteristics of a population from partial observations. This process inherently includes a margin of error, as it involves making generalizations about a large group based on a subset (sample) of that group.

Precision of a Percentage or Proportion (Qualitative Variables)

When dealing with a qualitative variable, the objective is to determine the precision of a percentage or proportion $P$ . The problem is framed as follows:

$p_0$ : The observed proportion within the sample.
$P$ : The true population proportion, which is unknown.
$n$ : The sample size.
$x$ : The number of subjects in the sample possessing the specific characteristic.
The formula for the observed proportion is:
$p_0 = \frac{x}{n}$

Theoretical Reminders and the Binomial Distribution

For repeated sampling, the count of subjects $x$ presenting a characteristic follows a binomial distribution, denoted as $B(n, P)$ . Writing $q_0 = 1 - p_0$ and estimating $P$ by $p_0$ , the count $x$ and the observed proportion $p_0$ have the following properties:

Mean: $nP$ for the count $x$ (estimated by $np_0$ ); $P$ for the proportion $p_0$
Variance: $\frac{p_0q_0}{n}$ for the proportion (for the count, the variance is $np_0q_0$ )
Standard deviation ( $s$ ) of the proportion: $\sqrt{\frac{p_0q_0}{n}}$

When the sample size $n$ is sufficiently large, the binomial distribution can be approximated by a normal distribution, a concept referred to as the "Normal Approximation."
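The quality of the normal approximation can be checked numerically. The following is a minimal sketch using only the Python standard library; the values $n = 250$ and $P = 0.1$ are illustrative, the function names are hypothetical, and the continuity correction of $0.5$ is an addition that these notes do not discuss.

```python
from math import comb
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for a count X following B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k).

    Uses mean n*p and standard deviation sqrt(n*p*q), plus a continuity
    correction of 0.5 (an assumption of this sketch, not part of the notes).
    """
    mean = n * p
    sd = (n * p * (1 - p)) ** 0.5
    return NormalDist(mean, sd).cdf(k + 0.5)

n, p = 250, 0.1  # illustrative values
for k in (15, 25, 35):
    exact = binomial_cdf(k, n, p)
    approx = normal_approx_cdf(k, n, p)
    print(f"P(X <= {k}): exact = {exact:.4f}, normal approx = {approx:.4f}")
```

For a sample of this size the two columns should agree closely; repeating the comparison with a small $n$ shows where the approximation starts to break down.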

The Standard Normal Distribution and Probability (Alpha)

The probability $\alpha$ (alpha) represents the probability of having values outside of a specific interval defined by the z-scores $(-z, +z)$ . This is visualized as the area in the tails of the distribution ( $\alpha/2$ in each tail).

According to the table of the standard normal distribution, taken from Fisher and Yates (Statistical Tables for Biological, Agricultural and Medical Research), the tabulated probability $\epsilon$ , which plays the role of $\alpha$ here, is the probability that the absolute value of the reduced deviation (z-score) exceeds a given value. Notable z-score values include:

For $\alpha = 0.05$ , the z-score is $1.960$ .
For $\alpha = 0.01$ , the z-score is $2.576$ .
For $\alpha = 0.10$ , the z-score is $1.645$ .

Small probability values and their corresponding z-scores:

$0.001 \rightarrow 3.29053$
$0.0001 \rightarrow 3.89059$
$0.00001 \rightarrow 4.41717$
$0.000001 \rightarrow 4.89164$
$0.0000001 \rightarrow 5.32672$
$0.00000001 \rightarrow 5.73073$
$0.000000001 \rightarrow 6.10941$
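The tabulated values can be reproduced without a printed table. This is a small sketch using `statistics.NormalDist` from the Python standard library; the helper name `z_for_alpha` is hypothetical. It treats $\alpha$ as a two-sided risk, so $\alpha/2$ lies in each tail.

```python
from statistics import NormalDist

def z_for_alpha(alpha):
    """Two-sided critical value z such that P(|Z| > z) = alpha.

    The lower-tail quantile at alpha/2 is negative, so its sign is flipped.
    """
    return -NormalDist().inv_cdf(alpha / 2)

for alpha in (0.10, 0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: z = {z_for_alpha(alpha):.5f}")
```

Working from the lower tail (`alpha / 2`) rather than from `1 - alpha / 2` avoids a loss of floating-point precision for very small probabilities.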

Confidence Interval for Proportions

The calculation assumes that the true population proportion $P$ is close to the observed proportion $p_0$ . The population proportion is estimated to lie within a confidence interval:
$P = p_0 \pm e$ where:

$e$ : The margin of error.
The range of the interval is from $p_0 - e$ to $p_0 + e$ .
The total width of the interval is $2e$ .
The error risk $\alpha$ determines the z-score $z$ (for example, $\alpha = 0.05$ gives $z = 1.96$ ).
Maximum error formula: $e = z \times s$ , where $s = \sqrt{\frac{p_0q_0}{n}}$ .
Full estimation formula: $P = p_0 \pm z \sqrt{\frac{p_0q_0}{n}}$

Crucially, the risk of error $\alpha$ and the width of the confidence interval move in opposite directions: a smaller $\alpha$ requires a larger $z$ and therefore a wider interval.
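The link between the error risk and the interval width can be seen by evaluating $e = z \times s$ for several values of $\alpha$ . The sketch below assumes hypothetical values $p_0 = 0.2$ and $n = 100$ chosen only for illustration; the function name is likewise an assumption.

```python
from statistics import NormalDist

def margin_of_error(p0, n, alpha):
    """Margin of error e = z * sqrt(p0 * q0 / n) for a two-sided risk alpha."""
    z = -NormalDist().inv_cdf(alpha / 2)
    s = (p0 * (1 - p0) / n) ** 0.5
    return z * s

p0, n = 0.2, 100  # hypothetical sample, not taken from the course materials
for alpha in (0.10, 0.05, 0.01):
    e = margin_of_error(p0, n, alpha)
    print(f"alpha = {alpha:.2f}: e = {e:.4f}, width 2e = {2 * e:.4f}")
```

Reading the output from top to bottom, the accepted risk shrinks while the interval widens.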

Application: Disease Frequency Case Study

Data Provided:

Sample size ( $n$ ): $250$
Affected subjects ( $x$ ): $25$
Observed proportion ( $p_0$ ): $\frac{25}{250} = 0.1$ (or $10\%$ )
Complement ( $q_0$ ): $1 - 0.1 = 0.9$
Error risk ( $\alpha$ ): $5\%$ ( $0.05$ ), which implies $z = 1.96$

Calculation:

Standard deviation of the proportion: $s = \sqrt{\frac{0.1 \times 0.9}{250}} \approx 0.019$
Margin of error: $e = 1.96 \times 0.019 \approx 0.0372$ , rounded up here to $0.038$
Interval calculation: $P = 0.1 \pm 0.038$
Result: Lower limit = $0.062$ , Upper limit = $0.138$

Check Conditions: To validate the normal approximation, the conditions $np \ge 5$ and $nq \ge 5$ must be met at the interval limits:

Lower limit ( $0.062$ ): $250 \times 0.062 = 15.5$ ; $250 \times 0.938 = 234.5$
Upper limit ( $0.138$ ): $250 \times 0.138 = 34.5$ ; $250 \times 0.862 = 215.5$

All values are $\ge 5$ , so the approximation is valid.

Conclusion: The frequency of the disease is estimated at $10\%$ , with a $95\%$ confidence interval of $6.2\%$ to $13.8\%$ ( $\alpha = 0.05$ ).
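The case study can be replayed in a few lines. The following sketch assumes the formulas given earlier in these notes; the function names and the `threshold` parameter are hypothetical. Because it does not round intermediate results, the margin comes out near $0.0372$ and the limits near $0.063$ and $0.137$ , marginally narrower than the rounded figures in the worked calculation.

```python
from statistics import NormalDist

def proportion_ci(x, n, alpha=0.05):
    """Confidence interval p0 +/- z * sqrt(p0 * q0 / n) for a proportion."""
    p0 = x / n
    z = -NormalDist().inv_cdf(alpha / 2)
    e = z * (p0 * (1 - p0) / n) ** 0.5
    return p0 - e, p0 + e

def approximation_is_valid(n, lower, upper, threshold=5):
    """Check n*p >= threshold and n*q >= threshold at both interval limits."""
    return all(
        n * p >= threshold and n * (1 - p) >= threshold
        for p in (lower, upper)
    )

lower, upper = proportion_ci(x=25, n=250)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
print("Normal approximation valid:", approximation_is_valid(250, lower, upper))
```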

Precision of a Mean (Quantitative Variables)

For quantitative variables, the objective is to estimate the population mean $m$ (unknown) using the sample mean $m_0$ .

Statistical Parameters:

Sample mean ( $m_0$ ): $\frac{\sum x_i}{n}$
Variance of variable $x$ ( $s^2$ ): $\frac{\sum (x_i - m_0)^2}{n}$
Standard deviation of variable $x$ ( $s$ ): $\sqrt{\frac{\sum (x_i - m_0)^2}{n}}$
Variance of the mean ( $m_0$ ): $\frac{s^2}{n}$
Standard deviation of the mean ( $m_0$ ): $\frac{s}{\sqrt{n}}$
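These parameters are straightforward to compute from raw data. The sketch below uses a small hypothetical sample and a hypothetical function name; it follows the divisor $n$ used in these notes, whereas many textbooks divide by $n - 1$ when estimating a variance from a sample.

```python
from math import sqrt

def describe(values):
    """Return the sample mean, variance, SD, and the variance and SD of the mean."""
    n = len(values)
    m0 = sum(values) / n
    # Divisor n, as in the notes (not the n - 1 used by many textbooks).
    variance = sum((x - m0) ** 2 for x in values) / n
    s = sqrt(variance)
    return {
        "n": n,
        "mean": m0,
        "variance": variance,
        "sd": s,
        "variance_of_mean": variance / n,
        "sd_of_mean": s / sqrt(n),
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
for name, value in describe(sample).items():
    print(f"{name}: {value:.4f}")
```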

Application Conditions:

The variable must follow a normal distribution.
The sample size must be at least $30$ ( $n \ge 30$ ).

Confidence Interval for the Mean

The calculation assumes $m$ is close to $m_0$ and falls within the interval $m_0 \pm e$ .

Range: $m_0 - e$ to $m_0 + e$
Width: $2e$
Formula: $m = m_0 \pm z \frac{s}{\sqrt{n}}$

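Fluctuation of Variable $x$ : To find the range within which the individual values $x_i$ fluctuate in the population, the standard deviation of the variable ( $s$ ) is used instead of the standard deviation of the mean ( $\frac{s}{\sqrt{n}}$ ):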

Formula: $x_i = m_0 \pm e$
Error for individual values: $e = z \times s$
Full formula: $x_i = m_0 \pm z s$
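The fluctuation interval describes individuals, not the mean, which a short simulation can illustrate. This is a sketch under stated assumptions: the population is taken to be exactly normal, the values `mu = 50` and `sigma = 10` are hypothetical, and the function name is invented for the example.

```python
import random
from statistics import NormalDist

def share_within_fluctuation(mu, sigma, alpha=0.05, size=100_000, seed=0):
    """Simulate a normal population and return the share of individual
    values that fall inside mu +/- z * sigma."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = -NormalDist().inv_cdf(alpha / 2)
    low, high = mu - z * sigma, mu + z * sigma
    inside = sum(low <= rng.gauss(mu, sigma) <= high for _ in range(size))
    return inside / size

share = share_within_fluctuation(mu=50.0, sigma=10.0)
print(f"Share of individuals inside the interval: {share:.3f}")
```

With $\alpha = 0.05$ the share should be close to $0.95$ , which is what the formula $x_i = m_0 \pm z s$ expresses.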

Application: Average Number of Children Case Study

Data Provided:

Sample size ( $n$ ): $400$
Sample average ( $m_0$ ): $6$ children
Standard deviation ( $s$ ): $2$ children
Error risk ( $\alpha$ ): $5\%$ ( $0.05$ ), which implies $z = 1.96$

Condition Check:

The variable (number of children) is assumed to follow a normal distribution.
$n = 400$ , which is greater than $30$ .

Calculation of Mean Precision:

Standard deviation of the mean: $s_m = \frac{2}{\sqrt{400}} = 0.1$
Margin of error: $e = 1.96 \times 0.1 = 0.196 \approx 0.2$
Population mean estimate: $m = 6 \pm 0.2$
Results: $m \in [5.8, 6.2]$

Conclusion on Mean: The average number of children in the population is estimated to lie between $5.8$ and $6.2$ children ( $\alpha = 0.05$ ; $95\%$ CI: $5.8$ to $6.2$ ).

Calculation of Variable Fluctuation:

Margin of error for individuals: $e = z \times s = 1.96 \times 2 = 3.92$ (rounded to $4$ children).
Interval: $x_i = 6 \pm 4$
Result: $2$ to $10$ children.

Conclusion on Fluctuation: For $95\%$ of families in the population, the number of children is between $2$ and $10$ children.
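Both calculations from this case study can be reproduced as follows. The sketch assumes the formulas $m_0 \pm z \frac{s}{\sqrt{n}}$ and $m_0 \pm z s$ from these notes; the function names are hypothetical. Without intermediate rounding, the limits are about $5.80$ to $6.20$ for the mean and about $2.08$ to $9.92$ for individual values.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(m0, s, n, alpha=0.05):
    """Confidence interval for the population mean: m0 +/- z * s / sqrt(n)."""
    z = -NormalDist().inv_cdf(alpha / 2)
    e = z * s / sqrt(n)
    return m0 - e, m0 + e

def fluctuation_interval(m0, s, alpha=0.05):
    """Range of individual values: m0 +/- z * s."""
    z = -NormalDist().inv_cdf(alpha / 2)
    e = z * s
    return m0 - e, m0 + e

ci_low, ci_high = mean_ci(m0=6, s=2, n=400)
fl_low, fl_high = fluctuation_interval(m0=6, s=2)
print(f"95% CI for the mean: {ci_low:.2f} to {ci_high:.2f}")
print(f"95% fluctuation of individual values: {fl_low:.2f} to {fl_high:.2f}")
```

The two functions differ only in whether $s$ is divided by $\sqrt{n}$ , which is why the interval for the mean is far narrower than the interval for individual families.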