Statistics: Margin of Error, Standard Deviation, and Sampling Reliability

Margin of Error Defined: This term refers to how accurately a result computed from collected data (a sample) reflects the population. It indicates how far the sample result might lie from the actual population value, typically expressed as a percentage, such as $1\%$ or $2\%$ in either direction. It describes the accuracy of the overall estimate, not of any single piece of data.
Standard Deviation Defined: Standard deviation is calculated from the data once it has been collected. It measures how far individual data points typically lie from the mean of the group.
Key Distinction:
- Margin of Error: Focused on how accurately the collected sample represents the population.
- Standard Deviation: Focused on the spread within the dataset once the information is gathered.

Approximation: Sample means and proportions are used to represent the overall population. Statistics from a smaller group act as an approximation to generalize what is occurring in the larger population.
Variability of Samples: Sample means and proportions will vary based on several factors:
- The Experimenter: Who is conducting the study or survey.
- Timing: When the experiment is performed.
- Sample Size: The specific size of the group ( $n$ ) from which information is taken.
Consistent Procedures: Even if the same experiment or survey is repeated, the results can differ depending on the specific sample group chosen.

Statistical Relationship: Standard deviation is used to help estimate the margin of error. The two concepts are related in terms of representing group variability.
Direct Correlations:
- A Large Margin of Error corresponds to high variability, that is, a larger standard deviation. This reduces confidence in the final results.
- A Small Margin of Error indicates more consistent and reliable data.
Random Sampling Variations: Multiple random samplings of the same size (e.g., three different groups of $30$ people) can yield different results. Understanding this variation is critical to logical thinking in statistics.
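Illustration: The variation between samples of the same size can be seen in a small simulation. This is a minimal sketch, not part of the original lesson: the population (1,000 values scattered around 50) and the helper name `sample_means` are assumptions made for illustration.

```python
import random
import statistics

def sample_means(population, n, draws, seed=0):
    """Return the mean of each of `draws` random samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.mean(rng.sample(population, n)) for _ in range(draws)]

# Hypothetical population: 1,000 values scattered around 50.
_rng = random.Random(42)
population = [_rng.gauss(50, 10) for _ in range(1000)]

# Three different groups of 30, as in the example above.
means = sample_means(population, n=30, draws=3)
for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean = {m:.2f}")
print(f"population mean = {statistics.mean(population):.2f}")
```

Each of the three sample means should land near the population mean, but no two are expected to match exactly.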

Population Characteristics: Statistics are often used to define characteristics of a whole population, such as in political opinion polls or popularity polls (e.g., gubernatorial races).
Poll Reliability:
- If a poll has a high margin of error, its results are considered less reliable.
- If a poll has a low margin of error, it is generally considered a good, valid study, provided the sample was chosen without bias.
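Illustration: A rough sketch of how a poll's margin of error might be computed. It assumes a simple random sample and uses the common textbook approximation that the standard deviation of a sample proportion is $\sqrt{p(1-p)/n}$, a formula these notes do not state. The function name `poll_margin_of_error`, the multiplier of two standard deviations, and the poll numbers are illustrative assumptions.

```python
import math

def poll_margin_of_error(p_hat, n, z=2.0):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample; z=2.0 means two standard
    deviations, a common rule of thumb for about 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return z * standard_error

# Hypothetical poll: 52% support among 400 versus 1,600 respondents.
for n in (400, 1600):
    moe = poll_margin_of_error(0.52, n)
    print(f"n = {n}: 52% +/- {100 * moe:.1f} points")
```

Under these assumptions, the poll with four times as many respondents has about half the margin of error.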
Sampling Bias Example: Consider a survey on shortening the school day.
- Biased Sampling: Surveying only students will likely produce a larger error relative to the general population because students have a weighted opinion (wanting less school).
- Random Sampling: Surveying a mix of parents and students provides a variety of opinions, leading to a more valid and accurate collection of data.
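Illustration: The school-day survey can be simulated to compare the two sampling approaches. This is only a sketch: the support rates (90% of students, 30% of parents), the group sizes, and the helper `support_rate` are made-up assumptions, not figures from the lesson.

```python
import random

def support_rate(group, k, rng):
    """Share of 'yes' answers in a random sample of k people from group."""
    sample = rng.sample(group, k)
    return sum(sample) / k

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical community: 1 = favors a shorter school day, 0 = opposes.
# Assumed rates (illustrative only): 90% of students, 30% of parents.
students = [1 if rng.random() < 0.90 else 0 for _ in range(2000)]
parents = [1 if rng.random() < 0.30 else 0 for _ in range(2000)]
everyone = students + parents

true_rate = sum(everyone) / len(everyone)
students_only = support_rate(students, 100, rng)  # biased sample
mixed = support_rate(everyone, 100, rng)          # random mixed sample

print(f"true support:          {true_rate:.2f}")
print(f"students-only sample:  {students_only:.2f}")
print(f"random mixed sample:   {mixed:.2f}")
```

With these assumed rates, the students-only sample should overstate support, while the random mixed sample should land much closer to the true value.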

The Scenario: A sample of five coins is taken to determine their origin.
Historical Mint Locations:
- Denver (D): Coins marked with a D.
- Philadelphia (P): Coins that are either blank or stamped with a P.
- San Francisco (S): The third major mint location in the United States, identified by an S mark.
Practical Context: People who save specific coins, such as $2,026$ pennies, provide a real-world example of gathering a sample to analyze population characteristics.
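Illustration: Tallying the mint marks in a sample might look like the sketch below. The five coins listed are hypothetical, and the helper `mint_of` is an assumed name. A blank mark is treated as Philadelphia, following the rule above.

```python
from collections import Counter

def mint_of(mark):
    """Map a mint mark to its mint; a blank mark is treated as Philadelphia."""
    marks = {
        "D": "Denver",
        "P": "Philadelphia",
        "": "Philadelphia",
        "S": "San Francisco",
    }
    return marks.get(mark.strip().upper(), "Unknown")

# Hypothetical sample of five coins, recorded by mint mark ("" = no mark).
sample = ["D", "", "P", "D", "S"]

counts = Counter(mint_of(mark) for mark in sample)
for mint, count in counts.items():
    print(f"{mint}: {count}/{len(sample)} = {count / len(sample):.0%}")
```

The resulting proportions are sample statistics. With only five coins, they are a very rough approximation of where coins in circulation come from.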

The Rule of Two Deviations: A common rule of thumb for establishing a good margin of error is to use two standard deviations ( $2\sigma$ ).
The 95% Rule: Setting the margin of error at two standard deviations means that, for roughly bell-shaped (normal) data, about $95\%$ of the respondents or data points fall within that calculated range.
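Illustration: The two-deviation rule can be checked numerically. The sketch below assumes normally distributed data and draws 10,000 hypothetical values with mean $0.55$ and standard deviation $0.15$. For data that is not bell-shaped, the share inside two deviations can differ.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical data: 10,000 draws from a normal (bell-shaped) distribution.
data = [rng.gauss(0.55, 0.15) for _ in range(10_000)]

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation

# Count how many values fall within two standard deviations of the mean.
low, high = mean - 2 * sd, mean + 2 * sd
inside = sum(low <= x <= high for x in data) / len(data)

print(f"mean = {mean:.3f}, sd = {sd:.3f}")
print(f"share within two standard deviations: {inside:.1%}")
```

The printed share should come out close to $95\%$.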
Calculation Logic:
- Assume a mean or estimate of $0.55$ .
- Assume one standard deviation is $0.15$ .
- Two deviations would be $2 \times 0.15 = 0.30$ .
- The margin of error for the estimate of $0.55$ would be stated as $0.55 \pm 0.30$ , that is, a range from $0.25$ to $0.85$ .
Comparison of Data Sets: If two different sets of data are collected and their deviations differ, then their margins of error will also necessarily differ.
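Illustration: The calculation above can be written as a short sketch. The helper name `interval` is an assumption, and the second data set (one deviation of $0.05$) is hypothetical, added only to show that a different deviation gives a different margin.

```python
def interval(estimate, sd, k=2):
    """Return (margin, low, high) using k standard deviations as the margin."""
    margin = k * sd
    return margin, estimate - margin, estimate + margin

# Values from the worked example: estimate 0.55, one deviation 0.15.
margin, low, high = interval(0.55, 0.15)
print(f"0.55 +/- {margin:.2f}  ->  [{low:.2f}, {high:.2f}]")
# prints: 0.55 +/- 0.30  ->  [0.25, 0.85]

# Hypothetical second data set with a smaller deviation, for comparison.
margin_b, low_b, high_b = interval(0.55, 0.05)
print(f"0.55 +/- {margin_b:.2f}  ->  [{low_b:.2f}, {high_b:.2f}]")
# prints: 0.55 +/- 0.10  ->  [0.45, 0.65]
```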

The Effect of Sample Size:
- Large Sample Size ( $n \uparrow$ ): Talking to more people reduces the standard deviation of the estimate (how much the sample mean or proportion varies from one sample to the next) and therefore the margin of error. As the sample size increases, data becomes more reliable, and the distance between the estimate and the actual population value tends to decrease.
- Small Sample Size ( $n \downarrow$ ): Results in less reliable data and a larger margin of error.
Visualizing Reliability: In a graph, a larger sample size causes the data points to cluster closer together, minimizing the gaps and making the results tend toward the true mean.
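Illustration: The effect of sample size can be simulated by drawing many samples of each size and measuring how much the sample results vary. This is a sketch under assumed conditions: a hypothetical population of 20,000 yes/no answers with about 55% "yes", 500 repeated samples per size, and an assumed helper name `spread_of_sample_means`.

```python
import random
import statistics

def spread_of_sample_means(population, n, draws, rng):
    """Standard deviation of the means of `draws` random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(draws)]
    return statistics.stdev(means)

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population of 20,000 yes/no answers, about 55% "yes".
population = [1 if rng.random() < 0.55 else 0 for _ in range(20_000)]

spreads = {}
for n in (25, 100, 400):
    spreads[n] = spread_of_sample_means(population, n, draws=500, rng=rng)
    print(f"n = {n:>3}: spread of sample means = {spreads[n]:.3f}, "
          f"two-deviation margin = {2 * spreads[n]:.3f}")
```

The spread should shrink as $n$ grows. In this setup, each fourfold increase in sample size should roughly halve it.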

Question (Student): Is the relationship between sample size and standard deviation an inverse relationship?
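Response: Yes, that is a good way to think about it. As you talk to more people, you expect less deviation in the estimate (the graph becomes tighter), assuming the data is valid. More precisely, the standard deviation of a sample mean shrinks in proportion to $1/\sqrt{n}$ , so quadrupling the sample size roughly halves it. Conversely, a high standard deviation (wider graphs) means data points are spread further from the center.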
Question (Student): What margin of error should we give for an estimate of $0.55$ ?
Response: Using the rule of two deviations, if one deviation is $0.15$ , you use $0.30$ as the margin of error, resulting in $0.55 \pm 0.30$ . This captures the bulk of the data (the $95\%$ confidence interval).
Question (Student): Does more people mean more reliable?
Response: Absolutely. The more people you talk to, the smaller the margin of error and the more reliable the result. With fewer people, the result is less reliable.