Probability Distributions
The binomial distribution is a statistical model for experiments consisting of a fixed number of independent trials, each with two possible outcomes, commonly referred to as 'success' and 'failure', and a constant probability of success on every trial. The formula for the probability of achieving exactly r successes in n trials is:
[ P(X = r) = C(n, r)\, p^{r} q^{n-r} ]
where C(n, r) represents the number of combinations of n trials selected r at a time, p signifies the probability of success, and q is the probability of failure, defined as ( q = 1 - p ).
Example 1: Suppose you roll a die 12 times (n = 12) and want to find the probability of rolling a 3 exactly 5 times. In this case, the probability of success (rolling a 3) on any individual trial is ( p = \frac{1}{6} ), while ( q = \frac{5}{6} ). You would compute:
[ P(X = 5) = C(12, 5) \left(\frac{1}{6}\right)^{5} \left(\frac{5}{6}\right)^{7} \approx 0.0284 ]
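This calculation is easy to check with a short script. The following is a minimal sketch using only Python's standard library; the helper name binomial_pmf is illustrative, not part of any particular package.

```python
from math import comb

def binomial_pmf(r, n, p):
    # P(X = r) = C(n, r) * p^r * q^(n - r), with q = 1 - p
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Example 1: probability of rolling a 3 exactly 5 times in 12 rolls of a fair die
print(binomial_pmf(5, 12, 1/6))  # ~0.0284
```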
When an experiment of n trials is observed repeatedly, the expected frequency of each number of successes r can be calculated using the formula:
[ \text{Expected Frequency} = N \cdot P(X = r) ]
where N is the total number of observations, i.e., the number of times the experiment is repeated.
Example 2: If you survey 200 people (N = 200) and expect 30% of them (probability 0.3) to prefer tea over coffee, the expected frequency of tea drinkers is:
[ \text{Expected Frequency} = 200 \cdot 0.3 = 60 ]
For r = 0 to 4 successes, using illustrative probabilities (a short code sketch of this calculation follows the list):
For r = 0: If the probability for 0 successes is 0.1, then Expected Frequency = ( 200 \cdot 0.1 = 20 )
For r = 1: If the probability for 1 success is 0.15, then Expected Frequency = ( 200 \cdot 0.15 = 30 )
For r = 2: If the probability for 2 successes is 0.2, then Expected Frequency = ( 200 \cdot 0.2 = 40 )
For r = 3: If the probability for 3 successes is 0.25, then Expected Frequency = ( 200 \cdot 0.25 = 50 )
For r = 4: If the probability for 4 successes is 0.2, then Expected Frequency = ( 200 \cdot 0.2 = 40 )
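A minimal sketch of the expected-frequency calculation, using the same N = 200 and the illustrative probabilities listed above; in practice, each P(X = r) would come from the binomial formula rather than being assumed.

```python
N = 200  # total number of observations

# Illustrative probabilities for r = 0..4 successes (taken from the list above)
probabilities = {0: 0.10, 1: 0.15, 2: 0.20, 3: 0.25, 4: 0.20}

for r, p_r in probabilities.items():
    print(f"r = {r}: expected frequency = {N * p_r:.0f}")
```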
Fitting a binomial distribution involves matching observed data to the expected frequencies using the binomial formula. For example, if data from an experiment yielded the following:
Values of X: 0, 1, 2, 3, 4
Frequencies (f): 1, 32, 41, 24, 5
We can calculate the total number of observations as 103 (1 + 32 + 41 + 24 + 5). These observations can then be compared with the expected frequencies from a fitted binomial model to see how well the data follow a binomial distribution.
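A common way to fit the model is to estimate p from the sample mean (since the mean of a binomial distribution is np) and then compute the expected frequencies. The sketch below assumes n = 4, the largest observed value of X; this choice, and the method-of-moments estimate of p, are assumptions for illustration.

```python
from math import comb

x_values = [0, 1, 2, 3, 4]
frequencies = [1, 32, 41, 24, 5]

N = sum(frequencies)                                          # 103 observations
mean = sum(x * f for x, f in zip(x_values, frequencies)) / N  # sample mean, equals n*p
n = max(x_values)                                             # assumed trials per observation
p = mean / n                                                  # estimated success probability

for x, observed in zip(x_values, frequencies):
    expected = N * comb(n, x) * p**x * (1 - p)**(n - x)
    print(f"X = {x}: observed = {observed}, expected = {expected:.1f}")
```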
In a binomial distribution, the mean (( \mu )) is calculated as: [ \mu = np ]
and the variance (( \sigma^2 )) is given by: [ \sigma^2 = npq ]
The standard deviation is determined using: [ \sigma = \sqrt{npq} ]
Example 3: Considering 100 trials and a success probability of 0.6:
Calculating the Mean: ( \mu = 100 \cdot 0.6 = 60 )
Variance: ( \sigma^2 = 100 \cdot 0.6 \cdot 0.4 = 24 )
Standard Deviation: ( \sigma = \sqrt{24} \approx 4.9 )
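A minimal sketch applying these three formulas to Example 3:

```python
from math import sqrt

n, p = 100, 0.6
q = 1 - p

mean = n * p              # 60.0
variance = n * p * q      # 24.0
std_dev = sqrt(variance)  # ~4.9

print(mean, variance, round(std_dev, 1))
```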
The Poisson distribution can be viewed as a limiting case of the binomial distribution, arising when the number of trials (n) tends to infinity and the probability of success (p) decreases towards zero, while the product np remains constant (this constant is the Poisson mean, m). The model is suitable for counting the number of events occurring in a fixed interval of time or space.
Common Applications:
Calls to a Call Center: You might use it to assess how many calls are received per hour.
Defects in Production Parts: Factories can apply it to model the number of defective items in a batch.
Accidents in a Time Frame: It can be applied to predict the number of traffic accidents in a given area over a month.
Key Properties:
Discrete Distribution: The distribution takes only non-negative integer values.
Single Parameter: The distribution is fully described by a single parameter m (mean number of events).
The probability mass function for the Poisson distribution is defined as: [ P(X = r) = \frac{e^{-m} m^r}{r!} ]
where ( r ) is the number of occurrences and ( e ) is Euler's number (approximately 2.71828).
Example 4: If a factory experiences an average of 2 defects per hour, to find the probability of having exactly 3 defects in one hour: [ P(X = 3) = \frac{e^{-2} 2^{3}}{3!} \approx 0.1804 ]
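A minimal sketch of Example 4, together with a numerical check of the limiting-case idea above: a binomial distribution with large n, small p, and np = m gives almost the same probability as the Poisson formula. The values n = 10,000 and p = 0.0002 are purely illustrative.

```python
from math import comb, exp, factorial

def poisson_pmf(r, m):
    # P(X = r) = e^(-m) * m^r / r!
    return exp(-m) * m**r / factorial(r)

def binomial_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p)**(n - r)

m, r = 2, 3
print(poisson_pmf(r, m))                    # ~0.1804 (Example 4)
print(binomial_pmf(r, 10_000, m / 10_000))  # ~0.1804, close to the Poisson value
```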
The normal (Gaussian) distribution, notable for its bell-shaped curve, is vital in statistics and is fully characterized by two parameters: the mean (( \mu )) and the standard deviation (( \sigma )). Its key properties include:
Symmetry: The distribution is symmetrical about its mean.
Central Measures: Mean = Median = Mode.
Area Under Curve: The total area under the curve equals one, meaning the probabilities of all possible values sum to 100%.
Unimodal: There is a single peak or mode, indicating one predominant value around which data tends to cluster.
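These properties can be checked numerically. The following is a minimal sketch using only the standard library: it approximates the total area under a standard normal curve with a simple Riemann sum and checks symmetry about the mean.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Density of the normal distribution with mean mu and standard deviation sigma
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

dx = 0.001
xs = [-8 + i * dx for i in range(int(16 / dx))]

area = sum(normal_pdf(x) * dx for x in xs)
print(round(area, 4))                     # ~1.0: total area under the curve
print(normal_pdf(1.0), normal_pdf(-1.0))  # equal: symmetric about the mean (0 here)
```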
Normal distributions are instrumental in approximating various real-world scenarios and provide the basis for many statistical methodologies and inferential techniques. For instance, they allow statisticians to create confidence intervals and conduct hypothesis testing effectively, making them essential in research and analytics.