Probability Functions for Continuous Random Variables and Special Distributions
Characterization of Continuous Random Variables
A continuous random variable (CRV) is a variable that can take any value within an interval of real numbers, so it has infinitely (indeed uncountably) many possible values. Unlike discrete variables, whose values are distinct, countable points, CRV data represents measurements on a continuous scale. Standard examples include physical quantities such as pressure, height, mass, weight, density, volume, temperature, and distance.
A key conceptual distinction for CRVs concerns the probability of a single point. Consider a student waiting for a bus that arrives at regular intervals. Although the waiting time X can in principle equal some exact value c, the probability of that exact occurrence is zero: P(X = c) = 0. Consequently, in continuous probability we do not ask for the probability that a variable takes one particular value. Instead, we determine the probability that it falls within an interval, such as P(a \le X \le b).
The Probability Density Function (PDF)
The probability distribution of a CRV is described by a density function, denoted f(x). This function assigns probabilities to intervals of real numbers. Geometrically, the probability that X takes a value in the interval [a, b] equals the area under the curve y = f(x), bounded below by the x-axis and on the sides by the vertical lines x = a and x = b.
For a function f(x) to qualify as a valid Probability Density Function (PDF), it must satisfy the first three conditions below. The fourth shows how the PDF is then used to compute probabilities:
Non-negativity: f(x) \ge 0 for all real x.
Continuity: The function must be piecewise continuous.
Normalization: The total area under the entire curve must equal 1, expressed as \int_{-\infty}^{\infty} f(x)\,dx = 1.
Interval Probability: The probability of X lying between a and b is calculated by integration: P(a \le X \le b) = \int_{a}^{b} f(x)\,dx.
An important consequence is that, for any CRV, including or excluding the endpoints does not change the probability, because each single point has probability zero. That is: P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b).
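The conditions above can be checked numerically. The sketch below assumes a hypothetical density f(x) = 2x on [0, 1], which is not taken from these notes. It uses a hand-written composite Simpson's rule rather than any particular library, and it checks non-negativity, normalization, an interval probability, and the zero probability of a single point.

```python
# Sketch: checking the PDF conditions numerically for a HYPOTHETICAL density
# f(x) = 2x on [0, 1] (zero elsewhere); this density is an illustrative assumption.


def f(x: float) -> float:
    """Hypothetical density: 2x on [0, 1], zero outside that interval."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(g, a: float, b: float, n: int = 1000) -> float:
    """Approximate the integral of g over [a, b] with composite Simpson's rule."""
    if n % 2:  # Simpson's rule needs an even number of sub-intervals
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def prob(a: float, b: float) -> float:
    """P(a <= X <= b): the area under f between a and b."""
    return integrate(f, a, b)


# Non-negativity, spot-checked on a grid that extends beyond the support
non_negative = all(f(i / 100) >= 0 for i in range(-50, 151))

# Normalization: the total area over the support should be 1
total_area = prob(0.0, 1.0)

# Interval probability: exact value is 0.75**2 - 0.25**2 = 0.5
p_mid = prob(0.25, 0.75)

# A single point has zero width, hence zero probability
p_point = prob(0.5, 0.5)

print(f"non-negative: {non_negative}")
print(f"total area ~ {total_area:.6f}")
print(f"P(0.25 <= X <= 0.75) ~ {p_mid:.6f}")
print(f"P(X = 0.5) = {p_point}")
```

Simpson's rule is exact for polynomials up to degree 3. For this linear density, the printed area and interval probability are therefore 1 and 0.5 up to floating-point rounding.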
Analytical Example: Flight Delay Distributions
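Consider a scenario where X represents the delay, in hours, of a PIA flight from the UK, modeled by a linear density function f(x) defined over a bounded range of delays.
To verify that this is a valid PDF, we calculate the area under the function. Geometrically, f(x) is positive at one end of its range and zero at the other, so the region under the curve is a triangle with area \frac{1}{2} \times \text{base} \times \text{height}, which equals 1. Integrating f(x) over its range confirms this result.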
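Specific probabilities follow from trapezium areas, \frac{1}{2}(h_1 + h_2) \times w, where h_1 and h_2 are the values of f at the two ends of the region and w is its width:
Delay less than 4 hours (P(X < 4)): the region forms a trapezium with heights equal to f at the lower end of the range and f(4).
Delay between two values a and b (P(a < X < b)): the heights are f(a) and f(b).
Delay more than 6 hours (P(X > 6)): the heights are f(6) and f at the upper end of the range.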
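The flight-delay density is linear, so each probability above is the area of a trapezium (or triangle). The sketch below assumes a hypothetical density f(x) = (10 - x)/50 on 0 \le x \le 10, purely for illustration, and checks the geometric areas against the exact integral.

```python
# Sketch: probabilities under a HYPOTHETICAL linear delay density
# f(x) = (10 - x) / 50 for 0 <= x <= 10 hours (assumed for illustration).

LOW, HIGH = 0.0, 10.0


def f(x: float) -> float:
    """Assumed triangular density, zero outside [LOW, HIGH]."""
    return (HIGH - x) / 50 if LOW <= x <= HIGH else 0.0


def trapezium_area(a: float, b: float) -> float:
    """Area under the straight line f between a and b: 0.5 * (h1 + h2) * width."""
    return 0.5 * (f(a) + f(b)) * (b - a)


def integral(a: float, b: float) -> float:
    """Exact integral of (10 - x) / 50 from a to b via its antiderivative."""
    def antiderivative(x: float) -> float:
        return (10 * x - x * x / 2) / 50
    return antiderivative(b) - antiderivative(a)


# Validity: the triangle 0.5 * base * height should have area 1
total = 0.5 * (HIGH - LOW) * f(LOW)

p_less_4 = trapezium_area(LOW, 4.0)   # P(X < 4)
p_4_to_6 = trapezium_area(4.0, 6.0)   # P(4 < X < 6)
p_more_6 = trapezium_area(6.0, HIGH)  # P(X > 6); f(HIGH) = 0, so this is a triangle

for a, b, area in [(LOW, 4.0, p_less_4), (4.0, 6.0, p_4_to_6), (6.0, HIGH, p_more_6)]:
    print(f"P({a} < X < {b}) = {area:.4f} (integral: {integral(a, b):.4f})")
print(f"total = {total:.4f}")
```

Under this assumed density, P(X < 4) = 0.64, P(4 < X < 6) = 0.2, and P(X > 6) = 0.16. The last region is a degenerate trapezium (a triangle) because f(10) = 0.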
Advanced Density Function Examples
Consider the function for the interval . Integration is required to find specific probabilities:
For , the integral is evaluated from to :
For , the integral is evaluated from to :
For , the integral evaluates to .
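In another example, we find the normalizing constant k of a density f(x) on a given range. We apply the normalization property \int f(x)\,dx = 1 over that range and solve for k. Using this result, conditional probabilities can be assessed. For events A and B, the conditional probability is P(A \mid B) = \frac{P(A \cap B)}{P(B)}. First calculate P(A \cap B) by integrating f(x) over the values common to A and B. Then calculate P(B) by integrating f(x) over B, and divide.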
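The normalize-then-condition procedure can be sketched with exact fractions. The density f(x) = kx^2 on [0, 3] and the events A = {X < 1} and B = {X < 2} are hypothetical choices for illustration only.

```python
from fractions import Fraction

# Sketch: HYPOTHETICAL density f(x) = k * x**2 on [0, 3], zero elsewhere.
# Exact arithmetic with Fraction avoids rounding in the final ratio.


def integral_x2(a: Fraction, b: Fraction) -> Fraction:
    """Exact integral of x**2 from a to b, using the antiderivative x**3 / 3."""
    return (b ** 3 - a ** 3) / 3


# Normalization: k * (integral of x**2 over [0, 3]) = 1, so k = 1 / 9
k = 1 / integral_x2(Fraction(0), Fraction(3))


def prob(a: int, b: int) -> Fraction:
    """P(a < X < b) for 0 <= a <= b <= 3."""
    return k * integral_x2(Fraction(a), Fraction(b))


# Hypothetical events A = {X < 1} and B = {X < 2}; their intersection is {X < 1}
p_a_and_b = prob(0, 1)           # 1/27
p_b = prob(0, 2)                 # 8/27
p_a_given_b = p_a_and_b / p_b    # P(A | B) = P(A and B) / P(B) = 1/8

print(f"k = {k}, P(A and B) = {p_a_and_b}, P(B) = {p_b}, P(A | B) = {p_a_given_b}")
```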
The Exponential Distribution
The Exponential distribution models waiting times until an event occurs, assuming events happen independently at a constant average rate. Common applications include the time until a radioactive particle decays, the time until hardware fails, and the time between successive customer arrivals in a queue.
The PDF is defined as: f(x) = \lambda e^{-\lambda x} for x \ge 0, where \lambda > 0 is the rate parameter, related to the mean \mu by \lambda = 1/\mu.
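The Cumulative Distribution Function (CDF) is derived as: F(x) = P(X \le x) = \int_{0}^{x} \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x} for x \ge 0. Calculations follow these patterns:
Right-tail probability: P(X > x) = e^{-\lambda x}
Interval probability: P(a < X < b) = e^{-\lambda a} - e^{-\lambda b}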
If customers enter a shopping mall at an average rate of 0.5 per minute (\lambda = 0.5, i.e., a mean time between arrivals of 1/\lambda = 2 minutes), then f(x) = 0.5 e^{-0.5x}.
The probability that a customer enters in less than one minute: P(X < 1) = 1 - e^{-0.5 \times 1} = 0.3935
The probability that a customer enters after 2 minutes: P(X > 2) = e^{-0.5 \times 2} = 0.3679
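The following sketch implements the exponential formulas in plain Python and reproduces the shopping-mall figures with \lambda = 0.5. The function names are illustrative and do not come from any library.

```python
import math

# Sketch of the exponential-distribution formulas; function names are illustrative.


def exp_pdf(x: float, lam: float) -> float:
    """f(x) = lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exp_cdf(x: float, lam: float) -> float:
    """F(x) = P(X <= x) = 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


def exp_right_tail(x: float, lam: float) -> float:
    """P(X > x) = exp(-lam * x) for x >= 0 (and 1 for x < 0)."""
    return math.exp(-lam * x) if x >= 0 else 1.0


def exp_interval(a: float, b: float, lam: float) -> float:
    """P(a < X < b) = exp(-lam * a) - exp(-lam * b), assuming 0 <= a <= b."""
    return math.exp(-lam * a) - math.exp(-lam * b)


lam = 0.5                            # customers per minute (shopping-mall example)
p_less_1 = exp_cdf(1, lam)           # ~ 0.3935
p_more_2 = exp_right_tail(2, lam)    # ~ 0.3679
p_1_to_2 = exp_interval(1, 2, lam)   # ~ 0.2387

print(f"f(0) = {exp_pdf(0, lam)}, mean wait = {1 / lam} minutes")
print(f"P(X < 1) = {p_less_1:.4f}, P(X > 2) = {p_more_2:.4f}, P(1 < X < 2) = {p_1_to_2:.4f}")
```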
The Memoryless Property of Exponential Distributions
A distinctive characteristic of the exponential distribution is the memoryless property. The probability of waiting at least an additional time t does not depend on how much time has already elapsed. For instance, if a machine part has already lasted s years, the probability that it lasts at least an additional t years equals the probability that a brand-new part lasts t years: P(X \ge s + t \mid X > s) = P(X > t). This follows from the right-tail formula, since \frac{P(X \ge s + t)}{P(X > s)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t}.
As an application, suppose repair equipment completes an average of 0.5 machines per hour (\lambda = 0.5). The probability that a repair takes at least 10 hours, given that it has already exceeded 9 hours, is: P(X \ge 10 \mid X > 9) = P(X > 1) = e^{-0.5 \times 1} = 0.6065
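The memoryless property can be checked in two ways. The analytic check computes the ratio of two right-tail probabilities. The simulation check uses Python's random.expovariate, which takes the rate \lambda as its argument. This is a sketch assuming \lambda = 0.5 as in the repair example; the seed and sample size are arbitrary choices.

```python
import math
import random

# Sketch: checking the memoryless property for lambda = 0.5 (repair example).
LAM = 0.5  # average repairs per hour


def right_tail(x: float) -> float:
    """P(X > x) = P(X >= x) = exp(-LAM * x) for a continuous exponential variable."""
    return math.exp(-LAM * x)


# Analytic check: P(X >= 10 | X > 9) = P(X >= 10) / P(X > 9) = exp(-LAM * 1)
conditional = right_tail(10) / right_tail(9)
fresh = right_tail(1)

# Simulation check (seed and sample size are arbitrary choices)
random.seed(42)
samples = [random.expovariate(LAM) for _ in range(200_000)]
survivors = [x for x in samples if x > 9]  # repairs that already exceeded 9 hours
simulated = sum(x >= 10 for x in survivors) / len(survivors)

print(f"analytic conditional = {conditional:.4f}, P(X > 1) = {fresh:.4f}")
print(f"simulated conditional ~ {simulated:.3f} from {len(survivors)} survivors")
```

Only about e^{-4.5} \approx 1.1% of samples exceed 9 hours, so the simulated value carries some sampling noise around 0.6065.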
The Normal Probability Distribution
The Normal distribution is the most prominent continuous distribution, characterized by a symmetric bell-shaped curve centered on the mean (\mu). Values cluster near the mean, and the density decreases as values move further away. The tails of the curve are asymptotic: they approach the x-axis but never touch it.
Key properties include:
Symmetry: The left and right sides are mirror images.
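Distribution: Exactly 50% of values lie above the mean and 50% below.
Parameters: Defined by the mean (\mu) and variance (\sigma^2).
The PDF formula is: f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty
Standardization transforms any Normal distribution into the Standard Normal distribution (Z), where \mu = 0 and \sigma = 1. The transformation formula is: Z = \frac{X - \mu}{\sigma}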
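The PDF formula and the standardization step can be compared against Python's built-in statistics.NormalDist (Python 3.8+). The mean 50 and standard deviation 10 below are hypothetical values chosen only to exercise the code. The function phi is an illustrative helper that computes the standard normal CDF from math.erf.

```python
import math
from statistics import NormalDist

# Sketch: the normal PDF formula and standardization, compared with the
# standard library's statistics.NormalDist. mu = 50, sigma = 10 are hypothetical.


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def standardize(x: float, mu: float, sigma: float) -> float:
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma


def phi(z: float) -> float:
    """Standard normal CDF P(Z < z), written via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mu, sigma = 50.0, 10.0
x = 65.0
z = standardize(x, mu, sigma)            # 1.5
p_via_z = phi(z)                         # P(Z < 1.5)
p_direct = NormalDist(mu, sigma).cdf(x)  # P(X < 65) without standardizing

print(f"f(65) = {normal_pdf(x, mu, sigma):.6f} vs NormalDist: {NormalDist(mu, sigma).pdf(x):.6f}")
print(f"z = {z}, P(Z < z) = {p_via_z:.4f}, P(X < 65) = {p_direct:.4f}")
```

Both routes give the same probability, which is why a single Z-table serves every Normal distribution.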
Normal Distribution Applications and Problem Solving
Practical applications of the Normal distribution range from measuring human height to machine error rates.
Example: Soft Drink Machine Regulation
A machine fills cups with an average of \mu = 200 milliliters and a standard deviation of \sigma = 15 milliliters.
Fraction of cups with less than 224 milliliters: Z = \frac{224 - 200}{15} = 1.6 \rightarrow P(Z < 1.6) = 0.9452
Probability between 191 and 209 milliliters: Z_1 = \frac{191-200}{15} = -0.6, Z_2 = \frac{209-200}{15} = 0.6 \rightarrow P(-0.6 < Z < 0.6) = P(Z < 0.6) - P(Z < -0.6) = 0.7257 - 0.2743 = 0.4514
Overflow probability for 230-milliliter cups (P(X > 230)): Z = \frac{230 - 200}{15} = 2 \rightarrow P(Z > 2) = 0.0228. In n drinks, roughly 0.0228n cups will overflow (about 23 per 1,000 drinks).
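The soft-drink figures can be reproduced with statistics.NormalDist instead of a printed Z-table. This sketch uses \mu = 200 and \sigma = 15 from the example. The 1,000-drink count is an assumed illustration.

```python
from statistics import NormalDist

# Sketch: soft-drink machine example with mu = 200 and sigma = 15.
drink = NormalDist(mu=200, sigma=15)

p_less_224 = drink.cdf(224)                     # P(X < 224) = P(Z < 1.6)
p_191_to_209 = drink.cdf(209) - drink.cdf(191)  # P(-0.6 < Z < 0.6)
p_overflow = 1 - drink.cdf(230)                 # P(X > 230) = P(Z > 2)

n_drinks = 1000                                 # assumed number of drinks, for illustration
expected_overflows = n_drinks * p_overflow

print(f"P(X < 224) = {p_less_224:.4f}")
print(f"P(191 < X < 209) = {p_191_to_209:.4f}")
print(f"P(X > 230) = {p_overflow:.4f}, about {expected_overflows:.0f} of {n_drinks} cups overflow")
```

The exact middle probability is about 0.4515. The table-based 0.4514 differs only because the table entries are rounded to four decimals.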
Example: Manufacturing Bolts
Bolts have a mean diameter of \mu = 0.25 and a standard deviation of \sigma = 0.02. Defective bolts are those with diameter < 0.20 or > 0.28.
P(X < 0.20) = P(Z < -2.5) = 0.0062
P(X > 0.28) = P(Z > 1.5) = 0.0668
Total defective probability = 0.0062 + 0.0668 = 0.0730. Out of N bolts, about 0.073N (roughly 73 per 1,000) are defective.
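The bolt example can be checked the same way. The sketch first recovers \mu and \sigma from the two z-scores quoted above (-2.5 at 0.20 and 1.5 at 0.28), which gives \mu = 0.25 and \sigma = 0.02. It then computes the defective fraction with statistics.NormalDist.

```python
from statistics import NormalDist

# Sketch: recover mu and sigma from the z-scores quoted in the bolt example,
# then compute the defective fraction.
lower, upper = 0.20, 0.28      # specification limits
z_lower, z_upper = -2.5, 1.5   # z-scores of those limits

# The gap (upper - lower) spans (z_upper - z_lower) standard deviations
sigma = (upper - lower) / (z_upper - z_lower)  # 0.02
mu = lower - z_lower * sigma                   # 0.25

bolts = NormalDist(mu=mu, sigma=sigma)
p_too_small = bolts.cdf(lower)           # P(Z < -2.5) ~ 0.0062
p_too_large = 1 - bolts.cdf(upper)       # P(Z > 1.5) ~ 0.0668
p_defective = p_too_small + p_too_large  # ~ 0.0730

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"P(defective) = {p_defective:.4f}")
```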
Normal Approximation to the Binomial Distribution
The Binomial distribution can be approximated by the Normal distribution with \mu = np and \sigma^2 = npq (where q = 1 - p), provided the sample size is large enough. A common criterion is np > 5 and nq > 5. When using this approximation, a continuity correction is applied because a continuous distribution is being used to estimate discrete outcomes. Each discrete value k is treated as the interval from k - 0.5 to k + 0.5, so bounds are adjusted by 0.5.
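The continuity-correction rules can be summarized as a small lookup. This is an illustrative sketch. The helper normal_approx_ok encodes the np > 5, nq > 5 rule of thumb stated above. The hypothetical helper continuity_corrected maps each discrete event to the interval used for the approximating normal variable Y.

```python
from typing import Tuple

INF = float("inf")


def normal_approx_ok(n: int, p: float) -> bool:
    """Rule of thumb: use the normal approximation only if np > 5 and nq > 5."""
    q = 1 - p
    return n * p > 5 and n * q > 5


def continuity_corrected(relation: str, k: int) -> Tuple[float, float]:
    """Map the discrete event 'X <relation> k' to an interval (low, high) for Y.

    Each integer k is treated as the interval [k - 0.5, k + 0.5].
    """
    rules = {
        ">=": (k - 0.5, INF),      # X >= k  ->  Y > k - 0.5
        ">": (k + 0.5, INF),       # X > k   ->  Y > k + 0.5
        "<=": (-INF, k + 0.5),     # X <= k  ->  Y < k + 0.5
        "<": (-INF, k - 0.5),      # X < k   ->  Y < k - 0.5
        "==": (k - 0.5, k + 0.5),  # X = k   ->  k - 0.5 < Y < k + 0.5
    }
    return rules[relation]


print(normal_approx_ok(100, 0.5))      # True: np = nq = 50
print(normal_approx_ok(20, 0.1))       # False: np = 2
print(continuity_corrected(">=", 60))  # (59.5, inf)
```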
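In a coin toss experiment (n tosses, probability of heads p):
Mean: \mu = np
Variance: \sigma^2 = npq
Probability of at least k heads: P(X \ge k) \approx P(Y > k - 0.5) using the continuity correction, where Y is the approximating normal variable.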
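For a factory in which a proportion p of bolts is defective, in a box of n bolts:
Mean: \mu = np, Variance: \sigma^2 = npq
Probability of at most k defective bolts: P(X \le k) \approx P(Y < k + 0.5), with Y the approximating normal variable.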
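To see how close the approximation is, the sketch below compares exact binomial probabilities (via math.comb, Python 3.8+) with their continuity-corrected normal approximations. The specific values are hypothetical choices for illustration. The coin case uses n = 100 fair tosses and asks for at least 60 heads. The bolt case uses n = 200 bolts with defect rate p = 0.05 and asks for at most 8 defective.

```python
import math
from statistics import NormalDist


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_for_binomial(n: int, p: float) -> NormalDist:
    """Approximating normal Y with mean np and variance npq."""
    return NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))


# Hypothetical coin example: n = 100 tosses, p = 0.5, P(at least 60 heads)
n, p, k = 100, 0.5, 60
exact_coin = 1 - binom_cdf(k - 1, n, p)                   # P(X >= 60)
approx_coin = 1 - normal_for_binomial(n, p).cdf(k - 0.5)  # P(Y > 59.5)

# Hypothetical bolt example: n = 200 bolts, p = 0.05, P(at most 8 defective)
n2, p2, k2 = 200, 0.05, 8
exact_bolts = binom_cdf(k2, n2, p2)                       # P(X <= 8)
approx_bolts = normal_for_binomial(n2, p2).cdf(k2 + 0.5)  # P(Y < 8.5)

print(f"coin:  exact {exact_coin:.4f}, normal {approx_coin:.4f}")
print(f"bolts: exact {exact_bolts:.4f}, normal {approx_bolts:.4f}")
```

Expect a larger gap in the skewed bolt case (small p) than in the symmetric coin case. This is why the rule of thumb asks for np and nq to be comfortably large.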