statistics

2.1 Measures of Central Tendency

2.1.1 Introduction

  • Average: A single central value that summarizes a statistical series and represents the distribution as a whole.

  • Measures of Central Tendency: Main measures include:

    1. Arithmetic Mean

    2. Geometric Mean

    3. Harmonic Mean

    4. Median

    5. Mode

2.1.2 Arithmetic Mean

  • Definition: The sum of values divided by the number of items.

  • Calculation Methods:

    • Simple Arithmetic Mean for Ungrouped Data:

      • Direct Method: ( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} )

    • Grouped Data:

      • Direct Method: ( \bar{x} = \frac{\sum{f_ix_i}}{\sum f_i} )
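
The two direct-method formulas above can be sketched in Python as follows. The function names and sample data are illustrative assumptions, not part of the notes.

```python
def mean_ungrouped(values):
    """Direct method: sum of the values divided by their count."""
    return sum(values) / len(values)


def mean_grouped(values, freqs):
    """Direct method for grouped data: sum(f_i * x_i) / sum(f_i)."""
    total_freq = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / total_freq


print(mean_ungrouped([2, 4, 6, 8]))           # 5.0
print(mean_grouped([10, 20, 30], [1, 2, 1]))  # 20.0
```

For a continuous series, the values passed to `mean_grouped` would be the class midpoints.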

2.1.3 Geometric Mean

  • Formula: ( G.M. = (x_1 \times x_2 \times ... \times x_n)^{1/n} )

  • Frequency Distribution: For a frequency distribution, ( G.M. = (x_1^{f_1} \times x_2^{f_2} \times ... \times x_n^{f_n})^{1/N} ), where ( N = \sum f_i ) is the total frequency.
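
A minimal Python sketch of both geometric-mean formulas, assuming all values are strictly positive. It works with logarithms rather than the raw product to avoid overflow on long series; the function name is a hypothetical choice.

```python
import math


def geometric_mean(values, freqs=None):
    """Geometric mean; with freqs, each value x_i is weighted by f_i."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data: every frequency is 1
    total_freq = sum(freqs)
    # log(G.M.) = sum(f_i * log(x_i)) / N
    log_sum = sum(f * math.log(x) for x, f in zip(values, freqs))
    return math.exp(log_sum / total_freq)


print(geometric_mean([2, 8]))          # approximately 4.0
print(geometric_mean([2, 8], [3, 1]))  # (2^3 * 8)^(1/4), approximately 2.83
```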

2.1.4 Harmonic Mean

  • Definition: ( H.M. = \frac{n}{\sum \frac{1}{x_i}} )

  • For Frequency Distribution: ( H.M. = \frac{N}{\sum \frac{f_i}{x_i}} )
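
The harmonic-mean formulas above could be implemented as in this sketch, which assumes no value is zero. The speed example is illustrative.

```python
def harmonic_mean(values, freqs=None):
    """Harmonic mean: N / sum(f_i / x_i); freqs default to 1 each."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


# Average speed over two equal distances driven at 40 and 60 km/h.
print(harmonic_mean([40, 60]))  # approximately 48
```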

2.1.5 Median

  • Definition: The middle value that separates the higher half from the lower half of the data.

  • Calculation:

    • Individual Series:

      • If ( n ) is odd: ( Median = value_{(n+1)/2} )

      • If ( n ) is even: ( Median = \frac{value_{n/2} + value_{(n/2)+1}}{2} )

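    • Continuous Series: ( Median = L + \frac{(N/2 - C)}{f} \cdot i ), where ( L ) is the lower boundary of the median class, ( N ) is the total frequency, ( C ) is the cumulative frequency of the classes before the median class, ( f ) is the frequency of the median class, and ( i ) is the class width.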
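
A short Python sketch of the median rules above. The continuous-series version assumes classes of equal width listed in ascending order; the function names and the sample frequency table are hypothetical.

```python
def median_individual(values):
    """Median of an individual series (odd or even number of items)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the (n+1)/2-th value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the two middle values


def median_continuous(lower_bounds, width, freqs):
    """Median = L + ((N/2 - C) / f) * i for equal-width classes."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("frequency table is empty")
    half = total / 2
    cumulative = 0  # C: cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= half:  # first class reaching N/2 is the median class
            return lower + (half - cumulative) / f * width
        cumulative += f


print(median_individual([7, 1, 3]))     # 3
print(median_individual([4, 1, 3, 2]))  # 2.5
# Classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5.
print(median_continuous([0, 10, 20, 30], 10, [5, 8, 12, 5]))  # about 21.67
```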

2.1.6 Mode

  • Definition: The value that appears most frequently in a data set.

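  • Formula for Continuous Series: ( Mode = L + \frac{(f_1 - f_0)}{(2f_1 - f_0 - f_2)} \cdot i ), where ( L ) is the lower boundary of the modal class, ( f_1 ) is the frequency of the modal class, ( f_0 ) and ( f_2 ) are the frequencies of the classes before and after it, and ( i ) is the class width.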
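
The continuous-series formula can be sketched as below, assuming equal-width classes, a single modal class, and a neighbouring frequency of 0 when the modal class is the first or last one. The function name and data are illustrative.

```python
def mode_continuous(lower_bounds, width, freqs):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * i for equal-width classes."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                   # class before the modal class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0      # class after the modal class
    return lower_bounds[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width


# Classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5.
print(mode_continuous([0, 10, 20, 30], 10, [5, 8, 12, 5]))  # about 23.64
```

The denominator is zero when the modal class and both neighbours share the same frequency; this sketch does not handle that case.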

2.1.7 Pie Chart

  • Description: A circular graph divided into slices, representing data proportions.

  • Central Angle Calculation: For a slice representing a component of value ( n ) out of a total ( N ), the central angle is ( \frac{n}{N} \times 360^\circ ).
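
As a small illustration of the central-angle rule, the sketch below converts component values into angles in degrees. The budget categories are made-up sample data.

```python
def central_angles(components):
    """Map each component to its central angle in degrees (value / total * 360)."""
    total = sum(components.values())
    return {name: value * 360 / total for name, value in components.items()}


angles = central_angles({"rent": 50, "food": 30, "other": 20})
print(angles)  # {'rent': 180.0, 'food': 108.0, 'other': 72.0}
```

The angles always sum to 360 degrees, which is a quick check on a hand calculation.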

2.1.8 Measure of Dispersion

  • Definition: Indicates how spread out the values are around the mean.

  • Measures Include:

    1. Range: Difference between highest and lowest values, ( Range = X_{max} - X_{min} )

    2. Variance: Average of squared deviations from the mean.

    3. Standard Deviation (S.D.): Square root of variance.

2.1.9 Variance

  • Definition: Measure of the dispersion of a set of values.

  • Formula:

    • For grouped data: ( \text{Variance} = \frac{\sum{f_i(x_i - \bar{x})^2}}{N} ), where ( N = \sum f_i )
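
The range, variance, and standard deviation defined above can be computed as in this sketch. It uses the population form (division by N), matching the grouped-data formula; the function names are hypothetical.

```python
import math


def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)


def grouped_variance(values, freqs):
    """Variance for grouped data: sum(f_i * (x_i - mean)^2) / N."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n


def grouped_std_dev(values, freqs):
    """Standard deviation: square root of the variance."""
    return math.sqrt(grouped_variance(values, freqs))


print(value_range([1, 2, 3]))                  # 2
print(grouped_variance([1, 2, 3], [1, 2, 1]))  # 0.5
print(grouped_std_dev([1, 2, 3], [1, 2, 1]))   # square root of 0.5, about 0.707
```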

2.1.10 Skewness

  • Definition: Measure of the asymmetry of the distribution around its mean.

  • Types:

    • Negative Skew: Mean < Median < Mode

    • Positive Skew: Mean > Median > Mode

  • Formula: ( Skewness = \frac{E[(X - \mu)^3]}{\sigma^3} )
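
A minimal sketch of the moment-based skewness formula, assuming the expectation is replaced by the average over the data and that the population standard deviation is used. The sample series are illustrative.

```python
def skewness(values):
    """Third central moment divided by the cube of the standard deviation."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # variance (second central moment)
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


print(skewness([1, 2, 3]))      # 0.0 for a symmetric series
print(skewness([1, 1, 1, 10]))  # positive: long tail to the right
```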

2.2 Correlation & Regression

2.2.1 Introduction

  • Definition: Correlation measures the degree to which two variables are related.

  • Types of Distributions:

    1. Univariate: One variable.

    2. Bivariate: Two related variables.

2.2.2 Covariance

  • Definition: Measure of how two random variables change together.

  • Formula: [ Cov(X,Y) = \frac{\sum{(X_i - \bar{X})(Y_i - \bar{Y})}}{n} ]
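
The covariance formula above translates directly into Python. This sketch divides by n, as in the formula, and assumes the two series have equal length.

```python
def covariance(xs, ys):
    """Cov(X, Y) = sum((x_i - mean_x) * (y_i - mean_y)) / n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


print(covariance([1, 2, 3], [2, 4, 6]))  # 4/3, about 1.33
```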

2.2.3 Correlation

  • Types:

    • Perfect (( r = \pm 1 )), Positive (( r > 0 )), Negative (( r < 0 )).

  • Karl Pearson’s Correlation Coefficient: [ r = \frac{Cov(X,Y)}{\sigma_X \sigma_Y} ]
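
Pearson's coefficient can be sketched as the covariance divided by the product of the two standard deviations, all computed with the same divisor n. The function name is a hypothetical choice, and the sketch assumes neither series is constant.

```python
import math


def pearson_r(xs, ys):
    """r = Cov(X, Y) / (sigma_X * sigma_Y), using population (divide-by-n) moments."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # approximately 1 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # approximately -1 (perfect negative)
```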

2.2.4 Rank Correlation

  • Definition: Measures the correlation between the rankings of two variables.

  • Spearman's Rank Correlation Formula: [ r = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} ]
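
A minimal sketch of Spearman's formula, assuming there are no tied values (the formula above does not apply unchanged when ranks are tied). The helper `ranks` and the sample data are illustrative.

```python
def ranks(values):
    """Rank the values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_r(xs, ys):
    """r = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = difference in ranks."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Rank differences are 0, -1, 1, -1, 1, so sum(d^2) = 4.
print(spearman_r([10, 20, 30, 40, 50], [1, 3, 2, 5, 4]))  # approximately 0.8
```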

2.2.5 Linear Regression

  • Definition: A technique that models the relationship between two variables with a straight line, so that one variable can be predicted from the other.

  • Regression Equations:

    • For ( y ) on ( x ): [ y - \bar{y} = b_{yx}(x - \bar{x}) ]

    • For ( x ) on ( y ): [ x - \bar{x} = b_{xy}(y - \bar{y}) ]
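
The two regression lines can be fitted as in the sketch below. It assumes the usual least-squares coefficients, namely the covariance divided by the variance of the predictor; the names `slope_y_on_x` and `slope_x_on_y` and the sample data are hypothetical.

```python
def regression_slopes(xs, ys):
    """Return (mean_x, mean_y, slope_y_on_x, slope_x_on_y) by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy / sxx, sxy / syy


def predict_y(x, mean_x, mean_y, slope_y_on_x):
    """Line of y on x: y - mean_y = slope * (x - mean_x)."""
    return mean_y + slope_y_on_x * (x - mean_x)


mean_x, mean_y, slope_y_on_x, slope_x_on_y = regression_slopes(
    [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
)
print(slope_y_on_x, slope_x_on_y)                   # approximately 0.6 and 1.0
print(predict_y(6, mean_x, mean_y, slope_y_on_x))   # approximately 5.8
```

Both lines pass through the point of means, which is why each equation is written in deviation form.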

2.2.6 Important Points about Regression Coefficients

  • Each regression coefficient gives the average change in the dependent variable per unit change in the independent variable.

  • The product of the two regression coefficients equals the square of the correlation coefficient: ( b_{yx} \cdot b_{xy} = r^2 ).
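
The relation between the two regression coefficients and the correlation coefficient follows in one step from the definitions in 2.2.2 and 2.2.3. The derivation below writes ( b_{yx} ) for the coefficient of ( y ) on ( x ) and ( b_{xy} ) for the coefficient of ( x ) on ( y ), and assumes the usual least-squares expressions for both.

```latex
b_{yx} = \frac{Cov(X,Y)}{\sigma_X^2} = r \frac{\sigma_Y}{\sigma_X}, \qquad
b_{xy} = \frac{Cov(X,Y)}{\sigma_Y^2} = r \frac{\sigma_X}{\sigma_Y}
\quad\Longrightarrow\quad
b_{yx} \cdot b_{xy} = r^2
```

Since the product is non-negative, both coefficients carry the same sign as ( r ).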

2.2.9 Standard Error and Probable Error

  • Standard Error of Prediction: The deviation of predicted values from the observed values.

  • Relation to Correlation Coefficient:

    • Standard Error of ( r ): [ S.E.(r) = \frac{1 - r^2}{\sqrt{n}} ]

    • Probable Error: Indicates the reliability of the correlation coefficient, ( P.E.(r) = 0.6745 \times S.E.(r) ).
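
As a hedged illustration, the sketch below assumes the common textbook convention that the standard error of the correlation coefficient is (1 - r^2) / sqrt(n) and that the probable error is 0.6745 times that value. Both formulas and the function names are assumptions about the intended convention.

```python
import math


def standard_error_r(r, n):
    """Assumed convention: S.E.(r) = (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)


def probable_error_r(r, n):
    """Assumed convention: P.E.(r) = 0.6745 * S.E.(r)."""
    return 0.6745 * standard_error_r(r, n)


print(standard_error_r(0.8, 64))  # approximately 0.045
print(probable_error_r(0.8, 64))  # approximately 0.0304
```

Under this convention, a coefficient several times larger than its probable error is usually read as reliable.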
