Descriptive Data Analysis

Descriptive data analysis involves techniques and statistical methods to describe the characteristics and performance of a group.

  • Focuses on the overall sample.
  • Can be used to compare individual scores to the group.
  • Differs from inferential data analysis in that it does not draw conclusions beyond the sample, such as inferring the effect of a training method.

Types of Descriptive Statistics

  • Measures of Frequency: How often a score or test result occurs.
  • Measures of Distribution: How the frequencies are spread across the range of scores (the shape of the data).
  • Measures of Central Tendency: Points around which most scores are concentrated (mean, median, mode).
  • Measures of Variability: The spread of the data.
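
As a minimal sketch of a measure of frequency, the snippet below counts how often each score occurs. The `jump_scores` values are invented for illustration and do not come from any real test.

```python
from collections import Counter

# Hypothetical vertical jump scores (cm) for a small squad; illustrative only.
jump_scores = [40, 42, 42, 45, 45, 45, 48, 50]

# Measure of frequency: how often each score occurs.
frequency = Counter(jump_scores)

for score in sorted(frequency):
    print(f"{score} cm: {frequency[score]} athlete(s)")
```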

Measures of Central Tendency

Indicate the central or most typical test scores or results.

Mode
  • Describes the most frequent score in the test.
  • Suitable for nominal data (data that cannot be ranked).
  • Example: Determining the most frequently achieved score in a 1RM test or analyzing simple knowledge tests.
Median
  • Describes the middle score of the data.
  • 50% of the data is below this value, and 50% is above.
  • Requires ordinal, interval, or ratio data.
  • Should not be used with nominal data.
Mean
  • Calculated as the sum of the scores divided by the number of scores.
  • Provides an average of the group's performance.
  • Formula: $\text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $x_i$ represents the individual scores and $n$ is the number of scores.
  • Accuracy is affected by the distribution of the data; outliers can impact the mean's magnitude.
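
The three measures can be computed with Python's standard `statistics` module. This is a sketch only: the `squat_1rm` values are made up, and the last value is a deliberate outlier to show how it pulls the mean but not the median.

```python
import statistics

# Hypothetical 1RM back squat results (kg); the last value is a deliberate outlier.
squat_1rm = [100, 110, 110, 120, 125, 130, 200]

mode_score = statistics.mode(squat_1rm)      # most frequent score
median_score = statistics.median(squat_1rm)  # middle score of the sorted data
mean_score = statistics.mean(squat_1rm)      # sum of scores / number of scores

# Mean of the same group with the outlier removed, for comparison.
mean_without_outlier = statistics.mean(squat_1rm[:-1])

print(f"mode={mode_score}, median={median_score}, mean={mean_score:.1f}")
print(f"mean without outlier={mean_without_outlier:.1f}")
```

Removing the outlier lowers the mean noticeably, while the median of the full group already sits near the bulk of the scores.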

Data Distribution

Normal Distribution
  • Data is clustered around a central point (symmetrical histogram).
  • The mean is generally the most appropriate measure of central tendency.
Skewed Distributions
  • Negatively Skewed: The tail of the distribution is longer to the left; most scores cluster at the higher end.
  • Positively Skewed: The tail of the distribution is longer to the right; most scores cluster at the lower end.
  • The median should be used in these cases as it is not affected by the magnitude of extreme scores.
Importance of Distribution
  • Most statistical measures (particularly those related to means) rely on normally distributed data.
  • In a normal distribution, approximately 95% of values fall within two standard deviations of the mean.
  • If data is not normally distributed, using the median and appropriate statistical models is recommended.
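
A rough way to check these points on a small dataset is to compare the mean with the median and to count how many scores fall within two standard deviations of the mean. The sketch below uses invented `sprint_times`; the mean-versus-median comparison is only a quick heuristic, not a formal test of normality.

```python
import statistics

# Hypothetical 40 m sprint times (s); two slow times create a long right tail.
sprint_times = [4.8, 4.9, 4.9, 5.0, 5.0, 5.1, 5.1, 5.2, 6.5, 7.0]

mean_time = statistics.mean(sprint_times)
median_time = statistics.median(sprint_times)
sd_time = statistics.stdev(sprint_times)  # sample standard deviation

# Rough skew check: a mean above the median suggests a longer right tail.
likely_positive_skew = mean_time > median_time

# Share of scores within two standard deviations of the mean.
within_two_sd = sum(
    1 for t in sprint_times if abs(t - mean_time) <= 2 * sd_time
) / len(sprint_times)

print(f"mean={mean_time:.2f}, median={median_time:.2f}")
print(f"share within 2 SD={within_two_sd:.0%}")
```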

Measures of Variability

Describe the spread of the data.

Range
  • Calculated as the difference between the maximum and minimum values in the dataset.
  • Provides the least amount of information about the spread.
Interquartile Range
  • Gives a representative idea of the spread around the median.
  • Calculated as the difference between the 75th percentile and the 25th percentile, covering the middle 50% of the data (25% on either side of the median).
Standard Deviation
  • Describes the typical amount by which scores differ from the mean.
  • Indicates how well the mean represents the individual scores.
  • Smaller standard deviation: athletes are closer to the mean.
  • Larger standard deviation: greater variability in the data.
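
The three measures of variability can be sketched with the standard library as follows. The `scores` are hypothetical, and `statistics.quantiles` is used with its default method; other quartile conventions can give slightly different interquartile ranges.

```python
import statistics

# Hypothetical vertical jump scores (cm); illustrative only.
scores = [40, 42, 42, 45, 45, 45, 48, 50]

# Range: difference between the largest and smallest score.
score_range = max(scores) - min(scores)

# Interquartile range: 75th percentile minus 25th percentile.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Sample standard deviation (divides by n - 1); an assumption of this sketch.
sd = statistics.stdev(scores)

print(f"range={score_range}, IQR={iqr:.2f}, SD={sd:.2f}")
```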

Describing Individual Level Characteristics

Percentiles
  • Rank athletes' scores from worst to best.
  • Assign a percentile based on their rank within the data range.
  • Example: An athlete at the 60th percentile has 60% of the scores below them.
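
One simple way to turn this into a calculation is to take the percentage of group scores that fall below the athlete's score. The helper `percentile_rank` and the `group_scores` data are hypothetical, and the sketch assumes a test where a higher score is better (for timed tests the comparison would be reversed).

```python
def percentile_rank(score, group_scores):
    """Percentage of group scores strictly below the given score."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)


# Hypothetical test scores for ten athletes; higher is assumed to be better.
group_scores = [30, 32, 35, 36, 38, 40, 41, 43, 45, 50]

athlete_rank = percentile_rank(41, group_scores)
print(f"Percentile rank: {athlete_rank:.0f}")
```
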
Standard Scores
  • Describe how far an individual's test scores are from the mean of the group.
  • Z Score:
    • A common standard score in strength and conditioning.
    • Formula: $Z = \frac{X - \mu}{\sigma}$, where $X$ is the athlete's test score, $\mu$ is the mean score for the group, and $\sigma$ is the standard deviation for the group.
    • Indicates how many standard deviations an athlete is away from the mean.
  • Standard Error of the Mean:
    • Estimates how far the sample mean is likely to be from the population mean.
  • Confidence Intervals:
    • Allow assessment of the accuracy of the sample mean estimate using standard deviation or standard error of the mean.
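
The sketch below works through a Z score, the standard error of the mean, and an approximate 95% confidence interval. Several choices here are assumptions rather than part of the notes above: the `group` data are invented, the sample standard deviation (`statistics.stdev`) is used for the group, the standard error is taken as the standard deviation divided by the square root of the sample size, and the interval uses the normal-approximation multiplier 1.96 (a t-based multiplier would give a wider interval for a sample this small).

```python
import math
import statistics

# Hypothetical test scores for ten athletes; illustrative only.
group = [30, 32, 35, 36, 38, 40, 41, 43, 45, 50]

group_mean = statistics.mean(group)
group_sd = statistics.stdev(group)  # sample SD; an assumption of this sketch


def z_score(score, mean, sd):
    """Number of standard deviations a score lies from the group mean."""
    return (score - mean) / sd


# Standard error of the mean: SD divided by the square root of the sample size.
sem = group_sd / math.sqrt(len(group))

# Approximate 95% confidence interval for the mean (normal approximation).
ci_low = group_mean - 1.96 * sem
ci_high = group_mean + 1.96 * sem

print(f"z for a score of 45: {z_score(45, group_mean, group_sd):.2f}")
print(f"SEM={sem:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```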