Variable
A measure of a single characteristic that can vary.
Causes of Variations
Factors such as biologic differences, genes, nutrition, environmental exposures, age, sex, race, presence or absence of disease, and extent of disease that contribute to variations in medical data.
Measurement Error
Errors in measurement techniques that can lead to variations in results.
Quantitative Data
Data represented by numbers and measurements.
Qualitative Data
Data represented by words.
Types of Variables
Nominal, Dichotomous, Ordinal, Continuous, and Ratio variables.
Frequency Distributions
Tables showing how often each value of a variable occurs.
Range of a Variable
The difference between the lowest and highest observations of a variable.
Parameters of a Frequency Distribution
Measures of Central Tendency (Mean, Median, Mode) and Measures of Dispersion (Mean Absolute Deviation, Variance, Standard Deviation).
Skewness
Horizontal stretching of a frequency distribution, producing a longer tail on one side (left or right).
Kurtosis
Vertical stretching or flattening of a frequency distribution.
Causes of Variations
Biologic differences
Presence or absence of disease and extent of disease
Different conditions of measurement
Different techniques of measurement
Measurement error
Biologic differences
Genes, nutrition, environmental exposures, age, sex, race
Different conditions of measurement
Often account for the variations observed in medical data
Measurement error
Can also cause variation
Types of Errors
Systematic Error and Random Error
Systematic Error
Can distort data systematically in one direction.
Random Error
Scatters results unpredictably in both directions; does not introduce bias but reduces precision
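The difference between the two error types can be sketched with a small simulation. This is a hypothetical model, not taken from these notes: each reading is assumed to equal a true value plus a fixed bias (systematic error) plus normally distributed noise (random error). The function `simulate_readings` and the blood-pressure numbers are illustrative only.

```python
import random
import statistics

def simulate_readings(true_value, bias, noise_sd, n, seed=0):
    """Hypothetical model: reading = true value + fixed bias + random noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

TRUE_BP = 120.0  # hypothetical true systolic blood pressure (mmHg)

# Random error only: readings scatter on both sides of the true value.
random_only = simulate_readings(TRUE_BP, bias=0.0, noise_sd=5.0, n=10_000)
# Systematic error added: every reading is shifted upward by the same 8 mmHg.
with_bias = simulate_readings(TRUE_BP, bias=8.0, noise_sd=5.0, n=10_000)

print(f"random only: mean error {statistics.mean(random_only) - TRUE_BP:+.2f}, "
      f"SD {statistics.stdev(random_only):.2f}")
print(f"with bias:   mean error {statistics.mean(with_bias) - TRUE_BP:+.2f}, "
      f"SD {statistics.stdev(with_bias):.2f}")
```

Averaging many readings pulls the random-error mean toward the true value, but the 8 mmHg systematic shift remains no matter how many readings are taken.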
Quantitative Data
Numbers and measurements
Qualitative Data
Generally uses words
Nominal Variables
Naming or categoric variables that are not based on measurement scales or rank order.
Dichotomous (Binary) Variables
Variables with only two levels
Ordinal (Ranked) Variables
Data that can be characterized in terms of three or more qualitative values that have a natural rank order (e.g., mild, moderate, severe)
Continuous (Dimensional) Variables
Measured on a continuous scale, so an observation can take any value within a range (e.g., height, weight)
Ratio Variables
Continuous variables whose scale has a true zero point, so ratios between values are meaningful (e.g., weight)
Frequency Distributions of Continuous Variables
Can be shown by creating a table that lists the values of the variable according to the frequency with which the value occurs.
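A frequency table like the one described above can be sketched in Python with `collections.Counter`. The hemoglobin values are hypothetical, and the function name `frequency_table` is illustrative; the relative-frequency column is an optional extra.

```python
from collections import Counter

# Hypothetical sample: hemoglobin values (g/dL) rounded to whole numbers
values = [12, 13, 13, 14, 14, 14, 15, 15, 16]

def frequency_table(data):
    """Return (value, frequency, relative frequency) rows sorted by value."""
    counts = Counter(data)
    n = len(data)
    return [(v, counts[v], counts[v] / n) for v in sorted(counts)]

print("value freq   rel")
for value, freq, rel in frequency_table(values):
    print(f"{value:>5} {freq:>4} {rel:6.1%}")
```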
Range of a variable
Range is the distance between the lowest and highest observations of the variable.
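The range is simple arithmetic: highest observation minus lowest. A minimal sketch with hypothetical ages; reporting the endpoints alongside the difference is a common convention, not something these notes require.

```python
def value_range(data):
    """Range: highest observation minus lowest observation."""
    return max(data) - min(data)

ages = [23, 25, 31, 34, 35, 38, 41, 44, 47, 52, 58, 63]  # hypothetical ages (years)
print(f"range = {value_range(ages)} years ({min(ages)} to {max(ages)})")
```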
Real Frequency Distributions
Obtained from actual data or sample
Theoretical Frequency Distributions
Calculated using assumptions about the population from which the sample was obtained
Normal Distribution
Bell-shaped curve
Normal Distribution
Also called the Gaussian distribution, after Johann Carl Friedrich Gauss
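Because the normal distribution is a theoretical frequency distribution, its proportions can be calculated rather than counted. This sketch uses Python's `statistics.NormalDist` with a hypothetical mean of 100 and standard deviation of 15 to show the familiar shares of the bell curve lying within 1, 2, and 3 standard deviations of the mean.

```python
from statistics import NormalDist

def share_within_k_sd(dist, k):
    """Proportion of a normal distribution lying within k SDs of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

# Hypothetical theoretical population: mean 100, standard deviation 15
population = NormalDist(mu=100, sigma=15)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within_k_sd(population, k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

These shares do not depend on the chosen mean and standard deviation; any normal distribution gives the same percentages.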
Measures of Central Tendency
Mean
Median
Mode
Mean
Arithmetic average: the sum of the observations divided by the number of observations
Median
Middlemost or halfway value
Mode
Most frequent value
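The three measures of central tendency can be computed for a small hypothetical data set. The mean is calculated by hand to mirror its definition; the median and mode use Python's `statistics` module.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = sum(data) / len(data)       # sum divided by count -> 5.0
median = statistics.median(data)   # even n: average of the 4th and 5th sorted values -> 4.5
mode = statistics.mode(data)       # most frequent value -> 4

print(mean, median, mode)  # 5.0 4.5 4
```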
Mean Absolute Deviation
Average of the absolute deviations of the observations from the mean; seldom used because it lacks the mathematical properties on which many statistical tests are based
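A minimal sketch of the mean absolute deviation, assuming deviations are measured from the mean (some texts measure them from the median instead). The data set is hypothetical.

```python
def mean_absolute_deviation(data):
    """Average absolute distance of the observations from their mean."""
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations; mean = 5
print(mean_absolute_deviation(data))  # (3+1+1+1+0+0+2+4) / 8 = 1.5
```

The absolute value is what makes this measure awkward mathematically; the variance avoids it by squaring the deviations instead.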
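Variance
Average of the squared deviations from the mean (for a sample, the sum of squared deviations is usually divided by n - 1)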
Standard Deviation
Square root of the variance
Standard Deviation
Used to describe the amount of spread in the frequency distribution
Standard Deviation
Roughly the typical distance of observations from the mean; formally, the square root of the average squared deviation from the mean
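Variance and standard deviation can be sketched directly from their definitions and cross-checked against Python's `statistics` module. The data set is hypothetical; dividing by n - 1 for a sample and by n for a whole population is the usual convention.

```python
import math
import statistics

def variance(data, sample=True):
    """Average squared deviation from the mean (divide by n - 1 for a sample)."""
    n = len(data)
    m = sum(data) / n
    ss = sum((x - m) ** 2 for x in data)  # sum of squared deviations
    return ss / (n - 1) if sample else ss / n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations; squared deviations sum to 32

pop_sd = math.sqrt(variance(data, sample=False))  # sqrt(32 / 8) = 2.0
sample_sd = math.sqrt(variance(data))             # sqrt(32 / 7), about 2.14

# Cross-check against the standard library
assert math.isclose(pop_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))
print(pop_sd, round(sample_sd, 2))  # 2.0 2.14
```

Taking the square root returns the measure of spread to the original units of the data, which is one reason the standard deviation is reported more often than the variance.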
Skewness
A horizontal stretching of a frequency distribution to one side or the other, so that one tail of observations is longer and has more observations than the other tail
Skewed to the left
When a histogram or frequency polygon has a longer tail on the left side of the diagram
Skewed to the left
Negatively skewed distribution
Skewed to the right
When a histogram or frequency polygon has a longer tail on the right side of the diagram
Skewed to the right
Positively skewed distribution
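The direction of skew can be checked numerically. This sketch uses the moment coefficient of skewness (third central moment divided by the second central moment raised to the 1.5 power), one of several common definitions; the notes above do not specify a formula. The data sets are hypothetical.

```python
import statistics

def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 4, 12]  # hypothetical: one long tail to the right
left_tail = [-x for x in right_tail]    # mirror image: long tail to the left

print(skewness(right_tail) > 0)  # True: positively skewed
print(skewness(left_tail) < 0)   # True: negatively skewed
# In the right-skewed set the long tail pulls the mean (3.75) above the median (3)
print(statistics.mean(right_tail), statistics.median(right_tail))
```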
Kurtosis
Characterized by a vertical stretching or flattening of the frequency distribution
Continuous (Dimensional) Variables
Continuous scales
Leptokurtic
Distribution with heavy tails.
Platykurtic
Distribution with light tails.
Mesokurtic
Distribution with moderate tails, similar to a normal distribution.
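Kurtosis can be sketched as excess kurtosis: the fourth central moment divided by the squared second central moment, minus 3, so that a normal distribution scores 0. This is one common moment-based definition, assumed here rather than taken from the notes. The two data sets are hypothetical.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # hypothetical: mostly central values, two extremes
flat = list(range(1, 11))                         # hypothetical: evenly spread, no tails

print(excess_kurtosis(heavy_tails))     # 2.0   -> positive: leptokurtic
print(round(excess_kurtosis(flat), 2))  # -1.22 -> negative: platykurtic
```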
Graphs
Provide a visual way to understand the distribution and variation in the data.
Histogram
A bar graph that shows the frequency of data points within specified ranges (bins).
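The binning step behind a histogram can be sketched without a plotting library by counting observations per bin and drawing text bars. The ages, the starting point of 20, and the bin width of 10 are hypothetical choices; bins are taken as half-open intervals [lower, lower + width), and all values are assumed to be at least `start`.

```python
def histogram_counts(data, start, bin_width):
    """Count observations in bins [start, start+w), [start+w, start+2w), ..."""
    n_bins = int((max(data) - start) // bin_width) + 1
    counts = [0] * n_bins  # empty bins stay at 0 so gaps remain visible
    for x in data:
        counts[int((x - start) // bin_width)] += 1
    return [(start + i * bin_width, c) for i, c in enumerate(counts)]

ages = [23, 25, 31, 34, 35, 38, 41, 44, 47, 52, 58, 63]  # hypothetical ages (years)

for lower, count in histogram_counts(ages, start=20, bin_width=10):
    print(f"{lower}-{lower + 9}: {'#' * count}")
```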
Box Plot (Box-and-Whisker Plot)
Displays the median, quartiles, and potential outliers. It helps visualize the spread and skewness of the data.
Dot Plot
Shows individual data points and their frequency.
Stem-and-Leaf Plot
Similar to a histogram but retains the original data values.
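A stem-and-leaf display can be sketched for hypothetical two-digit ages, assuming non-negative integers with the tens digit as the stem and the units digit as the leaf. Unlike a histogram, every original value can be read back from the output.

```python
def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    stems = {}
    for x in sorted(data):
        stems.setdefault(x // 10, []).append(x % 10)
    # Include empty stems so gaps in the data stay visible
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}

ages = [23, 25, 31, 34, 35, 38, 41, 44, 47, 52, 58, 63]  # hypothetical ages (years)

for stem, leaves in stem_and_leaf(ages).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```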
Density Plot
A smoothed version of the histogram, often used to estimate the probability density function of the data.
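One common way to build a density plot is a Gaussian kernel density estimate: place a normal curve on each observation and average them. This sketch uses `statistics.NormalDist` for the kernels; the ages and the bandwidth of 5 years are hypothetical, and the bandwidth is chosen by hand rather than by any selection rule.

```python
from statistics import NormalDist

def gaussian_kde(data, bandwidth):
    """Return a density function: the average of normal curves centred on each observation."""
    kernels = [NormalDist(mu=x, sigma=bandwidth) for x in data]

    def density(t):
        return sum(k.pdf(t) for k in kernels) / len(kernels)

    return density

ages = [23, 25, 31, 34, 35, 38, 41, 44, 47, 52, 58, 63]  # hypothetical ages (years)
density = gaussian_kde(ages, bandwidth=5.0)  # hand-picked bandwidth (assumption)

for t in range(20, 70, 10):
    print(f"{t}: {density(t):.4f}")
```

A larger bandwidth gives a smoother curve that may hide real structure; a smaller one follows individual points more closely.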
Five-Number Summary
Consists of the minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
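The five-number summary (which is also what a box plot draws) can be sketched with `statistics.quantiles`. Quartile conventions differ between textbooks and software; this sketch assumes the `"inclusive"` method, so other tools may report slightly different Q1 and Q3 values for the same hypothetical data.

```python
import statistics

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum (quartiles by the 'inclusive' method)."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(five_number_summary(data))  # (2, 4.0, 4.5, 5.5, 9)
```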
Summary Table
Includes mean, median, mode, range, variance, standard deviation, and other relevant statistics.
Outliers
Data points that differ markedly from the rest of the dataset.
Side-by-Side Box Plots
Useful for comparing the spread and central tendency of multiple groups.
Multiple Histograms
Placing histograms side by side or overlaying them for comparison.
Summary Statistics Comparison
Comparing means, medians, ranges, and standard deviations.
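A summary-statistics comparison can be sketched for two hypothetical groups that share the same mean but differ in spread, which is exactly the situation where comparing only means would mislead. The `describe` helper is illustrative and uses the sample standard deviation.

```python
import statistics

def describe(data):
    """Mean, median, range, and sample standard deviation of one group."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "sd": statistics.stdev(data),
    }

groups = {
    "A": [2, 4, 4, 4, 5, 5, 7, 9],  # hypothetical group with wide spread
    "B": [4, 5, 5, 5, 5, 6],        # hypothetical group with narrow spread
}

for name, values in groups.items():
    s = describe(values)
    print(f"{name}: mean {s['mean']:.1f}, median {s['median']:.1f}, "
          f"range {s['range']}, SD {s['sd']:.2f}")
```

Both groups have a mean of 5, but group A's range (7) and SD (about 2.14) are much larger than group B's (2 and about 0.63).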