Quantitative Methods in Health Sciences: Descriptive Statistics and Continuous Variables
Data Classification and Statistics
Definitions and Scope:
* Data: Information derived from observations, counts, measurements, or responses.
* Statistics: The science of collecting, organizing, analyzing, and interpreting data for the purpose of informed decision-making.
Types of Data:
* Qualitative Data: Consists of attributes, labels, or non-numerical entries. Categorized as:
  * Nominal: Labels or names with no mathematical order.
  * Ordinal: Can be arranged in a specific order or rank.
* Quantitative Data: Consists of numerical measurements or counts. Categorized as:
  * Interval: Differences between data entries are meaningful, but there is no true zero point.
  * Ratio: Differences are meaningful and there is an inherent zero point.
Variable Classifications:
* Discrete Variable: A quantitative variable that results from counting, so it takes countable (whole-number) values.
* Continuous Variable: A quantitative variable that results from measuring and can take decimal values.
* Qualitative (Categorical) Variable: Represents categories such as gender, ethnicity, or group membership.
Example: GPA Data Identification:
* Data: Sally (3.22), Bob (3.98), Cindy (2.75), Mark (2.24), Kathy (3.84).
* The names (Sally, Bob, etc.) are qualitative data.
* The Grade Point Average (GPA) values are quantitative data.
Branches of Statistics
Descriptive Statistics:
* Involves the organization, summarization, and visual display of data.
* The primary goal is to turn raw data into accessible information.
Inferential Statistics:
* Involves using a sample to draw conclusions about a larger population.
Practical Example: Sleep Study:
* Study Detail: Volunteers who got less than a specified amount of sleep were four times more likely to answer incorrectly on a science test than participants who got at least a specified amount of sleep.
* Descriptive Part: The statement "four times more likely to answer incorrectly" describes the sample data directly.
* Inferential Conclusion: Inferring that the same pattern holds for all individuals, not only for the volunteers in the study.
The Role of Statistics in Experimentation (Three-Step Process):
* Step 1: Experimentation: Two teaching methods (Method A and Method B) are compared in a population of first-grade children. The result is a set of test scores for the students in each of two samples (Sample A and Sample B).
* Step 2: Descriptive Statistics: The data from Sample A and Sample B are organized and simplified, here by computing the average score for each sample.
* Step 3: Inferential Statistics: Interpreting the results. The sample data show a 5-point difference. Researchers must decide between two interpretations:
  1. There is actually no difference between the methods, and the observed difference is due to chance (sampling error).
  2. There is a real difference between the methods, accurately reflected by the data.
Measures of Central Tendency (Measures of Location)
Overview of Central Tendency:
* A measure of central tendency represents a typical or central entry in a data set.
* If a distribution is perfectly "Normal" (bell curve), the mean, median, and mode are identical.
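The Mean (Arithmetic Average):
* Calculated as the sum of the entries divided by the number of entries ($\mu$ for a population, $\bar{x}$ for a sample).
* Population Mean (mu): $\mu = \frac{\sum x}{N}$, where $N$ is the number of entries in the population.
* Sample Mean (x-bar): $\bar{x} = \frac{\sum x}{n}$, where $n$ is the number of entries in the sample.
* Characteristic: It is the most common measure but is highly sensitive to outliers (extreme values).
* Example (Effect of Outliers): Two sets that differ in only one extreme entry can have very different means.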
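The effect of an outlier on the mean can be sketched in a few lines of Python. The two sets below are hypothetical values chosen for illustration; they are not the sets from the original example.

```python
# Hypothetical data (not from the notes): the two sets differ only in the last entry.
set_1 = [2, 3, 4, 5, 6]
set_2 = [2, 3, 4, 5, 60]  # last entry replaced by an extreme value

# Mean = sum of entries / number of entries
mean_1 = sum(set_1) / len(set_1)
mean_2 = sum(set_2) / len(set_2)

print(mean_1)  # 4.0
print(mean_2)  # 14.8
```

A single extreme entry moves the mean from 4.0 to 14.8, even though four of the five entries are unchanged.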
The Median:
* The value in the exact middle of an ordered data set (50% of the entries above, 50% below).
* Characteristic: It is resistant to outliers.
* Determining Position: $\text{Position} = \frac{n + 1}{2}$ in the ordered data, where $n$ is the number of entries.
  * If $n$ is odd: The median is the single middle number.
  * If $n$ is even: The median is the average of the two middle numbers.
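The odd and even cases can be sketched as a small function. The function name `median_of` and both data sets are illustrative assumptions, not values from the notes.

```python
def median_of(values):
    """Return the median: middle entry (odd n) or mean of the two middle entries (even n)."""
    ordered = sorted(values)      # the median is defined on ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # single middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle numbers

# Hypothetical data for illustration.
odd_set = [7, 3, 9, 5, 11]        # ordered: 3, 5, 7, 9, 11
even_set = [7, 3, 9, 5, 11, 13]   # ordered: 3, 5, 7, 9, 11, 13

print(median_of(odd_set))   # 7
print(median_of(even_set))  # 8.0
```

For the odd set, the position $(5 + 1)/2 = 3$ picks the third ordered entry. For the even set, the position $(6 + 1)/2 = 3.5$ falls between the third and fourth entries, so their average is used.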
The Mode:
* The data entry that occurs with the greatest frequency.
* If no entry repeats, there is no mode. If two or more entries tie for the greatest frequency, the data set is bimodal (two modes) or multimodal (more than two).
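A minimal sketch of these rules, using `collections.Counter` from the standard library. The helper name `modes` and the age lists are hypothetical and chosen only to show the three cases (one mode, two modes, no mode).

```python
from collections import Counter

def modes(values):
    """Return a sorted list of modes; an empty list means there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no entry repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical ages for illustration.
print(modes([20, 21, 21, 22, 23]))  # [21]
print(modes([20, 20, 21, 21, 22]))  # [20, 21]  (bimodal)
print(modes([20, 21, 22]))          # []        (no mode)
```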
Which measure is "Best"?
* Mean: The usual default, unless outliers are present.
* Median: Best when extreme values are present (e.g., house prices in Ottawa).
Shapes of Distributions
Symmetric Distribution:
* A vertical line drawn through the middle creates mirror-image halves.
* $\text{Mean} = \text{Median}$
Uniform (Rectangular) Distribution:
* All entries/classes have equal frequencies. This is also a type of symmetric distribution.
Skewed Left (Negatively Skewed):
* The "tail" extends to the left.
* $\text{Mean} < \text{Median}$
* Example: The mean falls below the mode and median.
Skewed Right (Positively Skewed):
* The "tail" extends to the right.
* $\text{Mean} > \text{Median}$
* Example: The mean is driven up above the mode and median by a high outlier.
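The relationship between mean and median in skewed data can be checked numerically. The sketch below uses hypothetical data: a right-skewed set with one large entry, and its mirror image as a left-skewed set.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most entries are small, one is large.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]

# Mirror the data around 21 to get a left-skewed set with the tail on the low side.
left_skewed = [21 - x for x in right_skewed]

print(mean(right_skewed), median(right_skewed))  # mean is above the median
print(mean(left_skewed), median(left_skewed))    # mean is below the median
```

In both sets the median stays with the bulk of the data, while the mean is pulled toward the tail.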
Measures of Variation (Measures of Dispersion)
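Range:
* $\text{Range} = \text{Maximum entry} - \text{Minimum entry}$
* Disadvantages: Uses only the two extreme entries, so it ignores how the rest of the data are distributed and is highly sensitive to outliers.
* Example: For a set of stock prices, the range is the highest price minus the lowest price.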
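A short sketch of the range and its sensitivity to a single extreme entry. The prices are hypothetical and do not come from the original example.

```python
# Hypothetical stock prices, for illustration only.
prices = [56, 58, 57, 61, 63, 60]

price_range = max(prices) - min(prices)  # maximum entry minus minimum entry
print(price_range)  # 7

# Adding one extreme entry changes the range sharply.
with_spike = prices + [120]
spike_range = max(with_spike) - min(with_spike)
print(spike_range)  # 64
```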
Deviation:
* The difference between an entry $x$ and the mean $\mu$.
* $\text{Deviation of } x = x - \mu$
* The sum of the deviations is always equal to $0$.
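The zero-sum property of deviations can be verified on any data set. The values below are hypothetical.

```python
# Hypothetical data for illustration.
data = [2, 4, 6, 8, 10]

mu = sum(data) / len(data)           # mean of the data
deviations = [x - mu for x in data]  # each entry minus the mean

print(deviations)       # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(sum(deviations))  # 0.0
```

Because positive and negative deviations cancel, the deviations are squared before averaging when a measure of spread is needed.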
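Variance and Standard Deviation:
* Population Variance (sigma squared): $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
* Population Standard Deviation (sigma): $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
* Sample Variance ($s^2$): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
* Sample Standard Deviation ($s$): $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$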
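The four formulas above can be sketched directly and cross-checked against Python's standard `statistics` module. The data set is hypothetical, and treating the same numbers first as a population and then as a sample is only for comparison.

```python
from math import isclose, sqrt
from statistics import pvariance, variance

# Hypothetical data for illustration.
data = [2, 4, 6, 8, 10]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean
sum_sq = sum((x - mean) ** 2 for x in data)

pop_var = sum_sq / n          # divide by N for a population
samp_var = sum_sq / (n - 1)   # divide by n - 1 for a sample
pop_sd = sqrt(pop_var)
samp_sd = sqrt(samp_var)

print(pop_var, samp_var)  # 8.0 10.0

# Cross-check against the standard library.
print(isclose(pop_var, pvariance(data)), isclose(samp_var, variance(data)))  # True True
```

The sample variance is larger than the population variance for the same numbers because its denominator is $n - 1$ rather than $N$.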
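Degrees of Freedom (df):
* The number of values free to vary after the data have been used to estimate a parameter (like the mean).
* Example: If the mean of $n = 3$ values is fixed and two of the values are known, the third value is determined (it is not free to vary), leaving $n - 1 = 2$ degrees of freedom.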
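The idea that the last value is forced once the mean is fixed can be sketched as follows. The helper name `last_value` and the numbers are hypothetical, not the values from the original example.

```python
def last_value(mean, known_values):
    """Return the one remaining value implied by a fixed mean and the other n - 1 values."""
    n = len(known_values) + 1
    # The total must equal mean * n, so the last value is whatever is left over.
    return mean * n - sum(known_values)

# Hypothetical example: mean of 3 values is 5, and two values are 3 and 4.
x3 = last_value(5, [3, 4])
print(x3)  # 8
```

Only two of the three values could be chosen freely; the third had to be 8 to keep the mean at 5.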
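Coefficient of Variation (CV):
* Measures relative variation as a percentage.
* $CV = \frac{\sigma}{\mu} \cdot 100\%$ for a population, or $CV = \frac{s}{\bar{x}} \cdot 100\%$ for a sample.
* Utility: Allows comparison of variation between data sets with different units or different means.
* Comparison Example: Two stocks (Stock A and Stock B) are compared using each stock's average price and standard deviation.
  * Result: Stock B has the smaller CV, so it is less variable relative to its price.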
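A sketch of the comparison, with hypothetical prices and standard deviations chosen so that the outcome matches the stated result (Stock B less variable relative to its price). The function name is an assumption for illustration.

```python
def coefficient_of_variation(mean, std_dev):
    """Return the standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean

# Hypothetical figures: both stocks have the same standard deviation
# but different average prices.
cv_a = coefficient_of_variation(mean=50, std_dev=5)
cv_b = coefficient_of_variation(mean=100, std_dev=5)

print(cv_a)  # 10.0
print(cv_b)  # 5.0
```

The standard deviations are equal, yet Stock B varies less relative to its price, which is exactly the kind of comparison the CV is meant to support.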
Measures of Position
Quartiles:
* Divide an ordered data set into four equal parts.
* $Q_1$ (First Quartile): Median of the lower half of the data; about 25% of the entries fall at or below it.
* $Q_2$ (Second Quartile): The median of the whole data set; about 50% of the entries fall at or below it.
* $Q_3$ (Third Quartile): Median of the upper half of the data; about 75% of the entries fall at or below it.
Interquartile Range (IQR):
* $IQR = Q_3 - Q_1$
* Represents the range of the middle 50% of the data set.
Box-and-Whisker Plot:
* Tool for highlighting data features using the Five-Number Summary:
  1. Minimum entry
  2. $Q_1$
  3. $Q_2$ (Median)
  4. $Q_3$
  5. Maximum entry
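The five-number summary and IQR can be sketched with the median-of-halves definition of quartiles used in these notes. The function name and data are hypothetical. For an odd number of entries, this sketch assumes the middle entry is left out of both halves, which is one common convention.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum) using medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # entries before the median position
    upper_half = ordered[(n + 1) // 2 :]  # entries after the median position
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

# Hypothetical data for illustration (needs at least two entries).
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
minimum, q1, q2, q3, maximum = five_number_summary(data)
iqr = q3 - q1

print(minimum, q1, q2, q3, maximum)  # 1 3.5 7 11.0 15
print(iqr)                           # 7.5
```

Library routines such as `statistics.quantiles` or `numpy.percentile` use interpolation rules that may give slightly different quartiles for the same data, so results should be compared with the convention in mind.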
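Outlier Detection (Rule of Thumb):
* An entry is a potential outlier if it falls outside the following bounds:
  * Lower Bound: $Q_1 - 1.5 \cdot IQR$
  * Upper Bound: $Q_3 + 1.5 \cdot IQR$
* Procedure (Outlier Check): Find $Q_1$, $Q_2$, and $Q_3$, compute $IQR = Q_3 - Q_1$, compute the lower and upper limits, and flag any entries below the lower limit or above the upper limit as potential outliers.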
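The outlier check can be sketched end to end. The helper name `potential_outliers` and the data are hypothetical, and the quartiles are computed as medians of the lower and upper halves (middle entry excluded when the count is odd), which is an assumed convention.

```python
from statistics import median

def potential_outliers(values):
    """Return (lower_limit, upper_limit, flagged) using the 1.5 * IQR rule of thumb."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])        # median of the lower half
    q3 = median(ordered[(n + 1) // 2 :])  # median of the upper half
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    flagged = [x for x in ordered if x < lower_limit or x > upper_limit]
    return lower_limit, upper_limit, flagged

# Hypothetical data for illustration: one entry is far from the rest.
data = [1, 3, 4, 6, 7, 8, 10, 12, 40]
lower_limit, upper_limit, flagged = potential_outliers(data)

print(lower_limit, upper_limit)  # -7.75 22.25
print(flagged)                   # [40]
```

Here $Q_1 = 3.5$, $Q_3 = 11$, and $IQR = 7.5$, so only the entry 40 lies outside the limits. The rule flags candidates for review; it does not by itself justify removing them.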