STATS
Variables
- Definition:
- A variable is any characteristic whose value may change from one individual or object to another.
- Examples include:
- Data Definition:
- Data is a collection of observations on one or more variables.
Types of Datasets
Univariate Dataset
- Definition:
- A dataset that consists of observations made on a single variable.
- "Uni" means single or one.
- Types:
- Categorical Dataset:
- Each observation falls into one of several distinct categories.
- Examples:
- Hair color
- Calculator brand
- Opinions
- Numerical Dataset:
- Each observation is a number; also called quantitative data.
- Examples:
- Height
- Weight
- Number of textbooks purchased
- Distance to college
- Test scores
Definitions Recap
- Categorical = Qualitative
- Numerical = Quantitative
Examples of Univariate Dataset
- UCLA Survey Example:
- The Higher Education Research Institute at UCLA surveys over 20,000 college seniors annually.
- Question asked to seniors: "If you could make your college choice over, would you still choose to enroll?"
- Response categories:
- Definitely yes (d y)
- Probably yes (p y)
- Probably no (p n)
- Definitely no (d n)
- Example dataset: the responses of 20 students to this question, each recorded as one of the four categories above.
Bivariate and Multivariate Datasets
Bivariate Dataset
- Definition:
- A dataset that consists of observations made on two variables.
- "Bi" means two.
- Example of Bivariate Dataset:
- Football player position (categorical) vs. player weight (numerical).
- Class (categorical) vs. grade (numerical).
Multivariate Dataset
- Definition:
- A dataset that consists of observations made on two or more variables (a bivariate dataset is the special case with exactly two).
- Example:
- Height, weight, pulse rate, and blood pressure recorded for each player on a basketball team.
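The three definitions differ only in how many variables are recorded per observation. A minimal sketch of that idea follows; the player records, the key names, and the helper `dataset_kind` are all hypothetical and invented purely for illustration.

```python
# Hypothetical records for three players; all values are invented for illustration.
players = [
    {"height_in": 78, "weight_lb": 210, "pulse_bpm": 62, "systolic_bp": 118},
    {"height_in": 81, "weight_lb": 235, "pulse_bpm": 58, "systolic_bp": 121},
    {"height_in": 75, "weight_lb": 190, "pulse_bpm": 65, "systolic_bp": 115},
]


def dataset_kind(records):
    """Name a dataset by the number of variables recorded per observation.

    Assumes a non-empty list of dicts that all share the same keys.
    By the definition in these notes, a bivariate dataset is also
    multivariate; this helper returns the more specific label.
    """
    n_variables = len(records[0])
    if n_variables == 1:
        return "univariate"
    if n_variables == 2:
        return "bivariate"
    return "multivariate"


# Keep only some variables to build smaller datasets from the same observations.
heights_only = [{"height_in": p["height_in"]} for p in players]
height_weight = [{"height_in": p["height_in"], "weight_lb": p["weight_lb"]} for p in players]

print(dataset_kind(heights_only))   # univariate
print(dataset_kind(height_weight))  # bivariate
print(dataset_kind(players))        # multivariate
```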
Types of Numerical Data
- Discrete Numerical Data:
- Definition:
- Possible values are isolated points on the number line, typically counts.
- Examples:
- Number of textbooks purchased.
- Number of siblings.
- Most values are whole numbers.
- Continuous Numerical Data:
- Definition:
- Possible values form an entire interval on the number line, typically measurements.
- Examples:
- Height
- Weight
- Time taken for activities
- Visual Example:
- Number line illustration where:
- Discrete: Points on a number line.
- Continuous: Intervals on a number line.
Discrete vs Continuous Variables
Discrete Variables
- Mostly whole numbers but can involve decimal values in certain contexts.
- Example of a discrete variable with decimal values: number of course credits (e.g., 1.5 credits).
Continuous Variables
- Can take any value within a range.
- Example:
- Time for the first kernel of microwave popcorn to pop (e.g., 19.2 seconds).
- On a number line, the possible values of such a variable fill an interval rather than isolated points.
Frequency Distributions and Bar Charts
Frequency Distribution
- Definition:
- A table listing each possible category of a categorical variable along with its frequency (the number of observations in that category).
- Relative Frequency: The frequency of a category divided by the total number of observations.
- Example:
- Helmet use among motorcyclists observed in NY, classified as no helmet, non-compliant helmet, or compliant helmet:
- No helmet: 104
- Non-compliant helmet: 138
- Compliant helmet: 586
- Total: 828 motorcyclists.
- Relative frequency of each category is its count divided by 828 (about 0.126, 0.167, and 0.708). The exact values sum to 1; the rounded values may sum to slightly more or less.
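The relative frequency calculation for the helmet data can be sketched in a few lines of Python. The counts come from the example above; the variable names are just illustrative choices.

```python
# Helmet-use counts from the example above.
counts = {
    "No helmet": 104,
    "Non-compliant helmet": 138,
    "Compliant helmet": 586,
}

total = sum(counts.values())  # 828 motorcyclists

# Relative frequency = frequency / total number of observations.
relative = {category: n / total for category, n in counts.items()}

print(f"{'Category':<22} {'Freq':>5} {'Rel. freq':>10}")
for category, n in counts.items():
    print(f"{category:<22} {n:>5} {relative[category]:>10.3f}")

# Summing the values after rounding to 3 decimals shows the rounding effect.
rounded_sum = sum(round(r, 3) for r in relative.values())
print(f"{'Total':<22} {total:>5} {rounded_sum:>10.3f}")
```

The unrounded relative frequencies sum to exactly 1 (up to floating-point error); only the rounded column drifts.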
Bar Charts
- Purpose:
- Provide a graphical representation of frequency distributions for categorical data.
- Construction:
- Categories on horizontal axis.
- Frequencies on vertical axis.
- Example:
- Using the helmet observation data to display the number of riders in each helmet category as bars.
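As a rough sketch, the helmet bar chart can be drawn as text with only the standard library. Note the assumptions: the bars here run horizontally (categories down the side, frequency along the row), which swaps the axes described above purely for printing convenience, and the scale of one `#` per 20 riders is an arbitrary choice.

```python
# Helmet-use counts from the frequency distribution example.
counts = {
    "No helmet": 104,
    "Non-compliant helmet": 138,
    "Compliant helmet": 586,
}

RIDERS_PER_SYMBOL = 20  # arbitrary scale chosen for this sketch


def bar(frequency, per_symbol=RIDERS_PER_SYMBOL):
    """Return a bar whose length is proportional to the frequency."""
    return "#" * round(frequency / per_symbol)


# One row per category: label, bar, then the exact frequency.
for category, n in counts.items():
    print(f"{category:<22} | {bar(n)} {n}")
```

A plotting library would normally be used for a real chart; the point here is only that bar length is proportional to frequency.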
Dot Plots for Numerical Data
Purpose:
- To visualize:
- Typical values.
- Variability of data points.
- Presence of unusual values in a dataset.
Example:
- Graduation rates of basketball players vs. all student athletes:
- Dot plot showing:
- A 100% graduation rate for basketball players at 11 schools.
- Variability: typical and unusual values can be read directly from the plot.
- Shape Analysis:
- The shape of each distribution is noted as skewed (e.g., to the left) or symmetric.
- Graduation rates for basketball players vary more than those for all student athletes.
Conclusion
- Dot plots are effective for spotting typical and unusual values, as well as understanding data variability and shape.
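To make the idea concrete, here is a minimal text dot plot. The graduation rates below are hypothetical values invented for illustration (the notes do not list the actual data), and the helper `dot_plot` is likewise an assumption of this sketch. It assumes every value lies on the axis grid, i.e. is the minimum plus a multiple of `step`.

```python
from collections import Counter

# Hypothetical graduation rates (percent), invented for illustration only.
rates = [40, 50, 50, 60, 60, 60, 70, 70, 100]


def dot_plot(values, step=10, width=4):
    """Stack one '*' per observation above its value on an evenly spaced axis.

    Assumes each value equals min(values) plus a multiple of step;
    values off that grid would not be drawn.
    """
    counts = Counter(values)
    axis = range(min(values), max(values) + step, step)
    rows = []
    # Build the plot from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        rows.append("".join(("*" if counts[v] >= level else " ").center(width) for v in axis))
    # Axis labels go on the last row.
    rows.append("".join(str(v).center(width) for v in axis))
    return "\n".join(rows)


plot = dot_plot(rates)
print(plot)
```

In this made-up sample, the stack near 60 marks the typical value, the spread from 40 to 70 shows the variability, and the lone dot at 100 stands apart as an unusual value.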