STATS

Definition:
- A variable is any characteristic whose value may change from one individual or object to another.
- Examples include:
  - Hair color
  - Height
  - Opinions
Data Definition:
- Data is a collection of observations on one or more variables.

Definition:
- A dataset that consists of observations made on a single variable.
- "Uni" means single or one.
Types:
- Categorical Dataset:
- Variables are categorized into distinct groups.
- Examples:
  - Hair color
  - Calculator brand
  - Opinions
- Numerical Dataset:
- Also called quantitative data.
- Examples:
  - Height
  - Weight
  - Number of textbooks purchased
  - Distance to college
  - Test scores

UCLA Survey Example:
- The Higher Education Research Institute at UCLA surveys over 20,000 college seniors annually.
- Question asked to seniors: "If you could make your college choice over, would you still choose to enroll?"
- Response categories:
- Definitely yes (d y)
- Probably yes (p y)
- Probably no (p n)
- Definitely no (d n)
- Example Dataset from 20 students: responses indicating college choice preference.

Definition:
- A dataset that consists of observations made on two variables.
- "Bi" means two.
Example of Bivariate Dataset:
- Football player position (categorical) vs. player weight (numerical).
- Class (categorical) vs. grade (numerical).

Definition:
- A dataset with two or more variables.
Example:
- Height, weight, pulse rate, and blood pressure of individuals in a basketball team.

Discrete Numerical Data:
- Definition:
- Countable values.
- Examples:
- Number of textbooks purchased.
- Number of siblings.
- Most values are whole numbers.
Continuous Numerical Data:
- Definition:
- Measurable values.
- Examples:
- Height
- Weight
- Time taken for activities
Visual Example:
- Number line illustration where:
- Discrete: Points on a number line.
- Continuous: Intervals on a number line.

Can take any value within a range.
Example:
- Time for the first kernel of microwave popcorn to pop (e.g., 19.2 seconds).
These variables can be represented on a number line indicating a range of values.

Definition:
- A table showing possible categories in categorical data along with their frequencies.
- Relative Frequency: Calculated as frequency divided by total observations.
Example:
- Motorcyclists observed in NY regarding compliant and non-compliant helmet use:
- No helmet: 104
- Non-compliant helmet: 138
- Compliant helmet: 586
- Total: 828 motorcyclists.
Relative frequencies calculated for each category, adding to approximately 1 due to rounding.

Purpose:
- Provide a graphical representation of frequency distributions for categorical data.
Construction:
- Categories on horizontal axis.
- Frequencies on vertical axis.
Example:
- Using the helmet observation data to display the number of riders in each helmet category as bars.

To visualize:
- Typical values.
- Variability of data points.
- Presence of unusual values in a dataset.

Graduation rates of basketball players vs. all student athletes:
- Dot plot showing:
- 100% graduation rates for basketball players (11 schools).
- Variation analysis: usual and unusual values noted.
Shape Analysis:
- Skewed distributions noted (left or symmetric).
- Observations for basketball players show greater variation compared to all athletes.

Dot plots are effective for spotting typical and unusual values, as well as understanding data variability and shape.