STATS

Variables

  • Definition:
    • A variable is any characteristic whose value may change from one individual or object to another.
    • Examples include:
      • Hair color
      • Height
      • Opinions
  • Data Definition:
    • Data is a collection of observations on one or more variables.

Types of Datasets

Univariate Dataset

  • Definition:
    • A dataset that consists of observations made on a single variable.
    • "Uni" means single or one.
  • Types:
    • Categorical Dataset:
      • Each observation falls into one of a set of distinct categories (groups).
      • Examples:
        • Hair color
        • Calculator brand
        • Opinions
    • Numerical Dataset:
      • Each observation is a number.
      • Also called quantitative data.
      • Examples:
        • Height
        • Weight
        • Number of textbooks purchased
        • Distance to college
        • Test scores

Definitions Recap

  • Categorical = Qualitative
  • Numerical = Quantitative

Examples of Univariate Dataset

  • UCLA Survey Example:
    • The Higher Education Research Institute at UCLA surveys over 20,000 college seniors annually.
    • Question asked to seniors: "If you could make your college choice over, would you still choose to enroll?"
    • Response categories:
      • Definitely yes (DY)
      • Probably yes (PY)
      • Probably no (PN)
      • Definitely no (DN)
    • Example dataset: the responses of 20 students, each recorded as one of these four categories (a univariate categorical dataset).

Bivariate and Multivariate Datasets

Bivariate Dataset

  • Definition:
    • A dataset that consists of observations made on two variables.
    • "Bi" means two.
  • Example of Bivariate Dataset:
    • For each football player: position (categorical) and weight (numerical).
    • For each student: class (categorical) and grade (numerical).

Multivariate Dataset

  • Definition:
    • A dataset that consists of observations made on two or more variables (so a bivariate dataset is a special case).
  • Example:
    • Height, weight, pulse rate, and blood pressure of the individuals on a basketball team.
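As a minimal sketch, assuming each individual is stored as a dictionary that maps variable names to observed values (the record layout and the function name classify_dataset are hypothetical, chosen for illustration), the dataset type can be read off from how many variables each record holds:

```python
def classify_dataset(records: list[dict]) -> str:
    """Name a dataset by the number of variables observed on each individual.

    Hypothetical helper: assumes a non-empty list in which every record
    (one individual) has the same variable names as keys.
    """
    n_vars = len(records[0])
    if n_vars == 1:
        return "univariate"
    if n_vars == 2:
        return "bivariate"  # under the "two or more" definition, also multivariate
    return "multivariate"
```

For the basketball-team example, each record would have four keys (height, weight, pulse rate, blood pressure), so the function returns "multivariate".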

Types of Numerical Data

  • Discrete Numerical Data:
    • Definition:
      • The possible values are isolated points on the number line; they can be counted.
    • Examples:
      • Number of textbooks purchased.
      • Number of siblings.
    • Most values are whole numbers.
  • Continuous Numerical Data:
    • Definition:
      • Values come from measurement and can be any value in an interval.
    • Examples:
      • Height
      • Weight
      • Time taken for activities
  • Visual Example:
    • Number line illustration where:
      • Discrete: isolated points on the number line.
      • Continuous: an interval (a whole stretch) of the number line.

Discrete vs Continuous Variables

Discrete Variables

  • Usually whole numbers, but a discrete variable can take decimal values as long as its possible values are still isolated points that can be listed.
  • Example: number of credits for a course, where 1.5 credits is a possible value.

Continuous Variables

  • Can take any value within a range.
  • Example:
    • Time for the first kernel of microwave popcorn to pop (e.g., 19.2 seconds; a more precise timer could record 19.23 seconds, and so on).
  • On a number line, the possible values form an interval rather than a set of isolated points.

Frequency Distributions and Bar Charts

Frequency Distribution

  • Definition:
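    • A table listing the possible categories of a categorical variable along with their frequencies (the number of times each category occurs).
    • Relative Frequency: the frequency of a category divided by the total number of observations, i.e., the proportion of observations in that category.
  • Example:
    • Helmet use of motorcyclists observed in New York, classified as no helmet, non-compliant helmet, or compliant helmet:
      • No helmet: 104 (relative frequency 104/828 ≈ 0.126)
      • Non-compliant helmet: 138 (138/828 ≈ 0.167)
      • Compliant helmet: 586 (586/828 ≈ 0.708)
      • Total: 828 motorcyclists.
  • The relative frequencies sum to 1; the rounded values sum to 1.001 only because of rounding.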
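A minimal Python sketch of the relative-frequency calculation for the helmet data; the function name relative_frequencies and the three-decimal rounding are assumptions for illustration. It reproduces the rounded values and shows why their total comes out slightly above 1:

```python
# Helmet-use counts for motorcyclists observed in New York (from the example above).
HELMET_COUNTS = {
    "No helmet": 104,
    "Non-compliant helmet": 138,
    "Compliant helmet": 586,
}


def relative_frequencies(counts: dict[str, int], digits: int = 3) -> dict[str, float]:
    """Relative frequency = frequency / total observations, rounded to `digits` places."""
    total = sum(counts.values())
    return {category: round(freq / total, digits) for category, freq in counts.items()}


if __name__ == "__main__":
    total = sum(HELMET_COUNTS.values())
    rel = relative_frequencies(HELMET_COUNTS)
    print(f"{'Category':<22}{'Frequency':>10}{'Rel. freq.':>12}")
    for category, freq in HELMET_COUNTS.items():
        print(f"{category:<22}{freq:>10}{rel[category]:>12.3f}")
    # The rounded relative frequencies add to 1.001, not exactly 1.
    print(f"{'Total':<22}{total:>10}{sum(rel.values()):>12.3f}")
```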

Bar Charts

  • Purpose:
    • Provide a graphical representation of frequency distributions for categorical data.
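  • Construction:
    • Categories on the horizontal axis.
    • Frequencies (or relative frequencies) on the vertical axis.
    • One bar above each category, with height equal to that category's frequency.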
  • Example:
    • Using the helmet observation data to display the number of riders in each helmet category as bars.
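A stdlib-only sketch of the same idea: the text rendering and the function names scaled_heights and text_bar_chart are assumptions for illustration (in practice a plotting library would draw the chart). Categories run along the horizontal axis and each bar's height is proportional to its frequency:

```python
# Helmet-use counts from the frequency distribution example (New York motorcyclists).
HELMET_COUNTS = {"No helmet": 104, "Non-compliant helmet": 138, "Compliant helmet": 586}


def scaled_heights(counts: dict[str, int], max_height: int = 10) -> list[int]:
    """Bar height in text rows, proportional to frequency; the tallest bar gets max_height rows."""
    top = max(counts.values())
    return [round(freq / top * max_height) for freq in counts.values()]


def text_bar_chart(counts: dict[str, int], max_height: int = 10) -> str:
    """Vertical bar chart: categories along the horizontal axis, frequency on the vertical axis."""
    heights = scaled_heights(counts, max_height)
    width = max(len(label) for label in counts)  # each column is as wide as the longest label
    lines = []
    for level in range(max_height, 0, -1):  # draw the top row first
        cells = ["###".center(width) if h >= level else " " * width for h in heights]
        lines.append("  ".join(cells).rstrip())
    lines.append("-" * (len(counts) * (width + 2) - 2))  # horizontal axis
    lines.append("  ".join(label.center(width) for label in counts))
    return "\n".join(lines)


print(text_bar_chart(HELMET_COUNTS))
```

Heights are rounded to whole text rows, so at this resolution the first two bars (104 and 138) come out the same height; a real plotting tool avoids that loss of detail.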

Dot Plots for Numerical Data

Purpose:

  • To visualize:
    • Typical values.
    • Variability of data points.
    • Presence of unusual values in a dataset.

Example:

  • Graduation rates of basketball players vs. all student-athletes:
    • Dot plot showing:
      • 11 schools with a 100% graduation rate for basketball players.
      • Which values are usual and which are unusual.
  • Shape Analysis:
    • Each distribution is described as roughly symmetric or skewed (for example, skewed left, with a longer tail toward low values).
    • Graduation rates for basketball players vary more than graduation rates for all athletes.
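A minimal text-only dot plot sketch in Python. The function dot_plot, its parameters, and the toy values in the usage line are assumptions for illustration, not the graduation-rate data from the example. Each value is mapped to a character column on a number line from lo to hi, and repeated values stack upward, which is what makes clusters (typical values), spread (variability), and isolated points (unusual values) visible:

```python
from collections import Counter


def dot_plot(values, lo: float, hi: float, width: int = 41) -> str:
    """Text dot plot: one 'o' per observation, stacked above its position on a number line.

    Assumes lo < hi and every value lies in [lo, hi]; nearby values that map to the
    same character column are stacked together.
    """
    # Map each value to one of `width` columns spanning [lo, hi], then count per column.
    columns = Counter(round((v - lo) / (hi - lo) * (width - 1)) for v in values)
    height = max(columns.values())
    lines = []
    for level in range(height, 0, -1):  # top row first
        row = "".join("o" if columns[c] >= level else " " for c in range(width))
        lines.append(row.rstrip())
    lines.append("-" * width)  # the number line
    half = width // 2
    lines.append(f"{lo:<{half}}{hi:>{width - half}}")  # endpoint labels
    return "\n".join(lines)


# Toy percentages, for illustration only.
print(dot_plot([40, 55, 55, 70, 100, 100, 100], lo=0, hi=100))
```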

Conclusion

  • Dot plots are effective for spotting typical and unusual values, as well as understanding data variability and shape.