Introduction to Statistics

  • Definition: Statistics is the science of collecting, organizing, summarizing, and analyzing data to answer questions and draw conclusions.

Importance of Statistics

  • Reasons We Care About Statistics:
    • Explore the world around us.
    • Use evidence to check the validity of beliefs.
    • Identify patterns leading to discoveries.
    • Share findings with others.

Data Sources

  • Population:

    • Collection of all data values for a group.
    • Often difficult to obtain all information.
  • Sample:

    • A subset of the population, ideally chosen so that it represents the population at large.
    • Easier to gather data from a sample than the full population.
Example of Data Source:
  • Survey on Hair Color:
    • Sample: The 2,500 randomly selected people who were surveyed.
    • Population: All people in the country.
    • Data Collected: Each respondent's hair color.

Understanding Populations and Parameters

  • Population: The group of objects or individuals being studied.
  • Parameter: A numerical value characterizing an aspect of the population.
    • Example: If data on heights of all NBA players is collected, the population is all NBA players, while the mean height calculated is the parameter.

Samples and Statistics

  • Sample: A collection of objects or individuals from the population of interest.
  • Statistic: A numerical characteristic of a sample; often called an estimator as it is used to estimate a population characteristic.
    • Example: If the heights of 1,500 American women are measured and the mean is 63.6 inches, the sample is those 1,500 women and the statistic is the mean height of 63.6 inches.

Statistical Inference

  • Definition: The process of drawing conclusions about a population based on characteristics observed from a sample.
  • Key Element: Inference carries inherent uncertainty because the entire population is not measured.
  • Uncertainty Measurement: Every inference should report a measure of its uncertainty, such as a margin of error.

Two Types of Statistics

  • Descriptive Statistics:

    • Organizes, summarizes, and displays data.
    • Describes results of a sample without making generalizations about the entire population.
  • Inferential Statistics:

    • Uses results from a sample to make conclusions about the entire population.
    • Measures the reliability of the results from the sample.
Example of Inferential Statistics:
  • Pew Research Center Report:
    • 37% of 2002 surveyed adults believed GMOs are safe.
    • Here the population is all American adults, the sample is the 2002 surveyed adults, the parameter of interest is the proportion of American adults who believe GMOs are safe, and the statistic is 37%.
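
One rough way to express the reliability of the 37% figure is an approximate 95% interval for a sample proportion. The sketch below assumes a simple random sample and uses the normal approximation; the function name `proportion_interval` is illustrative only, and a real survey's published margin of error may differ because surveys typically apply weighting.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Approximate confidence interval for a population proportion.

    Assumes a simple random sample and the normal approximation;
    z=1.96 corresponds to roughly 95% confidence.
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Values from the example: statistic = 0.37, sample size = 2002.
low, high = proportion_interval(0.37, 2002)
print(f"Statistic: 0.37, approximate 95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval spans roughly two percentage points on either side of the statistic, which is the kind of uncertainty statement an inference should carry.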

Statistics vs. Parameters

  • Statistics: Computed directly from sample data, so their values are known.
  • Parameters: Typically unknown; they are estimated with statistics, which introduces some uncertainty.
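
The distinction can be illustrated with a small simulation. This is a minimal sketch with a made-up population: the heights, the mean of 64 inches, and the standard deviation of 3 inches are assumptions chosen for illustration, not real data. The parameter is knowable here only because the whole population is simulated.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 simulated heights in inches.
population = [random.gauss(64, 3) for _ in range(100_000)]
parameter = statistics.mean(population)  # population mean (normally unknown)

# A simple random sample of 1,500 individuals drawn without replacement.
sample = random.sample(population, 1500)
statistic = statistics.mean(sample)  # sample mean, used as the estimate

print(f"Parameter (population mean): {parameter:.2f}")
print(f"Statistic (sample mean):     {statistic:.2f}")
print(f"Estimation error:            {statistic - parameter:+.2f}")
```

Changing the seed gives a different sample and a slightly different statistic, while the parameter stays fixed for a given population.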

Classifying Data

  • Variable: A characteristic of the objects or individuals being studied.
  • Types of Variables:
    • Categorical (Qualitative):
      • Describes a quality or category; values may be numeric labels, but arithmetic on them is not meaningful.
      • Examples: Hair color, zip code, letter grades.
    • Numerical (Quantitative):
      • Describes a quantity or measurement.
      • Examples: Height, temperature.

Examples of Classifying Data

  • Example Variables:
    • Height of a bridge -> Numerical
    • GPA -> Numerical
    • Letter grade -> Categorical
    • Type of pets owned -> Categorical
    • Flower varieties planted -> Categorical

Two Types of Numerical Variables

  • Discrete Variables: Countable or listable values.

    • Examples: Number of siblings.
  • Continuous Variables: Can take any value within a range, so the possible values cannot be listed or counted.

    • Examples: Height, weight.

Identifying Discrete vs. Continuous Variables

  • Examples:
    • Number of cars owned (Discrete)
    • Time to commute (Continuous)
    • Height of a building (Continuous)

Presentation of Data

  • Frequency Tables:

    • Organizes data values and their counts.
    • Helps to see data distribution clearly.
  • Relative Frequency Distributions:

    • Proportion of observations within a category.
    • Formula: $\text{relative frequency} = \frac{\text{frequency}}{\text{sum of all frequencies}}$
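
A frequency table and its relative frequencies can be built in a few lines. The sketch below uses a small, made-up list of hair-color responses (illustrative data only) and the standard-library `collections.Counter`.

```python
from collections import Counter

# Hypothetical hair-color responses (illustrative data only).
responses = ["brown", "black", "brown", "blond", "red",
             "brown", "black", "blond", "brown", "black"]

frequency = Counter(responses)      # frequency table: value -> count
total = sum(frequency.values())     # sum of all frequencies

# relative frequency = frequency / sum of all frequencies
relative_frequency = {color: count / total for color, count in frequency.items()}

print("color  count  rel.freq")
for color, count in frequency.most_common():
    print(f"{color:<6} {count:>5}  {relative_frequency[color]:.2f}")
```

The relative frequencies always sum to 1, which is a quick check that the table was built correctly.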

Summarizing Distributions

  • Examination Process:
    • Visualize Data: Use effective graphs such as histograms or stemplots.
    • Summarize Data: Assess shape, center (a typical value, such as the mean or median), and spread (variability of the data).
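
Center and spread can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up commute times (illustrative data only); the choice of mean, median, standard deviation, and range as summaries is one common option, not the only one.

```python
import statistics

# Hypothetical commute times in minutes (illustrative data only).
commute_times = [12, 15, 18, 20, 22, 25, 25, 30, 35, 58]

mean = statistics.mean(commute_times)      # center: arithmetic average
median = statistics.median(commute_times)  # center: middle value
std_dev = statistics.stdev(commute_times)  # spread: sample standard deviation
data_range = max(commute_times) - min(commute_times)  # spread: max minus min

print(f"Mean:   {mean}")
print(f"Median: {median}")
print(f"Sample standard deviation: {std_dev:.2f}")
print(f"Range:  {data_range}")
```

In this data the mean is larger than the median, which hints at a right-skewed shape caused by the single long commute; a histogram or stemplot would show the same thing visually.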