Introduction to Statistics
- Definition: Statistics is the science of collecting, organizing, summarizing, and analyzing data to answer questions and draw conclusions.
Importance of Statistics
- Reasons We Care About Statistics:
- Explore the world around us.
- Use evidence to check the validity of beliefs.
- Identify patterns leading to discoveries.
- Share findings with others.
Data Sources
Population:
- Collection of all data values for a group.
- Often difficult to obtain all information.
Sample:
- A subset of the population, ideally chosen so that it represents the population at large.
- Easier to gather data from a sample than the full population.
Example of Data Source:
- Survey on Hair Color:
- Sample: Random survey of 2500 people.
- Population: All individuals in the country, whose hair colors are the data of interest.
- Data Collected: Responses on hair color.
Understanding Populations and Parameters
- Population: The group of objects or individuals being studied.
- Parameter: A numerical value characterizing an aspect of the population.
- Example: If data on heights of all NBA players is collected, the population is all NBA players, while the mean height calculated is the parameter.
Samples and Statistics
- Sample: A collection of objects or individuals from the population of interest.
- Statistic: A numerical characteristic of a sample; often called an estimator as it is used to estimate a population characteristic.
- Example: In a sample of 1500 American women with a mean height of 63.6 inches, the sample is those 1500 women and the statistic is the mean height of 63.6 inches.
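The difference between a parameter and a statistic can be sketched in Python. The population below is simulated, not real height data; the mean of 64.0, the spread of 2.5, and the population size are assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated heights in inches (not real data).
population = [random.gauss(64.0, 2.5) for _ in range(100_000)]

# Parameter: computed from every member of the population.
parameter = statistics.mean(population)

# Sample: 1500 individuals drawn at random, without replacement.
sample = random.sample(population, 1500)

# Statistic: computed from the sample only; it estimates the parameter.
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

In practice the parameter line usually cannot be computed, because the full population is not available; the statistic is what stands in for it.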
Statistical Inference
- Definition: The process of drawing conclusions about a population based on characteristics observed from a sample.
- Key Element: There is inherent uncertainty because the entire population is not measured.
- Measuring Uncertainty: An inference should be reported together with a measure of its uncertainty.
Two Types of Statistics
Descriptive Statistics:
- Organizes, summarizes, and displays data.
- Describes results of a sample without making generalizations about the entire population.
Inferential Statistics:
- Uses results from a sample to make conclusions about the entire population.
- Measures the reliability of the results from the sample.
Example of Inferential Statistics:
- Pew Research Center Report:
- 37% of the 2002 adults surveyed believed GMOs are safe.
- Population: all Americans. Sample: the 2002 adults surveyed. Parameter of interest: the proportion of all Americans who believe GMOs are safe. Statistic: 37%, the proportion observed in the sample.
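One common way to express the reliability of a sample proportion is an approximate margin of error. The sketch below assumes a simple random sample and uses the usual normal approximation with a 95% level; both are assumptions made here for illustration, not details taken from the report.

```python
import math

n = 2002       # number of adults surveyed
p_hat = 0.37   # sample proportion (the statistic)

# Standard error of a sample proportion, assuming a simple random sample.
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)

# Approximate 95% margin of error using the normal critical value 1.96.
margin = 1.96 * standard_error

lower, upper = p_hat - margin, p_hat + margin
print(f"estimate: {p_hat:.3f}, margin of error: {margin:.3f}")
print(f"approximate 95% interval: ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the margin of error is roughly two percentage points, so the unknown parameter is estimated rather than known exactly.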
Statistics vs. Parameters
- Statistics: Measurable and knowable through data collection.
- Parameters: Typically unknown; can be estimated with statistics, leading to some uncertainty.
Classifying Data
- Variable: A characteristic of the objects or individuals being studied.
- Types of Variables:
- Categorical (Qualitative):
- Describes a quality; values may be written as numbers, but arithmetic on them is not meaningful.
- Examples: Hair color, zip code, letter grades.
- Numerical (Quantitative):
- Describes a quantity or measurement.
- Examples: Height, temperature.
Examples of Classifying Data
- Example Variables:
- Height of a bridge -> Numerical
- GPA -> Numerical
- Letter grade -> Categorical
- Type of pets owned -> Categorical
- Flower varieties planted -> Categorical
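The practical consequence of this classification is which summaries make sense. A minimal Python sketch, using made-up values: numerical data can be averaged, while categorical data (including zip codes, which only look numeric) can only be counted.

```python
from collections import Counter
import statistics

# Hypothetical records, for illustration only.
bridge_heights_m = [12.5, 30.0, 18.2, 45.7]       # numerical
letter_grades = ["A", "B", "A", "C", "B", "A"]    # categorical
zip_codes = ["30301", "10001", "30301"]           # categorical, stored as text

# Arithmetic is meaningful for numerical variables.
mean_height = statistics.mean(bridge_heights_m)

# Categorical variables are summarized by counting each category.
grade_counts = Counter(letter_grades)
zip_counts = Counter(zip_codes)

print(f"mean bridge height: {mean_height:.1f} m")
print(f"grade counts: {dict(grade_counts)}")
print(f"zip code counts: {dict(zip_counts)}")
```

Storing zip codes as strings is a deliberate choice in this sketch: it prevents accidentally averaging labels.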
Two Types of Numerical Variables
Discrete Variables: Countable or listable values.
- Examples: Number of siblings.
Continuous Variables: Values cannot be listed or counted; they can fall anywhere within a range of values.
- Examples: Height, weight.
Identifying Discrete vs. Continuous Variables
- Examples:
- Number of cars owned (Discrete)
- Time to commute (Continuous)
- Height of a building (Continuous)
Presentation of Data
Frequency Tables:
- Organizes data values and their counts.
- Helps to see data distribution clearly.
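A frequency table can be built by counting how often each value occurs. The sketch below uses Python's `collections.Counter` on hypothetical hair color responses; the data are invented for illustration.

```python
from collections import Counter

# Hypothetical survey responses (not real data).
responses = ["brown", "black", "blond", "brown",
             "red", "black", "brown", "brown"]

# Count how many times each category appears.
frequency = Counter(responses)

# Print the table, most frequent category first.
print(f"{'Hair color':<12}{'Count':>5}")
for color, count in frequency.most_common():
    print(f"{color:<12}{count:>5}")
```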
Relative Frequency Distributions:
- Proportion of observations within a category.
- Formulated as: relative frequency = (frequency of the category) / (total number of observations)
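As a sketch, the relative frequencies can be computed by dividing each category's count by the total number of observations. The responses below are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses (not real data).
responses = ["brown", "black", "blond", "brown",
             "red", "black", "brown", "brown"]

counts = Counter(responses)
total = sum(counts.values())

# Each category's share of all observations.
relative_frequency = {color: n / total for color, n in counts.items()}

for color, share in relative_frequency.items():
    print(f"{color:<8}{share:.3f}")
```

The relative frequencies across all categories always sum to 1, which is a quick check that no observation was dropped or double counted.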
Summarizing Distributions
- Examination Process:
- Visualize Data: Use effective graphs such as histograms or stemplots.
- Summarize Data: Assess shape, center (a typical value, such as the mean or median), and spread (variability of the data).
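The two steps can be sketched with Python's standard library. The commute times are hypothetical, and the bin width of 10 minutes for the text histogram is an arbitrary choice made for this illustration.

```python
from collections import Counter
import statistics

# Hypothetical commute times in minutes (not real data).
commute_minutes = [12, 15, 15, 18, 20, 22, 25, 30, 45]

# Center: two common measures of a typical value.
mean = statistics.mean(commute_minutes)
median = statistics.median(commute_minutes)

# Spread: sample standard deviation and range.
stdev = statistics.stdev(commute_minutes)
data_range = max(commute_minutes) - min(commute_minutes)

print(f"mean={mean:.1f}, median={median}, stdev={stdev:.1f}, range={data_range}")

# Shape: a crude text histogram with bins 10 minutes wide.
bins = Counter((x // 10) * 10 for x in commute_minutes)
for start in sorted(bins):
    print(f"{start:>2}-{start + 9:<2} {'*' * bins[start]}")
```

In this made-up data set the mean is larger than the median and the histogram has a longer tail toward high values, which is what a right-skewed shape looks like.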