What is statistics?
The art and science of learning from data
It deals with collection
classification
Left-skewed
Most values are large
Types of data collection
Interviews
3 main components of statistics
Study design: Planning how to obtain data to help answer the questions of interest (Data Collection).
Description: Exploring and summarizing patterns in the data (Data Analysis).
Inference: Making decisions and predictions based on data/known evidence.
Types of statistics
Descriptive: Involves organizing
What is data?
Systematically recorded information such as numbers
Types of data
Tabular
Tabular Data
A collection of objects and their attributes. An object (record or observation) is what is described
Types of variables
Numeric and categorical
Numerical/Quantitative variable
A variable that records measurable numerical values with units. It represents amounts or degrees of something
Discrete (numerical) variable
A type of quantitative variable that takes on a finite or countably infinite set of values. Ex) number of students in a class.
Continuous (numerical) variable
A type of quantitative variable that can take on any value within a range, so its possible values cannot be listed one by one. Ex) age measured exactly.
Categorical/Qualitative variable
A variable that classifies observations into groups or categories
Nominal (categorical) variable
Categories that have no natural order. Ex) favorite color
Ordinal (categorical) variable
Categories that follow a logical order. Ex) drink sizes (small
Transforming numerical into categorical
A numerical variable can be grouped into ranges and treated as categories. Ex) age reported as 18–24
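A minimal sketch of this grouping in Python. Only the 18-24 range comes from the card; the other cut points and the function name age_group are assumptions made for illustration.

```python
def age_group(age):
    """Map a numerical age to a categorical range label."""
    # Cut points other than 18-24 are illustrative assumptions.
    if age < 18:
        return "under 18"
    if age <= 24:
        return "18-24"
    if age <= 34:
        return "25-34"
    return "35+"


ages = [19, 23, 31, 45, 17]
groups = [age_group(a) for a in ages]
print(groups)
```

After grouping, the variable is ordinal: the labels are categories, but they still follow a logical order.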
Population
The total group of individuals or objects you want to make conclusions about in a statistical study. Ex) all students at Columbia University.
Sample
A subset of the population used to draw conclusions
Parameter
A summary value calculated from a population. Ex) the average age of all students at Columbia.
Statistic
A summary value calculated from a sample. Ex) the average age of students in one statistics class.
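The parameter/statistic distinction can be sketched in code. The population below is made-up data (1,000 hypothetical student ages), and the sample size of 40 is an assumed class size, not a value from the cards.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 1,000 students (made-up values).
population = [random.randint(17, 30) for _ in range(1000)]

parameter = mean(population)             # computed from the whole population
sample = random.sample(population, 40)   # one class-sized subset
statistic = mean(sample)                 # computed from the sample only

print(f"parameter = {parameter:.2f}, statistic = {statistic:.2f}")
```

The two values are usually close but not equal; the statistic changes from sample to sample, while the parameter is fixed.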
Why can’t we usually observe an entire population?
Studying every individual is often impractical because it takes too long
A bad sample
A sample that is not representative of the population and can lead to biased results. Ex) using income data from only Manhattan households to represent the entire U.S.
Sampling
A method that allows researchers to study a population by investigating a subset instead of every individual. Ex) estimating MLB player salaries by surveying part of the league.
Probability sampling
A method where every individual in the population has a known
Non-probability sampling
A method where not all individuals have a chance of being selected
Sampling Methods
Simple Random Sampling
Simple Random Sample (SRS)
A sample where every individual has the same chance of being chosen and every possible sample has the same chance of selection. Ex) giving each MLB player a number and drawing numbers at random.
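One way to draw an SRS in Python is random.sample, which selects without replacement so that every subset of the requested size is equally likely. The roster below is hypothetical; the player names, the roster size of 750, and the sample size of 25 are assumptions.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical roster: give each player a number, as in the example above.
players = [f"player_{i}" for i in range(1, 751)]

# Draw 25 players at random, without replacement.
srs = random.sample(players, 25)
print(srs[:5])
```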
Stratified Sampling
The population is divided into subgroups (strata) with similar characteristics
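A rough sketch of stratified sampling, assuming the usual second step of drawing a random sample from every stratum. The student list, the choice of class year as the stratifying variable, and the five draws per stratum are all made up for illustration.

```python
import random
from collections import defaultdict

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100 students, 25 in each class year.
years = ["freshman", "sophomore", "junior", "senior"] * 25
students = [(f"s{i}", year) for i, year in enumerate(years)]

# Step 1: divide the population into strata by class year.
strata = defaultdict(list)
for name, year in students:
    strata[year].append(name)

# Step 2 (assumed): draw a random sample of 5 from every stratum.
stratified = {year: random.sample(members, 5) for year, members in strata.items()}
print({year: len(picked) for year, picked in stratified.items()})
```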
Cluster Sampling
The population is divided into clusters
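A rough sketch of cluster sampling, assuming the usual second step of randomly choosing whole clusters and including everyone in them. The dorm names, cluster sizes, and the choice of two clusters are hypothetical.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical clusters: six dorms with 20 residents each.
clusters = {
    f"dorm_{d}": [f"dorm_{d}_resident_{i}" for i in range(20)]
    for d in "ABCDEF"
}

# Assumed step: randomly choose 2 whole clusters, then keep every member.
chosen = random.sample(sorted(clusters), 2)
cluster_sample = [person for c in chosen for person in clusters[c]]
print(chosen, len(cluster_sample))
```

Unlike the strata in stratified sampling, most clusters contribute nobody to the sample; only the chosen clusters are observed.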
Systematic Sampling
Individuals are chosen at regular intervals from an ordered list
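A minimal sketch of the regular-interval idea, assuming a random starting point within the first interval (a common convention, not stated on the card). The function name systematic_sample and the roster are hypothetical.

```python
import random


def systematic_sample(ordered_list, k, start=None):
    """Take every k-th item from an ordered list, beginning at index `start`."""
    if start is None:
        start = random.randrange(k)  # assumed: random start in the first interval
    return ordered_list[start::k]


roster = list(range(1, 101))  # hypothetical ordered list of 100 ID numbers
picked = systematic_sample(roster, k=10, start=3)
print(picked)  # [4, 14, 24, 34, 44, 54, 64, 74, 84, 94]
```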
Convenience Sampling
Participants are chosen based on availability or willingness
Sampling Frame
A complete list of all individuals or units in the population who are eligible to be selected in a sample. Ex) a university registrar’s list of enrolled students.
Bar Plot
A graph that shows the frequency of each category of a categorical variable using bars
Proportion (p-hat vs p)
The proportion of cases in a category is cases in category ÷ total cases. The sample proportion is written as p-hat
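The division can be sketched with a small made-up sample. The responses list is hypothetical data; p_hat is just a variable name for the sample proportion.

```python
# Hypothetical sample of 8 responses about having an early class.
responses = ["early", "none", "early", "none", "none", "early", "none", "none"]

# Sample proportion: cases in the category divided by total cases.
p_hat = responses.count("early") / len(responses)
print(p_hat)  # 0.375
```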
Contingency Table
A table that shows the frequency of cases for combinations of two categorical variables. Ex) class year (rows) by early class status (columns).
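A small sketch of building such a table by counting pairs of values. The six records are made-up data, and only two class years are shown to keep it short.

```python
from collections import Counter

# Hypothetical records: (class year, early class status) for six students.
records = [
    ("freshman", "early"), ("freshman", "none"), ("freshman", "early"),
    ("senior", "none"), ("senior", "none"), ("senior", "early"),
]

# Count how many cases fall into each combination of the two variables.
counts = Counter(records)

rows = ["freshman", "senior"]
cols = ["early", "none"]

# Print class year as rows and early class status as columns.
print(f"{'':10}" + "".join(f"{c:>8}" for c in cols))
for r in rows:
    print(f"{r:10}" + "".join(f"{counts[(r, c)]:>8}" for c in cols))
```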
Segmented Bar Plot
A graph where each bar represents a group and is split into colored segments that show the distribution of a second categorical variable within that group. Ex) one bar for freshmen divided into early vs no early class segments.
Side-by-Side Bar Plot
A graph that compares groups by placing separate bars for each category of a second variable next to each other. Ex) separate bars for early vs no early class shown side by side within each class year.
Dot Plot
A simple graph where each observation is shown as a dot along a number line. Ex) number of rooms in 45 homes displayed with one dot per home.
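A text-only sketch of a dot plot, turned sideways so the number line runs down the page. The room counts are made-up values for nine homes rather than the 45 homes in the example.

```python
from collections import Counter

# Hypothetical data: number of rooms in nine homes.
rooms = [4, 5, 5, 6, 6, 6, 7, 7, 8]
counts = Counter(rooms)

# One row per value on the number line, one dot per observation.
for value in range(min(rooms), max(rooms) + 1):
    print(f"{value:>2} | {'.' * counts[value]}")
```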
Mosaic Plot
A graph that uses the area of rectangles to show relationships between two or more categorical variables
Histogram
A graph that groups numerical data into intervals of equal width and shows how many cases fall in each interval. Ex) living area of homes grouped into 250-sq-ft bins.
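The binning step behind a histogram can be sketched as follows. The bin width of 250 matches the example; the living areas are made-up values, and each bin is labeled by its left edge.

```python
from collections import Counter

# Hypothetical living areas in square feet.
living_area = [980, 1120, 1240, 1260, 1490, 1510, 1730, 2010]
bin_width = 250

# Assign each value to the left edge of its equal-width interval, then count.
bins = Counter((area // bin_width) * bin_width for area in living_area)

# Print only non-empty bins, with one '#' per home in the bin.
for left in sorted(bins):
    print(f"{left:>5}-{left + bin_width - 1:<5} {'#' * bins[left]}")
```

Changing bin_width changes how the shape looks, which is why the same data can appear smoother or more jagged in different histograms.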
Symmetric/Bell-Shaped
Data are clustered in the middle, with roughly equal numbers of smaller and larger values on either side. Ex) heights of adults.
Symmetric but Not Bell-Shaped
Values on one side of the distribution mirror the other