Data Analysis
Chapter 1: What is Statistics
Why Study Statistics
Data is collected everywhere, and statistical knowledge is required to turn that data into useful information.
Statistics is used to make valid comparisons and predict outcomes.
Definition of Statistics
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.
Types of Statistics
There are two main branches of statistics:
Descriptive Statistics: Summarizes and describes the features of a data set.
Inferential Statistics: Uses data from a sample to estimate properties of a population and draw inferences about it.
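The difference between the two branches can be sketched in a few lines of Python. This is only an illustration: the exam scores are made-up data, and the names population, sample, and estimate are assumptions introduced for the example. Describing the sample in hand is descriptive statistics; using the sample mean as an estimate of the population mean is a simple form of inference.

```python
import random
import statistics

# Hypothetical population: made-up exam scores for 1,000 students.
rng = random.Random(42)
population = [rng.randint(40, 100) for _ in range(1000)]

# Draw a sample: a portion of the population.
sample = rng.sample(population, 50)

# Descriptive statistics: summarize the data actually collected.
sample_min = min(sample)
sample_max = max(sample)
sample_mean = statistics.mean(sample)

# Inferential statistics: use the sample to estimate a population property.
estimate = sample_mean
true_mean = statistics.mean(population)  # usually unknown in practice

print(f"Sample range: {sample_min} to {sample_max}")
print(f"Estimated population mean: {estimate:.1f}")
print(f"Actual population mean:    {true_mean:.1f}")
```

In practice the full population is rarely available, so the last line could not be computed; it is shown here only to compare the estimate with the value it is estimating.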
Key Concepts
Population: The entire set of individuals or objects of interest, or the measurements obtained from all individuals or objects of interest.
Sample: A portion or part of the population of interest.
Variable: A characteristic or attribute that can take different values across different observations in a data set.
Observation: A single data point or record in a data set that represents all the variables for a particular instance.
Time Series Data: A data set that tracks the same variables over a period of time at regular intervals.
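As a rough illustration of these terms, a small data set can be written as a list of records in Python. The student records, the variable names, and the enrollment figures below are all made up for the example. Each dictionary is one observation, each key is a variable, and the last structure tracks one variable at regular monthly intervals.

```python
# Hypothetical data set: each dict is one observation, each key is a variable.
students = [
    {"name": "Ana", "class_standing": "freshman", "credits": 15, "height_cm": 162.5},
    {"name": "Ben", "class_standing": "sophomore", "credits": 30, "height_cm": 175.0},
    {"name": "Chen", "class_standing": "freshman", "credits": 12, "height_cm": 180.3},
]

# Variables: the characteristics recorded for every observation.
variables = list(students[0].keys())

# Observation: one record holding a value for each variable.
first_observation = students[0]

# All values of a single variable across the observations.
credits_values = [s["credits"] for s in students]

# Time series data: the same variable recorded at regular intervals (made-up values).
monthly_enrollment = {"2024-01": 120, "2024-02": 125, "2024-03": 131}

print(variables)
print(first_observation)
print(credits_values)
```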
Types of Variables
Qualitative Variables: Non-numeric characteristics or attributes recorded through observation.
Nominal Variables: Categories with no inherent order (can only be classified or counted).
Ordinal Variables: Categories with a meaningful order, but the differences between categories are not consistent or measurable (e.g., class standing: freshman, sophomore).
Quantitative Variables: Numeric characteristics that can be measured.
Discrete Variables: Result from counting; values have gaps between them (e.g., number of students).
Continuous Variables: Usually result from measuring something; can assume any value within a specific range (e.g., height or weight).
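The four types above can be sketched with small made-up lists in Python. All values and names here are illustrative assumptions, including the junior and senior categories, which extend the class-standing example. The sketch shows what each type allows: nominal values can only be counted, ordinal values can also be ranked, discrete values are counts, and continuous values are measurements.

```python
from collections import Counter

# Nominal: categories can be classified and counted, but not ranked.
majors = ["biology", "history", "biology", "physics"]
major_counts = Counter(majors)

# Ordinal: categories have a meaningful order, so they can be sorted by rank.
standing_order = ["freshman", "sophomore", "junior", "senior"]
standings = ["junior", "freshman", "senior", "sophomore"]
sorted_standings = sorted(standings, key=standing_order.index)

# Discrete: values come from counting, so there are gaps between them.
students_per_section = [28, 31, 25]

# Continuous: values come from measuring and can fall anywhere in a range.
heights_cm = [162.5, 175.0, 180.3]

print(major_counts["biology"])    # 2
print(sorted_standings)           # freshman, sophomore, junior, senior
print(sum(students_per_section))  # 84
```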
Types of Variables: Summary
Discrete Variables
Discrete: Numeric; takes specific, separated values, typically resulting from counting.
Continuous Variables
Continuous: Numeric; can take any value within a given range, typically resulting from measurement.
Categorical Variables
Nominal: Unordered categories that are mutually exclusive (e.g., colors).
Ordinal: Ordered categories that are mutually exclusive.
A measure of location is a value used to describe the central tendency of a set of data.
Common measures of location
Mean
Median
Mode
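The three measures can be computed with Python's standard statistics module. This is a minimal sketch; the list of scores is made-up data chosen so the results are easy to check by hand.

```python
import statistics

# Hypothetical data: five exam scores.
scores = [70, 85, 85, 90, 100]

# Mean: the sum of the values divided by how many there are (430 / 5).
mean_score = statistics.mean(scores)

# Median: the middle value once the data is sorted.
median_score = statistics.median(scores)

# Mode: the value that occurs most often.
mode_score = statistics.mode(scores)

print(mean_score)    # 86
print(median_score)  # 85
print(mode_score)    # 85
```

Here the mean (86) sits slightly above the median and mode (both 85) because the single score of 100 pulls the average upward.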