Stats
What is Statistics?
Definition: Statistics is the science of collecting, organizing, analyzing, and interpreting data to make decisions or draw conclusions
It is not merely about numbers but about what those numbers mean
Understanding Data
"Data" is the plural form; the singular is "datum"
Definition: Data refers to collections of facts or information, such as
Profit margins of a company
Election voting statistics
Descriptive vs. Inferential Statistics
Descriptive Statistics:
Summarizes and visualizes observations using graphs, tables, and numerical summaries
Aim: Clarity
Inferential Statistics:
Generalizes from sample data to a broader population and quantifies uncertainty
Aim: Justification of conclusions
Key Statistical Terms
Population: The entire set of individuals or items being studied
Example: All Baylor first-year students this fall
Sample: A subset of the population that is observed
Parameter: A number describing the population, usually unknown, e.g., the true average battery life
Statistic: A number computed from a sample used to learn about the parameter
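As a minimal sketch of how these four terms fit together, the Python snippet below builds a hypothetical population of battery lifetimes, draws a random sample, and compares the parameter with the statistic. The data are simulated, and the population size, sample size, and distribution (normal, mean 10 hours, standard deviation 2) are illustrative assumptions, not values from a real study.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# ASSUMPTION: a simulated population of battery lives (hours) for 10,000 devices.
population = [random.gauss(10, 2) for _ in range(10_000)]

# Parameter: a number describing the whole population (usually unknown in practice).
parameter = statistics.mean(population)

# Sample: a subset of the population that is actually observed.
sample = random.sample(population, 50)

# Statistic: a number computed from the sample, used to learn about the parameter.
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

In a real study only the last quantity is available; the parameter is visible here only because the population was simulated.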
Application of Statistics: Case Study
Example of a clinic testing a flu prevention program:
18% of the usual-care group got the flu, compared with 12% of the new-program group
Descriptively, the new program appears better; inferentially, we ask whether the difference could be due to chance
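The inferential question can be sketched with a two-proportion z-test, which is one common way (among several) to ask whether a difference like this could be due to chance. The notes do not give the group sizes, so the 200 participants per group below are a hypothetical assumption; the conclusion depends on them.

```python
import math

# ASSUMPTION: group sizes are not given in the notes; 200 per group is hypothetical.
n_usual, n_new = 200, 200
flu_usual = 36   # 18% of 200
flu_new = 24     # 12% of 200

p_usual = flu_usual / n_usual
p_new = flu_new / n_new
diff = p_usual - p_new  # descriptive summary: observed difference in proportions

# Two-proportion z-test using the pooled proportion under "no real difference".
p_pool = (flu_usual + flu_new) / (n_usual + n_new)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_usual + 1 / n_new))
z = diff / se

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"observed difference: {diff:.2f}")
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

With larger groups the same 6-percentage-point difference would give a smaller p-value, which is why the descriptive summary alone cannot settle the question.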
Steps in Using Statistics
Start with a clear question and identify the observational units
Decide how to collect data: survey, experiment, or database
Use descriptive statistics to visualize data
Make a reasoned argument based on inference
Communicate findings in context
Understanding Populations and Samples
Population vs. Target Population vs. Accessible Population:
Target population: The group the research aims to draw conclusions about, e.g., all incoming first-year students
Accessible population: Portion of the target population realistically observable
Mismatches can lead to biased results
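To illustrate how a mismatch between the target and accessible populations can bias a result, here is a small sketch with invented numbers. The two student groups and their weekly study hours are hypothetical; the point is only that a mean computed from the accessible group can differ from the mean of the full target population.

```python
import statistics

# ASSUMPTION: invented weekly study hours for two groups of first-year students.
on_campus = [12, 14, 15, 13, 16, 14]
commuters = [8, 9, 7, 10, 9, 8]

# Target population: everyone the research question is about.
target_population = on_campus + commuters

# Accessible population: only the students who can realistically be reached,
# here assumed to be those living on campus.
accessible_population = on_campus

target_mean = statistics.mean(target_population)
accessible_mean = statistics.mean(accessible_population)

print(f"target mean:     {target_mean}")
print(f"accessible mean: {accessible_mean}")
print(f"bias:            {accessible_mean - target_mean}")
```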
The Role of Samples
A sample must represent the population well for the statistic computed to reflect the parameter accurately
Census vs. sampling:
Census: Attempts to measure every unit; rare and complex to carry out
Sampling: More practical, and makes it possible to learn about large populations
Parameters and Statistics
Parameter: A numerical summary describing the population; examples include a mean income, a proportion of supporters, or a standard deviation
Parameters are fixed and usually unknown
Statistic: A numerical summary computed from a sample; its value varies from sample to sample
Sampling Error
Sampling error: The gap between a statistic and the parameter that arises naturally because only a subset of the population is observed
Major goals of statistics include measuring and communicating uncertainty
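A short sketch of sampling error for a proportion: the same population is sampled several times, and each sample gives a slightly different statistic. The population of 5,000 voters, the true support proportion of 0.40, and the sample size of 100 are all assumptions chosen for illustration.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# ASSUMPTION: hypothetical population of 5,000 voters;
# 1 = supports a measure, 0 = does not.
population = [1] * 2000 + [0] * 3000
true_proportion = sum(population) / len(population)  # the parameter

errors = []
for i in range(5):
    sample = random.sample(population, 100)
    p_hat = sum(sample) / len(sample)        # the statistic for this sample
    errors.append(p_hat - true_proportion)   # sampling error for this sample
    print(f"sample {i + 1}: p_hat = {p_hat:.2f}, sampling error = {errors[-1]:+.2f}")
```

None of the five samples is "wrong"; the differences come only from which units happened to be selected.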
Simulation of Sample Means
Example: the population has a true mean of 10 and a standard deviation of 2
The distribution of sample means becomes less variable as the sample size grows (n=20, n=50, n=200)
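The simulation described here can be sketched as follows. The notes do not say what shape the population has or how many samples were drawn, so a normal population and 1,000 repetitions per sample size are assumptions. The code compares the observed spread of the sample means with the theoretical standard error, sigma / sqrt(n).

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

TRUE_MEAN, TRUE_SD = 10, 2   # population values from the example
N_REPS = 1000                # ASSUMPTION: number of repeated samples per size


def sample_mean(n):
    """Draw n values from a normal population (assumed shape) and return their mean."""
    return statistics.mean([random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)])


spread = {}
for n in (20, 50, 200):
    means = [sample_mean(n) for _ in range(N_REPS)]
    spread[n] = statistics.stdev(means)  # variability of the sample means
    theory = TRUE_SD / n ** 0.5          # standard error, sigma / sqrt(n)
    print(f"n={n:>3}: SD of sample means = {spread[n]:.3f} (theory: {theory:.3f})")
```

The sample means stay centered near 10 for every n; only their spread shrinks, roughly in proportion to 1 / sqrt(n).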
Case and Variable Definitions
Case: Rows in data representing observational units
Variable: Columns describing characteristics recorded about each case
Clearly defined variables are essential for understanding what the data say
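A minimal sketch of the case/variable layout, using a hypothetical three-row dataset. The variable names (`id`, `height_cm`, `blood_type`) and the values are invented for illustration.

```python
# Hypothetical dataset: each dict is one case (row); each key is a variable (column).
students = [
    {"id": "S001", "height_cm": 170.2, "blood_type": "O"},
    {"id": "S002", "height_cm": 158.9, "blood_type": "A"},
    {"id": "S003", "height_cm": 181.5, "blood_type": "B"},
]

n_cases = len(students)                 # number of observational units
variables = list(students[0].keys())    # characteristics recorded for each case

# Reading down one column gives every case's value for that variable.
heights = [row["height_cm"] for row in students]

print(f"{n_cases} cases, variables: {variables}")
print(f"heights: {heights}")
```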
Types of Variables
Categorical Variables:
Assign cases to groups or categories, e.g., blood types
Subtypes:
Nominal (no ordering, e.g., eye color)
Ordinal (meaningful order, e.g., survey responses from "strongly disagree" to "strongly agree")
Binary (two categories)
Quantitative Variables:
Numerical values where arithmetic operations are meaningful, e.g., height, weight
Subtypes:
Discrete (counted values, e.g., number of siblings)
Continuous (measured values, e.g., height)
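The variable type determines which summaries make sense. The sketch below uses invented values: counts for a categorical variable, and a mean or median for quantitative ones.

```python
from collections import Counter
import statistics

# ASSUMPTION: small invented datasets, one per variable type.
blood_types = ["O", "A", "O", "B", "AB", "O", "A"]   # categorical (nominal)
heights_cm = [170.2, 158.9, 181.5, 165.0, 174.3]      # quantitative (continuous)
siblings = [0, 2, 1, 1, 3]                            # quantitative (discrete)

# Categorical: count how many cases fall in each category.
blood_counts = Counter(blood_types)

# Quantitative: arithmetic summaries such as the mean and median are meaningful.
mean_height = statistics.mean(heights_cm)
median_siblings = statistics.median(siblings)

print(f"blood type counts: {dict(blood_counts)}")
print(f"mean height (cm):  {mean_height:.2f}")
print(f"median siblings:   {median_siblings}")
```

Averaging the blood types would make no sense, while counting them does; that asymmetry is the practical reason to classify variables before analyzing them.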
Importance of Measurement Scales
Nominal, Ordinal, Interval, Ratio Scales:
Describe comparisons and operations meaningful for a variable
E.g., temperature in Celsius or Fahrenheit is on an interval scale: differences are meaningful, but ratios are not (20°C is not "twice as hot" as 10°C)
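A quick numerical check of the interval-versus-ratio distinction, using two arbitrary temperatures. Celsius has an arbitrary zero (interval scale), while Kelvin has a true zero (ratio scale), so a difference survives the unit change but a ratio does not. The helper `celsius_to_kelvin` is defined here for illustration.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins."""
    return c + 273.15


t1_c, t2_c = 10.0, 20.0  # two arbitrary temperatures in Celsius

# Differences agree on both scales.
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios do not: the Celsius ratio depends on where zero was placed.
ratio_c = t2_c / t1_c
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(f"difference: {diff_c:.1f} C vs {diff_k:.1f} K")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```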
Importance of Data Context
Identifiers and date/time variables must be clearly defined to avoid analytical mistakes
Identifier variables (e.g., ID numbers) should not be treated as quantities
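Two small illustrations of these points, with invented values. First, an ID number can be averaged, but the result means nothing; IDs are better used as labels, for example to check for duplicate records. Second, the same date string can be read two ways unless its format is defined.

```python
from datetime import datetime
import statistics

# ASSUMPTION: hypothetical patient IDs, one of them repeated.
patient_ids = [1001, 1002, 1003, 1002]

# This computes without error, but the "average ID" has no interpretation.
meaningless_mean = statistics.mean(patient_ids)

# A sensible use of identifiers: treat them as labels and look for duplicates.
id_labels = [str(i) for i in patient_ids]
has_duplicates = len(set(id_labels)) < len(id_labels)

# The same raw date string under two different, explicitly stated formats.
raw_date = "03/05/2024"
as_month_first = datetime.strptime(raw_date, "%m/%d/%Y").date()  # March 5, 2024
as_day_first = datetime.strptime(raw_date, "%d/%m/%Y").date()    # May 3, 2024

print(f"mean of IDs (not meaningful): {meaningless_mean}")
print(f"duplicate IDs present: {has_duplicates}")
print(f"month-first: {as_month_first}, day-first: {as_day_first}")
```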
The Research Process
Research is not just about running tests; it is a disciplined, iterative process
Clear articulation of the research question is essential
Workflow of Research Design
Step 1: Specify the problem clearly
Step 2: Design the study (observational or experimental)
Step 3: Collect quality data
Step 4: Perform exploratory data analysis
Step 5: Model and infer
Step 6: Quantify uncertainty
Step 7: Communicate findings
Step 8: Ensure reproducibility
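Reproducibility (Step 8) covers much more than code, but one small, concrete piece of it is making random steps repeatable. The sketch below assumes a hypothetical helper, `draw_sample`, that seeds its own random number generator so that the same sample is drawn on every run.

```python
import random


def draw_sample(seed, population, k):
    """Draw k units at random using a generator seeded with `seed`."""
    rng = random.Random(seed)  # a private generator, independent of global state
    return rng.sample(population, k)


# ASSUMPTION: a hypothetical population of 100 unit labels.
population = list(range(1, 101))

first_run = draw_sample(2024, population, 5)
second_run = draw_sample(2024, population, 5)

print(f"first run:  {first_run}")
print(f"second run: {second_run}")
print(f"identical:  {first_run == second_run}")
```

Recording the seed alongside the analysis lets someone else redraw exactly the same sample.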
Conclusion
The process is rarely linear; iterative study design and quality judgment are key to sound statistical research and conclusions.