PSY201: Introduction to Quantitative Research in Psychology LEC 1

Instructor and Course Overview

Course: PSY201: Introduction to Quantitative Research in Psychology (Lecture: Mondays 9–11, Tutorials: Tue/Wed)
Instructor: Prof. Keisuke Fukuda (Office hour: Mondays 11–12 @ CCT4067)
Lecture times and locations: In person lectures in CC 1080; Tutorials in various rooms (CC 2160, CC 2140, CC 1080 occasionally)
Make sure to attend the practical you registered for!

Syllabus and Course Structure

Section codes and weekly schedules:
- LEC0101: Monday, 9:00–11:00, In Person: CC 1080
- PRA0101–PRA0112: various Tuesday/Wednesday slots in CC 2160 and CC 2140
Attendance and registration alignment:
- Attend the practical you registered for; switching sections without updating your registration is not advised

Textbook Options and Purchase

Textbook options:
- Print ($199.95): Introduction to Statistics and Data Analysis, 7th Edition, Roxy Peck/Chris Olsen, ISBN: 9798214000008
- eBook ($76.95): Introduction to Statistics and Data Analysis, 7th Edition, Roxy Peck/Chris Olsen, ISBN: 9798214000152
Purchase through UTM bookstore strongly recommended; alternative methods not guaranteed to be supported
Link to adoption results: https://www.uoftbookstore.com/adoption-search-results?ccid=4863629&itemid=354166

Assessments and Grading (Course Evaluations)

Term test: 30% of final grade
- 90 minutes, on Oct 20th during lecture
- Multiple choice
- Allowed: hand-held non-programmable calculator and a 1-page, double-sided, Letter-size test aid
Final Exam: 40% of final grade
- 120 minutes during Exam period
- Multiple choice
- Allowed: same calculator and test aid as above
Written assignments: 10%
- Due 11:59 PM on Dec 2nd via Quercus
- Draft feedback by Nov 15th 11:59 PM via Quercus
Tutorial participation and completion: 18%
- Attendance and successful completion of worksheet (submitted by Friday 11:59 PM after each tutorial through Quercus) required for full marks
- You may miss 1 of 9 tutorials without losing a mark
SONA Experiment participation: 2% (for 3 credits = 3 hours)
- 3 hours of psychology experiments via SONA; deadline June 17th
- Create participant account and enroll in PSY201_2025F on SONA
- If you are late (no later than 10 minutes after appointment), penalty of -1 credit
- Opt-out substitutes: 1 assignment = 1 credit; deadline June 2nd; link: https://www.utm.utoronto.ca/psychology/faculty-research/experiment-database-overview/substitute-assignments-experimental-credit

Why Statistics? – Core Rationale

For informed consumers and producers of information:
- Informed consumer capabilities:
- Extract information accurately from visualized data (tables, graphs, etc.)
- Evaluate numerical arguments
- Decide whether to change behavior based on information
- Informed producer capabilities:
- Collect data appropriately
- Summarize data informatively (Descriptive statistics)
- Analyze data to draw fair conclusions (Inferential statistics)
- Visualize data to communicate to audiences

Consuming Information Wisely – Key Examples

Example 1: What does this information tell us?
- Emphasizes interpretation of summarized data rather than taking numbers at face value
Example 2: Spurious correlations (illustrative chart)
- Chart: Bachelor’s degrees awarded in Library science plotted against Google searches for 'how to hide a body', 2012–2021 (illustrative; not causal)
- Source: National Center for Education Statistics; Google Trends; Tyler Vigen spurious correlations
- Takeaway: Correlation does not imply causation; beware misleading interpretations
Example 3: Dangerous DHMO (Demo of persuasive yet misleading claims)
- Claims like erosion %, tumor presence, death risk, nuclear plants usage, animal experiments, processed foods
- Emphasizes checking sources, context, and evidence before accepting claims

Exercise 1 – Water Quality Example and Decision Making

Problem context: Five water specimens sampled daily; compute the average concentration for each day; a histogram summarizes the 200-day distribution of daily averages
After a spill: One month later, five specimens from the same well show an average of 14.5 ppm
Question: Considering pre-spill variation, is this convincing evidence that the well water was affected?
Implications: Interpretation depends on baseline variability, sampling variability, and threshold for detecting a spill
What does this information tell us? (Encourages critical thinking about evidence strength and uncertainty)
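
One way to reason about Exercise 1 is to ask how often the pre-spill daily averages were at least as large as 14.5 ppm. The sketch below assumes a synthetic baseline, because the histogram values are not reproduced in these notes: the center (12.0 ppm) and spread (1.0 ppm) are invented, and `fraction_at_or_above` is a hypothetical helper name.

```python
import random

def fraction_at_or_above(baseline, observed):
    """Share of baseline values that are at least as large as the observed value."""
    return sum(1 for x in baseline if x >= observed) / len(baseline)

random.seed(1)
# Synthetic stand-in for the 200 pre-spill daily averages (ppm).
# Each daily average is the mean of 5 simulated specimens; 12.0 and 1.0 are invented.
baseline_daily_means = [
    sum(random.gauss(12.0, 1.0) for _ in range(5)) / 5 for _ in range(200)
]

post_spill_mean = 14.5  # value given in the exercise
share = fraction_at_or_above(baseline_daily_means, post_spill_mean)
print(f"Share of pre-spill days with an average >= {post_spill_mean}: {share:.3f}")
```

If such a high daily average almost never occurred before the spill, the post-spill value is harder to attribute to ordinary day-to-day variation; if it occurred fairly often, the evidence is weak.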

Population vs Sample; Descriptive vs Inferential Statistics

Population: The entire collection of individuals or objects of interest
Sample: A subset of the population from which information is collected
Descriptive statistics: Methods to organize and summarize data
Inferential statistics: Methods to make inferences about the population from the sample
Data: A collection of observations on one or more variables
Variable: A characteristic that can change in value across observations
Example: Height of humans
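
The definitions above can be sketched with the height example. The population here is simulated (10,000 heights with an invented mean of 170 cm and standard deviation of 10 cm), so the numbers only illustrate the roles of population, sample, and the two kinds of statistics.

```python
import random
from statistics import mean

random.seed(0)
# Hypothetical population: heights (cm) of 10,000 people; the values are simulated.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Sample: a subset of the population from which we actually collect information.
sample = random.sample(population, 100)

# Descriptive statistics: summarize the data we have in hand.
sample_mean = mean(sample)

# Inferential statistics: use the sample summary as an estimate of the population value.
# Here we can check the estimate only because the population is simulated.
population_mean = mean(population)
print(f"sample mean = {sample_mean:.1f} cm, population mean = {population_mean:.1f} cm")
```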

Data Types and Datasets

Numerical data: Observations that are numerical (e.g., heights, test scores)
Categorical (qualitative) data: Observations that are categories (e.g., gender, color)
Univariate data set: Observations vary in one characteristic
Multivariate data set: Observations vary in multiple characteristics
Example: Height of humans

Types of Numerical Variables

Discrete numerical variable: Possible values are isolated and limited points on the number line
- Example: Countable events (e.g., number of items)
Continuous numerical variable: Possible values form an entire interval on the number line (infinitely many possible values in principle)
- Example: Measurements like height, weight

Frequency Distributions for Categorical Data

Frequency: Number of times a category occurs in the data
Relative frequency: Proportion of observations in a category
Example uses: demonstration with shark attack data; context described for understanding distribution
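
As a small illustration of the two definitions, the sketch below tabulates frequency and relative frequency for a categorical variable. The observations are made up for illustration and are not the shark attack data shown in lecture.

```python
from collections import Counter

# Hypothetical categorical observations (activity at the time of an incident).
observations = ["surfing", "swimming", "surfing", "wading", "surfing", "swimming"]

# Frequency: number of times each category occurs.
frequency = Counter(observations)

# Relative frequency: frequency divided by the total number of observations.
n = len(observations)
relative_frequency = {category: count / n for category, count in frequency.items()}

for category, count in frequency.most_common():
    print(f"{category:10s} frequency={count}  relative frequency={relative_frequency[category]:.2f}")
```

The relative frequencies always sum to 1, which makes distributions comparable across data sets of different sizes.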

How Do We Collect Data? Observational Study

Observational study: Observe characteristics of a sample drawn from one or more existing populations
Purpose: Draw conclusions about the population or differences between populations regarding the characteristics
Example study question: “What is the internet usage among young (21–40) adults?”
Methodology: 1000 individuals (gender-balanced) aged 21–40 answer: “Do you use the internet every day?”
- Reported percentages by age group (Young 21–40, Middle 41–60, Old 61–80)

How to Collect Data Sensibly? Avoiding Bias

Bias types:
- Selection (sampling) bias: Systematic exclusion of part of the population
- Measurement (response) bias: Measurement methodology affects results
- Nonresponse bias: Nonparticipation affects representativeness
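
Selection bias can be sketched with a simulation. The population below is hypothetical: three equally sized age groups with invented daily internet-use rates (0.9, 0.7, 0.4). Sampling only from the young group systematically excludes the rest of the population and overstates the overall rate.

```python
import random

random.seed(2)
# Hypothetical population of (age_group, uses_internet_daily); the rates are invented.
rates = {"young": 0.9, "middle": 0.7, "old": 0.4}
population = [(group, random.random() < p) for group, p in rates.items() for _ in range(3000)]

def daily_use_rate(people):
    """Proportion of people who report using the internet every day."""
    return sum(uses for _, uses in people) / len(people)

# Random sample from the whole population vs. a sample that excludes two age groups.
random_sample = random.sample(population, 300)
biased_sample = random.sample([p for p in population if p[0] == "young"], 300)

print(f"population rate:      {daily_use_rate(population):.2f}")
print(f"random sample rate:   {daily_use_rate(random_sample):.2f}")
print(f"young-only sample:    {daily_use_rate(biased_sample):.2f}")
```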
Source: https://thedailyjaws.com/news/florida-is-the-shark-bite-capital-of-the-world
Image credit: https://www.flickr.com/photos/61056899@N06/

Random Sampling and Practicality

Recommended approach: Simple random sampling
Simple random sample of size n: Every possible sample of size n has the same chance of being selected
Sampling without replacement: Once selected, an individual cannot be selected again
Sampling with replacement: An individual can be selected multiple times
Practical note: When sample size n is less than 10% of the population, both sampling approaches are practically equivalent
Visual aid: Sample (n = 4) vs Population; X marks represent sample drawn from population
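
The two sampling schemes map onto two functions in Python's standard `random` module. This is a minimal sketch with a hypothetical population of 100 ID numbers and n = 4, as in the visual aid.

```python
import random

random.seed(3)
population = list(range(1, 101))  # hypothetical population: ID numbers 1..100
n = 4

# Sampling without replacement: each individual can appear at most once.
without_replacement = random.sample(population, n)

# Sampling with replacement: the same individual may be drawn more than once.
with_replacement = random.choices(population, k=n)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

With n = 4 and a population of 100, repeats under sampling with replacement are rare, which is the intuition behind treating the two schemes as practically equivalent when n is small relative to the population.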

How Much Data is Enough? – Sample Size Guidance

Example: Population distribution of MATH SAT scores of all applicants (n = 5000)
Question: How big should the random sample be to know reliably about the population?
Answer: With random sampling, a sample of just 1% of the population (50 of 5000) can tell us reliably about the population
Caveat: Simple random sampling is often difficult and costly in practice
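
A simulation can show why a modest random sample is informative. The sketch below assumes a synthetic population of 5000 scores (mean 500 and standard deviation 100 are invented, with scores clipped to 200–800) and looks at how much the sample mean varies across repeated random samples of different sizes; `sample_means` is a hypothetical helper.

```python
import random
from statistics import mean, stdev

random.seed(4)
# Synthetic population of 5000 scores; mean 500 and SD 100 are invented for illustration.
population = [min(800, max(200, round(random.gauss(500, 100)))) for _ in range(5000)]

def sample_means(pop, n, repeats=1000):
    """Means of `repeats` simple random samples (without replacement) of size n."""
    return [mean(random.sample(pop, n)) for _ in range(repeats)]

print(f"population mean = {mean(population):.1f}")
for n in (5, 50, 500):
    means = sample_means(population, n)
    print(f"n = {n:3d}: sample means vary with SD = {stdev(means):.1f}")
```

The spread of the sample means shrinks as n grows, so the sample mean from a random sample of 50 tends to land much closer to the population mean than one from a sample of 5.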

Practical Sampling Alternatives

Alternative sampling methodologies:
- Stratified Random Sampling: Divide population into non-overlapping strata, sample within each stratum proportionally to its size
- Cluster Sampling: Randomly sample at cluster/group level rather than individuals
- Systematic Sampling: Select a random first element, then sample every kth element
- Convenience Sampling: Sample based on ease of access
Practical caution: Convenience sampling is common but generalization to the population must be done with extreme care due to potential bias
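
The first three alternatives can be sketched in a few lines each. The population below is hypothetical: 120 people in three strata of unequal size (60, 40, 20) who also belong to 12 clusters of 10. The function names and the dictionary layout are assumptions made for this illustration.

```python
import random

random.seed(5)
# Hypothetical population: 3 strata of sizes 60/40/20 and 12 clusters of 10 people each.
population = []
for stratum, size in {"first_year": 60, "second_year": 40, "third_year": 20}.items():
    for _ in range(size):
        person_id = len(population)
        population.append({"id": person_id, "stratum": stratum, "cluster": person_id % 12})

def stratified_sample(pop, fraction):
    """Simple random sample within each stratum, proportional to stratum size."""
    strata = {}
    for person in pop:
        strata.setdefault(person["stratum"], []).append(person)
    chosen = []
    for members in strata.values():
        chosen.extend(random.sample(members, round(len(members) * fraction)))
    return chosen

def cluster_sample(pop, n_clusters):
    """Randomly pick whole clusters and keep everyone in the picked clusters."""
    cluster_ids = sorted({person["cluster"] for person in pop})
    picked = set(random.sample(cluster_ids, n_clusters))
    return [person for person in pop if person["cluster"] in picked]

def systematic_sample(pop, k):
    """Random start among the first k elements, then every kth element."""
    start = random.randrange(k)
    return pop[start::k]

print(len(stratified_sample(population, 0.1)), "people via stratified sampling")
print(len(cluster_sample(population, 3)), "people via cluster sampling")
print(len(systematic_sample(population, 10)), "people via systematic sampling")
```

Convenience sampling is omitted because it involves no random selection step to sketch.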

Experimental Study – Core Concepts

Experimental study: One or more explanatory variables are manipulated to observe effects on a response variable
Explanatory variables: Also called independent variables or factors; values controlled or manipulated by the researcher
Response variables: Also called dependent variables; measured but not controlled by the researcher; hypothesized to be affected by explanatory variables
Experimental conditions: Combinations of values for the explanatory variables (also called treatments)
Extraneous variables: Not explanatory but can affect the response variables
Good experimental design aims to ensure that only explanatory variables explain observed effects; extraneous variables are controlled or accounted for

Four Pillars of Good Experimental Design

Direct control: Hold extraneous variables constant across conditions
Random assignment: Randomly assign subjects to conditions to balance extraneous factors
Blocking: Group subjects by an extraneous variable (blocks), then assign the subjects within each block evenly across conditions
Replication: Repeat the experiment to ensure results are reliable and not due to idiosyncrasies of a single data set
Start from here: these principles are foundational to designing robust experiments

Experimental Study Example – Does a 'Thank you' on the check increase tips?

Explanatory variable: Presence of a 'Thank you' note on the check
Experimental conditions (treatments): 'Thank you' vs No 'Thank you'
Response variable: Tip percentage left by each customer
Participants: 200 customers during shifts (Thursday and Friday)
Assignment options:
- A: 'Thank you' on Thursday, No 'Thank you' on Friday
- B: 'Thank you' on Friday, No 'Thank you' on Thursday
- C: Half get 'Thank you' and half do not on each day
- D: For each participant, flip a coin to assign treatment
Important design note: Day of week could confound with the treatment effect; blocking by day could help mitigate confounding
Real-world constraint: Random sampling is often hard; random assignment within practical constraints allows assessment of the treatment effect
Blocking reference: Conceptually similar to stratified sampling
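
One way to combine blocking by day with random assignment is sketched below. It assumes 100 customers on each day, which the notes do not specify, and `assign_within_blocks` is a hypothetical helper: within each day the customers are shuffled and the two treatments are alternated, so each day is split evenly and the split is random.

```python
import random

random.seed(6)
# 200 hypothetical customers; the 100-per-day split is an assumption.
customers = [{"id": i, "day": "Thursday" if i < 100 else "Friday"} for i in range(200)]

def assign_within_blocks(people, block_key, treatments):
    """Randomly assign treatments within each block so every block is split evenly."""
    blocks = {}
    for person in people:
        blocks.setdefault(person[block_key], []).append(person)
    assignment = {}
    for members in blocks.values():
        shuffled = random.sample(members, len(members))  # random order within the block
        for position, person in enumerate(shuffled):
            assignment[person["id"]] = treatments[position % len(treatments)]
    return assignment

assignment = assign_within_blocks(customers, "day", ["thank_you", "no_thank_you"])

for day in ("Thursday", "Friday"):
    n_thanked = sum(
        1 for c in customers if c["day"] == day and assignment[c["id"]] == "thank_you"
    )
    print(f"{day}: {n_thanked} customers receive a 'Thank you'")
```

Because both treatments appear equally often on each day, a difference in tips between treatments cannot be explained by a Thursday versus Friday difference.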

Organizing Experimental Design – Flow and Practical Tips

Visualize design with a flow chart to organize steps and ensure clarity
Acknowledge real-world complexities in randomization; plan for blocking and randomization where possible

Quick References and Takeaways

Always consider threats to validity (bias, confounding variables, measurement error)
Understand the distinctions between population and sample, and between descriptive and inferential statistics
Recognize different data types and the appropriate analyses for each
Plan data collection with bias reduction and practical constraints in mind
Use experimental design principles (control, randomization, blocking, replication) to strengthen causal claims