Vocabulary Flashcards: Essentials of Statistics for the Behavioral Sciences (Ch. 1-3)
1.1 Statistics, Science, and Observations
- Statistics = a set of mathematical procedures for organizing, summarizing, and interpreting information.
- Two general purposes of statistics:
- Organize and summarize information so researchers can see what happened and communicate results.
- Use sample data to answer questions about population parameters and justify general conclusions.
- Population vs. Sample:
- Population: the entire set of individuals of interest for a research question.
- Sample: a smaller, more manageable group drawn from the population, intended to be representative.
- Population size can be very large; samples are used because examining everyone is usually impractical.
- Data, Datum, and Data Sets:
- Datum (singular) = a single measurement/observation (score).
- Data (plural) = measurements/observations (a data set is a collection of scores).
- Descriptive vs. Inferential Statistics:
- Descriptive statistics: organize, summarize, and present data (e.g., tables, graphs, means).
- Inferential statistics: use sample data to draw conclusions about populations and generalize beyond the data.
- Inferential statistics must address sampling error (the discrepancy between a sample statistic and the population parameter).
- Key terms:
- Parameter: a value that describes a population.
- Statistic: a value that describes a sample.
- Each population parameter has a corresponding sample statistic; most research uses statistics to infer parameters.
- Margin of Error (Box 1.1): sampling error is the naturally occurring discrepancy between a sample statistic and the corresponding population parameter; polls acknowledge it by reporting a margin of error (e.g., ±4 percentage points).
- Diagrammatic idea (conceptual): different samples from the same population yield different statistics, illustrating sampling error.
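A minimal sketch of the sampling-error idea, assuming a made-up population of 1,000 normally distributed scores (the population values, the sample size of 25, and the helper name `draw_sample_mean` are illustrative assumptions, not from the text). It draws two samples from the same population and reports how far each sample mean falls from the population mean.

```python
import random
import statistics

def draw_sample_mean(population, n, rng):
    """Draw a simple random sample of size n and return its mean (a statistic)."""
    sample = rng.sample(population, n)
    return statistics.mean(sample)

# Hypothetical population of 1,000 scores (illustrative values only).
rng = random.Random(1)
population = [rng.gauss(100, 15) for _ in range(1000)]
mu = statistics.mean(population)  # population parameter

# Two samples from the same population usually yield different statistics.
mean_a = draw_sample_mean(population, 25, rng)
mean_b = draw_sample_mean(population, 25, rng)

# Sampling error = sample statistic minus population parameter.
error_a = mean_a - mu
error_b = mean_b - mu

print(f"population mean: {mu:.2f}")
print(f"sample A mean:   {mean_a:.2f} (sampling error {error_a:+.2f})")
print(f"sample B mean:   {mean_b:.2f} (sampling error {error_b:+.2f})")
```

Rerunning with a different seed changes both sample means, which is the point: the parameter stays fixed while the statistics vary from sample to sample.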
1.2 Populations and Samples
Research typically begins with a question about a population (the entire group of interest).
A sample should be representative of its population so that results from the sample can be generalized to the population.
Constructs, variables, and measurement:
- Variable: a characteristic that changes or has different values for different individuals (e.g., height, mood, temperature).
- Datum/score: the measurement obtained for each individual.
- Data set: collection of scores.
Parameter vs. Statistic (definitions repeated for clarity):
- Parameter: a characteristic of a population.
- Statistic: a characteristic of a sample.
Parameter–Statistic relationship:
- Every population parameter has a corresponding sample statistic; most research uses sample statistics to infer population parameters.
Two data structures (used to classify research methods and statistical procedures):
- Descriptive and Inferential statistics are connected to the data structures below.
- Data Structure I: Measuring two variables for each individual (the correlational method).
- Data Structure II: Comparing two (or more) groups of scores (experimental vs. nonexperimental methods).
Variables, measurement, and data types:
- A variable can be a characteristic that changes (e.g., wake-up time, academic performance).
- A datum is a single measurement; the set of all scores is the data set.
SECTION 1.2 KEY CONNECTIONS:
- Population parameter ↔ Sample statistic (and their sampling error).
- Samples provide the practical basis for inferring about populations.
Visual/graphic idea:
- Figure 1.1 shows the relationship: Population ⟶ Sample ⟶ Generalize to Population.
- Figure 1.2 illustrates sampling error via two samples from the same population with different statistics.
The two major data structures in research:
- Data Structure I (Correlational): two variables measured for each individual; example scatter plot showing relationship between wake-up time and academic performance; correlation describes the relation but does not imply causation.
- Data Structure II (Group Comparisons): two or more groups defined by a variable; compares scores across groups; can be experimental (manipulation + control) or nonexperimental (no random assignment).
Key terms related to data, sampling, and statistics:
- Datum and data set; X and Y as labels for variables; N for population size and n for sample size.
- Sampling error: discrepancy between a sample statistic and the corresponding population parameter, due to the randomness of sampling.
Important connections to research method:
- Correlational studies assess relationships between two variables but cannot establish causation.
- Experimental method manipulates one variable (independent variable) to observe causal effects on another (dependent variable) while controlling extraneous variables.
1.3 Data Structures, Research Methods, and Statistics
- Data Structure I: Correlational method
- Measure two variables for each individual (e.g., wake-up time and academic performance).
- Data presentation: table of two scores per individual and a scatter plot where x = wake-up time and y = academic performance.
- Limitation: cannot establish causation; relationship observed does not imply one variable causes changes in the other.
- Example: wake-up time vs. academic performance; as wake-up time increases, performance tends to decrease (illustrative pattern).
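A short sketch of how the relationship in a correlational data set might be quantified. The Pearson correlation is used here as one common choice (an assumption; this section does not name a specific statistic), and the scores are invented to mimic the illustrative pattern described above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: one pair of scores per individual.
wake_up_time = [6, 7, 7, 8, 9, 10, 11]      # hours after midnight
performance = [92, 88, 85, 80, 78, 70, 65]  # academic performance score

r = pearson_r(wake_up_time, performance)
print(f"r = {r:.2f}")  # negative in this made-up data set
```

A negative `r` only describes the pattern in the scores; it says nothing about whether waking up later causes lower performance.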
- Data Structure II: Experimental and nonexperimental methods (group comparisons)
- Compare two or more groups defined by a variable; then measure the second variable to obtain scores for each group.
- Experimental method: manipulation of an independent variable to create treatment conditions, then observe a dependent variable; aim is to demonstrate causation with control of extraneous variables.
- Example: TV violence study (hypothetical); children are exposed to either a violent or a nonviolent TV show, and aggression is then measured on the playground.
- The Experimental Method (two defining features)
1) Manipulation: the researcher changes the value of an independent variable across conditions.
2) Control: the researcher controls extraneous variables to prevent them from influencing the relationship.
- Example: money-counting experiment (Zhou & Vohs, 2009)
- Independent variable: material participants handle (money vs. blank paper).
- Dependent variable: pain rating after hands are placed in hot water.
- Finding: counting money reduces pain perception relative to counting paper.
- Participant variables and environmental variables (potential confounds):
- Participant variables: age, gender, intelligence, etc. (could confound results if groups differ on these factors).
- Environmental variables: time of day, lighting, weather, etc.
- Techniques to control extraneous variables in experiments
- Random assignment: equal chance of being assigned to each treatment condition; helps distribute participant characteristics evenly and controls environmental variables.
- Matching: create equivalent groups based on key characteristics (e.g., gender proportions).
- Holding variables constant: study uses a single age group, for example.
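Random assignment can be sketched as a shuffle followed by dealing participants out to conditions. The function name `randomly_assign`, the participant IDs, and the fixed seed are hypothetical choices for illustration; the two condition labels echo the money vs. paper example.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to the conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# Hypothetical participant IDs P01..P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, ["money", "paper"], seed=42)

for condition, members in groups.items():
    print(condition, len(members), members)
```

Because the order is shuffled before dealing, every participant has the same chance of landing in each condition, and the groups come out equal in size whenever the number of participants divides evenly.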
- Terminology in experimental research
- Independent variable (IV): the variable manipulated by the experimenter (e.g., money vs. paper).
- Dependent variable (DV): the variable measured to assess the effect of the IV (e.g., pain rating).
- Control group: does not receive the experimental treatment (baseline).
- Experimental group: receives the treatment.
- Confounded: when more than one factor varies with the treatment, making it difficult to attribute effects to a single cause.
- Nonexperimental methods (comparative studies without true manipulation)
- Nonequivalent groups design: groups defined by a preexisting characteristic (e.g., gender) without random assignment.
- Pre–post design: same participants measured before and after a treatment, but no control over the passage of time.
- In all nonexperimental designs, causal conclusions are weaker due to potential confounds.
- Terminology for nonexperimental studies
- Quasi-independent variable: the variable used to create groups in a nonexperimental study (not truly manipulated).
- Dependent variable remains the measured outcome.
- Recap: Data structures in practice
- I. Correlational: two variables per individual; analyze relationship (correlation) but not causation.
- II. Group comparisons: two or more groups; can be experimental (causal inference) or nonexperimental (no random assignment; weaker causal claims).
1.4 Constructs and Operational Definitions
- Constructs and measurement
- Some variables are directly observable (e.g., height, weight), others are internal constructs (e.g., intelligence, anxiety, hunger) that require indirect measurement.
- Constructs (hypothetical constructs) are internal attributes useful for describing/explaining behavior.
- Operational definitions define how a construct will be measured or observed; they specify the measurement procedure and use resulting measurements as the definition of the construct.
- Examples of operational definitions
- Intelligence measured via IQ test scores; IQ test results serve as an operational definition of the construct intelligence.
- Hunger measured by number of hours since last eating; this defines hunger operationally.
- Important terminology
- Discrete vs. Continuous variables
- Discrete: separate, indivisible categories (e.g., number of children, race, gender, occupation).
- Continuous: infinite number of possible values within an interval (e.g., time, height, weight). Continuous variables can be subdivided into fractional parts; real limits define measurement boundaries.
- Scales of measurement (start with simple to complex)
- Nominal scale: categories have names with no quantitative order (e.g., major, race, gender). Differences between categories are not meaningful in magnitude or direction.
- Ordinal scale: categories have a meaningful order (e.g., ranks, class standing, shirt sizes) but not equal intervals; you can say which is bigger but not by how much.
- Interval scale: ordered categories with equal intervals between adjacent values; no true zero point (arbitrary zero) (e.g., Fahrenheit temperature, calendar years, some test scores).
- Ratio scale: interval scale with an absolute zero (nonarbitrary zero) that allows meaningful ratio comparisons (e.g., height, weight, reaction time, number of errors).
- Real limits (for continuous variables)
- For a continuous variable, each score represents an interval whose boundaries are the real limits, located halfway between adjacent scores (e.g., weight measured to the nearest pound gives real limits of 149.5 and 150.5 for a score of 150).
- Real limits define intervals; each observed score corresponds to an interval on the measurement scale.
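The real-limits rule can be written as a one-line calculation: go half a measurement unit below and above the score. The helper name `real_limits` and its `unit` parameter are assumptions for illustration.

```python
def real_limits(score, unit=1.0):
    """Return (lower, upper) real limits for a score measured to the nearest `unit`."""
    half = unit / 2
    return score - half, score + half

print(real_limits(150))       # (149.5, 150.5)  nearest whole pound
print(real_limits(150, 0.5))  # (149.75, 150.25)  nearest half pound
```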
- Practical implications of scales
- Numerical calculations (means, standard deviations) are appropriate for interval/ratio scales.
- For nominal/ordinal data, use statistics suited to categories or ranks (e.g., mode, median, Spearman correlation, chi-square tests).
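As a rough illustration of matching a summary statistic to the scale of measurement, the sketch below uses Python's standard `statistics` module on three small invented data sets. The data, the variable names, and the S/M/L/XL ordering are assumptions.

```python
import statistics

# Nominal: categories with no order, so the mode is a sensible summary.
majors = ["psych", "bio", "psych", "art", "psych"]
typical_major = statistics.mode(majors)

# Ordinal: ordered categories, so the median rank is meaningful.
size_order = ["S", "M", "L", "XL"]
shirt_sizes = ["M", "S", "L", "M", "XL"]
ranks = [size_order.index(size) for size in shirt_sizes]
median_size = size_order[statistics.median(ranks)]

# Ratio: equal intervals and a true zero, so the mean is appropriate.
reaction_ms = [420, 380, 455, 400, 395]
mean_rt = statistics.mean(reaction_ms)

print(typical_major)  # psych
print(median_size)    # M
print(mean_rt)        # 410
```

Averaging the shirt-size ranks would run without error, but the result would assume equal spacing between sizes, which an ordinal scale does not guarantee.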
- Examples from Section 1.4
- Interval vs. ratio distinction: height in inches is a ratio scale (a true zero means no height at all); converting each height to its difference from the average (a deviation from the mean) produces an interval scale, because zero now marks the average rather than the absence of height.
- A common classroom exercise: transforming a measurement (e.g., height) to a different scale while preserving the information about differences but changing the zero point; ratio comparisons become invalid under the transformed scale.
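A small numeric check of this point, using three invented heights: subtracting the mean moves the zero point, so differences between individuals survive the transformation while ratios do not.

```python
# Hypothetical heights in inches (ratio scale: zero means no height).
heights_in = [60, 66, 72]
mean_height = sum(heights_in) / len(heights_in)

# Transformed scale: deviation from the mean (zero now means "average").
deviations = [h - mean_height for h in heights_in]

# Differences between the tallest and shortest person match on both scales.
diff_original = heights_in[2] - heights_in[0]
diff_transformed = deviations[2] - deviations[0]

# Ratios do not match: "1.2 times as tall" has no counterpart after the shift.
ratio_original = heights_in[2] / heights_in[0]
ratio_transformed = deviations[2] / deviations[0]

print(deviations)                         # [-6.0, 0.0, 6.0]
print(diff_original, diff_transformed)    # 12 12.0
print(ratio_original, ratio_transformed)  # 1.2 -1.0
```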
- Practical questions (Learning checks) based on scales
- Identify scales for given variables (income, number of dependents, SSN, grades, preferences, number of children, etc.).
- Determine whether variables are discrete or continuous, and identify real limits for specific measurement precisions.
1.5 Scores and Summation Notation
- Basic notation
- X, Y: scores for variables X and Y.
- N: population size (uppercase for population); n: sample size (lowercase for sample).
- Σ (sigma) denotes summation. The expression ΣX is the sum of X values.
- Examples of summation notation
- Given scores: 4, 3, 7, 1
- ΣX = 15
- ΣX^2 = 75
- (ΣX)^2 = 225
- Σ(X − 1) = 11
- Σ(X − 1)^2 = 49
- Example table for pairwise data (X, Y) with XY products:
- ΣX = 15, ΣY = 14, ΣXY = 54
- Important steps and order of operations
- Order of mathematical operations (as applied to statistical computations):
- Parentheses first.
- Exponents (squaring, etc.).
- Multiplication and/or division (left to right).
- Summation (Σ) next.
- Addition and/or subtraction last.
- Some computations involve sequences of steps that lead to multiple intermediate results (e.g., computing ΣX, ΣX^2, (ΣX)^2, Σ(X − 1), Σ(X − 1)^2).
- Worked examples (from the text)
- Example: for X = {4, 3, 7, 1}, ΣX = 4 + 3 + 7 + 1 = 15.
- ΣX^2 = 16 + 9 + 49 + 1 = 75 (square each score first, then add).
- (ΣX)^2 = (15)^2 = 225 (add first, then square the total).
- Σ(X − 1) = (4 − 1) + (3 − 1) + (7 − 1) + (1 − 1) = 3 + 2 + 6 + 0 = 11.
- Σ(X − 1)^2 = 9 + 4 + 36 + 0 = 49 (subtract 1 from each score, square each result, then add).
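The same calculations can be checked in a few lines of code, using the scores 4, 3, 7, 1 from the examples. This is only a sketch of how the order of operations maps onto code; the variable names are illustrative, and the last line (ΣX − 1) is an added contrast case, not one of the listed examples.

```python
scores = [4, 3, 7, 1]

sum_x = sum(scores)                                   # ΣX
sum_x_squared = sum(x ** 2 for x in scores)           # ΣX^2: square each score, then add
squared_sum_x = sum(scores) ** 2                      # (ΣX)^2: add first, then square
sum_x_minus_1 = sum(x - 1 for x in scores)            # Σ(X − 1): subtract inside, then add
sum_x_minus_1_sq = sum((x - 1) ** 2 for x in scores)  # Σ(X − 1)^2
sum_x_then_minus_1 = sum(scores) - 1                  # ΣX − 1: summation before subtraction

print(sum_x)               # 15
print(sum_x_squared)       # 75
print(squared_sum_x)       # 225
print(sum_x_minus_1)       # 11
print(sum_x_minus_1_sq)    # 49
print(sum_x_then_minus_1)  # 14
```

Comparing Σ(X − 1) = 11 with ΣX − 1 = 14 shows why the parentheses matter: without them, summation is done before the subtraction.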
- Additional computational examples
- Example 1.6 shows: for pairs (X, Y) with XY products, ΣX = 15, ΣY = 14, ΣXY = 54.
- Practical advice for using Σ notation
- Two key points:
- The Σ sign is always followed by the symbol/expression identifying which values to add.
- The summation operation is combined with other operations (multiplication, squaring) and must follow the proper order of operations.
- Summary of implications for statistics students
- Summation is a core operation in many statistics formulas (means, variances, covariances, etc.).
- Mastery of Σ notation and order of operations is essential for correct calculations.
1.6 Chapter-wide connections and learning tools (brief overview)
- Real-world relevance: statistics provide a structured, objective approach to gathering, organizing, and interpreting data.
- Practical study notes from the Preface (briefly): the book emphasizes conceptual understanding, problem solving, and real-world examples to aid learning.
- End-of-chapter materials (not detailed here) include problems, demonstrations, and learning checks to reinforce concepts.
SUMMARY OF CHAPTER 1: INTRODUCTION TO STATISTICS
- Statistics defined as procedures for organizing, summarizing, and interpreting data; used to describe samples and infer properties of populations.
- Two major functions:
- Descriptive statistics: organize/summarize data.
- Inferential statistics: use sample data to draw conclusions about populations, accounting for sampling error.
- Populations vs. samples; parameters vs. statistics; the inevitability of sampling error when generalizing from sample to population.
- Data structures in behavioral research:
- Correlational method (two variables measured for each individual) – cannot establish causation.
- Experimental/nonexperimental methods (group comparisons) – manipulation of an independent variable and control of extraneous variables; causal conclusions depend on design strength.
- Constructs and measurement:
- Constructs are internal attributes (e.g., intelligence, hunger) defined by operational definitions based on observable behavior.
- Operational definitions specify how a construct is measured and the resulting scores.
- Scales of measurement (nominal, ordinal, interval, ratio) and the implications for statistical techniques.
- Discrete vs. continuous variables and the concept of real limits for continuous measurements.
- Notation: X, Y for scores; N vs. n; Σ as summation; and the importance of order of operations in statistical calculations.
- The overarching goal: help students move from memorization to conceptual understanding and principled application of statistics in research.