Chapter 6: Making Systematic Observations
Choosing Specific Variables
Research Tradition
When studying topics previously researched, using traditional measures can allow for comparison of results of different manipulations across experiments.
Theory
Your choice of variables may be influenced by your theoretical viewpoint.
Availability of New Techniques
Technology or the development of new techniques may allow for the study of variables that it was previously not possible to study.
Choosing Measures
Measures must be adapted to the special situations posed by your particular study.
There are several types of behavioral measures commonly used in psychological research.
Reliability of a Measure
Reliability: measure’s ability to produce similar results when repeated measures are made under identical conditions.
Gives similar outcomes every time it is used
Consistent
Ex. weight scale
Reliability of a Physical Measure
Repeatedly measure a fixed quantity of a variable and then use the observed variation in measured values to derive the precision of the measure (height, weight, etc.).
Reliability of Population Estimates
Estimate the average value of the variable in a given sample drawn from the target population. Precision of this estimate is called margin of error.
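The margin-of-error idea can be sketched in Python. The scores below are hypothetical, and the 1.96 multiplier assumes a 95% confidence level with a roughly normal sampling distribution:

```python
import math
import statistics

def margin_of_error(sample, z=1.96):
    """Margin of error for a sample mean at 95% confidence (z = 1.96),
    assuming a simple random sample from a much larger population."""
    return z * statistics.stdev(sample) / math.sqrt(len(sample))

scores = [72, 75, 70, 68, 74, 71, 73, 69, 76, 72]  # hypothetical sample
moe = margin_of_error(scores)  # population mean estimated as mean(scores) +/- moe
```

A larger sample shrinks the margin of error (the denominator grows), which is why precision of population estimates improves with sample size.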
Reliability of Psychological Tests or Measures
Basic method is to administer the test twice to a large group of individuals and determine the correlation. (Higher the correlation, the greater the reliability)
Test-retest reliability: to measure this, administer the test twice to the same individuals with a fairly long interval of time in between each administration.
Test now - test later = same score
The time between both varies depending on the measure used
The exact same test is used both times
Stronger positive correlations between results indicate level of reliability
Low correlation = test not reliable
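Test-retest reliability is just the correlation between the two administrations; here is a minimal Pearson-correlation sketch with invented scores:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two lists of scores (e.g., test vs. retest)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

first = [10, 12, 9, 15, 11]   # hypothetical first administration
second = [11, 13, 9, 14, 12]  # hypothetical second administration
r = pearson_r(first, second)  # high positive r suggests good test-retest reliability
```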
Disadvantages of Test-Retest Reliability
On the second administration, participants might simply remember their answers
Counter this with parallel-forms reliability
Participants may change from one administration to the next:
Counter this with split-half reliability
Accuracy of a Measure
Accuracy: a measure that produces results that agree with a known standard.
Hampered by a lack of precision
In some cases, accuracy can be determined by measuring the standard a large number of times and computing the average. Then compare your measurement to the average of the standard.
Any difference between average value and standard value is termed bias.
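The bias computation is simple arithmetic; a small sketch using hypothetical readings of a known 100 g standard weight:

```python
def bias(measurements, standard):
    """Bias = average measured value minus the known standard value."""
    return sum(measurements) / len(measurements) - standard

readings = [100.2, 99.8, 100.4, 100.1, 100.5]  # hypothetical scale readings
b = bias(readings, 100.0)  # positive bias: the scale reads slightly high on average
```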
Validity of a Measure
Validity: extent to which a measure measures what you intend it to measure.
If its measuring what you want it to measure
Face-validity: how well a measurement instrument appears to measure what it is designed to measure.
Least important - but it may be important to participants
Content validity: how adequately the content of a test samples the knowledge, skills, or behaviors that the test is intended to measure.
Does it cover what it's supposed to cover
Established by judgment; no statistical analysis required
Criterion-related validity: how adequately a test score can be used to infer an individual’s value on some criterion measure. Two major types: Concurrent and Predictive
Concurrent Validity: if the scores on your test and the criterion are collected at about the same time.
Highly correlated with other established measures (e.g., r ≈ .70)
Ex: new IQ measure similar results to current IQ measure
Predictive validity: comparing the scores on a test with the value of a criterion measure observed at a later time.
Ex: if arm span measured now predicts height measured later, the arm-span measure has predictive validity
Construct validity: when a test is designed to measure a construct, a variable that is not directly observable and that has been developed to explain behavior on the basis of a theory.
Convergent validity:
What should height be related to? Arm span, weight
Divergent or discriminant validity:
What should height not be related to? Intelligence, income, ear size
Multi-trait multi-method matrix:
Trait: Measure should be related to what it's supposed to and should not be related to what it's not supposed to
Different traits (depression, anxiety) you are trying to measure
Method: a self-report score will be more highly related to other self-report scores than to physiological scores
Different methods (questionnaires, tests) are used to measure the traits
Scales of Measurement
Nominal Scales: Lowest level of measurement. Values are assigned different names but are not ordered in any particular way. (Ex: Male not higher or lower value than female...just different.)
Ordinal Scales: Different values are assigned different names, but also can be ranked according to quantity (high, moderate, and low).
However, the degree of separation between high, moderate, and low is not known
Interval and Ratio Scales
If the spacing between values along the scale is known, then the scale is either an interval scale or a ratio scale.
In both cases you know if one value (unit) is smaller or larger than another and by how much.
Ex: The Celsius scale for temperature.
Interval: numerical, with equal intervals between values, but no true zero, so you cannot say one value is twice as big as another. (Ex: Celsius temperature)
Ratio: equal intervals plus a true zero, so you can say one value is twice as big as another. (Ex: weight)
Choosing a Scale of Measurement
Information yielded:
Nominal provides the least info., ordinal adds basic information about quantity, interval refines the measurement of quantity by specifying degrees of difference, and ratio indicates precisely how much of the quantity exists.
When possible, adopt the scale that provides the most information.
Statistical Tests
Typically, results are less sensitive to relationships among variables in nominal and ordinal scales than in interval and ratio scales.
Ecological Validity
Remember to use the scale-type that best fits your study.
Research questions may limit your choice of measurement scale, so using the scale that best fits the real-life situation is key.
Ecological Validity: does the study reflect what people must do in real-life situations?
Adequacy of a Dependent Measure
Sensitivity
Some measures of a dependent variable may be insensitive to the effect of a manipulation, whereas other measures under the same conditions clearly show an effect.
How sensitive is the measure to changes
Ex: measure depression based on the DSM (yes or no) or questions (interval or ratio)
Unsystematic observations carried out during the course of the experiment can provide a useful check on the adequacy of your measure as well as uncover potential defects
Range Effects
Range Effects: occur when the values of a variable have an upper or lower limit, which is encountered during the course of the observation.
Two types: Floor and Ceiling effects
Affect data in two ways:
By limiting the values of your highest or lowest data points, range effects decrease the differences among your treatment means (possibly to the point that statistically reliable differences virtually disappear).
Reduce the variability of scores within the affected treatments.
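A ceiling effect can be simulated by clipping scores at an upper limit; the groups below are hypothetical:

```python
def apply_ceiling(scores, ceiling):
    """Clip scores at an upper limit, as a test with a maximum score would."""
    return [min(s, ceiling) for s in scores]

group_a = [60, 70, 80, 90, 100]    # hypothetical scores, true mean 80
group_b = [80, 90, 100, 110, 120]  # hypothetical scores, true mean 100

a = apply_ceiling(group_a, 100)    # unchanged: mean stays 80
b = apply_ceiling(group_b, 100)    # clipped to [80, 90, 100, 100, 100]: mean drops to 94
# The true 20-point difference between treatment means shrinks to 14,
# and the variability within group_b shrinks as well.
```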
Tailoring Measures to Participants
Consider the participants' realistic capabilities.
Consider representing measures through graphs or another visual format.
Using blocks, balloons, pictures of expressions, etc.
Consider the habituation technique or the preference technique.
Remember that research jargon is research jargon...not everybody gets it.
Use language, instructions, and explanations that are appropriate to your participants.
Types of Dependent Variables
Behavioral Measures
Recording the actual behaviors of the subjects.
Examples: Frequency, Latency, and Number of Errors
Physiological Measures
Typically requires special equipment and monitors the participant’s bodily functions (heart rate, respiration rate, etc.).
Examples: EEG’s in sleep labs, PET scans, fMRI scans, etc.
Types of Dependent Variables (continued)
Self-Report Measures
Common forms include rating scales often employing a 0-10 type of rating system.
Retrospective verbal reports and prospective verbal reports
Q-sort methodology
Qualitative measurement technique that involves establishing evaluative categories and then sorting items into said categories.
Self-report measures suffer from reliability and validity problems.
Humans aren’t perfect…
Memory, perspective, circumstances, and old-fashioned lying.
Implicit Measures
A dependent measure that is not under the direct conscious control of participants.
Example: a prejudiced individual may not admit to this, but may still react to the stimuli toward which they are prejudiced.
Implicit Association Test (IAT)
Present participant with set of images and words that they must classify into groups as quickly as possible.
The quicker the responses, the closer to automatic, and this means the less conscious control
Reactivity in Research with Human Participants
Humans are aware of their status as research participants, and such awareness can alter their natural behavior.
Put yourself in their shoes...what might that be like for you?
What do you see?
Nervous?
Thoughts about the researchers?
Demand Characteristics
Cues inadvertently provided by the researcher or research context concerning the purposes of the study or behaviors expected of the participants
Problems arise when the participant’s hypothesis is different from the purpose of the experiment
Role attitude cues: unintended cues in an experiment that suggest to participants how they are expected to behave.
Preexisting attitudes:
Cooperative attitude: desire to please
Apprehensive attitude: worried or defensive concerning what might happen to them
Negative attitude: harboring intent to ruin the experiment
Experimenter Bias
Participants aren’t the only ones with preconceived notions…
Clever Hans Phenomenon
Facilitated Communication
Expectancy effects:
When a researcher’s preconceived ideas of how a participant should behave are subtly communicated to the participant and in turn affect said behavior.
Threat to both internal and external validity
Ways to Reduce Bias
Single-blind technique: experimenter does not know which treatment a subject has been assigned to.
Double-blind technique: neither experimenter nor participants know at the time of testing which treatments the participants are receiving.
When possible, automation of the experiment process is another method of reducing bias.
Detecting and Correcting Problems
Conduct a Pilot Study
Pilot study: small-scale version of a study used to establish procedures, materials, and parameters to be used in the full study.
Adding Manipulation Checks
Manipulation check: test of whether or not independent variables had the intended effects on participants.
Psychometric: validity & reliability of measures
Reliability a necessary but not sufficient condition for validity
A reliable measure is not necessarily valid
A valid measure must be reliable
Key terms:
Reliability
Whether a measure or questionnaire produces the same or similar responses with multiple administrations of the same or a similar instrument.
Ex: a weight scale
test–retest reliability
A method of assessing the reliability of a questionnaire by administering the same test twice, separated by a relatively long interval of time, to the same individuals.
Giving the same test twice but far apart
parallel-forms reliability
Establishing the reliability of a questionnaire by administering parallel (alternate) forms of the questionnaire repeatedly
Two different versions that measure the same thing, with equivalent items, which prevents participants from remembering past responses
split-half reliability
A method of assessing reliability of a questionnaire using a single administration of the instrument. The questionnaire is split into two parts, and responses from the two parts are correlated.
Test split in half; both halves should be equally reliable
Correlate the two halves
The alpha coefficient (Cronbach's alpha) is often used instead; it computes every possible split-half reliability and averages them
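The averaging idea behind Cronbach's alpha can be illustrated with the standard formula α = k/(k−1) × (1 − Σ item variances / variance of total scores); the item scores below are invented:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale.
    items: one list of scores per item, respondents in the same order in each.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items)
    n = len(items[0])

    def pvar(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(pvar(item) for item in items) / pvar(totals))

# Three hypothetical items answered by five respondents
alpha = cronbach_alpha([[1, 2, 3, 4, 5],
                        [2, 2, 3, 5, 5],
                        [1, 3, 3, 4, 4]])  # high alpha: items hang together
```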
Accuracy
Agreement of a measurement with a known standard.
Validity
The extent to which a measuring instrument measures what it was designed to measure
What it's supposed to measure
face validity
How well a test appears to measure (judging by its contents) what it was designed to measure.
Just looking at the questions/ based on appearance
Example: A measure of mathematical ability would have face validity if it contained math problems.
content validity
Validity of a test established by judging how adequately the test samples behavior representative of the universe of behaviors the test was designed to sample.
Entire range of the concept / all relevant aspects
criterion-related validity
The ability of a measure to produce results similar to those provided by other, established measures of the same variable.
Compares to other established measures
concurrent validity
The validity of a test established by showing that its results can be used to infer an individual’s value on some other, accepted test administered at the same time
At the same time
Ex. new depression scale taken at the same time an already established one
predictive validity
The ability of a measure to predict some future behavior.
Future outcomes
Ex. standardized test predicts future academic performance
construct validity
Validity that applies when a test is designed to measure a “construct” or variable “constructed” to describe or explain behavior on the basis of theory (e.g., intelligence). A test has construct validity if the measured values of the construct predict behavior as expected from the theory (e.g., those with higher intelligence scores achieve higher grades in school).
Making sure it truly test what it aims to measure
nominal scale
Variables whose values differ in quality and not quantity. This scale yields the least information.
Categories with no order
Ex: grocery list (greens, dairy, etc.)
ordinal scale
A measurement scale in which cases can be ranked according to quantity (e.g., large, medium, or small). The distances between scale values are unknown.
Intervals are not equal
Ex: rating satisfaction (5 is better than 4 but we don’t know by how much), 1st, 2nd, 3rd, place
interval scale
A measurement scale in which values along the scale increase in equal increments. The zero point of an interval scale is arbitrary.
Equal intervals - distance between two points is the same
Ex: temperature (0 does not mean an absence of temperature); IQ also has no absolute zero
ratio scale
Equal intervals and true zero
0 represents the absence of something
Ex: weight, height, age
range effects
A problem in which a variable being observed reaches an upper limit (ceiling effect) or lower limit (floor effect).
Results cluster at extremes
Test is too easy or too hard
behavioral measure
A measure of a subject’s actual behavior in a situation, for example, the number of times a rat presses a lever (frequency of responding).
Measures an actual action
physiological measure
A measure of a bodily function of subjects in a study (e.g., heart rate).
Bodily reactions to things
self-report measure
A measure that requires participants to report on their own behavior, emotions, or thoughts.
Participants provide information about themselves
Q-sort methodology
A qualitative measurement technique that involves establishing evaluative categories and sorting items into those categories.
Subjective categories
Most agree to least agree (bell curve)
implicit measure
A measure of attitudes or prejudice that is not under direct conscious control.
Measures automatic responses
Implicit Association Test (IAT)
A popular measure of implicit attitudes that uses responses that are not under direct conscious control.
Associate race with attributes
demand characteristics
Cues inadvertently provided by the researcher or research context concerning the purposes of a study or the behavior expected from participants.
Hints that give away to participants what the purpose of the study is which can make them change their behavior
Single/double blind techniques help reduce this
Ex: role attitude cues, experimenter bias
role attitude cues
An unintended cue in an experiment that suggests to the participants how they are expected to behave.
Type of demand characteristics: cues that tell the participants the role they are supposed to play
Ex: participants need to pretend to be a teacher = act more nurturing and strict
experimenter bias
When the behavior of the researcher influences the results of a study. Experimenter bias stems from two sources: expectancy effects and uneven treatment of subjects across treatments.
The researcher's expectations/beliefs influence participants' responses
Ex: researcher smiles when they get a response they expect so participant continues to give responses they think the researcher wants. How questions are worded can also influence the way participants respond
expectancy effects
When a researcher’s preconceived ideas about how subjects should behave are subtly communicated to subjects and, in turn, affect the subjects’ behavior.
Researcher believes a certain group will do better = favoritism towards that group = they do perform better
Ex. teacher is told a group is expected to improve = more attention towards the group = they perform better
single-blind technique
The person testing subjects in a study does not know which treatment a subject has been assigned to.
Only one party is blind; by this definition, the tester
Ex: the experimenter does not know which group each subject is in (placebo or not)
double-blind technique
Neither the participants in a study nor the person carrying out the study knows at the time of testing which treatment the participant is receiving
Subject and researcher are both blind
Ex: researcher and participant don’t know who is getting placebo and who is getting treatment
pilot study
A small, scaled-down version of a study used to test the validity of experimental procedures and measures.
Trial run
Helps identify issues
Rehearsal
Tests for anything that could go wrong so it can be fixed before the main study is run
manipulation check
Measures included in an experiment to test the effectiveness of the independent variables.
Make sure that the independent variable was manipulated as intended
Make sure participants experienced the change
E.g., did participants notice anything different afterward
Ex: teaching styles