Chapter 6: Making Systematic Observations

Choosing Specific Variables

  • Research Tradition

    • When studying topics previously researched, using traditional measures can allow for comparison of results of different manipulations across experiments.

  • Theory

    • Variables may sometimes be influenced based upon your theoretical viewpoint.

  • Availability of New Techniques

    • Technology or the development of new techniques may allow for the study of variables that previously were not possible.

Choosing Measures

  • Measures must be adapted to the special situations posed in your particular study

  • There are several types of behavioral measures commonly used in psychological research

Reliability of a Measure

  • Reliability: measure’s ability to produce similar results when repeated measures are made under identical conditions.

    • Gives similar outcomes every time it is used

    • Consistent 

      • Ex. weight scale

  • Reliability of a Physical Measure

    • Repeatedly measure a fixed quantity of a variable, then use the observed variation in the measured values to derive the precision of the measure. (Height, Weight, etc.)
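This procedure can be sketched numerically. A minimal Python example with hypothetical readings of a fixed 50 kg reference weight; the spread (standard deviation) of the repeated readings estimates the scale's precision:

```python
# Sketch: estimating the precision of a physical measure (hypothetical data).
# Repeatedly measure a fixed quantity; the standard deviation of the
# readings indicates how precise the instrument is.
import statistics

def precision(readings):
    """Standard deviation of repeated measurements of a fixed quantity."""
    return statistics.stdev(readings)

readings = [50.1, 49.9, 50.0, 50.2, 49.8, 50.1]  # hypothetical scale readings
print(round(precision(readings), 3))
```

A smaller standard deviation means the measurements cluster tightly, i.e., a more precise (reliable) instrument.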

  • Reliability of Population Estimates

    • Estimate the average value of the variable in a given sample drawn from the target population. Precision of this estimate is called margin of error.
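The margin of error of a sample mean can be computed from the sample's standard deviation. A sketch with hypothetical scores, using the normal approximation for a 95% confidence level (z ≈ 1.96):

```python
# Sketch: margin of error for a sample mean (hypothetical data).
# MoE = z * (s / sqrt(n)), with z ≈ 1.96 for 95% confidence.
import math
import statistics

def margin_of_error(sample, z=1.96):
    s = statistics.stdev(sample)           # sample standard deviation
    return z * s / math.sqrt(len(sample))  # half-width of the interval

sample = [72, 68, 75, 70, 69, 74, 71, 73]  # hypothetical scores
print(round(margin_of_error(sample), 2))
```

Larger samples and less variable data both shrink the margin of error, i.e., the estimate becomes more precise.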

  • Reliability of Psychological Tests or Measures

    • Basic method is to administer the test twice to a large group of individuals and determine the correlation between the two sets of scores. (The higher the correlation, the greater the reliability)

  • Test-retest reliability: to measure this, administer the test twice to the same individuals with a fairly long interval of time in between each administration.

    • Test now - test later = same score

    • The time between both varies depending on the measure used

    • The exact same test is given both times

    • Stronger positive correlations between the two sets of scores indicate greater reliability

      • Low correlation = test not reliable
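The test–retest calculation is just the Pearson correlation between the two administrations. A minimal sketch with hypothetical scores (the `pearson_r` helper is written out for illustration):

```python
# Sketch: test-retest reliability as the Pearson correlation between two
# administrations of the same test (hypothetical scores).
import statistics

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [12, 15, 9, 20, 17]   # scores at first administration
time2 = [13, 14, 10, 19, 18]  # same people, retested later

print(round(pearson_r(time1, time2), 2))  # near 1.0 -> high reliability
```

A correlation near 1.0, as here, indicates high test–retest reliability; a low correlation would suggest the measure is unreliable.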

Disadvantages of Test-Retest Reliability

  • On the second administration, participants might simply remember their answers

    • Counter this with parallel-forms reliability

  • Participants may change from one administration to the next:

    • Counter this with split-half reliability

Accuracy of a Measure

  • Accuracy: a measure that produces results that agree with a known standard.

  • Hampered by a lack of precision

    • In some cases, accuracy can be determined by measuring the standard a large number of times and computing the average. Then compare your measurement to the average of the standard.

    • Any difference between average value and standard value is termed bias.

Validity of a Measure

  • Validity: extent to which a measure measures what you intend it to measure.

    • If its measuring what you want it to measure

  • Face-validity: how well a measurement instrument appears to measure what it is designed to measure.

    • Least important - but it may be important to participants 

  • Content validity: how adequately the content of a test samples the knowledge, skills, or behaviors that the test is intended to measure.

    • Does it cover what it's supposed to cover

    • No analysis

  • Criterion-related validity: how adequately a test score can be used to infer an individual’s value on some criterion measure. Two major types: Concurrent and Predictive

    • Concurrent Validity: if the scores on your test and the criterion are collected at about the same time.

      • Highly correlated to other measures(0.07)

        • Ex: new IQ measure similar results to current IQ measure

    • Predictive validity: comparing the scores on a test with the value of a criterion measure observed at a later time.

      • Ex: arm span predicts height, so we can assume psychic height predicts arm span if it's reliable

  • Construct validity: when a test is designed to measure a construct which is variable, not directly observable, that has been developed to explain behavior on the basis of a theory.

  • Convergent validity:

    • What should height be related to? Height span, weight

  • Divergent or discriminant validity:

    • What should height not be related to? Intelligence, income, ear size

    • Multi-trait multi-method matrix: 

      • Trait: Measure should be related to what it's supposed to and should not be related to what it's not supposed to

        • Different traits (depression, anxieyt) you are trying to measure

      • Method: the self-report score is highly related to other self-report scores than to physiological scores

        • Different methods (questionnaires, tests) are used to measure the traits

Scales of Measurement

  • Nominal Scales: Lowest level of measurement. Values are assigned different names but are not ordered in any particular way. (Ex: Male not higher or lower value than female...just different.)

  • Ordinal Scales: Different values are assigned different names, but also can be ranked according to quantity (high, moderate, and low).

    • However, the degree of separation between high, moderate, and low is not known

  • Interval and Ratio Scales

    • If the spacing between values along the scale is known, then the scale is either an interval scale or a ratio scale.

    • In both cases you know if one value (unit) is smaller or larger than another and by how much.

      • Ex: The Celsius scale for temperature.

    • Interval: numerical, intervals are equal, low, and high numbers. Mathematical operations, but you can say something is 2 time bigger than__ because there are no 0 values. (Ex: weight)

    • Ratio: 

Choosing a Scale of Measurement

  • Information yielded:

    • Nominal provides the least info., ordinal adds basic information about quantity, interval refines the measurement of quantity by specifying degrees of difference, and ratio indicates precisely how much of the quantity exists.

    • When possible, adopt the scale that provides the most information.

  • Statistical Tests

    • Typically, results are less sensitive to relationships among variables in nominal and ordinal scales than in interval and ratio scales.

Ecological Validity

  • Remember to use the scale-type that best fits your study.

    • It is possible that research questions may limit your choice of measurement scale, and so remembering to use the scale that best fits/is realistic is key.

  • Ecological Validity: does the study reflect what people must do in real-life situations.

Adequacy of a Dependent Measure

  • Sensitivity

    • Some measures of a dependent bariable may be insensitive to the effect of a manipulation, whereas other measures under the same conditions definitely show an effect

    • How sensitive is the measure to changes

    • Ex: measure depression based on the DSM (yes or no) or questions (interval or ratio)

  • Range Effects

    • Occur when the values of a variable have an upper or lower limit.

Sensitivity of a Dependent Measure

  • Some measures are insensitive to manipulations, while others under the same conditions most certainly show an effect.

  • Unsystematic observations carried out during the course of the experiment can provide a useful check on the adequacy of your measure as well as uncover potential defects

Range Effects

  • Range Effects: occur when the values of a variable have an upper or lower limit, which is encountered during the course of the observation.

    • Two types: Floor and Ceiling effects

  • Effect data in two ways

    • By limiting the values of your highest or lowest data points, range effects decrease the differences among your treatment means (possibly to the point that statistically reliable differences virtually disappear).

    • Reduce the variability of scores within the affected treatments.

Tailoring Measures to Participants

  • Consider the participants realistic capabilities.

  • Consider representing measures through graphs or another visual format.

    • Using blocks, balloons, pictures of expressions, etc.

    • Consider the habituation technique or the preference technique.

  • Remember that research jargon is research jargon...not everybody gets it.

    • Use language, instructions, and explanations that are appropriate to your participants.

Types of Dependent Variables

  • Behavioral Measures

    • Recording the actual behaviors of the subjects.

    • Examples: Frequency, Latency, and Number of Errors

  • Physiological Measures

    • Typically requires special equipment and monitors the participant’s bodily functions (heart rate, respiration rate, etc.).

    • Examples: EEG’s in sleep labs, PET scans, fMRI scans, etc.

Types of Dependent Measures 2.0

  • Self-Report Measures

    • Common forms include rating scales often employing a 0-10 type of rating system.

      • Retrospective verbal reports and prospective verbal reports

    • Q-sort methodology

      • Qualitative measurement technique that involves establishing evaluative categories and then sorting items into said categories.

  • Self-report measures suffer from reliability and validity problems.

    • Humans aren’t perfect…

    • Memory, perspective, circumstances, and old-fashioned lying.

Implicit Measures

  • A dependent measure that is not under the direct conscious control of participants.

    • Example: prejudice individual may not admit to this, but still might react to the stimuli towards which they are prejudice.

  • Implicit Association Test (IAT)

    • Present participant with set of images and words that they must classify into groups as quickly as possible.

    • The quicker the responses, the closer to automatic, and this means the less conscious control

Reactivity in Research with Human Participants

  • Humans are aware of their present status during research and such awareness could alter their natural behaviors.

    • Put yourself in their shoes...what might that be like for you?

    • What do you see?

    • Nervous?

    • Thoughts about the researchers?

Demand Characteristics

  • Cues inadvertently provided by the researcher or research context concerning the purposes of the study or behaviors expected of the participants

    • Problems arise when the participant’s hypothesis is different from the purpose of the experiment

  • Role attitude cues: unintended cues in an experiment that suggest to participants how they are expected to behave.

  • Preexisting attitudes:

    • Cooperative attitude: desire to please

    • Apprehensive attitude: worried or defensive concerning what might happen to them

    • Negative attitude: harboring intent to ruin the experiment

Experimenter Bias

  • Participants aren’t the only ones with preconceived notions…

    • Clever Hans Phenomenon

    • Facilitated Communication

  • Expectancy effects:

    • When a researcher’s preconceived ideas of how a participant should behave are subtly communicated to the participant and in turn affect said behavior.

      • Threat to both internal and external validity

Ways to Reduce Bias

  • Single-blind technique: experimenter does not know which treatment a subject has been assigned to.

  • Double-blind technique: neither experimenter nor participants know at the time of testing which treatments the participants are receiving.

  • When possible, automation of the experiment process is another method of reducing bias.

Detecting and Correcting Problems

  • Conduct a Pilot Study

    • Pilot study: small-scale version of a study used to establish procedures, materials, and parameters to be used in the full study.

  • Adding Manipulation Checks

    • Manipulation check: test of whether or not independent variables had the intended effects on participants.

Psychometric: validity & reliability of measures

Reliability a necessary but not sufficient condition for validity

Reliable ≠ valid

Valid = reliable 

Key terms:

  • Reliability

    • Whether a measure or questionnaire produces the same or similar responses with multiple administrations of the same or a similar instrument.

      • Ex: a weight scale

  • test–retest reliability

    • A method of assessing the reliability of a questionnaire by administering the same test twice, separated by a relatively long interval of time, to the same individuals.

      • Giving the same test twice but far apart

  • parallel-forms reliability

    • Establishing the reliability of a questionnaire by administering parallel (alternate) forms of the questionnaire repeatedly

      • 2 different versions (measure the same thing) but they both have the same things Which prevents participants from remembering past responses

  • split-half reliability

    • A method of assessing reliability of a questionnaire using a single administration of the instrument. The questionnaire is split into two parts, and responses from the two parts are correlated.

      • Test spilt in half and both should be equally reliable

      • Correlate 2 halves 

    • Alpha coefficient (Cronbach's alpha) is used instead - finds every slip-half reliability and averages them 

  • Accuracy

    • Agreement of a measurement with a known standard.

  • Validity

    • The extent to which a measuring instrument measures what it was designed to measure

      • What it's supposed to measure

  • face validity

    • How well a test appears to measure (judging by its contents) what it was designed to measure. 

      • Just looking at the questions/ based on appearance 

      • Example: A measure of mathematical ability would have face validity if it contained math problems.

  • content validity

    • Validity of a test established by judging how adequately the test samples behavior representative of the universe of behaviors the test was designed to sample.

      • Entire rage of the concept/ all relevant aspects

  • criterion-related validity

    • The ability of a measure to produce results similar to those provided by other, established measures of the same variable.

      • Compares to other established measures

  • concurrent validity

    • The validity of a test established by showing that its results can be used to infer an individual’s value on some other, accepted test administered at the same time

      • At the same time

      • Ex. new depression scale taken at the same time an already established one

  • predictive validity

    • The ability of a measure to predict some future behavior.

      • Future outcomes

      • Ex. standardized test predicts future academic performance

  • construct validity

    • Validity that applies when a test is designed to measure a “construct” or variable “constructed” to describe or explain behavior on the basis of theory (e.g., intelligence). A test has construct validity if the measured values of the construct predict behavior as expected from the theory (e.g., those with higher intelligence scores achieve higher grades in school).

      • Making sure it truly test what it aims to measure

  • nominal scale

    • Variables whose values differ in quality and not quantity. This scale yields the least information.

      • Categories with no order

      • Ex: grocery list (greens, dairy, etc.)

  • ordinal scale

    • A measurement scale in which cases can be ranked according to quantity (e.g., large, medium, or small). The distances between scale values are unknown.

      • Intervals are not equal

      • Ex: rating satisfaction (5 is better than 4 but we don’t know by how much), 1st, 2nd, 3rd, place

  • interval scale

    • A measurement scale in which values along the scale increase in equal increments. The zero point of an interval scale is arbitrary.

      • Equal intervals - distance between two points is the same

      • Ex. temperature however 0 does not mean absence of temperature, or IQ has no absolute 0

  • ratio scale

    • Equal intervals and true zero

      • 0 represents the absence of something

      • Ex: weight, height, age

  • range effects

    • A problem in which a variable being observed reaches an upper limit (ceiling effect) or lower limit (floor effect).

      • Results cluster at extremes

      • Test is too easy or too hard

  • behavioral measure

    • A measure of a subject’s actual behavior in a situation, for example, the number of times a rat presses a lever (frequency of responding).

      • Measures an actual action

  • physiological measure

    • A measure of a bodily function of subjects in a study (e.g., heart rate).

      • Boldly reactions to things

  • self-report measure

    • A measure that requires participants to report on their own behavior, emotions, or thoughts.

      • Participants provide information about themselves

  • Q-sort methodology

    • A qualitative measurement technique that involves establishing evaluative categories and sorting items into those categories.

      • Subjective categories

      • Most agree to least agree (bell curve)

  • implicit measure

    • A measure of attitudes or prejudice that is not under direct conscious control.

      • Measures automatic responses

  • Implicit Association Test (IAT)

    • A popular measure of implicit attitudes that uses responses that are not under direct conscious control.

      • Associate race with attributes

  • demand characteristics

    • Cues inadvertently provided by the researcher or research context concerning the purposes of a study or the behavior expected from participants.

      • Hints that give away to participants what the purpose of the study is which can make them change their behavior

      • Single/double blind techniques help reduce this

      • Ex: role attitude cues, experimenter bias

  • role attitude cues

    • An unintended cue in an experiment that suggests to the participants how they are expected to behave.

      • Type of demand characteristics: cues that tell the participants the role they are supposed to play

      • Ex: participants need to pretend to be a teacher = act more nurturing and strict

  • experimenter bias

    • When the behavior of the researcher influences the results of a study. Experimenter bias stems from two sources: expectancy effects and uneven treatment of subjects across treatments.

      • expectations/ belief influences participants responses

      • Ex: researcher smiles when they get a response they expect so participant continues to give responses they think the researcher wants. How questions are worded can also influence the way participants respond

  • expectancy effects

    • When a researcher’s preconceived ideas about how subjects should behave are subtly communicated to subjects and, in turn, affect the subjects’ behavior.

      • Researcher believes a certain group will do better = favoritism towards that group = they do perform better

      • Ex. teacher is told a group is expected to improve = more attention towards the group = they perform better

  • single-blind technique

    • The person testing subjects in a study does not know which treatment a subject has been assigned to.

      • Subject is blind

      • Ex: subjects dont know what group they are in (placebo or not)

  • double-blind technique

    • Neither the participants in a study nor the person carrying out the study knows at the time of testing which treatment the participant is receiving

      • Subject and researcher is blind

      • Ex: researcher and participant don’t know who is getting  placebo and who is getting treatment

  • pilot study

    • A small, scaled-down version of a study used to test the validity of experimental procedures and measures.

      • Trial run

      • Helps identify issues

      • Rehearsal 

      • Tests for everything that can be wrong so they can fix it before the main study is done

  • manipulation check

    • Measures included in an experiment to test the effectiveness of the independent variables.

      • Make sure that the independent variable was manipulated as intended

      • Make sure participants experienced the change

      • As if they notice anything different after

      • Ex: teaching styles