Unit 2: Research Methods
Introduction
People turn to psychology in many ways and for many different reasons. The trick, though, is to distinguish fact from opinion.
Many people, including world leaders, trust their instincts and go by their “gut feeling”.
Many times our instincts are wrong, however. People tend to repeatedly over-estimate their ability to go by gut feelings.
Did we know it all along? Hindsight bias
People often have hindsight bias – looking back, an event seems as though it was inevitable. This is like saying, “I knew it all along,” or, “Hindsight is 20-20.”
After something happens, it just seems so obvious. What’s more, people tend to believe that they actually saw it coming.
Hindsight bias makes outcomes seem like common sense. But “common sense” goes both ways: it can point in opposite or conflicting directions, so it can explain almost any result after the fact. Thus, there’s a problem.
Psychological studies often oppose “common sense” and our body determines much about what we do and feel.
Overconfidence
People are often overconfident. Overconfidence occurs when we are more confident than correct: we feel surer that we know something than our accuracy justifies.
An example study had people unscramble jumbled words. Once they saw the word unscrambled, hindsight bias made them think they knew it all along and that the task was easy.
People predicted they’d unscramble the word in a few seconds. In actuality, it took minutes. That gap is overconfidence.
The scientific attitude
The main principle of science is to let the facts speak for themselves. There are 3 underlying parts to science…
Curiosity – you need to really want to find the truth.
Skepticism – scientists don’t accept claims at face value; they seek factual evidence.
Humility – a scientist has to be able to admit when he or she is proven wrong by the facts.
Critical thinking
Scientists always ask questions, the top questions being, “What evidence do you have to support that?” or “Are there any possible alternative explanations?”
The scientific method
The basic steps of the scientific method are:
Hypothesis – a statement that predicts the outcome.
Procedure – decide on a research method and a procedure to carry it out.
Observation – recording the results by letting the results speak for themselves.
Conclusion – measuring the observed results against the hypothesis’ prediction.
Report findings – reporting results lets other researchers replicate the experiment if desired.
A theory is an explanation that organizes observations and tries to predict outcomes.
A hypothesis is a prediction that can be tested.
Experimenter bias can be a problem when testing a hypothesis. If experimenters are not careful, they can see what they expect to see.
To cut down on bias, an operational definition is formed. An operational definition has two parts: (1) a precise statement of the experimenter’s procedures and concepts and (2) something that is measured numerically.
We need numbers because a measurement can be checked by other people, whereas an opinion can’t. This is the subjective vs. objective problem.
Subjectivity is a judgment based on or including a person’s opinion or emotions – “A rose is a beautiful symbol of love.”
A good operational definition should be detailed enough that other scientists can replicate the experiment and either support the hypothesis or debunk it.
The 3 main types of research methods used by psychologists are:
Description
Case study
A case study is a thorough study of one person in hopes of learning about people in general. Think of a manila folder stuffed with everything about a person – letters written, photos, researcher’s notes, interviews, 3rd grade test scores, etc.
A problem with case studies is that of numbers – studying only one person then applying the results to all people is risky. The person may not exemplify all people.
Case studies can lead psychologists to other, more reliable, types of studies.
Survey
A survey asks questions and deals with many more people (cases), but in much less depth.
Surveys can easily gather info from a large number of people. They can easily quantify data (turn something into numbers) whereas a case study may have trouble doing this. This is critical to meet the operational definition.
Surveys often have problems…
Wording – the results of the survey can be dramatically different depending on the wording of the survey and/or the question order. Here, the survey itself is changing its own results.
Random sampling – surveys must be from a representative sample of whatever group they’re trying to represent. To get a representative sample (where the small group truly represents the whole group), the survey-takers must come from a random sample. A random sample is where every person in the group has the same chance of being selected for the survey.
Getting a random sample is trickier than it sounds, but two things help: (1) start with as large an original group of people as possible, and (2) use a computer or machine to do the picking.
Non-examples: standing on the corner asking people to do the survey is not random (some people won’t walk by), giving everyone a questionnaire is not random (some will/won’t return it).
A representative sample can be very accurate at predicting the entire population. A non-representative sample can be very wrong. So, how to tell if we have a representative sample? Random sampling!
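The "use a computer to do the picking" idea can be sketched in a few lines of Python. This is only an illustration: the roster of 500 numbered students, the sample size of 50, and the seed are all made-up assumptions, not part of any real study.

```python
import random

def draw_random_sample(population, sample_size, seed=None):
    """Pick sample_size members so that every member has the same chance."""
    rng = random.Random(seed)  # the seed only makes this example repeatable
    return rng.sample(population, sample_size)  # sampling without replacement

# Hypothetical roster: student ID numbers 1 through 500.
roster = list(range(1, 501))
sample = draw_random_sample(roster, 50, seed=42)

print(len(sample))       # how many students were picked
print(sorted(sample)[:5])  # a peek at the lowest few IDs that were picked
```

Because the computer picks from the full roster, nobody is left out for reasons like "didn't walk past the corner" or "didn't return the questionnaire."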
Naturalistic observation
A naturalistic observation is watching a person or animal behave in its normal surroundings. Think of a person squatting in a jungle taking notes on gorillas.
The idea is to be as unobtrusive as possible so the subject will behave normally, then record only what is observed. The explanation (the “why?”) must come from another source.
Correlation
When two things are related or they go together, they are said to correlate.
Statisticians use a correlation coefficient to measure how closely two things go together (or not). This coefficient is usually seen in two ways:
Numerically – The correlation coefficient is on a scale from 1.0 down to -1.0
A coefficient of say 0.95 is very high (it’s close to 1.0) and would look like the bottom-left to top-right graph.
A coefficient of say -0.87 is very high as an inverse relationship (it’s close to -1.0) and would look like the top-left to bottom-right graph.
A coefficient of 0 means no correlation at all (random pattern of dots).
In scatterplots – Graphs with the two things on the X and Y axes and dots scattered throughout the graph.
The more tightly the dots line up from bottom-left to top-right, the stronger the positive correlation between the two things.
In real life, things usually don’t correlate that highly. The dots may line up much more loosely, as in a scatterplot with a correlation of about 0.6 or so.
If the dots line up the other way (top-left to bottom-right), they have a high inverse correlation or negative correlation. Be aware that a negative correlation is still a high correlation if it’s close to -1.0.
If the dots are splotched everywhere in no pattern, there is a very low correlation or none at all.
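To make the coefficient concrete, here is a minimal Python sketch that computes it by hand using the standard Pearson formula. The study-hours, test-score, and error numbers are invented purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together around their means.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each list varies on its own.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_movement / (spread_x * spread_y)

# Made-up data for five students.
hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 60, 71, 80, 88]  # goes up as hours go up
errors_made = [10, 8, 6, 4, 2]     # goes down as hours go up

r_positive = pearson_r(hours_studied, test_score)   # close to +1.0
r_negative = pearson_r(hours_studied, errors_made)  # close to -1.0
print(round(r_positive, 2), round(r_negative, 2))
```

The sign tells the direction (bottom-left to top-right vs. top-left to bottom-right) and the distance from 0 tells how tightly the dots line up.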
Correlation and causation
Correlations have a trap – suppose A and B correlate very highly. Just because the two things correlate, it’s incorrect to say A causes B. A might cause B, but we can’t prove that just from a correlation. There might be a third or fourth factor involved that is the cause.
Traps like these are reported all the time. For example, you may hear a report that says, "People who eat French fries at least twice a week are 3 times more likely to become obese." It's tempting to say, "Eating fries causes obesity," but it's incorrect to say that. It's correct to say, "Eating fries at least twice a week and obesity have a positive correlation" (the two go together).
Only an experiment shows causation because it isolates one variable to be tested.
An example: a person’s shoe size and their vocabulary correlate very highly – the larger the shoe size, the greater number of words the person knows. Why is this? Does a larger foot cause a person to know more words? Of course not, there is another factor involved – age or time.
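The shoe size example can be simulated to show how a third factor creates a correlation. The sketch below is a toy model with invented numbers: it assumes shoe size and vocabulary each depend only on age plus random noise, with no link between the two. It then compares the overall correlation with the correlation left over once each age group's average is subtracted out.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(0)  # seeded so the example is repeatable
ages, shoes, vocab = [], [], []
for age in range(3, 18):      # ages 3 through 17
    for _ in range(20):       # 20 made-up children per age
        ages.append(age)
        shoes.append(age + rng.gauss(0, 1))            # assumption: shoe size tracks age
        vocab.append(1000 * age + rng.gauss(0, 1000))  # assumption: vocabulary tracks age

def remove_age_effect(values):
    """Subtract each age group's mean so only within-age differences remain."""
    by_age = {}
    for age, v in zip(ages, values):
        by_age.setdefault(age, []).append(v)
    means = {age: sum(vs) / len(vs) for age, vs in by_age.items()}
    return [v - means[age] for age, v in zip(ages, values)]

r_overall = pearson_r(shoes, vocab)
r_within_age = pearson_r(remove_age_effect(shoes), remove_age_effect(vocab))
print(round(r_overall, 2), round(r_within_age, 2))
```

In this toy model the overall correlation is high, but it shrinks toward zero once children are compared only with others of the same age, which is what we'd expect if age is doing all the work.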
Illusory correlations
People often see correlations that are not there. If two things happen that seem out-of-the-ordinary, we link them together, such as when an astrology prediction seems to come true. However, we ignore such predictions when they don’t pan out.
The result is that we remember the times things work and forget the times they don’t. Then we wrongly conclude they actually correlate.
Experimentation
The gold-standard research method is the experiment. All of the other methods have their pluses, but experiments alone show cause-and-effect.
Parts of a good experiment…
Random selection – The participants come from a large population and are randomly selected to be involved.
Random assignment – Once chosen to be in the study, participants are randomly assigned to either the control or experimental group.
Double-blind procedure – A technique where neither the participants nor the researchers know which group each participant is in and/or the hypothesis being tested. Having participants and researchers "blind" is to cut down on any bias.
Placebo effect – People are often given a placebo (a fake drug that’s just a sugar pill) in double-blind experiments. Though the drug is fake, participants think it’s real and often experience real positive benefits.
Independent variable (“IV”) – The IV is what the experimenter manipulates. This is the only thing different between the experimental and control groups.
Dependent variable (“DV”) – The DV is what the IV supposedly affects. The DV is what is measured.
Confounding variables – These are other factors that might make the experiment go wrong. Specifically, these are factors that might affect the DV. We want an experiment where only the IV affects the DV. Otherwise, we don’t know what caused the effect. Our results are said to be “confounded”.
Random selection, random assignment, and good procedures help filter out confounding variables.
A simple example experiment…
Suppose a student wants to measure reaction time differences between students chewing sugar-free gum and chewing gum with sugar. From a list of every student in her school, the experimenter has a computer generate a list of 50 people. They all agree to be in the study. A coin flip determines if they will be in the sugar-free or sugar group. They’re not told which group they’re in. With their hand flat on a table, the students chew gum, then when a light comes on, they smack a switch. The reaction time is measured.
The parts of this experiment would be…
Random selection – The experimenters started with a large list and had a computer randomly generate 50 names. To be even better, the experimenters would use an even larger list, say of every student in the county.
Random assignment – The coin flip is random assignment; it helps cancel out confounding variables by spreading them evenly across both groups.
IV – Type of gum, sugar or sugar-free (this is what’s different between the groups, it’s what the experimenters manipulated).
DV – The reaction time (this is what is measured; it’s what the type of gum supposedly affects).
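The coin-flip step of the gum experiment could be sketched like this in Python. The participant names and the seed are placeholders; the group labels are the two levels of the IV from the example above.

```python
import random

def assign_by_coin_flip(participants, seed=None):
    """Randomly assign each participant to one of the two gum groups."""
    rng = random.Random(seed)  # the seed only makes this example repeatable
    groups = {"sugar": [], "sugar-free": []}
    for person in participants:
        # One "coin flip" per person: heads is sugar, tails is sugar-free.
        side = "sugar" if rng.random() < 0.5 else "sugar-free"
        groups[side].append(person)
    return groups

# Hypothetical list of the 50 randomly selected students.
participants = [f"student_{i}" for i in range(1, 51)]
groups = assign_by_coin_flip(participants, seed=7)

print(len(groups["sugar"]), len(groups["sugar-free"]))
```

Note that pure coin flips rarely give an exact 25/25 split. If equal group sizes matter, one alternative is to shuffle the list and cut it in half, which is still random assignment.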
Summary and strengths/weaknesses of methods of research (see chart at the bottom of this page).
Describing data
When reviewing data, it’s important to look at things carefully and critically. Things like the scale of graphs can influence how they’re perceived.
Measures of central tendency – Central tendency refers to the center of a bunch of numbers. There are three usual measurements of central tendency:
Mode – The number which occurs most frequently.
Mean – The arithmetic average (the sum of the numbers divided by how many numbers there are).
Median – The middle number, once they’ve all been listed from lowest to highest. This is also the 50th percentile (50% are lower and 50% are higher).
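A quick Python sketch of the three measures, using the standard `statistics` module on a small made-up list of numbers:

```python
import statistics

# Made-up data, already listed from lowest to highest.
numbers = [1, 3, 3, 6, 12]

mode_value = statistics.mode(numbers)      # 3, the most frequent number
median_value = statistics.median(numbers)  # 3, the middle number
mean_value = statistics.mean(numbers)      # 5, since (1 + 3 + 3 + 6 + 12) / 5 = 25 / 5

print(mode_value, median_value, mean_value)
```

Notice that the one large value (12) pulls the mean above the median and mode, so the three measures don't always agree on where the "center" is.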
Measures of variation
Range – The distance between the lowest and highest numbers in a group.
Standard deviation – A measurement of how much the numbers vary from the mean (average). If the numbers are all pretty close together, the standard deviation will be low. If the numbers are widely spread out, the standard deviation will be high.
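Here is a small Python sketch comparing two made-up groups of numbers that share the same mean (50) but differ in spread. It assumes the population formula for standard deviation (`statistics.pstdev`); the sample formula (`statistics.stdev`) divides by n - 1 instead and gives slightly larger values.

```python
import statistics

def value_range(numbers):
    """Distance between the lowest and highest numbers."""
    return max(numbers) - min(numbers)

tight = [48, 49, 50, 51, 52]  # all close to the mean of 50
wide = [10, 30, 50, 70, 90]   # same mean, spread far apart

range_tight = value_range(tight)  # 4
range_wide = value_range(wide)    # 80

sd_tight = statistics.pstdev(tight)  # square root of 2, about 1.41
sd_wide = statistics.pstdev(wide)    # square root of 800, about 28.28

print(range_tight, round(sd_tight, 2))
print(range_wide, round(sd_wide, 2))
```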
Normal curve – Also called a “bell curve”, this is a graph shape that often occurs in nature with things like height and intelligence test scores. Consider a graph of IQ scores. The median IQ is 100 (so the 50th percentile is also 100). The break-point scores are the standard deviations, each 15 points wide. So, at -1 standard deviation a person scored 85. Notice the percentages: about 68% of people score within 1 standard deviation (between 85 and 115), and about 95% score within 2 standard deviations (between 70 and 130). These percentages are amazingly stable in nature.
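The percentages under the curve can be checked with Python's `statistics.NormalDist`. This sketch assumes IQ scores follow an ideal normal curve with a mean of 100 and a standard deviation of 15, matching the 85 and 115 break-points above.

```python
from statistics import NormalDist

# Assumed ideal IQ distribution: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

def share_between(dist, low, high):
    """Fraction of the population scoring between low and high."""
    return dist.cdf(high) - dist.cdf(low)

within_1_sd = share_between(iq, 85, 115)  # about 0.683
within_2_sd = share_between(iq, 70, 130)  # about 0.954

print(f"{within_1_sd:.1%} score between 85 and 115")
print(f"{within_2_sd:.1%} score between 70 and 130")
```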
Positive Skew: a curve where most values are on the lower end, but there are some exceptionally large/high values.
Negative Skew: a curve where most values are on the higher end, but there are some exceptionally small/low values.
Making inferences
Inferential statistics - Can knowledge be inferred from a small sample to a large population?
This will be covered later (Unit 11: Testing and Individual Differences), but it should be mentioned here. There are two pillars of measurement:
Validity – A test or bit of research measures what it’s supposed to measure. If you took a test called “The Geography Assessment” but the questions were all algebra word problems, your score wouldn’t reflect your geography knowledge at all. The test would not be valid.
Reliability – The test yields the same results over and over. If you took a “U.S. Literature” test and scored a 97, then took the same test twice more and scored a 57 and an 83, you’d say the results are not very reliable. But if you scored a 97, then a 94, then a 95, the test would be pretty reliable. Ways to make tests as reliable as possible:
Representative sampling – This is done with random selection of participants.
Low variability – Results with low ranges and low standard deviations are more reliable than high ones.
More is better – The more numbers we have, the better the results.
Statistical significance – The observed difference between two numbers is unlikely to be due to chance. Usually statistical significance is measured by a "p-value" and goes by the 5% rule. A p-value is normally shown like 0.04 (or 4%), for example. It is the probability of getting a difference at least as large as the one observed if chance alone were at work. Scientists will say the numbers are statistically significant if that probability is 5% or less (in other words, the p-value is 0.05 or lower). For example, in an experiment, a p-value of 0.04 means a difference this large in the Dependent Variable would show up only about 4% of the time by chance alone, so we credit the Independent Variable.
PISSed – P Is Statistical Significance
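One way to see what a p-value measures is a permutation test. This technique is not named in these notes; it is swapped in here only because it is easy to follow. The idea: if the group labels didn't matter, shuffling them should often produce a difference as big as the one observed. The reaction times below are invented for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often chance alone gives a mean difference at least as
    large as the observed one, by reshuffling the group labels."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_big = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the labels were handed out by chance
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_big += 1
    return at_least_as_big / n_shuffles

# Made-up reaction times in milliseconds (not real results).
sugar = [310, 295, 305, 320, 300, 315, 290, 325]
sugar_free = [260, 275, 255, 270, 265, 280, 250, 285]

p_value = permutation_p_value(sugar, sugar_free)
print(p_value, "significant" if p_value <= 0.05 else "not significant")
```

With these invented numbers the two groups don't overlap at all, so almost no shuffle matches the observed gap and the p-value lands far below 0.05.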
Psychology applied
Research in a laboratory can test principles that can be applied to everyday life.
Culture comes into play during research.
Due to cultural differences, a study in one culture may or may not be able to be applied to another culture.
In some ways, humanity is united across cultures. For instance, smiles and frowns are universal.
Gender differences, and similarities, are real as well.
Ethics in research
Scientists often study animals for many reasons such as…
Their biological systems are often simplified versions of humans'.
Their life-cycles are often short so it’s faster to study life spans and generations.
It’s not ethical or moral to do certain things to humans.
Animal research
Using animals in research is sometimes controversial. The arguments center on the morality of using animals to hopefully better humankind (and animalkind).
Over time, we’ve developed an unofficial ranking of animals: (1) humans, (2) primates and household pets, (3) mammals, (4) birds, fish, reptiles, (5) insects.
Human research
When using humans in research there are ethical stop-signs. It would be wrong to electrically shock people in a study. But would it be okay to fake a shock? To put people through stress? (The general consensus is, "No, that’s not ethical").
The APA (American Psychological Association) suggests two things:
Informed consent – This means participants (1) know what’s going on and (2) they give their permission to be in the study.
Debriefing of participants – When the study’s finished, researchers and participants go back over the whole thing.
Values still work their way into studies. For instance, our values impact the decision of what to study (and what not to study). And values affect the way we word things, like asking if you think a person is “cheap” (negative connotation) or “penny-wise” (positive connotation).
Type of Research | Purpose | Strengths | Weaknesses |
---|---|---|---|
Case study | To gather information. | Inexpensive. Requires only one participant. Can often be done when ethical factors won’t allow experiments. | Individual cases can be misleading. Doesn’t show causation. |
Survey | To gather information. | Inexpensive. Can gather lots of info fast. Can often be done when ethical factors won’t allow experiments. | Wording and personal bias can severely alter results. Doesn’t show causation. |
Naturalistic observation | To gather information. | Inexpensive. Can often be done when ethical factors won’t allow experiments. | Individual cases can be misleading. Doesn’t show causation. |
Correlation | To find out if two things go together, if they co-relate. | Can handle large numbers of people/data. Can often be done when ethical factors won’t allow experiments. | Doesn’t show causation. |
Experiment | To find cause-and-effect. | Shows cause-and-effect. | More costly. Ethical factors may make experiments impractical. |