research methods 2
psychology as a science
Science
Kuhn
Scientific methods — Skinner’s rats
Scientific terminology — ‘synaptic transmission’, ‘variables’
Popper
Falsifiable — Skinner’s rats, brain scans
Replicable — strange situation
Objective — Skinner’s rats, brain scans
Not science
Kuhn
No paradigms/paradigm shifts — bio vs cog., not agreed assumptions
Doesn’t have scientific methods — Genie and other case studies
Doesn’t have scientific terminology — schema (unscientific)
Popper
Not falsifiable — Freud’s unconscious
Not replicable — Genie and other case studies
Not objective — case studies, schema
kuhn
Criteria of a science:
✓
Scientific methods, e.g. lab experiments
Scientific terminology, e.g. independent/dependent variable or hypothesis, synaptic transmission
X
Assumptions about behaviour (shared/agreed with), e.g. different approaches disagree
Paradigm shifts
A paradigm is a shared belief about how something works. A paradigm shift is the change in paradigm when an older theory is disproven and a newer theory is introduced
Psychology is different because it has several competing approaches for testing ideas rather than one unifying theory. It is a broad subject, with researchers spread across many different topics, so the chance of finding a dominant theory wrong and replacing it with a new one is reduced. This makes paradigm shifts less likely and means it is harder to get to the truth
karl popper
Falsifiability
Trying to prove a theory (verification) is biased and easy; trying to disprove it (falsification) is more scientific
Scientific topic — empirically testable, so you have the ability to prove it false. Impossible to ‘prove’ a scientific theory is 100% true because no amount of evidence guarantees that contradicting evidence will never be found
Science only progresses by proving theories wrong, which enables evolution/development of the subject, because a single observation that does not fit the theory means the theory is debunked and a new theory will replace it
E.g. Freud’s psychodynamic theory is unfalsifiable, as the ‘unconscious’ cannot be empirically tested to see if it exists or not
Replicable — ‘reliability’, researcher replicates the study to see if it gets the same results. Standardised procedures = easier to replicate = easier to test reliability (external)
E.g. reliable: strange situation, Harlow’s monkeys. Not reliable: Genie
Objective — ‘unbiased’, not affected by researcher’s own opinions and feelings, must be objective to be truly scientific
E.g. objective: experiments like Skinner, brain scans. Subjective: case studies or ppt observations, schema
analysing qualitative data
content analysis
Quantifying qualitative data through indirectly observing the presence of certain words, images or concepts (coding units) within the media (advertisements, books, films etc.) and counting the number of times they occur. Inferences about the messages within the data are then made. It is usually carried out on secondary data, e.g. data already published
Waynforth and Dunbar carried out a content analysis of lonely hearts columns and discovered distinct gender differences in the way men and women advertised themselves (men advertised resources like job, house etc. and sought attractive, youthful partners, while women advertised attractiveness and sought resources)
Evaluations:
Advantage — relatively reliable. The method is easy to replicate by others using the same materials and examining the same source, because the categories to look out for are created before the investigation begins and are therefore objective
Disadvantage — only collects numerical data. Does not reveal the underlying reasons for behaviours, only that they exist. For example, it records the number of times men look for beauty but not what type of beauty. This may make it difficult to apply the findings to society, as the data has limited uses
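The counting step of a content analysis can be sketched in Python. This is only a minimal illustration, not Waynforth and Dunbar's actual procedure: the categories, coding units and adverts below are made up, and `tally_coding_units` is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical coding scheme: category -> words that count as a coding unit.
CODING_UNITS = {
    "resources": {"job", "house", "car", "solvent"},
    "attractiveness": {"attractive", "pretty", "slim", "youthful"},
}

def tally_coding_units(adverts, coding_units=CODING_UNITS):
    """Count how often each category's coding units occur across all adverts."""
    tallies = Counter({category: 0 for category in coding_units})
    for advert in adverts:
        # Lower-case the text and split it into words, dropping punctuation.
        words = re.findall(r"[a-z']+", advert.lower())
        for category, units in coding_units.items():
            tallies[category] += sum(word in units for word in words)
    return tallies

# Made-up adverts, for illustration only (not data from any study).
adverts = [
    "Professional man with own house and good job seeks attractive partner",
    "Attractive, slim woman seeks solvent man with a steady job",
]
tallies = tally_coding_units(adverts)
print(tallies["resources"], tallies["attractiveness"])
```

Because the categories are fixed before counting starts, anyone re-running the same scheme on the same source gets the same tallies, which is the reliability advantage noted above. The tallies alone still say nothing about why the words were used.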
thematic analysis
Method for identifying and reporting themes within data, using coding. Organises, describes and interprets data. The identified themes become categories for analysis, and the process of coding involves 6 stages:
Familiarisation with the data — read the content
Coding — decide what codes you will use to record (e.g. numbers, letters, columns)
Searching for themes — read back through after identifying themes you want to find and use your code to annotate themes
Reviewing themes — ensure themes are relevant/not too exclusive or inclusive, after searching
Defining and naming themes — identifying clear and concise names for themes so they make sense at the write-up
Writing up — write up the report of what this is stating or inferring
Preferred over content analysis as it involves comparison of themes, identification of co-occurrences and graphs to display differences in themes
Evaluations:
Advantage — looks at meanings and preserves qualitative data. By finding qualitative themes in the data, it can provide rich insight into why a behaviour occurs. More sensitive
Disadvantage — inter-observer reliability is often low. Qual. data relies on subjective interpretation, so researchers may disagree about which theme the info fits into. This suggests the method has reasonably low reliability, as its outcomes are not consistent
reliability
Consistency of a test or procedure. One aspect is consistency between different observers (inter-rater reliability); another is the consistency of measuring instruments when developing tests:
Internal reliability — the extent to which something is consistent within itself, e.g. scales should measure consistently across their range, whether at 50g or 100g; a ruler is another consistent measure
External reliability — this concerns the extent to which a test is consistent over time (gets consistent results each time it is used)
Ways of assessing and improving reliability
Split half method — measures the internal reliability by splitting a test into 2 halves and having the same ppt do both halves. If the 2 halves of the test provide similar results, this indicates the test has internal reliability
Test retest method — measures the external reliability by giving the same test to the same ppt on 2 occasions. If the same result is obtained then the test has external reliability
Inter-observer reliability — this measures whether different observers are viewing or rating a behaviour in the same way. If it is low, it could lead to biased observation
Can be improved by:
Develop clearly defined and separate categories for observational criteria
Increase the number of observers as data will be less subject to bias
Record the test/experiment so it can be reviewed at any time
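One way to put a number on 'similar results' in the split half and test retest methods is to correlate the two sets of scores. The notes do not specify this; correlating the halves (and treating roughly +0.8 or above as acceptable) is a common convention assumed here. The scores are made up and the function names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def split_half_scores(item_scores):
    """Total one ppt's odd-numbered and even-numbered items separately."""
    return sum(item_scores[0::2]), sum(item_scores[1::2])

# Made-up item scores: one row per ppt, one column per test item (1 = correct).
answers = [
    [1, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
halves = [split_half_scores(row) for row in answers]
first_half = [h[0] for h in halves]
second_half = [h[1] for h in halves]
r = pearson_r(first_half, second_half)
print(round(r, 2))
```

The same `pearson_r` call works for test retest: pass the scores from occasion 1 and occasion 2 instead of the two halves.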
validity
Concerns accuracy: the extent to which something measures what it claims to measure and the extent to which findings can be generalised beyond the research setting:
Internal validity — this concerns whether results are due to the manipulation of the IV rather than confounding variables
External validity — this concerns the extent to which an experiment’s results can be generalised to other settings (ecological validity), other people (population validity) and over time (temporal validity)
Improved by a more natural or realistic setting
Ways of assessing validity
Face validity — simple way of assessing validity. Extent to which a test looks like it measures what it claims to measure. Not scientific and usually only used for a pilot study
Concurrent validity — correlating scores on a test with another test that is already known to be valid, e.g. scores on a new IQ test would be compared with scores on an established IQ test to see if they match
Predictive validity — involves testing a group of subjects and then comparing scores to the results obtained at some point in the future to see if they match, e.g. a school entrance test predicting later exam results
Temporal validity — use of the test-retest method in another time context to see if the same results are found, e.g. conduct an investigation on how oppressed females feel in 1901 and in 2020 — different results = low temporal validity
choosing an appropriate statistical test
Statistical tests reveal whether results are significant and whether the experimental hypothesis can be accepted or rejected
Difference (1 dependent variable) or correlation (2 co-variables)
Level of data/measurement
Experimental design
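Nature of hypothesis | Level of measurement | Independent design | Repeated design
Difference           | Nominal              | Chi-squared        | Sign test
Difference           | Ordinal              | Mann-Whitney U     | Wilcoxon
Difference           | Interval             | Independent t-test | Related t-test
Correlation          | Ordinal              | Spearman’s rho (design not applicable)
Correlation          | Interval             | Pearson’s product moment (design not applicable)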
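The table above is effectively a lookup, so it can be sketched as a small Python function. The mapping is taken directly from the table; the names `TEST_TABLE` and `choose_test` are hypothetical.

```python
# Lookup built from the table: (hypothesis, level of measurement, design) -> test.
# Design is None for correlations, where experimental design does not apply.
TEST_TABLE = {
    ("difference", "nominal", "independent"): "Chi-squared",
    ("difference", "nominal", "repeated"): "Sign test",
    ("difference", "ordinal", "independent"): "Mann-Whitney U",
    ("difference", "ordinal", "repeated"): "Wilcoxon",
    ("difference", "interval", "independent"): "Independent t-test",
    ("difference", "interval", "repeated"): "Related t-test",
    ("correlation", "ordinal", None): "Spearman's rho",
    ("correlation", "interval", None): "Pearson's product moment",
}

def choose_test(hypothesis, level, design=None):
    """Return the test named in the table for the three decisions."""
    hypothesis, level = hypothesis.lower(), level.lower()
    # Design is ignored for correlations.
    design = None if hypothesis == "correlation" else (design or "").lower()
    try:
        return TEST_TABLE[(hypothesis, level, design)]
    except KeyError:
        raise ValueError(f"No test listed for {(hypothesis, level, design)}") from None

print(choose_test("difference", "ordinal", "repeated"))
print(choose_test("correlation", "interval"))
```

A combination the table does not cover (for example a nominal correlation) raises a `ValueError` rather than guessing a test.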
analysing quantitative data
Interval data — measurements on a scale with equal, standardised units, so the difference between two measurements is meaningful, e.g. kg, mm (strictly, pure interval scales such as °C have no true zero, whereas kg and mm do, but both are treated as interval when choosing a test)
Ordinal data — ordered categories (rankings, order). Data which can be ranked, ordered or scored, but the intervals between values are not equal or standardised, e.g. self-report techniques, scores
Nominal data — categories (no ordering or direction). No individual score; the number represents the frequency of a behaviour/category
calculating significance
To calculate significance of results, observed/calculated value is compared with critical value which is provided in a critical value table.
Observed value from statistical test
Directional or non-directional hypothesis
Number of participants
Level of significance (5% or 1%)
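Mann-Whitney, Wilcoxon and sign tests require the observed value to be equal to or less than the critical value to be significant (the other tests require it to be equal to or greater)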
Chi-squared — uses ppt/d.f
Mann Whitney U test — number of ppts in n1 and n2
Spearman’s rho — number of ppts
Sign test — the observed value is not always given. Get rid of ‘same’/‘no difference’ results and add up each category (plus and minus); the least occurring is the observed value. One-tailed and two-tailed hypotheses have different critical values at the same significance level
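The sign test steps above can be sketched as follows. The scores are made up for illustration and `sign_test_observed_value` is a hypothetical helper name.

```python
def sign_test_observed_value(condition_a, condition_b):
    """Return (S, N) for a sign test on paired scores.

    S is the count of the less frequent sign (the observed value).
    N is the number of ppts left after 'no difference' pairs are dropped.
    """
    plus = sum(b > a for a, b in zip(condition_a, condition_b))
    minus = sum(b < a for a, b in zip(condition_a, condition_b))
    # Pairs where a == b count as neither sign, so they are dropped.
    return min(plus, minus), plus + minus

# Made-up scores for 8 ppts tested in two conditions.
condition_a = [5, 3, 6, 4, 7, 5, 2, 6]
condition_b = [7, 3, 8, 5, 6, 9, 4, 6]
s, n = sign_test_observed_value(condition_a, condition_b)
print(s, n)
```

Here two ppts show no difference and are removed, five improve and one gets worse, so the observed value S is 1 with N = 6. S is then compared with the critical value for N = 6 from a sign test table.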
probability
Probability 0 = WON’T happen, 1 = WILL happen
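If results are significant, the experimental hypothesis can be accepted; if they are not significant, the null hypothesis is accepted
p<0.05 means there is less than a 5% probability that the results gained are due to chance, and therefore less than a 5% risk that we have wrongly accepted the experimental hypothesis and rejected the null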
After conducting the [] test, the results of this test are significant/not significant at the significance level of p<0.05, with [] ppts using a one/two tailed hypothesis. This is because the observed value is MORE/LESS THAN the critical value. In this case, the experimental/null hypothesis can be rejected and the null/experimental hypothesis can be accepted
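The write-up template above can be sketched as a small function. It assumes the usual convention that Mann-Whitney, Wilcoxon and sign tests are significant when the observed value is equal to or less than the critical value, while the other tests need it to be equal to or greater. The critical value must still be read from the appropriate table; the function names are hypothetical.

```python
# Significant when observed value is EQUAL TO OR LESS THAN the critical value.
LOWER_IS_SIGNIFICANT = {"mann-whitney u", "wilcoxon", "sign test"}
# Significant when observed value is EQUAL TO OR GREATER THAN the critical value.
HIGHER_IS_SIGNIFICANT = {
    "chi-squared", "spearman's rho", "pearson's product moment",
    "independent t-test", "related t-test",
}

def is_significant(test, observed, critical):
    name = test.lower()
    if name in LOWER_IS_SIGNIFICANT:
        return observed <= critical
    if name in HIGHER_IS_SIGNIFICANT:
        # The sign is ignored here; for a directional hypothesis, also check
        # that the direction of the result matches the prediction.
        return abs(observed) >= critical
    raise ValueError(f"Unknown test: {test}")

def write_up(test, observed, critical, n_ppts, tails, level="p<0.05"):
    significant = is_significant(test, observed, critical)
    outcome = "significant" if significant else "not significant"
    rejected, accepted = ("null", "experimental") if significant else ("experimental", "null")
    return (
        f"The results of the {test} are {outcome} at {level} with {n_ppts} ppts "
        f"using a {tails}-tailed hypothesis (observed value {observed}, "
        f"critical value {critical}). The {rejected} hypothesis can be rejected "
        f"and the {accepted} hypothesis can be accepted."
    )

# Example: critical value 5 is assumed to have been read from a sign test table
# for N = 20, two-tailed, p<0.05; check it against your own table.
print(write_up("Sign test", 4, 5, 20, "two"))
```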
type 1 and 2 errors
Type 1 error (possible if a significant result is obtained) — detecting a significant difference between your IV conditions that is not actually present. This is more likely when your significance level is too lenient. It is dangerous because an effect that does not exist is accepted
Type 2 error (possible if a non-significant result is obtained) — failing to detect a significant difference that is actually present. This is more likely when the probability level used is too strict. It makes it less likely that a product is approved or that results are published
Anything more lenient than 0.07 is too lenient and anything stricter than 0.03 is too strict; 0.05 is used as it balances the chances of type 1 and type 2 errors.
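The trade-off between the two errors can be illustrated with exact probabilities for a two-tailed sign test. Everything specific here is an assumption for illustration: 20 ppts, and a 'real' effect in which each ppt has a 75% chance of showing a plus. The function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k pluses from n ppts when each plus has probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_significant(n, critical, p_plus):
    """Chance that S = min(pluses, minuses) is at or below the critical value."""
    return sum(
        binom_pmf(k, n, p_plus) for k in range(n + 1) if min(k, n - k) <= critical
    )

def critical_value(n, alpha):
    """Largest S that chance alone (p_plus = 0.5) reaches with probability <= alpha."""
    best = -1  # -1 means no value of S can be significant at this level
    for s in range(n // 2 + 1):
        if prob_significant(n, s, 0.5) <= alpha:
            best = s
    return best

def error_rates(n, alpha, true_p_plus):
    """Return (type 1 risk when the null is true, type 2 risk when the effect is real)."""
    crit = critical_value(n, alpha)
    type_1 = prob_significant(n, crit, 0.5)
    type_2 = 1 - prob_significant(n, crit, true_p_plus)
    return type_1, type_2

at_5_percent = error_rates(20, 0.05, 0.75)
at_1_percent = error_rates(20, 0.01, 0.75)
print(at_5_percent, at_1_percent)
```

Under these assumptions, moving from the 5% level to the stricter 1% level lowers the type 1 risk but raises the type 2 risk, which is the balance described above.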