Research Methods in Social Sciences - Data Collection and Sampling

Identifying Variables in Research

  • Variables are attributes, characteristics, or properties researchers explore and analyze.
  • They help establish relationships, make predictions, or validate hypotheses.

Types of Variables:

1. Independent Variables (IVs):
  • Also known as "predictor variables."
  • Researchers manipulate or vary these variables to observe their impact on dependent variables.
  • Example: 'Study hours' in a study analyzing the effect of study hours on exam scores.
2. Dependent Variables (DVs):
  • Also known as "response variables."
  • These variables are the outcomes or effects measured in the research.
  • They change as a result of alterations in the independent variable(s).
  • Example: 'Exam scores' in the same study.
Examples of Independent and Dependent Variables
  • Effects of teaching methods on student performance:
    • Independent Variable: Teaching methods.
    • Dependent Variable: Student performance.
  • Impact of sleep quality on academic success:
    • Independent Variable: Sleep quality.
    • Dependent Variable: Academic success.
  • How dietary choices affect energy levels:
    • Independent Variable: Dietary choices.
    • Dependent Variable: Energy levels.
  • Relationship between exercise frequency and mental health:
    • Independent Variable: Exercise frequency.
    • Dependent Variable: Mental health.
Additional Variable Types:
  • Research designs often incorporate other variable categories to enhance accuracy and reliability.
1. Control Variables:
  • These are kept constant or controlled across the study to isolate the effects of the independent variable on the dependent variable.
  • By controlling these variables, researchers can ensure that the observed changes in the dependent variable are directly attributable to the independent variable, not to external influences.
  • Example Scenario:
    • Research Question: Does exercise intensity influence heart rate?
    • Independent Variable: Exercise Intensity (categorized as low, moderate, high).
    • Dependent Variable: Heart Rate.
    • Control Variable: Age of Participants.
    • Purpose: Controlling for age helps ensure that differences in heart rate are due to changes in exercise intensity rather than age-related factors in cardiovascular response.
  • Significance:
    • Utilizing control variables allows for a more rigorous test of the hypothesis by minimizing the influence of external factors. This leads to more reliable and valid results, providing a clearer understanding of the causal relationships at play.
2. Mediating Variables:
  • These are variables that explain the relationship between the independent and dependent variables.
  • They help elucidate the underlying mechanisms or processes through which the independent variable influences the dependent variable.
  • Example scenario:
    • Research Question: Does self-esteem mediate the relationship between social media use and well-being?
    • Independent Variable: Social media use
    • Dependent Variable: Well-being
    • Mediating Variable: Self-esteem (explains how social media use influences well-being through its effect on self-esteem)
  • Significance:
    • Understanding mediating variables allows researchers to dissect complex relationships and identify specific points for intervention. It enables a deeper understanding of how changes in one variable can lead to changes in another through a third, mediating variable.
3. Moderating Variables:
  • Moderating variables are specific types of variables that alter the strength, direction, or nature of the relationship between an independent and a dependent variable.
  • They essentially act as catalysts or inhibitors in the cause-effect relationship being studied, providing insights into the conditions under which certain outcomes occur.
  • Example Scenario:
    • Research Question: Does the effect of exercise on stress levels vary according to gender?
    • Independent Variable: Exercise (presence or absence)
    • Dependent Variable: Stress levels
    • Moderator Variable: Gender
    • Explanation: This scenario investigates whether gender affects how exercise influences stress levels. The assumption is that the impact of exercise on reducing stress might differ between males and females, suggesting that gender moderates this relationship.
  • Significance: By identifying and analyzing moderating variables, researchers can determine the variability of effects based on different subgroups or conditions. This specificity allows for more tailored and effective interventions or recommendations based on demographic or contextual factors.
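
The moderation scenario above can be illustrated with a minimal sketch that compares the effect of exercise on stress within each gender. The records, the 0-10 stress scale, and the function name are hypothetical and exist only for illustration; a formal analysis would more likely test an interaction term in a regression model.

```python
from statistics import mean

# Hypothetical records: (gender, exercises, stress score on a 0-10 scale).
records = [
    ("female", True, 3), ("female", True, 4),
    ("female", False, 7), ("female", False, 8),
    ("male", True, 5), ("male", True, 6),
    ("male", False, 6), ("male", False, 7),
]

def exercise_effect(rows):
    """Mean stress of exercisers minus mean stress of non-exercisers."""
    active = [stress for _, exercises, stress in rows if exercises]
    inactive = [stress for _, exercises, stress in rows if not exercises]
    return mean(active) - mean(inactive)

# Estimate the effect separately for each level of the moderator (gender).
effects = {
    gender: exercise_effect([r for r in records if r[0] == gender])
    for gender in ("female", "male")
}
print(effects)
```

If the two effects differ noticeably, as they do in this made-up data, gender is a candidate moderator of the relationship between exercise and stress.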

Data Collection Methods

Data Collection Considerations:
  • The setting.
  • The data collection methods.
  • The data sources.

Interviewer-Administered Techniques:

Personal Interviewing:
  • Effective for participation.
  • Builds confidence.
  • Time-consuming.
  • Costly (requires trained interviewers).
  • Potentially difficult to reach every person in the sample.
Telephone Interviewing:
  • Lower costs.
  • Higher response rate than mail samples.
  • Better access than personal interviews.
  • Shorter data collection period.
  • Possibly less appropriate for personal questions.
Focus Group Discussion:
  • High participation rates.
  • Possible to explain the study and answer questions.
  • Not feasible to bring all people selected for the session.

Self-Administered Techniques:

Mail Procedures:
  • Low costs.
  • Minimal staff and facilities needed.
  • Provides access to widely dispersed samples.
  • Respondents can give thoughtful answers.
  • May not be an effective way to get people to reply.
Questionnaires:
  • Interviewer can explain the study and designate household respondents.
  • Doesn't require trained interviewers.
  • Respondents have time to give answers.
  • Field staff is required.
Internet Surveys:
  • Low costs.
  • Potential for high-speed returns.
  • Respondents have time to give thoughtful answers.
  • Challenge of getting people to respond.
  • Limited to internet users.

Data Types and Initial Processing:

Qualitative Data:
  • Examples: Interviews (audio/video recordings, notes, open-ended responses), questionnaires (written responses, opinions), observations (notes, photos, recordings), think-aloud (diaries, descriptions).
  • Initial Processing: Transcription of recordings, expansion of notes, synchronization between data recordings.
Quantitative Data:
  • Examples: Age, job role, years of experience, responses to close-ended questions, demographics, time spent on a task, number of people involved.
  • Initial Processing: Entry of answers to close-ended questions into a spreadsheet, data cleanup, filtering into different data sets.

Sampling

  • Sampling involves selecting a smaller, representative group from a population to determine truths about that population.
  • A sample is “a smaller (but hopefully representative) collection of units from a population used to determine truths about that population” (Field, 2005).

Why Use Sampling?

  • Costs less than a census.
  • To answer questions about the whole population.
  • Offers greater scope than census.
  • Possible to study the population of a larger area.
  • "Surveys draw representative samples from human populations whose observed characteristics provide unbiased estimates of the characteristics of those populations." (Marsden and Wright, 2010)
  • It is a process of selecting a number of individuals for a study in such a way that the individuals represent the larger group from which they are selected.

Purpose of Sampling

  • To gain an impression of the population on certain issues or matters.
  • To estimate a population parameter.
  • To test hypotheses or unproven theories.
  • To draw conclusions that represent the total population.

Sampling Process

  1. Define the target population.
  2. Select sampling frame.
    • State the objective of the survey
    • Define the data to be collected
    • Define the measurement instrument
  3. Determine the sampling method.
  4. Plan procedure for selecting sampling unit.
  5. Determine sample size.
  6. Select actual sampling units.
  7. Conduct fieldwork.

Probability Sampling

1. Simple Random Sampling
  • Basic form of probability sampling.
  • Each member of a population is assigned an identifier.
  • Those selected are picked at random, often using software (e.g., a randomizer).
2. Stratified Random Sampling
  • The population is divided into sub-groups.
  • Within those sub-groups, a simple random sample is performed.
  • Enables a random sample that is representative of a larger population and its specific segment.
3. Cluster Sampling
  • A population is divided into clusters.
  • Clusters are unique but represent a diverse group.
  • From the list of clusters, a select number are randomly selected to take part in a study.
4. Systematic Sampling
  • Participants are selected using a fixed interval.

Non-Probability Sampling

1. Convenience Sampling
  • Uses people who are convenient to access to complete a study.
  • Quick and easy, but the results cannot reliably be generalized to a broader population.
2. Snowball Sampling
  • Recruits some sample members who then recruit people they know to join a sample.
  • Works well for reaching very specific populations who meet the selection criteria.
3. Purposive Sampling
  • The sample selection is left up to the researcher and their knowledge of who will fit the study criteria.
  • When studying specific characteristics, this selection method may be used, though bias may be introduced.
4. Quota Sampling
  • A population is divided into subgroups by characteristics, and targets are set for the number of respondents needed from each subgroup.
  • Main difference between quota sampling and stratified random sampling is that a random sampling technique is not used in quota sampling.
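
A minimal sketch of quota sampling, assuming respondents arrive in some order and are accepted until each subgroup target is filled. The respondent records, the `group` field, and the quota numbers are hypothetical. No random selection is involved, which is the contrast with stratified random sampling noted above.

```python
def quota_sample(respondents, key, quotas):
    """Accept respondents in arrival order until every subgroup quota is met."""
    counts = {group: 0 for group in quotas}
    sample = []
    for person in respondents:
        group = person[key]
        if group in counts and counts[group] < quotas[group]:
            sample.append(person)
            counts[group] += 1
        if counts == quotas:          # all targets reached, stop recruiting
            break
    return sample

# Hypothetical stream of respondents: every third one is staff.
respondents = [
    {"id": i, "group": "staff" if i % 3 == 0 else "student"}
    for i in range(1, 31)
]
quotas = {"student": 4, "staff": 2}
sample = quota_sample(respondents, "group", quotas)
print([p["id"] for p in sample])
```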

Calculating Sample Size

Components Needed for Sample Size Calculation:
  • Step 1. Population size
    • How many people are you talking about in total?
  • Step 2. Margin of error (confidence interval)
    • A percentage that tells you how closely you can expect your survey results to reflect the views of the overall population, e.g., +/- 5%.
  • Step 3. Sample proportion
    • It can be taken from previous survey results or estimated by running a small pilot survey. If unsure, use 0.5 as a conservative choice; it gives the largest possible sample size.
  • Step 4. Confidence level
    • A percentage that reveals how confident you can be that the population would select an answer within a certain range, commonly 90%, 95%, or 99%.
  • Step 5. Find your Z-score
    • The Z-scores for the most common confidence levels:
      • 90% – Z Score = 1.645
      • 95% – Z Score = 1.96
      • 99% – Z Score = 2.576
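
The Z-scores listed above are two-sided critical values of the standard normal distribution. As a small sketch, they can be derived from the confidence level with the standard library's `statistics.NormalDist`; the function name `z_score` is just an illustrative choice.

```python
from statistics import NormalDist

def z_score(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    tail = (1 - confidence_level) / 2      # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_score(level):.3f}")
```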
Sample Size Formula:
  • N: Population size; e: Margin of error; z: z-score; p: Sample proportion
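
The notes list the symbols but not the formula itself. One widely used form for a known population size is sketched below; treating it as the intended formula is an assumption, as are the function name and the choice to round up.

```python
import math

def sample_size_finite(N, e, z, p=0.5):
    """Sample size for a known population size N.

    Assumed form: n = (z^2 * p * (1 - p) / e^2) / (1 + z^2 * p * (1 - p) / (e^2 * N))
    """
    base = z**2 * p * (1 - p) / e**2
    return math.ceil(base / (1 + base / N))   # round up to a whole respondent

print(sample_size_finite(N=9000, e=0.05, z=1.96))
```

As N grows, the denominator approaches 1 and the result approaches the value for an unknown population size.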
Formula for estimated sample size (unknown population size)
  • Estimated Sample Size = (z-score)^2 x Standard Deviation x (1 - Standard Deviation) / (margin of error)^2

  • Margin of error (confidence interval)
    • A percentage that tells you how closely you can expect your survey results to reflect the views of the overall population, e.g., +/- 5%.
  • Standard Deviation
    • An estimate of how much the responses will vary, based on a previous survey. If unsure, use 0.5, which gives the largest possible sample size.
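
As a worked example of the formula for an unknown population size, the sketch below plugs in a 95% confidence level (z = 1.96), the conservative value 0.5, and a 5% margin of error. The function and parameter names are illustrative, and rounding up to the next whole respondent is an assumed convention.

```python
import math

def estimated_sample_size(z, std_dev=0.5, margin_of_error=0.05):
    """(z-score)^2 x std_dev x (1 - std_dev) / (margin of error)^2, rounded up."""
    raw = z**2 * std_dev * (1 - std_dev) / margin_of_error**2
    return math.ceil(raw)

n_95 = estimated_sample_size(z=1.96)     # 95% confidence, +/- 5%
n_99 = estimated_sample_size(z=2.576)    # 99% confidence, +/- 5%
print(n_95, n_99)
```

Raising the confidence level or shrinking the margin of error both increase the required sample size.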
Determining Sample Size by Krejcie & Morgan (1970) Table
  • A table is used to determine the sample size needed to be representative of a given population.
  • No calculations are needed to use the table. For example, to ascertain the opinions of 9000 high school teachers, N = 9000 in Table 1, which gives a sample size of 368.
  • Formula used: s = X²NP(1 - P) ÷ [d²(N - 1) + X²P(1 - P)]
    • s = the required sample size.
    • X² = the table value of chi-square for 1 degree of freedom at the desired confidence level (3.841; compare 1.96 x 1.96 = 3.8416).
    • N = the population size.
    • P = the population proportion (assumed to be .50 since this would provide the maximum sample size).
    • d = the degree of accuracy expressed as a proportion (.05).
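
The Krejcie & Morgan (1970) formula, s = X²NP(1 - P) ÷ [d²(N - 1) + X²P(1 - P)], can be checked against the table entry quoted above. This is a minimal sketch; the function name is illustrative, and rounding to the nearest whole number is an assumption chosen because it reproduces the quoted value.

```python
def krejcie_morgan(population_size, chi_square=3.841, proportion=0.5, accuracy=0.05):
    """Required sample size s for a population of size N."""
    variance_term = chi_square * proportion * (1 - proportion)
    numerator = variance_term * population_size
    denominator = accuracy**2 * (population_size - 1) + variance_term
    return round(numerator / denominator)

print(krejcie_morgan(9000))   # the 9000 high school teachers example
```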
Raosoft Sample Size Calculator
  • Online tool requiring:
    • Margin of error.
    • Confidence level.
    • Population size.
    • Response distribution.

Questionnaire

  • “Questionnaire is a research tool consisting of a series of questions and other prompt (fact) asked to individuals to obtain statistically useful information about a given topic.” (Pandya, 2010)
Purpose of the Questionnaire
  • To extract data from the respondents.
  • It is the vehicle used to pose the questions that the researcher wants respondents to answer.
  • The questionnaire is probably the most used and most abused of the data-gathering devices.
  • Normally used where the researcher cannot personally see all of the people from whom responses are desired.
Advantages of Questionnaires
  • Easily prepared and administered.
  • Provide structured and standardised process for data gathering.
  • Saves time, money, and energy.
  • Increase speed and accuracy of recording.
  • Provide anonymity to respondent.
  • Suitable for geographically scattered and large population.
Disadvantages of Questionnaires
  • If improperly designed, questionnaires can lead to incomplete responses.
  • The data become less reliable.
  • Not suitable for all segments.
  • Occurrence of error.
  • Answer obtained can be wrong.
Characteristics of a Good Questionnaire
  • Its significance is clearly stated in the cover letter.
  • Provide anonymity to the respondent.
  • Only seeks data that cannot be obtained from secondary resources like books, reports, and records.
  • Uses appropriate wording and is as short as possible.
  • It is attractive in appearance and clearly stated.
  • Directions are clear and complete; important terms are clarified.
  • Questions are objective, with no clues, hints, or suggestions.
  • Questions are arranged according to logical sequencing.
Elements of a Questionnaire
  1. Title: Clear and captivating.
  2. General Information: Description and purpose of the study; assurance of confidentiality.
  3. Specific Instruction: Concise demonstration of how to carry out the questionnaire.
  4. Questionnaire Items: Main part of the questionnaire schedule.
  5. Ensure there is no wrong or right answer
  6. Additional Information: Full information of the researcher or administrator.
  7. Thank You
Questionnaire Design Process
  1. Specify the information needed.
    • Define the survey/ research objective.
    • Determine the list of variables to be measured.
    • Determine the target group.
  2. Specify the type of method of reaching respondent
    • Personal interview, group discussion, mail questionnaire, telephone interview, online questionnaire
  3. Decide on the content of question.
    • Always be prepared to ask, "Is this question really needed?"
  4. Decide on the question structure.
    • Choose between structured, unstructured, or semi-structured.
  5. Determine the question wording.
    • Words should be easily understood by the respondents.
  6. Identify the form and layout.
    • The format, positioning, and spacing of questions have a significant effect on the results.
  7. Pilot test.
    • Testing the questionnaires on a small sample of actual respondents.
  8. Final draft.
    • Process of setting up the questionnaire in its final form - appropriate order, numbering questions, and inserting interviewer instructions.

Type of Data

1. Nominal
  • Used for labeling variables, without any quantitative value.
  • Considered simply as “labels.”
2. Ordinal
  • The order of the values is what’s important and significant.
  • Typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc.
3. Interval
  • Numeric scales in which we know not only the order but also the exact differences between the values.
  • Hold no true zero and can represent values below zero, such as -10 degrees.
4. Ratio
  • Numbers can be compared as multiples of one another.
  • A ratio scale never falls below zero; for example, height and weight are measured from 0 upward.
Characteristics and Examples of Data Types
Type | Characteristics | Basic Operation | Example
Nominal | Classification, but no order, distance, or natural origin | Determination of equality | Gender (male, female); Marital Status (single, married, divorced)
Ordinal | Classification and order, but no distance or natural origin | Determination of greater/lesser value | Happiness (Very unhappy to Very happy); Satisfaction (Very unsatisfied to Very satisfied)
Interval | Classification, order, and distance, but no natural origin | Equality of intervals/differences | Temperature in degrees
Ratio | Classification, order, distance, and natural origin | Equality of ratios | Age in years
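
The basic operation that each data type supports can be illustrated with a short sketch. All values and variable names are hypothetical; the point is only which comparison is meaningful at each level.

```python
# Nominal: labels can only be tested for equality.
same_status = "married" == "single"

# Ordinal: categories can be ranked, but the gaps between them are undefined.
satisfaction_order = [
    "very unsatisfied", "unsatisfied", "neutral", "satisfied", "very satisfied",
]

def rank(answer):
    """Position of an answer on the ordered satisfaction scale."""
    return satisfaction_order.index(answer)

more_satisfied = rank("satisfied") > rank("neutral")

# Interval: differences are comparable (10 to 20 degrees equals 0 to 10 degrees),
# but there is no true zero, so "twice as hot" is not meaningful.
equal_steps = (20 - 10) == (10 - 0)

# Ratio: a true zero makes multiples meaningful (age 40 is twice age 20).
times_older = 40 / 20

print(same_status, more_satisfied, equal_steps, times_older)
```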
Type of Questionnaire
  • Open-ended
  • Close-ended
    • Simple dichotomous
    • Multiple choice
    • Likert-scale
  • Mixed
    • Contingency
    • Matrix
  • Unstructured
  • Structured
  • Semi-structured

Structured Questionnaire

  • When to use:
    • Require rated or ranked data.
    • Good idea of how to order the ratings in advance.
    • Want respondents to answer using a pre-specified set of response choices.
    • Prefer to count the number of choices.
    • Will report statistical data.

Structured (Closed-Ended) Questions:

  • Simple Dichotomous: Two possible responses (e.g., Yes/No).
    • Example: Do you have a library membership card? Yes ( / ) No ( )
  • Multiple Choice: The researcher provides a set of answers and respondents select from the alternatives given (such as age group or income level).
    • Example: For what purpose do you visit the library? You may select more than one answer. To read newspapers ( / ) To refer to books ( / ) To borrow and return books ( / ) To print assignments ( )
  • Likert Scale: Measures respondents’ attitudes by asking the extent to which they agree or disagree with a particular question or statement.
    • Example: To what extent is the information obtained from the ilearn useful to you?
      1. Unsatisfied 2. Somewhat satisfied 3. Neutral 4. Satisfied 5. Extremely satisfied
  • Matrix: Multiple questions presented on a grid.
    • Example: Please rate how much you liked our food at the event.
                | Awesome | Good | Fair | Average | Poor
      Breakfast |         |      |      |         |
      Lunch     |         |      |      |         |
      Dinner    |         |      |      |         |
  • Contingency: A question that is asked only if the respondent gives a particular response to a previous question.
    • Example: Do you smoke? Yes ( / ) No ( )
      If yes, how many times do you smoke in a day?
      1. Once 2. 2 to 5 times 3. 5 to 10 times 4. More than 10 times
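
The skip logic behind a contingency question can be sketched as a consistency check on a completed response. The field names `smokes` and `frequency` and the function name are hypothetical; the answer choices are taken from the smoking example above.

```python
FREQUENCY_CHOICES = ("Once", "2 to 5 times", "5 to 10 times", "More than 10 times")

def validate_response(response):
    """Return a list of problems; an empty list means the answers are consistent."""
    problems = []
    smokes = response.get("smokes")
    frequency = response.get("frequency")
    if smokes not in ("Yes", "No"):
        problems.append("'smokes' must be Yes or No")
    elif smokes == "Yes" and frequency not in FREQUENCY_CHOICES:
        # The follow-up question applies only to respondents who answered Yes.
        problems.append("follow-up is required when the answer is Yes")
    elif smokes == "No" and frequency is not None:
        problems.append("follow-up must be skipped when the answer is No")
    return problems

print(validate_response({"smokes": "Yes", "frequency": "2 to 5 times"}))
print(validate_response({"smokes": "No", "frequency": "Once"}))
```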

Unstructured Question

  • When to use:
    • The respondents' own words are believed to be essential.
    • Respondents are capable of providing answers in their own words.
    • The choices of responses are unknown.
    • Have the skills to analyze respondents' comments, even though answers may vary considerably.
    • Will thematically report the data based on the pattern of the verbal responses.
    • Example: How can we promote youth empowerment in Malaysia?

Semi-Structured Question

  • When to use:
    • Require rated or ranked data.
    • Provide some flexibility to respondents in the hope for better responses.
    • Want respondents to answer using a pre-specified set of response choices, but are also capable of providing their own answers.
    • Are able to analyze both the statistical and non-statistical data.
    • Will report statistical data along with the verbal responses.
    • Example: For what purpose do you use web-based resources?
      For research work; To write assignments; To improve subject knowledge; For the purpose of seminar presentation; Any other (please specify):

Interview

Unstructured Interview
  • Interviewer introduces the topic briefly.
  • No sequences of questions.
  • Type of questions varies (depending on the management level).
  • Records the replies.
  • Respondent can say as much as they like.
  • Aim - to explore the various factors/variables in the situation that might be central to the broad problem area.
  • Variables that need to be explored further or that need more focus.
Structured Interview
  • Used when the researcher knows exactly what information is needed (e.g., as a result of an unstructured interview).
  • Likely to focus on questions that surfaced during the unstructured interview.
  • Interviewer has a predetermined set of questions (can be a list, a schedule, or a questionnaire).
  • Asks questions and records responses.
  • Little scope for probing issues.
  • May use supporting materials, visual aids, pictures, line drawings, cards, other materials.

Observational Survey

  • No questions asked.
  • Observe people in their natural environment or in a lab setting and record their behavior.
  • The researcher can be:
    • Non-participant observer (pure researcher)
    • Participant observer (become part of research setting)

Case Studies

  • Case – the particular occurrence of the topic of research (what, where, how, etc.).
    • e.g., procurement method of a project, a particular building, etc.
  • Variety of data collection techniques:
    • A combination of interviews and hard documentary data.
    • Many use questionnaires to gain an understanding of the general situation of the case.