statistics

Research Variables

By: Claudine T. Villa
Inspired by: Mathspace

Learning Outcomes

Differentiate between research variables and measurement.
Identify the correct sample size using Slovin's formula.
Determine the appropriate research design and sampling technique to use.

Variables and Measurement

Definition of Variables:
- Factors that can be manipulated and measured.
- Characteristics or attributes of persons or objects that assume different values across different objects under consideration.

Classification of Variables

Discrete and Continuous Variables

Discrete Variable:
- Takes a countable number of values (finite or countably infinite).
- Usually measured by counting or enumeration.
- Example: Number of students, professors, psychologists, counselors, or hospitals.
Continuous Variable:
- Cannot be counted because its values have no distinct divisions; it can take any value within a range.
- Assumes values corresponding to points on a continuous interval (a number line), usually obtained by measuring.
- Example: Intelligence, beauty, effectiveness, cleanliness, weight, height, temperature.

Qualitative and Quantitative Variables

Qualitative Variable:
- Provides categorical responses.
- Example: Occupation, gender, civil status, religious affiliation, political parties.
Quantitative Variable:
- Numerical values representing an amount or quantity.
- Example: Height, salary, number of children, weight, time.

Dependent and Independent Variables

Independent Variable:
- The variable that the researcher controls or manipulates according to the purpose of the investigation.
Dependent Variable:
- The variable that is measured to observe the effect of the independent variable.
- Example: To determine the predictive validity of entrance requirements for freshman students, the independent variables include the national achievement test, entrance examination, and school grades, while the dependent variable is performance in the first year of college.

Cause and Effect

Matching Exercise:
- Cause: Increased price of goods. Effect: Decrease in the number of grocery items you can buy.
- Cause: Continuous rainy season. Effect: Increase in umbrella sales.

Independent Variable

Changes to this variable will affect the other variable.

Dependent Variable

A variable whose value is affected by another variable.
Example:

| Time (in hours) spent studying | Exam Score |
| ------------------------------ | ---------- |
| 4 | 84 |
| 3 | 80 |
| 6 | 95 |
| 2 | 76 |

Identification:
- Which variable is dependent? Exam score, because its value changes in response to the time spent studying (the independent variable).


Think-Pair-Share Activity

Discussion prompt: How do independent and dependent variables relate to cause and effect?
- Use the example below for discussion:
- Price of goods vs. number of grocery items you can buy.

Variables and Measurement

Classification of Variables

Univariable, Bivariable, and Multivariable Distribution

Univariable Distribution:
- Involves only one variable.
- Example: Age of Grade 7 pupils, Temperature, Sales.
Bivariable Distribution:
- Data classified based on two variables.
- Example: Ice cream shop monitoring ice cream sales versus temperature of the day.
- Data Structure:
  | Temperature | Sales (Php) |
  | ----------- | ----------- |
  | 14.2 | 215 |
  | 16.4 | 325 |
  | 11.9 | 185 |
  | … | … |
Bivariate Data: Data consisting of two variables whose values are recorded as pairs.
- Examples: Hours studied vs. score on the exam, Favorite ice cream flavor vs. number of students.

Multivariable Distribution

Involves three or more variables.
- Example: Tracking enrollment in college based on program, year level, and gender.
- Data Structure Example:
  | Grand Total | Program | Year Level | M | F |
  | ----------- | ------- | ----------- | - | - |
  | 1,095 | Psychology | 1st Year | 115 | 178 |
  | … | … | … | … | … |

Levels of Measurement

Nominal Scale

Classifies data into categories that have no numerical value or order.
Also called categorical scales or categorical data.
Examples: Sex, employment status, marital status.

Ordinal Scale

Classifies and ranks subjects based on degree of possession of a characteristic.
Example: Classroom performance rankings (5 - outstanding to 1 - poor).

Interval Scale

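Combines the characteristics of nominal and ordinal scales and adds predetermined, equal intervals between values.
Examples: Temperature in Celsius or Fahrenheit, IQ test scores.
Note: Lacks a true zero point; a value of zero does not mean the characteristic is absent (e.g., 0 °C does not mean there is no temperature).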

Ratio Scale

Represents the highest, most precise level of measurement.
Contains a meaningful zero point (where the quantity being measured does not exist).
Examples: Height, weight, time, distance, and speed.
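A small Python sketch of why the true zero point matters: changing units preserves ratios on a ratio scale but not on an interval scale. The helper `c_to_f` and the approximate constant `KG_TO_LB` are assumptions introduced for illustration; only standard conversion arithmetic is used.

```python
def c_to_f(celsius: float) -> float:
    """Convert Celsius to Fahrenheit (an interval-scale change of unit)."""
    return celsius * 9 / 5 + 32

KG_TO_LB = 2.20462  # approximate pounds per kilogram (illustrative constant)

# Interval scale: no true zero, so "twice as hot" depends on the unit used.
ratio_celsius = 20 / 10                       # 2.0
ratio_fahrenheit = c_to_f(20) / c_to_f(10)    # 68 / 50 = 1.36

# Ratio scale: true zero, so "twice as heavy" holds in any unit.
ratio_kg = 20 / 10                            # 2.0
ratio_lb = (20 * KG_TO_LB) / (10 * KG_TO_LB)  # 2.0

print(ratio_celsius, ratio_fahrenheit, ratio_kg, ratio_lb)
```

This is why statements such as "twice as much" are meaningful only for ratio data.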

Identifying Qualitative and Quantitative Variables

Task: Classify the following:
1. Type of school - Qualitative, Nominal
2. Number of words correctly spelled - Quantitative, Discrete
3. House ownership - Qualitative, Nominal
4. Civil status - Qualitative, Nominal
5. Educational attainment of respondents - Qualitative, Ordinal
6. Job satisfaction of employees - Qualitative, Ordinal
7. Favorite color - Qualitative, Nominal
8. Number of siblings - Quantitative, Discrete
9. Study habits - Qualitative, Ordinal
10. Faculty evaluation - Qualitative, Ordinal

Measurement Levels Categorization

Examples with Reasons:
1. Ranking of college team - Ordinal (has order, but the differences between ranks are not necessarily equal).
2. Student number - Nominal (identifier only).
3. Temperature in Celsius - Interval (equal intervals but no true zero).
4. House number - Nominal (labels/identifiers).
5. Brands of soft drinks - Nominal (no order).
6. Socio-Economic Status - Ordinal (ordered categories).
7. Number of vehicles registered - Ratio (true zero).
8. Zip Code number - Nominal (codes used as labels).
9. Annual income - Ratio (true zero).
10. Amount of time spent on online games - Ratio (true zero).

Population and Sample

Population

The total or entire group of individuals, events, objects, observations, or reactions that share the patterns and characteristics of interest and from which information is sought. It is referred to as the universe in a statistical investigation.

Sample

A portion or subset of the population used to gather information. A good sample represents the characteristics of the population.

Essential Steps in Determining Sample Sizes

1. Determine the population from which the data are needed.
2. Identify the target group to which the study's results will be generalized.
3. Determine the kind of sample to be drawn.
4. Establish the desired sample size using Slovin's formula: $n = \frac{N}{1 + Ne^2}$
   - Where:
     - $n$ = Sample size
     - $N$ = Population size
     - $e$ = Estimated margin of error (acceptable error; maximum = 5% or 0.05).
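A minimal Python sketch of Slovin's formula as defined above. The function name `slovin_sample_size` is hypothetical, and rounding up with `math.ceil` is an assumed convention, since the formula itself can return a fraction of a respondent.

```python
import math

def slovin_sample_size(N: int, e: float) -> int:
    """Slovin's formula n = N / (1 + N * e**2).

    N is the population size and e the margin of error as a decimal
    (0.05 for 5%). The result is rounded up to a whole respondent.
    """
    if N <= 0:
        raise ValueError("population size N must be positive")
    if not 0 < e < 1:
        raise ValueError("margin of error e must be between 0 and 1")
    n = N / (1 + N * e ** 2)
    # round(..., 9) trims floating-point noise so an exact whole number
    # is not bumped up by one when ceil is applied.
    return math.ceil(round(n, 9))

print(slovin_sample_size(1000, 0.05))  # 286
```

Rounding up rather than to the nearest whole number errs on the side of a slightly larger sample.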

Parameter and Statistic

Parameter: A numerical characteristic (measure) of the population, such as the population mean (μ).
Statistic: A numerical value that describes the sample, such as the sample mean (x̄); statistics are used as estimates of the corresponding parameters.
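A short Python sketch contrasting a parameter with a statistic. The population of 40 exam scores (60 through 99) is hypothetical, chosen only so the population mean is easy to check; the sample mean will vary with the random draw.

```python
import random
import statistics

# Hypothetical population: 40 exam scores, 60 through 99.
population = list(range(60, 100))

# Parameter: computed from every member of the population.
mu = statistics.mean(population)

# Statistic: computed from a random sample and used to estimate mu.
rng = random.Random(7)  # the seed only makes the example reproducible
sample = rng.sample(population, k=10)
x_bar = statistics.mean(sample)

print("parameter mu =", mu)  # 79.5
print("statistic x_bar =", x_bar)
```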

Probability Sampling Method

Definition

A sampling process in which every unit in the population has a known, non-zero probability of being included in the sample.
Least prone to selection bias, but often more difficult to carry out.

Types of Probability Sampling

Simple Random Sampling:
- Each member has an equal chance of being selected.
- Can be performed via fishbowl technique, lottery, or random number tables.
- Advantages: Easy to understand and apply.
- Disadvantages: May be difficult for large populations; best used for geographically close populations.
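A minimal Python sketch of simple random sampling, assuming a hypothetical numbered frame of 500 members. A seeded `random.Random` generator stands in for the lottery or random number table; `random.sample` draws without replacement, so each member has the same chance of selection.

```python
import random

# Hypothetical sampling frame: 500 numbered population members.
population = [f"Student {i:03d}" for i in range(1, 501)]

# The seed only makes the example reproducible.
rng = random.Random(2024)

# Draw 50 members without replacement; no one can be picked twice.
sample = rng.sample(population, k=50)
print(len(sample), sample[:3])
```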
Stratified Random Sampling:
- Samples randomly selected from different groups or sections.
- Population divided into sub-populations (strata) based on factors like age, gender.
- Each stratum is subjected to simple random sampling.
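- Advantages: Can be more accurate than simple random sampling because every stratum is represented; the sampling design can be tailored to each stratum.
- Disadvantages: Information on the stratifying variable (e.g., each member's age) may be hard to obtain for every member of the population.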
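A Python sketch of stratified random sampling with proportional allocation (each stratum's share of the sample matches its share of the population). Proportional allocation is an assumption here; the source does not fix an allocation rule. The function name and the year-level strata sizes are hypothetical.

```python
import random

def stratified_sample(strata: dict[str, list[str]], n: int,
                      seed: int = 0) -> dict[str, list[str]]:
    """Proportional stratified random sampling.

    Each stratum receives a share of n proportional to its size, and a simple
    random sample of that size is drawn within the stratum. Because shares are
    rounded, the total can differ slightly from n when sizes do not divide evenly.
    """
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / N)        # proportional allocation
        sample[name] = rng.sample(members, k)  # simple random sample in stratum
    return sample

# Hypothetical frame: students grouped by year level (sizes are illustrative).
strata = {
    "1st Year": [f"Y1-{i:03d}" for i in range(200)],
    "2nd Year": [f"Y2-{i:03d}" for i in range(150)],
    "3rd Year": [f"Y3-{i:03d}" for i in range(100)],
    "4th Year": [f"Y4-{i:03d}" for i in range(50)],
}
chosen = stratified_sample(strata, n=50, seed=1)
print({year: len(people) for year, people in chosen.items()})
# {'1st Year': 20, '2nd Year': 15, '3rd Year': 10, '4th Year': 5}
```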
Systematic Random Sampling:
- Every kth member of an ordered list (e.g., an alphabetical listing) is selected, starting from a randomly chosen member among the first k.
- $k = \frac{N}{n}$, where $N$ is the population size and $n$ is the sample size.
- Advantages: Easy sampling process.
- Disadvantages: Can lead to bias with periodicity in the population.
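A Python sketch of systematic random sampling using $k = \frac{N}{n}$, rounded down when $N/n$ is not a whole number (an assumed convention). The function name and the 500-name list are hypothetical.

```python
import random

def systematic_sample(frame: list[str], n: int, seed: int = 0) -> list[str]:
    """Select every kth unit after a random start among the first k units."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the population size")
    k = len(frame) // n                       # sampling interval k = N / n
    start = random.Random(seed).randrange(k)  # random start: position 0 .. k-1
    return frame[start::k][:n]                # every kth unit, capped at n

# Hypothetical alphabetical list of 500 names.
frame = [f"Name {i:03d}" for i in range(1, 501)]
chosen = systematic_sample(frame, n=50, seed=7)
print(len(chosen), chosen[:3])
```

If the list repeats a pattern every k entries, every selected unit may share that pattern, which is the periodicity bias mentioned above.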
Cluster Sampling:
- Identifies naturally occurring group units for sample selection.
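- Each cluster should ideally be heterogeneous, like a miniature version of the population.
- Advantages: Efficient and cost-effective.
- Disadvantages: Results can be misleading if the selected clusters do not represent the population well.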
- When to use: When population can be grouped into clusters.
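A Python sketch of one-stage cluster sampling: whole clusters are chosen at random and every member of each chosen cluster is included. The function name and the frame of 20 city blocks with 5 households each are hypothetical.

```python
import random

def cluster_sample(clusters: dict[str, list[str]], num_clusters: int,
                   seed: int = 0) -> list[str]:
    """Randomly select whole clusters, then include all of their members."""
    rng = random.Random(seed)
    # sorted() fixes the order before sampling, so a given seed is reproducible.
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical frame: 20 city blocks with 5 households each.
blocks = {
    f"Block {b}": [f"Block {b} / House {h}" for h in range(1, 6)]
    for b in range(1, 21)
}
households = cluster_sample(blocks, num_clusters=4, seed=3)
print(len(households))  # 20 (4 blocks x 5 households)
```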
Multi-Stage Sampling:
- Used for large geographical areas with respondents spread out.
- Involves multiple sampling stages.
- Advantages: Avoids the practical difficulties of simple random sampling over a large, spread-out population.
- Disadvantages: Subjectivity may arise in how groups are selected at each stage.
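A Python sketch of two-stage sampling, the simplest form of multi-stage sampling: areas are selected at random first, then respondents are drawn by simple random sampling within each selected area. The function name and the frame of 10 towns with 100 residents each are hypothetical; real designs may add more stages.

```python
import random

def two_stage_sample(areas: dict[str, list[str]], num_areas: int,
                     per_area: int, seed: int = 0) -> list[str]:
    """Stage 1 selects areas at random; stage 2 samples respondents within them."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(areas), num_areas)            # stage 1: areas
    respondents = []
    for area in selected:
        respondents.extend(rng.sample(areas[area], per_area))  # stage 2: people
    return respondents

# Hypothetical frame: 10 towns with 100 residents each.
towns = {f"Town {t}": [f"Town {t} / Resident {r}" for r in range(1, 101)]
         for t in range(1, 11)}
respondents = two_stage_sample(towns, num_areas=3, per_area=20, seed=5)
print(len(respondents))  # 60 (3 towns x 20 residents)
```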

Non-Probability Sampling

Definition

Sampling methods in which the probability of selecting each unit in the population is not known. Used when generalizing the results to the whole population is not necessary.

Types of Non-Probability Sampling

Purposive Sampling:
- Selects respondents based on judgment of who can provide the best information.
Convenience Sampling:
- Selection based on availability of respondents during data collection.
Quota Sampling:
- Researcher sets a quota (a fixed number of respondents) for each subgroup and selects participants non-randomly until each quota is filled.
Snowball Sampling:
- Utilized when subjects are difficult to identify; recruits through referrals from known participants.

Research Design

Action Research

Used to investigate and solve a specific, localized problem.

Descriptive Research

Aims to understand the characteristics and aspects of a situation.

Explanatory Research

Seeks to explain relationships between two or more variables.

Exploratory Research

Investigates phenomena not well understood.

Correlational Research

Examines the significance of relationships between characteristics or factors.
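One common way to quantify a relationship in correlational research is Pearson's correlation coefficient r, which ranges from -1 to 1. The sketch below, an illustration rather than a method named in the source, computes r for the study-time and exam-score pairs from the earlier example. The function name `pearson_r` is hypothetical, four pairs are far too few for real conclusions, and testing whether r is statistically significant is a further step not shown.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient for paired values x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

hours = [4, 3, 6, 2]       # time spent studying (hours)
scores = [84, 80, 95, 76]  # exam score
print(round(pearson_r(hours, scores), 3))  # 0.996
```

A value this close to 1 indicates a strong positive linear relationship in these four pairs.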

Evaluation Research

Assesses the impacts or outcomes of actions, policies, or programs.

Policy Research

Generates information relevant for policy development and assessment of impacts.

Ex-Post Facto or Causal-Comparative Research

Observes existing conditions and explores causal factors retrospectively.

Historical Research

Addresses problems arising from historical contexts using past data.

Ethnographic Research

Seeks holistic descriptions of phenomena through multiple data collection techniques.

Phenomenological Research

Begins with shared experiences and investigates effects through respondents' narratives.

Assessment Questions

Temperature reading:
- B. Interval data
Comparing number of girls to boys:
- D. Nominal data
Grade ranking of senior class:
- A. Ordinal data
Eye color of students:
- D. Nominal data
Grade percentage in Science test:
- C. Ratio data
Top 50 movies:
- A. Ordinal data
Jersey numbers of players:
- D. Nominal data
Most expensive cars:
- C. Ratio data
Weight of children:
- C. Ratio data
Waist measurement of contestants:
- C. Ratio data

Sample Size Estimation Questions:

Estimate the sample size for a population of 5,000 students using a 5% margin of error.
Determine the sample size needed for a target population of 600 psychometricians using a 1% margin of error.
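Worked answers to the two estimation questions, applying Slovin's formula $n = \frac{N}{1 + Ne^2}$. Rounding up to the next whole respondent is an assumed convention, since the formula can return a fraction.

$$
n = \frac{5000}{1 + 5000(0.05)^2} = \frac{5000}{1 + 12.5} = \frac{5000}{13.5} \approx 370.37 \quad\Rightarrow\quad n = 371
$$

$$
n = \frac{600}{1 + 600(0.01)^2} = \frac{600}{1 + 0.06} = \frac{600}{1.06} \approx 566.04 \quad\Rightarrow\quad n = 567
$$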

Identifying Sampling Techniques:

Interviewing 20 friends for initial insights - Convenience Sampling.
Selecting 10 specialized neurosurgeons - Purposive Sampling.
Studying a rare disease with referrals - Snowball Sampling.
Surveying 50 out of 500 with random number generation - Simple Random Sampling.
Surveying student satisfaction proportionally across years - Stratified Random Sampling.
Surveying households in randomly selected city blocks - Cluster Sampling.