Section B → Types of Data, Preparation, Collection 📊📈📉

Data Preparation 📊

Raw Data: Before analysis, raw data should be cleaned by removing outliers and fixing errors.
Think of the acronym CLEAR:
- C: Check for Outliers – Identify and remove extreme values that might skew the data.
- L: Look for Missing Data – Ensure all necessary data is included or appropriately handled.
- E: Examine for Errors – Check for input mistakes or inaccuracies.
- A: Assess for Consistency – Ensure data is uniform and follows the same standards or format.
- R: Refine the Dataset – Finalize the cleaning process by making sure the data is ready for analysis.
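
The CLEAR steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`name`, `height_cm`), and the 1.5 × IQR rule for spotting outliers are all hypothetical choices that do not come from these notes, and dropping rows is just one of several ways to handle missing or wrong values.

```python
from statistics import quantiles

def clean_heights(records):
    """Apply the CLEAR steps to a list of {'name', 'height_cm'} records."""
    # L: handle missing data (here: drop records with no height)
    rows = [r for r in records if r["height_cm"] is not None]
    # E: remove obvious input errors (a height cannot be zero or negative)
    rows = [r for r in rows if r["height_cm"] > 0]
    # A: make the name format consistent
    rows = [{"name": r["name"].strip().title(), "height_cm": r["height_cm"]}
            for r in rows]
    # C: find outliers with the 1.5 * IQR rule (an assumed rule of thumb)
    q1, _, q3 = quantiles([r["height_cm"] for r in rows], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # R: the refined dataset keeps only values inside the fences
    return [r for r in rows if low <= r["height_cm"] <= high]

# Hypothetical raw data with one gap, one sign error, and one extreme value
raw = [
    {"name": " alice ", "height_cm": 162.0},
    {"name": "BOB", "height_cm": 175.0},
    {"name": "cara", "height_cm": None},
    {"name": "dev", "height_cm": -170.0},
    {"name": "eli", "height_cm": 168.0},
    {"name": "fay", "height_cm": 171.0},
    {"name": "gus", "height_cm": 1650.0},
    {"name": "hana", "height_cm": 165.0},
]
cleaned = clean_heights(raw)
print([r["name"] for r in cleaned])
```

In practice, a flagged value is worth checking before it is deleted, because an extreme value can be a genuine measurement rather than a mistake.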

Types of Data 📈

Categorical Data: Can be nominal (no natural order, e.g., colors) or ordinal (has a meaningful order, e.g., rankings).
Bivariate vs. Multivariate:
- Bivariate Data: Involves two variables (e.g., height vs. weight).
- Multivariate Data: Involves more than two variables (e.g., height, weight, age).

Data Collection Methods 📉

Experiments:
- Laboratory: Conducted in a controlled setting, with variables kept under the researcher's control.
  - e.g.: Scientists test the effects of a new medicine on cell growth in petri dishes (a biology scenario).
- Field: Done in natural environments.
  - e.g.: Observing wild lions to understand their hunting patterns.
- Natural: The researcher has no control over variables.
  - e.g.: Studying the impact of a hurricane on local businesses without manipulating any variables.
Other Methods:
- Simulation: Uses models to replicate real-world processes.
  - e.g.: Using computer models to simulate city traffic flow in order to determine the best placement for traffic lights (with the aim of reducing traffic as much as possible).
- Questionnaires: Structured sets of questions for data collection.
  - e.g.: Surveying students about their study habits and stress levels to find a relationship between them, with questions on topics such as hours spent studying, stress levels, and irregular sleeping patterns.
- Observation: Collecting data by observing subjects directly.
  - e.g.: Researchers noting how often students raise their hands or participate in discussions, as a measure of how they respond to different teaching methods.
- Reference: Using existing data sources (e.g., books, articles).
  - e.g.: Researchers analyzing changes in demographic factors such as age, income, and highest education level.
- Census: Data collected from every member of a population.
  - e.g.: The French government collecting data on the number of people living in each home, their ages, and their employment status.
- Population: The entire group under study.
  - e.g.: A law firm's entire workforce totals 127 people. Each employee counts as part of the population.
- Sampling: Selecting a subset of the population for analysis.
  - e.g.: Sampling 100 students from a high school to analyse academic performance. Depending on the sampling method, the sample of 100 students can potentially represent the larger population of 1,000 students (the entire student body).

Sampling Techniques

Judgment Sampling: Can introduce bias because the selection is based on the researcher's judgment.

Opportunity (Convenience) Sampling: Also prone to bias, because it selects whoever is easiest to reach rather than drawing from the entire population.

Stratification: Divides the population into subgroups (strata) and samples from each one to ensure representation of different segments.
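
As a rough sketch of how stratification might work in practice, the Python below draws a proportionate stratified sample of 100 from a made-up school of 1,000 students. The year-group sizes, the `stratified_sample` helper, and the fixed random seed are all assumptions for illustration. Each stratum contributes in proportion to its size, so a year group with 30% of the students supplies 30% of the sample.

```python
import random

def stratified_sample(population, key, sample_size, seed=0):
    """Draw a proportionate stratified sample from a list of dicts."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Group the population into strata using the chosen field
    strata = {}
    for person in population:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        # Stratum share of the sample = stratum share of the population.
        # Rounding can make the total differ slightly from sample_size
        # when the shares are not whole numbers.
        k = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical school of 1,000 students in four year groups
sizes = {"Year 9": 300, "Year 10": 280, "Year 11": 240, "Year 12": 180}
students = [{"id": f"{year}-{i}", "year": year}
            for year, n in sizes.items() for i in range(n)]

sample = stratified_sample(students, key="year", sample_size=100)
counts = {year: sum(s["year"] == year for s in sample) for year in sizes}
print(counts)
```

Within each stratum the students are still picked at random, which is what separates this approach from judgment or opportunity sampling.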

Reliability vs. Validity 📊

Reliability: How consistently a method measures something. If repeated, results should be the same or similar.

Validity: Refers to how well a method measures what it is intended to measure.
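
One way to see the difference is with repeated readings of the same object on two weighing scales. The sketch below is a simplified illustration with invented numbers: it treats a small spread between repeated readings as a sign of reliability, and a small gap between the average reading and a known true value as a stand-in for validity. Real validity checks are broader than this single comparison.

```python
from statistics import mean, stdev

TRUE_MASS_KG = 50.0  # assumed known reference mass

# Hypothetical repeated readings of the same mass on two scales
scale_a = [52.0, 52.1, 51.9, 52.0, 52.0]  # consistent, but about 2 kg too high
scale_b = [47.0, 53.5, 49.0, 52.5, 48.0]  # right on average, but erratic

def summarise(readings):
    spread = stdev(readings)              # smaller spread = more reliable
    bias = mean(readings) - TRUE_MASS_KG  # bias near 0 = closer to the true mass
    return spread, bias

spread_a, bias_a = summarise(scale_a)
spread_b, bias_b = summarise(scale_b)
print(f"Scale A: spread {spread_a:.2f} kg, bias {bias_a:+.2f} kg")
print(f"Scale B: spread {spread_b:.2f} kg, bias {bias_b:+.2f} kg")
```

Scale A is reliable but does not measure the true mass, while scale B is right on average but unreliable; a good method needs both qualities.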