Obtaining Data in Engineering Data Analysis

Introduction to Engineering Data Analysis

  • The primary focus of Module 1 is the process of obtaining data, which serves as the foundation for Engineering Data Analysis.
  • This module aims to provide a comprehensive understanding of data gathering techniques and the practicalities of conducting surveys and experiments.
  • Students, especially those from the STEM (Science, Technology, Engineering, and Mathematics) track in junior and senior high school, are expected to apply their prior research experience to the final outputs of this course.
  • The final assessment for the summer class is a complete engineering data analysis, which includes conducting surveys, testing, and applying various analytical methods.

Fundamental Concept of Data

  • Definition of Data: Data is defined as a systematic record of a particular quantity. It is a collection of facts and figures gathered for a specific purpose, such as a survey or a mathematical analysis.
  • The Purpose of Data Gathering: Data should never be collected without a clear intent or objective. Collecting data without a purpose (e.g., gathering arbitrary numbers in a classroom) results in wasted effort.
  • Organized Data and Information: The relationship between raw data and usable knowledge is summarized by the concept: Organized Data = Information. Once data is collected, it must be analyzed and organized to answer a specific hypothesis or fulfill the purpose of the study.
  • Data Categories: Data is broadly classified into two types: Qualitative and Quantitative.

Qualitative Data

  • Qualitative Data (Categorical Data): This describes data that fits into specific categories rather than numerical values. It is expressed in words or labels rather than numbers.
  • Examples of Categorical Variables:
    • Gender.
    • Hometown.
    • Wisdom.
    • Cleanliness.
    • Creativity.
  • Types of Qualitative Data:
    1. Nominal Data:
    • A type of qualitative information used to label variables without assigning numerical value.
    • It is often referred to as the nominal scale.
    • It cannot be ordered or measured in a hierarchical sense.
    • Examples include letters, symbols, words, and gender.
    • Analysis Method: Nominal data is analyzed using the grouping method. Data is categorized, and researchers calculate frequencies or percentages (e.g., finding that 20% of a group prefers a certain category).
    • Visualization: Nominal data is typically represented using pie charts to show the distribution of categories.
    2. Ordinal Data:
    • Unlike nominal data, ordinal data follows a natural order or hierarchy.
    • The values have a relative rank, but the exact difference between those ranks is not determined or fixed.
    • Example: A survey of fast-food chains like Jollibee, McDonald's (McDo), and KFC. If 50 out of 100 respondents choose Jollibee, it is ranked as number 1. While there is a clear hierarchy (Rank 1, Rank 2, Rank 3), the "distance" in quality between ranks is subjective and not mathematically fixed.
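
The grouping method for nominal data and the ranking step for ordinal data can be sketched in a few lines of Python. This is only an illustration: the 50-out-of-100 figure for Jollibee comes from the example above, while the counts for McDo and KFC and the helper names `frequency_table` and `rank_categories` are assumptions made up for the sketch.

```python
from collections import Counter

# Hypothetical survey: one entry per respondent's favorite chain.
# Only the Jollibee count (50 of 100) comes from the notes; the rest is assumed.
responses = ["Jollibee"] * 50 + ["McDo"] * 30 + ["KFC"] * 20

def frequency_table(values):
    """Grouping method: return {category: (count, percentage)}."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

def rank_categories(values):
    """Order categories from most to least chosen (rank 1 first)."""
    return [cat for cat, _ in Counter(values).most_common()]

table = frequency_table(responses)
ranking = rank_categories(responses)

for rank, cat in enumerate(ranking, start=1):
    count, pct = table[cat]
    print(f"Rank {rank}: {cat} ({count} responses, {pct:.0f}%)")
```

The ranks give an order only. Nothing in the output says that the gap between Rank 1 and Rank 2 equals the gap between Rank 2 and Rank 3, which is what separates ordinal data from numeric measurements.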

Quantitative Data

  • Quantitative Data: This type of data is measured and expressed numerically. Unlike qualitative data, it is not merely observed but quantified.
  • Characteristics: Quantitative data allows for mathematical operations such as addition and division (e.g., computing an average).
  • Types of Quantitative Data:
    1. Discrete Data:
    • This consists of a countable set of possible values, typically obtained by counting.
    • Discrete data usually involves whole numbers that cannot be broken into decimals in a real-world context.
    • Example: The number of students in a year level. You can have 350 students, but you cannot have 300.5 students.
    2. Continuous Data:
    • This data can take any value within a range, including decimal values, and is typically obtained by measurement.
    • It is the type of data most frequently encountered when conducting scientific experiments.
    • Statistical Applications: Continuous data is often subjected to statistical tests such as the t-test. The χ² (chi-square) test also appears frequently in research and theses, where it is most commonly applied to counts of categorical data.
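
A short Python sketch can make the discrete/continuous distinction concrete. The sample values and the helper `is_discrete` are assumptions for illustration, and the whole-number check is only a heuristic: a continuous measurement can happen to land on a whole number.

```python
from statistics import mean

# Hypothetical data: student counts per section (discrete) and
# measured rod lengths in centimeters (continuous).
students_per_section = [35, 40, 38, 42]
rod_lengths_cm = [12.48, 12.51, 12.47, 12.53]

def is_discrete(values):
    """Heuristic: treat a sample as discrete if every value is a whole number."""
    return all(float(v).is_integer() for v in values)

print(is_discrete(students_per_section))  # True
print(is_discrete(rod_lengths_cm))        # False

# Both kinds of quantitative data support arithmetic such as averaging.
print(mean(students_per_section))         # 38.75
print(mean(rod_lengths_cm))
```

The average of the discrete counts is 38.75, which is not itself a possible count. A summary statistic of discrete data does not have to be a whole number, even though each individual observation is.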

Data Collection and Hypothesis Testing

  • Data Collection: A methodical process of gathering and analyzing specific information to be subjected to hypothesis testing.
  • Hypothesis Testing: Data is collected to answer a specific question or test a claim.
    • The Research Question: For example, a researcher might ask about the difficulty of Engineering Data Analysis for first-year students.
    • The Hypothesis: A formal statement such as, "Engineering Data Analysis is a very hard subject."
    • Analysis: After gathering survey data, the researcher applies statistical tests (t-test, etc.) to decide whether to reject the hypothesis or fail to reject it.
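
As a minimal sketch of the analysis step, the one-sample t statistic can be computed with Python's standard library. Everything specific here is an assumption for illustration: the ten ratings are invented, the 1-to-5 difficulty scale is treated as numeric, the null value of 3.0 stands for "neither easy nor hard", and the critical value 2.262 is the standard two-sided table value for a 0.05 significance level with 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical difficulty ratings from 10 first-year students
# (1 = very easy, 5 = very hard).
ratings = [4, 5, 3, 4, 5, 4, 3, 5, 4, 4]

NEUTRAL_RATING = 3.0  # assumed null value: "neither easy nor hard"
T_CRITICAL = 2.262    # two-sided critical value, alpha = 0.05, df = 9

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (sample standard deviation / sqrt(n))."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error

t_stat = one_sample_t(ratings, NEUTRAL_RATING)
reject_null = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.2f}, reject null hypothesis: {reject_null}")
```

With these made-up ratings the statistic is about 4.71, which exceeds the critical value, so the null hypothesis of a neutral average rating would be rejected. A different sample size changes the degrees of freedom and therefore the critical value to look up.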

Primary vs. Secondary Data

  • Primary Data:
    • Raw data collected directly from the source.
    • It is original data collected by the researcher for a specific research purpose.
    • Sources include original surveys and experiments.
  • Secondary Data:
    • Secondhand data that has been previously collected and organized by someone else.
    • The current researcher is not the one who originally collected the data.
    • Examples include using datasets from existing studies or research papers.

Six Methods of Data Collection

  1. Literature Sources:
    • Data extracted from textbooks, government/private company reports, newspapers, magazines, and online published articles.
    • Classified as Secondary Data Collection.
  2. Surveys:
    • Information gathered through questionnaires.
    • Captures individual or group experiences regarding a specific subject.
    • Delivery Methods: Web-based questionnaires (e.g., Google Forms/GForms) or paper-based questionnaires.
  3. Interviews:
    • A more intensive engagement between the respondent and the interviewer.
    • Allows for follow-up questions and in-depth exploration that questionnaires may miss.
  4. Observations:
    • Gathering information by monitoring participants in a specific situation or environment.
    • Example: Testing the effectiveness of a skin ointment. A researcher observes the subject over a day or a week to see if there is improvement.
    • Often used to gather quantitative data in experimental settings.
  5. Documents and Records:
    • Involves tracking organized data over a period of time.
    • Examples include examining call logs, email logs, databases, minutes of meetings, and staff reports.
    • Specific Case: Medical records can track the health progress of an individual over a year of monthly check-ups.
    • Difference from Literature: Records are typically longitudinal and tracking-oriented.
  6. Experiments:
    • A research method used to establish a causal relationship between two or more variables.
    • It involves scientific and mathematical study to examine interactions within a controlled environment.

Six Major Steps in Planning and Conducting a Survey

  1. Establish the Purpose: Decide on the topic or idea. For example, ranking the best fast-food chains in the Philippines to promote a blog.
  2. Identify the Target Group: Determine who the respondents will be (e.g., residents within Metro Manila).
  3. Plan the Outreach Method: Decide how to distribute the survey. Online questionnaires such as Google Forms are often the most practical and accessible option.
  4. Design the Questions: Select the variables and limit the scope. For a fast-food survey, criteria might include the quality of fries, burgers, or chicken, as well as specific attributes like "crispiness" or "juiciness."
  5. Draft and Proofread: Create an initial version of the survey, then check it for errors and clarity before finalizing.
  6. Finalize and Submit: Print the questionnaire or finalize the digital link for distribution and eventual analysis.
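
Steps 4 and 5 can be sketched as a small Python structure that fixes the allowed answers for each question and checks incoming responses against it. The question names, answer options, and the helper `validate_response` are hypothetical, chosen to match the fast-food example rather than taken from any real survey tool.

```python
# Hypothetical questionnaire: each question maps to its allowed answers.
QUESTIONNAIRE = {
    "best_fries": ["Jollibee", "McDo", "KFC"],
    "best_chicken": ["Jollibee", "McDo", "KFC"],
    "chicken_crispiness": [1, 2, 3, 4, 5],  # assumed 1-to-5 rating scale
}

def validate_response(response, questionnaire=QUESTIONNAIRE):
    """Return a list of problems found in one respondent's answers."""
    problems = []
    for question, allowed in questionnaire.items():
        if question not in response:
            problems.append(f"missing answer: {question}")
        elif response[question] not in allowed:
            problems.append(
                f"invalid answer for {question}: {response[question]!r}"
            )
    return problems

good = {"best_fries": "McDo", "best_chicken": "Jollibee", "chicken_crispiness": 4}
bad = {"best_fries": "Wendy's", "best_chicken": "KFC"}

print(validate_response(good))  # []
print(validate_response(bad))
```

Limiting each question to a fixed set of options keeps the scope narrow, as step 4 suggests, and running a few test responses through the check is one way to catch errors before the survey is distributed.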