Foundations of Data, AI, and Sustainability — Study Notes

Data Foundations: Key Concepts, Types, and Applications

What is Data?

  • Definition overview from multiple sources:

    • Russell Dawson (2023, Data Analytics): Data is a piece of information that usually lacks context; when multiple data points are gathered together, we have raw data.

    • Oxford Dictionary of Data Science (2021): Data are facts and statistics collected together for reference or analysis.

    • Merriam-Webster (2025): Factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation.

    • O'Reilly Data Science Handbook (2018): Data are recorded information, often numeric, collected from observations, surveys, experiments, or digital traces, which can be analyzed to generate insights.

  • Raw data vs information:

    • RAW DATA: Numbers or data points without context; no meaning until processed.

    • INFORMATION: Meaningful data that provides context and can inform decisions.

    • DECISION-MAKING: Actions taken based on information (e.g., provide a remedial session).

  • Simple illustration of transformation:

    • RAW DATA: 87 90 78 65 92 73 90 76 89

    • INFORMATION: e.g., Average score ≈ 82% (740 ÷ 9 ≈ 82.2)

    • DECISION-MAKING: Implement remedial action based on the average.

  • Raw data definition (precise):

    • Raw data refers to a data point (or points) that has not yet been treated or processed.

  • Data vs. Information (illustrative example):

    • Data: CAT ROOF IS GRAY THE ON THE

    • Information: THE GRAY CAT IS ON THE ROOF

  • Why data is valuable for educators (contextual relevance):

    • Data as a resource informs teaching and learning decisions, assessment design, and resource allocation.
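
The raw data → information → decision-making chain illustrated above can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed method: the scores are the nine values from the notes, while the cut-off of 75 used to flag students for a remedial session is a hypothetical threshold chosen only for illustration.

```python
from statistics import mean

# Raw data: the nine scores listed in the notes, with no context attached.
raw_scores = [87, 90, 78, 65, 92, 73, 90, 76, 89]

# Information: a summary that gives the numbers meaning.
average = mean(raw_scores)

# Decision-making: the cut-off of 75 is a hypothetical threshold,
# not a value given in the notes.
REMEDIAL_CUTOFF = 75
needs_remedial = [score for score in raw_scores if score < REMEDIAL_CUTOFF]

print(f"Average score: {average:.1f}%")
print(f"Scores below {REMEDIAL_CUTOFF}: {needs_remedial}")
```

The average alone can hide individual low scores, which is why the sketch also lists the scores below the assumed cut-off before any decision is made.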

Data Scales

  • Data can be qualitative (non-numeric expressions, usually text) or quantitative (expressed in numerical values).

  • 2.1 Qualitative (categorical) data types:

    • Nominal data: Categories with no inherent order.

    • Ordinal data: Categories with a meaningful order but not necessarily equal spacing.

  • 2.1 Qualitative data types – Nominal data:

    • Definition: Labels or names with no ranking.

    • Education example: Subject enrolled (Math, English, Science).

    • Sustainability example: Waste type (Plastic, Paper, Organic).

  • 2.1 Qualitative data types – Ordinal data:

    • Definition: Categories with ranking but not equal spacing.

    • Education example: Student rating of teacher (Poor, Fair, Good, Excellent).

    • Sustainability example: Air quality index categories (Good, Moderate, Unhealthy).

  • 2.2 Quantitative (numerical) data types:

    • Numeric data expressed in numerals; examples include test scores, study hours, age, etc.

    • The slides present example values as sequences of numbers, as listed under the ratio and interval items below.

  • 2.2 Quantitative data types – Ratio data:

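    • Definition: Numeric data with equal spacing and a true zero point, so ratios are meaningful (e.g., 20 study hours is twice as many as 10).

    • Context examples: Education (Test Scores 0-100, Study Hours per Week, Number of Students), Sustainability (Electricity Use in kWh, Water Consumption in liters/day, Carbon Emissions in tons CO2), Everyday Life (Age in years, Weight in kg, Income in currency units).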

    • Example values:

    • Test Scores: 0, 45, 72, 100

    • Study Hours/Week: 0, 5, 10, 20

    • Number of Students: 0, 10, 25, 50

    • Electricity Use (kWh): 0, 100, 250, 400

    • Water Consumption (liters/day): 0, 20, 75, 150

    • Carbon Emissions (tons CO2): 0, 1.5, 3.0, 6.0

    • Age (years): 0, 5, 15, 30

    • Weight (kg): 0, 50, 75, 100

    • Income: 0, 10,000, 20,000, 50,000

  • 2.2 Quantitative data types – Interval data:

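    • Definition: Numeric data with equal spacing but no true zero point, so differences are meaningful while ratios are not (e.g., 20°C is 10 degrees warmer than 10°C, but not "twice as hot").

    • Context examples: IQ Scores (e.g., 90, 100, 110, 120); Year of Study (e.g., 2010, 2015, 2020); Test Scores (Scaled: 200, 250, 300); Temperature (Celsius): 0°C, 10°C, 20°C; Calendar Dates: 1990, 2000, 2020; Air Quality Index (AQI): 50, 100, 150.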
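
The four scales above differ in which summaries make sense. The sketch below illustrates this under stated assumptions: the waste-type and rating samples are made up for illustration, the study-hours values are the example values from the notes, and the choice of summary per scale (mode for nominal, median for ordinal, differences for interval, means and ratios for ratio data) follows the standard treatment of measurement scales rather than anything stated in the slides.

```python
from statistics import mean, mode

# Nominal: labels only, so the most frequent label (mode) is the usual summary.
waste_types = ["Plastic", "Paper", "Plastic", "Organic", "Plastic"]  # made-up sample
most_common_waste = mode(waste_types)

# Ordinal: categories have an order, so ranks can be sorted and a median taken.
RATING_ORDER = ["Poor", "Fair", "Good", "Excellent"]
ratings = ["Good", "Fair", "Excellent", "Good", "Poor"]  # made-up sample (odd count)
ranks = sorted(RATING_ORDER.index(r) for r in ratings)
median_rating = RATING_ORDER[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, ratios are not, because 0 °C does not
# mean "no temperature". Converting to Kelvin (a ratio scale) shows the gap.
celsius = [10.0, 20.0]
difference = celsius[1] - celsius[0]
naive_ratio = celsius[1] / celsius[0]  # 2.0, yet 20 °C is not "twice as hot"
kelvin_ratio = (celsius[1] + 273.15) / (celsius[0] + 273.15)

# Ratio: a true zero makes both means and ratios meaningful.
study_hours = [0, 5, 10, 20]  # example values from the notes
average_hours = mean(study_hours)

print(most_common_waste, median_rating)
print(f"Difference: {difference} °C, naive ratio: {naive_ratio}, Kelvin ratio: {kelvin_ratio:.3f}")
print(f"Average study hours: {average_hours}")
```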


Data Set Types

  • A data set is a collection of data arranged in a table where each column/row has a specific data type.

  • 3.1 Numerical data set: data points expressed as numbers, which support mathematical calculations. (The slide's wording, "data points that are words establishing a category of characteristic", fits a categorical data set rather than a numerical one.)

  • 3.2 Categorical data set: data focused on categories (e.g., gender, species, color) and textual descriptors.

  • 3.3 Bivariate data set: a dataset with exactly two variables (e.g., age and height of children in a class), used to explore the relationship between them.

  • 3.4 Multivariate data set: more than two variables (e.g., Student ID, Study Hours/Week, Test Score (%), Attendance (Days), Device, Carbon Emissions Used (kg CO2/Month)); example table demonstrates multiple variables across observations.

  • 3.5 Correlation dataset (spelled "Corelation" in the slides): a dataset that establishes a relationship between variables and indicates dependency between them.

  • Practical implication: different data types and datasets guide what questions we can ask and what analytical approaches are appropriate.
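
To make the bivariate, multivariate, and correlation data sets concrete, here is a minimal sketch that measures the relationship between two columns of a small table. The student records are invented for illustration, and `pearson_r` is a hypothetical helper name; the Pearson coefficient is one common way to quantify a linear relationship, not necessarily the measure the slides have in mind.

```python
from math import sqrt

# Hypothetical multivariate records (all values invented for illustration).
students = [
    {"id": "S01", "study_hours": 2, "test_score": 55, "attendance": 14},
    {"id": "S02", "study_hours": 5, "test_score": 68, "attendance": 17},
    {"id": "S03", "study_hours": 8, "test_score": 74, "attendance": 18},
    {"id": "S04", "study_hours": 12, "test_score": 88, "attendance": 20},
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Selecting two columns turns the multivariate table into a bivariate data set.
hours = [s["study_hours"] for s in students]
scores = [s["test_score"] for s in students]
r = pearson_r(hours, scores)
print(f"Pearson r between study hours and test score: {r:.2f}")
```

A value of r near +1 or -1 indicates a strong linear association; on its own it does not show that one variable causes the other.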

Data Problem

  • Overview: After identifying data types, the next step is to define the data problem and determine how the data will help answer the question and what alternative approaches exist.

  • 4.1 Problem statement and goal: defines the problem and the objective of the analysis.

  • 4.1.1 S.M.A.R.T criteria:

    • Specific: What exactly am I trying to solve? What is the impact and scope? Example: How can teacher education students at PNU Visayas reduce paper usage in their coursework by 30% within one semester while ensuring equal access for male and female students? Target: a 30% reduction within one semester.

    • Measurable: Can the problem be measured? What data will be collected to ensure an objective answer? Example data could include: Number of printed vs. digital submissions; Survey responses on student access to technology (disaggregated by gender).

    • Achievable: Based on the data available, is the objective attainable? What information is needed to reach a conclusion? Example: Most students already use digital platforms (Google Classroom, LMS, email); library/ICT office support can ensure accessibility.

    • Relevant: Why is the analysis important? What change or decision will result? Example: Reducing paper waste supports sustainability, prepares teachers for digital tools, and ensures gender-fair access.

    • Timely: Can the analysis be completed within the needed timeframe? How much time is available? Example: One semester (16 weeks) to inform policies for the next academic year.

  • 4.1.2 Four Ws:

    • What am I looking for?

    • Who will benefit from this information?

    • When (timeframe) did this take place?

    • Where can this analysis be applied?
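
The "Measurable" criterion in 4.1.1 becomes concrete once the 30% target is expressed as a calculation. The sketch below is one possible check, assuming paper use is tracked as a count of printed pages per semester; the page counts and the helper name `percent_reduction` are hypothetical, and only the 30% target comes from the problem statement.

```python
def percent_reduction(baseline, current):
    """Percentage drop from baseline to current (positive means a reduction)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline * 100


TARGET_REDUCTION = 30.0  # from the problem statement: 30% within one semester

# Hypothetical counts of printed pages submitted in each semester.
baseline_pages = 12000
current_pages = 8100

reduction = percent_reduction(baseline_pages, current_pages)
target_met = reduction >= TARGET_REDUCTION
print(f"Reduction: {reduction:.1f}% (target met: {target_met})")
```

The same function could be applied separately to male and female students' submissions to check that the reduction does not come at the cost of unequal access.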

Data Literacy

  • Definition: The ability to read, understand, create, and communicate data as information.

  • Example: A teacher notices that students from rural barangays consistently submit assignments late. By analyzing submission timestamps together with internet access data, she discovers that connectivity is a major barrier.
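
The teacher's analysis in the example above can be sketched as a comparison of late-submission rates across groups. Everything in this sketch is assumed for illustration: the deadline, the six records, and the helper name `late_rate_by` are hypothetical, and a real analysis would need far more data before drawing a conclusion.

```python
from collections import defaultdict
from datetime import datetime

DEADLINE = datetime(2025, 3, 14, 23, 59)  # hypothetical deadline

# Hypothetical records: (area, has stable internet, submission time)
submissions = [
    ("rural", False, datetime(2025, 3, 15, 9, 30)),
    ("rural", False, datetime(2025, 3, 16, 14, 5)),
    ("rural", True, datetime(2025, 3, 14, 20, 15)),
    ("urban", True, datetime(2025, 3, 14, 18, 40)),
    ("urban", True, datetime(2025, 3, 13, 11, 0)),
    ("urban", False, datetime(2025, 3, 15, 8, 45)),
]


def late_rate_by(records, key_index):
    """Share of late submissions for each value of the chosen field."""
    totals, late = defaultdict(int), defaultdict(int)
    for record in records:
        key = record[key_index]
        totals[key] += 1
        if record[2] > DEADLINE:
            late[key] += 1
    return {key: late[key] / totals[key] for key in totals}


by_area = late_rate_by(submissions, 0)      # grouped by rural/urban
by_internet = late_rate_by(submissions, 1)  # grouped by internet access
print("Late rate by area:", by_area)
print("Late rate by internet access:", by_internet)
```

In this made-up sample, lateness lines up more closely with internet access than with location, which mirrors the conclusion the teacher reaches in the example.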

Artificial Intelligence (AI)

  • Definition: AI refers to systems that simulate human intelligence—learning, reasoning, problem-solving.

  • Examples: Machine learning, natural language processing, computer vision, robotics.

  • Potential benefits in education (6.1 AI for Future-ready teaching):

    • ✓ Personalized learning paths for diverse learners.

    • ✓ Real-time data insights for instructional decisions.

    • ✓ AI-assisted curriculum design and content creation.

  • AI and Sustainability in Education (6.2):

    • Definition: Strategic use of intelligent technologies to create learning environments that are adaptive, inclusive, and ecologically responsible.

    • Implications: Personalize instruction, optimize resource use, promote climate literacy; ensure practices meet present needs without compromising the future.

Data Sources and Reliability (Lesson 2)

1. Data Sources

  • Data can be collected from almost anywhere, whether stored physically or electronically; access and analysis are possible across formats.

  • There are two main data categories: quantitative (numbers) and qualitative (non-numeric formats like text, images, graphs).

  • Data sources can be primary or secondary:

    • Primary data: collected directly by the researcher/organization.

    • Secondary data: data originally collected by someone else and reused.

  • The quality of data is essential since conclusions depend on it.

1.1 Types of Data Sources

  • First-party data: data collected directly by you or your organization.

  • Second-party data: data collected by another entity and shared directly with you; it is effectively that entity's first-party data.

  • Third-party data: data collected by external sources that may have no direct connection to your organization; it may be rented or sold and is generally less reliable than second-party data.

  • Reliable sources lay the foundation for credible research and informed teaching: they help avoid misinformation and bias, and they support inclusive, ethical, and sustainable decisions.

1.2 Reliable Source – Criteria

  • Criteria for evaluating reliability:

    • Accuracy: Are facts supported by evidence? Look for citations, data, and peer-reviewed references.

    • Completeness: Is the data complete (no missing information) for the analysis?

    • Reliability: Can you trust the data? Are there checks for bias or data quality?

    • Relevance: Is the data relevant to the problem or analysis at hand?

    • Timeliness: When was the data collected? Is it still applicable?
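
Two of the criteria above, completeness and timeliness, lend themselves to simple automated checks. The sketch below is illustrative only: the survey records, the two-year (730-day) freshness window, and the helper names `completeness` and `stale_rows` are all assumptions, and accuracy, reliability, and relevance still require human judgment.

```python
from datetime import date

# Hypothetical survey records; None marks a missing value.
records = [
    {"student": "S01", "device": "laptop", "collected": date(2024, 11, 4)},
    {"student": "S02", "device": None, "collected": date(2024, 11, 4)},
    {"student": "S03", "device": "phone", "collected": date(2021, 6, 18)},
]


def completeness(rows, field):
    """Fraction of rows where the field is present (not None)."""
    return sum(row[field] is not None for row in rows) / len(rows)


def stale_rows(rows, as_of, max_age_days):
    """Rows collected more than max_age_days before the as_of date."""
    return [row for row in rows if (as_of - row["collected"]).days > max_age_days]


device_completeness = completeness(records, "device")
# The 730-day window is an assumed freshness rule, not one from the notes.
outdated = stale_rows(records, as_of=date(2025, 1, 15), max_age_days=730)

print(f"Completeness of 'device': {device_completeness:.0%}")
print("Outdated records:", [row["student"] for row in outdated])
```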

1.3 Where to Find Reliable Sources

  • Academic Databases: JSTOR, Google Scholar, ERIC, ScienceDirect

    • Example search: "AI in Philippine education" on ERIC for peer-reviewed studies

  • Government & Institutional Websites: DepEd, CHED, Philippine Commission on Women, UNESCO, UNDP

    • Example: Use DepEd’s reports for curriculum data or GAD integration

  • University Publications: Thesis repositories, faculty research, open-access journals

    • Example: UP Diliman education research portal

  • Reputable News Outlets: Rappler, Philippine Daily Inquirer, CNN Philippines (use to check current events)

    • Tip: Cross-check facts with other sources to avoid media bias

  • Open Educational Resources (OER): OER Commons, Merlot, DepEd Commons

    • Example: Use OER for sustainability modules with localized content

References

  • Dawson, R. (2023). Fundamentals of analytics: Learn essential skills, embrace the future, and catapult your career in a data-driven world.

  • Southern Methodist University. (2025, January 9). How artificial intelligence in education is changing schools. SMU Learning Sciences Blog.

  • Theobald, O. (2019). Data analytics for absolute beginners: A deconstructed guide to data literacy (2nd ed.). Scatterplot Press.