Foundations of Data, AI, and Sustainability — Study Notes
Data Foundations: Key Concepts, Types, and Applications
What is Data?
Definition overview from multiple sources:
Russell Dawson (2023, Data Analytics): Data is a piece of information that usually lacks context; when multiple data points are gathered together, we have raw data.
Oxford Dictionary of Data Science (2021): Data are facts and statistics collected together for reference or analysis.
Merriam-Webster (2025): Factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation.
O'Reilly Data Science Handbook (2018): Data are recorded information, often numeric, collected from observations, surveys, experiments, or digital traces, which can be analyzed to generate insights.
Raw data vs information:
RAW DATA: Numbers or data points without context; no meaning until processed.
INFORMATION: Meaningful data that provides context and can inform decisions.
DECISION-MAKING: Actions taken based on information (e.g., provide a remedial session).
Simple illustration of transformation:
RAW DATA: 87 90 78 65 92 73 90 76 89
INFORMATION: e.g., Average score = 82%
DECISION-MAKING: Implement remedial action based on the average.
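The raw data, information, and decision-making chain above can be sketched in a few lines of Python. This is a minimal illustration only. It assumes a hypothetical passing mark of 75 as the trigger for a remedial session, because the notes do not state the actual decision rule.

```python
# Raw data: the nine test scores listed above (no context yet).
raw_scores = [87, 90, 78, 65, 92, 73, 90, 76, 89]

# Information: summarize the raw data so it carries meaning.
average = sum(raw_scores) / len(raw_scores)

# Decision-making: flag scores for a remedial session.
# PASSING_MARK is a hypothetical cutoff chosen only for illustration.
PASSING_MARK = 75
needs_remedial = [s for s in raw_scores if s < PASSING_MARK]

print(f"Average score = {average:.0f}%")          # Average score = 82%
print(f"Scores below {PASSING_MARK}: {needs_remedial}")
```

The exact average is 740 / 9, about 82.2, which rounds to the 82% shown in the notes.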
Raw data definition (precise):
Raw data refers to a data point (or points) that has not yet been treated or processed.
Data vs. Information (illustrative example):
Data: CAT ROOF IS GRAY THE ON THE
Information: THE GRAY CAT IS ON THE ROOF
Why data is valuable for educators (contextual relevance):
Data as a resource informs teaching and learning decisions, assessment design, and resource allocation.
Data Scales
Data can be qualitative (non-numeric expressions, usually text) or quantitative (expressed in numerical values).
2.1 Qualitative (categorical) data types:
Nominal data: Categories with no inherent order.
Ordinal data: Categories with a meaningful order but not necessarily equal spacing.
2.1 Qualitative data types – Nominal data:
Definition: Labels or names with no ranking.
Education example: Subject enrolled (Math, English, Science).
Sustainability example: Waste type (Plastic, Paper, Organic).
2.1 Qualitative data types – Ordinal data:
Definition: Categories with ranking but not equal spacing.
Education example: Student rating of teacher (Poor, Fair, Good, Excellent).
Sustainability example: Air quality index categories (Good, Moderate, Unhealthy).
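The practical difference between nominal and ordinal data is which operations make sense. The sketch below is a hedged illustration using hypothetical responses: nominal labels can only be counted (so the mode is the natural summary), while ordinal labels can also be sorted, so a median rank is meaningful. The numbers 1 to 4 attached to the ratings are ranks only, not equal-sized steps.

```python
from collections import Counter
from enum import IntEnum

# Nominal: labels with no order, so we can only count them.
subjects = ["Math", "English", "Science", "Math", "Science", "Math"]  # hypothetical
subject_counts = Counter(subjects)
most_common_subject = subject_counts.most_common(1)[0][0]  # the mode

# Ordinal: categories with a ranking. IntEnum records the order
# (Poor < Fair < Good < Excellent), but the values 1-4 are ranks,
# not equal intervals, so averaging them is not meaningful.
class Rating(IntEnum):
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4

ratings = [Rating.GOOD, Rating.EXCELLENT, Rating.FAIR, Rating.GOOD, Rating.POOR]  # hypothetical
ordered = sorted(ratings)
median_rating = ordered[len(ordered) // 2]  # middle rank of an odd-length list

print(most_common_subject)   # Math
print(median_rating.name)    # GOOD
```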
2.2 Quantitative (numerical) data types:
Data expressed as numbers, which supports mathematical operations; examples include test scores, study hours, and age.
Quantitative data is divided into ratio data and interval data.
2.2 Quantitative data types – Ratio data:
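Definition: Numeric data with equal spacing and a true zero (zero means none of the quantity), so ratios are meaningful (e.g., 20 study hours per week is twice as much as 10).
Context examples: Education (Test Scores 0-100, Study Hours per Week, Number of Students), Sustainability (Electricity Use in kWh, Water Consumption in liters/day, Carbon Emissions in tons CO2), Everyday Life (Age in years, Weight in kg, Income in currency units).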
Example values:
Test Scores: 0, 45, 72, 100
Study Hours/Week: 0, 5, 10, 20
Number of Students: 0, 10, 25, 50
Electricity Use (kWh): 0, 100, 250, 400
Water Consumption (liters/day): 0, 20, 75, 150
Carbon Emissions (tons CO2): 0, 1.5, 3.0, 6.0
Age (years): 0, 5, 15, 30
Weight (kg): 0, 50, 75, 100
Income (currency units): 0, 10,000, 20,000, 50,000
2.2 Quantitative data types – Interval data:
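Definition: Numeric data with equal spacing but no true zero, so differences are meaningful but ratios are not (e.g., 20°C is not twice as warm as 10°C).
Context examples: IQ Scores (e.g., 90, 100, 110, 120), Year of Study (e.g., 2010, 2015, 2020), Test Scores (Scaled: 200, 250, 300), Temperature (Celsius): 0°C, 10°C, 20°C, Calendar Dates: 1990, 2000, 2020, Air Quality Index (AQI): 50, 100, 150.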
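The ratio vs. interval distinction can be checked numerically. In this sketch (an illustration, not taken from the slides), electricity use in kWh is a ratio scale, so dividing values is meaningful, while Celsius temperature is an interval scale, so only differences are. Converting to Kelvin, which does have a true zero, shows that 20°C is only about 3.5% "more temperature" than 10°C, not twice as much.

```python
# Ratio data (true zero): ratios are meaningful.
electricity_kwh = [100, 250, 400]  # values from the ratio-data examples
print(electricity_kwh[2] / electricity_kwh[0])  # 4.0, i.e. four times as much

# Interval data (no true zero): differences are meaningful, ratios are not.
celsius_a, celsius_b = 10.0, 20.0
difference = celsius_b - celsius_a   # 10 degrees warmer: meaningful
naive_ratio = celsius_b / celsius_a  # 2.0, but NOT "twice as hot"

def to_kelvin(celsius):
    """Kelvin has a true zero, so it is a ratio scale for temperature."""
    return celsius + 273.15

true_ratio = to_kelvin(celsius_b) / to_kelvin(celsius_a)
print(difference, naive_ratio, round(true_ratio, 3))
```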
Data Set Types
A data set is a collection of data arranged in a table, where each column holds one variable of a specific data type and each row holds one observation.
3.1 Numerical data set: a data set whose values are numbers, so it supports mathematical calculations (e.g., totals, averages).
3.2 Categorical data set: a data set whose values are words or labels that establish a category or characteristic (e.g., gender, species, color).
3.3 Bivariate data set: a data set with two variables (e.g., the age and height of children in a class), used to explore the relationship between them.
3.4 Multivariate data set: a data set with more than two variables, e.g., Student ID, Study Hours/Week, Test Score (%), Attendance (Days), Device, and Carbon Emissions (kg CO2/Month), recorded for each observation.
3.5 Correlation data set: a data set that shows a relationship between variables, indicating how strongly they tend to change together. A correlation alone does not prove that one variable causes the other.
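To make the idea of a correlation data set concrete, the sketch below computes Pearson's correlation coefficient r for a small bivariate data set (study hours per week vs. test score). The values are hypothetical, and Pearson's r is one common measure chosen here for illustration; the notes do not name a specific method. Values of r near +1 or -1 indicate a strong linear relationship; values near 0 indicate a weak one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a variable with no variation has no defined correlation")
    return cov / (sx * sy)

# Hypothetical bivariate data: study hours per week vs. test score (%).
study_hours = [2, 5, 8, 10]
test_scores = [60, 72, 80, 85]

r = pearson_r(study_hours, test_scores)
print(round(r, 2))  # 0.99: a strong positive relationship in this made-up sample
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.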
Practical implication: different data types and datasets guide what questions we can ask and what analytical approaches are appropriate.
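As a rough illustration of how data types shape the analysis, this sketch stores a small multivariate data set with the variables listed in 3.4 and summarizes the numerical columns with a mean and the categorical Device column with counts. All values and the snake_case column names are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical multivariate data set: each row is one observation (a student),
# each column is one variable with a single data type.
rows = [
    {"student_id": "S01", "study_hours": 5,  "test_score": 72, "attendance_days": 40, "device": "Laptop",     "carbon_kg": 12.5},
    {"student_id": "S02", "study_hours": 10, "test_score": 85, "attendance_days": 44, "device": "Smartphone", "carbon_kg": 6.0},
    {"student_id": "S03", "study_hours": 2,  "test_score": 60, "attendance_days": 35, "device": "Smartphone", "carbon_kg": 5.5},
    {"student_id": "S04", "study_hours": 8,  "test_score": 80, "attendance_days": 42, "device": "Laptop",     "carbon_kg": 11.0},
]

numeric_columns = ["study_hours", "test_score", "attendance_days", "carbon_kg"]
categorical_columns = ["device"]

# Numerical columns support arithmetic such as the mean...
for col in numeric_columns:
    print(col, mean(r[col] for r in rows))

# ...while categorical columns are summarized by counting categories.
for col in categorical_columns:
    print(col, Counter(r[col] for r in rows))

# Selecting two columns turns the table into a bivariate data set.
pairs = [(r["study_hours"], r["test_score"]) for r in rows]
```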
Data Problem
Overview: After identifying the data types, the next step is to define the data problem: what question the data should answer, how the data will help answer it, and what alternative approaches exist.
4.1 Problem statement and goal: defines the problem and the objective of the analysis.
4.1.1 S.M.A.R.T. criteria:
Specific: What exactly am I trying to solve? What is the impact and scope? Example: How can teacher education students at PNU Visayas reduce paper usage in their coursework by 30% within one semester while ensuring equal access for male and female students? (Target: a 30% reduction within one semester.)
Measurable: Can the problem be measured? What data will be collected to ensure an objective answer? Example data could include: Number of printed vs. digital submissions; Survey responses on student access to technology (disaggregated by gender).
Achievable: Based on the data available, is the objective attainable? What information is needed to reach a conclusion? Example: Most students already use digital platforms (Google Classroom, LMS, email); library/ICT office support can ensure accessibility.
Relevant: Why is the analysis important? What change or decision will result? Example: Reducing paper waste supports sustainability, prepares teachers for digital tools, and ensures gender-fair access.
Timely: Can the analysis be completed within the needed timeframe? How much time is available? Example: One semester (16 weeks) to inform policies for the next academic year.
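The Measurable criterion can be made concrete with a small calculation. This sketch assumes hypothetical counts of printed submissions at the start and end of the semester, disaggregated by gender as the example suggests, and checks each group against the 30% target.

```python
def percent_reduction(baseline, current):
    """Percentage drop from baseline to current (positive = reduction)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline * 100

TARGET = 30.0  # the 30% reduction goal from the problem statement

# Hypothetical counts of printed submissions, disaggregated by gender.
printed = {
    "female": {"baseline": 400, "end_of_semester": 260},
    "male":   {"baseline": 300, "end_of_semester": 225},
}

for group, counts in printed.items():
    pct = percent_reduction(counts["baseline"], counts["end_of_semester"])
    status = "met" if pct >= TARGET else "not met"
    print(f"{group}: {pct:.1f}% reduction, target {status}")
```

With these made-up numbers one group reaches the target and the other does not, which is exactly the kind of gap that disaggregated data is meant to reveal.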
4.1.2 Four Ws:
What am I looking for?
Who will benefit from this information?
When (timeframe) did this take place?
Where can this analysis be applied?
Data Literacy
Definition: The ability to read, understand, create, and communicate data as information.
Example: A teacher notices rural barangays consistently submit assignments late. By analyzing submission timestamps and internet access data, she discovers connectivity is a major barrier.
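The first step of the teacher's analysis, turning submission timestamps into a late-submission rate per area, might look like the sketch below. The submission log is hypothetical, and the sketch covers only the timestamp side; linking the result to internet access data would be a further step.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical submission log: (area, deadline, submitted_at).
submissions = [
    ("rural", "2025-03-10 17:00", "2025-03-11 08:30"),
    ("rural", "2025-03-10 17:00", "2025-03-10 16:45"),
    ("rural", "2025-03-17 17:00", "2025-03-18 09:10"),
    ("urban", "2025-03-10 17:00", "2025-03-10 12:00"),
    ("urban", "2025-03-17 17:00", "2025-03-17 18:05"),
    ("urban", "2025-03-17 17:00", "2025-03-16 20:00"),
]

FMT = "%Y-%m-%d %H:%M"
late = defaultdict(int)
total = defaultdict(int)

for area, deadline, submitted in submissions:
    total[area] += 1
    # A submission is late if its timestamp is after the deadline.
    if datetime.strptime(submitted, FMT) > datetime.strptime(deadline, FMT):
        late[area] += 1

late_rate = {area: late[area] / total[area] for area in total}
print(late_rate)
```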
Artificial Intelligence (AI)
Definition: AI refers to systems that simulate human intelligence—learning, reasoning, problem-solving.
Examples: Machine learning, natural language processing, computer vision, robotics.
Potential benefits in education (6.1 AI for Future-ready teaching):
✓ Personalized learning paths for diverse learners.
✓ Real-time data insights for instructional decisions.
✓ AI-assisted curriculum design and content creation.
AI and Sustainability in Education (6.2):
Definition: Strategic use of intelligent technologies to create learning environments that are adaptive, inclusive, and ecologically responsible.
Implications: Personalize instruction, optimize resource use, promote climate literacy; ensure practices meet present needs without compromising the future.
Data Sources and Reliability (Lesson 2)
1. Data Sources
Data can be collected from almost anywhere, whether stored physically or electronically; access and analysis are possible across formats.
There are two main data categories: quantitative (numbers) and qualitative (non-numeric formats like text, images, graphs).
Data sources can be primary or secondary:
Primary data: collected directly by the researcher/organization.
Secondary data: data originally collected by someone else and reused.
The quality of data is essential since conclusions depend on it.
1.1 Types of Data Sources
First-party data: data collected directly by you or your organization.
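Second-party data: data collected by another organization directly from its own audience (that organization's first-party data) and then shared with or sold to you.
Third-party data: data provided by external sources that did not necessarily collect it directly; it may be rented or sold, has no guaranteed connection to your organization, and is generally considered less reliable than second-party data.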
Reliable sources lay the foundation for credible research and informed teaching: they help avoid misinformation and bias and support inclusive, ethical, and sustainable decisions.
1.2 Reliable Source – Criteria
Criteria for evaluating reliability:
Accuracy: Are facts supported by evidence? Look for citations, data, and peer-reviewed references.
Completeness: Is the data complete (no missing information) for the analysis?
Reliability: Can you trust the data? Are there checks for bias or data quality?
Relevance: Is the data relevant to the problem or analysis at hand?
Timeliness: When was the data collected? Is it still applicable?
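Accuracy, reliability, and relevance mostly call for human judgment, but completeness and timeliness lend themselves to simple automated checks. The sketch below is one possible approach, assuming hypothetical survey records and a hypothetical rule that data older than five years is out of date; the five-year threshold and the 365-day year are illustrative choices, not standards.

```python
from datetime import date

def completeness(records, fields):
    """Share of required fields that are filled in (0.0 to 1.0)."""
    cells = [r.get(f) for r in records for f in fields]
    filled = sum(1 for v in cells if v not in (None, ""))
    return filled / len(cells) if cells else 0.0

def is_timely(collected_on, today, max_age_years=5):
    """True if the data is no older than max_age_years (hypothetical rule, 365-day years)."""
    age_days = (today - collected_on).days
    return age_days <= max_age_years * 365

# Hypothetical survey records with one missing value.
records = [
    {"student_id": "S01", "gender": "F", "has_device": "yes"},
    {"student_id": "S02", "gender": "M", "has_device": ""},
]
fields = ["student_id", "gender", "has_device"]

print(completeness(records, fields))                  # 5 of 6 cells filled
print(is_timely(date(2019, 6, 1), date(2025, 6, 1)))  # False under a 5-year rule
```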
1.3 Where to Find Reliable Sources
Academic Databases: JSTOR, Google Scholar, ERIC, ScienceDirect
Example search: "AI in Philippine education" on ERIC for peer-reviewed studies
Government & Institutional Websites: DepEd, CHED, Philippine Commission on Women, UNESCO, UNDP
Example: Use DepEd’s reports for curriculum data or GAD integration
University Publications: Thesis repositories, faculty research, open-access journals
Example: UP Diliman education research portal
Reputable News Outlets: Rappler, Philippine Daily Inquirer, CNN Philippines (use to check current events)
Tip: Cross-check facts with other sources to avoid media bias
Open Educational Resources (OER): OER Commons, Merlot, DepEd Commons
Example: Use OER for sustainability modules with localized content
References
Dawson, R. (2023). Fundamentals of analytics: Learn essential skills, embrace the future, and catapult your career in a data-driven world.
Southern Methodist University. (2025, January 9). How artificial intelligence in education is changing schools. SMU Learning Sciences Blog.
Theobald, O. (2019). Data analytics for absolute beginners: A deconstructed guide to data literacy (2nd ed.). Scatterplot Press.