
KAND Lecture 1

Chapter 1: Examples of Data, Information, and Knowledge

  • Examples of raw data: "yes, no, no, yes, no", "8.5, 7.5", "123, 11.1"

  • Without context, data is useless

  • Adding meaning (context) to data turns it into information, which is useful and allows for further actions and analysis

  • Processing rules and inference rules can be applied to turn information into knowledge

  • Knowledge allows for decision-making and taking actions based on the information
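The data, information, knowledge progression above can be sketched in a few lines of Python. This is a minimal illustration under loudly stated assumptions: it treats the raw values "8.5, 7.5" as grades of two hypothetical students on a 1 to 10 scale and assumes a pass mark of 5.5. The student names, `PASS_MARK`, and the `infer_passed` rule are all hypothetical.

```python
# Raw data: numbers without context (values taken from the lecture's example)
raw_data = [8.5, 7.5]

# ASSUMPTION: the values are grades on a 1-10 scale for two hypothetical students.
# Attaching this context turns the raw numbers into information.
information = [
    {"student": "student_1", "grade": raw_data[0]},
    {"student": "student_2", "grade": raw_data[1]},
]

PASS_MARK = 5.5  # ASSUMPTION: hypothetical pass mark, not from the lecture


def infer_passed(record):
    """Hypothetical inference rule: a grade at or above the pass mark means 'passed'."""
    return record["grade"] >= PASS_MARK


# Knowledge: new facts derived by applying the rule to the information
knowledge = {r["student"]: infer_passed(r) for r in information}

# Action: decide which students to register as having passed
passed_students = [s for s, ok in knowledge.items() if ok]
print(knowledge)  # {'student_1': True, 'student_2': True}
```

Without the assumed context, the same two numbers support no rule and therefore no decision.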

Chapter 2: Contextualization and Interpretation

  • Contextualization increases the usefulness and interpretability of information

  • The more context given to data, the higher it goes in the knowledge pyramid

  • Interpretable information is more reusable

  • Knowledge involves more subjective and psychological aspects

  • Knowledge can be explicit (written down) or implicit (in someone's head)

  • Experts possess valuable knowledge that is difficult to formalize

Chapter 3: Formalizing Knowledge

  • Explicit knowledge is written down in rules, databases, schemas, etc.

  • Formal languages are used to represent explicit knowledge

  • Knowledge graphs are used to combine data, information, and formalized knowledge

  • The amount of written knowledge is limited compared to the knowledge in people's minds

  • Intuition and experience are challenging to capture in written form

Chapter 4: The Right Facts

  • Contextualizing data and information makes it more useful and valuable

    • Knowledge can be used to contextualize information

    • There is a two-way cycle between knowledge and information: information yields knowledge, and knowledge in turn contextualizes information

  • Data scientists spend a significant amount of time cleaning and organizing data

  • Using knowledge to automatically contextualize and interpret data is valuable from a business perspective

  • Formal knowledge allows for interpretation of data, making data science easier

  • Formal knowledge is written down and explicit, not the knowledge in the data scientist's head

  • Formal knowledge is not an alternative to machine learning; the two can work together

  • Knowledge graphs are a common way of writing down information, data, and knowledge

What are knowledge graphs?

  • Knowledge graphs are a way to represent data, information, and knowledge.

  • They are useful when dealing with heterogeneous data from different sources.

  • Knowledge graphs make the semantics and meaning of information explicit.

  • They are represented in a network-like structure.

Knowledge graphs vs. databases

  • Knowledge graphs are an alternative to (relational) databases.

  • Relational databases store data in tables with rows and columns, while knowledge graphs consist of nodes and edges.

  • The two models have fundamental differences.
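To make the contrast concrete, here is a minimal Python sketch (hypothetical data, plain tuples rather than a real database or RDF store) that rewrites table rows as (subject, predicate, object) edges. It shows one practical difference: a new kind of relation can be added to the graph without changing any table schema. The `table_to_triples` helper is hypothetical.

```python
# A relational-style table: fixed columns, one row per person (hypothetical data)
people_table = [
    {"id": "alice", "name": "Alice", "city": "Amsterdam"},
    {"id": "bob", "name": "Bob", "city": "Utrecht"},
]


def table_to_triples(rows, key="id"):
    """Turn each non-key cell into a (subject, predicate, object) edge."""
    triples = []
    for row in rows:
        subject = row[key]
        for column, value in row.items():
            if column != key:
                triples.append((subject, column, value))
    return triples


graph = table_to_triples(people_table)

# A graph accepts a new kind of edge without altering a schema;
# a table would need a new column (or a new table) for "knows".
graph.append(("alice", "knows", "bob"))

# Nodes are everything that appears as a subject or an object
nodes = {s for s, _, _ in graph} | {o for _, _, o in graph}
```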

Motivation for knowledge graphs

  • There is an increasing amount of data available online, such as medical, government, and museum data.

  • Data is spread out and shaped differently, creating silos.

  • Silos are geographically and semantically distributed.

  • Connecting these silos would allow for more comprehensive analysis and understanding.

  • Connecting silos is particularly important in domains like cultural heritage.

Chapter 5: Web of Data

  • Tim Berners-Lee invented the World Wide Web and wrote a document outlining his idea while working at CERN.

    • He proposed the idea of sharing information and documents between researchers.

    • His boss responded with the famous quote, "It's vague but exciting."

  • The first page of the document shows familiar elements of the World Wide Web, such as documents and links.

    • It allowed for the connection of remotely hosted documents.

    • It also included hierarchies, concepts, and relations between people and documents.

  • In 2001, Tim Berners-Lee recognized the need for a web of data (instead of a web of documents) and proposed the concept of the Semantic Web.

    • The current web consists of applications hosted on different locations, but there is no web of data.

    • The vision is to build a web of data that allows for the integration of different applications.

  • The analogy between a web of documents and a web of data is powerful.

    • The web of documents allows for easy linking to external sources of information.

    • It relieves the burden of having to know everything about a topic and allows for information reuse.

  • The web of data would provide users with connections and access to information from multiple contributors.

Introduction

  • The web of data is similar to the World Wide Web but with data instead of websites.

  • It involves using databases and data items instead of web pages.

  • The goal is to increase the usefulness of data and enable the reuse of existing data.

The Web of Data

  • The web of data is a network of data points or datasets.

  • Two challenges need to be resolved: integrating heterogeneous information and dealing with physical distribution.

  • The physical distribution is handled by the web, while the semantic integration is solved by knowing how to write down knowledge.

Linked Data and Semantic Web

  • Linked data is the idea of linking data sets from multiple sources.

  • Adding meaning to linked data results in the semantic web.

  • Knowledge graphs are the formalism used to build the Semantic Web.

Web of Data for Machines

  • The web of data is intended to be interpreted by machines.

  • Data is linked and comes from different sources.

  • The power of linked data and knowledge graphs is evident when dealing with heterogeneous data.

Data Integration and Challenges

  • Data integration becomes a challenge when data of different types, structured in different ways, comes from different sources.

  • Knowledge graphs and linked data help solve the challenges of connecting distributed information and writing down diverse information.
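As a minimal sketch of that idea, assume two hypothetical museum sources with different shapes (a CSV export and a nested JSON-like record) and different field names. Both can be mapped onto the same triple format and then merged with a plain set union. The source contents, field names, and the `from_csv`/`from_json` helpers are hypothetical.

```python
import csv
import io

# Source A: a tabular CSV export (hypothetical)
csv_source = "museum_id,title,artist\nobj1,The Night Watch,Rembrandt\n"

# Source B: a nested, JSON-like record with different field names (hypothetical)
json_source = {
    "record": {"identifier": "obj2", "label": "The Milkmaid", "creator": {"name": "Vermeer"}}
}


def from_csv(text):
    """Map CSV rows onto (subject, predicate, object) triples."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = row["museum_id"]
        triples.add((subject, "title", row["title"]))
        triples.add((subject, "creator", row["artist"]))
    return triples


def from_json(doc):
    """Map the nested record onto the same triple vocabulary."""
    rec = doc["record"]
    subject = rec["identifier"]
    return {(subject, "title", rec["label"]), (subject, "creator", rec["creator"]["name"])}


# Once both sources share one shape, integration is a set union
integrated = from_csv(csv_source) | from_json(json_source)
creators = sorted(o for _, p, o in integrated if p == "creator")
```

The hard part in practice is agreeing on the shared predicates ("title", "creator"); here they are simply chosen by hand.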

Building Knowledge Graphs

  • Tim Berners-Lee and others proposed four principles for building knowledge graphs.

Chapter 6: Data on the Web

Principle 1: Giving all things a name

  • The data provider gives names to the things it wants to talk about

  • Names depend on the domain and task at hand

  • Not all details need to be named, only relevant ones

Principle 2: The names are addresses on the web

  • Names can be addresses on the web: URIs (Uniform Resource Identifiers)

  • URIs provide globally unique identifiers for objects

  • Browsers or applications can follow URIs to find the object
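A web URI carries what an application needs to look a name up: the host says which server to ask, and the path says what to ask for. The sketch below uses Python's standard `urllib` only to split a DBpedia-style URI and to build (not send) a request. Asking for Turtle through the `Accept` header is common linked-data practice, but whether a particular server honours it is an assumption, not something this code checks.

```python
from urllib.parse import urlparse
from urllib.request import Request

# A DBpedia-style URI used as a global name for the city of Amsterdam
uri = "http://dbpedia.org/resource/Amsterdam"

parts = urlparse(uri)
print(parts.scheme)  # http
print(parts.netloc)  # dbpedia.org  (which server to ask)
print(parts.path)    # /resource/Amsterdam  (what to ask for)

# Build, but do not send, a request asking for a machine-readable
# representation (Turtle) instead of an HTML page.
req = Request(uri, headers={"Accept": "text/turtle"})
print(req.host, req.get_header("Accept"))  # dbpedia.org text/turtle
```

Actually following the URI would be `urllib.request.urlopen(req)`, which needs network access and is left out here.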

Principle 3: Relations form a graph between things

  • Relations can be established between different things

  • Relations create a network or graph data model

  • Adding relations allows for the creation of graphs
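A minimal sketch of how relations turn named things into a graph, assuming a hypothetical `http://example.org/` namespace and made-up facts. The triples are stored as an adjacency list, and following edges answers a simple network question: which things can be reached from a given node. The `reachable` helper is hypothetical.

```python
from collections import defaultdict

EX = "http://example.org/"  # hypothetical namespace for illustration

triples = [
    (EX + "Alice", EX + "knows", EX + "Bob"),
    (EX + "Bob", EX + "livesIn", EX + "Amsterdam"),
    (EX + "Alice", EX + "worksAt", EX + "Acme"),
]

# Adjacency list: each subject points to its (predicate, object) pairs
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))


def reachable(start):
    """All nodes reachable from start by following edges (depth-first search)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, obj in graph.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen
```

For example, `reachable(EX + "Alice")` returns Bob, Acme, and Amsterdam: Amsterdam is found only because the relation via Bob connects the two facts.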

Principle 4: Making the meaning (semantics) of things explicit

  • Emphasizes the importance of explicitly defining the meaning of things in knowledge graphs.

  • Adding semantics makes connections and relationships between entities more meaningful and understandable.

  • Semantics clarify the context and purpose of information stored in the knowledge graph.

  • Explicitly defining the meaning of things enables extraction of valuable insights and various data science tasks.

  • Allows for effective communication and utilization of knowledge within a web setting.
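One way to make meaning explicit is to describe a relation itself with RDF Schema terms. The sketch below is a hand-written approximation of two RDFS entailment rules (rdfs2 for `rdfs:domain`, rdfs3 for `rdfs:range`) over plain Python tuples, not a full RDFS reasoner. The `ex:` namespace and the `writtenBy` property are hypothetical; the `rdf:` and `rdfs:` URIs are the standard W3C ones.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
EX = "http://example.org/"  # hypothetical namespace

graph = {
    # Schema: the meaning of the hypothetical ex:writtenBy is stated explicitly
    (EX + "writtenBy", RDFS_DOMAIN, EX + "Book"),
    (EX + "writtenBy", RDFS_RANGE, EX + "Person"),
    # Data: a single fact that says nothing about types
    (EX + "MaxHavelaar", EX + "writtenBy", EX + "Multatuli"),
}


def apply_domain_range(triples):
    """One pass of the RDFS domain (rdfs2) and range (rdfs3) rules."""
    domains = {(s, o) for s, p, o in triples if p == RDFS_DOMAIN}
    ranges = {(s, o) for s, p, o in triples if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in triples:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))  # the subject belongs to the domain class
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, RDF_TYPE, cls))  # the object belongs to the range class
    return triples | inferred


result = apply_domain_range(graph)
```

Because the meaning of `writtenBy` is written down, any application can derive that Max Havelaar is a Book and Multatuli a Person without that being stated in the data.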

Distributed graphs

  • Graphs can be distributed across different sources

  • Different sources can have their own identifiers for the same concept

  • Relations between different things can be established across sources
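When two sources use different identifiers for the same thing, a link such as `owl:sameAs` (a standard OWL property) can state that they refer to the same entity. A minimal sketch, assuming two hypothetical sources under `a.example.org` and `b.example.org`: a union-find structure picks one representative URI per group of linked names, and the facts from both sources are merged onto it. The `merge_same_as` helper is hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

A = "http://a.example.org/"  # hypothetical source A
B = "http://b.example.org/"  # hypothetical source B

triples = {
    (A + "city/ams", A + "name", "Amsterdam"),          # fact from source A
    (B + "place/42", B + "country", B + "Netherlands"),  # fact from source B
    (A + "city/ams", OWL_SAME_AS, B + "place/42"),       # cross-source link
}


def merge_same_as(triples):
    """Rewrite every URI to one representative of its owl:sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    # Union the two sides of every sameAs link
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)

    # Keep the ordinary facts, now expressed with representative URIs
    return {(find(s), p, find(o)) for s, p, o in triples if p != OWL_SAME_AS}


merged = merge_same_as(triples)
```

After merging, both facts hang off a single node, so a query about that place sees the name from source A and the country from source B together.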

Outsourcing of information

  • Different sources can focus on different aspects of the data

  • Schema.org defines the concept of a person (schema:Person)

  • RDF defines the meaning of rdf:type, i.e., what it means for something to be of a certain type

  • Information is distributed and outsourced across different sources

Benefits of using web identifiers

  • Creates a globally distributed graph of linked data

  • Allows for the outsourcing of information to different sources

  • Enables the creation of smaller graphs for specific purposes

Chapter 7: Linking Data Sets

  • In the past, the speaker would have had to enter all the information about Harlem into their own database by hand.

  • Now, they can instead link to another dataset, namely the one maintained by GeoNames.

  • By linking their data to the GeoNames dataset, they gain access to a wealth of information for free.

  • This allows for network queries and various data science tasks such as visualization, data analysis, and data querying.
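The linking idea can be sketched as a join across two datasets. Everything here is a hypothetical stand-in: `http://example.org/` plays the role of the speaker's own dataset and `http://geonames.example.org/` plays the role of GeoNames (real GeoNames identifiers and properties differ). The local data stores only a link to the place, while the region fact lives on the external side; `RegionX` and `events_in_region` are hypothetical.

```python
EX = "http://example.org/"            # hypothetical local dataset
GEO = "http://geonames.example.org/"  # hypothetical stand-in for GeoNames

# Local data: only links to the place, no copied place details
local = {
    (EX + "event1", EX + "takesPlaceIn", GEO + "Harlem"),
    (EX + "event2", EX + "takesPlaceIn", GEO + "Harlem"),
}

# Stand-in for triples maintained by the external dataset
external = {
    (GEO + "Harlem", GEO + "locatedIn", GEO + "RegionX"),  # hypothetical fact
}


def events_in_region(region):
    """Network query: join local links with external place facts."""
    places = {s for s, p, o in external if p == GEO + "locatedIn" and o == region}
    return sorted(s for s, p, o in local if p == EX + "takesPlaceIn" and o in places)


print(events_in_region(GEO + "RegionX"))
```

The local dataset never stores which region Harlem is in; the answer comes entirely from the linked external data.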

Chapter 8: Principles of Linked Data

  • The speaker summarizes the first three principles of linked data:

    • Give things names.

    • Use web URIs as those names.

    • Put relations between things to create a web of linked data.

  • The resulting knowledge graphs enable data science tasks.

  • The speaker mentions a fourth principle, making the meaning of things explicit, which will be discussed after a break.

Chapter 9: The Right Thing

  • Computers struggle to understand textual information and context

    • Humans are good at reading and interpreting English text

    • Computers need a formal representation of information to understand it

  • Formal representation of information allows for predictable inferencing

    • Example: Using a formalism to understand the meaning of a statement

    • Computers can derive new facts based on the formal representation

  • Semantic web combines naming, graph relationships, and explicit semantics

    • Giving things names and addressing them on the web

    • Representing relationships between things using graphs of data

    • Adding explicit semantics for predictable inferencing

  • Semantic web enables the web of data

    • Machines can understand and derive information from machine-readable formats

    • Knowledge graphs play a role in the semantic web

  • Knowledge graphs may not be widely known or discussed in the news
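Predictable inferencing can be illustrated with a small forward-chaining loop. This is a hand-rolled sketch of two RDFS rules (rdfs11: `rdfs:subClassOf` is transitive; rdfs9: an instance of a class is also an instance of its superclasses), applied until nothing new is derived. It is not a full reasoner, and the `ex:` facts about Tweety are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace

facts = {
    (EX + "Tweety", RDF_TYPE, EX + "Sparrow"),
    (EX + "Sparrow", RDFS_SUBCLASS, EX + "Bird"),
    (EX + "Bird", RDFS_SUBCLASS, EX + "Animal"),
}


def infer(triples):
    """Apply rdfs9 and rdfs11 repeatedly until a fixpoint is reached."""
    closure = set(triples)
    while True:
        new = set()
        subclass = {(s, o) for s, p, o in closure if p == RDFS_SUBCLASS}
        # rdfs11: A subClassOf B and B subClassOf C => A subClassOf C
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    new.add((a, RDFS_SUBCLASS, d))
        # rdfs9: x type A and A subClassOf B => x type B
        for s, p, o in closure:
            if p == RDF_TYPE:
                for a, b in subclass:
                    if o == a:
                        new.add((s, RDF_TYPE, b))
        if new <= closure:  # nothing new: stop
            return closure
        closure |= new


closure = infer(facts)
```

Because the rules are fixed, the derived facts (here, that Tweety is a Bird and an Animal) are the same on every run, which is what makes the inferencing predictable.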

Chapter 10: The Google Knowledge Graph

  • There are many knowledge graphs available for searching and use in various domains.

    • Life sciences, government data, media data, publications, social networking, etc.

    • These are public and open knowledge graphs.

  • Combining different databases in life sciences is important for drug research and discovery.

    • Understanding how enzymes work, existing drugs, and genomic pathways.

  • There has been an opening up and connecting of data in life sciences and other domains.

  • Some generic knowledge graphs mentioned:

    • YAGO: a general-purpose knowledge graph with a wide variety of facts.

    • DBpedia: a project that converts Wikipedia infoboxes into knowledge graph format.

    • Freebase: a combination of several knowledge graphs, used in commercial applications.

    • Google Knowledge Graph: one of the largest knowledge graphs, used by Google in its search results.

  • The Google Knowledge Graph supplies the information boxes (knowledge panels) shown for entities that users search for.

    • Their content is derived from Google's own knowledge graph.

  • Many companies and organizations are using knowledge graphs in their infrastructure.

    • Netflix for recommendations.

    • German National Library.

    • Elsevier for better services for scientists.

    • IKEA for product catalog and user information.

  • Knowledge graphs are used for various tasks in different industries.

    • Amazon, for the product graph behind its web shop.

    • Uber, for a food ontology that supports food recommendations.

Chapter 11: AI and Machine Learning

  • AI is not limited to machine learning; it encompasses various tasks and approaches

  • AI includes machine learning, knowledge representation, and other theories

  • Machine learning relies on statistics and signal processing

  • Knowledge representation is based on logic

  • AI can be divided into symbolic AI and sub-symbolic AI

    • Symbolic AI focuses on knowledge representation, while sub-symbolic AI covers statistical approaches such as machine learning, including deep learning

  • There are interesting connections between machine learning and knowledge representation

Chapter 12: Combining Symbolic and Sub-symbolic AI

  • Symbolic methods, such as knowledge graphs, can be combined with machine learning

  • Knowledge graphs can provide formal knowledge to enhance machine learning results

  • Symbolic methods can be used for explainability and data input/output

  • The combination of symbolic and sub-symbolic AI is an interesting approach

  • The course will not cover this combination, but students should be equipped to explore it after completing their AI bachelor's degree