Chapter 1: Knowledge Representation and Information Retrieval Overview

Introduction

The history of knowledge representation and information retrieval systems dates back to the late 19th century. In 1876, Melvil Dewey laid an important foundation for modern knowledge representation with his classification system, which remains a fundamental tool for organizing and providing access to knowledge (Taylor & Wynar, 1985). However, knowledge representation did not gain traction as a field within information science until after World War II. Since then, the field has developed steadily and has attracted researchers from many disciplines.

The terms "knowledge representation" and "information retrieval" evolved over the course of the 20th century and came to encompass concepts such as indexing, information extraction, the processing and organization of information, and knowledge management. In this chapter, we review the historical development of the field, focusing on the key milestones of each period.

1.1 Stages of Development of Knowledge Representation and Information Retrieval Systems

The history of knowledge representation and information retrieval systems is relatively short but marked by rapid development. It can be divided into four key stages, running from the phase of increasing demand for information to the current era of networking.

1.1.1 Phase of Increasing Demand (Early 1940s to Early 1950s)

World War II accelerated scientific and technological development and contributed significantly to the emergence of knowledge representation and information retrieval. The war produced an enormous number of documents and reports recording the results of research and development, especially in arms manufacturing and operations management. Because documentation on this scale was unprecedented, new methods of processing documents were needed to make the information they contained accessible.

The overwhelming volume of documents led to the realization that more efficient methods for information representation and organization were essential, particularly in chemistry, biology, and manufacturing. For example, the field of biochemical publishing produced approximately two million documents annually (Hiemstra, 2009).

1.1.2 Accelerated Growth (1950s to 1980s)

This period is considered the golden age of knowledge representation and information retrieval. It was marked by the introduction of computers into the field between 1957 and 1959, when Hans Peter Luhn used punched cards to process and match keywords.

The rise of online information retrieval systems in the 1960s and 1970s drove the transition from manual to online retrieval. Notable enhancements during this period included the availability of online databases, result filtering, automatic synonym merging, Boolean logic, and searches focused on specific sources.
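To make two of these features concrete, Boolean logic and automatic synonym merging, the following is a minimal Python sketch under simplifying assumptions: a tiny in-memory collection, whitespace tokenization, and a hand-written synonym table. The documents, the synonym table, and all function names are hypothetical illustrations; the online systems of the period implemented these features in their own ways.

```python
from collections import defaultdict

# Hypothetical toy collection: document ID -> text.
DOCS = {
    1: "online databases for chemistry research",
    2: "manual indexing of biology reports",
    3: "online retrieval of chemical abstracts",
}

# Hypothetical synonym table: each variant is merged into one canonical term.
SYNONYMS = {"chemical": "chemistry", "biological": "biology"}


def normalize(word: str) -> str:
    """Lower-case a word and replace it with its canonical synonym, if any."""
    word = word.lower()
    return SYNONYMS.get(word, word)


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Build an inverted index mapping each canonical term to document IDs."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[normalize(word)].add(doc_id)
    return dict(index)


INDEX = build_index(DOCS)
ALL_IDS = set(DOCS)


def term(word: str) -> set[int]:
    """Return the IDs of documents containing the (normalized) query term."""
    return INDEX.get(normalize(word), set())


# Boolean operators map directly onto set operations.
def and_(a: set[int], b: set[int]) -> set[int]:
    return a & b


def or_(a: set[int], b: set[int]) -> set[int]:
    return a | b


def not_(a: set[int]) -> set[int]:
    return ALL_IDS - a


print(sorted(and_(term("online"), term("chemical"))))         # [1, 3]
print(sorted(and_(term("online"), not_(term("abstracts")))))  # [1]
```

Because "chemical" is merged into "chemistry" both when documents are indexed and when queries are processed, a query for either form matches documents 1 and 3.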

1.1.3 Phase of Clarification (1980-1990)

Although information retrieval systems had been described as designed to meet the diverse and changing needs of users, they were not structured to let users search independently, without training or the support of information specialists acting as mediators. Searching was also costly because of the fees involved, such as telecommunication charges and database subscriptions. As a result, users usually relied on these intermediaries to conduct searches on their behalf.

1.1.4 Era of Networking (1990s to Present)

Until the early 1990s, information retrieval systems operated in a centralized manner, with each database managed from a single location, so accessing several systems meant contacting each database separately. As networked information emerged and spread, new search paradigms appeared that let users query multiple databases simultaneously over the network infrastructure.
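Querying several databases at once is often called federated search. The minimal Python sketch below illustrates only the core idea: one query is sent to every source concurrently and the results are merged. The databases are simulated as in-memory lists, and the names `DATABASES`, `search_one`, and `federated_search` are hypothetical; in a real networked setting, each `search_one` call would be a request to a remote system.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical databases, simulated here as in-memory lists of titles.
DATABASES = {
    "chemistry_db": ["Polymer synthesis review", "Online chemical abstracts"],
    "biology_db": ["Cell biology handbook", "Online gene databases"],
    "engineering_db": ["Manufacturing process control"],
}


def search_one(name: str, query: str) -> list[tuple[str, str]]:
    """Search a single database; return (database name, title) pairs."""
    q = query.lower()
    return [(name, title) for title in DATABASES[name] if q in title.lower()]


def federated_search(query: str) -> list[tuple[str, str]]:
    """Send the same query to every database concurrently and merge results."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_one, name, query) for name in DATABASES]
        merged: list[tuple[str, str]] = []
        # Collect in submission order so the merged list is deterministic.
        for future in futures:
            merged.extend(future.result())
    return merged


for source, title in federated_search("online"):
    print(f"{source}: {title}")
```

Threads are a reasonable choice here because searches against remote databases spend most of their time waiting on the network rather than computing.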

1.2 Core Concepts

This book focuses on four essential concepts: the information pyramid, knowledge representation, information retrieval, and the digital age. Each concept has several near-synonyms and can be interpreted differently depending on the context. Below, we clarify these concepts:

1.2.1 Information Pyramid

Many researchers have examined the information pyramid (also known as the DIKW hierarchy) and identified its four components: data, information, knowledge, and wisdom.

  • Data: Raw facts that have not yet been processed or interpreted; they can be quantitative or qualitative.

  • Information: Data that has been processed and put into a context that conveys meaning for decision-making.

  • Knowledge: Information that has been understood and applied in specific contexts.

  • Wisdom: The ability to make sound judgments based on knowledge and insight.

1.2.2 Information Representation

Whatever form information takes, it must be represented before it can be retrieved. Representation means deriving from each document a set of descriptive elements, such as index terms or subject headings, that identify it and distinguish it from other documents. This process typically involves several operations, including extraction, indexing, classification, summarization, and abstracting.
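As a small illustration of deriving descriptive elements from a document, the Python sketch below represents a text by its most frequent content words. Frequency-based term selection is just one simple technique chosen for illustration; it stands in for the indexing step only, not for classification, summarization, or abstracting. The stop-word list and the function name `extract_index_terms` are hypothetical.

```python
import re
from collections import Counter

# A small, hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "of", "and", "a", "in", "for", "to", "is", "on"}


def extract_index_terms(text: str, k: int = 3) -> list[str]:
    """Represent a document by its k most frequent non-stop-words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # Ties are returned in the order the words first appeared.
    return [word for word, _ in counts.most_common(k)]


doc = ("The classification of documents and the indexing of documents "
       "support retrieval. Indexing assigns terms to documents.")
print(extract_index_terms(doc))  # ['documents', 'indexing', 'classification']
```

The resulting short list of terms acts as a surrogate for the full text: it is compact enough to store and match quickly, at the cost of discarding word order and context.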

1.2.3 Demand and Retrieval

Information demand can be viewed as a broad subject that covers both representation and retrieval. Retrieval focuses on making information available, while demand centers on the user's engagement in seeking information.

1.2.4 Digital Age

The distinction between "digital" and "analog" comes from electronic technology. Digital technology represents, stores, and processes data in binary form (1s and 0s), whereas analog technology uses continuously varying signals, such as electrical signals whose amplitude or frequency changes smoothly over time.
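To make "binary form" concrete, the Python sketch below shows the bits used to store a short piece of text and how the text is recovered from them. It assumes UTF-8, which is one of several possible text encodings; the function names are hypothetical.

```python
def to_binary(text: str) -> str:
    """Show the bits that represent a string when it is encoded as UTF-8."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_binary(bits: str) -> str:
    """Decode a space-separated string of 8-bit groups back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


encoded = to_binary("IR")
print(encoded)               # 01001001 01010010
print(from_binary(encoded))  # IR
```

Each character becomes a fixed pattern of 1s and 0s, so it can be copied and transmitted without the gradual degradation that affects analog signals.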

1.3 Related Concepts

1.3.1 Organizing Information

Organizing information means establishing a structure that allows information to be accessed quickly and easily. This is typically achieved through tools that facilitate access to and dissemination of information, such as bibliographies, indexes, directories, and search engines.

1.3.2 Information Retrieval

Information retrieval is the process of searching a collection of documents, or of the representations (surrogates) that stand in for them, to find items that meet a user's need. Information retrieval systems are designed specifically to support this kind of searching of intellectual output.

1.3.3 Database Systems

Databases are vital components of knowledge representation and information retrieval systems, containing organized and formatted data that can be utilized in retrieval processes.
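As a minimal sketch of how organized, formatted data supports retrieval, the Python example below uses the standard-library `sqlite3` module with an in-memory database. The table layout and the records are hypothetical; they only show how structured fields let a query select exactly the records that match given conditions.

```python
import sqlite3

# In-memory database holding a hypothetical catalogue of documents.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, title TEXT, subject TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO documents (title, subject, year) VALUES (?, ?, ?)",
    [
        ("Polymer chemistry report", "chemistry", 1948),
        ("Cell biology survey", "biology", 1962),
        ("Chemical abstracts digest", "chemistry", 1975),
    ],
)

# Retrieval: a parameterized query selects records by field value.
rows = conn.execute(
    "SELECT title, year FROM documents WHERE subject = ? ORDER BY year",
    ("chemistry",),
).fetchall()
print(rows)  # [('Polymer chemistry report', 1948), ('Chemical abstracts digest', 1975)]
conn.close()
```

Because every record has the same named fields, the query can filter and sort precisely, which is much harder to do over unstructured text.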