Information Retrieval and Representation
Models for Information Retrieval
Factors Affecting Internet-Based Service Search
- Volatility: Information changes rapidly.
- Multimedia: Combines various media formats.
- Incomplete: Always evolving content.
- Interactive: Allows user engagement.
- Personalized: Tailored to individual preferences.
- Accessibility: Availability considerations.
- Confidentiality: Privacy and usage restrictions.
- Non-Traditional Structures: Deviation from conventional formats.
- Convertibility: Adaptable content.
- Copying: Easy duplication with permission.
- Magnification: Enhanced ranking in search results.
- Challenges: These factors complicate storing and retrieving information.
Internet Structure
- Interconnected Network: All computers are linked via a backbone.
- Access: Through cables, fiber optics, satellites.
Key Components
- Service Provider Computers: Host content in various settings.
- Communication Protocols: TCP/IP, HTTP, FTP.
- Service Applications: Browsers, email clients.
- Content and Information: Data available to users.
- Search Tools: Directories and search engines.
Search Tools
Directories
- Subject-based categorization of websites.
Search Engines
Most important tools for accessing information.
Operate at database and internet levels.
Match user needs with available documents.
- Crawler: Identifies and introduces URLs to the knowledge base.
- Algorithm: Determines which websites are visited, which content is selected, and the delivery schedule.
- Knowledge Base: Stores gathered information.
- Indexer: Selects meaningful parts of a document for matching search terms.
- Thesaurus: Provides a common language for translating words.
- Searcher: Executes user commands.
- Parses query
- Transforms words into identifiers
- Determines search list
- Searches for word matches with documents
- Calculates rank of documents
- Prepares and displays the list
- Allows for query iteration
- User Interface: Allows the user to formulate request and obtain response.
- Controllers: Evaluate interactions and maintain system stability.
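The indexer and searcher steps listed above can be illustrated with a minimal sketch. It assumes a tiny in-memory corpus (the hypothetical `DOCS`), uses the words themselves as identifiers, and ranks by a simple term count, which is only one of many possible ranking schemes and is not claimed to be what any real search engine does.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus standing in for the knowledge base.
DOCS = {
    1: "Search engines match user needs with documents",
    2: "A crawler discovers URLs for the knowledge base",
    3: "The indexer selects meaningful terms from documents",
}

STOPWORDS = {"a", "the", "for", "with", "from"}


def tokenize(text):
    """Parse text into lowercase words, dropping stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_index(docs):
    """Indexer: map each term to the documents containing it, with counts."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Searcher: parse the query, match terms, and rank by total term count."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


index = build_index(DOCS)
results = search(index, "documents for the indexer")
print(results)
```

Document 3 contains both query terms and document 1 only one, so document 3 is listed first. Query iteration would simply call `search` again with a revised query.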
Google Search Engine Algorithms
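- Penguin: Identifies and penalizes spammy or manipulative links.
- Panda: Demotes low-quality, thin, or copied content.
- Hummingbird: Analyzes the meaning of search queries to improve results.
- Pigeon: Improves local search results.
- Payday Loan: Targets spam-heavy queries (such as payday loans) and sites that rank for them through spam techniques.
- Page Authority: A third-party score (from Moz) that estimates how well a page may rank. It is not a Google algorithm; Google's own link-based authority measure is PageRank.
- Caffeine: An indexing system update that indexes content faster to deliver fresher results.
- Zebra: Reported in some SEO sources as assessing the quality of online stores; not officially confirmed by Google.
- Mobilegeddon: Ranks mobile-friendly sites higher in mobile search results.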
Meta Search Engines
- Send user queries to multiple search engines.
- Return results based on user-defined criteria.
- Offer customization.
Advantages
- Access to diverse search engine capabilities.
Disadvantages
- Potential for overwhelming results.
- Redundancy.
- Inconsistencies.
Examples
MetaCrawler, Dogpile, Mamma, IxQuick, Kartoo, Ithaki, Seekz, iBoogie, Zuula, inCrawler, WindSeek, Seek2Day, ez2Find, Vroosh, qkSearch, TurboScout, FinQoo, Polymeta, Unabot, vPinpoint, Draze, SearchSalad, Clusty, AllPlus
Features
- Customization of searches.
- Separation of overlapping items and spam.
- Standardization of heterogeneous information.
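How a meta search engine might standardize and de-duplicate results can be sketched as follows. The two result lists are hypothetical stand-ins for answers from real engines, and the reciprocal-rank scoring is an assumption chosen for illustration, not a description of how any listed product works.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Standardize a URL so the same page from two engines compares equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge(result_lists):
    """Merge ranked lists from several engines, dropping overlapping items.

    Each page gets a reciprocal-rank score summed over the engines that
    returned it, so pages found by several engines rise to the top.
    """
    scores = {}
    first_seen = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            key = normalize(url)
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            first_seen.setdefault(key, url)
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return [first_seen[k] for k in ordered]


# Hypothetical result lists from two engines.
engine_a = ["https://example.org/ir", "https://example.com/search"]
engine_b = ["http://www.example.org/ir/", "https://example.net/index"]
merged = merge([engine_a, engine_b])
print(merged)
```

The page returned by both engines appears once and is ranked first; the remaining pages keep a deterministic order.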
Information Retrieval in Integrated Searches
- Global Search
- Federated Search
In federated search, a single engine with one shared user interface searches several different types of information databases. Users deal with only that interface and do not need to learn the search facilities of each database.
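The idea of one interface over several databases can be sketched as below. Both source classes, their method names (`find`, `lookup`), and the sample records are hypothetical; the point is only that the user calls one function while each database keeps its own search facility.

```python
class CatalogDB:
    """Hypothetical source that stores records as (title, year) tuples."""

    def __init__(self, rows):
        self.rows = rows

    def find(self, word):
        return [title for title, _year in self.rows if word in title.lower()]


class ArticleDB:
    """Hypothetical source that stores records as dictionaries."""

    def __init__(self, records):
        self.records = records

    def lookup(self, keyword):
        return [r["headline"] for r in self.records if keyword in r["headline"].lower()]


def federated_search(query, sources):
    """Send one query to every source and label each hit with its origin."""
    word = query.lower()
    hits = []
    for name, search_fn in sources.items():
        hits.extend((name, title) for title in search_fn(word))
    return hits


catalog = CatalogDB([("Indexing Theory", 1999), ("Urban Planning", 2005)])
articles = ArticleDB([{"headline": "Automatic Indexing Methods"}])

# The user only ever calls federated_search; each source keeps its own method.
sources = {"catalog": catalog.find, "articles": articles.lookup}
hits = federated_search("Indexing", sources)
print(hits)
```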
Applications and Functions of Information Retrieval
- Individual Level: Address daily needs via retrieval systems.
- Organizational Level: Managers and staff obtain information for internal purposes and stakeholders.
- Supra-Organizational Level: International databases are used.
In interactive machine applications of information retrieval systems, information retrieval becomes part of a smart system and helps it accomplish its assigned tasks.
One example is a smart urban monitoring system: after a car accident, the system searches the relevant database for the car’s plate number, retrieves the driver's contact information, and makes it available to the rest of the system.
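The plate-number lookup in this example could look roughly like the sketch below. The registry, plate formats, and contact details are invented for illustration; a real system would query an actual vehicle database.

```python
# Hypothetical registry; the plate numbers and contact details are made up.
REGISTRY = {
    "ABC-1234": {"driver": "Driver A", "phone": "555-0100"},
    "XYZ-9876": {"driver": "Driver B", "phone": "555-0199"},
}


def normalize_plate(raw):
    """Uppercase the plate and drop spaces so sensor output matches registry keys."""
    return raw.replace(" ", "").upper()


def handle_accident(raw_plate, registry):
    """Retrieve the driver's contact record for the plate seen at an accident."""
    record = registry.get(normalize_plate(raw_plate))
    if record is None:
        return {"status": "unknown plate", "contact": None}
    return {"status": "found", "contact": record}


event = handle_accident("abc-1234 ", REGISTRY)
print(event)
```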
Role of Information Specialists
- Designers
- Liaisons
- Users of retrieval systems
Key Skills
- Understanding information structure.
- Formulating search queries.
- Proficiency with database features.
Information specialists also act as consultants to designers and as trainers for users.
Added Value through Information Retrieval
- Increased application through diverse user retrieval.
- Integration of disparate information.
- Multiple format retrieval (abstract, full text, etc.).
- Semantic content connectivity.
Information Representation
- Data must be represented in various formats.
- Indexing a document requires minimal information from a source.
Challenges
- Full representation of documents can overwhelm users.
- High levels of irrelevant information.
Solutions
- Topic analysis
- Techniques for information representation.
What is Representation?
Representation is the process of extracting content from an information carrier and expressing it in one or more other information carriers.
Process Steps
- Content Extraction.
- Content Translation.
Forms
- Human representation.
- Automated machine representation.
- Human plus Automated machine representation.
Information Representation Categories:
Based on:
- Work process.
- Source of information.
- Goal.
Based on Process
- Manual: Human agent performs.
- Machine: Automated processing.
Sources of Information
- Tangible: Concrete resources.
- Intangible: Non-physical resources, such as radio signals.
Goals
Representation serves two goals: storage and searching. The questions for storage are: What value does the data hold? What connections does it have?
- Pre-coordinate: Preparing structure.
- Post-coordinate: Combining terms during retrieval.
- Extraction Indexing.
- Assignment Indexing.
- Surface and Comprehensive Representation.
- In-depth and Specialized Representation.
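The difference between extraction indexing and assignment indexing in the list above can be shown with a small sketch. It assumes that extraction means taking frequent words directly from the text and that assignment means mapping the text to headings from a controlled vocabulary; the `VOCABULARY` table and the sample sentence are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "in", "a", "to", "is", "for", "are"}

# Hypothetical controlled vocabulary: trigger word -> assigned heading.
VOCABULARY = {
    "cars": "Automobiles",
    "vehicles": "Automobiles",
    "crash": "Traffic accidents",
}


def extraction_index(text, k=3):
    """Extraction indexing: pick the k most frequent words found in the text."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]


def assignment_index(text):
    """Assignment indexing: assign headings from a controlled vocabulary."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted({heading for trigger, heading in VOCABULARY.items() if trigger in words})


text = "Cars and vehicles in the city: a crash report for cars"
extracted = extraction_index(text)
assigned = assignment_index(text)
print(extracted)
print(assigned)
```

Extracted terms always occur in the document itself, while assigned headings such as "Automobiles" may not appear in the text at all.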
Information Retrieval
In information retrieval systems, information representation is needed in two stages: the storage stage and the retrieval stage.
Purposes:
- Determine the value of data.
- Connect to the data set.
- Create a content view for incorporation into the database.
Representation has important functions, such as:
- Determining the subject area.
- Improving retrieval in professional fields.
- Identifying the focus points of information resources.
- Exchanging data.
- Documenting subjects.
Information Retrieval in the Internet Environment
- Volume.
- Variety.
- Diversity of producers.
- Diversity of systems.
- Linguistic differences.
- Lack of vocabulary control.
- Natural language problems.
- Ephemeral content.
- Unstructured nature of information sources.
Pre-coordinate and Post-coordinate
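- Whether terms are combined in advance by the indexer (pre-coordinate) or combined by the searcher at retrieval time (post-coordinate), the keywords or concepts must be related within a consistent structure.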
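A minimal sketch of the two approaches, under the assumption that post-coordination means intersecting the posting sets of single terms at search time and pre-coordination means looking up a compound heading built in advance. The terms, headings, and document numbers are hypothetical.

```python
# Post-coordinate: single-concept terms, each with its own posting set.
POSTINGS = {
    "libraries": {1, 2, 4},
    "automation": {2, 3, 4},
    "history": {4, 5},
}

# Pre-coordinate: compound headings built by the indexer in advance.
HEADINGS = {
    "Libraries -- Automation": {2},
    "Libraries -- Automation -- History": {4},
}


def post_coordinate(terms, postings):
    """Combine terms at search time by intersecting their posting sets."""
    sets = [postings.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def pre_coordinate(heading, headings):
    """Look up one ready-made compound heading; no combining at search time."""
    return headings.get(heading, set())


combined = post_coordinate(["libraries", "automation"], POSTINGS)
direct = pre_coordinate("Libraries -- Automation", HEADINGS)
print(combined, direct)
```

The post-coordinate search returns every document carrying both terms, while the pre-coordinate lookup returns only documents filed under that exact compound heading.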
Indexing
Indexing is the best-known process of representing information. It is a process of extraction.
Types of Indexes
- Book indexes.
- Article indexes.
- Subject indexes.
- Citation indexes.
Index Categories:
- Based on Structure and Shape.
- Thematic indexes
- Rotational indexes
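Assuming that "rotational indexes" refers to rotated (permuted, KWIC-style) indexes, in which each significant word of a title is moved to the front in turn, a minimal sketch could be the following. The stopword list and sample titles are hypothetical.

```python
STOPWORDS = {"of", "the", "and", "in", "a", "for"}


def rotated_index(titles):
    """Build a rotated (KWIC-style) index: one entry per significant word.

    Each entry is the title rotated so that the keyword comes first; the
    words that preceded it are moved to the end after a ' / ' marker.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOPWORDS:
                continue
            head = " ".join(words[i:])
            tail = " ".join(words[:i])
            entries.append(f"{head} / {tail}" if tail else head)
    return sorted(entries, key=str.lower)


rotated = rotated_index(["Theory of Indexing", "Indexing Rules"])
for entry in rotated:
    print(entry)
```

Each title can now be found under every significant word it contains, not only under its first word.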
Theoretical Perspectives on Indexing
- Logical theoretical perspective
- Experimental theoretical perspective
- Historical and hermeneutic theoretical perspective
- Pragmatic and critical theoretical perspective
Indexing Rules
- Index what is valuable in the main source.
- Choose the most widely known phrase as an entry.
- Use standard tools for selecting phrases and words.
- Create alphabetical divisions for the section, if possible.
Depth of Indexing
- Comprehensive Indexing.
- Specialized indexing.
During the indexing process, you:
- Study the original text.
- Extract the content core.
- Determine the nature of the content.
- Allocate wording and standardize it.
Core Elements:
- References
- Identifiers
Three reference types:
- Equivalent phrases
- Related phrases
- Hierarchical phrases (showing the hierarchy between two phrases)
Identifiers are a major component of the entry; they refer to the content in the listing.
Standardization is achieved through:
- Standard tools.
- Vocabulary.
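The three reference types listed above, together with vocabulary standardization, can be sketched as a small thesaurus structure. All terms and relationships are hypothetical, and the dictionary layout is just one possible way to hold them.

```python
# Hypothetical mini-thesaurus; all terms are illustrative.
THESAURUS = {
    "automobiles": {"broader": "vehicles", "related": ["traffic accidents"]},
    "vehicles": {"broader": None, "related": []},
    "traffic accidents": {"broader": None, "related": ["automobiles"]},
}

# Equivalence references: non-preferred phrase -> preferred phrase.
USE = {"cars": "automobiles", "motor cars": "automobiles"}


def standardize(phrase):
    """Map a phrase to its preferred form via equivalence references."""
    key = phrase.strip().lower()
    return USE.get(key, key)


def broader_chain(term):
    """Follow hierarchical references upward from a term."""
    chain = []
    current = THESAURUS.get(standardize(term), {}).get("broader")
    while current is not None:
        chain.append(current)
        current = THESAURUS.get(current, {}).get("broader")
    return chain


def related(term):
    """Return phrases connected to the term without being equivalent or hierarchical."""
    return THESAURUS.get(standardize(term), {}).get("related", [])


print(standardize("Cars"))
print(broader_chain("cars"))
print(related("Motor Cars"))
```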