A comprehensive vocabulary set covering the three pillars of web mining, layout segmentation techniques, link structure algorithms, and distributed data mining workflows.
Web Mining
The application of data mining techniques to extract useful patterns and knowledge from the World Wide Web.
Web Content Mining
A domain of web mining focused on extracting data from inside web pages, such as text, images, and HTML.
Web Structure Mining
A domain of web mining focused on the link graph and how pages connect to one another.
Web Usage Mining
A domain of web mining focused on analyzing user behavior through logs, clicks, and browser cookies.
DOM-Based Mining
A layout mining technique that uses the Document Object Model (DOM) tree to find content blocks. It is often sensitive to noise from tags used purely for visual styling rather than structure.
VIPS (Vision-based Page Segmentation)
A layout mining approach that uses visual cues like font size, color, and background to identify semantic blocks as a human would.
PageRank
An algorithm that treats a link as a vote of confidence, calculating the probability that a person randomly clicking links will arrive at a particular page.
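The random-clicker idea can be sketched as a short iteration. This is a minimal illustration, not a production implementation: the damping factor of 0.85, the fixed iteration count, and the four-page `toy_web` graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with an equal chance everywhere
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link passes on an equal share of the "vote".
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no links spreads its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked to by A, B, and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C collects the most votes and ends up ranked highest, while D, which nothing links to, ends up lowest.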
HITS Algorithm (Hyperlink-Induced Topic Search)
A link structure mining algorithm that categorizes web pages into Authorities and Hubs.
Authorities
In the HITS algorithm, these are the definitive sources on a specific topic, such as an official website for a product.
Hubs
In the HITS algorithm, these are directories or lists that point users toward many high-quality authority pages.
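The mutual reinforcement between hubs and authorities can be sketched as follows. This is a simplified illustration under assumptions: the page names in `toy_web` are hypothetical, and the scores are normalized by their Euclidean length on every round.

```python
import math


def hits(links, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for t in targets:
                auth[t] += hub[source]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical pages: two list-style pages pointing at two content pages.
toy_web = {
    "directory": ["official_site", "review_blog"],
    "fan_list": ["official_site"],
    "official_site": [],
    "review_blog": [],
}
hub_scores, auth_scores = hits(toy_web)
print("best hub:", max(hub_scores, key=hub_scores.get))
print("best authority:", max(auth_scores, key=auth_scores.get))
```

In this toy graph the page that links to both content pages scores highest as a hub, and the page that both lists point to scores highest as an authority.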
Feature Extraction
A method for mining multimedia data by analyzing actual pixels, such as identifying a high blue color value to find images of an ocean.
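The ocean example can be sketched as a rough pixel check. This is an illustration only: the `margin` and `threshold` values are assumptions, and images are represented as plain lists of `(red, green, blue)` tuples rather than loaded from files.

```python
def blue_fraction(pixels, margin=40):
    """Fraction of pixels whose blue value beats red and green by `margin`."""
    if not pixels:
        return 0.0
    blue = sum(1 for r, g, b in pixels if b - max(r, g) >= margin)
    return blue / len(pixels)


def looks_like_ocean(pixels, threshold=0.5):
    """Flag an image as a possible ocean scene if most pixels are strongly blue."""
    return blue_fraction(pixels) >= threshold


# Hypothetical images: mostly deep blue versus mostly sandy tones.
ocean_pixels = [(10, 60, 200)] * 8 + [(240, 240, 240)] * 2
desert_pixels = [(210, 180, 120)] * 10

print(looks_like_ocean(ocean_pixels))
print(looks_like_ocean(desert_pixels))
```

A check this simple would also match a clear sky or a blue wall, which is one reason pixel features are often combined with metadata.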
Semantic/Metadata Mining
A method for mining multimedia data by reading text associated with the media, including captions, filenames, or alt text (the HTML `alt` attribute).
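Reading the text around an image can be sketched with Python's standard `html.parser` module. The `ImageMetadataParser` class, the `find_images` helper, and the sample HTML are hypothetical names and data made up for this example.

```python
from html.parser import HTMLParser


class ImageMetadataParser(HTMLParser):
    """Collects the src and alt text of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            self.images.append({
                "src": attributes.get("src") or "",
                "alt": attributes.get("alt") or "",
            })


def find_images(html, keyword):
    """Return the src of images whose alt text or filename mentions keyword."""
    parser = ImageMetadataParser()
    parser.feed(html)
    keyword = keyword.lower()
    return [
        image["src"]
        for image in parser.images
        if keyword in image["alt"].lower() or keyword in image["src"].lower()
    ]


sample_html = (
    '<img src="beach.jpg" alt="Waves on the ocean">'
    '<img src="cat.png" alt="A cat on a sofa">'
)
print(find_images(sample_html, "ocean"))
```

No pixels are inspected here: the match comes entirely from the words attached to the image.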
Automatic Classification
The process of using Supervised Learning to automatically sort web pages into categories like News, Sports, or Travel, using a model trained on labeled example pages and the keywords they contain.
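One common supervised approach for this task is a naive Bayes text classifier. The sketch below is an assumption about how such a classifier might look, not the method the card has in mind, and the tiny training set is invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples is a list of (text, label) pairs with known categories."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def classify(model, text):
    """Pick the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocabulary:  # ignore words never seen in training
                # Add-one smoothing avoids a zero probability for rare words.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled pages, reduced to a few keywords each.
training_pages = [
    ("election vote parliament minister", "News"),
    ("government election policy", "News"),
    ("match goal team score", "Sports"),
    ("team wins league match", "Sports"),
    ("flight hotel beach itinerary", "Travel"),
    ("hotel booking flight deals", "Travel"),
]
model = train(training_pages)
print(classify(model, "team score in the final match"))
print(classify(model, "cheap flight and hotel"))
```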
Preprocessing (Web Usage Mining)
The phase of cleaning usage data, which includes removing clicks made by automated crawlers or bots to prevent skewed results.
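Bot removal can be sketched as a simple filter over log entries. The log format, the `BOT_MARKERS` list, and the rule that a request for `/robots.txt` signals a crawler are simplifying assumptions for this example; real logs usually need more careful detection.

```python
# Substrings that commonly appear in crawler user-agent strings (assumed list).
BOT_MARKERS = ("bot", "crawler", "spider")


def is_bot(entry):
    """Guess whether a log entry came from an automated client."""
    agent = entry["user_agent"].lower()
    return entry["path"] == "/robots.txt" or any(m in agent for m in BOT_MARKERS)


def clean_log(entries):
    """Keep only the entries that look like human traffic."""
    return [entry for entry in entries if not is_bot(entry)]


# Hypothetical raw log entries.
raw_log = [
    {"ip": "10.0.0.1", "path": "/login", "user_agent": "Mozilla/5.0 (Windows NT 10.0)"},
    {"ip": "10.0.0.2", "path": "/robots.txt", "user_agent": "ExampleCrawler/1.0"},
    {"ip": "10.0.0.3", "path": "/pricing", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"ip": "10.0.0.1", "path": "/dashboard", "user_agent": "Mozilla/5.0 (Windows NT 10.0)"},
]
cleaned = clean_log(raw_log)
print([entry["path"] for entry in cleaned])
```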
Pattern Discovery
The use of association rules to identify specific user behaviors, such as discovering that 80% of people who visit a login page immediately go to a dashboard.
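The 80% figure in a rule like this is its confidence: of the sessions that contain the first page, the share that move straight to the second. A minimal sketch, assuming each session is an ordered list of page paths and using invented session data:

```python
def rule_confidence(sessions, antecedent, consequent):
    """Share of sessions with `antecedent` where `consequent` is the next page."""
    with_antecedent = 0
    with_both = 0
    for pages in sessions:
        if antecedent in pages:
            with_antecedent += 1
            # Look at consecutive page pairs to capture "immediately goes to".
            if any(a == antecedent and b == consequent
                   for a, b in zip(pages, pages[1:])):
                with_both += 1
    return with_both / with_antecedent if with_antecedent else 0.0


# Hypothetical cleaned sessions, one list of pages per visitor.
sessions = [
    ["/login", "/dashboard", "/reports"],
    ["/home", "/login", "/dashboard"],
    ["/login", "/dashboard"],
    ["/login", "/help"],
    ["/login", "/dashboard", "/logout"],
    ["/home", "/pricing"],
]
confidence = rule_confidence(sessions, "/login", "/dashboard")
print(f"/login -> /dashboard confidence: {confidence:.0%}")
```

Five of the six sessions include the login page, and four of those five go directly to the dashboard, which gives the 80% confidence.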
Distributed Data Mining (DDM)
An approach for mining massive data stored across different locations that focuses on moving the code to the data rather than moving the data to the code.
Knowledge Integration
A step in the DDM workflow where individual results or models from local processing are sent to a central coordinator.
Global Model
The single master model produced in the final stage of the DDM workflow, when the coordinator merges the local results from every site.
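The DDM workflow of local processing, knowledge integration, and a global model can be sketched with a deliberately simple "model": an average. The site names, the data values, and the functions `local_summary` and `integrate` are hypothetical; a real system would send trained models or richer statistics instead.

```python
def local_summary(records):
    """Runs at each site. Only this small summary leaves the site, not the data."""
    return {"count": len(records), "total": sum(records)}


def integrate(summaries):
    """Runs at the coordinator: merge the local summaries into one global model."""
    count = sum(s["count"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return {"count": count, "mean": total / count if count else 0.0}


# Hypothetical data that stays at three separate locations.
sites = {
    "london": [12.0, 15.0, 9.0],
    "tokyo": [20.0, 22.0],
    "lagos": [7.0],
}

# Step 1: the same code is sent to every site and run against the local data.
summaries = [local_summary(records) for records in sites.values()]
# Steps 2 and 3: the coordinator collects the summaries and builds the global model.
global_model = integrate(summaries)
print(global_model)
```

The coordinator never sees the six raw records, only three small summaries, yet the merged mean equals the mean that would be computed over all the data in one place.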