Web Mining and Distributed Data Mining Study Guide

Description and Tags

A comprehensive vocabulary set covering the three pillars of web mining, layout segmentation techniques, link structure algorithms, and distributed data mining workflows.

Last updated 6:03 AM on 5/15/26

18 Terms

1

Web Mining

The application of data mining techniques to extract useful patterns and knowledge from the World Wide Web.

2

Web Content Mining

A domain of web mining focused on extracting data from inside web pages, such as text, images, and HTML.

3

Web Structure Mining

A domain of web mining focused on the link graph and how pages connect to one another.

4

Web Usage Mining

A domain of web mining focused on analyzing user behavior through logs, clicks, and browser cookies.

5

DOM-Based Mining

A layout mining technique that uses the Document Object Model tree to find blocks, though it is often sensitive to coding noise from tags used purely for visual styling.
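
A minimal sketch of DOM-based segmentation, using Python's built-in html.parser. It assumes a hand-picked set of block-level tags (BLOCK_TAGS, BlockSegmenter and the sample page are illustrative, not a standard API) and keeps the text of each innermost block element as one content block.

```python
from html.parser import HTMLParser

# Assumed set of tags that start a new content block.
BLOCK_TAGS = {"div", "p", "section", "article", "header", "footer", "nav", "li", "td"}


class BlockSegmenter(HTMLParser):
    """Collects the text under each innermost block-level element as one block."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # finished blocks as (tag, text) pairs
        self.stack = []   # open block elements as [tag, list of text pieces]

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append([tag, []])

    def handle_endtag(self, tag):
        # Close the block only if the end tag matches the innermost open block.
        if tag in BLOCK_TAGS and self.stack and self.stack[-1][0] == tag:
            block_tag, pieces = self.stack.pop()
            text = " ".join(pieces).strip()
            if text:
                self.blocks.append((block_tag, text))

    def handle_data(self, data):
        # Text goes to the innermost open block; inline tags such as <a> or <b> are ignored.
        if self.stack and data.strip():
            self.stack[-1][1].append(data.strip())


page = ('<div id="nav"><a href="/">Home</a> <a href="/news">News</a></div>'
        '<div id="main"><p>Web mining extracts patterns.</p>'
        '<p>It has three domains.</p></div>')
segmenter = BlockSegmenter()
segmenter.feed(page)
segmenter.close()
for tag, text in segmenter.blocks:
    print(tag, "->", text)
```

Because the sketch trusts the markup, a div added only for a border or drop shadow would still become its own block. This is the styling-noise sensitivity the definition mentions.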

6

VIPS (Vision-based Page Segmentation)

A layout mining approach that uses visual cues like font size, color, and background to identify semantic blocks the way a human reader would perceive them.
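
The published VIPS algorithm combines the DOM tree with rendered visual information and explicit separator detection, so it cannot be reproduced in a few lines. The toy sketch below only illustrates the core intuition. It assumes every text box has already been rendered, so its position, font size, and background color are known. A new block starts when the background changes, when the text gets larger (a new heading), or when the vertical gap exceeds a threshold. Box, segment_by_visual_cues, max_gap and the sample page are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A rendered text box; these fields are assumed to come from a layout engine."""
    text: str
    top: int          # y-coordinate of the top edge, in pixels
    bottom: int       # y-coordinate of the bottom edge, in pixels
    font_size: int    # in pixels
    background: str   # CSS-style color, e.g. "#ffffff"


def segment_by_visual_cues(boxes, max_gap=20):
    """Group boxes, read top to bottom, into visually coherent blocks."""
    blocks, current = [], []
    for box in sorted(boxes, key=lambda b: b.top):
        if current:
            prev = current[-1]
            new_block = (
                box.background != prev.background   # background color changes
                or box.font_size > prev.font_size   # larger text: a new heading
                or box.top - prev.bottom > max_gap  # wide whitespace gap
            )
            if new_block:
                blocks.append(current)
                current = []
        current.append(box)
    if current:
        blocks.append(current)
    return [[b.text for b in block] for block in blocks]


page = [
    Box("Site Logo", 0, 40, 24, "#003366"),
    Box("Home | News", 40, 60, 14, "#003366"),
    Box("Web Mining", 100, 130, 28, "#ffffff"),
    Box("Web mining extracts patterns.", 135, 155, 14, "#ffffff"),
    Box("It has three domains.", 158, 178, 14, "#ffffff"),
    Box("Contact | Privacy", 260, 275, 12, "#eeeeee"),
]
for block in segment_by_visual_cues(page):
    print(block)
```

The heading "Web Mining" stays with the paragraphs under it because the smaller body text that follows does not trigger a split. A reader would group them the same way.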

7

PageRank

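An algorithm that treats a link as a vote of confidence and weights each vote by the rank of the page casting it. A page's PageRank is the probability that a person randomly clicking links (and occasionally jumping to a random page) will arrive at that page.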
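
A minimal power-iteration sketch of PageRank. It assumes the commonly used damping factor of 0.85 and spreads the rank of any page with no out-links evenly over all pages. The function name and the four-page link graph are illustrative.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # With probability 1 - damping the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Each out-link receives an equal share of p's rank (its "vote").
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # A page with no out-links spreads its rank evenly over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

Page D gets the lowest score because no page links to it. It receives only the random-jump share, (1 - 0.85) / 4 = 0.0375. C ranks highest because three pages vote for it.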

8

HITS Algorithm (Hyperlink-Induced Topic Search)

A link structure mining algorithm that gives each web page two scores, an authority score and a hub score, and uses them to identify Authorities and Hubs.

9

Authorities

In the HITS algorithm, these are the definitive sources on a specific topic, such as the official website for a product. A page is a good authority when many good hubs link to it.

10

Hubs

In the HITS algorithm, these are directories or lists that point users toward many high-quality authority pages. A page is a good hub when it links to many good authorities.
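
A minimal sketch of the HITS iteration. It assumes the link graph has already been narrowed to a small set of pages about one topic, since HITS is normally run on such a query-focused subgraph. Authority scores are recomputed from hub scores and hub scores from authority scores, and both are then normalized. The function name and page names are illustrative.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) scores; links maps each page to the pages it links to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages that link in.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub update: sum the authority scores of the pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize both vectors to unit length so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


links = {
    "list1": ["official", "review"],
    "list2": ["official", "review"],
    "blog": ["official"],
}
hub, auth = hits(links)
print("best authority:", max(auth, key=auth.get))
print("best hub:", max(hub, key=hub.get))
```

The two list pages end up as the strongest hubs because they point at both authorities. The official page becomes the strongest authority because every hub points at it.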

11

Feature Extraction

A method for mining multimedia data by analyzing actual pixels, such as identifying a high blue color value to find images of an ocean.
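
A minimal sketch of the card's ocean example. It assumes the image is already decoded into a list of (R, G, B) pixel tuples with 0-255 channels. The feature is the average color, and the rule checks for a high, dominant blue channel. The thresholds min_blue and margin, the function names, and the sample pixels are invented for illustration, not tuned values.

```python
def mean_color(pixels):
    """Average (R, G, B) over a list of pixel tuples with 0-255 channels."""
    n = len(pixels)
    return tuple(sum(px[c] for px in pixels) / n for c in range(3))


def looks_like_ocean(pixels, min_blue=120, margin=30):
    """Crude pixel-feature rule: the average blue channel is high and dominates.

    min_blue and margin are illustrative thresholds, not tuned values.
    """
    r, g, b = mean_color(pixels)
    return b >= min_blue and b - max(r, g) >= margin


ocean = [(20, 80, 200), (30, 90, 210), (10, 70, 190)]
desert = [(210, 180, 120), (200, 170, 110), (220, 190, 130)]
print(looks_like_ocean(ocean), looks_like_ocean(desert))
```

A single average color is the simplest possible feature. Practical systems extract richer features such as color histograms, texture, and shape, but the principle of deciding from pixel values rather than from surrounding text is the same.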

12

Semantic/Metadata Mining

A method for mining multimedia data by reading text associated with the media, including captions, filenames, or alt attributes (alt text).
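
A minimal sketch of metadata mining with Python's built-in html.parser. It reads each img tag's alt attribute and the filename in its src, and never looks at the pixels. ImageMetadataParser, find_images and the sample markup are illustrative, and captions (for example figcaption elements) are left out to keep the sketch short.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse


class ImageMetadataParser(HTMLParser):
    """Collects the text that describes each <img>: its alt attribute and filename."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            src = attrs.get("src") or ""
            self.images.append({
                "src": src,
                "alt": attrs.get("alt") or "",
                "filename": basename(urlparse(src).path),  # e.g. "ocean_sunset.jpg"
            })


def find_images(html, keyword):
    """Return the src of every image whose alt text or filename mentions keyword."""
    parser = ImageMetadataParser()
    parser.feed(html)
    parser.close()
    keyword = keyword.lower()
    return [img["src"] for img in parser.images
            if keyword in img["alt"].lower() or keyword in img["filename"].lower()]


page = (
    '<img src="/photos/ocean_sunset.jpg" alt="Sunset over the sea">'
    '<img src="/img/1234.png" alt="Waves crashing on an ocean beach">'
    '<img src="/img/desert.jpg" alt="Sand dunes">'
)
print(find_images(page, "ocean"))
```

The first image matches on its filename and the second on its alt text. A file named only "1234.png" is found solely because someone wrote a descriptive alt attribute.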

13

Automatic Classification

The process of using supervised learning to automatically sort web pages into categories such as News, Sports, or Travel, based on keywords learned from labeled training pages.
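
The card does not name a specific learner. This minimal sketch uses multinomial Naive Bayes with add-one smoothing, one simple supervised classifier that works from word counts. It learns keyword statistics from a tiny, made-up labeled training set and then assigns the most probable category to an unseen page. tokenize and NaiveBayesClassifier are illustrative names.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter(labels)        # label -> number of training pages
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log P(label) + sum over known words of log P(word | label)
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(doc):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny made-up training set of labeled page texts.
train_docs = [
    "election results and government policy",
    "parliament votes on new policy",
    "team wins the football match",
    "player scores goal in final match",
    "cheap flights and hotel deals",
    "beach resort holiday travel guide",
]
train_labels = ["News", "News", "Sports", "Sports", "Travel", "Travel"]

model = NaiveBayesClassifier().fit(train_docs, train_labels)
print(model.predict("the match ended with a late goal"))
print(model.predict("cheap hotel for a beach holiday"))
```

Working in log probabilities avoids numeric underflow when pages contain many words. The +1 smoothing keeps a single unseen word from ruling out a category.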

14

Preprocessing (Web Usage Mining)

The phase of cleaning usage data, which includes removing clicks made by automated crawlers or bots to prevent skewed results.
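
A minimal sketch of the bot-removal step. It assumes log records have already been parsed into dictionaries with ip, path, and user_agent fields. It applies two common heuristics: the user agent names itself as a bot or crawler, or the client requested /robots.txt, which well-behaved crawlers fetch. BOT_PATTERN is a short illustrative list, and the function name and sample records are hypothetical.

```python
import re

# Illustrative user-agent patterns; real bot lists are much longer.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def remove_bot_traffic(records):
    """Drop log records that look automated.

    A record counts as a bot hit if its user agent matches BOT_PATTERN,
    or if the same client IP ever requested /robots.txt.
    """
    robot_clients = {r["ip"] for r in records if r["path"] == "/robots.txt"}
    return [r for r in records
            if not BOT_PATTERN.search(r["user_agent"]) and r["ip"] not in robot_clients]


logs = [
    {"ip": "10.0.0.1", "path": "/login", "user_agent": "Mozilla/5.0 (Windows NT 10.0)"},
    {"ip": "10.0.0.1", "path": "/dashboard", "user_agent": "Mozilla/5.0 (Windows NT 10.0)"},
    {"ip": "10.0.0.9", "path": "/login", "user_agent": "Googlebot/2.1"},
    {"ip": "10.0.0.7", "path": "/robots.txt", "user_agent": "ExampleFetcher/1.0"},
    {"ip": "10.0.0.7", "path": "/products", "user_agent": "ExampleFetcher/1.0"},
]
cleaned = remove_bot_traffic(logs)
print([r["path"] for r in cleaned])
```

The ExampleFetcher client is caught even though its user agent looks harmless, because it asked for /robots.txt first.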

15

Pattern Discovery

The use of association rules to identify specific user behaviors, such as discovering that 80% of people who visit a login page immediately go to a dashboard.
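
Classic association rules ignore the order of items. Because the card's example is about where users go immediately after the login page, this minimal sketch scores an ordered next-page rule instead. Support is the fraction of all sessions in which the two pages appear as consecutive clicks. Confidence is that count divided by the number of sessions that visit the login page. The function name is illustrative, and the sessions are made up so that the rule reaches 80% confidence.

```python
def next_page_rule(sessions, antecedent, consequent):
    """Support and confidence of 'antecedent is immediately followed by consequent'."""
    with_antecedent = [s for s in sessions if antecedent in s]
    # Count sessions in which the two pages appear as consecutive clicks.
    matches = sum(
        1 for s in with_antecedent
        if any(a == antecedent and b == consequent for a, b in zip(s, s[1:]))
    )
    support = matches / len(sessions) if sessions else 0.0
    confidence = matches / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Made-up sessions: each is the ordered list of pages one visitor requested.
sessions = [
    ["/home", "/login", "/dashboard"],
    ["/login", "/dashboard", "/settings"],
    ["/login", "/dashboard"],
    ["/login", "/help"],
    ["/home", "/login", "/dashboard", "/logout"],
    ["/home", "/products"],
]
support, confidence = next_page_rule(sessions, "/login", "/dashboard")
print(f"support={support:.2f}, confidence={confidence:.0%}")
```

Five sessions include the login page and four of them go straight to the dashboard, giving 4/5 = 80% confidence. Support is 4/6 because the rule holds in four of all six sessions.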

16

Distributed Data Mining (DDM)

An approach for mining massive datasets stored across different locations that focuses on moving the code to the data rather than moving the data to the code.
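
A minimal, single-process simulation of moving the code to the data. Each hypothetical site keeps its own shopping transactions. The coordinator sends the same function to every site and gets back only a small table of item counts, never the raw transactions. In a real deployment the function would run on the remote machines. SITES, count_items and run_at_sites are illustrative names, and the data is made up.

```python
from collections import Counter

# Hypothetical sites: each holds its own transactions, which never leave the site.
SITES = {
    "site_a": [["milk", "bread"], ["milk", "eggs"], ["bread"]],
    "site_b": [["milk", "eggs"], ["eggs"]],
}


def count_items(transactions):
    """The task shipped to every site: count the transactions containing each item."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))  # set() so an item counts once per transaction
    return counts


def run_at_sites(sites, task):
    """Simulate sending `task` to each site and collecting only its small result."""
    return {name: task(local_data) for name, local_data in sites.items()}


local_results = run_at_sites(SITES, count_items)
for site, counts in local_results.items():
    print(site, dict(counts))
```

Only a few counts per site cross the network, however many transactions each site stores. This is the saving the "code to the data" principle is after.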

17

Knowledge Integration

A step in the DDM workflow where individual results or models from local processing are sent to a central coordinator.

18

Global Model

The final stage of the DDM workflow, where the coordinator merges the local results into a single global (master) model that reflects the data from every site.
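
A minimal sketch of the last two workflow steps. It assumes each site has already sent back its item counts and its number of transactions (the values below are made up). During knowledge integration the coordinator adds the counts, which is exact because counts are additive across sites. The resulting global model is the set of items whose overall support meets a threshold. integrate, global_frequent_items and min_support are illustrative names.

```python
from collections import Counter


def integrate(local_results):
    """Knowledge integration: combine (item counts, transaction count) pairs from all sites."""
    total_counts = Counter()
    total_transactions = 0
    for counts, n_transactions in local_results:
        total_counts.update(counts)  # item counts are additive across sites
        total_transactions += n_transactions
    return total_counts, total_transactions


def global_frequent_items(local_results, min_support=0.5):
    """Global model: items whose support over all sites' data is at least min_support."""
    total_counts, total_transactions = integrate(local_results)
    return {item: count / total_transactions
            for item, count in total_counts.items()
            if count / total_transactions >= min_support}


# Made-up results received from two sites: (item counts, number of transactions).
local_results = [
    (Counter({"milk": 2, "bread": 2, "eggs": 1}), 3),  # site_a
    (Counter({"milk": 1, "eggs": 2}), 2),              # site_b
]
print(global_frequent_items(local_results, min_support=0.5))
```

Bread is frequent at site_a alone (2 of 3 transactions) but not globally (2 of 5). Eggs is the reverse (1 of 3 at site_a, 3 of 5 overall). This is why the coordinator judges support on the merged counts rather than on any single site's view.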