CS22001 - Week 10 Data Mining

Last updated 7:45 PM on 5/9/26

55 Terms

1

What is OLTP?

OnLine Transaction Processing

2

What is a data warehouse?

A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process

3

What is subject-oriented data?

Data organised around the major subjects of the enterprise rather than major application areas

4

What is integrated data?

Data from different application-oriented source systems, combined into one consistent store

5

What is a common issue with integrated data?

The sources often contain data that is inconsistent. They must be made consistent to present a unified view to users

6

What is time-variant data?

Data stored over time as a series of snapshots

7

Where is time-variant data useful?

Trend analysis, predictive modelling, finding patterns in time

8

What is non-volatile data?

Data that does not change once it is stored; new data is added to the warehouse rather than overwriting existing data

9

What is OLAP?

OnLine Analytical Processing

10

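How is a data warehouse built?

Extract, transform, load (ETL): extract data from the source systems, transform it into a consistent format, and load it into the warehouse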
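A minimal sketch of the three ETL steps, assuming two hypothetical source systems that record sales in inconsistent formats (different date layouts, amounts in pence vs pounds, mixed-case names). An in-memory SQLite database stands in for the warehouse; the table and column names are illustrative only.

```python
import sqlite3

# Hypothetical source 1: CRM export with DD/MM/YYYY dates and amounts in pence
crm_rows = [
    {"customer": "Alice", "date": "03/01/2024", "amount_pence": 1250},
    {"customer": "Bob", "date": "15/02/2024", "amount_pence": 899},
]
# Hypothetical source 2: web-shop feed with ISO dates and amounts in pounds
shop_rows = [
    ("alice", "2024-01-20", 5.00),
    ("carol", "2024-03-02", 12.40),
]

def extract():
    """Extract: pull the raw records from each source system unchanged."""
    return crm_rows, shop_rows

def transform(crm, shop):
    """Transform: resolve inconsistencies so every row uses one format."""
    unified = []
    for r in crm:
        day, month, year = r["date"].split("/")
        unified.append((r["customer"].lower(), f"{year}-{month}-{day}", r["amount_pence"] / 100))
    for name, date, amount in shop:
        unified.append((name.lower(), date, amount))
    return unified

def load(rows, conn):
    """Load: append the cleaned rows into the warehouse fact table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, sale_date TEXT, amount_gbp REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(*extract()), warehouse)
print(warehouse.execute(
    "SELECT customer, SUM(amount_gbp) FROM sales GROUP BY customer ORDER BY customer"
).fetchall())
```

The transform step is where integration happens: 'alice' and 'Alice' from two systems end up as one customer with one date format and one currency unit.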

11

What are data marts?

A subject-oriented subset of a data warehouse designed for a specific team, department or business function

12

What do data marts support? (2 with examples)

The analytical requirements of a particular business unit (e.g. sales)

Groups of users who share the same requirements to analyse a particular business process (e.g. property sales)

13

What is the process of creating data marts top-down?

Sources -> ETL -> Data warehouse -> Each data mart

14

What is the process of creating data marts bottom-up?

Sources -> ETL for each data mart -> Each data mart -> Data warehouse

15

What are the benefits of a data warehouse? (4)

Potential high ROI

Competitive advantage

Increased productivity of decision-making

Use data already available in OLTP systems

16

What are the challenges of a data warehouse?

Handling data from lots of different sources and different formats

Cost - complex design and huge storage requirements

People

17

What are the four types of analytical operations?

Roll-up, drill-down, slice and dice, and pivot

18

What does roll-up do?

Performs aggregations on the data by moving up the dimensional hierarchy or by dimensional reduction e.g. 4-D sales data to 3-D sales data
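A minimal sketch of roll-up by dimensional reduction, using a hypothetical 3-D sales cube (city, product type, quarter) stored as a Python dict. SUM is assumed as the aggregate function; the helper name `roll_up` is illustrative.

```python
from collections import defaultdict

# Hypothetical 3-D sales facts: (city, product_type, quarter) -> units sold
sales = {
    ("Dundee", "hatchback", "Q1"): 10,
    ("Dundee", "hatchback", "Q2"): 7,
    ("Dundee", "saloon", "Q1"): 4,
    ("Glasgow", "hatchback", "Q1"): 12,
    ("Glasgow", "saloon", "Q2"): 8,
}

def roll_up(facts, keep):
    """Sum over every dimension whose position is not listed in `keep`."""
    result = defaultdict(int)
    for dims, units in facts.items():
        result[tuple(dims[i] for i in keep)] += units
    return dict(result)

# Drop product_type: 3-D (city, type, quarter) -> 2-D (city, quarter)
by_city_quarter = roll_up(sales, keep=(0, 2))
# Drop quarter as well: 2-D -> 1-D totals per city
by_city = roll_up(sales, keep=(0,))
print(by_city)  # {('Dundee',): 21, ('Glasgow',): 20}
```

Each roll-up produces fewer, coarser cells, but the grand total is unchanged.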

19

What does drill-down do?

The reverse of roll-up. It involves revealing the detailed data that forms the aggregated data; this can be done by moving down the dimensional hierarchy or by dimensional introduction
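A minimal sketch of drill-down along a hypothetical year > quarter > month time hierarchy. One helper aggregates at any depth, so moving from depth 1 to depth 2 reveals the quarterly figures that make up the yearly total. The data and the helper name `aggregate` are illustrative.

```python
from collections import defaultdict

# Hypothetical detailed facts at the lowest level: (year, quarter, month, revenue)
detail = [
    (2024, "Q1", "Jan", 100), (2024, "Q1", "Feb", 80), (2024, "Q1", "Mar", 120),
    (2024, "Q2", "Apr", 90), (2024, "Q2", "May", 60),
]

def aggregate(rows, level):
    """Sum revenue at a given depth of the hierarchy (1 = year, 2 = quarter, 3 = month)."""
    totals = defaultdict(int)
    for *path, revenue in rows:
        totals[tuple(path[:level])] += revenue
    return dict(totals)

yearly = aggregate(detail, 1)     # aggregated view
quarterly = aggregate(detail, 2)  # drill down one level: year -> quarter
print(yearly)  # {(2024,): 450}
print(quarterly)
```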

20

What is slice-and-dice?

The ability to look at data from different viewpoints; a slice operation selects on one dimension of the data whereas dice uses two or more dimensions

21

What is an example of a slice and a dice?

Slice: sales where type = ‘hatchback’. Dice: sales where type = ‘hatchback’ and quarter = ‘Q1’
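The slice and dice in this card can be sketched as row filters over a hypothetical flattened sales cube. The helper name `select` and the column layout are assumptions for illustration.

```python
# Hypothetical sales cube flattened into rows: (type, quarter, city, units)
sales = [
    ("hatchback", "Q1", "Dundee", 10),
    ("hatchback", "Q2", "Dundee", 7),
    ("saloon", "Q1", "Dundee", 4),
    ("hatchback", "Q1", "Glasgow", 12),
]
COLUMNS = ("type", "quarter", "city")

def select(rows, **criteria):
    """Keep rows matching every criterion: one criterion = slice, two or more = dice."""
    return [r for r in rows
            if all(r[COLUMNS.index(col)] == val for col, val in criteria.items())]

slice_ = select(sales, type="hatchback")              # slice: one dimension fixed
dice = select(sales, type="hatchback", quarter="Q1")  # dice: two dimensions fixed
print(len(slice_), len(dice))  # 3 2
```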

22

What does a pivot do?

Rotates the data to provide an alternative view of the same data

23

T/F: pivot affects the data quantity

False. Pivot is purely for visualisation; it does not change the data quantity or level of detail
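A minimal sketch of a pivot on a hypothetical 2-D view (cities by quarters). Swapping the axes gives a different layout, but every cell value survives unchanged, which is why the quantity and detail of the data stay the same.

```python
# Hypothetical 2-D view: rows are cities, columns are quarters
view = {
    "Dundee": {"Q1": 14, "Q2": 7},
    "Glasgow": {"Q1": 12, "Q2": 8},
}

def pivot(table):
    """Swap the row and column axes; each cell value is copied as-is."""
    rotated = {}
    for row_key, cols in table.items():
        for col_key, value in cols.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

rotated = pivot(view)
print(rotated)  # {'Q1': {'Dundee': 14, 'Glasgow': 12}, 'Q2': {'Dundee': 7, 'Glasgow': 8}}
```

Pivoting twice returns the original view, another sign that nothing was lost or aggregated.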

24

What do OLTP databases generally store?

An up-to-date picture of real-world objects and transactions

25

T/F: data warehouses store neatly formatted data from many sources over time

True

26

What does OLAP allow us to do?

Pull time-based statistics from the warehouse to analyse trends and predict the future

27

What does data mining allow us to do?

Search the data for patterns we didn’t know were there; these unexpected patterns may be the most useful

28

What is data mining?

The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial decisions

29

Fill in the blanks:
Data mining combines (1) with (2) and (3) to find hidden and unexpected patterns and relationships within large sets of data

database systems, statistics, machine learning

30

Name 6 applications of data mining

Music/video recommendations

Understanding disease factors

Optimising energy consumption

Fraud detection

Basket analysis

Epidemiology

31

What are the 4 important data mining operations?

Predictive modelling

Database segmentation

Link analysis

Deviation detection

32

Which data mining techniques are used in predictive modelling?

Classification, value prediction

33

What data mining techniques are used in database segmentation?

Demographic clustering, neural clustering

34

What data mining techniques are used in link analysis?

Association discovery, sequential pattern discovery, similar time sequence discovery

35

What data mining techniques are used in deviation detection?

Statistics, visualisation

36

What does predictive modelling use to learn? (2)

Observations to form a model of the important characteristics of a data set

Generalisations of the ‘real world’ and the ability to fit new data into an existing framework

37

What two data sets are needed for predictive modelling?

Training and testing

38

What question needs to be asked during classification?

Which category does this object fit into?
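A minimal sketch of classification with separate training and testing sets, using a hypothetical credit card approval data set. The technique shown is 1-nearest-neighbour, picked only because it fits in a few lines; it is one of many possible classifiers.

```python
import math

# Hypothetical labelled records: (annual income in £k, existing debt in £k) -> decision
training = [((20, 15), "reject"), ((25, 18), "reject"), ((60, 5), "approve"), ((75, 10), "approve")]
testing = [((22, 14), "reject"), ((70, 6), "approve"), ((40, 30), "reject")]

def classify(record, train):
    """1-nearest-neighbour: assign the category of the closest training example."""
    _, label = min(train, key=lambda example: math.dist(record, example[0]))
    return label

# Evaluate on the testing set, which was not used to build the model
correct = sum(classify(features, label_set) == label
              for features, label in testing
              for label_set in [training])
accuracy = correct / len(testing)
print(f"accuracy = {accuracy:.2f}")
```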

39

What question needs to be asked during value prediction?

What is the likely future value of this variable?
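Value prediction can be sketched with ordinary least-squares linear regression, one common technique among many. The monthly sales figures are hypothetical; the fitted line estimates the likely value for the next month.

```python
# Hypothetical monthly sales: month index -> units sold
months = [1, 2, 3, 4, 5]
units = [100, 110, 125, 130, 145]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line(months, units)
forecast = a + b * 6  # likely value for month 6
print(round(forecast, 1))  # 155.0
```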

40

What are four applications of predictive modelling?

Customer retention management, credit card approval, cross-selling, digital marketing

41

What is the aim of database segmentation?

To partition a database into an unknown number of segments (clusters) of similar records

42

How does database segmentation work?

It uses unsupervised learning to discover homogeneous sub-populations in a database to improve the accuracy of the profiles
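A minimal sketch of unsupervised segmentation. Because the number of segments is not known in advance, this uses simple leader (threshold-based) clustering, which starts a new cluster whenever a record is too far from every existing cluster leader; k-means, by contrast, needs the number of clusters up front. The customer records and the distance threshold are hypothetical.

```python
import math

# Hypothetical customer records: (age, annual spend in £100s); no labels are supplied
customers = [(22, 5), (24, 6), (23, 4), (45, 30), (47, 28), (46, 31), (70, 12)]

def leader_cluster(points, threshold):
    """Join the first cluster whose leader is within `threshold`, else start a new cluster."""
    leaders, clusters = [], []
    for p in points:
        for i, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                clusters[i].append(p)
                break
        else:  # no leader was close enough
            leaders.append(p)
            clusters.append([p])
    return clusters

segments = leader_cluster(customers, threshold=5)
print(len(segments), segments)
```

Leader clustering depends on the order of the records, so it is only a teaching sketch; the point is that the segments emerge from the data without any labels.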

43

What is a disadvantage of database segmentation?

It is less precise than other operations, thus less sensitive to redundant or irrelevant features

44

What could clusters be based on in database segmentation? (5)

Behaviour, demographics, location, financial status, symptoms

45

What are three applications of database segmentation?

Customer profiling, direct marketing, personal healthcare

46

What is the aim of link analysis?

To establish links (i.e. associations) between records (or sets of records) in a database

47

What are the three specialisations in link analysis?

Associations discovery, sequential pattern discovery, similar time sequence discovery

48

What are three applications of link analysis?

Product affinity analysis, direct marketing, stock price movement

49

What does associations discovery (link analysis) do?

Finds ‘if-then’ rules for relationships in a data set. These are known as association rules
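A minimal sketch of scoring one association rule, IF bread THEN butter, on hypothetical market baskets. Support and confidence are the usual measures: support is how often items appear together, and confidence is how often the 'then' part holds when the 'if' part does.

```python
# Hypothetical market baskets, one set of items per transaction
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def count(itemset):
    """Number of baskets containing every item in `itemset`."""
    return sum(itemset <= b for b in baskets)

def support(itemset):
    """Fraction of all baskets containing `itemset`."""
    return count(itemset) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the 'if' part, the fraction also containing the 'then' part."""
    return count(antecedent | consequent) / count(antecedent)

# Rule: IF bread THEN butter
print(support({"bread", "butter"}), confidence({"bread"}, {"butter"}))  # 0.6 0.75
```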

50

What does sequential pattern discovery (link analysis) do?

Finds patterns between events such that the presence of one set of items is followed by another set of items in a database of events over time
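A minimal sketch of checking one sequential pattern, 'tent followed later by sleeping bag', across hypothetical customer purchase histories. The pattern's support is taken here as the fraction of customers whose history contains it.

```python
# Hypothetical purchase histories: customer -> list of (day, items bought that day)
histories = {
    "c1": [(1, {"tent"}), (5, {"sleeping bag"}), (9, {"stove"})],
    "c2": [(2, {"tent", "torch"}), (3, {"sleeping bag"})],
    "c3": [(1, {"sleeping bag"}), (4, {"tent"})],
    "c4": [(6, {"torch"})],
}

def follows(history, first, then):
    """True if an event containing `first` is followed by a later event containing `then`."""
    events = sorted(history, key=lambda e: e[0])
    for i, (_, items) in enumerate(events):
        if first <= items:
            # The earliest match leaves the most later events to search
            return any(then <= later for _, later in events[i + 1:])
    return False

sequence_support = sum(
    follows(h, {"tent"}, {"sleeping bag"}) for h in histories.values()
) / len(histories)
print(sequence_support)  # 0.5
```

Order matters: customer c3 bought both items but in the opposite order, so they do not support this pattern.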

51

What is deviation detection?

A data mining operation that identifies data points, patterns, or events that differ significantly from the norm (often called outliers)

52

How can deviation detection be performed?

Using statistics and visualisation techniques, or as a by-product of other data mining operations
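A minimal sketch of statistical deviation detection using z-scores on hypothetical card transaction amounts. The threshold of 2.5 standard deviations is an illustrative assumption, not a standard value.

```python
import statistics

# Hypothetical card transaction amounts in £ for one customer
amounts = [23.5, 19.0, 25.2, 21.8, 24.1, 20.6, 22.9, 310.0, 18.7, 23.3]

def z_score_outliers(values, threshold=2.5):
    """Flag values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

print(z_score_outliers(amounts))  # [310.0]
```

A single extreme value inflates the mean and standard deviation it is measured against, so on small samples median-based rules are often more robust.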

53

What are three applications of deviation detection?

Fraud detection in credit cards and insurance claims, quality control, defects tracing

54

How can graph databases find the shortest path between two nodes?

Breadth-first search (which finds the path with the fewest edges in an unweighted graph)
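A minimal sketch of breadth-first search on a hypothetical social graph held as an in-memory adjacency list, standing in for the nodes and relationships a graph database would store. BFS visits nodes in order of hop count, so it returns a shortest path only when every edge counts equally; weighted graphs need an algorithm such as Dijkstra's.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """BFS: the first time `goal` is dequeued, the path to it has the fewest edges."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through predecessors to rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # goal is unreachable from start

# Hypothetical social graph as an adjacency list
friends = {
    "Ann": ["Ben", "Cat"],
    "Ben": ["Ann", "Dev"],
    "Cat": ["Ann", "Dev", "Eve"],
    "Dev": ["Ben", "Cat", "Fay"],
    "Eve": ["Cat", "Fay"],
    "Fay": ["Dev", "Eve"],
}
print(shortest_path(friends, "Ann", "Fay"))  # ['Ann', 'Ben', 'Dev', 'Fay']
```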

55

A company wants to look at sales for each of its product categories over the last 5 years. Which OLAP operation is relevant here?

Slice and dice