What is OLTP?
OnLine Transaction Processing
What is a data warehouse?
A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process
What is subject-oriented data?
Data organised around the major subjects of the enterprise rather than major application areas
What is integrated data?
Data that brings together application-oriented data from different source systems across the enterprise
What is a common issue with integrated data?
The sources often contain data that is inconsistent. They must be made consistent to present a unified view to users
What is time-variant data?
Data stored over time as a series of snapshots
Where is time-variant data useful?
Trend analysis, predictive modelling, finding patterns in time
What is non-volatile data?
Data that does not change once it is stored
What is OLAP?
OnLine Analytical Processing
How is a data warehouse built?
Extract, transform, load (ETL)
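A minimal sketch of the three ETL steps, using only the Python standard library. The two source systems, their formats and the `sales` table are hypothetical, and an in-memory SQLite database stands in for the warehouse. The transform step is where the inconsistent source data (different date styles and currency units) is made consistent.

```python
import csv
import io
import sqlite3

# Hypothetical sources recording sales in inconsistent formats.
SOURCE_A = "date,amount_gbp\n2023-01-15,120.50\n2023-02-03,80.00\n"  # CSV, ISO dates, pounds
SOURCE_B = [{"day": "15/03/2023", "pence": 4500}]                    # records, UK dates, pence

def extract():
    """Extract: read the raw records from each source system."""
    rows_a = list(csv.DictReader(io.StringIO(SOURCE_A)))
    return rows_a, SOURCE_B

def transform(rows_a, rows_b):
    """Transform: convert both sources to one schema (ISO date, amount in pounds)."""
    unified = [(r["date"], float(r["amount_gbp"])) for r in rows_a]
    for r in rows_b:
        day, month, year = r["day"].split("/")
        unified.append((f"{year}-{month}-{day}", r["pence"] / 100))
    return unified

def load(rows, conn):
    """Load: insert the cleaned rows into the (hypothetical) warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (sale_date TEXT, amount_gbp REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
print(conn.execute("SELECT COUNT(*), SUM(amount_gbp) FROM sales").fetchone())
```

In a real warehouse each step is usually far larger (data cleaning, deduplication, scheduled refreshes), but the order of the steps is the same.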
What are data marts?
A subject-oriented subset of a data warehouse designed for a specific team, department or business function
What do data marts support? (2 with examples)
The analytical requirements of a particular business unit (e.g. sales)
The users who share the same requirement to analyse a particular business process (e.g. property sales)
What is the process of creating data marts top-down?
Sources -> ETL -> Data warehouse -> Each data mart
What is the process of creating data marts bottom-up?
Sources -> ETL for each data mart -> Each data mart -> Data warehouse
What are the benefits of a data warehouse? (4)
Potential high ROI
Competitive advantage
Increased productivity of decision-making
Use data already available in OLTP systems
What are the challenges of a data warehouse?
Handling data from lots of different sources and different formats
Cost - complex design and huge storage requirements
People
What are the four types of analytical operations?
Roll-up, drill-down, slice-and-dice, and pivot
What does roll-up do?
Performs aggregations on the data by moving up the dimensional hierarchy or by dimensional reduction, e.g. 4-D sales data to 3-D sales data
What does drill-down do?
The reverse of roll-up. It involves revealing the detailed data that forms the aggregated data; this can be done by moving down the dimensional hierarchy or by dimensional introduction
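The roll-up and drill-down cards can be illustrated with a small sketch. The fact rows, the three dimensions (city, month, product type) and the month-to-quarter hierarchy are all hypothetical. It shows both kinds of roll-up (climbing a hierarchy and dropping a dimension) and a drill-down that recovers the detail behind one aggregate.

```python
from collections import defaultdict

# Hypothetical 3-D fact rows: (city, month, product_type, sales).
FACTS = [
    ("Glasgow", "Jan", "hatchback", 10),
    ("Glasgow", "Mar", "hatchback", 4),
    ("Glasgow", "Feb", "saloon", 5),
    ("London", "Jan", "hatchback", 7),
    ("London", "Apr", "hatchback", 3),
]
# Assumed time hierarchy: month -> quarter.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

def roll_up_time(facts):
    """Roll-up by moving up the hierarchy: sum month-level rows to quarter level."""
    totals = defaultdict(int)
    for city, month, ptype, sales in facts:
        totals[(city, MONTH_TO_QUARTER[month], ptype)] += sales
    return dict(totals)

def roll_up_drop_dimension(facts):
    """Roll-up by dimensional reduction: drop the city dimension (3-D -> 2-D)."""
    totals = defaultdict(int)
    for _city, month, ptype, sales in facts:
        totals[(month, ptype)] += sales
    return dict(totals)

def drill_down(facts, city, quarter, ptype):
    """Drill-down: reveal the month-level rows that make up one quarter-level total."""
    return [f for f in facts
            if f[0] == city and MONTH_TO_QUARTER[f[1]] == quarter and f[2] == ptype]

print(roll_up_time(FACTS)[("Glasgow", "Q1", "hatchback")])  # 14
print(drill_down(FACTS, "Glasgow", "Q1", "hatchback"))
```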
What is slice-and-dice?
The ability to look at data from different viewpoints; a slice operation selects on one dimension of the data whereas dice uses two or more dimensions
What is an example of a slice and a dice?
A slice of sales (type='hatchback'); a dice (type='hatchback' and quarter='Q1')
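A minimal sketch of the slice and dice example, assuming the cube is stored as a hypothetical list of cells with `type`, `quarter` and `city` dimensions. The only difference between the two operations is how many dimensions the selection restricts.

```python
# Hypothetical cube cells, one dict per cell.
SALES = [
    {"type": "hatchback", "quarter": "Q1", "city": "Glasgow", "sales": 10},
    {"type": "hatchback", "quarter": "Q2", "city": "London", "sales": 3},
    {"type": "saloon", "quarter": "Q1", "city": "Glasgow", "sales": 5},
]

def select(cells, **criteria):
    """Keep the cells whose dimension values match every criterion given."""
    return [c for c in cells if all(c[dim] == val for dim, val in criteria.items())]

slice_ = select(SALES, type="hatchback")              # slice: one dimension
dice = select(SALES, type="hatchback", quarter="Q1")  # dice: two dimensions
```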
What does a pivot do?
Rotate the data to provide an alternate view of the same data
T/F pivot affects the data quantity
False, pivot is purely for visualisation. There are no changes to the data quantity or detail
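To see why pivot leaves the data unchanged, here is a sketch that swaps the rows and columns of a small hypothetical table held as a nested dict `{row: {column: value}}`. The same cells and totals survive the rotation; only the layout differs.

```python
def pivot(table):
    """Swap rows and columns of a nested dict {row: {col: value}}."""
    rotated = {}
    for row, cols in table.items():
        for col, value in cols.items():
            rotated.setdefault(col, {})[row] = value
    return rotated

def cells(table):
    """Flatten a table to sorted (row, col, value) triples for comparison."""
    return sorted((r, c, v) for r, cols in table.items() for c, v in cols.items())

# Hypothetical sales: product type down the side, quarter across the top.
by_type = {"hatchback": {"Q1": 10, "Q2": 3}, "saloon": {"Q1": 5, "Q2": 8}}
by_quarter = pivot(by_type)  # quarter down the side, product type across the top
```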
What do OLTP databases generally store?
An up-to-date picture of real-world objects and transactions
T/F data warehouses store neatly formatted data from many sources over time
True
What does OLAP allow us to do?
Pull time-based stats from the warehouse to predict the future
What does data mining allow us to do?
Search the data for patterns we didn’t know were there; these unexpected patterns may be the most useful
What is data mining?
The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial decisions
Fill in the blanks:
Data mining combines (1) with (2) and (3) to find hidden and unexpected patterns and relationships within large sets of data
database systems, statistics, machine learning
Name 6 applications of data mining
Music/video recommendations
Understanding disease factors
Optimising energy consumption
Fraud detection
Basket analysis
Epidemiology
What are the 4 important data mining operations?
Predictive modelling
Database segmentation
Link analysis
Deviation detection
Which data mining techniques are used in predictive modelling?
Classification, value prediction
What data mining techniques are used in database segmentation?
Demographic clustering, neural clustering
What data mining techniques are used in link analysis?
Association discovery, sequential pattern discovery, similar time sequence discovery
What data mining techniques are used in deviation detection?
Statistics, visualisation
What does predictive modelling use to learn? (2)
Observations to form a model of the important characteristics of a data set
Generalisations of the ‘real world’ and the ability to fit new data into an existing framework
What two data sets are needed for predictive modelling?
Training and testing
What question needs to be asked during classification?
Which category does this object fit into?
What question needs to be asked during value prediction?
What is the likely future value of this variable?
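A sketch of the two predictive modelling techniques and the training/testing split. The techniques are swapped in for illustration and are not necessarily the ones this course uses: 1-nearest-neighbour for classification and a least-squares straight line for value prediction. All data is hypothetical, and a fixed split is used so the result is repeatable.

```python
# Hypothetical labelled customers: (monthly spend in £, outcome).
LABELLED = [(5, "leave"), (8, "leave"), (12, "leave"), (40, "stay"),
            (55, "stay"), (60, "stay"), (11, "leave"), (50, "stay")]
TRAIN, TEST = LABELLED[:6], LABELLED[6:]  # training set and testing set

def classify(train, x):
    """Classification (1-NN): which category does x fit into? Use the closest training point."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label

# Evaluate the model on data it was not trained on.
accuracy = sum(classify(TRAIN, x) == y for x, y in TEST) / len(TEST)

def fit_line(points):
    """Value prediction: least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical quarterly sales history: (quarter number, sales).
HISTORY = [(1, 100.0), (2, 120.0), (3, 140.0)]
a, b = fit_line(HISTORY)
forecast = a + b * 4  # likely value of sales in quarter 4

print(accuracy, forecast)
```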
What are four applications of predictive modelling?
Customer retention management, credit card approval, cross-selling, digital marketing
What is the aim of database segmentation?
To partition a database into an unknown number of segments (clusters) of similar records
How does database segmentation work?
It uses unsupervised learning to discover homogeneous sub-populations in a database to improve the accuracy of the profiles
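A sketch of segmentation by unsupervised clustering, using k-means as a swapped-in example technique. One simplification to note: k-means needs the number of clusters `k` chosen in advance, whereas the aim stated above is an unknown number of segments. The customer records (age, annual spend in £k) are hypothetical, and the first `k` records are used as starting centroids.

```python
def k_means(points, k, iterations=10):
    """Minimal k-means on 2-D points; initial centroids are the first k points."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each record to the nearest centroid (squared Euclidean distance).
            nearest = min(range(k), key=lambda i: (p[0] - centroids[i][0]) ** 2
                                                 + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (age, annual spend in £k). No labels are given.
CUSTOMERS = [(22, 1.0), (60, 9.0), (25, 1.5), (58, 8.5), (23, 1.2), (62, 9.5)]
centroids, clusters = k_means(CUSTOMERS, 2)
print(clusters)
```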
What is a disadvantage of database segmentation?
It is less precise than other operations (which also makes it less sensitive to redundant or irrelevant features)
What could clusters be based on in database segmentation? (5)
Behaviour, demographics, location, financial status, symptoms
What are three applications of database segmentation?
Customer profiling, direct marketing, personal healthcare
What is the aim of link analysis?
To establish links (i.e. associations) between records (or sets of records) in a database
What are the three specialisations in link analysis?
Associations discovery, sequential pattern discovery, similar time sequence discovery
What are three applications of link analysis?
Product affinity analysis, direct marketing, stock price movement
What does associations discovery (link analysis) do?
Finds ‘if-then’ rules for relationships in a data set. These are known as association rules
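A brute-force sketch of associations discovery on hypothetical shopping baskets. It scores single-item rules "if lhs then rhs" with the two usual measures: support (fraction of baskets containing both items) and confidence (support of both divided by support of lhs). The thresholds are arbitrary assumptions; practical algorithms such as Apriori prune the search instead of trying every pair.

```python
from itertools import combinations

# Hypothetical baskets (one set of items per transaction).
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in BASKETS) / len(BASKETS)

def rules(min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    items = sorted(set().union(*BASKETS))
    found = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs})
            if s >= min_support:
                confidence = s / support({lhs})
                if confidence >= min_confidence:
                    found.append((lhs, rhs, s, confidence))
    return found

for lhs, rhs, s, c in rules():
    print(f"if {lhs} then {rhs}: support={s:.2f}, confidence={c:.2f}")
```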
What does sequential pattern discovery (link analysis) do?
Finds patterns between events such that the presence of one set of items is followed by another set of items in a database of events over time
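A simplified sketch of sequential pattern discovery. It measures how often one event is followed later by another across hypothetical per-customer histories. For brevity it uses single items rather than sets of items, so it illustrates the idea rather than a full algorithm.

```python
# Hypothetical time-ordered purchase histories, one list per customer.
HISTORIES = {
    "c1": ["tent", "sleeping bag", "stove"],
    "c2": ["tent", "stove"],
    "c3": ["stove", "tent"],
}

def follows(sequence, first, then):
    """True if `then` occurs somewhere after the first occurrence of `first`."""
    if first not in sequence:
        return False
    return then in sequence[sequence.index(first) + 1:]

def sequence_support(first, then):
    """Fraction of customers whose history contains `first` followed later by `then`."""
    return sum(follows(h, first, then) for h in HISTORIES.values()) / len(HISTORIES)

print(sequence_support("tent", "stove"), sequence_support("stove", "tent"))
```

Order matters here: "tent then stove" holds for two of the three customers, but "stove then tent" holds for only one.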
What is deviation detection?
A data mining operation that identifies data points, patterns, or events that differ significantly from the norm (often called outliers)
How can deviation detection be performed?
Using statistics and visualisation techniques, or as a by-product of other data mining operations
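A sketch of the statistical approach to deviation detection: flag values whose z-score (distance from the mean in population standard deviations) exceeds a threshold. The claim amounts and the threshold of 2.0 are hypothetical choices for illustration.

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # all values identical: nothing deviates
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical insurance claim amounts in £.
CLAIMS = [100, 110, 95, 105, 98, 102, 900]
print(z_score_outliers(CLAIMS))  # [900]
```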
What are three applications of deviation detection?
Fraud detection in credit cards and insurance claims, quality control, defect tracing
How can graph databases find the shortest path between two nodes?
Breadth-first search (it finds the path with the fewest edges, i.e. the shortest path in an unweighted graph)
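A sketch of breadth-first search for the shortest path. A plain adjacency dict stands in for a graph database, and the people and edges are hypothetical. Because BFS visits nodes in order of distance from the start, the first time it reaches the goal it has found a path with the fewest edges.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a fewest-edges path as a list of nodes, or None."""
    previous = {start: None}  # also records which nodes have been seen
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical directed "knows" graph.
GRAPH = {"Alice": ["Bob", "Cara"], "Bob": ["Dan"], "Cara": ["Dan", "Eve"], "Dan": ["Eve"], "Eve": []}
print(shortest_path(GRAPH, "Alice", "Eve"))  # ['Alice', 'Cara', 'Eve']
```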
A company wants to look at sales for each of its product categories over the last 5 years. Which OLAP operation is relevant here?
Slice and dice