What do ACID properties do?
They ensure that a transaction is executed successfully and that its effects are permanently stored in the database.
• If the transaction is rolled back, it must return the database to its last consistent state before the update.
Describe situations suitable for data warehousing
Strategic Planning
• Strategic planning is a review and planning process that is undertaken to make thoughtful decisions about an organization's future in order to ensure its success.
By following a strategic planning process, an organization can improve business outcomes and avoid taking on unanticipated risks due to lack of foresight.
• One key item the organization would need to plan is data.
• A data warehouse provides the data necessary for this planning.
Business Modelling
At its simplest, a business model is a specification describing how an organization fulfills its purpose. All business processes and policies are part of that model.
• A business model answers the following questions: Who is your customer, what does the customer value, and how do you deliver value at an appropriate cost?
• Data in a data warehouse largely influences how a business is modeled
Explain the need for ETL processes in data warehousing.
• ETL is ideal when:
data has to be integrated from different source systems
the source systems hold data in different formats
the process has to be repeated many times
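The three conditions above can be illustrated with a minimal extract-transform-load sketch. Everything in it is an assumption made for illustration: the two in-memory sources (`CSV_SOURCE`, `JSON_SOURCE`), their field names, and the `sales` table are hypothetical, and SQLite stands in for a real warehouse.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source 1: CSV text, amounts stored as strings.
CSV_SOURCE = "customer,amount\nAda,10.50\nBen,4.00\n"
# Hypothetical source 2: JSON text with different field names.
JSON_SOURCE = '[{"name": "Cy", "total": 7.25}]'


def extract():
    """Read raw rows from both source systems."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Map both formats onto one (customer, amount) shape."""
    unified = [(r["customer"], float(r["amount"])) for r in csv_rows]
    unified += [(r["name"], float(r["total"])) for r in json_rows]
    return unified


def load(conn, rows):
    """Insert the unified rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
loaded = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Because each step is a function, the same pipeline can be rerun whenever the sources change, which is the "repeated many times" case.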
Associations
• In association, a pattern is discovered based on a relationship between items in the same transaction.
• That is why the association technique is also known as the relation technique.
• The association technique is used in market basket analysis to identify a set of products that customers frequently purchase together
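A rough sketch of the idea behind market basket analysis: count how often each pair of items appears in the same transaction. The baskets are made-up data, and the `support` helper is a hypothetical name. Real association-rule mining (for example the Apriori algorithm) goes further than this pair count.

```python
from collections import Counter
from itertools import combinations

# Made-up transactions: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "eggs"},
    {"bread", "milk", "eggs"},
]

# Count every unordered pair of items bought in the same transaction.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1


def support(pair):
    """Fraction of all transactions that contain both items of the pair."""
    return pair_counts[pair] / len(transactions)


top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count, support(top_pair))
```

Here bread and milk appear together in three of the four baskets, so that pair has the highest support.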
Describe situations that benefit from data mining.
• Database analysis and decision support
• Market analysis and management
• target marketing, customer relation management, market basket analysis, cross selling, market segmentation
• Risk analysis and management
• Forecasting, customer retention, improved underwriting, quality control, competitive analysis
• Fraud detection and management
Difference between clustering and classification
• Supervised learning (classification)
• Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
• New data is classified based on the training set
• Unsupervised learning (clustering)
• The class labels of the training data are unknown
• Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
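The contrast can be sketched on one-dimensional toy data. The first half uses labeled data to classify a new value by its nearest class centroid. The second half gets no labels and groups the values with a simple two-means loop. Both methods are stand-ins chosen for brevity, and all names and numbers are hypothetical.

```python
# Supervised: every training value comes with a class label.
labeled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]


def train_centroids(data):
    """Average the values of each labeled class."""
    sums, counts = {}, {}
    for value, label in data:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(value, centroids):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


centroids = train_centroids(labeled)
predicted = classify(2.0, centroids)


# Unsupervised: no labels, the groups must be discovered.
def two_means(values, iterations=10):
    """Split values into two clusters (assumes at least two distinct values)."""
    c1, c2 = min(values), max(values)
    for _ in range(iterations):
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted(g1), sorted(g2)


clusters = two_means([1.0, 1.5, 2.0, 8.0, 9.0])
print(predicted, clusters)
```

The classifier returns a known label ("low" or "high"), while the clustering step only returns groups, which an analyst would still have to name.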
Forecasting
• This discovers relationships between independent variables and the predicted (dependent) variables from past occurrences, and exploits them to predict an unknown outcome.
• For instance, prediction analysis can be used in sales to predict future profit: sales is treated as the independent variable and profit as the dependent variable.
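As a minimal sketch of the sales and profit example, the code below fits a straight line to past figures with ordinary least squares and uses it to predict profit for a new sales value. The numbers are invented, and a simple linear fit is only one of many possible forecasting models.

```python
# Invented history: profit happens to follow 0.1 * sales + 2.
sales = [100.0, 200.0, 300.0, 400.0]   # independent variable
profit = [12.0, 22.0, 32.0, 42.0]      # dependent variable


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sales, profit)
forecast = slope * 500.0 + intercept  # predicted profit for sales of 500
print(round(slope, 3), round(intercept, 3), round(forecast, 3))
```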
Sequential Patterns
• Sequential patterns analysis seeks to discover or identify similar patterns, regular events or trends in transaction data over a business period.
• In sales, with historical transaction data, businesses can identify a set of items that customers buy together at different times in a year. Businesses can then use this information to recommend those items to customers, with better deals, based on their past purchasing frequency.
Classifications
• Classification is a classic data mining technique based on machine learning (computer systems that can learn from data).
• Basically, classification is used to assign each item in a set of data to one of a predefined set of classes or groups.
• Classification methods make use of mathematical techniques such as decision trees, linear programming, neural networks and statistics.
• In classification, the software developed can learn how to classify the data items into groups.
Data
Collection of raw facts and figures
Information
This is processed data within a context. Processing may involve sorting, selection, arithmetic manipulations, interpretation, summarizing
Differences between data and information with examples
Data:
Information:
Database
a collection of data and information that is organized so that it can easily be accessed, managed, and updated.
Information System
collection of technical and human resources providing storage, computing, distribution, and communication for the information an organization needs.
Database state
the data in a database at a particular time
Why are databases beneficial
When designed properly, databases help to:
Database transaction
A logical unit of work, performed by a DBMS, that reads or updates the contents of a database.
Aggregation
A database operation that summarizes multiple rows into a single row.
E.g. counting the number of rows with the same name is a "count" aggregation.
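The "count" example can be sketched with Python's built-in `sqlite3` module. The `people` table and its rows are hypothetical. `GROUP BY` collapses all rows that share a name into a single row, and `COUNT(*)` reports how many rows were collapsed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("Ama",), ("Kofi",), ("Ama",), ("Ama",)],
)

# Four input rows are summarized into one row per distinct name.
name_counts = conn.execute(
    "SELECT name, COUNT(*) FROM people GROUP BY name ORDER BY name"
).fetchall()
print(name_counts)  # [('Ama', 3), ('Kofi', 1)]
```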
What does an information system consist of?
software, hardware, people, data, and procedures. Thus, a database is a component of an information system
How do transactions maintain data consistency
• During the transaction there would be instances where certain values would be in an inconsistent state.
• For example, when the money to be paid has been deducted but not yet given to the customer, the customer's balance is not correct.
• To ensure integrity and consistency, COMMIT and ROLLBACK commands are used
• If a transaction completes successfully, it is said to have committed, i.e. the necessary updates are made.
• The database reaches a new consistent state.
• On the other hand, if the transaction does not execute successfully, the transaction is aborted.
• If a transaction is aborted, the database must be restored to the consistent state it was in before the transaction started. Such a transaction is rolled back or undone.
Process of transactions maintaining data consistency
The general way transactions are executed is shown below:
BEGIN TRANSACTION
do task 1
do task 2
do task 3
COMMIT
ON ERROR
ROLLBACK
END TRANSACTION
Changes are committed if tasks 1, 2 and 3 are all successful. The transaction is rolled back if any of the tasks fails. This ensures the database is always left in a consistent state.
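A minimal sketch of this commit-or-rollback pattern, using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` and `balances` helpers, and the amounts are all hypothetical. Used as a context manager, the connection commits if the block finishes normally and rolls back if an exception escapes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint makes an overdraft raise an error.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money between accounts: both updates apply, or neither does."""
    try:
        with conn:  # COMMIT on success, ROLLBACK if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # both updates succeed
failed = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK
print(ok, failed, balances(conn))
```

In the failed transfer the credit to bob is applied first and then undone by the rollback, so the balances stay as they were after the first transfer.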
Data integrity
The maintenance and assurance of the accuracy and consistency of data over its entire life cycle. It is a critical aspect of the design, implementation and usage of any system that stores, processes, or retrieves data.
Data concurrency
The ability of a database to allow multiple users to carry out transactions on it at the same time
E.g. when a particular user can view a file, like a spreadsheet, but not edit the data
ACID properties of a transaction
Atomicity, Consistency, Isolation, Durability
Atomicity
A transaction is an indivisible unit that is either performed in its entirety or is not performed at all. All tasks must succeed together or fail together.
Consistency
The results of a transaction must conform to existing constraints in the database. The database must always be left in a valid state after a transaction.
• It is the responsibility of both the DBMS and the application developers to ensure consistency.
Isolation
• Transactions execute independently of one another. The partial effects of incomplete transactions should not be visible to other transactions.
• It is the responsibility of the concurrency control subsystem to ensure isolation.
Durability
The effects of a successfully completed transaction must be permanently recorded in the database and must not be lost because of a subsequent failure.
• It is the responsibility of the recovery subsystem to ensure durability
Database operations are…
retrievals (queries) and updates (modifications)
Retrievals/queries
This involves selecting fields and records that satisfy the needs of a particular user.
Updates/modifications
This changes the state of the database. There are three operations which result in changes to databases:
• Insert: used to add one or more tuples (records) to a relation (table)
• Update: used to change the values of some attributes in existing tuples.
• Delete: used to remove/delete records from relations.
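The three update operations, followed by a retrieval, can be sketched with Python's built-in `sqlite3` module. The `students` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, level INTEGER)")

# Insert: add tuples (records) to the relation (table).
conn.executemany(
    "INSERT INTO students (id, name, level) VALUES (?, ?, ?)",
    [(1, "Esi", 100), (2, "Yaw", 200)],
)

# Update: change the value of an attribute in an existing tuple.
conn.execute("UPDATE students SET level = 200 WHERE id = 1")

# Delete: remove a record from the relation.
conn.execute("DELETE FROM students WHERE id = 2")
conn.commit()

# Retrieval (query): select the fields and records of interest.
rows = conn.execute("SELECT id, name, level FROM students").fetchall()
print(rows)  # [(1, 'Esi', 200)]
```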
Purposes of database transactions
• To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
• To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes may be erroneous.
What is the role of data validation?
A check on input data to ensure that it is reasonable.
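A minimal sketch of such a reasonableness check, assuming the input is an age typed in as text. The function name `validate_age` and the 0 to 120 range are hypothetical choices. Real systems usually combine several checks (type, range, format, presence).

```python
def validate_age(raw):
    """Return (is_valid, message) for a raw age value entered by a user."""
    # Type check: the input must be convertible to a whole number.
    try:
        age = int(raw)
    except (TypeError, ValueError):
        return False, "age must be a whole number"
    # Range check: reject values that are not plausible for a person.
    if not 0 <= age <= 120:
        return False, "age must be between 0 and 120"
    return True, ""


print(validate_age("34"))
print(validate_age("abc"))
print(validate_age("-5"))
```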