Entity Relationship Diagram (ERD)
A diagram that depicts an entity relationship model's entities, attributes, and relationships.
Normalisation
The process of minimising redundancy in a relation or set of relations. Redundancy in a relation may cause insertion, deletion and update anomalies.
Relation
A table in a relational database.
What are the three stages in normalisation?
First Normal Form (1NF), Second Normal Form (2NF) and Third Normal Form (3NF). The stages are sequential: a relation cannot reach 2NF without first being in 1NF.
First Normal Form (1NF)
A table is in first normal form if it contains no repeating attributes or groups of attributes. All attributes must be atomic - a single attribute cannot consist of two data items such as firstname and surname. This would make it difficult or impossible to sort on surname.
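A minimal SQL sketch of the idea (table and column names are hypothetical, not from the source): the first design breaks 1NF with a non-atomic name and a repeating phone group; the second makes every attribute atomic.

    -- Hypothetical unnormalised design: 'name' holds two data items and
    -- phone1/phone2 form a repeating group.
    CREATE TABLE student_unf (
        student_id INT PRIMARY KEY,
        name       VARCHAR(100),  -- e.g. 'Ada Lovelace': cannot sort on surname
        phone1     VARCHAR(20),
        phone2     VARCHAR(20)
    );

    -- 1NF: atomic attributes; the repeating group moves to its own relation.
    CREATE TABLE student (
        student_id INT PRIMARY KEY,
        firstname  VARCHAR(50),
        surname    VARCHAR(50)
    );

    CREATE TABLE student_phone (
        student_id INT REFERENCES student(student_id),
        phone      VARCHAR(20),
        PRIMARY KEY (student_id, phone)
    );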
Second Normal Form (2NF)
A table is in second normal form (2NF) if it is in first normal form and contains no partial dependencies. It is a table that is in First Normal Form with every non-primary-key attribute fully functionally dependent on the primary key. A relation with a single-attribute (column) primary key is automatically in at least 2NF.
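A hedged SQL sketch of removing a partial dependency (hypothetical names): course_title depends only on course_id, one part of the composite key, so it is moved to its own relation.

    -- Not in 2NF: course_title is partially dependent on the key
    -- (student_id, course_id) because it depends on course_id alone.
    CREATE TABLE enrolment_1nf (
        student_id   INT,
        course_id    INT,
        course_title VARCHAR(100),
        grade        CHAR(2),
        PRIMARY KEY (student_id, course_id)
    );

    -- 2NF decomposition: every non-key attribute now depends on the whole key.
    CREATE TABLE course (
        course_id    INT PRIMARY KEY,
        course_title VARCHAR(100)
    );

    CREATE TABLE enrolment (
        student_id INT,
        course_id  INT REFERENCES course(course_id),
        grade      CHAR(2),
        PRIMARY KEY (student_id, course_id)
    );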
Third Normal Form (3NF)
A relation that is in Second Normal Form and in which no non-primary-key attribute is transitively dependent on the primary key.
Transitive Dependency
A transitive dependency occurs when one attribute depends on a second attribute, which depends on a third attribute. Deletions in a table with such a dependency can cause unwanted information loss.
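A short SQL illustration (hypothetical names): dept_name depends on dept_id, which in turn depends on emp_id, so 3NF moves it into its own relation; deleting the last employee of a department then no longer loses the department's name.

    -- Transitive dependency: emp_id -> dept_id -> dept_name.
    CREATE TABLE employee_2nf (
        emp_id    INT PRIMARY KEY,
        emp_name  VARCHAR(100),
        dept_id   INT,
        dept_name VARCHAR(100)
    );

    -- 3NF decomposition removes the transitive dependency.
    CREATE TABLE department (
        dept_id   INT PRIMARY KEY,
        dept_name VARCHAR(100)
    );

    CREATE TABLE employee (
        emp_id   INT PRIMARY KEY,
        emp_name VARCHAR(100),
        dept_id  INT REFERENCES department(dept_id)
    );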
Structured Query Language (SQL)
A declarative language used for querying and updating tables in a relational database.
SELECT
The SELECT statement is used to extract fields from one or more tables.
Syntax of SELECT
SELECT fields FROM table WHERE condition ORDER BY field
ORDER BY
The results of an SQL query can be sorted in ascending or descending sequence. ASC: ascending; DESC: descending. If neither is specified, ascending order is assumed.
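For example, against a hypothetical customer table:

    -- Return London customers, newest first; drop DESC (or write ASC)
    -- for ascending order, which is the default.
    SELECT firstname, surname, joined_date
    FROM customer
    WHERE city = 'London'
    ORDER BY joined_date DESC;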
Wild Card
The asterisk (*) selects all fields in a table.
Other SQL Statements
BETWEEN: selects values within an inclusive range. IN: specifies multiple possible values for a column. LIKE: searches for values with matching characteristics (patterns). AND/OR/NOT: combine or negate conditions.
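For example, on a hypothetical product table:

    -- The * wildcard selects every field.
    SELECT * FROM product;

    -- BETWEEN (inclusive range), IN (set of values) and LIKE (pattern),
    -- combined with AND / OR / NOT.
    SELECT *
    FROM product
    WHERE price BETWEEN 10 AND 50
      AND category IN ('Books', 'Games')
      AND NOT (name LIKE 'Used%');  -- % matches any sequence of characters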
Semicolon
Some database systems require a semicolon at the end of each SQL statement.
What is the role of database administrators?
Responsible for accurately and securely using, maintaining and developing computerised databases within a wide range of public and private sector organisations.
Why are database recovery techniques important?
Databases are prone to failures due to inconsistency, network failure, errors or accidental damage, so database recovery techniques are highly important for bringing a database back into a working state after a failure.
What are four different database recovery techniques?
Mirroring; Recovery using backups; Recovery using transaction logs; Shadow paging
Mirroring
Two complete copies of the database are maintained online on different stable storage devices.
When is mirroring used?
Mostly used in environments that require non-stop fault-tolerant operations.
What are the two types of backups?
Immediate backup; Archival backup
Immediate Backup
Kept on a floppy disk, hard disk or magnetic tape. These come in handy when a technical fault occurs in the primary database, such as a system failure, disk crash or network failure. Damage due to virus attacks is repaired using the immediate backup.
Archival Backup
Kept on mass storage devices such as magnetic tape, CD-ROMs, Internet servers, etc. They are very useful for recovering data after a disaster such as a fire, earthquake or flood.
Where should archival backups be kept and why?
Archival backups should be kept at a different site from where the system is functioning. An archival backup kept at a separate place remains safe from theft and intentional destruction by users or staff.
Recovering data using transaction logs
Step 1: The log is searched for all transactions that have recorded a [start_transaction, T] entry but no corresponding [commit, T] entry. Step 2: These transactions are rolled back, undoing their effects on the database. Step 3: Changes made by transactions that have recorded a [commit, T] entry in the log are redone, so their effects are restored in the database.
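A sketch of how an explicit SQL transaction lines up with these log entries (table names hypothetical; the exact log format is DBMS-specific):

    BEGIN TRANSACTION;  -- the DBMS logs [start_transaction, T] here

    UPDATE account SET balance = balance - 100 WHERE account_id = 1;
    UPDATE account SET balance = balance + 100 WHERE account_id = 2;

    COMMIT;  -- [commit, T] is logged; after a crash this transaction is redone
    -- Had the system failed before COMMIT, recovery would find no
    -- [commit, T] entry and roll both UPDATEs back.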
Recovering data using shadow paging
In shadow paging, the database is divided into a number of fixed-size disk pages, say n, and a 'current directory' with n entries is created, each entry pointing to one disk page of the database. The current directory is transferred to main memory. When a transaction begins executing, the current directory is copied into a 'shadow directory', which is then saved on disk. During transaction execution, all modifications are made via the current directory; the shadow directory is never modified.
Integrated database
The term integrated database is used to describe two different database structures: connection of multiple databases, or a database built into another application tool.
Connecting multiple databases into an integrated database
Connections between the different databases must be made and tested from a variety of perspectives to ensure that the logic is sound. The different databases can also exist in different formats or programs. This adds to the complexity but may be achieved using advanced programming logic and powerful resources.
Databases in stock control
The database holds details and quantities of the products you stock, together with information on product suppliers.
Using databases in real life
Stock control; Police records; Health records; Employee data
Data Discovery
Involves discovering which data sets exist in the organisation, which of them are business-critical, and which contain sensitive data that might be subject to compliance regulations.
Data Loss Prevention (DLP)
A set of strategies and tools that can be used to prevent data from being stolen, lost, or accidentally deleted.
Snapshots
Preserves the entire state and data of the system at the time it is taken. Similar to a backup, but is a complete image of a protected system, including data and system files. Can be used to restore an entire system to a specific point in time.
Replication
A technique for copying data on an ongoing basis from a protected system to another location. This provides a living up-to-date copy of the data, allowing not only recovery but also immediate failover to the copy if the primary system goes down.
Firewall
Utilities that enable you to monitor and filter network traffic. Can be used to ensure that only authorised users are allowed to access or transfer data.
Authentication and Authorisation
Controls that help verify credentials and assure that user privileges are applied correctly. These measures are typically used as part of an identity and access management (IAM) solution and in combination with role-based access controls (RBAC).
Encryption
Process of converting readable data into unreadable characters to prevent unauthorised access.
Endpoint Protection
Protects gateways to your network, including ports, routers, and connected devices. Endpoint protection software typically enables you to monitor your network perimeter and filter traffic as needed.
Data Erasure
Limits liability by deleting data that is no longer needed. This can be done after data is processed and analysed or periodically when data is no longer relevant.
Disaster Recovery
A set of practices and technologies that determine how an organisation deals with a disaster, such as a cyber attack, natural disaster, or large-scale equipment failure. Typically involves setting up a remote disaster recovery site with copies of protected systems, and switching operations to those systems in case of disaster.
Data Protection Act 2018
Controls how your personal information is used by organisations, businesses or the government. It is the UK's implementation of the General Data Protection Regulation (GDPR).
The Data Protection Act 2018 states that information must be...
Used fairly, lawfully and transparently; used for specified, explicit purposes; used in a way that is adequate, relevant and limited to only what is necessary; accurate and, where necessary, kept up to date; kept for no longer than necessary; handled in a way that ensures appropriate security, including protection against unlawful or unauthorised processing, access, loss, destruction or damage.
The Computer Misuse Act (1990) makes the following illegal:
Unauthorised access to computer material, e.g. hacking; unauthorised access to computer materials with intent to commit a further crime, e.g. spreading viruses; unauthorised modification of data, e.g. electronic vandalism; making, supplying or obtaining anything which can be used in computer misuse offences.
Data Matching
The process of comparing two different sets of data and matching them against each other.
What is the purpose of data matching?
The purpose of the process is to find the data that refers to the same entity. Often the data comes from two or more different data sets that have no common identifiers, but data matching is also useful for detecting duplicate data within a single database.
What are the two ways of linking data?
Deterministic record linkage, based on several matching identifiers. Probabilistic record linkage, based on the probability of several identifiers matching.
What is the most common way of linking data?
The most common is probabilistic record linkage, as deterministic linking tends to be too inflexible.
The process of probabilistic data matching
Standardise the data; pick attributes unlikely to change; sort the data into blocks; match via probabilities; assign a value to matches; summarise to get a total weight.
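A simplified SQL sketch of the blocking-and-weighting idea over two hypothetical person tables; the attribute weights and acceptance threshold are illustrative, not standard values:

    SELECT *
    FROM (
        SELECT a.person_id AS id_a,
               b.person_id AS id_b,
               CASE WHEN a.surname    = b.surname    THEN 4 ELSE 0 END
             + CASE WHEN a.birth_date = b.birth_date THEN 5 ELSE 0 END
             + CASE WHEN a.firstname  = b.firstname  THEN 2 ELSE 0 END
               AS match_weight
        FROM source_a a
        JOIN source_b b ON a.postcode = b.postcode  -- blocking step
    ) scored
    WHERE match_weight >= 7;  -- summarised weight above the match threshold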
Data Mining
The process of analysing data to extract information not offered by the raw data alone. By using software to look for patterns in large batches of data, businesses can learn more about their customers to develop more effective marketing strategies, increase sales and decrease costs.
What does data mining depend on?
Effective data collection, warehousing, and computer processing.
What is an ethical concern of data mining?
Users are often unaware of the data mining happening with their personal information, especially when it is used to influence preferences.
Object-oriented Database
A database that operates by storing data in objects rather than in tables.
Network Database
A type of database model wherein multiple member records or files can be linked to multiple owner files and vice versa.
Spatial Database
A database that is enhanced to store and access spatial data, or data that defines a geometric space. These data are often associated with geographic locations and features, or constructed features like cities. Data in spatial databases are stored as coordinates, points, lines, polygons and topology.
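For example, assuming a PostGIS-style spatial extension and hypothetical tables:

    -- Parks whose point geometry falls inside the city's boundary polygon.
    SELECT p.name
    FROM park p
    JOIN city c ON c.name = 'Cardiff'
    WHERE ST_Contains(c.boundary, p.location);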
Multi-dimensional Database
Created from multiple relational databases. While relational databases allow users to access data in the form of queries, multidimensional databases allow users to ask analytical questions related to business or market trends.
Where are object-oriented databases commonly used?
Applications that require high performance, complex calculations and faster results. Some common applications that use object databases are real-time systems, architectural and engineering 3D modelling, telecommunications, and scientific products.
Advantages of object-oriented databases
Provide persistent storage for objects; faster data access and better performance than an RDBMS.
Disadvantages of object-oriented databases
Object databases are not as popular as RDBMS. It is difficult to find object DB developers. Not many programming languages support object databases. RDBMS have SQL as a standard query language. Object databases do not have a standard. Object databases are difficult to learn for non-programmers.
Data Warehouse
An enterprise system used for the analysis and reporting of structured and semi-structured data from multiple sources, such as point-of-sale transactions, marketing automation, customer relationship management, and more. A data warehouse can store both current and historical data in one place and is designed to give a long-range view of data over time, making it a primary component of business intelligence.
Advantages of a data warehouse
Consolidated data from many sources; historical data analysis; data quality, consistency and accuracy; separation of analytics processing from transactional databases, which improves the performance of both systems.
Data Lake
A storage repository that holds a vast amount of raw data in its original format until the business needs it.
Schema
An abstract design that represents the storage of data in a database.
Data Mart
A data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business.
Why is data warehousing time dependent?
The data stored in the data warehouse is to support decision making - usually to make predictions and projections. The availability of new data can invalidate previous decisions. To combat this it is important that the data in a data warehouse is constantly updated to reflect the most recent data values.
How is data in a warehouse updated in real time?
Data is refreshed from data in operational systems
ETL
A process in data warehousing; it stands for Extract, Transform and Load. An ETL tool extracts the data from various data source systems, transforms it in the staging area, and then finally loads it into the data warehouse system.
Extract in ETL
Data from various source systems is extracted, and can be in various formats. It is important to extract the data into the staging area first, not directly into the data warehouse, because the extracted data is in various formats and can be corrupted. Loading it directly into the data warehouse may damage it, and rollback would be much more difficult.
Transformation in ETL
A set of rules or functions is applied to the extracted data to convert it into a single standard format. It may involve the following processes/tasks: Filtering - loading only certain attributes into the data warehouse. Cleaning - filling NULL values with default values, mapping U.S.A, United States and America to USA, etc. Joining - joining multiple attributes into one. Splitting - splitting a single attribute into multiple attributes. Sorting - sorting tuples on the basis of some attribute (generally a key attribute).
Loading in ETL
The transformed data is loaded into the data warehouse.
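A compact SQL sketch of the three ETL steps, with hypothetical source, staging and warehouse tables:

    -- Extract: copy raw rows from a source system into the staging area.
    INSERT INTO staging_customer (customer_id, full_name, country)
    SELECT id, name, country FROM source_customer;

    -- Transform: clean in the staging area (map country spellings, fill NULLs).
    UPDATE staging_customer
    SET country = 'USA'
    WHERE country IN ('U.S.A', 'United States', 'America');

    UPDATE staging_customer
    SET country = 'Unknown'
    WHERE country IS NULL;

    -- Load: move the standardised rows into the warehouse table.
    INSERT INTO dim_customer (customer_id, full_name, country)
    SELECT customer_id, full_name, country FROM staging_customer;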
Different types of patterns in data
Periodic patterns; Associative patterns; Abnormal patterns; Structural patterns; Chaotic patterns
Periodic Patterns
Patterns that are seen repeating themselves after a certain lapse of time.
Associative Patterns
These are like Bread and Butter/Knife and Fork, i.e. co-occurring groups of things that make more sense with each other. The members of this pattern are complementary to each other.
Abnormal Patterns
Occur when the data shows a clear deviation from normal behaviour: an unexpected pattern appears among expected patterns, and its appearance is not periodic.
Structural Patterns
Patterns such as pathfinding in graphs or cluster identification. An example would be that low-cost residences tend to occur in the suburbs, whereas downtown has more expensive apartments.
Chaotic Patterns
Where the patterns appear but have no definitive characteristic related to time/space/frequency.
Advantages of data mining
It helps companies gather reliable information; it is an efficient, cost-effective solution compared to other data applications; it uses both new and legacy systems; it helps businesses make informed decisions; it helps data scientists analyse enormous amounts of data quickly; the information can be used to detect fraud, build risk models and improve product safety; it helps data scientists quickly initiate automated predictions of behaviours and trends and discover hidden patterns.
Types of predictive modelling
Classification model; Clustering model; Forecast model; Outliers model; Time series model
Classification model
Considered the simplest model, it categorizes data for simple and direct query response. An example use case would be to answer the question "Is this a fraudulent transaction?"
Clustering model
This model nests data together by common attributes. It works by grouping things or people with shared characteristics or behaviors and plans strategies for each group at a larger scale. An example is in determining credit risk for a loan applicant based on what other people in the same or a similar situation did in the past.
Forecast model
Works on anything with a numerical value based on learning from historical data. For example, in answering how much lettuce a restaurant should order next week or how many calls a customer support agent should be able to handle per day or week, the system looks back to historical data.
Outliers model
This model works by analyzing abnormal or outlying data points. For example, a bank might use an outlier model to identify fraud by asking whether a transaction is outside of the customer's normal buying habits or whether an expense in a given category is normal or not.
Time series model
This model evaluates a sequence of data points based on time. For example, the number of stroke patients admitted to the hospital in the last four months is used to predict how many patients the hospital might expect to admit next week, next month or the rest of the year.
Data Segmentation
The process of taking the data you hold and dividing it up and grouping similar data together based on the chosen parameters so that you can use it more efficiently within marketing and operations.
Advantages of data segmentation
Through customer segmentation you will be able to create messaging that is tailored and sophisticated to suit your target market, appealing to their needs better. It allows you to more easily conduct an analysis of the data stored in your database, helping to identify potential opportunities and challenges within it. It enables you to mass-personalise your marketing communications, reducing costs.
Link Analysis
Link analysis is a data analysis technique from network theory that evaluates the relationships or connections between network nodes. These relationships can be between various types of objects (nodes), including people, organisations and even transactions. Link analysis is often used in search engine optimisation as well as in intelligence, security analysis, and market and medical research.
What are the three primary purposes of link analysis?
Find matches for known patterns of interests between linked objects. Find anomalies by detecting violated known patterns. Find new patterns of interest (for example, in social networking and marketing and business intelligence).
Deviation Detection
Data mining technique that identifies outliers, which express deviation from some previously known expectation and norm.
Logic
The system of operations performed by a computer that underlies the machine's representation of logical operations
Cascading Delete
Occurs when associated child rows are deleted along with the deletion of a parent row.
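For example (hypothetical tables):

    CREATE TABLE orders (
        order_id INT PRIMARY KEY
    );

    CREATE TABLE order_line (
        order_id INT REFERENCES orders(order_id) ON DELETE CASCADE,
        line_no  INT,
        product  VARCHAR(100),
        PRIMARY KEY (order_id, line_no)
    );

    -- Deleting the parent row also deletes its child rows in order_line.
    DELETE FROM orders WHERE order_id = 42;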
Database Transaction
A database transaction is a process carried out on a database, which may change its state, for example: moving money between bank accounts
The importance of durability in a database transaction
Durability in databases is important because it ensures transactions are saved permanently, and do not accidentally disappear or get erased
Different types of relationships within databases
One-to-one; one-to-many; many-to-one; many-to-many
Data dictionary
A database that provides metadata about the database itself. A data dictionary contains information such as how much space has been allocated for and is currently used by the database, and default values for columns/attributes. It does not contain any actual data from the database, only information for managing it.
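Many relational systems expose the data dictionary through queryable views, for example the SQL-standard INFORMATION_SCHEMA (vendor-specific names vary):

    -- Metadata only: column names, types and defaults, never row data.
    SELECT column_name, data_type, column_default
    FROM information_schema.columns
    WHERE table_name = 'customer';  -- 'customer' is a hypothetical table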
Responsibilities of a database administrator
Installing and configuring software; creating new databases; ensuring database security is implemented to safeguard the data; backing up and recovering the database.
Difference between an information system and a database
An information system is the collection of software, hardware, networking infrastructure, human resources and databases that provide the storage, processing and communication of information. A database is a subset of the information system.
Atomicity (ACID)
Ensures that either all changes to data are performed as if they are a single operation or none of them are performed.
Consistency (ACID)
Ensures that data is in a consistent state when a transaction begins and when it ends. Ensures that only valid data is written in the database.
Isolation (ACID)
Ensures that the intermediate state of a transaction is invisible to other transactions
Durability (ACID)
Ensures that after a transaction successfully completes, changes to data persist and are not undone even in the event of a system failure
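A short SQL sketch tying atomicity and durability together (hypothetical tables):

    BEGIN TRANSACTION;

    UPDATE stock SET quantity = quantity - 5 WHERE product_id = 7;
    INSERT INTO order_item (order_id, product_id, qty) VALUES (42, 7, 5);

    -- If either statement fails, ROLLBACK undoes the whole unit (atomicity);
    -- on success, COMMIT makes the changes permanent even through a
    -- subsequent crash (durability).
    COMMIT;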