Databases

studied byStudied by 8 people
5.0(1)
Get a hint
Hint

Entity Relationship Diagram (ERD)

1 / 98

encourage image

There's no tags or description

Looks like no one added any tags here yet for you.

99 Terms

1

Entity Relationship Diagram (ERD)

A diagram that depicts an entity relationship model's entities, attributes, and relations.

New cards
2

Normalisation

The process of minimising redundancy from a relation or set of relations. Redundancy in a relation may cause insertion, deletion and update anomalies.

New cards
3

Relation

table in a relational database

New cards
4

What are the three stages in normalisation?

First Normal Form (1NF), Second Normal Form (2NF) and Third Normal Form (3NF) (This is a direct process, cannot go to 2NF without 1NF)

New cards
5

First Normal Form (1NF)

A table is in first normal form if it contains no repeating attributes or groups of attributes. All attributes must be atomic - a single attribute cannot consist of two data items such as firstname and surname. This would make it difficult or impossible to sort on surname.

New cards
6

Second Normal Form (2NF)

A table is in second normal form (2NF) if it is in first normal form and contains no partial dependencies. It is a table that is in First Normal Form with every non-primary-key attribute fully functionally dependent on the primary key. A relation with a single-attribute (column) primary key is automatically in at least 2NF.

New cards
7

Third Normal Form (3NF)

A relation that is in Second Normal Form and in which no non-primary-key attribute is transitively dependent on the primary key.

New cards
8

Transitive Dependency

A transitive dependency occurs when one attribute depends on a second attribute, which depends on a third attribute. Deletions in a table with such a dependency can cause unwanted information loss.

New cards
9

Structured Query Language (SQL)

A declarative language used for querying and updating tables in a relational database.

New cards
10

SELECT

The SELECT statement is used to extract fields from one or more tables.

New cards
11

Syntax of SELECT

SELECT FROM WHERE ORDER BY

New cards
12

ORDER BY

The results of an SQL query can be sorted in ascending or descending sequence. ASC: ascending, DESC: descending Ascending order is assumed.

New cards
13

Wild Card

  • : (selects from) all fields

New cards
14

Other SQL Statements

BETWEEN: between an inclusive range IN: specify multiple possible values for a column LIKE: with matching characteristics AND/OR/NOT

New cards
15

Semicolon

Some database systems require a semicolon at the end of each SQL statement.

New cards
16

What is the role of database administrators?

Responsible for accurately and securely using, maintaining and developing computerised databases within a wide range of public and private sector organisations.

New cards
17

Why are database recovery techniques important?

Databases are prone to failures due to inconsistency, network failure, errors or any kind of accidental damage. So database recovery techniques are highly important to bring a database back into a working state after a failure.

New cards
18

What are four different database recovery techniques?

Mirroring Recovery using Backups Recovery using Transaction logs Shadow paging

New cards
19

Mirroring

Two complete copies of the database are maintained online on different stable storage devices.

New cards
20

When is mirroring used?

Mostly used in environments that require non-stop fault-tolerant operations.

New cards
21

What are the two types of backups?

Immediate backup Archival backup

New cards
22

Immediate Backup

Kept in a floppy disk, hard disk or magnetic tapes. These come in handy when a technical fault occurs in the primary database such as a system failure, disk crash, or network failure. Damage due to virus attacks are repaired using the immediate backup.

New cards
23

Archival Backup

Kept in mass storage devices such as magnetic tape, CD-ROMs, Internet Servers etc. They are very useful for recovering data after a disaster such as fire, earthquake, flood etc.

New cards
24

Where should archival backups be kept and why?

Archival Backup should be kept at a different site other than where the system is functioning. Archival Backup at a separate place remains safe from thefts and intentional destruction by users or staff.

New cards
25

Recovering data using transaction logs

Step 1: The log searches for all the transactions that have recorded a [start transaction, ''] entry, but haven't recorded a corresponding [commit, ''] entry.. Step 2: These transactions are rolled back. Step 3: Changes to the database which are carried out by transactions that have recorded a [commit, ''] entry in the log, are recorded. These changes will follow to undo their effects on the database.

New cards
26

Recovering data using shadow paging

In shadow paging, a database is divided into several fixed-sized disk pages, say n, which is created in a 'current directory'. It has n entries with each entry pointing to a disk page in the database. The current directory is transferred to the main memory. When a transaction begins executing, the current directory copies into a shadow directory. The shadow directory saves on the disk. During the transaction execution, all the modifications are made on the current directory and the shadow directory is never modified.

New cards
27

Integrated database

The term integrated database is used to describe two different database structures: connection of multiple databases, or a database built into another application tool.

New cards
28

Connecting multiple databases into an integrated database

Connections between the different databases must be made and tested from a variety of perspectives to ensure that the logic is sound. The different databases can also exist in different formats or programs. This adds to the complexity but may be achieved using advanced programming logic and powerful resources.

New cards
29

Databases in stock control

The database holds details and quantities of the products you stock, together with information on product suppliers.

New cards
30

Using databases in real life

Stock Control Police Records Health Records Employee Data

New cards
31

Data Discovery

Involves discovering which data sets exist in the organisation, which of them are business critical and which contains sensitive data that might be subject to compliance regulations.

New cards
32

Data Loss Prevention (DLP)

A set of strategies and tools that can be used to prevent data from being stolen, lost, or accidentally deleted.

New cards
33

Snapshots

Preserves the entire state and data of the system at the time it is taken. Similar to a backup, but is a complete image of a protected system, including data and system files. Can be used to restore an entire system to a specific point in time.

New cards
34

Replication

A technique for copying data on an ongoing basis from a protected system to another location. This provides a living up-to-date copy of the data, allowing not only recovery but also immediate failover to the copy if the primary system goes down.

New cards
35

Firewall

Utilities that enable you to monitor and filter network traffic Can be used to ensure that only authorised users are allowed to access or transfer data.

New cards
36

Authentication and Authorisation

Controls that help verify credentials and assure that user privileges are applied correctly. These measures are typically used as part of an identity and access management (IAM) solution and in combination with role-based access controls (RBAC).

New cards
37

Encryption

Process of converting readable data into unreadable characters to prevent unauthorised access.

New cards
38

Endpoint Protection

Protects gateways to your network, including ports, routers, and connected devices. Endpoint protection software typically enables you to monitor your network perimeter and filter traffic as needed.

New cards
39

Data Erasure

Limits liability by deleting data that is no longer needed. This can be done after data is processed and analysed or periodically when data is no longer relevant.

New cards
40

Disaster Recovery

A set of practices and technologies that determine how an organisation deals with a disaster, such as a cyber attack, natural disaster, or large-scale equipment failure. Typicially involves setting up a remote disaster recovery site with copies of protected systems, and switching operations to those systems in case of disaster.

New cards
41

Data Protection Act 2018

Controls how your personal information is used by organisations, businesses or the government. It is the UK's implementation of the General Data Protection Regulation (GDPR).

New cards
42

The Data Protection Act 2018 states that information must be...

Used fairly, lawfully and transparently Used for specified, explicit purposes Used in a way that is adequate, relevant and limited to only what is necessary Accurate and, where necessary, kept up to date Kept for no longer than necessary Handled in a way that ensures appropriate security, including protection against unlawful or unauthorised processing, access, loss, destruction or damage

New cards
43

The Computer Misuse Act (1990) makes the following illegal:

Unauthorised access to computer material e.g. hacking Unauthorised access to computer materials with intent to commit a further crime e.g. viruses Unauthorised modification of data e.g. electronic vandalism Making, supplying or obtaining anything which can be used in computer misuse offences

New cards
44

Data Matching

The process of comparing two different sets of data and matching them against each other.

New cards
45

What is the purpose of data matching?

The purpose of the process is to find the data that refers to the same entity. Many times the data comes from two or more different sets of data and have no common identifiers. But data matching is also useful to detect duplicate data within a database.

New cards
46

What are the two ways of linking data?

Deterministic record linkage, based on several matching identifiers. Probabilistic record linkage, based on the probability of several identifiers matching.

New cards
47

What is the most common way of linking data?

The most common is probabilistic record linking as deterministic linking tends to be too inflexible.

New cards
48

The process of probabilistic data matching

Standardise data Pick attributes unlikely to change Sort data into blocks Match via probabilities Assign value to matches Summarise to get total weight

New cards
49

Data Mining

The process of analysing data to extract information not offered by the raw data alone. By using software to look for patterns in large batches of data, businesses can learn more about their customers to develop more effective marketing strategies, increase sales and decrease costs.

New cards
50

What does data mining depend on?

Effective data collection, warehousing, and computer processing.

New cards
51

What is an ethical concern of data mining?

Users are often unaware of the data mining happening with their personal information, especially when it is used to influence preferences.

New cards
52

Object-oriented Database

A database that operates by storing data in objects rather than in tables.

New cards
53

Network Database

A type of database model wherein multiple member records or files can be linked to multiple owner files and vice versa.

New cards
54

Spatial Database

A database that is enhanced to store and access spatial data or data that defines a geometric space. These data are often associated with geographic locations and features, or constructed features like cities. Data on spatial databases are stored as coordinates, points, lines, polygons and topology.

New cards
55

Multi-dimensional Database

Created from multiple relational databases. While relational databases allow users to access data in the form of queries, multidimensional databases allow users to ask analytical questions related to business or market trends.

New cards
56

Where are object-oriented databases commonly used?

Applications that require high performance, calculations, and faster results. Some of the common applications that use object databases are real-time systems, architectural & engineering for 3D modeling, telecommunications, and scientific products.

New cards
57

Advantages of object-oriented databases

Provide persistent storage to objects Faster data access and better performance than a RDMS

New cards
58

Disadvantages of object-oriented databases

Object databases are not as popular as RDBMS. It is difficult to find object DB developers. Not many programming languages support object databases. RDBMS have SQL as a standard query language. Object databases do not have a standard. Object databases are difficult to learn for non-programmers.

New cards
59

Data Warehouse

An enterprise system used for the analysis and reporting of structured and semi-structured data from multiple sources, such as point-of-sale transactions, marketing automation, customer relationship management, and more. A data warehouse can store both current and historical data in one place and is designed to give a long-range view of data over time, making it a primary component of business intelligence.

New cards
60

Advantages of a data warehouse

Consolidated data from many sources Historical data analysis Data quality, consistency, and accuracy Separation of analytics processing from transactional databases, which improves performance of both systems

New cards
61

Data Lake

A storage repository that holds a vast amount of raw data in its original format until the business needs it.

New cards
62

Schema

An abstract design that represents the storage of data in a database.

New cards
63

Data Mart

A data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business.

New cards
64

Why is data warehousing time dependent?

The data stored in the data warehouse is to support decision making - usually to make predictions and projections. The availability of new data can invalidate previous decisions. To combat this it is important that the data in a data warehouse is constantly updated to reflect the most recent data values.

New cards
65

How is data in a warehouse is updated in real time?

Data is refreshed from data in operational systems

New cards
66

ETL

A process in Data Warehousing and it stands for Extract, Transform and Load. It is a process in which an ETL tool extracts the data from various data source systems, transforms it in the staging area, and then finally, loads it into the Data Warehouse system.

New cards
67

Extract in ETL

Data from various source systems is extracted which can be in various formats. It is important to extract the data from various source systems and store it into the staging area first and not directly into the data warehouse because the extracted data is in various formats and can be corrupted. Loading it directly into the data warehouse may damage it and rollback will be much more difficult.

New cards
68

Transformation in ETL

A set of rules or functions are applied on the extracted data to convert it into a single standard format. It may involve the following processes/tasks: Filtering - loading only certain attributes into the data warehouse. Cleaning - filling up the NULL values with some default values, mapping U.S.A, United States, and America into USA, etc. Joining - joining multiple attributes into one. Splitting - splitting a single attribute into multiple attributes. Sorting - sorting tuples on the basis of some attribute (generally key-attribute).

New cards
69

Loading in ETL

The transformed data is loaded into the data warehouse.

New cards
70

Different types of patterns in data

Periodic Patterns Associative Patterns Abnormal Patterns Structural Patterns Chaotic Patterns

New cards
71

Periodic Patterns

Patterns that are seen repeating themselves after a certain lapse of time.

New cards
72

Associative Patterns

These are like Bread and Butter/Knife and Fork, i.e. co-occurring groups of things that make more sense with each other. The members of this pattern are complementary to each other.

New cards
73

Abnormal Patterns

Occurs when the data has a clear deviation from normal behaviour, an unexpected pattern appears between expected patterns and its appearance is not periodic.

New cards
74

Structural Patterns

Like pathfinding in graphs or cluster identification. An example would be low-cost residences tend to occur in suburbs whereas downtown has higher costing apartments.

New cards
75

Chaotic Patterns

Where the patterns appear but have no definitive characteristic related to time/space/frequency.

New cards
76

Advantages of data mining

It helps companies gather reliable information It's an efficient, cost-effective solution compared to other data applications Data mining uses both new and legacy systems It helps businesses make informed decisions It helps data scientists easily analyze enormous amounts of data quickly Data scientists can use the information to detect fraud, build risk models, and improve product safety It helps data scientists quickly initiate automated predictions of behaviors and trends and discover hidden patterns

New cards
77

Types of predictive modelling

Classification model Clustering model Forecast model Outliers model Time series model

New cards
78

Classification model

Considered the simplest model, it categorizes data for simple and direct query response. An example use case would be to answer the question "Is this a fraudulent transaction?"

New cards
79

Clustering model

This model nests data together by common attributes. It works by grouping things or people with shared characteristics or behaviors and plans strategies for each group at a larger scale. An example is in determining credit risk for a loan applicant based on what other people in the same or a similar situation did in the past.

New cards
80

Forecast model

Works on anything with a numerical value based on learning from historical data. For example, in answering how much lettuce a restaurant should order next week or how many calls a customer support agent should be able to handle per day or week, the system looks back to historical data.

New cards
81

Outliers model

This model works by analyzing abnormal or outlying data points. For example, a bank might use an outlier model to identify fraud by asking whether a transaction is outside of the customer's normal buying habits or whether an expense in a given category is normal or not.

New cards
82

Time series model

This model evaluates a sequence of data points based on time. For example, the number of stroke patients admitted to the hospital in the last four months is used to predict how many patients the hospital might expect to admit next week, next month or the rest of the year.

New cards
83

Data Segmentation

The process of taking the data you hold and dividing it up and grouping similar data together based on the chosen parameters so that you can use it more efficiently within marketing and operations.

New cards
84

Advantages of data segmentation

Through customer segmentation you will be able create messaging that is tailored and sophisticated to suit your target market - appealing to their needs better. It allows you to easier conduct an analysis of your data stored in your database, helping to identify potential opportunities and challenges based within it. Enables you to mass-personalise your marketing communications, reducing costs

New cards
85

Link Analysis

Link analysis is a data analysis technique used in network theory that is used to evaluate the relationships or connections between network nodes. These relationships can be between various types of objects (nodes), including people, organizations and even transactions. Link analysis is often used in search engine optimization as well as in intelligence, in security analysis and in market and medical research.

New cards
86

What are the three primary purposes of link analysis?

Find matches for known patterns of interests between linked objects. Find anomalies by detecting violated known patterns. Find new patterns of interest (for example, in social networking and marketing and business intelligence).

New cards
87

Deviation Detection

Data mining technique that identifies outliers, which express deviation from some previously known expectation and norm.

New cards
88

Logic

The system of operations performed by a computer that underlies the machine's representation of logical operations

New cards
89

Cascading Delete

occurs when associated child rows are deleted along with the deletion of a parent row

New cards
90

Database Transaction

A database transaction is a process carried out on a database, which may change its state, for example: moving money between bank accounts

New cards
91

The importance of durability in a database transaction

Durability in databases is important because it ensures transactions are saved permanently, and do not accidentally disappear or get erased

New cards
92

Different types of relationships within databases

One to one; One to many; Many to one; Many to many;

New cards
93

Data dictionary

It is a database that provides meta-data about the database. A data dictionary contains how much space has been allocated for and is currently used, and default values for columns/attributes. They do not contain any actual data from the database, only information for managing it

New cards
94

Responsibilities of a database administrator

Installing and configuring software, Creating new databases, Ensure database security is implemented to safeguard the data, Back up and recover the database

New cards
95

Difference between an information system and a database

An information system is the collection of software, hardware, networking infrastructure, human resources and databases that provide the storage, processing and communication of information. A database is a subset of the information system.

New cards
96

Atomicity (ACID)

Ensures that either all changes to data are performed as if they are a single operation or none of them are performed.

New cards
97

Consistency (ACID)

Ensures that data is in a consistent state when a transaction begins and when it ends. Ensures that only valid data is written in the database.

New cards
98

Isolation (ACID)

Ensures that the intermediate state of a transaction is invisible to other transactions

New cards
99

Durability (ACID)

Ensures that after a transaction successfully completes, changes to data persist and are not undone even in the event of a system failure

New cards

Explore top notes

note Note
studied byStudied by 12 people
... ago
5.0(2)
note Note
studied byStudied by 13 people
... ago
5.0(1)
note Note
studied byStudied by 17 people
... ago
5.0(1)
note Note
studied byStudied by 5 people
... ago
5.0(1)
note Note
studied byStudied by 25 people
... ago
4.0(1)
note Note
studied byStudied by 54 people
... ago
5.0(3)
note Note
studied byStudied by 206 people
... ago
5.0(3)
note Note
studied byStudied by 2 people
... ago
5.0(1)

Explore top flashcards

flashcards Flashcard (50)
studied byStudied by 5 people
... ago
5.0(1)
flashcards Flashcard (103)
studied byStudied by 46 people
... ago
5.0(1)
flashcards Flashcard (41)
studied byStudied by 1 person
... ago
5.0(1)
flashcards Flashcard (60)
studied byStudied by 2 people
... ago
5.0(1)
flashcards Flashcard (20)
studied byStudied by 67 people
... ago
5.0(1)
flashcards Flashcard (38)
studied byStudied by 12 people
... ago
5.0(1)
flashcards Flashcard (20)
studied byStudied by 9 people
... ago
4.0(1)
flashcards Flashcard (30)
studied byStudied by 5 people
... ago
5.0(1)
robot