1.3.2 Databases

5.0(1)
studied byStudied by 9 people
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
Card Sorting

1/86

flashcard set

Earn XP

Description and Tags

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

87 Terms

1
New cards
File (table)
A collection of related records which contain the same type of information as all the other records
2
New cards
Record
__***All***__ of the data relating to one item or “thing”. 
3
New cards
Field
Stores one __***item***__ of data in a record
4
New cards
Flat file databases
Usually only has one table (it may have more than one table, but with no links between them)
5
New cards
Advantages of Flat file databases
Easy to set up and understand – no specialist knowledge needed
6
New cards
Disadvantages of Flat file databases
* May have redundant data


* Potential duplication of data


* Harder to update the data due to duplication


* Records are not unique


* Security is poor as if access is gained all data is readily available


* Harder to perform complex queries
7
New cards
Relational databases
This type of database has more than one table which are linked together by primary and foreign keys.
8
New cards
Primary key
Used to ***uniquely identify*** a record
9
New cards

Foreign key

A foreign key is a primary key in one table which is sued as an attribute in another table to provide a link between table.

A foreign key in one table is always linked to a key field in another table.

10
New cards

Compound key

A combination of fields that makes a record unique (but breaks 2 Normal Form)

11
New cards
Secondary key
Made on a field that you would like to be indexed for faster searches

They may not be unique

They are used for sorting or searching data
12
New cards
Advantages of relational data bases
* Data is only stored once – this avoids data duplication and means that data is only stored once.


* Data integrity is maintained


* Easy to modify the format


* Security is better as sections of the data can be protected individually using authorisation


* More “future proof” as it is easier to update and expand the basic structure than a flat file.


* More complex queries can be run over the data
13
New cards
Disadvantages of relational databases
They are difficult to set up
14
New cards
Data integrity
* The correctness of the data over the lifetime of that data.


* This means that the data is correct when it enters the database, as it is queried and updated up until the point it is removed from the system.


* Data must be complete, accurate and consistent.
15
New cards
Entity integrity
Every table must have a primary key which is unique and not empty
16
New cards
Referential integrity 
Foreign keys must point to the primary key of another table but can be empty (meaning that there is no relationship)
17
New cards
Domain integrity
All attributes in the database are related to the overall domain that the database is working on.
18
New cards
Data redundancy
This refers to the unnecessary duplication of data.
19
New cards
Data consistency
* When data is held in more than one file it should be stored in a consistent way.


* A date field could be stored as a text field in one table but as a date/time field in another, meaning the data would be incompatible.


* In a relational database because the attributes of any one entity are contained within one table, there is no risk of it being stored in a different format in a different file
20
New cards
Data independence
* This refers to the separation of the data from the programs that use it. 


* This means that users can view data without having to be aware of the basic structure of the database, and allows them to manipulate it in many different ways.


* User can also not make changes to the underlying structure of the database.
21
New cards
Flat file vs Relational data bases
Flat file

* May have redundant data, which increases the overall size of the database and reduces data integrity


* Harder to update the database design


* No specialist knowledge needed to set one up

Relational Databases

* Data integrity is maintained


* Easier to update the database design


* Provides security features e.g. individual tables can be protected


* Need specialist knowledge to set up and maintain
22
New cards
ERD
Entity relationship diagram
23
New cards
Attributes
Describes the facts, details or characteristics of an entity.

These are the fields within a table
24
New cards
Relationships
The links between entities.

They can be one to one, one to many, or many to many.
25
New cards
One to one relationship
One thing to one thing

e.g A Husband can only have one wife and a wife can only have one husband
26
New cards
One to Many relationship
One thing to many things

e.g A mother can have many children but a child can only have one mother.
27
New cards
Many to many relationships
Many things to many things

e.g A book can be loaned by many customers in a library and a customer can loan many books
28
New cards
How can a many to many relationship be changed to a one to many relationship?
This is usually achieved by adding another entity which acts as a link table and:

* Creates one to many relationships


* Makes the relationship in to 3NF


* Reduces data inconsistencies
29
New cards
What is the standard notation for describing entities and attributes?
* The entity is always written in capital letters


* The primary key is underlined


* Foreign keys are overlined
30
New cards

Normalisation

The process of refining the structure of a database to minimise redundancy and improve integrity

When a database has been normalised, it is said to be in normal form

Helps organise the data efficiency, eliminate redundant data, ensure that only related data are stored in a table

31
New cards
UNF
Un-normalised Form

Data that has not been normalised

It potentially has:

* repeated data
* inconsistent data
32
New cards

1NF - First Normal Form

A table is in this form if it contains no repeating attributes or groups of attributes, data is atomic, AND has a primary key

33
New cards
Atomic data
Information which cannot be logically broken down in to smaller parts.

E.g. Full name **is not** atomic as it can be split in to first name and surname

First name **is** atomic as it cannot be split any further without losing detail.
34
New cards
2NF - Second Normal Form
In 1NF **AND** it contains no partial key dependencies

If a table has a single attribute for a primary key it is usually in 2NF as all of the fields depend on that key. 

It is when there is a compound primary key (a key made up of 2 or more fields) that something needs to be changed.
35
New cards
3NF - Third Normal Form
A table is in this form if data items are dependent on the key, the whole key and nothing but the key

Many-many relationships are not allowed in this form as it causes duplication of data.
36
New cards
Partial key dependencies
Occurs when one primary key determines some other attribute/attributes
37
New cards
Serial Access
* The records are stored chronologically
* New records are appended onto the end of the file
* There is no way to find a particular record except by searching through all of the preceding records
38
New cards
When is Serial file organisation used?
* The order of the data doesn’t really matter
* The total number of records is small
* Data **MUST** be stored in the order it is received
39
New cards
Sequential Access
* Records are stored one after another
* Ordered in a logical sequence based on a key field (usually primary key)
* In order to find a record, the whole file may still need to be searched (like in serial), however, it tends to be faster and more efficient than serial
40
New cards
Name two types of random access methods
* Indexed file
* Hashed file
41
New cards
How do you delete records in a serial file?
* Deleting records in a serial file is complex
* The file is scanned until the record is found
* Everything is copied except for the file you want to delete, the copy then replaces the old list
42
New cards
How do you add/delete a record in a sequential file?
* All records are in order of the key field so you cannot just add a new record to the end of the file
* The file is copied up to the point of insertion, the data is inserted and then the remainder of the file is copied
* The file is copied up to the point of the data to be deleted, this record is skipped and then the remainder of the file is copied
43
New cards
Indexed Sequential File Organisation
* Records are sorted according to a primary key but a separate index is also kept.
* The index is an additional file that contains information about where records are stored and allows them to be accessed directly.
* This could mean that groups of related records are physically stored in groups
44
New cards
How do you add/delete records in an indexed sequential file?
* Every time a record is added or deleted, it can affect the start and end point of the indexed “chunks” and so all indexes from that point onwards need to be updated.
* The index needs to be re-created after each addition or deletion, which is performed as per standard sequential files.
45
New cards
Random (direct) Access
* A **hash algorithm** (a mathematical algorithm) is used on key fields to generate an  address for the file - this calculates a disk address/location of a record based on its key field.


* The algorithm **MUST** produce the same address every time a key is used so that the data doesn’t get lost!
* Sometimes several records are linked to the same location, which means you have to search sequentially through those records
46
New cards
Collision
When too simple a hashing algorithm is used, several records could hash to the same address.
47
New cards

RDBMS

Relational Database Management System

  • Used to maintain data integrity

    Uses SQL which allows you to crate, delete, and alter the database

  • Has a way of documenting all the internal structures that make up the database

  • The ability to view the database from different viewpoints according to the requirements of the user

  • Provide some level of security and access control to the data

48
New cards
Data dictionary
* Used by database managers
* Is a file containing descriptions of the data stored in tables in a database
* Provides metadata (information about data and how it is stored)
* Describes the structure and attributes of data 'items' and includes the names and descriptions of tables and the fields contained in each table, as well as information about the data type, field length etc.
* Used when altering the structure of the database
49
New cards
Field valuation
Presence checks, lookup, range checks, picture check which are applied to a field
50
New cards
Indexes
Any field that has been indexed to improve search speed
51
New cards
Access rights
Which users can change / edit / modify etc. data
52
New cards
Data Manipulation Language (DML)
* Provides a means of manipulating data within the database


* Includes operations such as CRUD (Create, Read, Update, Delete)
* Many systems allow the user to set a task by using ‘Query by Example’

\
Used to:

* access/query/sort/search data


* store/insert data


* change/update/delete data
53
New cards
Query by Example
* Users are presented with a graphical view of the tables and use a number of icons such as 'filter' to manipulate the data.
* With some systems you can drag fields graphically into appropriate areas on the screen to set up the query.
* Behind the scenes, the QBE tool is compiling and running the required DML commands.
* This means that users do not need to have a sophisticated understanding of DMLs  to use the database.
54
New cards

Data Definition Language (DDL)

  • Helps to provide a method of creating a database from scratch

  • Used to write schema (designs) which describes data items to be stored and the relationships between them.

  • Allows tables to be created and primary/foreign keys to be defined.

  • Also defines validation rules

55
New cards
Name three views of data
* Physical view
* Logical view
* User view
56
New cards
Physical view
* This view exposes all the internal details of how the data is stored on the physical computer
* For example the data structures used in memory, the nature of the indexes used for searching, what physical devices are used to store the database and so on.
* At this level the operating system is interacting with the database
57
New cards
Logical view
It shows:

* a model of all the information within the database
* The kind of records it holds
* The type of authorisation and validation methods used.

This is the kind of view the database designer is using to set out the whole database

The data dictionary is produced at this level.
58
New cards
External view
This is the view that sets out what an authorised user can see of the database.

It sets out the data they are allowed to see and their authority level to change the records.
59
New cards
Opening a file
myFile = openRead(“sample.txt”)

x = myFile.readLine()

myFile.close()
60
New cards
Opening a file and printing its contents
myFile = openRead(“sample.txt”)

while NOT myFile.endOfFile()

print(myFile.readLine())

endwhile

myFile.close()
61
New cards
Writing to a file
\
myFile = openWrite(“sample.txt”)

myFile.writeLine(“Hello World”)

myFile.close()
62
New cards
Indexing
* A method used to store the position of each record ordered by a certain attribute.
* This is used to look up and access data quickly.
* The primary key is automatically indexed; however, the primary key is almost never queried since it is not normally remembered.
* This is why secondary keys are used. Secondary keys are indexed to make the table easier and faster to search through on those particular attributes.
63
New cards
Name three ways of capturing data
* Optical Mark Recognition (OMR)
* Optical Character Recognition (OCR)
* Manually entering it
64
New cards

What is exchanging data?

The process of transferring the collected data to a database

65
New cards
Selecting and Managing data
* Selecting the correct data is an important part of data preprocessing
* This could involve only selecting data that fits a certain criteria to reduce the volume of input.
* Collected data can alternatively be managed using SQL to sort, restructure and select certain sections.
66
New cards

Things to know about SQL commands

  • SQL commands are in capital letters

  • Multiple fields are separated by commas

  • Alter and drop are to do with structure

  • Update and delete from are to do with data

67
New cards

How do you add in a new field?

ALTER TABLE tablename ADD columnname datatype

68
New cards

How do you remove an existing field?

ALTER TABLE tablename DROP COLUMN columnname

69
New cards

How do you create a table within a database? (e.g Clubs)

CREATE TABLE Clubs ( ClubID INT NOT NULL, ClubName VARCHAR(50) NOT NULL, PRIMARY KEY (ClubID))

70
New cards

How do you insert data into a database?

INSERT INTO databasename.tablename (field1, field2) VALUES ('1', 'The Pullman Club');

71
New cards

How do you change a record?

UPDATE databasename.tablename Fieldname = ‘data’ WHERE tablename.primarykeyname = “primarykey”

72
New cards

How do you extract a sub-set of the table records?

SELECT * FROM tablename WHERE primarykey>1

73
New cards

How do you return a single record?

SELECT * FROM tablename WHERE primarykey=2

74
New cards

How do you include more than one table in the search?

SELECT fieldname(s) FROM table1

JOIN table2

ON table1.primarykey = table2.foreignkey

75
New cards

How do you delete records?

DELETE FROM tablename WHERE primarykey = 1

76
New cards

Methods of exchanging data

Data can be transferred:

  • Automatically using the DBMS software

  • By typing it in using a customised form

  • Importing it from another type of file

  • Using EDI (Electronic Data Interchange) this is used to transfer data between one computer system and another

  • APIs (application programming interfaces) uses web services to communicate using the HTTP protocol

  • SOAP a standardised protocl that sends messages using HTTP and SMTP

  • JSON (JavaScript Object Notation)

  • XML (Extensible Markup Language) used to separate data from its presentation on web pages

No matter which method is used, validation should occur to ensure that the data is sensible along with verification to ensure that the data entered matches the original

77
New cards

Transaction Processing

A single logical operation on data (e.g a customer booking a cinema ticket online, even though it involves multiple actions)

The database must ensure that it is not possible to only complete part of a transaction.

e.g. if seats have been successfully reserved, but payment fails, the seats should not stay reserved or change status to booked.

ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that transactions are processed reliably.

78
New cards

Atomicity

Ensures that a transaction must be processed in its entirety or not at all

It must guarantee that in any situation, including power cuts or hard disk failure, it is not possible to only process part of a transaction.

Transactions should either succeed or fail but never partially succeed.  

79
New cards

Consistency

The transaction should only change the database according to the rules of the database in order to maintain its integrity

Changes to data in one table must take into account data in linked tables

This means that it will not be possible to create or delete a record if related records exist in linked tables e.g.

  • an employee cannot be deleted if they have salary details in another table

  • you can’t create a salary for an employee that doesn’t exist

80
New cards

Isolation

This property ensures that concurrent execution of transactions leads to the same results as if the transactions were processed one after the other.

i.e. Each transaction should not overwrite other transactions that are being processed at the same time

This is particularly relevant in a multi-user database where each transaction must not affect/overwrite other transactions being processed at the same time

81
New cards

Durability

Ensures that once a transaction has been committed, it will remain so, even in the result of a power cut.

e.g. in an online cinema ticket purchase, it should not be possible for the number of seats sold to be updated if the credit card transaction fails

As each part of the transaction is completed, it is held in a buffer on disk until all elements of the transaction have been completed.

Only then will the changes to the database be made.

82
New cards

Multi-user Database

These need to allow for concurrent processing

If multiple users need to read the same data, it generally isn't a problem, but allowing multiple users to simultaneously update or  write to a database table may cause one of the updates to be lost unless measures are taken to prevent it.

When an item is updated, the entire record will be copied into the user’s own local memory (on their local device).

When the record is saved, the data in memory is then rewritten in its entirety.

This can lead to unexpected results

83
New cards

Record locking

In its simplest form, the record is “locked” when the first user obtains the data and cannot be accessed by any other user until it is “unlocked” (saved).

However, this can cause problems…

If two users try to access a single record at the same time a situation can arise where neither can proceed, known as deadlock.

The DBMS (Database Management System) must recognise if this happens and must deal with the deadlock.

84
New cards

Serialisation

When two or more transactions are executed concurrently, the effect should be the same as if they had been executed serially i.e. one after the other.

It ensures that transactions do not overlap in time and therefore cannot interfere with each other or lead to updates being lost.

A transaction cannot start until the previous one has finished and this can be implemented using:

  • Timestamp ordering

  • Commitment ordering

85
New cards

Timestamp Ordering

Whenever a transaction starts, it is given a timestamp, so that it two transactions affect the same object, the transaction with the earliest timestamp should be applied first.

Each transaction is given a read timestamp and a write timestamp.

When a transaction starts, it reads a record of data, causing it to be read timestamped.

When the updated data is written back, the read timestamp is checked. 

If it is not the same in the database as the current record in memory steps are taken to solve the problem

86
New cards

Commitment Ordering

Transactions are ordered in terms of their dependencies on one another as well as the time they were initiated.

This involves evaluating the impact of the transactions before they are executed i.e. which data will be updated and how that will potentially affect other transactions

It can be used to prevent deadlock by blocking one request until another is completed.

Transactions are not completed in time order, but in commitment order

87
New cards

Redundancy

This could be a RAID(Redundant Array of Inexpensive Disks) setup or mirroring servers.

In its simplest(!) form, these organisations maintain two or three identical systems in different geographical locations, so that every transaction is written to two or three storage facilities.

This hardware redundancy protects against the loss of data in the event of power failure or other disasters.

If one system fails, one of the backup systems automatically takes over and processing continues.

If part of a database is lost, it can be recovered from one of the other versions.