1/60
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
Lossy Compression
Non-essential data is permanently removed, for example, different shades of the same colour in an image or frequencies of sound outside the range of human hearing
Examples of lossy compression: .JPG, .MP3 and .MP4
Lossless Compression
Patterns in the data are spotted and summarised in a shorter format without permanently removing any information. Using this pattern information, a new file can be replicated exactly without any loss of data.
No data is lost, exactly the same as original
Examples of lossless compression: .ZIP
Why do we compress data?
Reducing the amount of data to send/store ensures that:
•Data is sent more quickly
•Less bandwidth is used as transfer limits may apply
•Buffering on audio and video streams is less likely to occur
•Less storage is required
Run Length Encoding (RLE)
A basic method of compression that summarises consecutive patterns of the same data
Works well with image and sound data where data could be repeated many times
RLE records one example of a piece of data and how many times it consecutively repeats.
Dictionary compression
Spots regularly occurring data and stores it separately in a dictionary
The reference to the entry in the dictionary is stored in the main file thereby reducing the original data stored
Splits a phrase up into repeated words or characters
Even though the dictionary produces additional overheads the space saving negates this problem
Encryption
A way of making sure data cannot be understood if you don’t possess the means to decrypt it
Plaintext of a message sent is encrypted using a cipher algorithm and key into equivalent ciphertext
When received, the ciphertext is decrypted back to plaintext using the same or different key
Two methods at the opposite end of the security spectrum are the Caesar cipher (low security level) and the Vernam cipher (high security level)
Caesar Cipher
Most basic type of encryption and the most insecure
Letters of the alphabet are shifted by a consistent amount
Can be easily decrypted by knowing the key (can be helped by using frequency analysis) or using a brute force method of attack
Brute Force attack
A brute force attack attempts to apply every possible key to decrypt ciphertext until one works
Frequency Analysis
Letters are not used equally often
In English, E is by far the most common letter, followed by T, A, O, I, N, S, R, then H
Other letters like Z, J, K, Q, X are fairly rare
Vernam Cipher
The encryption key, also known as the one-time pad, is the only cipher proven to be unbreakable
The key must be:
•a truly random sequence greater or equal in length than the plaintext and only ever used once
•Shared with the recipient by hand, independently of the message and destroyed immediately after use
How to make the one-time pad of a Vernam Cipher “truly random“?
The one-time pad must be truly random, generated from a physical and unpredictable phenomenon. Sources may include: atmospheric noise, radioactive decay, the movements of a mouse or snapshots of a lava lamp
A truly random key will render any frequency analysis useless as it would have a uniform distribution
Computer generated ‘random’ sequences are not actually random- they are generated from an algorithm. Therefore, if you know the algorithm behind the sequence, you would be able to predict the next character/number.
Symmetric encryption (private key encryption)
The same key is used to encrypt and decrypt data
This means that the key must also be transferred to the recipient.
Security problem: if you know the key used to encrypt the message , you can decrypt the message. The key can be intercepted as easily as the ciphertext message
Asymmetric encryption
Uses two separate but related keys
One key, known as the public key, is made public so that others wishing to send you data can use this key to encrypt it
The public key cannot decrypt the data
A private key, known only to you, is used to decrypt the data
Use of hashing in encryption
To verify a user’s password, the software applies the hash function to the user input and compares the hashed result with the one stored
This is useful for storing encrypted PINs and passwords so that they cannot be read by a hacker
Flat file database
Simplest kind of database- consisting of information about a single entity (table).
Relational Database
A database that organises data into multiple entities, using primary and foreign keys to connect related data. This reduces data redundancy.
Tables in a relational database are commonly referred to as relations
Entity
A category of object, person, event or thing of interest about which data needs to be recorded (represented as a table).
We commonly use the convention of giving each table name the prefix tbl
e.g. tblCustomer
Most databases hold data about several entities.
Field
A single piece of data in a record
Record
A group of related fields, representing one data entry
Attribute
Characteristics of the entity; Categories about which data is collected
Entity Descriptions
Format in which entities are written in database design (primary key must be underlined):
e.g. if entity is Customer:
Customer(custID, title, firstname, surname. email)
foreign keys are shown in italics:
Subscription (subID, startDate, endDate, custID)
Primary Key
Unique identifier for a particular record. If there is no natural attribute for the primary key, one should be introduced/ created.
Automatically indexed.
Foreign Key
A field in a table that refers to the primary key in another table. Used to link tables and create relationships.
Composite Primary Key
A primary key made out of two or more attributes to uniquely identify a record.
Both attributes are underlined in entity descriptions.
e.g. OrderLine(OrderNumber, OrderLine, ProductID)
Secondary Key
A field or fields other than the primary key that are indexed for faster lookups.
Relationships between entities
Three possible ways in which two entities may be related:
One-to-one
One-to-many
Many-to-many
A foreign key always goes on the “many” side of a relationship
Why should there be no many-to-many relationships in directly linked tables?
In a normalised database, there should be no many-to-many relationships between directly linked tables, as this would mean there would be several fields for the same record, which is impractical.
You cannot easily extract information from this table.
Therefore, we create an extra link table in the middle of these two entities to organise the data practically.
Entity Relationship Diagrams (E-R Diagrams)
A graphical way of representing the relationships between entities.
Referential Integrity
Referential integrity means that no foreign key in one table can reference a non-existent record in a related table
e.g. it should not be possible to add a subscription for a customer with custID C1 if there is no record for customer C1
A technique for ensuring data integrity in a relational database. It refers to the accuracy and consistency of data within a relationship.
A way to maintain referential integrity would be enforcing a cascade delete restraint on the primary key relationship between the tables. So now if we delete a record from one table, any associated rows from the other tables.
Referential integrity can be implemented to prevent us from adding a record to a table if there isn't a matching record to be linked to it in the other tables.
Normalisation
A process used to come up with the best possible design for a database
Tables should be organised so that data is not duplicated in the same table or in different tables (data redundancy)
The structure should allow complex queries to be made
Three stages in normalisation
First Normal Form (1NF)
A table that contains no repeating attributes/groups of attributes.
All attributes must be atomic: a single attribute cannot consist of two data items.
Second Normal Form (2NF)
A table that is in first normal form and contains no partial dependencies (can only occur if primary key in a table is a composite key).
Third Normal Form (3NF)
A table in second normal form and contains no non- key dependencies.
“All attributes are dependent on the key,
the whole key and nothing but the key”
Advantages of normalisation
Easier to maintain and change a normalised database
No unnecessary duplication of data (redundancy)
Data integrity is maintained – if a person changes address, for example, the update needs to be made only once to a single table, therefore no possibilities of having inconsistencies in the database
Having smaller tables with fewer fields means faster searches and savings in storage
Conditions in SQL
SQL SELECT
SELECT attribute, attribute, … or *
FROM table
WHERE condition
ORDER BY attribute ASC/DESC
SQL INSERT INTO
INSERT INTO table or table(specific attribute, specific attribute, …)
VALUES attribute, attribute, …
SQL DELETE FROM
DELETE FROM table
WHERE condition
SQL UPDATE
UPDATE table
SET attribute
WHERE condition
SQL Data Types
SQL CREATE TABLE
CREATE TABLE tablename
(
attribute DATATYPE (NOT NULL) PRIMARY KEY
attribute DATATYPE (NOT NULL)
…
)
SQL ALTER TABLE ADD column
ALTER TABLE table
ADD attribute DATATYPE …
SQL ALTER TABLE DELETE column
ALTER TABLE table
DROP attribute
SQL MODIFY column/ datatype of column
ALTER TABLE table,
MODIFY COLUMN attribute DATATYPE …
Methods of Data Capture
Manual methods: forms filled out by hand/ data is typed manually
Automated methods include:
•smart card readers
•barcode readers
•scanners
•optical mark recognition (OMR)
•optical character recognition (OCR)
•magnetic ink character recognition
•sensors
How can data be transferred into a database?
Automatically, using DBMS software
By typing it in using a customised form
Importing it from a spreadsheet or file
Using EDI (Electronic Data Interchange)
EDI (Electronic Data Interchange)
EDI is the computer to computer exchange of documents done electronically in a special standardised way using software.
EDI translation software may be used to translate the EDI format so the data can be input directly to a company database
An application where EDI is a suitable method of data exchange
Uber Eats- a lot of different people need a lot of different documentation. EDI ensures correct data, consistent format to exchange updates in real time.
Streamlines the operations
What is a transaction?
A single logical operation that may involve multiple operations on a database.
What is ACID?
A set of properties (Atomicity, Consistency, Isolation, Durability) that ensure that the integrity of the database is maintained under all circumstances.
It guarantees that transactions are processed reliably even in case of errors.
Atomicity
Requires that a transaction is processed in its entirety or not at all.
Cannot be partly processed in any situation, including power cuts or hard disk crashes.
Consistency
No transaction can violate any of the defined validation rules. Referential integrity is always upheld.
Isolation
Ensures that concurrent execution of transactions leads to the same result as if transactions were processed one after the other.
Crucial in multi user databases, as one transaction is not affected by another user.
Durability
Once a transaction has been committed, it will remain so, even in the event of a power cut.
As each part of a transaction is completed, it is held in a buffer on disk until all elements of the transaction are completed. Only then will the changes to the database tables be made. This ensures that committed transactions survive system failures.
Potential problems in a multi-user database
Allowing multiple users to simultaneously access a database could potentially cause one of the updates to be lost.
To prevent this, two methods are used: record locking and serialisation.
Record Locking
Record locking prevents simultaneous access to objects in a database in order to prevent updates being lost or inconsistencies in the data arising.
E.g. if User A is trying to carry out a transaction on a particular record, other users cannot access that record until User A has finished their transaction.
Explain how record locking can cause deadlock.
Record locking can cause deadlock when two or more users hold locks on separate records and each user requests access to the record held by the other. This scenario leads to a situation where neither user can proceed, as they are waiting for the other to release their lock, known as deadlock.
E.g. A trying to carry out transaction to B; B trying to carry out transaction to A.
Both are being record locked.
Serialisation
A method of controlling access to a database that ensures transactions are executed in a sequential order, preventing conflicts and maintaining data integrity.
Serialisation ensures that transactions do not overlap in time and therefore cannot interfere with each other or lead to updates being lost.
Two examples: Timestamp Ordering and Commitment Ordering
Timestamp Ordering
When a user tries to save an update, if the read timestamp is not the same as it was when they started the transaction, the DBMS knows another user has accessed the same object
The transaction will be cancelled and a message
“Update unsuccessful” sent to the user
Commitment Ordering
This technique ensures that no transactions are lost if two clients are simultaneously trying to update a record
Transactions are ordered in terms of their dependencies on one another as well as the time they were initiated
It can be used to prevent deadlock by blocking one request until another is completed.
Data Integrity
The process of maintaining the consistency of the database.
The data store must reflect the reality that it represents.