2.5 Databases and distributed systems

0.0(0)

Studied by 0 people

0.0(0)

Call with Kai

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Knowt Play

Card Sorting

1/65

There's no tags or description

Looks like no tags are added yet.

Study Analytics

Name	Mastery	Learn	Test	Matching	Spaced	Call with Kai

No study sessions yet.

66 Terms

New cards

What are databases?

Organised collections of data, which allow for easy storage, management, and retrieval of data

New cards

What is it important for databases to have?

Data redudancy - The same piece of information is stored multiple times in different rows (This wastes storage, reduces integrity, and increases chance of inconsistencies as data is changed)

Data consistency - Data is accurate in all stored areas (Data doesnt conflict as it gets updated which would lead to unreliability and poor integrity of data)

Data integrity - Accuracy and reliability of data (no errors/duplicates/missing data) (Unreliabilitys can spread reducing integrity and can lead to communication errors)

New cards

What is the structure of a flatfile database

Data is stored in rows and columns
Each row represents one records
Each column is a field
All data is in one table

New cards

What is referal integrity?

A validation method for databases that is used when linking tables a record is required to exist in both. (For example primary and foreign keys and these prevent records being added that are not part of the relationship)

It ensures connecting data exists and you cannot delete data that is connected to other data

New cards

What are relational databses

They were introduced to solve the problem with flatfile

They have multiple tables linked together with primary keys

Characterisitcs

-Entities (each table in the database)

-Primary key (unique identifier for each record)

-Foreign key (these keys link tables)

-Composite key (Primary key of 2 or more fields)

No two records can have the same primary key

New cards

What are benefits of electronic databases?

Easier to add/delete/modify data
Data is easily backed up/copied
Acessibile by multiple people in diff locations at the same time
Less mistakes (spelling and copying error etc)
Storage is smaller physcially

New cards

What are types of database relationships?

One-to-One (Like person to student ID as one field represents one other field)

One-to-Many (Like School to student

Many-to-Many (Like student to classes but will create reduancies so will need to add a new table)

New cards

Whats an ERD

Shows how data is organised with entities, atribute and relationships (Like a shop can have many sales)

New cards

What is indexing

Used to speed up data retrieval by indexing a column by associating an index with certain characters (Like names starting with T)

New cards

Pros and cons of indexing?

+Faster retrieval

+Efficient sorting/filtering

+Improve performance

-Overheads

-More storage

-Needs updating with changes

New cards

What is normalistation?

Organising data in a relational database to reduce redudancy and improve data integrity.

There are 3 Levels (1NF, 2NF, 3NF)

New cards

What is 1NF and its Rules

1s Normalised Form.

It ensures data in a table is organised clearly and efficiently so each value is atomic (cannot be split)

Rules:
-Each column contains 1 value (no lists)

-Each record is unique (each row has primary key)

-Each field is dependant on the primary key

New cards

What is 2NF and its rules

2nd normalised form.

Rules:

-It is already in 1NF

-All-non key attributes depend on the whole primary key (Not on a composite)(For example spliting an order table with each product having a product ID and its details so you dont have to repeat them)

New cards

What is 3NF Rules

-Must be in 2NF

-Every non-key attribute depends on the primary key and not a non-key attribute

This removes repeated data by creating new tables

New cards

What is a data dictionary?

A document or table that describes the structure of a database (contains information about how data should be)

Its a reference that says:

-What data exists

-Where it exists

-How its used

New cards

What will a data dictionary store for each field?

-Field name

-Data type

-Size

-Constraints (Like if its a key and if it should contain letter ‘s’)

New cards

Why are data dictionaries important?

It outlines rules to allow for accuracy and consistency of data.

New cards

How is validation used in a database?

It is designed into a data dictionary when creating the table. Restrctions can be set during declaration with constraints

New cards

How would i create a new column in a table?

You would:
-Set the data type (like an int would reject a string)

-Reject null values (NOT NULL or instead just keep them)

-Set length checks (truncate data)

New cards

How would I format check in a database?

The database does not do that so it needs to be separately implemented

New cards

What is data redudancy?

When there are duplicate data in different tables (this compromises consistency as data may not be updated everywhere and also wastes storage)

New cards

What is data consistency?

The accuracy of data (all occurances of the same field data are the same value)

New cards

What are the benefits of a relational database?

Avoids redundancy
Easily edited
Supports complex queries
Consistnet records
Data is easily added and deleted

New cards

What is SQL

Structures Query Language. Is a general purpose language used to manipulate data in databases and tables

New cards

Create table query?

make sure to define NOT NULL, primary keys, foreign keys ,etc

With structure:

New cards

What is numeric data type for create table

numeric(x,y)

x is total digits

y is digits to right of decimal (use 2 for money)

New cards

Insert Into [Table] Values[]

New cards

select query

New cards

where… IN

e.g. WHERE name IN (‘Pete’,’steve’)

instead of where name == pete

New cards

AND

where name = ‘peter’ and age = 12

New cards

Similar to and but can allow one or both conditions

New cards

ORDER BY

ASC

DESC

New cards

BETWEEN

BETWEEN ID = 1 AND ID = 2

New cards

UPDATE query

New cards

DELETE query

New cards

Join query

can use where instead of join on

New cards

GROUP BY [Field];

You would use this to group data that is associated (such as associating all the cash payments associated with one person and grouping it by cash payments to add all up)

New cards

Make sure to end with semi colon

New cards

What is data?

Facts and figures that represent data (comes in various forms such as images, video, text, audio, video)

New cards

What is big data?

Extremely large data sets that can be used to analyse computationally to reveal patterns/trends in behaviour (like consumer purchasing patterns)

New cards

What are the characterisitcs of big data?

Volume - large amounts of data too big to fit on a single server
Velocity - data has to be accessed an processed quickly (a query to repsonse needs to be about 1ms)
Variety - in many different formats (like strucutred CSVs, plain text, video, images etc)
Veracity - data is quality, accurate and trustworthy (reliable to use for analysis)

New cards

What is a key issue with big data analysis?

The lack of structure (millions of sources) mean that it is hard to extract important data

New cards

Where is big data used?

Hospitals - analyse to predict illnesses, treatments, A&E demand
Education - identify at risk students, improve curriculum
Weather predictions - better

New cards

What is data warehousing? Issue?

A central repository where data is stored from multiple sources and in multiple formats.

It is time consuming and expensive to build, design, and maintain

Data may become obselete or inaccurate

New cards

What do data warehouses do to improve efficiency?

Optimise data for querying and reporting for predictive analysis

New cards

What is datamining?

Looking for patterns and information within large data sets (identify relationships between different sets of data). This requires large amounts of storage, processing power, and complex algorithms so can be expensive.

New cards

What are the main purposes of data mining?

To categorise data and give it context
To model data in a way that can be used for predictions

New cards

What is predictive analysis?

The process of using data held in a data warehouse ahnd apply statistical methods and algorithms to predict the likelihood of an event happening

New cards

How is predictive analysis different to data mining?

Data mining looks at relationships between data (correlation, etc) and identifies patterns. Whereas pred analysis makes predictions with that data

New cards

What is a distributed database

Data is stored across multiple different sites/locations but to the user it appears as one

New cards

What are benefits of distributed systems?

Reduce data each server has
Improve performance (faster local access)
Queries handled locally (less data sent on network so faster)
Scalable (paritions can be added)
Added security as sensitive data can be stored in one location

New cards

Drawbacks of distributed systems

Can be more expensive with more servers
If a server fails then data may be lost (more servers to fail)
Can be complex to handle transactions

New cards

What is a data transaction?

A single database operation such as insertion/deletion. It must be completed as one action in a distributed system as errors can occur (like transaction in payment system)

New cards

What is a two phase algorithm?

Used in distributed systems to ensure transactions are properly handled. (Agreements are met before a commitment)

Phase 1: Prepare

Phase 2: Commit

New cards

What is the steps of Phase 1?

One site acts as a coordiantor
The coordiantor sends a message to all participating sites to find out if a transaction can be done
Each participant
1. Tries to peform the operation (like inserting data)
2. Votes to commit if there are no issues)
3. Votes abort if an issues occurs
The coordiantor collects all votes

New cards

What is Phase 2 steps?

If all participants vote to commit the coordiantor will send a message to every server. Then each participant commits to the transaction
If any particiapnt votes abort the coordiator sends a message to every server to abort. Then all participants undo the action

New cards

What is distributed processing?

Work (processing) of data is shared accross several application servers instead of one single machine doing it all. (A database server will store required data and an application server will communicate with the database server to process data when required)

New cards

How can application servers be used?

Multiple applications servers can use the same database server to improve speeds (like one server handling an transaction and one server updating stock for the same database)

New cards

Benefits of separating data from processing?

Application servers have smaller storage requirements
Server redundancy + disaster recovery - if the database goes down it can be replaced by a backup server without changes to the application server
Additional security - application servers dont contain data

New cards

What are the 3 database level?

Application

Conceptual

Physcial (Schema)

New cards

What is the application level?

It defines how data is presented to individual users. Different users can have different views of data depending on access rights and needs.

e.g. Student VS Teacher a grade spreadsheet

New cards

What is the conceptutal level?

It defines the structure and organisation of data for the whole database. It describes what data exists and how it is related.

-Focus is on design oof database (relationships, tables, fields)

-ERD, Data dicitionaries are used to design this level

New cards

What is the physical level?

Deals with storage of data (How and Where data is stored on disk or memory (sever)).

This isnt concerned with meaning just efficiency and performance

New cards

What is data independance?Example?

You can change the way data is stored or organised in a database (modifying schema/physical) without affecting how it is used (if we modify collumns location user view shouldnt change and application server programs dont need to be rewritten if structure changes)

Changing one view in a 3 view system (attendence, grades, payslips) should not affect the other views so users wouldnt notice chages

New cards

When are conceptual level changes required?

modifications of the conceptual (logical) level are necessary whenever the conceptual structure of the databse is changed

New cards

What is a database view?

A virtual table that looks and behaves like a table but doesnt store the data. It just shows data from a query