2.5 Databases and distributed systems

0.0(0)
studied byStudied by 0 people
0.0(0)
full-widthCall with Kai
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
GameKnowt Play
Card Sorting

1/65

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced
Call with Kai

No study sessions yet.

66 Terms

1
New cards

What are databases?

Organised collections of data, which allow for easy storage, management, and retrieval of data

2
New cards

What is it important for databases to have?

Data redudancy - The same piece of information is stored multiple times in different rows (This wastes storage, reduces integrity, and increases chance of inconsistencies as data is changed)

Data consistency - Data is accurate in all stored areas (Data doesnt conflict as it gets updated which would lead to unreliability and poor integrity of data)

Data integrity - Accuracy and reliability of data (no errors/duplicates/missing data) (Unreliabilitys can spread reducing integrity and can lead to communication errors)

3
New cards

What is the structure of a  flatfile database

  • Data is stored in rows and columns

  • Each row represents one records

  • Each column is a field

  • All data is in one table

4
New cards

What is referal integrity?

A validation method for databases that is used when linking tables a record is required to exist in both. (For example primary and foreign keys and these prevent records being added that are not part of the relationship)

It ensures connecting data exists and you cannot delete data that is connected to other data

5
New cards

What are relational databses

They were introduced to solve the problem with flatfile

They have multiple tables linked together with primary keys

Characterisitcs

-Entities (each table in the database)

-Primary key (unique identifier for each record)

-Foreign key (these keys link tables)

-Composite key (Primary key of 2 or more fields)

No two records can have the same primary key

6
New cards

What are benefits of electronic databases?

  • Easier to add/delete/modify data

  • Data is easily backed up/copied

  • Acessibile by multiple people in diff locations at the same time

  • Less mistakes (spelling and copying error etc)

  • Storage is smaller physcially

7
New cards

What are types of database relationships?

One-to-One (Like person to student ID as one field represents one other field)

One-to-Many (Like School to student

Many-to-Many (Like student to classes but will create reduancies so will need to add a new table)

8
New cards

Whats an ERD

Shows how data is organised with entities, atribute and relationships (Like a shop can have many sales)

<p>Shows how data is organised with entities, atribute and relationships (Like a shop can have many sales)</p>
9
New cards

What is indexing

Used to speed up data retrieval by indexing a column by associating an index with certain characters (Like names starting with T)

10
New cards

Pros and cons of indexing?

+Faster retrieval

+Efficient sorting/filtering

+Improve performance

-Overheads

-More storage

-Needs updating with changes

11
New cards

What is normalistation?

Organising data in a relational database to reduce redudancy and improve data integrity.

There are 3 Levels (1NF, 2NF, 3NF)

12
New cards

What is 1NF and its Rules

1s Normalised Form.

It ensures data in a table is organised clearly and efficiently so each value is atomic (cannot be split)

Rules:
-Each column contains 1 value (no lists)

-Each record is unique (each row has primary key)

-Each field is dependant on the primary key

13
New cards

What is 2NF and its rules

2nd normalised form.

Rules:

-It is already in 1NF

-All-non key attributes depend on the whole primary key (Not on a composite)(For example spliting an order table with each product having a product ID and its details so you dont have to repeat them)

14
New cards

What is 3NF Rules

-Must be in 2NF

-Every non-key attribute depends on the primary key and not a non-key attribute

This removes repeated data by creating new tables

<p>-Must be in 2NF</p><p>-Every non-key attribute depends on the primary key and not a non-key attribute</p><p></p><p>This removes repeated data by creating new tables</p>
15
New cards

What is a data dictionary?

A document or table that describes the structure of a database (contains information about how data should be)

Its a reference that says:

-What data exists

-Where it exists

-How its used

16
New cards

What will a data dictionary store for each field?

-Field name

-Data type

-Size

-Constraints (Like if its a key and if it should contain letter ‘s’)

17
New cards

Why are data dictionaries important?

It outlines rules to allow for accuracy and consistency of data.

18
New cards

How is validation used in a database?

It is designed into a data dictionary when creating the table. Restrctions can be set during declaration with constraints

19
New cards

How would i create a new column in a table?

You would:
-Set the data type (like an int would reject a string)

-Reject null values (NOT NULL or instead just keep them)

-Set length checks (truncate data)

20
New cards

How would I format check in a database?

The database does not do that so it needs to be separately implemented

21
New cards

What is data redudancy?

When there are duplicate data in different tables (this compromises consistency as data may not be updated everywhere and also wastes storage)

22
New cards

What is data consistency?

The accuracy of data (all occurances of the same field data are the same value)

23
New cards

What are the benefits of a relational database?

  • Avoids redundancy

  • Easily edited

  • Supports complex queries

  • Consistnet records

  • Data is easily added and deleted

24
New cards

What is SQL

Structures Query Language. Is a general purpose language used to manipulate data in databases and tables

25
New cards

Create table query?

make sure to define NOT NULL, primary keys, foreign keys ,etc

With structure:

<name><data type (char(50) or int)><Primary key/Foreign key><NOT NULL (optional)>

<p>make sure to define NOT NULL, primary keys, foreign keys ,etc</p><p></p><p>With structure:</p><p>&lt;name&gt;&lt;data type (char(50) or int)&gt;&lt;Primary key/Foreign key&gt;&lt;NOT NULL (optional)&gt;</p>
26
New cards

What is numeric data type for create table

numeric(x,y)

x is total digits

y is digits to right of decimal (use 2 for money)

27
New cards

Insert Into [Table] Values[]

knowt flashcard image
28
New cards

select query

knowt flashcard image
29
New cards

where… IN

e.g. WHERE name IN (‘Pete’,’steve’)

instead of where name == pete

30
New cards

AND

where name = ‘peter’ and age = 12

31
New cards

OR

Similar to and but can allow one or both conditions

32
New cards

ORDER BY

ASC

DESC

33
New cards

BETWEEN

BETWEEN ID = 1 AND ID = 2

34
New cards

UPDATE query

knowt flashcard image
35
New cards

DELETE query

knowt flashcard image
36
New cards

Join query

can use where instead of join on

<p>can use where instead of join on</p>
37
New cards

GROUP BY [Field];

You would use this to group data that is associated (such as associating all the cash payments associated with one person and grouping it by cash payments to add all up)

38
New cards

Make sure to end with semi colon

39
New cards

What is data?

Facts and figures that represent data (comes in various forms such as images, video, text, audio, video)

40
New cards

What is big data?

Extremely large data sets that can be used to analyse computationally to reveal patterns/trends in behaviour (like consumer purchasing patterns)

41
New cards

What are the characterisitcs of big data?

  • Volume - large amounts of data too big to fit on a single server

  • Velocity - data has to be accessed an processed quickly (a query to repsonse needs to be about 1ms)

  • Variety - in many different formats (like strucutred CSVs, plain text, video, images etc)

  • Veracity - data is quality, accurate and trustworthy (reliable to use for analysis)

42
New cards

What is a key issue with big data analysis?

The lack of structure (millions of sources) mean that it is hard to extract important data

43
New cards

Where is big data used?

  • Hospitals - analyse to predict illnesses, treatments, A&E demand

  • Education - identify at risk students, improve curriculum

  • Weather predictions - better

44
New cards

What is data warehousing? Issue?

A central repository where data is stored from multiple sources and in multiple formats.

It is time consuming and expensive to build, design, and maintain

Data may become obselete or inaccurate

45
New cards

What do data warehouses do to improve efficiency?

Optimise data for querying and reporting for predictive analysis

46
New cards

What is datamining?

Looking for patterns and information within large data sets (identify relationships between different sets of data). This requires large amounts of storage, processing power, and complex algorithms so can be expensive.

47
New cards

What are the main purposes of data mining?

  • To categorise data and give it context

  • To model data in a way that can be used for predictions

48
New cards

What is predictive analysis?

The process of using data held in a data warehouse ahnd apply statistical methods and algorithms to predict the likelihood of an event happening

49
New cards

How is predictive analysis different to data mining?

Data mining looks at relationships between data (correlation, etc) and identifies patterns. Whereas pred analysis makes predictions with that data

50
New cards

What is a distributed database

Data is stored across multiple different sites/locations but to the user it appears as one

51
New cards

What are benefits of distributed systems?

  • Reduce data each server has

  • Improve performance (faster local access)

  • Queries handled locally (less data sent on network so faster)

  • Scalable (paritions can be added)

  • Added security as sensitive data can be stored in one location

52
New cards

Drawbacks of distributed systems

  • Can be more expensive with more servers

  • If a server fails then data may be lost (more servers to fail)

  • Can be complex to handle transactions

53
New cards

What is a data transaction?

A single database operation such as insertion/deletion. It must be completed as one action in a distributed system as errors can occur (like transaction in payment system)

54
New cards

What is a two phase algorithm?

Used in distributed systems to ensure transactions are properly handled. (Agreements are met before a commitment)

Phase 1: Prepare

Phase 2: Commit

55
New cards

What is the steps of Phase 1?

  1. One site acts as a coordiantor

  2. The coordiantor sends a message to all participating sites to find out if a transaction can be done

  3. Each participant

    1. Tries to peform the operation (like inserting data)

    2. Votes to commit if there are no issues)

    3. Votes abort if an issues occurs

  4. The coordiantor collects all votes

56
New cards

What is Phase 2 steps?

  1. If all participants vote to commit the coordiantor will send a message to every server. Then each participant commits to the transaction

  2. If any particiapnt votes abort the coordiator sends a message to every server to abort. Then all participants undo the action

57
New cards

What is distributed processing?

Work (processing) of data is shared accross several application servers instead of one single machine doing it all. (A database server will store required data and an application server will communicate with the database server to process data when required)

58
New cards

How can application servers be used?

Multiple applications servers can use the same database server to improve speeds (like one server handling an transaction and one server updating stock for the same database)

59
New cards

Benefits of separating data from processing?

  • Application servers have smaller storage requirements

  • Server redundancy + disaster recovery - if the database goes down it can be replaced by a backup server without changes to the application server

  • Additional security - application servers dont contain data

60
New cards

What are the 3 database level?

Application

Conceptual

Physcial (Schema)

61
New cards

What is the application level?

It defines how data is presented to individual users. Different users can have different views of data depending on access rights and needs.

e.g. Student VS Teacher a grade spreadsheet

62
New cards

What is the conceptutal level?

It defines the structure and organisation of data for the whole database. It describes what data exists and how it is related.

-Focus is on design oof database (relationships, tables, fields)

-ERD, Data dicitionaries are used to design this level

63
New cards

What is the physical level?

Deals with storage of data (How and Where data is stored on disk or memory (sever)).

This isnt concerned with meaning just efficiency and performance

64
New cards

What is data independance?Example?

You can change the way data is stored or organised in a database (modifying schema/physical) without affecting how it is used (if we modify collumns location user view shouldnt change and application server programs dont need to be rewritten if structure changes)

Changing one view in a 3 view system (attendence, grades, payslips) should not affect the other views so users wouldnt notice chages

65
New cards

When are conceptual level changes required?

modifications of the conceptual (logical) level are necessary whenever the conceptual structure of the databse is changed

66
New cards

What is a database view?

A virtual table that looks and behaves like a table but doesnt store the data. It just shows data from a query