1/36
Name | Mastery | Learn | Test | Matching | Spaced | Call with Kai |
|---|
No analytics yet
Send a link to your students to track their progress
What is a database?
An organised collection of tables that store data and are linked with primary and foreign keys
Why use databases instead of storing the data in the application?
It could become corrupted in the application so can be stored remotely so can be accessed anywhere even if there is a problem with the original application.
What is a table?
They contain records (store data about related things)
What is a record?
A collection of data about one item/event
What is a field?
A single data item (within a record)
What is a primary key?
A unique identifier (field) in a table
Explain the 2 types of primary key?
Simple = Made of a single field unique to that record
Compound/Composite = A combination of more than one field to form the primary key
What is a foregin key?
A primary key from another table is placed in the desired table to form a link between them. (Any field thats a primary key in another)
Explain the types of validation check?
Character/Type check = Ensure input is correct data type
Length Check = Checks the number of characters to ensure they meet the limit
Range Check = Checks that the numeric values are within a specific range
Format Check = Checks the data conforms to a set of rules
Existence/Presence check = Ensures data exists
Check digit = for certain inputs (e.g. barcode)
Explain the types of verification checks?
Double entry = Enter data twice and compare
Proof reading = a human reading the input manually
What are relationships (for tables)?
The relationships between tables to retrieve data
Desired relationships = 1:m(any) or M:1
Undesired relationships = M:M (confuses data processing systems so won’t work), 1:1 (used rarely to remove sensitive data from tables)
What is data consistency?
The requirement that the data written to a database must conform to the rules of the database and be valid. Ensures all copies of the same data item match across the database
Define Data redundancy
The unnecessary duplication of data within a database
What is Data integrity?
The correctness of data over time (must be complete, accurate and consistent)
Explain the three types of data integrity?
Entity integrity = every table must have a primary key that’s not empty)
Referential integrity = foreign keys must point to a primary key in another table (can be empty)
Domain integrity = all attributes in the database must be related to the overall domain that the data is working on
Wha is the symbol for unnormalised form?
UNF or 0NF
What is normalisation?
The organisation of data into tables which relate to a single entity (database)
State the benefits of data normalisation.
To reduce redundancy, inconsistency and to make it easier to maintain
State the rules of 1NF
it must have a primary key that’s (prerequisite)
Each field name must be unique and no repeating fields
Each field must contain only one piece of data
All attributes in the table must be dependent on the primary key
Rules for 2NF
Is the table 1NF
Partial dependencies must be removed
All non prime attributes must depend on the primary key
Rule for 3NF
is the table 2NF
Remove transitive dependencies
Ensure all non-prime attributes depend on the primary key only
What are transitive dependencies?
Non-primary key attributes that depend on other non-primary key attributes
What are partial dependencies?
Non-primary key attributes depending on only part of the primary key (with composite keys)
Give summary sql commands
SELECT attribute1, attribute2 FROM table
WHERE attribute1=“text”
CREATE TABLE table (ID integer PRIMARY KEY NOT NULL, attribute varchar(20));
INSERT INTO table VALUES(‘id’, ‘attribute1’);
UPDATE table SET attribute1=‘text’ WHERE attribute2=‘text’
Define Data Dictionary
A document used during the systems analysis design process that defines and describes all the data structures and data elements used within a computerised system. It is part of maintenance documentation, providing a clear reference for developers/maintainers.
Describe the attributes in a database found in data dictionaries
Fieldnames = Unique identifiers for each data item
Data types = The type of data being stored e.g. character, real
Keys/Key fields = Identifiers to link records (primary, foreign, composite)
Validation requirements = Rules/routines applied to ensure the data entered into the system is sensible and accurate
Example Data = Representative sample of the actual data that will be stored in that field
Methods of Access = How the data is retrieved (sequential/direct)
Default value = What’s automatically assigned to data field if no input provided by user
Define Database Management System (DBMS), including an example
The middle man between your application and the data you are storing. e.g. SQL Server Management Studio.
It is a tool to help manage your data
Organise files and control the stored data
Set user permissions to control the additional, editing, updating and removal of data
Allows the users to manipulate the data (run queries using a query language: SQL)
Allows concurrent access to the same database by different users
Set security rules to prevent unauthorised access
Utilise processes to back-up the data regularly and recover data (if a problem occurs)
Sets rules to verify/validate data input to improve the integrity of the data
Monitors/optimises database performance/workflow/system resources, by using tools like indexes and query management
Minimises data duplication through normalisation
Managing the data dictionary
Define Big Data
Large volumes of data in structured (e.g. Databases) and unstructured form (e.g. social media). Data sets that are so large that traditional data processing application software is inadequate to deal with the data sets.
State the three main characteristics/problems of Big data
The volume of the data
The velocity of growth (speed at which new data is generated)
The variety of the different formats the data can take
Describe the term Data Warehouse and its uses
It is a large (centralised) collection of historical/archived data from many sources within an organisation
Uses
It can be used for analysis and reporting to support (long-term management) decision making.
Data mining = Process of finding patterns or trends within that archived data
Predictive analysis = Using the patterns found from data mining to forecast future outcomes
Describe Data Mining
It is the process of analysing a large batch of data (usually from data warehouses) to discern trends, by identifying patterns and connections in data based on what information the user requests or provides.
Define predictive analysis
It assigns a probability (predictive score) for the likelihood that something (e.g. a customer) will behave in a certain way. Being used to assess risks, and predict future events.
Describe Distributed Systems
A distributed system is one where data and processing are distributed across multiple nodes/servers/computers in different geographical locations connected via a network. Although the system is physically separate, it appears to the user as a singe logical unit.
Define Data Independence
The separation of data from the programs that use it. So different users can have different views of the same data, and allowing the data structure to be changed without needing to rewrite he programs that access that data.
Define Distributed Database
A single logical database which is physically spread across multiple computers in different geographical locations, where each database provides part of the overall system, which are all connected via a network. Appearing as a single unified system to the user.
Describe 4 types of distributed database
Partitioned = Data divided into distinct sections, with different sections of data being stored in more than one location
Horizontal = Different records are stored in different sites
Vertical = Different fields of the records are stored in different sites (typically different department offices)
Replicated = Each site has its own replica of the whole database
Describe the advantages and disadvantages of a distributed database
Advantages
Resilient = A problem is one site will not stop the other sites from working
Security = Staff access can be limited to only their section of the database
Network Traffic reduced = Lower bandwidth costs
A single site database still works even if the connection between sites brakes
Scaling is quick = If demand increases, adding an additional node to the distributed database is straight forward.
High performance = The queries and updates are largely local to the sites and so no local bottle necking
Can be cheaper, as several cheaper computers can be used rather than one expensive centralised server. No single site needs massive storage or processing capacity, which reduces initial hardware costs for a single location.
Disadvantages
Needs careful planning to ensure the data transmission is sufficient to cope with bandwidth, and data is not needlessly sent
Difficult to prevent duplication, and so difficult to ensure all copies of the data are identical (data inconsistency)
Security difficult to enforce = There are multiple entry points to a distributed system
Difficult to set up = More difficult to design and implement because data must be correctly partitioned/replicated across the different sites, requiring complex software to manage this.