45 Hours
5 hours
15 hours
10 hours
Computers store data, which can be numeric, text, Boolean, etc., but has no intrinsic meaning to humans.
Data must be interpreted to produce information.
Data becomes information when it is put into a context that gives it meaning.
Example: The numbers 32, 23, 11, 08, 40, 17 are data. They become information when given context:
Home and away scores for 6 soccer teams
Temperatures in Celsius for 6 cities
Ages of the last 6 people through the Eiffel Tower turnstiles
Information = Data + Context
Databases technically store data, not information, but the terminology is used loosely with the assumption that stored data is interpreted within an appropriate context.
Data | Information |
---|---|
Raw facts | Data with context |
No context | Processed data |
Just numbers and/or text | Value-added to data: Summarized, organized, analyzed |
Information System (IS): Combination of information technology and people's activities to support operations, management, and decision-making.
An IS is a group of components that interact to produce information.
Databases are a component within an information system.
Components of Information Systems:
Processes
People
Hardware
Software
Network Communications
Data (Stored in databases)
Saving huge amounts of data.
Sharing data.
Problems with not using a database and instead using spreadsheets or text files:
Size of Data: Spreadsheets are fine for small data, but millions of records slow them down significantly.
Ease of Updating Data: Multiple people can't edit the same file simultaneously.
Accuracy: Data entry errors are possible due to lack of validation.
Security: Text files and spreadsheets are not secure.
Redundancy: Duplication of data is possible, leading to accuracy issues.
Incomplete Data: Not all data might be entered due to a lack of validation, affecting data integrity.
A relational database stores and provides access to related data points.
Based on the relational model, representing data in tables (relations).
Each row is a record/tuple with a unique ID called the Primary key.
Columns/fields hold attributes, establishing relationships among data points.
DataBase Management System (DBMS): System software for creating and managing databases, providing users and programmers a systematic way to create, retrieve, update, and manage data.
Relational Database Management System (RDBMS): A DBMS based on the relational model. Data is stored in tables, which are broken up into fields.
Fields are the columns in a table.
A record is a row of data in a table.
Feature | DBMS | RDBMS |
---|---|---|
Data Storage | Data as files. | Data in tabular form. |
Data Organization | Hierarchical or navigational form. | Tables with primary keys. |
Normalization | Not present. | Present. |
Security | No specific security for data manipulation. | Integrity constraints for ACID properties. |
Relationships | No inherent relation between tables. | Relationships are stored in the form of tables. |
Data Access | Requires uniform methods to access stored information. | Supports tabular structure and relationships for data access. |
Distributed Database Support | Not supported. | Supported. |
Scalability | For small organizations and small data, supports single user. | Designed for large amounts of data, supports multiple users. |
Examples | File systems, XML, etc. | MySQL, PostgreSQL, SQL Server, Oracle, etc. |
Inventory of data elements in a database or data model, with detailed descriptions of its format, data ownership, relationships, meaning, source, and usage.
A syntax similar to a programming language used to define databases and modify data.
DDL commands can create (CREATE), remove (DELETE or DROP), or modify (UPDATE) tables.
DDLs in database applications are a subset of SQL.
Importance:
Helps define the database structure or schema.
DDL commands are interpreted by the DBMS, so changes are saved permanently.
A database model determines the logical structure of a database, including relationships and constraints.
The relational model (table-based) is the most popular.
Results in:
Reduced costs
Higher quality
Clearer scope
Fewer errors
Managed risk
One-to-one
One-to-many
Many-to-many
ER Diagram basics
ER Diagram stands for Entity Relationship Diagram, also known as ERD
displays the relationship of entity sets stored in a database
explains the logical structure of databases
created based on three basic concepts: entities, attributes and relationships.
Why use ER Diagrams?
Helps define terms related to entity relationship modeling.
Preview of how all your tables should connect, what fields are going to be on each table.
Helps to describe entities, attributes, relationships.
ER diagrams are translatable into relational tables which allows you to build databases quickly.
ER diagrams can be used by database designers as a blueprint for implementing data in specific software applications.
The database designer gains a better understanding of the information to be contained in the database with the help of ERP diagram.
ERD Diagram allows you to communicate with the logical structure of the database to users.
ER Diagrams Symbols & Notations Following are the main components and its symbols in ER Diagrams:
Rectangles: represents entity types
Ellipses : Symbol represent attributes
Diamonds: represents relationship types
Lines: It links attributes to entity types and entity types with other relationship types
Primary key: attributes are underlined
An entity set with only simple attributes will require only one table in a relational model:
The attributes of the entity will be the fields/columns of the table.
The key attribute of the entity will become the primary key of the table.
Components of the ER Diagram
Entity: Person, place, object, event or concept about which data is to be maintained
Defines the numerical attributes of the relationship between two entities or entity sets.
1. One-to-One Relationships
2. One-to-Many Relationships
3. Many to One Relationships
4. Many-to-Many Relationships
When a single instance of an entity is associated with a single instance of another entity then it is called one to one relationship. For example, a person has only one passport and a passport given to one person.
When a single instance of an entity is associated with more than one instances of another entity then it is called one to many relationship. For example - a customer can place many orders but an order cannot be placed by many customers.
When more than one instances of an entity is associated with a single instance of another entity then it is called many to one relationship. For example - many students can study in a single college but a student cannot study in many colleges at the same time.
When more than one instances of an entity is associated with more than one instances of another entity then it is called many to many relationship. For example, a can be assigned to many projects and a project can be assigned to many students.
Case 1: Binary relationship with cardinality ratio 1:1 Here, two tables will be required. Either combine ‘R’ with ‘A’ or ‘B’.
Way-01:
AR ( a1 , a2 , b1 )
B ( b1 , b2 )
Way-02:
A ( a1 , a2 )
BR ( a1 , b1 , b2 )
Case 2: Binary relationship with cardinality ratio 1:m Here, two tables will be required.
A ( a1 , a2 )
B ( a1 , b1 , b2 )
Case 3: Binary relationship with cardinality ratio m:1 Here, two tables will be required.
A( a1 , a2 , b1 )
B ( b1 , b2 )
Case 4: Binary relationship with cardinality ratio m:n Here, three tables will be required.
A ( a1 , a2 )
R ( a1 , b1 )
B ( b1 , b2 )
Note that a1,b1 in R represent a composite Primary key and that R is a joint table.
In a one-to-one relationship, one record in a table is associated with one and only one record in another table. For example, in a school database, each student has only one student ID, and each student ID is assigned to only one person.
In a one-to-many relationship, one record in a table can be associated with one or more records in another table. For example, each customer can have many sales orders.
A many-to-many relationship occurs when multiple records in a table are associated with multiple records in another table. For example, a many-to-many relationship exists between customers and products: customers can purchase various products, and products can be purchased by many customers.
To avoid this problem, you can break the many-to-many relationship into two one-to-many relationships by using a third table, called a join table. Each record in a join table includes a match field that contains the value of the primary keys of the two tables it joins. (In the join table, these match fields are foreign keys.)
A typical example of a many-to many relationship is one between students and classes. A student can register for many classes, and a class can include many students.
Converting ER Diagrams to Tables in a RDBMS
After designing an ER Diagram,
ER diagram is converted into the tables in relational model.
This is because relational models can be easily implemented by RDBMS like MySQL , Oracle etc.
Following rules are used for converting an ER diagram into the relations/tables.
To set up a join table for a many-to-many relationship:
Data validation
Checking input data to ensure it conforms with the data requirements of the system to avoid data errors.
Data verification
Ensuring the user types in what he or she intends, verifying they don't make a mistake when inputting data (e.g., typing email twice).
Why use data validation and data verification?
Checks for correctness, meaningfulness, and security of data that are input to the system.
Access Rights: Controlled by PINs and passwords, granting read-only, read, or write access.
Audit trails: Records of who accessed data and what changes are made.
Data Locking: Locking protocols for concurrency control.
Data Validation
Data encryption: encryption algorithm transforms data within a database from a readable state into a ciphertext of unreadable characters.
Backups: Clean copy of all data.
Logical unit of work independently executed for data retrieval or updates.
Must exhibit four properties (ACID) to qualify as a transaction.
Defined as operating or occurring at the same time; running parallel
Database management systems (DBMS) concept that is used to address conflicts with simultaneous (same time) accessing or altering of data that can occur with a multi-user system.
Concurrency control, when applied to a DBMS (Database management system), is meant to coordinate simultaneous transactions while preserving data integrity.
If two or more users try to update the contents of a database simultaneously, locks and partitions are put into place to prevent it. Thus enabling greater concurrency.
Atomicity: Either all operations in the transaction happen, or none do.
Consistency: Database must be consistent before and after the transaction
Isolation: Multiple transactions occur independently without interference.
Durability: Changes of a successful transaction occurs even if the system failure occurs.
Query Functions:
Requesting data or information from a database table.
Filtering information to display specified data.
Update Functions:
Modifying existing records in a table.
Data Dictionary (Tool) Management
data storage management
data transformation and presentation
multi-user access control
data integrity management
database communication interfaces
security management
transaction management
Data Definition Language (Tool)
backup and recovery management
Schema refers to the organization of data as a blueprint of how the database is constructed.
Conceptual Schema → External View/Level
Logical Schema → data model (ER)
Physical Schema → Internal Level (Data Dictionary)
Conceptual/External Schema
Describes structure of the whole database.
Describes what data are to be stored and the relationships among the data.
Programmers and database administrators work at this level.
Logical Schema:
Defines the structure of the data itself and the relationships between the various attributes, tables, and entries.
Internal/Physical Schema:
Describes the physical storage structure of the database.
Describes how the database is stored in computer memory.
Used to describe complex low-level data structures in detail.
Table
Record
Field
Primary key
Secondary key
Foreign key
Candidate key
Composite primary key
Join
Data Integrity
Data Reliability
Space
Time
Normalization:
Process of reorganizing data in a database so that it meets two basic requirements:
1. There is no redundancy of data, all data is stored in only one place.
2. Data dependencies are logical, all related data items are stored together.
Important for many reasons, but chiefly because:
1. It allows databases to take up as little disk space as possible,
2. resulting in increased performance.
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Tables in 1NF must adhere to some rules:
Each cell (data item) must contain only a single (atomic) value, that cannot be broken down any further.
Every column (field) in the table must be uniquely named.
Each row/record is unique and has a primary key, there are no records with repeating data.
2NF Rules:
Tables in 2NF must be in 1NF
Non-key attributes must depend on every part of the primary key (full functional dependency). Partial dependencies are NOT allowed.
3NF Rules:
Tables in 3NF must be in 2NF
There are no non-key attributes that depend on another non-key attribute (no transitive functional dependencies)
*
* 1NF has no repeating rows or columns.
* 2NF is based on full functional dependency.
* 3NF involves the removal of transitive dependencies
What is referential integrity?
refers to the relationship between tables
Why is it important in a normalized database?
Because each table in a database must have a primary key, this primary key can appear in other tables because of its relationship to data within those tables.
When a primary key from one table appears in another table, it is called a foreign key.
Prevents the entry of duplicate data
Prevents one table from pointing to a nonexistent field in another table.
Guaranteed consistency between "partnered" tables
Prevents the deletion of a record that contains a value referred to by a foreign key in another table
Prevents the addition of a record to a table that contains a foreign key unless there is a primary key in the linked table
a database can lead to incomplete data being returned, usually with no indication of an error. This could result in records being “lost” in the database, because they’re never returned in queries or reports.
10 hours
Database administrators (DBAs) use specialized software to store and organize data. The role may include capacity planning, installation, configuration, database design, migration, performance monitoring, security, troubleshooting, as well as backup and data recovery.
Each end-user has a defined set of jobs/functions to perform in an organization. These jobs and functions usually access multi-tables in the database. The raw view of the data is seldom required by end-users. In fact the raw data needs to be protected at all times from accidental damage. However the end-user is most likely to be using a given set of pre-defined queries on a regular basis. These queries usually end up as reports for middle or higher management and are used in the various decision making processes of the company. Therefore the interaction of a given end-user with a database is pre-defined by their role in the company and based on their role they will have a given ‘view’ of the database. Examples of methods of interaction with the DB:
GUI
API
QBE: Query By Example
SQL
Disaster Recovery
Partial Recovery
Catastrophic Recovery
High Availability
Crash Recovery: DBMS is a highly complex system with hundreds of transactions being executed every second. The durability and robustness of a DBMS depends on its complex architecture and its underlying hardware and system software. If it fails or crashes amid transactions, it is expected that the system should follow some sort of algorithm or techniques to recover lost data.
To see where the problem has occurred, we generalize a failure into various categories, as follows: System Crash, Disk failure and Transaction failure.
external to the system − that may cause the system to stop abruptly and cause the system to crash. For example, interruptions in power supply may cause the failure of underlying hardware or software failure. Examples may include operating system errors.
In early days of technology evolution, it was a common problem where hard-disk drives or storage drives used to fail frequently. Disk failures include formation of bad sectors, unreachability to the disk, disk head crash or any other failure, which destroys all or a part of disk storage.
*
Storage Structure categories:
* Volatile storage
* Non-volatile storage
A transaction has to abort when it fails to execute or when it reaches a point from where it can’t go any further. This is called transaction failure where only a few transactions or processes are hurt.
*
Reasons for a transaction failure could be:
* Logical error
* System error
*
When a system crashes, it may have several transactions being executed and various files opened for them to modify the data items.
1 Checkpoint
2 Page Shadowing
3 Log Files
CheckPoint (cost & efficiency)
Page Shadowing (cost & efficiency)
Log Files (cost & efficiency)
Data integration involves combining data from various sources and providing users with a unified view of these data in such a way that it is productive to their role in the organization. Commercial companies and scientific organizations are two examples which require data integration.
Centralized Control 2. Data Integration 3. Improved Efficiency 4. Enhanced Data Security
Outline the use of databases in areas such as stock control, police records, health records, employees data Databases can be used in these situations because they can help maintain the privacy and security of the information stored in them. And they can help maintain the integrity of the data so it is consistent and doesn't change.
Stock control
Police records
Health records
Employee data
Suggest methods to ensure the privacy of the personal data and the responsibility of those holding personal data not to sell or divulge it in any way.
Making regular backups of files
Protecting yourself against viruses by running anti-virus software
Using a system of passwords so that access to data is restricted
Safe storage of important files stored on removable disks, eg locked away in a fireproof and waterproof safe
Allowing only authorized staff into certain computer areas
Secure Data Transmission: Use secure communication protocols, such as HTTPS or VPNs, to encrypt data during transmission. This prevents interception or eavesdropping by unauthorized entities.
Avoiding accidental deletion of files by write-protecting disks.
Using data encryption techniques to code data so that it makes no apparent sense.
Data Volume: Large database systems handle vast amounts of data, which can present challenges in terms of storage, processing, and retrieval. Managing and analyzing such large volumes of data requires efficient infrastructure and optimized database design.
Performance Considerations: As databases grow in size, the performance of queries, data retrieval, and data manipulation operations can be impacted. Ensuring efficient indexing, query optimization, and appropriate hardware resources is crucial for maintaining optimal performance.
Scalability: Large database systems need to be scalable to accommodate increasing data volumes and user demands. Scaling may involve adding more hardware resources, such as storage or processing power, or implementing distributed database architectures.
Data Complexity: Large databases often contain diverse and complex data structures, including multiple tables, relationships, and data types. Handling and managing this complexity requires advanced database management systems and skilled database administrators.
Data Security: The security of large databases becomes critical due to the significant amount of sensitive data they store. Implementing robust security measures, such as access controls, encryption, and intrusion detection systems, is essential to protect against unauthorized access and data breaches.
The Data Protection Act controls how your personal information is used by organisations, businesses, or the government.
They must make sure the information is:
used fairly and lawfully
used for limited, specifically stated purposes
used in a way that is adequate, relevant and not excessive
accurate
kept for no longer than is absolutely necessary
handled according to people’s data protection rights
kept safe and secure
not transferred outside the European Economic Area without adequate protection
Privacy Concerns
Data Breaches and Security Risks
Data Ownership and Control
Data Integrity and Trust
Informed Consent
It is important in this day and age, where sudden terrorist activities can completely curtail the lives and livelihoods of large numbers of people that organizations such as National Security Agencies, International Police forces and other governmental agencies directly involved with the security of the borders and nationals of any given country to be able to monitor ‘trends’ in activities of the nationals of that country. This implies that the databases for such organizations must be open for questioning and investigations for the purposes of national security.
the process of comparing two pieces of data by directly matching them against each other, to find any data belonging to the same entity, meaning duplication. An algorithm is used to determine whether two records are a match, a non-match or possibly a match. Data-matching can be useful for a number of purposes:
To find particular types of errors
To detect illegal behavior
To locate individuals
To find out if a given individual is eligible for benefits
To facilitate debt collection
the process of searching through large data sets to find patterns and relationships in the data. It is mainly used in marketing and sales to analyze the patterns of customers and common behavior. For example banks are the first users of data mining technology as it helps them with credit assessment.
Some of the uses of Data-mining within different contexts are:
To enable organizations to create successful and effective sales campaigns
To help companies with targeted marketing plans
To develop products to increase sales and profitability
Security Agencies to analyze calling patterns and hence detect terrorist activities