IM
INFORMATION MANAGEMENT
PRELIM
DATA - Raw facts, or facts that have not yet been processed to reveal their meaning to the end user.
INFORMATION – The result of processing raw data to reveal its meaning. Information consists of transformed data and facilitates decision making.
DATABASE - A shared, integrated computer structure that houses a collection of related end-user data and metadata.
END-USER DATA - Raw facts of interest to the end user.
METADATA - Data about data, through which the end-user data is integrated and managed.
DATA BASE MANAGEMENT SYSTEM (DBMS) - is a collection of programs that manages the database structure and controls access to the data stored in the database.
Advantages of DBMS:
• Improved data sharing - The DBMS serves as the intermediary between the user and the database. The database structure itself is stored as a collection of files, and the only way to access the data in those files is through the DBMS.
• Improved data security - The more users access the data, the greater the risks of data security breaches. A DBMS provides a framework for better enforcement of data privacy and security policies.
• Better data integration - Wider access to well-managed data promotes an integrated view of the organization’s operations and a clearer view of the big picture.
• Minimized data inconsistency - Data inconsistency exists when different versions of the same data appear in different places.
• Improved data access - The DBMS makes it possible to produce quick answers to ad hoc queries.
• Improved decision making - Better-managed data and improved data access make it possible to generate better information, on which better decisions are based.
• Increased end-user productivity - The availability of data, combined with the tools that transform data into usable information, empowers end users to make quick, informed decisions.
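As a small sketch of improved data access, the snippet below answers an ad hoc question directly through the DBMS, with no special-purpose report program. It uses Python's built-in sqlite3 module; the table and data are invented for the example.

```python
import sqlite3

# An in-memory database stands in for a shared, DBMS-managed store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cus_id INTEGER PRIMARY KEY, cus_name TEXT, cus_balance REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Ramas", 120.00), (2, "Dunne", 0.00), (3, "Smith", 345.86)])

# Ad hoc query: the DBMS produces a quick answer on demand.
rows = conn.execute(
    "SELECT cus_name FROM customer WHERE cus_balance > 100 ORDER BY cus_name").fetchall()
print(rows)
```

Compare this with a file system, where the same question would require writing and testing a dedicated retrieval program.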
TYPES OF DATABASES
• Single-user database – A type of database that supports only one user at a time.
• Desktop database – A single user database that runs on a personal computer.
• Multiuser database – A type of database that supports multiple users at the same time.
• Workgroup database – A type of database that supports a relatively small number of users or a specific department within an organization.
• Enterprise database – A type of database that is used by the entire organization and supports many users across many departments.
• Centralized database – A type of database that supports data located at a single site.
• Distributed database – A type of database that supports data distributed across several different sites.
• Cloud database – A database that is created and maintained using cloud services, such as Microsoft Azure or Amazon AWS.
• General-purpose database – A database that contains a wide variety of data used in multiple disciplines.
• Discipline-specific database – A type of database that contains data focused on specific subject areas.
• Operational database – A type of database designed primarily to support a company's day-to-day operations.
• Analytical database – A type of database focused primarily on storing historical data and business metrics used for tactical or strategic decision making.
Importance of Database Design
Database Design - refers to the activities that focus on the design of the database structure that will be used to store and manage end-user data
Oftentimes the database design does not get the attention it deserves. This can occur for numerous reasons such as:
• Insufficient specifications and/or poor logical data modeling
• Not enough time in the development schedule
• Too many changes occurring throughout the development cycle
• Database design assigned to, or performed by novices
The first step in constructing a physical database should be transforming the logical design using best practices. The transformation consists of the following:
• Transforming entities into tables
• Transforming attributes into columns
• Transforming domains into data types and constraints
• Transforming relationships into primary and foreign keys
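The four transformation steps above can be sketched in SQL DDL (run here through Python's sqlite3 module; the DEPARTMENT/EMPLOYEE entities and their columns are made-up examples, not part of the notes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Entity DEPARTMENT -> table; attributes -> columns;
# domain rules -> data types and constraints.
conn.execute("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,   -- entity identifier -> primary key
        dept_name TEXT NOT NULL UNIQUE   -- domain -> type + constraints
    )""")

# 1:M relationship DEPARTMENT-EMPLOYEE -> foreign key on the "many" side.
conn.execute("""
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT NOT NULL,
        dept_id  INTEGER NOT NULL REFERENCES department(dept_id)
    )""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```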
Problems with File System Data Processing
• Lengthy development times – The first and most glaring problem with the file system approach is that even the simplest data-retrieval task requires extensive programming.
• Difficulty of getting quick answers – The need to write programs to produce even the simplest reports makes ad hoc queries impossible.
• Complex system administration – System administration becomes more difficult as the number of files in the system expands.
• Lack of security and limited data sharing – Another fault of a file system data repository is a lack of security and limited data sharing.
• Extensive programming – Making changes to an existing file structure can be difficult in a file system environment.
Structural dependence – A data characteristic in which a change in the database schema affects data access, thus requiring changes in all access programs.
Structural independence – A data characteristic in which changes in the database schema do not affect data access.
Data dependence – A data condition in which data representation and manipulation are dependent on the physical data storage characteristics.
Data independence – A condition in which data access is unaffected by changes in the physical data storage characteristics.
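A small sketch of structural independence, assuming a relational DBMS (sqlite3 here; the PRODUCT table is invented for illustration): a program that selects columns by name keeps working even after the schema changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prod_id INTEGER PRIMARY KEY, prod_desc TEXT)")
conn.execute("INSERT INTO product VALUES (1, 'Hammer')")

# Original access path: select the column by name.
before = conn.execute("SELECT prod_desc FROM product").fetchall()

# Change the schema: add a new column to the table.
conn.execute("ALTER TABLE product ADD COLUMN prod_price REAL")

# The same query still works unchanged -> structural independence.
after = conn.execute("SELECT prod_desc FROM product").fetchall()
print(before == after)
```

In a file system, the equivalent structure change would force edits to every program that reads the file.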
Data redundancy – It exists when the same data is stored unnecessarily at different places. Uncontrolled data redundancy sets the stage for the following:
• Poor data security – Having multiple copies of data increases the chances for a copy of the data to be susceptible to unauthorized access.
• Data inconsistency – Data inconsistency exists when different and conflicting versions for the same data appear in different places.
• Data-entry errors – Data-entry errors are more likely to occur when complex entries are made in several different files or recur frequently in one or more files.
• Data integrity problems – It is possible to enter a nonexistent sales agent's name and phone number into the Customer file, but customers are not likely to be impressed if the insurance agency supplies the name and phone number of an agent who does not exist.
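The nonexistent-agent scenario above is exactly what a foreign key constraint prevents. A minimal sketch with sqlite3 (the AGENT/CUSTOMER tables and IDs are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.execute("CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, agent_name TEXT)")
conn.execute("""
    CREATE TABLE customer (
        cus_id   INTEGER PRIMARY KEY,
        cus_name TEXT,
        agent_id INTEGER REFERENCES agent(agent_id)
    )""")
conn.execute("INSERT INTO agent VALUES (501, 'Alex Alby')")

# A customer assigned to a real agent is accepted.
conn.execute("INSERT INTO customer VALUES (1, 'Ramas', 501)")

# A customer assigned to nonexistent agent 999 is rejected by the DBMS.
rejected = False
try:
    conn.execute("INSERT INTO customer VALUES (2, 'Dunne', 999)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```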
Data Anomalies
• A data anomaly is an abnormality in which inconsistent changes have been made to a database.
• A data anomaly develops when not all of the required changes in the redundant data are made successfully.
Database design - focuses on how the database structure will be used to store and manage end-user data.
Data Modeling - the first step in designing a database, refers to the process of creating a specific data model for a determined problem domain.
Basic building blocks for data models are the following:
• Entity – It is a person, place, thing, or event about which data will be collected and stored.
• Attribute – It is a characteristic of an entity.
• Relationship – It describes an association among entities.
o Three (3) types of relationships:
• One-to-one (1:1) relationship
• One-to-many (1:M) relationship
• Many-to-many (M:M) relationship
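The three relationship types can be sketched in table form. In relational practice an M:M relationship is implemented through a bridge (composite) table, as below with sqlite3; the STUDENT/CLASS example is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")
conn.execute("CREATE TABLE class (class_id INTEGER PRIMARY KEY, class_name TEXT)")

# M:M between STUDENT and CLASS, implemented as a bridge table:
# each row links one student (1:M side) to one class (1:M side).
conn.execute("""
    CREATE TABLE enroll (
        stu_id   INTEGER REFERENCES student(stu_id),
        class_id INTEGER REFERENCES class(class_id),
        PRIMARY KEY (stu_id, class_id)
    )""")

conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Bowser"), (2, "Smithson")])
conn.executemany("INSERT INTO class VALUES (?, ?)", [(10, "Accounting 1"), (20, "Intro to DB")])
conn.executemany("INSERT INTO enroll VALUES (?, ?)", [(1, 10), (1, 20), (2, 20)])

# One class has many students, and one student takes many classes.
n = conn.execute("SELECT COUNT(*) FROM enroll WHERE class_id = 20").fetchone()[0]
print(n)
```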
Hierarchical Model
- It was developed in the 1960s to manage large amounts of data for complex manufacturing projects.
- The model's basic logical structure is represented by an upside-down tree. It contains levels, or segments.
- Segment is the equivalent of a file system's record type.
Network Model
- It was created to represent complex data relationships more effectively than the hierarchical model, to improve database performance, and to impose a database standard.
o Schema – It is the conceptual organization of the entire database as viewed by the database administrator.
o Subschema – It defines the portion of the database "seen" by the application programs that produce the desired information from the data in the database.
o Data Manipulation Language (DML) – It defines the environment in which data can be managed.
o Data Definition Language (DDL) – It allows the database administrator to define the schema components.
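The DDL/DML split carries over directly into SQL. A minimal sketch with sqlite3 (the COURSE table is an invented example): CREATE TABLE is DDL, while INSERT/UPDATE/SELECT are DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: the database administrator defines a schema component.
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

# DML: users and programs manage the data within that schema.
conn.execute("INSERT INTO course VALUES (1, 'Information Management')")
conn.execute("UPDATE course SET title = 'Information Management 1' WHERE course_id = 1")
title = conn.execute("SELECT title FROM course WHERE course_id = 1").fetchone()[0]
print(title)
```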
Relational Model
- It was introduced in 1970 by E. F. Codd of IBM.
- The relational model represented a breakthrough for both users and designers.
- Its foundation is a mathematical concept known as a relation.
Entity Relationship Model
- It was introduced in 1976 by Peter Chen.
- The graphical representation of entities and their relationships in a database structure quickly became popular, because it complemented the relational data model concepts.
Object-Oriented Model
- Increasingly complex real-world problems demonstrated a need for a data model that more closely represented the real world. In the Object-Oriented Data Model (OODM), both data and its relationships are contained in a single structure known as an object. In turn, the OODM is the basis for the Object-Oriented Database Management System (OODBMS).
- The OODM is said to be a semantic data model because it indicates meaning.
- The Object-Oriented Data Model is based on the following components:
o An object is an abstraction of a real-world entity
o Attributes describe the properties of an object.
o Objects - that share similar characteristics are grouped in classes. A class is a collection of similar objects with shared structure (attributes) and behavior (methods).
o Classes - are organized in a class hierarchy. The class hierarchy resembles an upside-down tree in which each class has only one parent.
o Inheritance - is the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it.
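The OODM components above map directly onto object-oriented code. A minimal sketch in Python (the Vehicle/Car classes are invented for the example):

```python
# Class: a collection of similar objects with shared
# structure (attributes) and behavior (methods).
class Vehicle:
    def __init__(self, make, price):
        self.make = make      # attribute
        self.price = price    # attribute

    def describe(self):       # method
        return f"{self.make}: {self.price}"

# Inheritance: Car sits below Vehicle in the class hierarchy
# and inherits its attributes and methods.
class Car(Vehicle):
    def __init__(self, make, price, doors):
        super().__init__(make, price)
        self.doors = doors

# Object: an abstraction of a real-world entity.
obj = Car("Sedan", 15000, 4)
print(obj.describe())
```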
Unified Modeling Language (UML) - A language based on Object-Oriented concepts that describes a set of diagrams and symbols you can use to graphically model a system.
Extensible Markup Language (XML) – A metalanguage used to represent and manipulate data elements. Unlike other markup languages, XML permits the manipulation of a document's data elements.
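To illustrate manipulating a document's data elements, here is a small sketch using Python's standard xml.etree.ElementTree module; the product/price structure is invented for the example:

```python
import xml.etree.ElementTree as ET

doc = """<products>
    <product id="P1"><price>25.00</price></product>
</products>"""

root = ET.fromstring(doc)

# Unlike display-only markup, XML elements can be read and changed as data.
price = root.find("./product[@id='P1']/price")
price.text = "27.50"
print(price.text)
```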
NoSQL
- It refers to a movement to find new and better ways to manage large amounts of web and sensor-generated data and derive business insight from it, while simultaneously providing high performance and scalability at a reasonable cost.
- The term seems to have been first used in a computing context by John Mashey, a Silicon Graphics scientist, in the 1990s. However, it seems to be Douglas Laney, a data analyst from the Gartner Group, who first described the basic characteristics of Big Data databases:
o Volume – It refers to the amounts of data being stored.
o Velocity – It refers not only to the speed with which data grows but also to the need to process this data quickly in order to generate information and insight.
o Variety – It refers to the fact that the data being collected comes in multiple different data formats.
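A toy sketch of how a schema-less (NoSQL-style) store accommodates variety, using only a Python dictionary of JSON documents (this simplification is my own illustration, not a real NoSQL system): each record can carry a different shape.

```python
import json

# Toy document store: keys map to JSON documents with no fixed schema.
store = {}

# "Variety": each document has its own structure.
store["u1"] = json.dumps({"name": "Ana", "clicks": 42})
store["u2"] = json.dumps({"name": "Ben", "location": {"lat": 14.6, "lon": 121.0}})

doc = json.loads(store["u2"])
print(doc["location"]["lat"])
```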
Degrees of Data Abstraction
In the early 1970s, the American National Standards Institute (ANSI) Standards Planning and Requirements Committee (SPARC) defined a framework for data modeling based on degrees of data abstraction.
The resulting ANSI/SPARC architecture defines three (3) levels of data abstraction: external, conceptual, and internal.
External Model
- It is the end user's view of the data environment.
- It refers to people who use the application programs to manipulate the data and generate information.
- ER diagrams will be used to represent the external views. A specific representation of an external view is known as an external schema.
Conceptual Model
- It represents a global view of the entire database as viewed by the entire organization.
- Also known as a conceptual schema, it is the basis for the identification and high-level description of the main data objects.
Internal Model
- It is the representation of the database as "seen" by the DBMS.
- It requires the designer to match the conceptual model's characteristics and constraints to those of the selected implementation model.
- Internal schema - depicts a specific representation of an internal model, using the database constructs supported by the chosen database.
Physical Model - operates at the lowest level of abstraction, describing the way data is saved on storage media such as magnetic, solid state, or optical media.
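One way the external level shows up in practice is the SQL view: the base table belongs to the conceptual/internal levels, while a view exposes only the slice a particular end user needs. A minimal sketch with sqlite3 (the EMPLOYEE table and the phonebook view are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual/internal level: the full base table as the DBMS stores it.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Cruz', 50000.0)")

# External level: one end user's view hides the salary column.
conn.execute("CREATE VIEW emp_phonebook AS SELECT emp_id, emp_name FROM employee")

cols = [d[0] for d in conn.execute("SELECT * FROM emp_phonebook").description]
print(cols)
```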