Selection
Select rows that satisfy a condition
Projection
Select specific columns from a relation.
Union
Combine results from two relations, removing duplicates
Difference
Return rows in one relation but not in another
Cartesian Product (×)
Combine every row of one relation with
every row of another
Rename
rename the relation or attributes
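The six operators above can be sketched in plain Python over lists of dictionaries, with each dictionary standing in for a row; the EMPLOYEE relation and its attributes below are invented purely for illustration.

def select(relation, predicate):
    # Selection: keep the rows that satisfy the condition.
    return [row for row in relation if predicate(row)]

def project(relation, columns):
    # Projection: keep only the named columns, removing duplicate rows.
    out = []
    for row in relation:
        r = {c: row[c] for c in columns}
        if r not in out:
            out.append(r)
    return out

def union(r, s):
    # Union: every row from both relations, duplicates removed.
    return r + [row for row in s if row not in r]

def difference(r, s):
    # Difference: rows in r that are not in s.
    return [row for row in r if row not in s]

def product(r, s):
    # Cartesian product: every row of r combined with every row of s.
    return [{**a, **b} for a in r for b in s]

def rename(relation, mapping):
    # Rename: change attribute names according to {old_name: new_name}.
    return [{mapping.get(k, k): v for k, v in row.items()} for row in relation]

EMPLOYEE = [{"EMP_NUM": 101, "DEPT": "Sales"}, {"EMP_NUM": 102, "DEPT": "IT"}]
print(select(EMPLOYEE, lambda row: row["DEPT"] == "Sales"))   # selection
print(project(EMPLOYEE, ["DEPT"]))                            # projection
print(rename(EMPLOYEE, {"EMP_NUM": "EMPLOYEE_NUM"}))          # rename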
Normalization
evaluating and correcting table structures to minimize data
redundancies
• Reduces data anomalies
• Assigns attributes to tables based on determination
Denormalization
produces a lower normal form
• Results in increased performance and greater data redundancy
first normal form (1NF)
table format, no repeating groups, and PK identified
second normal form (2NF)
1NF and no partial dependencies
third normal form (3NF)
2NF and no transitive dependencies
Partial dependency
functional dependence in which the determinant is only
part of the primary key
• Assumption: one candidate key
• Straightforward
• Easy to identify
transitive dependency
attribute is dependent on another attribute that is not
part of the primary key
• More difficult to identify among a set of data
• Occurs only when a functional dependence exists among nonprime attributes
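A small hypothetical example (not part of the original cards) ties these definitions together: in STUDENT_CLASS (STU_ID, CLASS_CODE, STU_NAME, GRADE) with primary key (STU_ID, CLASS_CODE), the dependency STU_ID → STU_NAME is partial because its determinant is only part of the primary key, so the table is not in 2NF; moving STU_NAME into a separate STUDENT (STU_ID, STU_NAME) table removes it. If STUDENT also held STU_ZIP and STU_CITY with STU_ZIP → STU_CITY, that would be a transitive dependency between nonprime attributes, violating 3NF and calling for a further (STU_ZIP, STU_CITY) table.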
functional dependence
The attribute B is fully functionally dependent on the attribute A
if each value of A determines one and only one value of B.
Example: PROJ_NUM → PROJ_NAME (read as PROJ_NUM functionally determines PROJ_NAME)
In this case, the attribute PROJ_NUM is known as the
determinant attribute, and the attribute PROJ_NAME is known
as the dependent attribute
functional dependence (generalized definition)
Attribute A determines attribute B (that is, B is functionally
dependent on A) if all of the rows in the table that agree in value
for attribute A also agree in value for attribute B.
functional dependence (composite key)
If attribute B is functionally dependent on a composite key A but
not on any subset of that composite key, the attribute B is fully
functionally dependent on A.
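As a minimal sketch of the generalized definition, the function below (with hypothetical sample rows) reports whether A → B holds in a set of rows: it fails as soon as two rows agree on A but disagree on B.

def fd_holds(rows, a, b):
    # A -> B holds if rows that agree on attribute a also agree on attribute b.
    seen = {}
    for row in rows:
        if row[a] in seen and seen[row[a]] != row[b]:
            return False
        seen[row[a]] = row[b]
    return True

PROJECT = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen",  "EMP_NUM": 103},
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen",  "EMP_NUM": 101},
    {"PROJ_NUM": 18, "PROJ_NAME": "Amber Wave", "EMP_NUM": 101},
]
print(fd_holds(PROJECT, "PROJ_NUM", "PROJ_NAME"))  # True: PROJ_NUM -> PROJ_NAME
print(fd_holds(PROJECT, "EMP_NUM", "PROJ_NUM"))    # False: EMP_NUM appears with two PROJ_NUM values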
relational model
• store related data in tables
• require a schema which defines tables prior
to use
• encourage normalization to reduce data
redundancy
• support table JOINs to retrieve related data
from multiple tables in a single command
• implement data integrity rules
• provide transactions to guarantee two or
more updates succeed or fail as an atomic
unit
• can be scaled (with some effort)
• use a powerful declarative language for
querying
• offer plenty of support, expertise and tools.
document model
• store related data in JSON-like, name-value
documents
• can store data without specifying a schema
• must usually be denormalized so
information about an item is contained in a
single document
• should not require JOINs (presuming
denormalized documents are used)
• permit any data to be saved anywhere at any time without verification
• guarantee updates to a single document —
but not multiple documents
• provide excellent performance and
scalability
• use JSON data objects for querying
• are a newer, exciting technology.
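A small hypothetical sketch (Python/JSON, names invented for illustration) of the same order stored both ways makes the contrast concrete:

import json

# Relational model: normalized rows in separate tables, related by keys;
# reading the full order back requires JOINing the three tables.
customers   = [{"cust_id": 1, "name": "Ada"}]
orders      = [{"order_id": 10, "cust_id": 1}]
order_lines = [{"order_id": 10, "product": "Widget", "qty": 2}]

# Document model: one denormalized, JSON-like document holds everything
# about the order, so it can be read back without a JOIN.
order_doc = {
    "order_id": 10,
    "customer": {"name": "Ada"},
    "lines": [{"product": "Widget", "qty": 2}],
}
print(json.dumps(order_doc, indent=2))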
Conceptual Design Steps
1 Data analysis and requirements
2 Entity relationship modeling and normalization
3 Data model verification
4 Distributed database design
Conceptual data model:
describes main data entities, attributes, relationships, and constraints
- Designed to be software- and hardware-independent
Minimum data rule
All that is needed is there, and all that is there is needed
Conceptual Data Model
highly abstract
easily understood
easily enhanced
only “entities” visible
abstract relationships
no software tool is required to define a conceptual data model
Data analysis and requirements
• Designer's efforts are focused on:
- Information needs, users, sources, and constitution
• Answers obtained from a variety of sources
- Developing and gathering end-user data views
- Directly observing current system: existing and desired output
- Interfacing with the systems design group
Entity relationship modeling and normalization
All objects (entities, attributes, relations, views, and so on) are defined in a data
dictionary, which is used in tandem with the normalization process
Developing the Conceptual Model Using ER Diagrams
1 Identify, analyze, and refine the business rules
2 Identify the main entities, using the results of Step 1
3 Define the relationships among the entities, using the results of Steps 1 and 2
4 Define the attributes, primary keys, and foreign keys for each of the entities
5 Normalize the entities (remember that entities are implemented as tables in an RDBMS)
6 Complete the initial ER diagram
7 Validate the ER model against the end users’ information and processing requirements
8 Modify the ER model, using the results of Step 7
Data model verification
• Verified against proposed system processes
• Run through a series of tests
Module
information system component that handles specific business function
cohesivity
strength of the relationships among the module’s entities
module coupling
strength of the relationships between modules
low coupling decreases unnecessary intermodule dependencies
The ER Model verification Process
1 Identify the ER model’s central entity
2 Identify each module and its components
3 Identify each module’s transaction requirements:
• Internal: updates/inserts/deletes/queries/reports
• External: module interfaces
4 Verify all processes against the module’s processing and reporting
requirements
5 Make all necessary changes suggested in Step 4
6 Repeat Steps 2–5 for all modules
Distributed Database design
• Portions of database may reside in different physical locations
• Database fragment: subset of a database stored at a given location
• Ensures database integrity, security, and performance
Validates logical model
• Using normalization
• Integrity constraints
• Against user requirements
Logical Design Steps
1 Map the conceptual model to logical model components
2 Validate the logical model using normalization
3 Validate the logical model integrity constraints
4 Validate the logical model against user requirements
Logical Data Model
presence of attributes for each entity
key attributes
non key attributes
primary key - foreign key relationships
user friendly attribute names
more detailed than conceptual model
database agnostic
A bit more effort is required to enhance, in comparison to the conceptual model
Data modeling tools like ERWin or PowerDesigner can be used to create Logical Data Models, which these tools can then convert automatically into a Physical Data Model.
Physical Data Model
entities referred to as Tables
attributes referred to as Columns
Database compatible table names
Database compatible columns names
Database specific data types
Difficult for users to understand
Significantly more effort required to enhance in comparison to Logical Model
Will include indexes, constraints, triggers & other DB objects
Difficult to port to a different database, once design is finalized
Tools like ERWin and PowerDesigner can help in automatically porting over the Logical Data Model to Physical Data Models of different versions.
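A hypothetical side-by-side sketch (entity, column names, and data types invented for illustration) of the same entity at the two levels:

# Logical model: user-friendly attribute names, keys identified, no
# database-specific data types.
logical_entity = {
    "entity": "Customer",
    "attributes": ["Customer Number (PK)", "Customer Name", "Credit Limit"],
}

# Physical model: database-compatible names, database-specific data types,
# plus physical-only objects such as indexes.
physical_table = {
    "table": "CUSTOMER",
    "columns": {
        "CUS_NUM": "INTEGER PRIMARY KEY",
        "CUS_NAME": "VARCHAR(35)",
        "CUS_CREDIT": "DECIMAL(9,2)",
    },
    "indexes": ["CUSTOMER_CUS_NAME_IX"],
}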
Mapping the Conceptual Model to the Relational Model
1 Map strong entities
2 Map supertype/subtype relationships
3 Map weak entities
4 Map binary relationships
5 Map higher-degree relationships
Physical design
Process of determining the data storage organization and data access characteristics of the
database; ensures integrity, security, and performance
• Define data storage organization
• Define integrity and security measures
• Determine performance measures
Top-down design
starts by identifying the data sets and then defines the data elements for each of those sets
• Involves the identification of different entity types and the definition of each entity’s attributes
Bottom-up design
first identifies the data elements (items) and then groups them together in data sets
• First defines attributes, and then groups them to form entities
Centralized design
process by which all database design decisions are carried out centrally by a small group of people
• Suitable in a top-down design approach when the problem domain is relatively small, as in a single unit or department in an organization
Decentralized design
process in which conceptual design models subsets of an
organization’s database requirements, which are then aggregated into a complete design
• Such modular designs are typical of complex systems with a relatively large number of objects and procedures
Distributed database management system (DDBMS)
– Governs storage and processing of logically related data
– Interconnected computer systems
– Both data and processing functions are distributed among several sites
• Centralized database required that corporate data be stored in a single central site
DDBMS Advantages and Disadvantages
Advantages:
– Data are located near “greatest demand” site
– Faster data access
– Faster data processing
– Growth facilitation
– Improved communications
– Reduced operating costs
– User-friendly interface
– Less danger of a single-point failure
– Processor independence
Disadvantages:
– Complexity of management and control
– Security
– Lack of standards
– Increased storage requirements
– Increased training cost
Distributed processing
– Database’s logical processing is shared among two or more physically independent sites
– Connected through a network
Distributed Database
– Stores logically related database over two or more physically independent sites
– Database composed of database fragments
Characteristics of Distributed Management Systems
• Application interface
• Validation
• Transformation
• Query optimization
• Mapping
• I/O interface
• Formatting
• Security
• Backup and recovery
• DB administration
• Concurrency control
• Transaction management
• Must perform all the functions of centralized DBMS
• Must handle all necessary functions imposed by distribution of data and processing
– Must perform these additional functions transparently to the end user
DDBMS Components
• Must include (at least) the following
components:
– Computer workstations
– Network hardware and software
– Communications media
– Transaction processor (application
processor, transaction manager)
• Software component found in each computer that
requests data
– Data processor or data manager
• Software component residing on each computer that stores and retrieves data located at the site
• May be a centralized DBMS
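A toy sketch of the two components (site names and data are hypothetical): the transaction processor receives the request on the user's computer and forwards it to the data processor at the site that stores the data.

class DataProcessor:
    # Runs at each site that stores data; retrieves rows held locally.
    def __init__(self, site, rows):
        self.site, self.rows = site, rows

    def fetch(self, predicate):
        return [r for r in self.rows if predicate(r)]

class TransactionProcessor:
    # Runs on each computer that requests data; routes requests to sites.
    def __init__(self, data_processors):
        self.data_processors = data_processors  # site name -> DataProcessor

    def query(self, site, predicate):
        return self.data_processors[site].fetch(predicate)

tp = TransactionProcessor({
    "NEW_YORK": DataProcessor("NEW_YORK", [{"cust": "Ada", "balance": 100}]),
    "ATLANTA":  DataProcessor("ATLANTA",  [{"cust": "Bob", "balance": 250}]),
})
print(tp.query("ATLANTA", lambda r: r["balance"] > 200))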
Levels of Data and Process Distribution
Current systems classified by how process distribution and data distribution supported
Single-Site Processing, Single-Site Data (SPSD)
• All processing is done on single CPU or host computer (mainframe, midrange, or PC)
• All data are stored on host computer’s local disk
• Processing cannot be done on end user’s side of system
• Typical of most mainframe and midrange computer DBMSs
• DBMS is located on host computer, which is accessed by dumb terminals connected to it
Multiple-Site Processing, Single-Site Data (MPSD)
• Multiple processes run on different computers sharing single data repository
• MPSD scenario requires network file server running conventional applications
– Accessed through LAN
• Many multiuser accounting applications, running under personal computer network
Multiple-Site Processing, Multiple-Site Data (MPMD)
• Fully distributed database management system
• Support for multiple data processors and transaction processors at multiple sites
• Classified as either homogeneous or heterogeneous
• Homogeneous DDBMSs
– Integrate only one type of centralized DBMS over a network
• Heterogeneous DDBMSs
– Integrate different types of centralized DBMSs over a network
• Fully heterogeneous DDBMSs
– Support different DBMSs
– Support different data models (relational, hierarchical, or network)
– Different computer systems, such as mainframes and microcomputers
Distributed Database Transparency Features
• Allow end user to feel like database’s only user
• Features include:
– Distribution transparency
– Transaction transparency
– Failure transparency
– Performance transparency
– Heterogeneity transparency
Distribution transparency
• Allows management of physically dispersed database as if centralized
• Three levels of distribution transparency:
– Fragmentation transparency
– Location transparency
– Local mapping transparency
Transaction transparency
• Ensures database transactions will maintain distributed database's integrity and consistency
• Ensures transaction completed only when all database sites involved complete their part
• Distributed database systems require complex mechanisms to manage transactions
– To ensure consistency and integrity