CS 122A - Database Design

0.0(0)
studied byStudied by 1 person
0.0(0)
full-widthCall Kai
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
GameKnowt Play
Card Sorting

1/77

flashcard set

Earn XP

Description and Tags

midterm one practice flashcards

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

78 Terms

1
New cards

Databases

Databases represent the “product” of a larger area called “Data management”. A database is typically a collection of large sets of data which contain relationships between the data stored.

2
New cards

DBMS

A database management system that provides standards for inputting and retrieving data, as well as the organization. It has a standard method of querying and can often manage multiple databases. 

Enables access retrieval, and use of data with relationships. Optimized for better performance. Data protection and crash recovery. (In comparison to file systems)

3
New cards

File systems

Store data as separate entities with no relationship between them. Files are stored on disk. As size increase, access time increases. Data integrity issues. (In comparison to DBMS)

4
New cards

Why DBMS?

Data independence (application is protected from complexity of the data)

Efficient data access

Integrity and security

Uniform data administration (better organization, fine-tuning…)

Concurrent access and recovery

Reduced development time

5
New cards

Data model

A collection of high-level data description constructs.

6
New cards

semantic data model

Abstract, high-level model initially describing the information to store and is the starting point for design.

7
New cards

Relational data model

Schema to describe the data in attributes, type, size. 

Most widely used data model. 

8
New cards

Relation

A table with rows of data and named columns

9
New cards

Schema

Description of the tables and their columns which also contains the name of the table, name of each column, and the type of field. 

Ex:
Student (student ID: string, First name: string, Last name: string)
Provider (
provider ID: string, Provider name: string)
hasProvider (
student ID: string, provider ID: string, Type of plan: string, Start date: date)

10
New cards

DBMS Abstraction

External (view based access) → Conceptual (describes how data is organized) → Physical (how the data is stored through files and indexes)

<p>External (view based access)  → Conceptual  (describes how data is organized) → Physical (how the data is stored through files and indexes)</p>
11
New cards

Data independence

Logical data independence → protection from the changes in the logical structure of data

Physical data independence → protection from the changes in the physical structure of data. 

Why is this important? It allows changes to data without impact on application. 

12
New cards

DML

Data manipulation language encompasses queries. It defines creation, modification, and querying of the data. 

13
New cards

Concurrency & Recovery

Isolation, independent, arbitrary. 

Write ahead log → all changes are written to a log prior to DB modification.

14
New cards

Transaction

A single execution of a user program in a DBMS. Enables concurrent accesses to data, giving the illusion of isolation and locking. Protects against system crashes using the log.

15
New cards

DBMS Components

Query Parser → parse and analyze SQL query. Performs validity check. 

Query optimizer → rewrites the query logically based on how the data is stored. 
Plan executor + Relational Operators → actual run of the query process. Query plan is represented as a tree of relational operators.
Files of records → API to access records which are sets of typed fields. Uses a heap. 

Access methods → Index structure using a B+ Tree. 

Buffer Manager - memory management in which all disk page access go through the buffer pool. Caches pages from files and indices.  

Disk space manager → allocates, deallocates, and read and write pages to disk. Manages the I/O. 

Concurrency Control → Determines the transaction execution using ACID (lock = consistency[valid data] + isolation[transactions aren’t interleaved], log = atomicity[single operation] and durability[record of changes])

Recovery Manager → maintains logs and restores the system after a crash.

System catalog → maintains the information being held. e.g: tables, data statistics, indexes, views, security profiles. 

16
New cards

Requirements Analysis

What are the entities and relationships in the enterprise?

What information about the entities and relationships needs to be stored in the DB?

What are the integrity constraints (preserve the validity and consistency of your data) or business rules? 

17
New cards

Database Design

Conceptual Database Design → high level description of the data to be stored in the DB along with the constraints on the data (ER Model)

Logical Database Design → translation of the Conceptual Database Design into the chosen DBMS (relational DB)

18
New cards

Entity

A distinguishable real-world object which are described by a set of attributes. 

E.g: Car → make, model, year, color

19
New cards

Entity set

A collection of similar entities. All entities in a set have the same set of attributes with an associated domain type (data type). Each entity set has a key, the minimal set of attributes which uniquely identify an entity. 

single attribute = “atomic” key

set of attributes = “composite” or “candidate” key. In which case, one must be designated as the “primary” key.

20
New cards

Name of entity set

Represented in a rectangle

21
New cards

Attribute of entity set

oval, each with a unique name.

22
New cards

Key attribute of entity set

Underline the attribute name

23
New cards

Composite atttributes

Attribute containing sub attributes

24
New cards

Multi-valued attributes vs. single valued

An attribute that can have more than one value for an entity. Represented by a double lined oval. Tiny circle indicates optional participation and can only be used on multi-valued attributes. 

25
New cards

Derived attributes

Calculated value based on other data. Represented by a dotted oval.

26
New cards

ISA hierarchies

Inherited attributes → “IS A”. Uses a triangle. Design with a top-down or bottom-up style.

Every C entity is also considered a D entity.

Ex: C ISA D

27
New cards

Covering constraint

Does every entity be one of the subclasses?

If yes, then every entity must be one of the subclasses.

If no, then can have entities of class and subclass.

E.g: User IS A Student, TA, Professor. If the user has to be one of those, then it is covering.

28
New cards

Overlapping constraint

Can the subclasses contain the same entities?

If yes, then the subclasses overlap each other. 

If no, then it is implicitly disallowed without without specification. 

Ex: A student can also be a TA. 

29
New cards

Relationship (a verb)

An association between two or more entities. Represented by a diamond.

<p>An association between two or more entities. Represented by a diamond. </p>
30
New cards

Relationship set

Collection of similar relationships. Can have their own associated descriptive attributes. 

31
New cards

Entity roles in a relationship set

An entity set can participate in different relationship sets - or in different “roles” in the same set. Role indicators are used to identify the specific entities.

E.g: professor_uid, TA_uid

<p>An entity set can participate in different relationship sets - or in different&nbsp;“roles” in the same set. Role indicators are used to identify the specific entities. </p><p>E.g: professor_uid, TA_uid</p>
32
New cards

Cardinality Constraints

Limit the possible combinations of entities which participate in a relationship set

E.g: 1:1, 1:N, N:N, N:M

A.k.a: one to one, many to 1, many to many

Diagram stuff: A line indicates a relationship. The value on the line indicates the cardinality. A single line indicates partial participation meaning that not all entities in the entity set are involved in the relationship. A double line indicates a participation constraint, totality, or subjectivity. All entities in the entity set must participate in at least one relationship in the relationship set. 

33
New cards

Ternary Relationships

Relationship between 3 entity sets. The relationship key <= entity keys. 

34
New cards

Weak entites

can not exist without another (owner) entity. Can only be uniquely identified along with the primary key of the owner entity. Must have a total participation with a 1:N relationship to the owner (one owner, many weak entities). Dependent identifier is unique only within the context of the owner, the full key is the set (owner key, identifier key).

35
New cards

Aggregation

Ability to create a relationship with another relationship set. Treats a relationship set as an entity set. Represented as by a dotted-rectangle. Why not ternary? → each relationship has its own attributes

36
New cards

Design choices

Should a given concept be modelled as an entity or as an attribute?
Should a given concept be modelled as an entity or as a relationship?

Characterizing relationships: binary or ternary? aggregation? 

37
New cards

Constraints on model

Many data semantics can be captured, but not all. E.g: a course can have at most 5 homework assignments. E.g: the enrollment of the course must be less than the location capacity. 

38
New cards

Entity vs Attribute?

Will the data item have multiple occurrences per entity?

If yes, use a multi-valued attribute or an entity.

Does the data item have a structure of its own & do you need access to these parts?

If yes, use a composite attribute or an entity. 

Is the data item a refer-able entity in the design, e.g: logically separate?

If yes, use entity. 

39
New cards

entity vs. relationship

Keep total # of entities and relationships to a minimum to reduce the relational model. Eliminate redundancy - avoid storing the same information in multiple places. 

40
New cards

binary vs. ternary

Depends on how specific you’d like the database to be. E.g: there are many discussions and TAs per course. Maybe if the TA leads a specific discussion, then a binary may be better.

However, binary is not always better. E.g: supplier parts. There may not be a way to track quantity in the example given. 

<p>Depends on how specific you’d like the database to be. E.g: there are many discussions and TAs per course. Maybe if the TA leads a specific discussion, then a binary may be better. <br><br>However, binary is not always better. E.g: supplier parts. There may not be a way to track quantity in the example given.&nbsp;</p>
41
New cards

Relational Database in depth

Contains a set of relations. A relation, aka a table, consists of two parts. The table, and the schema.

A row is called a tuple and should be a unique entry. columns are fields and attributes. Cardinality of a relational instance = number of tuples/rows. Degree (or arity) of a relational instance = number of columns.

42
New cards

SQL Query Language

SELECT, FROM, WHERE….

Each field/attribute specifies a domain/type. DBMS will enforce it. 

43
New cards

Creating a table

CREATE TABLE User (
uid CHAR(10),

name CHAR(20),

phone INTEGER,

gpa REAL ← double but with single point decimals for closer approximations
);

44
New cards

Dropping table

DROP TABLE User;

All tuples are deleted. schema is deleted. 

45
New cards

Altering table

ALTER TABLE User

ADD enter DATE;

User schema is altered to include a new field/column. Existing tuples will obtain column with null value assigned. 

46
New cards

Insert single tuple

optional to specify column names. Advised to specify to ensure proper placement of data.

INSERT INTO User

(uid, name, phone, gpa, enter)

VALUES 

(…,…,….,…,…);

47
New cards

Delete tuples

DELETE FROM User S

WHERE S.gpa <= ‘3.0’

48
New cards

SQL Integrity Constraints

Conditions that must be true for any instance of the database:

  • Restricts the data that can be stored in the database

  • Can be specified by the requirements documentation

  • Specified when the schema is defined

  • Checked when relations are modified

A legal instance of the relation is one that satisfices all ICs

  • A DBMS checks for these and should not allow illegal instances

49
New cards

Primary Key Constraints

Candidate key - a set of attributes for which no two distinct tuples can have the same values in all key fields and no subset of the selected fields is a unique identifier for the tuple → minimal set of attributes. 

one of the keys chosen is to be the primary key. The others are candidate keys.

A super key is a set of fields which contain a key. 

in principle, any key can be used to refer to a tuple. use of the primary key is what the DBMS expects and optimizes for. But be careful, since an IC could accidently prevent data from being stored. 

Possible to have many candidate keys along w/ a primary key. Specified using UNIQUE - means the items specified in UNIQUE are also a key. E.g: UNIQUE(cid, grade)

50
New cards

SQL - Foreign Key

Sets of fields in one relation is used to “refer” to a tuple in another relation. Must refer to the primary key of the other relation. Field name can be different than the key name. Acts as a pointer. 

E.g: FOREIGN KEY (user_id) REFERENCES User(uid)

51
New cards

SQL - Referential integrity

Each foreign key entry MUST exist in the associated relation.

  • e.g: reject an insert of a tuple if the foreign key doesn’t exist

If a User tuple is deleted, what do you do?

  • Can remove all associated foreign key reference tuples.

  • Disallow deletion of the User because it is referred to by other data.

    • Set the user_id of Enrolled tuples to a “default uid” value. 

What if a primary key of a tuple in User is updated?

  • Same options as if deleted

4 Actions:

  • NO ACTION → delete or update is rejected

  • CASCADE → delete all tuples which refer to the tuple which is being deleted

  • SET NULL or SET DEFAULT → sets the foreign key value of the referring tuple accordingly. 

  • Context: ON DELETE ___ or ON UPDATE ____

52
New cards

More SQL ICs

ICs are based on the specification of the system

  • Might be in the ER schema

Can perform a check on a database instance to see if ICs are violated

  • However, cannot infer by looking at an instance that an IC is true.

  • An IC is a statement about ALL POSSIBLE instances. (this can change if someone inserts another tuple that violates the IC essentially. The law hasn’t been broken yet. )

Key & foreign keys ICs are the most common type

  • Others are also supported.

53
New cards

Delta table approach

create a relation for each entity

  • every user created in user relation table

  • relation for each subclass with extra associated attributes

  • delete entries in subclass relation when tuple in base is removed. 

54
New cards

ISA Hierarchies for Logical DB design

Union of tables → create a relation for each entity, duplicate the base attributes in each of the subclass relations

Mashup table → everything in one table. 

55
New cards

Advanced Attributes for SQL Query Language

Multi-valued attributes

  • Create a table for this

Composite attributes

  • Create attributes for each sub-attribute only

Derived 

  • requires querying the data

56
New cards

Aggregation - Logical db design

No distinction between entity tables and relationship tables in the relationship model.

Treat a relation table as an entity to another relationship.

Requires utilization the primary key of the created relation as a component of the primary key for the relationship.

assignes (cid, title, posted_on)

monitors (cid, title, uid, until)

57
New cards

Schema refinement

Identifying and eliminating redudancy using ICs. There are three main types of constraints: functional dependencies, multivalued dependencies, join dependencies

58
New cards

Redundancy problem

Redundant storage

  • Repeated storage of the same data

Update anomalies

  • Reduces the data integrity and complicates maintenance with the need to update many places

Insertion Anomalies 

  • Complicates storage of data by requiring unnecessary data storage

Deletion Anomalies

  • Loss of data when unrelated or other information is removed

59
New cards

Functional dependencies

can help to identify problem schemas and suggest improvements

60
New cards

Decomposition reason

replacement of a relation with a collection of ‘smaller’ regions

each smaller relation includes a subset of the attributes, together they cover them all. 

needs to be used with reason and without negative impact. each functional dependency is a statement about all allowable relations. 

61
New cards

Reflexivity

If Y is a subset of X, then X → Y. Eg. Given all attributes {A,B,C,D}, Y = AB and X = ABD, then ABD AB

62
New cards

Augmentation

If X→Y, then XZ → YZ. Eg. Given all attributes {A,B,C,D},  if AB C, then ABD CD

63
New cards

Transitivity

If X→Y and Y→Z, then X→Z.

Eg. Given all attributes {A,B,C,D}, if AB C and C D then AB  D

64
New cards

Union

If X Y and X Z, then X YZ

65
New cards

Decomposition

If X YZ, then X Y and X Z

66
New cards

Self-determination

X X based on Reflexivity (Y X, then X Y)

67
New cards

Input

An instance of a relation. Schema of input relation are fixed

68
New cards

output

An instance of a relation. Schema for the result is also fixed. 

69
New cards

Sigma (selection)

selects of subset of rows from relation

70
New cards

Projection (pi)

Extracts columns from a relation

71
New cards

Cross product

Combines fields of 2 relations into a single relation (cartesian product)

72
New cards

Self-difference

returns only the tuples in the first relation, but not in the second relation

73
New cards

Union

all tuples in the first and second relation or both

74
New cards

intersection

only tuples which appear in both relations

75
New cards

join

cross-product of 2 relations on a condition

76
New cards

Division

identifying which are related to all members of an attribute set

77
New cards

Renaming (rho)

Assign new field names to resultant relations

78
New cards

Explore top flashcards

Gatsby vocab
Updated 629d ago
flashcards Flashcards (40)
PMCY 4500L - Exam 2
Updated 384d ago
flashcards Flashcards (208)
Ettmmology #17&18
Updated 1082d ago
flashcards Flashcards (40)
Analogy
Updated 936d ago
flashcards Flashcards (39)
MODULE 4
Updated 22d ago
flashcards Flashcards (38)
Physics Quiz 1
Updated 791d ago
flashcards Flashcards (59)
Gatsby vocab
Updated 629d ago
flashcards Flashcards (40)
PMCY 4500L - Exam 2
Updated 384d ago
flashcards Flashcards (208)
Ettmmology #17&18
Updated 1082d ago
flashcards Flashcards (40)
Analogy
Updated 936d ago
flashcards Flashcards (39)
MODULE 4
Updated 22d ago
flashcards Flashcards (38)
Physics Quiz 1
Updated 791d ago
flashcards Flashcards (59)