1/42
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
What are relational algebra and relational calculus
languages for processing and retrieving information from a database
What is the difference between RA and RC
Relational algebra is procedural and relational calculus is declarative
what are the six basic operators of RA
σ - select, noted σp( r ) where p is the selection predicate (using propositional calculus) and r is the schema. only displays rows that meet the selection predicate
Π - project, noted Πa1, a2, a3, … ( r ) where a1, a2, a3, … are the names of attributes (columns) within the schema r
∪ - union, noted r ∪ s where r and s are two different schemas having the same arity/degree (number of attributes) and each attribute deals with the same type of data. displays all rows from each schema, removing any duplicate rows between the two
— - set difference, noted r — s, where r and s are two different schemas having the same degree and attribute domains. displays only the rows that r does not have in common with s
x - cartesian product, noted r x s, where r and s are two different schemas having the disjoint attributes (having no attributes in common). this displays each row of r combined with each row of s. if r has x attributes and m entries, and s has y attributes and n entries, the resulting schema will have x + y attributes and m*n entries
p - rename, noted px( r ) where x is a name and r is a schema, allowing us to rename the schema for later referral
additional operators of RA
∩ - set intersection, noted r ∩ s where r and s are different compatible schemas. returns all entries/tuples that are in both schemas. equivalent to r — (r — s)
⋈ - natural join, noted r ⋈ s where r and s are different schemas, having certain attributes of the same variable type. returns a product of both r and s containing only entries where the matching attributes have the same value for both r and s, making only one column where there was two
/ - division, noted r/s where r and s are different schemas where for each attribute of s r has an attribute of the same type, as well as some extra attributes. if A are attributes unique to r and B are attributes shared by s and r, this displays all the entries of A where Br = Bs, while not displaying B
← - assignment, noted x ← E where x is a variable name and E is an RA expression. used to simplify the appearance of complex queries
extended operators of RA
generalized projection - noted ΠF1, F2, F3, …( r ) where F1, F2, F3, … are mathematical functions transforming the values of each attribute they might involve. this does not look into other rows, but only operates on the values within one row at a time
aggregation functions - noted G1, G2, G3, … G F1(A1), F2(A2), F3(A3), … ( r ) where r is a schema, Gx are the attributes the schema is being grouped by, and Fx(Ax) are functions being performed on the entries in the Ax attribute within these groups. Groups collect entries where that attribute is the same for each tuple.
examples of aggregate functions: avg, min, max, sum, count
types of joins
types of joins in RA
theta join - noted r ⋈c s, where r and s are schemas with attributes of the same variable type and c is a condition that must be met. displays a schema that is a cartesian product of r and s where all entries satisfy c. equivalent to σc(r x s)
equality join - noted r ⋈c s, a special case of theta join where the condition is an equality dictating what matches for a natural join, equivocating an attribute of r and s
semi-join - noted r ⋉c s, where r and s are the schemas and c is a condition. like a theta join that just returns the matching entries of r
outer-join - an extension of a join that prevents the loss of entries that do not match both schemas. tuples that do not match another schema will have null entries in the place of that schema’s attributes. left outer joins ⟕ preserve only the first schema’s tuples, right only the second’s, and a full outer join preserves all.
priority of RA operators
select, project, rename
cartesian product, join
intersection
union, set difference
complete set of operators RA
you can define every other operation using this set of operators. the basic operators are a complete set
relationally complete RA
according to Codd’s theorem, a language is relationally complete if it is at least as powerful as the complete set
null values in RA
aggregation typically ignores null values, however with boolean logic null values take on an unknown status, which sits between true and false value wise
modifying schemas with RA
deletion - noted with assignment as r ← r — E, where E is an RA expression that captures all the tuples to be removed
insertion - noted with assignment as r ← r ∪ E where E is n expression capturing all the tuples to be added (this will not create duplicate tuples if r already contains them)
updating - noted with assignment as r ← ΠF1, F2, F3…( r ) where Fn is either the original modification or an expression changing the value of a specific attribute in r
views with RA
views are a way of preventing a user from viewing an entire schema, typically defined with projection. the creation of a view means an expression is created and maintained, and substituted into any queries using this view. view A depends directly on view B if view B is used to define it in its expression. view A depends indirectly on view B if view B is used to define a view that A depends either directly or indirectly on. view A is recursive if it depends on itself.
tuple relational calculus
a non-procedural query language where each query is in the form {t | P(t) }, which is the set of all tuples t such that the predicate P is true. t[A] denotes the value of t on attribute A. t ∈ r denotes that t is in the schema r. P is a formula involving predicate calculus. a trc expression is considered safe if every tuple within its relation previously appeared in a relation, tuple, or constant within P.
domain relational calculus
a nonprocedural query language equivalent in power to TRC with each expression in the form { <x1, x2, … xn> | P(x1, x2, … xn)} where x1…n are domain variables (columns) and P is a formula similar to predicate calculus. a drc expression is safe if: all values in the tuples of the expression are from dom(P) (meaning the values either appear in P or in a tuple of a relation mentioned in P), and for every subformula P1(x) the formula is true if and only f there are xs in the dom(P1)
SQL data types
char(n) - character string of fixed length n
varchar(n) - character string of max length n
int - integer
smallint - small integer
numeric(p,d) - fixed point number, with p number of digits, not including 0 for numbers <1, and the d number of digits after the decimal
real, double precision - floating point and double precision floating point numbers
float(n) - floating point number with a precision of n
integrity constraints in sql
not null - preventing an attribute from being null
primary key (Ai… Ak) making attributes Ai - Ak primary keys and not null
foreign key - references r, an attribute referencing the primary key of another schema r
schema modifications in sql
creating a table - defined with CREATE TABLE r(A1D1, … AnDn, i1….in) where r is the name of the schema, Ax is the name of an attribute, Dx is the domain of that attribute, and in is an integrity constraint
insertion into a table - defined with INSERT INTO r VALUES (t1…tn) where r is the schema and tx are the values of the new tuple, and correspond to each attribute in the table
deletion of tuples - defined with DELETE FROM r, removes all tuples from the schema r unless there is a WHERE constraint defining what to delete
deletion of a table - defined with DROP TABLE r which just complete deletes the schema r
adding/removing attributes - ALTER TABLE r ADD/DROP a (d) where r is the schema being updated, a is the attribute being modified, and d is the domain of the attribute if its being added
basic query structure in sql
SELECT [attributes]
FROM [schemas/relations/subqueries]
WHERE [condition]
extra query stuff sql
to eliminate duplicate tuples, you can write DISTINCT after SELECT before listing attributes
similarly, you can write ALL to ensure duplicates are not removed
when listing attributes, you can rename an attribute in the returned vallues by writing A AS X where A is the current name and X is the new name
you can add GROUPED BY after the WHERE clause - this is necessary if you are performing and aggregate function
you can add ORDER BY after the WHERE clause
decomposition of schemas
decomposition is done to avoid repetition of information. a schema is decomposed into multiple schemas. lossy composition occurs when one of the attributes shared by the new schemas has multiple tuples with the same value, so recombining the decomposed schema loses the nuance on those tuples. lossless composition prevents this, which is true if the intersection of the new schemas is the same as one of the new schemas. decomposition can use the dependencies of canonical and attribute cover
normalization theory
transforming a relation into good form as dictated by functional dependencies via lossless decomposition
functional dependency
constraints on data in the real world, used to test legality. the value for a certain set of attributes determines uniquely the value for another set of attributes. ex: the dependency a → b holds on r if and only if whenever two tuples of R agree on attribute a, they also agree on b
legal instance of a relation
an instance of a relation that satisfies all functional dependencies. a legal instance of a database contains only legal instances
superkeys in functional dependencies
a set of attributes that UNIQUELY identifies the entire tuple. K is a superkey of R if and only if K→R. K is a CANDIDATE KEY of R if and only if K→R AND there is no subset of K that is a superkey
closure of a set of functional dependencies
the closure of a set of functional dependencies F+ is the set of all dependencies implied by F
trivial dependencies
a dependency is trivial if it is satisfied by all instances of the relation or A→B if B is a subset of A
dependency preserving
a decomposition is not dependency preserving if the split of the schema makes it computationally hard to enforce dependencies
normal forms of relations
this is what “good” form is based off, there are several kinds but the most relevant are the Boyce-Codd Normal Form and the Third Normal Form.
a schema is in BCNF if for all of its FDs of the form a→b, the FD is trivial or a is a superkey. BCNF does not guarantee dependency preservation.
a schema is in 3NF if for all a→b FDs, the FD is trivial or a is a super key or each attribute in the set difference of b—a is contained in a candidate key
every BCNF relation is also in 3NF, but 3NF is always possible without sacrificing losslessness or dependency preservation. 3NF relations may have to use null values to represent some relationships, and they have repetition of some information
transformations on functional dependencies
union rule - if a→b holds and a→c holds, then a→bc holds
decomposition rule - if a→bc holds, then a→b and a→c also hold
pseudotransivity rule - if a→b holds and cb→d holds, then ac→d holds
closure of attribute sets for functional dependencies
given a set of attributes A, A+ is the set of attributes functionally determined by F. this includes itself, any other set that is dependent on exclusively A (ex: C if A→C but not if AB→C)
canonical cover of FDs
a simplified set of FDs noted as Fc created by removing F+’s extraneous attributes
if A→B holds and B→C holds, then A→C is extraneous
if A→B holds, AC→B is extraneous because A alone already covers B
if the closure of the attribute remains the same after removing a dependency, the dependency is extraneous
minimum cover of FDs
the same as canonical cover, but for each dependency an attribute only implies one other attribute at once (ex A→B and A→C are within the minimum cover but A→BC is not)
multivalued dependency of FDs
represented as A→→B, shows the dependency A→B where B can have a set of values for each A
Fourth Normal Form
a relation is in 4NF with respect to set D if for all MVDs in D+ of the form a→→b, the dependency is either trivial or a is a super key. being in 4NF means a relation is also in BCNF and 3NF
two major branches of structured query language
data definition language - language for defining the structure of a database
data modification language - language for retrieving and modifying data
on delete/update foreign key constrains
cascade - deletes related records with parent record
restrict - prevents deletion of parent record if related records exist
set null - sets the value in related records to null
set default - sets the value in related records to a specified default
set difference in sql
“WHERE NOT EXISTS [schema]” (when defining the schema in here, you can use the where clause to specify what to join on if the items dont have identical domains)
is % or _ for individual unknown string characters in sql
_
first normal form
how to violate - mixed datatypes, no primary key, using row order to convey information, repeating groups across rows
second normal form
1nf + each nonkey attribute must depend on the entire key
third normal form
2nf + no transitive dependencies. “every non-key attribute should depend on the key, the whole key, and nothing but the key”
fourth normal form
3nf + multivalued dependencies can only rely on they key
fifth normal form
4nf + the table cannot logically be thought of as being the result of joining other tables