Database Systems: SQL Developer, Relational Models, and Query Logic
SQL Developer Environment and Academic Expectations
Access and Permissions in SQL Developer
A common point of confusion for students is why tables do not initially appear when browsing in Oracle SQL Developer. The reason is that tables are associated with specific user accounts. In a personal database account, a user is free to create, drop (), or modify any table they own. However, students do not have permission to destroy or modify shared tables that everyone must access for assignments.
To resolve this, shared tables are associated with a separate administrative account rather than the student's personal account. Students must browse and query these tables from that external schema. There is significant metadata that can be queried within Oracle to identify accessible tables. For example, a query for user_tables may return zero results if the user has not created personal tables, but querying all tables in the accessible schema will reveal the necessary datasets.
Performance and Result Sets
When a query is executed in SQL Developer, the interface often displays a message such as " rows fetched." This does not indicate that the table only contains rows; rather, it is a performance optimization. The database identifies a "result set" (all results matching the query), but only fetches a limited batch ( rows) at a time. As the user scrolls down, the interface continues to fetch more rows. This is critical because queries on large databases could potentially return rows, and loading them all at once would be inefficient and taxing on system resources.
Methodology for Learning SQL
SQL is a primary programming language, and like Java, it requires extensive practice. One cannot simply memorize every possible program or query; instead, students must learn to formalize problems into SQL syntax. The provided assignments include expected answers verified against the database, allowing students to test their queries and ensure they return the correct tuples. Success in the course requires a cycle of "code, code, code" and "debug, debug, debug."
Fundamental Concepts of the Relational Model
Formal Terminology
In formal database language, the standard terms used in industry have mathematical counterparts:
Relation: The formal term for a Table.
Attribute: The formal term for a Column.
Tuple: The formal term for a Row.
Relation Schema: A logical structure defined by a set of attributes ().
Relation Instance: A snapshot of the database containing the actual values/tuples at a specific moment.
Attribute Domains and Atomic Values
Every attribute must have a defined domain (similar to a variable type in Java). A primary restriction in the relational model is that values must be atomic. This means a value cannot be composed of multiple distinct parts (e.g., a single attribute should not contain a first name, last name, and Social Security Number simultaneously; these should be separate attributes).
The Concept of Null Values
A null value () signifies that the value for a specific attribute is unknown or does not exist. The semantics of null are business-specific. For example, a in a GPA column might mean the student hasn't developed a GPA yet, or a in a name field might represent a data entry error. Databases handle nulls differently than common programming languages; they are not merely "uninitialized" variables but carry specific semantic meaning tied to the business model.
Ordering of Relations
By definition, relations are unordered. The order in which tuples appear in a table display is merely a representation of how the hardware has stored the data. Users should never count on a specific order unless they explicitly enforce one using an ORDER BY clause in their query.
The Hierarchy and Function of Database Keys
Keys are used to identify tuples within a relation and to enforce integrity constraints.
Super Keys and Candidate Keys
Super Key (): A set of one or more attributes that, taken together, allow us to identify uniquely a tuple in the relation. For example, in an
instructortable, a Social Security Number () is a super key because it identify a single person. However, a combination of () is also a super key, even though the name is redundant.Candidate Key: A minimal super key. This is a super key for which no proper subset is also a super key. If we take the super key () and remove the name, leaving only (), and that ID is still sufficient to identify the tuple, then () is the candidate key.
Primary Keys and Identity
The Primary Key is the specific candidate key chosen by the database designer to identify tuples. This selection is crucial for enforcing Integrity Constraints. For instance, if is the primary key, the database will physically prevent the insertion of two instructors with the same . This ensures data consistency even if an external application attempts to input duplicate or "crappy" data.
Foreign Keys and Relationships
Foreign Keys are attributes in one table that reference the primary key of another table.
Purpose: They link entities across the database (e.g., linking an instructor to their department).
Constraint: A foreign key must point to a primary key in the referenced table. If a foreign key pointed to a non-unique attribute (like department name where multiple departments could share a name), the relationship would be ambiguous.
Integrity: They prevent orphaned data. You cannot add an instructor to a "Physics" department if that department does not exist in the
departmenttable.
SQL Data Definition (DDL) and Manipulation (DML)
Data Definition Language (DDL)
DDL allows for the creation and modification of the database schema. Key domains include:
char(n): A fixed-length string of length . If the input is shorter than , it is padded with blanks.varchar(n): A variable-length string with a maximum length of . This is generally preferred to save database space, especially in large datasets ( entries).int: Typical integer values.
Creating a Table
Syntax example: CREATE TABLE table_name (attribute1 domain1, attribute2 domain2, ..., PRIMARY KEY (attribute));
Critical Note: Always terminate SQL statements with a semicolon (). Failing to do so is a common cause of query errors ( of student questions).
Data Manipulation Language (DML)
DML allows for the modification of the data within the schema:
INSERT INTO relation VALUES (...);: Adds a new tuple. String values must be enclosed in single quotes (' ').DELETE FROM relation;: Clears all tuples from a table but keeps the table structure in the schema.DROP TABLE relation;: Completely removes the table and its definition from the database.ALTER TABLE relation ADD attribute domain;: Adds a new column. New columns will default to for existing entries.
Constructing Queries and Relational Logic
The Basic SELECT Structure
A standard query follows the SELECT-FROM-WHERE block:
SELECT: Corresponds to the Projection operation. It specifies which attributes to return. Using
SELECT DISTINCTremoves duplicate entries.FROM: Specifies the relations (tables) to be scanned. Listing multiple tables without a where clause results in a Cartesian Product, which creates every possible combination of rows between the tables (e.g., rows), which can crash a server.
WHERE: Corresponds to the Selection predicate. It filters the results based on specific conditions (e.g.,
WHERE department_name = 'Comp. Sci.').
Joining Tables
Joining is the process of connecting two tables using their shared attributes (usually a foreign key and a primary key).
Logical Approach:
SELECT * FROM instructor, department WHERE instructor.dept_name = department.dept_name;This allows the retrieval of information spread across multiple tables, such as finding the budget of the department where a specific instructor works.
Queries as Set Operations
Complex queries should be visualized as operations on sets (Intersections, Unions, and Memberships). Instead of trying to write a complex query all at once, students should:
Identify the individual sets (e.g., the set of all instructors, the set of all teaching assignments).
Determine how these sets interact (e.g., finding the intersection of instructors and those in the 'Art' department).
Combine these components logically.
Questions & Discussion
Question (Alyssa): What are the symbols and in the slides? - Response: represents the super key, which is a set of attributes. represents the relation. The expression implies that the set of attributes in the key must be a subset of the total attributes in the relation schema.
Question (Roxanna): Is a candidate key normally just one entry (attribute)? - Response: While often small, it depends on the business logic. In a
takestable, the key might require five attributes (, , , , ) to ensure a student cannot be registered for the same section twice in the same semester.Question (Amanda): If we query departments from the
instructortable vs. thedepartmenttable, is the result the same? - Response: No. Queryinginstructoronly shows departments that currently have at least one instructor. Queryingdepartmentshows all departments defined in the system, even those with no employees.Question (Anna): Does a
SELECTquery add new columns to the actual table? - Response: No. ASELECTquery is for retrieval only. It spits out a temporary table as a results representation; it never modifies the underlying data structure.Question (Aj): What is the concept used to connect tables? - Response: The concept is called Joining. It uses foreign keys to navigate relationships between different relations.