Database Systems: SQL Developer, Relational Models, and Query Logic

SQL Developer Environment and Academic Expectations

Access and Permissions in SQL Developer

A common point of confusion for students is why tables do not initially appear when browsing in Oracle SQL Developer. The reason is that tables are associated with specific user accounts. In a personal database account, a user is free to create, drop ( $\text{destroy}$ ), or modify any table they own. However, students do not have permission to destroy or modify shared tables that everyone must access for assignments.

To resolve this, shared tables are associated with a separate administrative account rather than the student's personal account. Students must browse and query these tables from that external schema. There is significant metadata that can be queried within Oracle to identify accessible tables. For example, a query for user_tables may return zero results if the user has not created personal tables, but querying all tables in the accessible schema will reveal the necessary datasets.

Performance and Result Sets

When a query is executed in SQL Developer, the interface often displays a message such as " $50$ rows fetched." This does not indicate that the table only contains $50$ rows; rather, it is a performance optimization. The database identifies a "result set" (all results matching the query), but only fetches a limited batch ( $50$ rows) at a time. As the user scrolls down, the interface continues to fetch more rows. This is critical because queries on large databases could potentially return $20,000,000$ rows, and loading them all at once would be inefficient and taxing on system resources.

Methodology for Learning SQL

SQL is a primary programming language, and like Java, it requires extensive practice. One cannot simply memorize every possible program or query; instead, students must learn to formalize problems into SQL syntax. The provided assignments include expected answers verified against the database, allowing students to test their queries and ensure they return the correct tuples. Success in the course requires a cycle of "code, code, code" and "debug, debug, debug."

Fundamental Concepts of the Relational Model

Formal Terminology

In formal database language, the standard terms used in industry have mathematical counterparts:

Relation: The formal term for a Table.
Attribute: The formal term for a Column.
Tuple: The formal term for a Row.
Relation Schema: A logical structure defined by a set of attributes ( $A_1, A_2, \dots, A_n$ ).
Relation Instance: A snapshot of the database containing the actual values/tuples at a specific moment.

Attribute Domains and Atomic Values

Every attribute must have a defined domain (similar to a variable type in Java). A primary restriction in the relational model is that values must be atomic. This means a value cannot be composed of multiple distinct parts (e.g., a single attribute should not contain a first name, last name, and Social Security Number simultaneously; these should be separate attributes).

The Concept of Null Values

A null value ( $\text{null}$ ) signifies that the value for a specific attribute is unknown or does not exist. The semantics of null are business-specific. For example, a $\text{null}$ in a GPA column might mean the student hasn't developed a GPA yet, or a $\text{null}$ in a name field might represent a data entry error. Databases handle nulls differently than common programming languages; they are not merely "uninitialized" variables but carry specific semantic meaning tied to the business model.

Ordering of Relations

By definition, relations are unordered. The order in which tuples appear in a table display is merely a representation of how the hardware has stored the data. Users should never count on a specific order unless they explicitly enforce one using an ORDER BY clause in their query.

The Hierarchy and Function of Database Keys

Keys are used to identify tuples within a relation and to enforce integrity constraints.

Super Keys and Candidate Keys

Super Key ( $K$ ): A set of one or more attributes that, taken together, allow us to identify uniquely a tuple in the relation. For example, in an instructor table, a Social Security Number ( $SSN$ ) is a super key because it identify a single person. However, a combination of ( $ID, \text{name}$ ) is also a super key, even though the name is redundant.
Candidate Key: A minimal super key. This is a super key for which no proper subset is also a super key. If we take the super key ( $ID, \text{name}$ ) and remove the name, leaving only ( $ID$ ), and that ID is still sufficient to identify the tuple, then ( $ID$ ) is the candidate key.

Primary Keys and Identity

The Primary Key is the specific candidate key chosen by the database designer to identify tuples. This selection is crucial for enforcing Integrity Constraints. For instance, if $ID$ is the primary key, the database will physically prevent the insertion of two instructors with the same $ID$ . This ensures data consistency even if an external application attempts to input duplicate or "crappy" data.

Foreign Keys and Relationships

Foreign Keys are attributes in one table that reference the primary key of another table.

Purpose: They link entities across the database (e.g., linking an instructor to their department).
Constraint: A foreign key must point to a primary key in the referenced table. If a foreign key pointed to a non-unique attribute (like department name where multiple departments could share a name), the relationship would be ambiguous.
Integrity: They prevent orphaned data. You cannot add an instructor to a "Physics" department if that department does not exist in the department table.

SQL Data Definition (DDL) and Manipulation (DML)

Data Definition Language (DDL)

DDL allows for the creation and modification of the database schema. Key domains include:

char(n): A fixed-length string of length $n$ . If the input is shorter than $n$ , it is padded with blanks.
varchar(n): A variable-length string with a maximum length of $n$ . This is generally preferred to save database space, especially in large datasets ( $1,000,000+$ entries).
int: Typical integer values.

Creating a Table

Syntax example: CREATE TABLE table_name (attribute1 domain1, attribute2 domain2, ..., PRIMARY KEY (attribute));

Critical Note: Always terminate SQL statements with a semicolon ( $;$ ). Failing to do so is a common cause of query errors ( $\approx 10\%$ of student questions).

Data Manipulation Language (DML)

DML allows for the modification of the data within the schema:

INSERT INTO relation VALUES (...);: Adds a new tuple. String values must be enclosed in single quotes (' ').
DELETE FROM relation;: Clears all tuples from a table but keeps the table structure in the schema.
DROP TABLE relation;: Completely removes the table and its definition from the database.
ALTER TABLE relation ADD attribute domain;: Adds a new column. New columns will default to $\text{null}$ for existing entries.

Constructing Queries and Relational Logic

The Basic SELECT Structure

A standard query follows the SELECT-FROM-WHERE block:

SELECT: Corresponds to the Projection operation. It specifies which attributes to return. Using SELECT DISTINCT removes duplicate entries.
FROM: Specifies the relations (tables) to be scanned. Listing multiple tables without a where clause results in a Cartesian Product, which creates every possible combination of rows between the tables (e.g., $1,000,000 \times 1,000,000$ rows), which can crash a server.
WHERE: Corresponds to the Selection predicate. It filters the results based on specific conditions (e.g., WHERE department_name = 'Comp. Sci.').

Joining Tables

Joining is the process of connecting two tables using their shared attributes (usually a foreign key and a primary key).

Logical Approach: SELECT * FROM instructor, department WHERE instructor.dept_name = department.dept_name;
This allows the retrieval of information spread across multiple tables, such as finding the budget of the department where a specific instructor works.

Queries as Set Operations

Complex queries should be visualized as operations on sets (Intersections, Unions, and Memberships). Instead of trying to write a complex query all at once, students should:

Identify the individual sets (e.g., the set of all instructors, the set of all teaching assignments).
Determine how these sets interact (e.g., finding the intersection of instructors and those in the 'Art' department).
Combine these components logically.

Questions & Discussion

Question (Alyssa): What are the symbols $k$ and $r$ in the slides? - Response: $k$ represents the super key, which is a set of attributes. $r$ represents the relation. The expression implies that the set of attributes in the key must be a subset of the total attributes in the relation schema.
Question (Roxanna): Is a candidate key normally just one entry (attribute)? - Response: While often small, it depends on the business logic. In a takes table, the key might require five attributes ( $ID$ , $course\_id$ , $section\_id$ , $semester$ , $year$ ) to ensure a student cannot be registered for the same section twice in the same semester.
Question (Amanda): If we query departments from the instructor table vs. the department table, is the result the same? - Response: No. Querying instructor only shows departments that currently have at least one instructor. Querying department shows all departments defined in the system, even those with no employees.
Question (Anna): Does a SELECT query add new columns to the actual table? - Response: No. A SELECT query is for retrieval only. It spits out a temporary table as a results representation; it never modifies the underlying data structure.
Question (Aj): What is the concept used to connect tables? - Response: The concept is called Joining. It uses foreign keys to navigate relationships between different relations.