NoSQL
A generation of database management systems not based on the traditional relational model.
What "NoSQL" actually stands for
"Not Only SQL"
Five NoSQL characteristics
(1) Not based on the relational model, (2) Support distributed architectures, (3) Provide fault tolerance and high scalability/availability, (4) Support large amounts of sparse data, (5) Geared toward performance over consistency.
Four main categories of NoSQL databases
Key-Value, Document, Graph, Column-Oriented.
Key-Value (KV) database
A NoSQL model that stores data as key-value pairs in which the value is unintelligible to the DBMS.
Bucket (KV database)
A logical grouping of keys, similar to a table; a key can appear only once within a bucket.
Three KV operations
Get (retrieve value by key), Store (write value to key, replacing any existing value), Delete (remove the key-value pair).
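The three operations above can be sketched with a plain Python dict standing in for a bucket; class and key names are illustrative only.

```python
# Minimal sketch of the three KV operations using a Python dict as the "bucket".
# The value is opaque bytes: the store never inspects its contents.
class Bucket:
    def __init__(self):
        self._data = {}

    def store(self, key, value):
        # Store: write value to key, replacing any existing value
        self._data[key] = value

    def get(self, key):
        # Get: retrieve value by key (None if the key is absent)
        return self._data.get(key)

    def delete(self, key):
        # Delete: remove the key-value pair if present
        self._data.pop(key, None)

users = Bucket()
users.store("u:42", b'{"name": "Ada"}')
print(users.get("u:42"))   # b'{"name": "Ada"}'
users.delete("u:42")
print(users.get("u:42"))   # None
```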
Examples of Key-Value databases
Dynamo, Riak, Redis, Voldemort.
Document database
A NoSQL model that stores key-value pairs in which the value is a tag-encoded document (XML, JSON, BSON), and the DBMS understands the document's content.
Collection (document database)
The grouping container for key-value pairs, analogous to a bucket in KV databases.
Key difference between KV and Document databases
Document DBMSs understand and can query the value's internal structure; KV DBMSs do not.
Examples of Document databases
MongoDB, CouchDB, OrientDB, RavenDB.
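The key difference above (the DBMS can query inside the value) can be sketched with Python dicts as JSON-like documents; the collection contents and the `find` helper are hypothetical.

```python
# Sketch: unlike a KV store, a document store can filter on fields *inside*
# each value. A "collection" here is a dict mapping key -> document.
collection = {
    "d1": {"name": "Ada",    "city": "London"},
    "d2": {"name": "Grace",  "city": "Arlington"},
    "d3": {"name": "Edsger", "city": "London"},
}

def find(coll, **criteria):
    """Return documents whose fields match every criterion (MongoDB-style filter)."""
    return [doc for doc in coll.values()
            if all(doc.get(k) == v for k, v in criteria.items())]

print(find(collection, city="London"))   # both London documents
```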
Graph database
A NoSQL database that uses graph theory to store entity instances and the relationships between them, represented as nodes and edges.
Node (graph DB)
A single entity instance.
Edge (graph DB)
A relationship between nodes.
Property (graph DB)
An attribute describing a node or an edge.
How is graph data physically stored
Often in structures like an adjacency matrix or as key-value pairs, even though it is visualized as nodes and edges.
Examples of Graph databases
Neo4j, ArangoDB, GraphBase, Amazon Neptune.
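The two physical representations mentioned above (adjacency matrix vs. key-value pairs) can be shown side by side for a tiny made-up graph:

```python
# Tiny graph: Alice -KNOWS-> Bob, Bob -KNOWS-> Carol.
nodes = ["Alice", "Bob", "Carol"]
idx = {n: i for i, n in enumerate(nodes)}

# (1) Adjacency matrix: matrix[i][j] == 1 means an edge from node i to node j.
matrix = [[0] * len(nodes) for _ in nodes]
matrix[idx["Alice"]][idx["Bob"]] = 1
matrix[idx["Bob"]][idx["Carol"]] = 1

# (2) Key-value pairs: key = node, value = list of (edge_label, neighbor).
edges = {
    "Alice": [("KNOWS", "Bob")],
    "Bob":   [("KNOWS", "Carol")],
    "Carol": [],
}

print(matrix[idx["Alice"]][idx["Bob"]])   # 1
print(edges["Alice"])                     # [('KNOWS', 'Bob')]
```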
Hadoop
A Java-based framework (not a database) for distributing and processing very large data sets across clusters of computers.
Two most important parts of Hadoop
HDFS (Hadoop Distributed File System) and MapReduce.
HDFS
A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speed; a low-level distributed file system used directly for storage.
Four HDFS assumptions
(1) High volume (terabyte+ files), (2) Write-once, read-many (no edits after close), (3) Streaming access (process whole files as a stream), (4) Fault tolerance (replicate data across many machines).
Client node (HDFS)
A node that makes requests to the file system.
Name node (HDFS)
The node that stores metadata about which blocks belong to which files and which data nodes hold them.
Data node (HDFS)
A node that stores the actual file data blocks.
Block report
A report sent every 6 hours from a data node to the name node listing which blocks it holds.
Heartbeat
A signal sent every 3 seconds from a data node to the name node to confirm it is still available.
What happens when a name node stops receiving heartbeats from a data node
It excludes that data node from future read/write lists and may instruct other nodes to replicate the missing data.
MapReduce
A divide-and-conquer parallel processing technique: split a large data block into sub-blocks, compute intermediate results, then summarize into one final answer.
Mapper
A program that performs the Map function, producing intermediate results from a sub-block of data.
Reducer
A program that performs the Reduce function, summarizing intermediate results into a final answer.
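The divide-and-conquer shape above can be sketched as the classic word count, in plain Python with no Hadoop involved:

```python
# Word-count sketch of map/reduce: split input into sub-blocks, map each to
# intermediate (key, value) pairs, then reduce them into one final answer.
from collections import defaultdict

def mapper(block):
    # Map: emit an intermediate (word, 1) pair for each word in the sub-block
    return [(word, 1) for word in block.split()]

def reducer(pairs):
    # Reduce: summarize the intermediate pairs into one total per word
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

blocks = ["to be or", "not to be"]    # the large input, split into sub-blocks
intermediate = [p for b in blocks for p in mapper(b)]
print(reducer(intermediate))          # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```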
Big Data
A term describing data sets so large, fast, or varied that traditional RDBMSs cannot handle them efficiently.
The 3 Vs
Volume, Velocity, Variety.
Volume
A characteristic of Big Data describing the quantity of data to be stored.
Velocity
A characteristic of Big Data describing the speed at which data enters the system and the speed at which it must be processed.
Variety
A characteristic of Big Data describing variations in the structure of the data being stored.
Scaling up
Handling data growth by migrating to a more powerful single system (more CPUs, more storage on one machine).
Scaling out
Handling data growth by distributing storage across a cluster of commodity servers; the dominant approach for Big Data.
Why RDBMSs are ill-suited for clusters
Distributing an RDBMS requires heavy communication and coordination among nodes, with significant performance cost.
Stream processing
Processing data as it enters the system to decide what to keep and what to discard before storage (focuses on inputs).
Feedback loop processing
Analyzing stored data to produce actionable results (focuses on outputs).
Structured data
Data that conforms to a predefined data model (e.g., relational tables).
Unstructured data
Data that does not conform to a predefined data model (e.g., images, video, audio).
BLOB (Binary Large Object)
An RDBMS data type for storing unstructured objects as a single atomic value; semantic content is opaque to the DBMS.
Variability
Big Data characteristic where the same data values may have different meanings or interpretations over time.
Veracity
Big Data characteristic regarding the trustworthiness/quality of the data.
Value
Big Data characteristic regarding the degree to which data can provide meaningful insights.
Visualization
The ability to graphically present data in a way that makes it understandable to users.
Concurrency control
A DBMS feature that coordinates simultaneous execution of transactions in a multi-user system while preserving data integrity.
Which ACID property does concurrency control mostly preserve
Isolation.
The three concurrency control problems
Lost update, uncommitted data (dirty read), inconsistent retrieval.
Lost update
A concurrency problem in which a data update is overwritten and lost during concurrent execution of transactions.
Uncommitted data (dirty read)
A concurrency problem in which a transaction reads data written by another transaction that later rolls back.
Inconsistent retrieval
A concurrency problem in which a transaction uses an aggregate function on data while other transactions are updating that data, producing incorrect aggregate results.
Scheduler
A DBMS component that establishes the order in which concurrent transaction operations are executed, interleaving them to ensure serializability.
Serializable schedule
A schedule of operations whose interleaved execution yields the same result as some serial execution.
Lock
A device that guarantees unique use of a data item for a particular transaction operation.
Pessimistic locking
Use of locks based on the assumption that conflicts between transactions will occur.
Lock manager
A DBMS component responsible for assigning and releasing locks.
Lock granularity
The level at which locks are applied: database, table, page, row, or field (broadest to most fine-grained).
Database-level lock
A lock that restricts database access to the lock owner; only one user at a time can use the database.
Table-level lock
A lock that allows only one transaction at a time to access a given table.
Page-level lock
A lock that restricts access to a disk page (a section of disk).
Row-level lock
A lock that allows concurrent transactions to access different rows of the same table, even if those rows live on the same page.
Field-level lock
A lock that allows concurrent transactions to access the same row but different fields; most flexible, highest overhead.
Trade-off as lock granularity gets finer
More concurrency, but higher overhead cost.
Binary lock
A lock with only two states: locked and unlocked.
Exclusive lock
A lock issued when a transaction requests permission to update a data item and no other locks are held on it.
Shared lock
A lock issued when a transaction requests permission to read a data item and no exclusive lock is held on it by another transaction.
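The shared/exclusive rules above (many readers, or one writer, never both) can be sketched as a tiny reader-writer lock using only the standard library; the class is an illustration, not how any DBMS implements it.

```python
# Sketch of shared (read) vs. exclusive (write) locks on one data item.
import threading

class SharedExclusiveLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # count of shared-lock holders
        self._writer = False   # True while an exclusive lock is held

    def acquire_shared(self):
        with self._cond:
            # A shared lock is granted only if no exclusive lock is held
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            # An exclusive lock requires that no other lock of any kind is held
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = SharedExclusiveLock()
lock.acquire_shared()
lock.acquire_shared()      # two concurrent readers are fine
lock.release_shared()
lock.release_shared()
lock.acquire_exclusive()   # would block if any shared lock were still held
lock.release_exclusive()
```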
Deadlock
A condition in which two or more transactions wait indefinitely for each other to release locks (also called a deadly embrace).
Deadlock prevention
A transaction requesting a lock is aborted if there is any chance of deadlock.
Deadlock detection
The DBMS periodically tests the database for deadlocks; if found, one transaction is aborted.
Deadlock avoidance
Transactions must obtain every lock they will need before being allowed to execute.
Transaction
A sequence of database requests that accesses the database; a logical unit of work that either entirely completes or is aborted.
Consistent database state
A state in which all data integrity constraints are satisfied.
Rollback
Reverting the database to its previous consistent state because a transaction failed or was explicitly aborted.
Atomicity (the A in ACID)
All parts of a transaction are treated as a single, indivisible logical unit of work.
Consistency (the C in ACID)
Data integrity constraints are satisfied; transactions must start and end in consistent states.
Isolation (the I in ACID)
A data item used by one transaction is not available to other transactions until the first one ends.
Durability (the D in ACID)
Once a transaction is committed, its changes cannot be undone or lost, even after a system failure.
Serializability
The selected order of concurrent transaction operations produces the same final database state as some serial execution would have produced.
COMMIT
Permanently records all changes made by the transaction and ends the transaction.
ROLLBACK
Aborts all changes made by the transaction and reverts the database to its previous state.
START TRANSACTION (MySQL)
Explicitly begins a transaction; needed in MySQL because autocommit is on by default, so each statement would otherwise commit on its own.
Implicit COMMIT
When the SQL command set ends successfully, all changes are recorded automatically as if COMMIT were issued.
Implicit ROLLBACK
When the SQL command set terminates abnormally, changes are aborted automatically as if ROLLBACK were issued.
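The COMMIT/ROLLBACK behavior above can be demonstrated with Python's built-in sqlite3 module (any transactional DBMS behaves the same way); the account table is made up for the example.

```python
# COMMIT makes changes permanent; ROLLBACK reverts to the last consistent state.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()                        # permanently record the setup

conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")
conn.rollback()                      # abort: the update is undone
print(conn.execute("SELECT balance FROM account").fetchone())   # (100,)

conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")
conn.commit()                        # record the change permanently
print(conn.execute("SELECT balance FROM account").fetchone())   # (50,)
```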
Transaction log
A DBMS feature that keeps track of all transaction operations that update the database, used for recovery from rollbacks, abnormal termination, or system failure.
Six things stored in the transaction log
(1) Begin marker for each transaction, (2) Operation type (INSERT/UPDATE/DELETE), (3) Names of affected objects, (4) Before-and-after values for updated fields, (5) Pointers to previous and next log entries, (6) End/COMMIT marker.
Embedded SQL
SQL statements contained within an application written in a host programming language such as C, C++, Java, or ASP.NET.
Host language
Any programming language that contains embedded SQL.
Steps to build an embedded-SQL program
(1) Programmer writes embedded SQL inside host code, (2) Pre-processor transforms it into DBMS- and language-specific procedure calls, (3) Host compiler compiles the program, (4) Linker produces the executable plus an "access plan" module.
Access plan
The compiled module containing the instructions needed to run embedded SQL code at runtime.
Main weakness of embedded SQL
Executables can be decompiled, exposing table names and dictionary structure; SQL errors are not caught at compile time and may surface at run-time.
Stored procedure
Business logic stored on the database server in the form of SQL code (or a DBMS-specific procedural language) that can be called by applications.
Two main advantages of stored procedures
(1) Reduce network traffic and improve performance (SQL is not transmitted across the network), (2) Reduce code duplication, lowering errors and maintenance cost.
Stored procedure syntax (MySQL)
CREATE PROCEDURE name(parameter_list) BEGIN SQL_statements; END;
IN parameter
A value supplied by the caller into the stored procedure.
OUT parameter
A value returned from the stored procedure to the caller.
How to invoke a stored procedure manually
Use CALL procedure_name(arg1, arg2, …);
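Putting the syntax, the IN/OUT parameters, and CALL together, a hypothetical MySQL example (table and column names are assumptions, not from the source):

```sql
-- DELIMITER lets the procedure body contain semicolons without ending the
-- CREATE statement early.
DELIMITER //
CREATE PROCEDURE get_balance(IN p_account_id INT, OUT p_balance DECIMAL(10,2))
BEGIN
    SELECT balance INTO p_balance
    FROM account
    WHERE account_id = p_account_id;
END //
DELIMITER ;

-- Invoke it manually and read the OUT parameter back via a session variable:
CALL get_balance(42, @bal);
SELECT @bal;
```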
Why prefer stored procedures over inline SQL strings in app code
Centralizes business logic, improves security (less SQL injection surface), reduces network traffic, easier to maintain.