IS 420 Exam 2 Review (storage, indexing, triggers, NoSQL, Column Family DB)

0.0(0)
studied byStudied by 0 people
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
Card Sorting

1/73

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

74 Terms

1
New cards

Index

Data structure to improve database query performance.

2
New cards

Primary Index

Unique index for a table's primary key.

<p>Unique index for a table's primary key.</p>
3
New cards

Trigger

Database procedure automatically executed on events.

4
New cards

BEFORE Keyword

Specifies trigger execution before an operation.

5
New cards

Row Level Trigger

Trigger that executes for each row affected.

6
New cards

Statement Level Trigger

Trigger that executes once per SQL statement.

7
New cards

NoSQL Database

Non-relational database for unstructured data.

<p>Non-relational database for unstructured data.</p>
8
New cards

CAP Theorem

States consistency, availability, and partition tolerance.. there is priority assigned to each one. You must have trade offs. You have to pick TWO!
(Distributed systems can only have 2 at a time, never all 3:
- Consistency
- Availability
- Partition tolerance)

<p>States consistency, availability, and partition tolerance.. there is priority assigned to each one. You must have trade offs. You have to pick TWO! <br>(Distributed systems can only have 2 at a time, never all 3:<br>- Consistency<br>- Availability<br>- Partition tolerance)</p>
9
New cards

Consistency

Consistent copies of data on different servers

10
New cards

Availability

Response to any query

11
New cards

Partition tolerance

If network partition occurs (inability to send messages between servers) system can still operate correctly

12
New cards

Column Family Database

NoSQL database storing data in columns.

<p>NoSQL database storing data in columns.</p>
13
New cards

Gossip Protocol

Method for nodes to communicate in distributed systems.

<p>Method for nodes to communicate in distributed systems.</p>
14
New cards

Information Entropy

The more scattered and inconsistent the information is in a database, the higher its entropy.
- Measure of uncertainty in data.
- Cassandra uses an anti-entropy algorithm to correct inconsistencies among replicas.

15
New cards

Indexes (NoSQL)

allow for a rapid lookup of data in table
- Look up keyword (entry) in _________
- Find the page (row location) on book (disk)
- Directly access page (row)
- Each ___________ is a separate table
- allow for a rapid lookup of data in table (DB indexes work like an index of a book).

16
New cards

Create a trigger to do a specific task

DO THE WORK and have chat grade you.

17
New cards

Given PL/SQL code containing a trigger, provide the output after an operation occurs that may fire the trigger:

-- Just the table, not the trigger

CREATE TABLE employees (

id NUMBER PRIMARY KEY,

name VARCHAR2(50),

salary NUMBER

);

INSERT INTO employees VALUES (1, 'John Doe', 50000);

INSERT INTO employees VALUES (2, 'Jane Smith', 60000);

COMMIT;

-- The trigger

CREATE OR REPLACE TRIGGER trg_after_update_employee

AFTER UPDATE ON employees

FOR EACH ROW

BEGIN

DBMS_OUTPUT.PUT_LINE('Employee ' || :OLD.name || ' changed salary from ' || :OLD.salary || ' to ' || :NEW.salary);

END;

/

-- Needs this to FIRE

UPDATE employees SET salary = 55000 WHERE id = 1;

This trigger is fired after an UPDATE operation on the employees table. It logs the employee's name and their salary change.

Employee John Doe changed salary from 50000 to 55000

18
New cards

NoSQL: BASE

BAsically available. Data will be provided from available servers

Soft state: Data in different servers do not have to be mutually consistent

Eventual consistency: Data in different servers will become consistent at some point in time

19
New cards

Relational DB's: ACID

Atomic

Consistent

Isolated

Durable transactions

20
New cards

Relational Databases

Schema: Fixed schema (tables, columns, data types).

Data Model: Structured as tables with rows and columns.

Joins: Supports joins to combine data from multiple tables.

Scaling: Scales vertically (adding more resources to a single server).

Consistency: Strong consistency (ACID properties).

Transaction Support: Fully supports multi-row transactions.

Performance: Optimized for structured data and complex queries.

Use Cases: Best for structured data and complex relationships (e.g., banking).

21
New cards

NoSQL Databases

Minimal storage and retrieval principle

No tables, columns, constraints, foreign keys

No joins

No SQL

Mapping to Rel

Schema: Schema-less or flexible schema.

Data Model: Varies: key-value, document, column-family, or graph.

NO Joins: Does not support joins; data is often denormalized.

Scaling: Scales horizontally (adding more servers).

Consistency: Eventual consistency (BASE properties).

Transaction Support: Limited or no support for multi-row transactions.

Performance: Optimized for unstructured data and high-speed operations.

Use Cases: Best for large-scale, unstructured, or semi-structured data (e.g., social media).

22
New cards

Relational DBs

• Tables with rows/cols
• Tables have fixed structure
• Object information may be located in multiple tables (EMP, DEPT, PROJ)
• Join operations match records from diff tables

23
New cards

Column DBs

• There are no tables
• Rows do not have fixed structure
• Object information is in a single (long) row
• There is no join operation

24
New cards

namespace

- a collection of identifiers within Key-value DB
- Keys must be unique within
- may be used for an entire db, Or, the same db may contain >1 _______________,
each one called a bucket
- a container that holds a set of identifiers or names, ensuring that they are unique to avoid any conflict

25
New cards

quorum

the number of servers that must
respond to a read or write operation, for the
operation to be considered complete

26
New cards

For a read operation

- Read from all servers
- Return the value that is equal or greater than a
configurable read-threshold

27
New cards

Using Quorum for Reads

• Read-threshold = 3, return value after 3 servers send same value
• Read-threshold = 1, response time? Consistency?
• Read-threshold = 5, response time? Consistency?

28
New cards

Using Quorum for Writes

- complete when a minimum number of server copies have been written
- Send write to all servers
- The write is complete when the number of servers that updated (made a copy of) the value is equal or greater than a configurable write-threshold

29
New cards

Eventual Consistency

- updates to the database will propagate through the system and eventually all data copies will be consistent
- There are periods of time that same item in different db servers does not have the same value, and the two dbs are inconsistent (temporarily)
- Reads from different db server will return different results (during that period of time)

30
New cards

NoSQL databases often use

Eventual Consistency

31
New cards

NoSQL Types

(Key-Value DBs: A key is associated with a value.

Document DBs: Values are documents (collections of items stored together)-Semi-structured (e.g., JSON, XML).

Column Family DBs: Data stored/organized in columns rather than rows.

Graph DBs: Model objects and relationships

32
New cards

NoSQL Properties

• Scalability
• Flexibility
• Cost control
• Availability
(... & many more)

33
New cards

primary storage media

- Cache (RAM) and main memory
- Fastest media but volatile (cache, main memory)

34
New cards

secondary storage

next level in hierarchy, non-volatile, moderately fast access time
- also called on-line storage
- E.g. flash memory, magnetic disks

35
New cards

tertiary storage

lowest level in hierarchy, non-volatile,
slow access time
- also called off-line storage
- E.g. magnetic tape, optical storage

36
New cards

Magnetic Hard Disk Mechanism: Track

Surface of platter divided into circular
- Over 16,000 per platter on typical hard disks

37
New cards

Magnetic Hard Disk Mechanism: Sector

Each track is divided into __________
- the smallest unit of data that can be read or written.
- size typically 512 bytes
- Typical ______________ per track: 200 (on inner tracks) to 400 (on outer
tracks)

38
New cards

Magnetic Hard Disk Mechanism: Cylinder

refers to the set of tracks that are aligned vertically across all platters in a hard disk at the same radial position.
It represents all the tracks on a hard disk that can be accessed without moving the read/write head laterally.

- a cylindrical intersection through the stack of platters in a disk, centered around the disk's spindle.
- holds a group of tracks that are separated by platters or heads
- the aggregate of the same track number on every platter used for recording.

<p>refers to the set of tracks that are aligned vertically across all platters in a hard disk at the same radial position.<br>It represents all the tracks on a hard disk that can be accessed without moving the read/write head laterally.<br><br>- a cylindrical intersection through the stack of platters in a disk, centered around the disk's spindle.<br>- holds a group of tracks that are separated by platters or heads<br>- the aggregate of the same track number on every platter used for recording.</p>
39
New cards

Access time

the time it takes from start of a read or write request to when data transfer begins.

Consists of:

Seek time – time to position the arm over the correct track.

• Average seek time is 1/2 the worst case seek time.

• 4 to 10 milliseconds on typical disks

Rotational latency – time it takes for the sector to be

accessed to appear under the head.

• Average latency is 1/2 of a full rotation of the disk

• 4 to 11 milliseconds on typical disks (5400 to 15000 r.p.m.)

Data-transfer rate – from or to the disk.

– 50-120 to MB per second is typical

40
New cards

Solid State Drives (SSD)

• Based on flash memory (NAND blocks)

• There are no moving parts

• Contain a controller as a bridge between the OS and the NAND block storage

- Buffer/Cache: High-speed RAM for quick data access.

- Wear Leveling

- OS boot time 10-13 sec

- Access time ~0.1 ms

- Data Transfer Rate 100-600 MB/sec

- Reliability Good, but cannot survive multiple power failures

- Cost $0.35 per GB

- Capacity Up to 16 TB (120-512GB more common)

Durability: Limited write/erase cycles.

41
New cards

Solid State Drives (SSDs)

Mechanics: No moving parts, uses NAND flash memory.

Controller: Handles error correction, wear leveling, and encryption.

Buffer/Cache: High-speed RAM for quick data access.

Performance: Access time: ~0.1 ms.

Data Transfer Rate: 100–600 MB/s.

Reliability: Good but prone to failure after repeated power outages.

Block Wear Issue: Blocks wear out faster with repeated writes; wear leveling helps mitigate.

Capacity: 120 GB to 16 TB.

Cost: ~$0.35 per GB.

Durability: Limited write/erase cycles.

42
New cards

Magnetic Disks (HDDs)

Mechanics: Uses a read-write head close to a platter to store and retrieve magnetically encoded data.

Platter Surface: Divided into tracks (16,000 per platter) and sectors (512 bytes per sector).

Disk Arm: Moves the read/write head to the correct track.

Head-Disk Assembly: Consists of 2 to 4 platters on a spindle.

Performance: Access time: 4–10 ms (seek time) + 4–11 ms (rotational latency).

Data Transfer Rate: 50–120 MB/s.

Reliability: Resistant to catastrophic failure but individual sectors may corrupt.

Capacity: Up to 8 TB.

Cost: ~$0.05 per GB.

Durability: Mechanical wear is common over time.

43
New cards

A table can have only one primary index, as it is tied to the table's primary key

How many primary indexes can a table have?

44
New cards

When is it suggested to create an index?

These are recommended for large tables, fields with a wide range of values, or fields frequently used in searches or joins.

45
New cards

Is an index a table?

- No, it is a separate database object designed to speed up data retrieval by maintaining pointers to the locations of records in the table.

46
New cards

database table index

- Distinct database table
- Contains data values with corresponding columns that specify physical locations of records that contain data values

47
New cards

Creating an Index

CREATE INDEX index_name

ON tablename (index_fieldname);

48
New cards

Create an index when

- Table contains a large number of records.
- Field contains a wide range of values.
- Field contains a large number of null values.
- Queries frequently use field in search conditions or join conditions.
- Most queries retrieve less than 2% to 4% of table rows.

49
New cards

Do not create index when

- Table does not contain large number of records
- Applications do not use proposed index field in query search condition
- Most queries retrieve more than 2% to 4% of table records
- Applications frequently insert or modify table data
- Decision based on judgment and experience

50
New cards

Composite Index

CREATE INDEX index_name
ON tablename(
index_fieldname1,
index_fieldname2, ...);

51
New cards

Is an Index a table?

An index is not a table; however, you may have a table with indexes
- In class, he said its anothertype of table... Google says otherwise.

52
New cards

How many secondary indexes can a table have?

Index on one or more column values. Either DB or application maintains index. If the DB maintains indexes, use them.
- NO MENTION IN SLIDES OR BOOK, MUST BE ANY AMOUNT THEN
- in Amazon DynamoDB, you can create up to 20 global secondary indexes and 5 local secondary indexes per table

53
New cards

When to Use Secondary Indexes Managed by the Column Family Database System

if you need secondary indexes on column values and the column family database system provides automatically managed secondary indexes, then you should use them. (not mentioned)

54
New cards

Database triggers

are objects in the database, operates on tables. When an event occurs, an action is taken

55
New cards

Trigger Events

INSERT, UPDATE, DELETE

56
New cards

Trigger Timing

BEFORE, AFTER

(Defines whether a trigger fires before or after the triggering SQL statement executes)

57
New cards

Trigger Level

ROW, STATEMENT

(Defines whether a trigger fires once for triggering statement or once for each row affected by the triggering statement)

58
New cards

Trigger Action

A PL/SQL code that will automatically execute/fire when event is realized

59
New cards

statement level trigger

only fired once per event

60
New cards

row level trigger

will fire once for each row affected by the event

61
New cards

Create a row level trigger

CREATE OR REPLACE TRIGGER trigger_name

{BEFORE | AFTER} {INSERT or UPDATE or DELETE} ON table_name

FOR EACH ROW [WHEN (condition)]

[DECLARE …]

BEGIN

trigger body

END;

--{} is a must

--[] is optional

--X | Y means X or Y

62
New cards

Create a statement level trigger

CREATE OR REPLACE TRIGGER trigger_name

{BEFORE | AFTER} {INSERT or UPDATE or DELETE} ON table_name

[DECLARE

declare local variables here]

BEGIN

trigger body

END;

-- {} is a must

-- [] is optional

-- X | Y means X OR Y

63
New cards

What is NEW and OLD in triggers

• Use :OLD.fieldname to reference value before the triggering event

• Use :NEW.fieldname to reference value after the triggering even

64
New cards

:OLD.fieldname

to reference value before the triggering event

65
New cards

:NEW.fieldname

to reference value after the triggering even

66
New cards

Commit;

In order for a trigger to be fired on (insert, delete, or update) statements you should add after your statement:

67
New cards

Common Errors for Row-Level Triggers

•Forget for each row

•Forget () for when condition

•Forget to use :new or :old in trigger body and new or old in when condition

68
New cards

Create a row level trigger that fires before the insert operation associated with employees table. The trigger displays the attribute values for first_name and last_name

Set serveroutput on;
CREATE OR REPLACE TRIGGER Inv_update_payment_transfer --make sure this line is not too long.
BEFORE UPDATE OF payment_total ON invoices
FOR EACH ROW
WHEN (NEW.payment_total + NEW.credit_total = NEW.invoice_total)
BEGIN
INSERT INTO paid_invoices (
invoice_id,
vendor_id,
invoice_number,
invoice_date,
invoice_total,
payment_total,
credit_total,
terms_id,
invoice_due_date,
payment_date
) VALUES (
:OLD.invoice_id, -- Use :OLD to reference the previous value of the invoice_id
:NEW.vendor_id, -- Use :NEW for the new value of vendor_id
:NEW.invoice_number,
:NEW.invoice_date,
:NEW.invoice_total,
:NEW.payment_total,
:NEW.credit_total,
:NEW.terms_id,
:NEW.invoice_due_date,
:NEW.payment_date
);
DBMS_OUTPUT.PUT_LINE('New paid invoice inserted');
END;
UPDATE INVOICES set PAYMENT_TOTAL=1575
WHERE INVOICE_ID=81;
Commit;

69
New cards

Create a row level trigger that fires before update operation on payment_total in the invoices table, the trigger should check if the value of payment_total + credit_total is greater than invoice_total . If yes, then a an exception should be raised automatically.

set serveroutput on;
create or replace TRIGGER
Inv_update_payment_transfer
before UPDATE OF payment_total ON invoices
FOR EACH ROW
WHEN (NEW.payment_total = new.invoice_total - new.credit_total)
BEGIN
INSERT INTO paid_invoices
VALUES(:old.INVOICE_ID,
:new.VENDOR_ID,
:new.INVOICE_NUMBER,
:new.INVOICE_DATE,
:new.INVOICE_TOTAL,
:new.PAYMENT_TOTAL,
:new.CREDIT_TOTAL,
:new.TERMS_ID,
:new.INVOICE_DUE_DATE,
:new.PAYMENT_DATE);
DBMS_OUTPUT.PUT_LINE('new paid invoice Inserted');
END;
/
commit;

70
New cards

How to maintain data availability in NoSQL environments?

Use a distributed system (a primary server and a backup/participating server)

71
New cards

Google announced its BigTable in 2006

The first column family DB

72
New cards

BigTable used in

- Web indexing (the typical google search)
- Google Earth
- Google Finance

73
New cards

Similarities/differences with relational databases

74
New cards

Gossip Protocol

• A server (peer) sends information about itself
and other servers it knows about, to another
server
• Information received from servers is aggregated before it is passed on to the next server resulting in information being passed more efficiently
• Version information about each server is also
exchanged
- allows for quick exchange of information