finalisation ting
Normalization
A table design technique aimed at minimizing data redundancies, focusing on the characteristics of specific entities.
1NF, 2NF, and 3NF
First three normal forms, most commonly used in normalization.
Iterative ER process
The best practice of iteratively defining all entities and their attributes during normalization so that all resulting tables are in 3NF.
Data Redundancy
A situation in a database where data is unnecessarily repeated, leading to potential inconsistencies and inefficiencies.
Data Anomalies
An undesirable situation in a database where inconsistencies arise during insertion, deletion, or update operations due to data redundancy.
Primary Key
To fulfill 1NF, eliminate repeating groups and identify the ___.
2NF
Remove partial dependency to meet this normal form.
3NF
Remove transitive dependency to meet this normal form.
Composite (Bridge) Entity
A table that is used to capture the relationship between two tables after a split, especially in many-to-many relationships.
Boyce-Codd Normal Form (BCNF)
A special case of 3NF that covers specific anomalies 3NF does not handle; also known as 3.5NF.
candidate key
BCNF can be violated only when the table contains more than one ___.
determinant
If a table is in BCNF, every ___ in all functional dependencies must be a complete candidate key.
Denormalization
The process of adding redundancy back into a normalized database to improve performance.
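The normalization and bridge-entity cards above can be sketched concretely. Below is a minimal Python/sqlite3 example using a hypothetical student–course schema (the table and column names are illustrative, not from the source): two entity tables in 3NF plus a composite (bridge) table resolving their many-to-many relationship.

```python
import sqlite3

# Hypothetical schema for illustration: students and courses have a
# many-to-many relationship, resolved by the bridge table "enrollment".
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE course (
        course_id  INTEGER PRIMARY KEY,
        title      TEXT NOT NULL
    );
    -- Composite (bridge) entity: its primary key combines both foreign keys.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)
    );
""")
cur.execute("INSERT INTO student VALUES (1, 'Ada')")
cur.execute("INSERT INTO course VALUES (10, 'Databases')")
cur.execute("INSERT INTO enrollment VALUES (1, 10, 'A')")
row = cur.execute("""
    SELECT s.name, c.title, e.grade
    FROM enrollment e
    JOIN student s ON s.student_id = e.student_id
    JOIN course  c ON c.course_id  = e.course_id
""").fetchone()
print(row)  # ('Ada', 'Databases', 'A')
conn.close()
```

Storing each fact once in its own table removes the redundancy (and the anomalies) that a single wide table would carry.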
CREATE
SQL command used to create new tables, indexes, or database structures.
ALTER
SQL command used to modify existing database structures.
DROP
SQL command used to remove tables, indexes, or other database objects permanently.
INSERT
SQL command used to add new rows of data to a table.
DELETE
SQL command used to remove existing rows from a table.
UPDATE
SQL command used to modify existing data within a table.
SELECT
SQL command used to retrieve data from one or more tables.
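The DDL and DML commands on these cards can be run end to end in one short session. This is a sketch using Python's sqlite3 with a made-up `product` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: build a new table
cur.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
# ALTER: modify the existing structure
cur.execute("ALTER TABLE product ADD COLUMN price REAL")
# INSERT: add new rows of data
cur.execute("INSERT INTO product VALUES (1, 'pen', 1.50)")
cur.execute("INSERT INTO product VALUES (2, 'ink', 4.00)")
# UPDATE: modify existing data
cur.execute("UPDATE product SET price = 1.75 WHERE id = 1")
# DELETE: remove existing rows
cur.execute("DELETE FROM product WHERE id = 2")
# SELECT: retrieve data
rows = cur.execute("SELECT id, name, price FROM product").fetchall()
print(rows)  # [(1, 'pen', 1.75)]
# DROP: remove the table permanently
cur.execute("DROP TABLE product")
conn.close()
```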
COMMIT
SQL command used to save all changes made during the current transaction to the database.
ROLLBACK
SQL command used to revert changes made in the current transaction if an error occurs.
SAVEPOINT
SQL command used to create a temporary save point within a transaction, allowing partial rollbacks.
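COMMIT, ROLLBACK, and SAVEPOINT work together within one transaction. A minimal sketch with sqlite3 (the `account` table and savepoint name are illustrative):

```python
import sqlite3

# isolation_level=None lets us issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
cur.execute("INSERT INTO account VALUES (1, 100)")

cur.execute("BEGIN")
cur.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
cur.execute("SAVEPOINT before_bonus")          # partial-rollback marker
cur.execute("UPDATE account SET balance = balance + 999 WHERE id = 1")
cur.execute("ROLLBACK TO before_bonus")        # undo only the bonus
cur.execute("COMMIT")                          # keep the -30 withdrawal

balance = cur.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 70
conn.close()
```

ROLLBACK TO a savepoint undoes only the work since that marker; the COMMIT then makes the remaining changes permanent.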
GROUP BY
SQL clause used to group rows with the same values in one or more columns into a summary row.
HAVING
SQL clause used to filter the results of a GROUP BY query based on a specified condition.
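GROUP BY and HAVING are easiest to see side by side. A sketch with a made-up `sale` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
cur.executemany("INSERT INTO sale VALUES (?, ?)",
                [("north", 50), ("north", 70), ("south", 20), ("south", 10)])

# GROUP BY collapses rows into one summary row per region; HAVING then
# filters the groups (WHERE cannot, because it runs before grouping).
rows = cur.execute("""
    SELECT region, SUM(amount) AS total
    FROM sale
    GROUP BY region
    HAVING SUM(amount) > 100
""").fetchall()
print(rows)  # [('north', 120)]
conn.close()
```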
Data Privacy
The rights of individuals and organisations to determine access to data about themselves.
Data Governance Model
A framework that outlines the roles, responsibilities, processes, and policies for managing and governing data within an organisation.
Ethics
Moral principles that control or influence a person’s behavior.
Data Privacy
Focuses on protecting personal private data and information.
Data Ethics
Relevant to all data use, regardless of privacy protection or the specific actions taken with the data.
Grant
Used to give user access privileges to a database.
Revoke
Used to revoke authorization, i.e., to take back permissions from the user.
Deny
Explicitly prevents a user from receiving a particular permission.
Need-to-know basis
Principle of information security to minimize the risk of unauthorized disclosure.
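The GRANT/REVOKE/DENY semantics can be modeled in a few lines. This is a toy Python sketch, not a real DBMS security layer; the class and method names are invented for illustration. The key behaviors: access is off by default (need-to-know), and an explicit DENY overrides a GRANT.

```python
# Toy sketch: GRANT adds a permission, REVOKE takes it back,
# DENY records an explicit block that overrides any grant.
class AccessControl:
    def __init__(self):
        self.granted = set()   # (user, permission) pairs
        self.denied = set()

    def grant(self, user, perm):
        self.granted.add((user, perm))

    def revoke(self, user, perm):
        self.granted.discard((user, perm))
        self.denied.discard((user, perm))

    def deny(self, user, perm):
        self.denied.add((user, perm))

    def allowed(self, user, perm):
        # Need-to-know: no access unless granted, and DENY beats GRANT.
        return (user, perm) in self.granted and (user, perm) not in self.denied

acl = AccessControl()
acl.grant("alice", "SELECT")
acl.grant("bob", "SELECT")
acl.deny("bob", "SELECT")
print(acl.allowed("alice", "SELECT"))  # True
print(acl.allowed("bob", "SELECT"))    # False
```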
Big Data
Large and complex sets of raw data (difficult or impossible to capture in ER models).
Volume
Quantity of data to be stored.
Velocity
Speed at which data is entering the system.
Variety
Variations in the structure of the data to be stored.
Structured Data
Any data types that can be clearly defined, stored, accessed, and processed in a fixed format.
Unstructured Data
Anything that cannot be described as structured data.
Semi-Structured
Data that falls in between Structured Data and Unstructured Data.
First-Party Data
Data directly collected from companies’ own websites and apps.
Contextual Advertising
Placing ads based on the content of the website or app being viewed, rather than the user's browsing history.
Hadoop
Open-source framework for storing and analyzing massive amounts of distributed, unstructured data.
HDFS
Hadoop Distributed File System; a low-level distributed file processing system for storing files across networks.
MapReduce
Open-source application programming interface (API) and framework used to process large data sets across clusters.
Name Node
In HDFS, contains file system metadata.
Data Node
In HDFS, stores the actual file data.
Job Tracker
Central control program in MapReduce to accept, distribute, monitor, and report on jobs in a Hadoop environment.
Task Tracker
Program in MapReduce responsible for executing the individual map and reduce tasks assigned by the Job Tracker.
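The map → shuffle → reduce flow can be sketched in a single process (real Hadoop distributes the map and reduce tasks across Task Trackers on many nodes). A toy word-count example:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit one (word, 1) key-value pair per word in the line.
    return [(word, 1) for word in line.split()]

def reduce_phase(key, values):
    # Reduce: combine all values seen for one key.
    return (key, sum(values))

lines = ["big data big ideas", "data data everywhere"]

# Shuffle: group all emitted values by key, as the framework
# does between the map and reduce phases.
grouped = defaultdict(list)
for line in lines:
    for key, value in map_phase(line):
        grouped[key].append(value)

counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # {'big': 2, 'data': 3, 'ideas': 1, 'everywhere': 1}
```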
Data Ingestion Applications
Tools such as Flume and Sqoop that gather data from existing systems and ingest it into Hadoop.
Hive
Sits on top of Hadoop to help create MapReduce jobs, using a SQL-like language called HiveQL.
Pig
Hadoop platform to write MapReduce programs using its own high-level scripting/programming language: Pig Latin.
HBase / Impala
Provide faster query access directly to HDFS without using MapReduce.
NoSQL
Not modeled using relational model / non-SQL / not-only SQL / Non-relational database, developed to address Big Data challenges.
Key-Value Database
NoSQL Database; Stores data as a collection of key-value pairs where keys are similar to primary keys in relational databases.
Column-oriented Databases
NoSQL Database; Blocks hold data from a single column across many rows, with relational logic.
Graph Databases
NoSQL Database; Suitable for relationship-rich data, using a collection of nodes and edges.
Document Databases
NoSQL Database; Stores data in key-value pairs in which the value components are tag-encoded documents (XML, JSON, or BSON).
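The key-value and document models above can be illustrated with a toy sketch (invented class, not a real NoSQL engine): lookups go through a key, as in a key-value database, but each value is a JSON-encoded document carrying its own structure.

```python
import json

# Toy document store: key -> JSON document, no fixed relational schema.
class DocumentStore:
    def __init__(self):
        self._data = {}  # key -> JSON string

    def put(self, key, document):
        self._data[key] = json.dumps(document)

    def get(self, key):
        return json.loads(self._data[key])

store = DocumentStore()
# Documents in the same store need not share the same fields.
store.put("user:1", {"name": "Ada", "tags": ["admin", "dev"]})
doc = store.get("user:1")
print(doc["name"])  # Ada
```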
Challenges Big Data solves
Linear scalability, High throughput, Fault tolerance, Auto recovery, High degree of parallelism, Distributed data processing.