finalisation ting
Normalization
A table design technique aimed at minimizing data redundancies, focusing on the characteristics of specific entities.
1NF, 2NF, and 3NF
First three normal forms, most commonly used in normalization.
Iterative ER process
The best practice of iteratively defining all entities and their attributes during normalization so that all resulting tables are in 3NF.
Data Redundancy
A situation in a database where data is unnecessarily repeated, leading to potential inconsistencies and inefficiencies.
Data Anomalies
An undesirable situation in a database where inconsistencies arise during insertion, deletion, or update operations due to data redundancy.
Primary Key
To fulfill 1NF, eliminate repeating groups and identify the ___.
2NF
Remove partial dependency to meet this normal form.
3NF
Remove transitive dependency to meet this normal form.
Composite (Bridge) Entity
A table that is used to capture the relationship between two tables after a split, especially in many-to-many relationships.
Boyce-Codd Normal Form (BCNF)
A special case of 3NF that covers specific anomalies 3NF does not handle; also known as 3.5NF.
candidate key
BCNF can be violated only when the table contains more than one ___.
determinant
If a table is in BCNF, every ___ in all functional dependencies must be a complete candidate key.
Denormalization
The process of adding redundancy back into a normalized database to improve performance.
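The normalization and bridge-entity cards above can be sketched concretely. Below is a minimal Python/sqlite3 example using a hypothetical student–course schema (the table and column names are illustrative, not from the source): two entity tables in 3NF plus a composite (bridge) table resolving their many-to-many relationship.

```python
import sqlite3

# Hypothetical schema for illustration: students and courses have a
# many-to-many relationship, resolved by the bridge table "enrollment".
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE course (
        course_id  INTEGER PRIMARY KEY,
        title      TEXT NOT NULL
    );
    -- Composite (bridge) entity: its primary key combines both foreign keys.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)
    );
""")
cur.execute("INSERT INTO student VALUES (1, 'Ada')")
cur.execute("INSERT INTO course VALUES (10, 'Databases')")
cur.execute("INSERT INTO enrollment VALUES (1, 10, 'A')")
row = cur.execute("""
    SELECT s.name, c.title, e.grade
    FROM enrollment e
    JOIN student s ON s.student_id = e.student_id
    JOIN course  c ON c.course_id  = e.course_id
""").fetchone()
print(row)  # ('Ada', 'Databases', 'A')
conn.close()
```

Storing each fact once in its own table removes the redundancy (and the anomalies) that a single wide table would carry.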
CREATE
SQL command used to create new tables, indexes, or database structures.
ALTER
SQL command used to modify existing database structures.
DROP
SQL command used to remove tables, indexes, or other database objects permanently.
INSERT
SQL command used to add new rows of data to a table.
DELETE
SQL command used to remove existing rows from a table.
UPDATE
SQL command used to modify existing data within a table.
SELECT
SQL command used to retrieve data from one or more tables.
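The DDL and DML commands on these cards can be run end to end in one short session. This is a sketch using Python's sqlite3 with a made-up `product` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: build a new table
cur.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
# ALTER: modify the existing structure
cur.execute("ALTER TABLE product ADD COLUMN price REAL")
# INSERT: add new rows of data
cur.execute("INSERT INTO product VALUES (1, 'pen', 1.50)")
cur.execute("INSERT INTO product VALUES (2, 'ink', 4.00)")
# UPDATE: modify existing data
cur.execute("UPDATE product SET price = 1.75 WHERE id = 1")
# DELETE: remove existing rows
cur.execute("DELETE FROM product WHERE id = 2")
# SELECT: retrieve data
rows = cur.execute("SELECT id, name, price FROM product").fetchall()
print(rows)  # [(1, 'pen', 1.75)]
# DROP: remove the table permanently
cur.execute("DROP TABLE product")
conn.close()
```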
COMMIT
SQL command used to save all changes made during the current transaction to the database.
ROLLBACK
SQL command used to revert changes made in the current transaction if an error occurs.
SAVEPOINT
SQL command used to create a temporary save point within a transaction, allowing partial rollbacks.
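COMMIT, ROLLBACK, and SAVEPOINT work together within one transaction. A minimal sketch with sqlite3 (the `account` table and savepoint name are illustrative):

```python
import sqlite3

# isolation_level=None lets us issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
cur.execute("INSERT INTO account VALUES (1, 100)")

cur.execute("BEGIN")
cur.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
cur.execute("SAVEPOINT before_bonus")          # partial-rollback marker
cur.execute("UPDATE account SET balance = balance + 999 WHERE id = 1")
cur.execute("ROLLBACK TO before_bonus")        # undo only the bonus
cur.execute("COMMIT")                          # keep the -30 withdrawal

balance = cur.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 70
conn.close()
```

ROLLBACK TO a savepoint undoes only the work since that marker; the COMMIT then makes the remaining changes permanent.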
GROUP BY
SQL clause used to group rows with the same values in one or more columns into a summary row.
HAVING
SQL clause used to filter the results of a GROUP BY query based on a specified condition.
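GROUP BY and HAVING are easiest to see side by side. A sketch with a made-up `sale` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
cur.executemany("INSERT INTO sale VALUES (?, ?)",
                [("north", 50), ("north", 70), ("south", 20), ("south", 10)])

# GROUP BY collapses rows into one summary row per region; HAVING then
# filters the groups (WHERE cannot, because it runs before grouping).
rows = cur.execute("""
    SELECT region, SUM(amount) AS total
    FROM sale
    GROUP BY region
    HAVING SUM(amount) > 100
""").fetchall()
print(rows)  # [('north', 120)]
conn.close()
```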
Data Privacy
The rights of individuals and organisations to determine access to data about themselves.
Data Governance Model
A framework that outlines the roles, responsibilities, processes, and policies for managing and governing data within an organisation.
Ethics
Moral principles that control or influence a person’s behavior.
Data Privacy
Focuses on protecting personal private data and information.
Data Ethics
Relevant to all data use, regardless of privacy protection or the specific actions taken with the data.
Grant
Used to give user access privileges to a database.
Revoke
Used to revoke authorization, i.e., to take back permissions from the user.
Deny
Explicitly prevents a user from receiving a particular permission.
Need-to-know basis
Principle of information security to minimize the risk of unauthorized disclosure.
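The GRANT/REVOKE/DENY semantics can be modeled in a few lines. This is a toy Python sketch, not a real DBMS security layer; the class and method names are invented for illustration. The key behaviors: access is off by default (need-to-know), and an explicit DENY overrides a GRANT.

```python
# Toy sketch: GRANT adds a permission, REVOKE takes it back,
# DENY records an explicit block that overrides any grant.
class AccessControl:
    def __init__(self):
        self.granted = set()   # (user, permission) pairs
        self.denied = set()

    def grant(self, user, perm):
        self.granted.add((user, perm))

    def revoke(self, user, perm):
        self.granted.discard((user, perm))
        self.denied.discard((user, perm))

    def deny(self, user, perm):
        self.denied.add((user, perm))

    def allowed(self, user, perm):
        # Need-to-know: no access unless granted, and DENY beats GRANT.
        return (user, perm) in self.granted and (user, perm) not in self.denied

acl = AccessControl()
acl.grant("alice", "SELECT")
acl.grant("bob", "SELECT")
acl.deny("bob", "SELECT")
print(acl.allowed("alice", "SELECT"))  # True
print(acl.allowed("bob", "SELECT"))    # False
```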
Big Data
Large and complex sets of raw data (difficult or impossible to capture in ER models).
Volume
Quantity of data to be stored.
Velocity
Speed at which data is entering the system.
Variety
Variations in the structure of the data to be stored.
Structured Data
Any data types that can be clearly defined, stored, accessed, and processed in a fixed format.
Unstructured Data
Anything that cannot be described as structured data.
Semi-Structured
Data that falls in between Structured Data and Unstructured Data.
First-Party Data
Data directly collected from companies’ own websites and apps.
Contextual Advertising
Placing ads based on the content of the website or app being viewed, rather than the user's browsing history.
Hadoop
Open-source framework for storing and analyzing massive amounts of distributed, unstructured data.
HDFS
Hadoop Distributed File System; a low-level distributed file processing system for storing files across networks.
MapReduce
Open-source application programming interface (API) and framework used to process large data sets across clusters.
Name Node
In HDFS, contains file system metadata.
Data Node
In HDFS, stores the actual file data.
Job Tracker
Central control program in MapReduce to accept, distribute, monitor, and report on jobs in a Hadoop environment.
Task Tracker
Program in MapReduce responsible for executing the individual map and reduce tasks assigned by the Job Tracker.
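The map → shuffle → reduce flow can be sketched in a single process (real Hadoop distributes the map and reduce tasks across Task Trackers on many nodes). A toy word-count example:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit one (word, 1) key-value pair per word in the line.
    return [(word, 1) for word in line.split()]

def reduce_phase(key, values):
    # Reduce: combine all values seen for one key.
    return (key, sum(values))

lines = ["big data big ideas", "data data everywhere"]

# Shuffle: group all emitted values by key, as the framework
# does between the map and reduce phases.
grouped = defaultdict(list)
for line in lines:
    for key, value in map_phase(line):
        grouped[key].append(value)

counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # {'big': 2, 'data': 3, 'ideas': 1, 'everywhere': 1}
```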
Data Ingestion Applications
Tools such as Flume and Sqoop that gather data from existing systems and ingest it into Hadoop.
Hive
Sits on top of Hadoop to help create MapReduce jobs, using a SQL-like language called HiveQL.
Pig
Hadoop platform to write MapReduce programs using its own high-level scripting/programming language: Pig Latin.
HBase / Impala
Provide faster query access directly to HDFS without using MapReduce.
NoSQL
Not modeled using relational model / non-SQL / not-only SQL / Non-relational database, developed to address Big Data challenges.
Key-Value Database
NoSQL Database; Stores data as a collection of key-value pairs where keys are similar to primary keys in relational databases.
Column-oriented Databases
NoSQL Database; Blocks hold data from a single column across many rows, with relational logic.
Graph Databases
NoSQL Database; Suitable for relationship-rich data, using a collection of nodes and edges.
Document Databases
NoSQL Database; Stores data in key-value pairs in which the value components are tag-encoded documents (XML, JSON, or BSON).
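The key-value and document models above can be illustrated with a toy sketch (invented class, not a real NoSQL engine): lookups go through a key, as in a key-value database, but each value is a JSON-encoded document carrying its own structure.

```python
import json

# Toy document store: key -> JSON document, no fixed relational schema.
class DocumentStore:
    def __init__(self):
        self._data = {}  # key -> JSON string

    def put(self, key, document):
        self._data[key] = json.dumps(document)

    def get(self, key):
        return json.loads(self._data[key])

store = DocumentStore()
# Documents in the same store need not share the same fields.
store.put("user:1", {"name": "Ada", "tags": ["admin", "dev"]})
doc = store.get("user:1")
print(doc["name"])  # Ada
```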
Challenges Big Data solves
Linear scalability, High throughput, Fault tolerance, Auto recovery, High degree of parallelism, Distributed data processing.