Comprehensive practice flashcards covering data integrity, normalization, big data characteristics, storage architectures, and analytics based on the IT Edexcel Unit 3 lecture notes.
What is data manipulation?
The process of changing data (for example by sorting, filtering or merging it) so that it is easier to read or better organized and can be presented in a more meaningful way.
How is data integrity defined?
The overall accuracy, completeness, consistency, and reliability of data from the time it is collected and entered to when it is archived.
What are the two main types of data integrity?
Physical integrity and logical integrity.
What tools can ensure physical integrity against human or natural disasters?
Redundant hardware, an uninterruptible power supply (UPS), radiation-hardened chips, and error-correcting memory.
What is Entity integrity?
A condition in which every tuple (row) in a table is uniquely identified by its primary key, and no primary key value is null.
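A minimal sketch of entity integrity being enforced by a DBMS, assuming Python's built-in sqlite3 module and an in-memory database. The student table, its columns and the try_insert helper are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory database; the "student" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
# NOT NULL is stated explicitly because SQLite, for historical reasons,
# otherwise allows NULL in a non-integer PRIMARY KEY column.
conn.execute(
    "CREATE TABLE student (student_id TEXT NOT NULL PRIMARY KEY, name TEXT)"
)
conn.execute("INSERT INTO student VALUES ('S001', 'Amina')")

def try_insert(student_id, name):
    """Attempt an insert; return True if the DBMS accepted the row."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (student_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("S002", "Ben"))    # True: new, unique key
print(try_insert("S001", "Chloe"))  # False: duplicate primary key
print(try_insert(None, "Dev"))      # False: primary key cannot be null
```

Rejecting the row at insert time means no two tuples can ever share an identifier, which is exactly what entity integrity requires.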
What is Referential integrity?
A condition in which every foreign key value either matches a primary key value in the referenced table or is null.
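A short sketch of referential integrity, again assuming Python's sqlite3 module. The course and enrolment tables and the enrol helper are hypothetical. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless this pragma is set.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE enrolment ("
    " enrolment_id INTEGER PRIMARY KEY,"
    " course_id TEXT REFERENCES course(course_id))"
)
# Hypothetical course row.
conn.execute("INSERT INTO course VALUES ('IT3', 'Information Technology')")

def enrol(course_id):
    """Insert an enrolment; return True if referential integrity allows it."""
    try:
        conn.execute("INSERT INTO enrolment (course_id) VALUES (?)", (course_id,))
        return True
    except sqlite3.IntegrityError:
        return False

print(enrol("IT3"))  # True: the foreign key matches an existing course
print(enrol(None))   # True: a null foreign key is permitted
print(enrol("XX9"))  # False: no matching course, so the row is rejected
```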
What is the primary difference between data security and data integrity?
Data security refers to the protection of data, while data integrity refers to the trustworthiness of data.
Name the five standard ways to validate input to preserve data integrity.
Presence check, format check, length check, type check, and range check.
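The five checks can be sketched as a single validation routine. This is an illustration only: the booking form, its field names and the limits (2 to 30 characters, dd/mm/yyyy dates, 1 to 10 tickets) are all hypothetical assumptions.

```python
import re

def validate_booking(form: dict) -> list[str]:
    """Run the five validation checks on a hypothetical booking form.

    Returns a list of error messages; an empty list means the input passed.
    """
    errors = []

    # Presence check: the name must be supplied and not blank.
    name = str(form.get("name", "")).strip()
    if not name:
        errors.append("name: presence check failed")
    # Length check: only applied once the name is present.
    elif not 2 <= len(name) <= 30:
        errors.append("name: length check failed")

    # Format check: the date must match the pattern dd/mm/yyyy.
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", str(form.get("date", ""))):
        errors.append("date: format check failed")

    # Type check: tickets must be a whole number (bool excluded, as it subclasses int).
    tickets = form.get("tickets")
    if not isinstance(tickets, int) or isinstance(tickets, bool):
        errors.append("tickets: type check failed")
    # Range check: only meaningful once the type is correct.
    elif not 1 <= tickets <= 10:
        errors.append("tickets: range check failed")

    return errors

print(validate_booking({"name": "Amina", "date": "14/03/2025", "tickets": 2}))  # []
print(validate_booking({"name": "", "date": "2025-03-14", "tickets": 12}))
# ['name: presence check failed', 'date: format check failed', 'tickets: range check failed']
```

Ordering matters: the length and range checks only run after the presence and type checks pass, so each field reports the most basic problem first.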
What are the Four Pillars of Data Governance?
Data Quality, Data Stewardship, Data Protection and Compliance, and Data Management.
What is a data dictionary?
A collection of metadata about a database, holding records about its objects, such as data ownership and relationships; it is usually managed by database administrators.
What is the difference between an Active and a Passive Data Dictionary?
An Active Data Dictionary is part of the database and updates automatically via the DBMS, while a Passive Data Dictionary is held externally and maintained manually or at intervals.
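As a loose analogy for an active data dictionary, SQLite keeps its own system catalog (sqlite_master) that the DBMS updates automatically whenever the schema changes. It is only a minimal catalog (it does not record ownership, for example), so treat this as a sketch of the "updates automatically" idea rather than a full data dictionary. The customer table and the tables helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def tables():
    """Read table names from SQLite's built-in catalog (sqlite_master)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]

print(tables())  # []: nothing created yet
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
print(tables())  # ['customer']: the DBMS recorded the new table itself

# Column-level metadata, one tuple per column:
# (position, name, declared type, NOT NULL flag, default value, primary key position)
for column in conn.execute("PRAGMA table_info(customer)"):
    print(column)
```

A passive data dictionary, by contrast, would be a separate document or spreadsheet that someone has to edit by hand after each schema change.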
Define a Check digit.
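The last one or two digits of a code, calculated from the other digits and used to check that they have been entered or read correctly; commonly used with barcodes, where the reader recalculates the check digit on each scan.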
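A worked example, assuming the EAN-13 barcode scheme (which uses a single check digit): the first 12 digits are weighted alternately by 1 and 3 from the left, and the check digit is whatever brings the weighted total up to a multiple of 10. The function names are hypothetical.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit for the first 12 digits of a barcode."""
    if len(first12) != 12 or not first12.isdecimal():
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    # The check digit brings the weighted total up to a multiple of 10.
    return (10 - total % 10) % 10

def is_valid_ean13(code: str) -> bool:
    """Recompute the check digit and compare it with the last digit read."""
    return (len(code) == 13 and code.isdecimal()
            and ean13_check_digit(code[:12]) == int(code[12]))

print(ean13_check_digit("400638133393"))  # 1
print(is_valid_ean13("4006381333931"))    # True
print(is_valid_ean13("4006381333981"))    # False: one digit misread
```

For 400638133393 the weighted total is 89, so the check digit is 10 - 9 = 1; misreading the 3 as an 8 raises the total by 15 and the recomputed digit no longer matches.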
What is data redundancy?
A data organization issue involving the unnecessary duplication of data, leading to wasted storage and maintenance problems.
What are the three common anomalies caused by data redundancy?
Insertion Anomaly, Update Anomaly, and Deletion Anomaly.
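The anomalies can be illustrated with a hypothetical unnormalized table held as a list of Python dicts, where each tutor's room is repeated on every enrolment row. The names and rooms are invented for illustration.

```python
# Hypothetical unnormalized table: the tutor's room is duplicated on every row.
enrolments = [
    {"student": "Amina", "course": "IT", "tutor": "Mr Shah", "tutor_room": "B12"},
    {"student": "Ben", "course": "IT", "tutor": "Mr Shah", "tutor_room": "B12"},
    {"student": "Chloe", "course": "Maths", "tutor": "Ms Lee", "tutor_room": "C03"},
]

# Update anomaly: the room changes, but only one of its copies is updated.
enrolments[0]["tutor_room"] = "B14"
rooms = {row["tutor_room"] for row in enrolments if row["tutor"] == "Mr Shah"}
print(sorted(rooms))  # ['B12', 'B14']: the data now contradicts itself

# Deletion anomaly: removing Chloe's enrolment also loses all record of Ms Lee.
enrolments = [row for row in enrolments if row["student"] != "Chloe"]
print(any(row["tutor"] == "Ms Lee" for row in enrolments))  # False

# Insertion anomaly: a new tutor with no students yet cannot be stored
# without inventing a placeholder student and course.
```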
What is the requirement for a table to be in First Normal Form (1NF)?
Each field must hold a single (atomic) value, and the table must contain no repeating groups of fields or columns.
What is required for a table to be in Second Normal Form (2NF)?
It must be in 1NF, and every non-key field must be fully functionally dependent on the whole primary key, so there are no partial dependencies (fields that depend on only part of a composite key).
What is the requirement for Third Normal Form (3NF)?
It must be in 2NF and have no transitive dependencies: no non-key field may depend on another non-key field rather than directly on the primary key.
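The 1NF to 3NF steps can be sketched on a hypothetical order table, using Python dicts keyed by primary key to stand in for tables. The composite key is (order_id, product_id); product_name depends only on product_id and customer_id only on order_id (partial dependencies), while customer_name depends on customer_id (a transitive dependency).

```python
# Hypothetical 1NF table; the primary key is (order_id, product_id).
rows = [
    # order_id, product_id, quantity, product_name, customer_id, customer_name
    (1, "P1", 2, "Mouse", "C1", "Amina"),
    (1, "P2", 1, "Keyboard", "C1", "Amina"),
    (2, "P1", 5, "Mouse", "C2", "Ben"),
]

# 2NF: keep only fields that depend on the whole key in the original table...
order_lines = {(o, p): qty for o, p, qty, *_ in rows}
# ...and move partially dependent fields into tables keyed by that part of the key.
products = {p: name for _, p, _, name, _, _ in rows}            # depends on product_id
orders_2nf = {o: (c, cname) for o, _, _, _, c, cname in rows}   # depends on order_id

# 3NF: order_id -> customer_id -> customer_name is transitive, so split it out.
orders = {o: c for o, (c, _) in orders_2nf.items()}
customers = {c: cname for c, cname in orders_2nf.values()}

print(order_lines)  # {(1, 'P1'): 2, (1, 'P2'): 1, (2, 'P1'): 5}
print(products)     # {'P1': 'Mouse', 'P2': 'Keyboard'}
print(orders)       # {1: 'C1', 2: 'C2'}
print(customers)    # {'C1': 'Amina', 'C2': 'Ben'}
```

After the split, each fact (a product's name, a customer's name) is stored exactly once, which removes the redundancy behind the update, insertion and deletion anomalies.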
Define Big Data.
Extremely large data sets, typically exceeding 1 TB, analyzed computationally to reveal patterns, trends, and associations.
What are the five V's of Big Data?
Velocity, Volume, Value, Variety, and Veracity.
What is the difference between a Data Lake and a Data Warehouse regarding schema?
A Data Warehouse is schema-on-write (the schema is designed before data is loaded, and data must conform to it), while a Data Lake is schema-on-read (raw data is stored as-is and a schema is applied when it is read for analysis).
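A toy sketch of the difference, assuming a hypothetical sensor schema and plain Python lists standing in for the two stores. The warehouse validates each record before storing it; the lake stores raw text and only applies the schema (here by type conversion) when the data is read.

```python
import json

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"sensor_id": str, "temp_c": float}

warehouse = []  # schema-on-write: only conforming records are stored
lake = []       # schema-on-read: raw records are stored as-is

def write_to_warehouse(record: dict) -> bool:
    """Validate against SCHEMA before storing; reject non-conforming records."""
    if set(record) != set(SCHEMA) or not all(
        isinstance(record[k], t) for k, t in SCHEMA.items()
    ):
        return False
    warehouse.append(record)
    return True

def write_to_lake(raw: str) -> None:
    """Store the raw text with no validation at all."""
    lake.append(raw)

def read_from_lake() -> list[dict]:
    """Apply the schema only now, at analysis time, skipping records that don't fit."""
    rows = []
    for raw in lake:
        try:
            record = json.loads(raw)
            rows.append({k: t(record[k]) for k, t in SCHEMA.items()})
        except (ValueError, KeyError, TypeError):
            continue
    return rows

for raw in ['{"sensor_id": "A1", "temp_c": 21.5}',
            '{"sensor_id": "A2", "temp_c": "19"}',
            'not json']:
    write_to_lake(raw)
    try:
        write_to_warehouse(json.loads(raw))
    except json.JSONDecodeError:
        pass  # unparseable input never reaches the warehouse

print(len(warehouse), len(lake))  # 1 3
print(read_from_lake())
```

The second record is rejected by the warehouse (its temperature is a string) but is still usable from the lake, because the read-time schema converts "19" to 19.0.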
What is the core difference between TCP and UDP protocols?
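TCP is a connection-oriented, reliable transport protocol: it uses error checking, acknowledgements and retransmission so that data arrives complete and in order. UDP is connectionless and unreliable (no acknowledgements or retransmission) but faster, so it is used for time-sensitive transmissions such as video streaming.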
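A minimal sketch of the two socket APIs in Python's standard socket module, run over the loopback interface. The payloads are hypothetical. Loopback traffic is very unlikely to be lost, so this shows the difference in how each protocol is used (datagrams with no connection versus an established byte stream), not packet loss itself.

```python
import socket

# UDP: connectionless datagrams; no handshake, no acknowledgements.
udp_rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_rx.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
udp_rx.settimeout(2)
udp_tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_tx.sendto(b"frame-1", udp_rx.getsockname())  # fire and forget
udp_data, _ = udp_rx.recvfrom(1024)

# TCP: a connection is established first; the OS then acknowledges,
# retransmits and orders bytes so the stream arrives intact.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
client = socket.create_connection(server.getsockname())  # handshake happens here
conn, _ = server.accept()
conn.settimeout(2)
client.sendall(b"invoice-42")
client.shutdown(socket.SHUT_WR)  # signal that no more data will be sent
chunks = []
while chunk := conn.recv(1024):  # recv returns b"" at end of stream
    chunks.append(chunk)
tcp_data = b"".join(chunks)

for s in (udp_rx, udp_tx, server, client, conn):
    s.close()

print(udp_data, tcp_data)  # b'frame-1' b'invoice-42'
```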
What is Hadoop?
An open-source, Java-based framework that uses distributed storage (HDFS) and parallel processing (MapReduce) to handle Big Data across clusters of commodity hardware.
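The parallel-processing model Hadoop is built around (MapReduce) can be sketched in plain Python. This is not Hadoop's API: the strings stand in for file blocks stored on different nodes, a thread pool stands in for mappers running on separate machines, and the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(block: str) -> list[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one block of input."""
    return [(word.lower(), 1) for word in block.split()]

def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Group every emitted value by key, as the framework does between phases."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key."""
    return key, sum(values)

# Each string stands in for a block of a file held on a different node.
blocks = ["big data big value", "data lake data warehouse", "Big Data"]

with ThreadPoolExecutor() as pool:  # mappers run in parallel
    mapped = list(pool.map(map_phase, blocks))
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 3, 'data': 4, 'value': 1, 'lake': 1, 'warehouse': 1}
```

Because each mapper only sees its own block, the work scales out by adding nodes; only the shuffle step needs to move data between them.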
Compare Lambda and Kappa architectures.
Lambda architecture uses separate batch and speed (streaming) layers whose results are merged for queries, while Kappa architecture processes all data through a single stream-processing layer.
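A toy comparison, assuming a hypothetical log of page-view events and a simple count per page as the query. In the Lambda sketch a batch layer recomputes from the stored history while a speed layer covers recent events, and the two views are merged; in the Kappa sketch one streaming function handles everything, and reprocessing simply means replaying the log through it.

```python
from collections import Counter

# Hypothetical event log: each event is a page view.
events = ["home", "shop", "home", "cart", "home", "shop"]

# --- Lambda: two processing paths whose results are merged at query time ---
def batch_layer(history):
    """Periodically recompute a complete view from the whole master dataset."""
    return Counter(history)

def speed_layer(recent):
    """Incrementally count only events that arrived since the last batch run."""
    view = Counter()
    for page in recent:
        view[page] += 1
    return view

last_batch = 4  # assume events[:4] were covered by the last batch run
lambda_result = batch_layer(events[:last_batch]) + speed_layer(events[last_batch:])

# --- Kappa: one streaming path; reprocessing means replaying the log ---
def stream_processor(log):
    """Fold over the event log one event at a time."""
    view = Counter()
    for page in log:
        view[page] += 1
    return view

kappa_result = stream_processor(events)

print(lambda_result == kappa_result)  # True: same answer, different architecture
```

The Lambda version needs the counting logic written twice (batch and speed); avoiding that duplication is a commonly cited reason for choosing Kappa.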
Define Data Mining.
The process of extracting value from data stored in a data warehouse, turning raw data into useful information using techniques such as clustering or regression.
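Regression, one of the techniques named above, can be sketched as an ordinary least-squares fit of a straight line in plain Python. The advertising-spend data and the fit_line name are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: advertising spend (in £1000s) against units sold.
spend = [1, 2, 3, 4, 5]
units = [12, 19, 29, 37, 45]
slope, intercept = fit_line(spend, units)
print(f"units = {slope:.1f} * spend + {intercept:.1f}")  # units = 8.4 * spend + 3.2
print(round(slope * 6 + intercept, 1))  # 53.6 (predicted units at 6 thousand spend)
```

The fitted line turns raw sales records into information (about 8.4 extra units per £1000 of spend) that can also be used for a predictive estimate.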
What are the four types of Data Analytics?
Descriptive (history), Diagnostic (why it happened), Predictive (future forecasts), and Prescriptive (recommended actions).