Information Technology - Unit 3: Manipulating Data and Big Data

Description and Tags

Comprehensive practice flashcards covering data integrity, normalization, big data characteristics, storage architectures, and analytics based on the IT Edexcel Unit 3 lecture notes.

Last updated 6:15 AM on 5/25/26

25 Terms

1

What is data manipulation?

The process of changing or reorganizing data so that it is easier to read and is presented in a more meaningful way.

2

How is data integrity defined?

The overall accuracy, completeness, consistency, and reliability of data from the time it is collected and entered to when it is archived.

3

What are the two main types of data integrity?

Physical integrity and logical integrity.

4

What tools can ensure physical integrity against human or natural disasters?

Redundant hardware, an uninterruptible power supply (UPS), radiation hardened chips, and error-correcting memory.

5

What is Entity integrity?

A condition in which all tuples within a table are uniquely identified by their primary key.

6

What is Referential integrity?

A condition where a foreign key value has a match in the corresponding table or the foreign key value is null.
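
As a minimal sketch of entity and referential integrity in practice, the Python below uses the standard `sqlite3` module with an in-memory database. The `customer` and `orders` tables and the function name `demo_integrity` are hypothetical and chosen only for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

def demo_integrity():
    """Return the names of the integrity rules that SQLite enforced."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER REFERENCES customer(customer_id))"
    )
    conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
    conn.execute("INSERT INTO orders VALUES (100, 1)")     # foreign key matches customer 1
    conn.execute("INSERT INTO orders VALUES (101, NULL)")  # a null foreign key is allowed

    enforced = []
    try:
        # Entity integrity: a second row with primary key 1 is rejected.
        conn.execute("INSERT INTO customer VALUES (1, 'Duplicate')")
    except sqlite3.IntegrityError:
        enforced.append("entity")
    try:
        # Referential integrity: customer 999 does not exist.
        conn.execute("INSERT INTO orders VALUES (102, 999)")
    except sqlite3.IntegrityError:
        enforced.append("referential")
    conn.close()
    return enforced
```

Calling `demo_integrity()` should report that both rules were enforced, while the rows with a matching or null foreign key are accepted.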

7

What is the primary difference between data security and data integrity?

Data security refers to the protection of data, while data integrity refers to the trustworthiness of data.

8

Name the five standard ways to validate input to preserve data integrity.

Presence check, format check, length check, type check, and range check.
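
The five checks can be sketched in Python as below. This is only an illustration: the record fields (`name`, `email`, `pin`, `age`), the 18 to 120 age limits, and the simple email pattern are assumptions, not part of the lecture notes.

```python
import re

def validate_member(record):
    """Return the names of the checks that a hypothetical membership record fails."""
    failed = []
    name = record.get("name", "")
    email = record.get("email", "")
    pin = record.get("pin", "")
    age = record.get("age")

    # Presence check: a required field must not be empty.
    if not name.strip():
        failed.append("presence")
    # Format check: the value must match an expected pattern.
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        failed.append("format")
    # Length check: the value must have an allowed number of characters.
    if len(pin) != 4:
        failed.append("length")
    # Type check: the value must be of the expected data type.
    if not isinstance(age, int):
        failed.append("type")
    # Range check: the value must fall between set limits (assumed 18 to 120).
    elif not 18 <= age <= 120:
        failed.append("range")
    return failed
```

The range check runs only when the type check passes, because comparing a non-numeric value against limits would be meaningless.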

9

What are the Four Pillars of Data Governance?

Data Quality, Data Stewardship, Data Protection and Compliance, and Data Management.

10

What is a data dictionary?

A collection of metadata about a database, containing records about its objects, such as data ownership and relationships, usually maintained by database administrators.

11

What is the difference between an Active and a Passive Data Dictionary?

An Active Data Dictionary is part of the database and updates automatically via the DBMS, while a Passive Data Dictionary is held externally and maintained manually or at intervals.
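
One way to see an active data dictionary at work is SQLite's built-in schema table, which the DBMS updates by itself whenever a table is created. The sketch below uses Python's standard `sqlite3` module; the `student` table and the helper name `list_catalog` are hypothetical.

```python
import sqlite3

def list_catalog(conn):
    """Return (object type, name) pairs recorded in SQLite's built-in schema table."""
    return conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY name"
    ).fetchall()

conn = sqlite3.connect(":memory:")
before = list_catalog(conn)  # empty database, so no catalog entries yet

conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
after = list_catalog(conn)   # the DBMS has recorded the new table on its own

# PRAGMA table_info returns column metadata rows:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]
```

Nobody edits the catalog by hand here, which is the point of contrast with a passive data dictionary kept in a separate document or tool.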

12

Define a Check digit.

The last one or two digits in a code used to check that the other digits are correct, commonly used in barcode readers.
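
As a worked example, the sketch below computes the check digit for an EAN-13 barcode, where the first 12 digits are weighted 1, 3, 1, 3, ... from the left and the check digit brings the total up to a multiple of 10. EAN-13 is chosen here only as a familiar barcode scheme; the notes do not name a specific one, and the function names are hypothetical.

```python
def ean13_check_digit(first_twelve):
    """Compute the EAN-13 check digit for a string of 12 digits."""
    if len(first_twelve) != 12 or not first_twelve.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first_twelve))
    # The check digit is whatever is needed to reach the next multiple of 10.
    return (10 - total % 10) % 10

def is_valid_ean13(code):
    """Return True if the 13th digit matches the digit computed from the first 12."""
    return (
        len(code) == 13
        and code.isdigit()
        and ean13_check_digit(code[:12]) == int(code[12])
    )
```

For the digits `400638133393` the weighted sum is 89, so the check digit is 1 and the full code is `4006381333931`. A scanner that reads any other final digit can reject the code as misread.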

13

What is data redundancy?

A data organization issue involving the unnecessary duplication of data, leading to wasted storage and maintenance problems.

14

What are the three common anomalies caused by data redundancy?

Insertion Anomaly, Update Anomaly, and Deletion Anomaly.
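
A small sketch can make the anomalies concrete. The flat `enrolments` table below is made-up data in which each tutor's room is repeated on every row, and the helper `rooms_for` is hypothetical.

```python
# A single flat table: the tutor's room is duplicated on every enrolment row.
enrolments = [
    {"student": "Asha", "course": "IT", "tutor": "Mr Lee", "room": "B12"},
    {"student": "Ben", "course": "IT", "tutor": "Mr Lee", "room": "B12"},
    {"student": "Cara", "course": "Art", "tutor": "Ms Roy", "room": "A03"},
]

def rooms_for(tutor, rows):
    """Return every distinct room recorded for a tutor."""
    return {row["room"] for row in rows if row["tutor"] == tutor}

# Update anomaly: changing the room on only one row leaves two conflicting values.
enrolments[0]["room"] = "C20"
inconsistent = rooms_for("Mr Lee", enrolments)

# Deletion anomaly: removing Cara's only enrolment also loses Ms Roy's room.
remaining = [row for row in enrolments if row["student"] != "Cara"]
lost = rooms_for("Ms Roy", remaining)

# Insertion anomaly: a new tutor with no students yet cannot be recorded
# without inventing a blank student and course.
```

After the partial update, `inconsistent` holds two rooms for the same tutor, and after the deletion, `lost` is empty.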

15

What is the requirement for a table to be in First Normal Form (1NF)?

It must contain no repeating fields or groups of fields, and each field must hold a single (atomic) value.

16

What is required for a table to be in Second Normal Form (2NF)?

It must be in 1NF, and every non-key field must be functionally dependent on the whole primary key, so there are no partial dependencies.

17

What is the requirement for Third Normal Form (3NF)?

It must be in 2NF and have no transitive functional dependencies, meaning no non-key field depends on the primary key only through another non-key field.
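
The three normal forms can be illustrated by decomposing one flat table. In the made-up data below, the primary key is assumed to be (`student_id`, `course`): `student` depends on `student_id` alone and `tutor` on `course` alone (partial dependencies), while `tutor_room` depends on `tutor` (a transitive dependency). The function name `normalize` is hypothetical.

```python
flat = [
    {"student_id": 1, "student": "Asha", "course": "IT", "tutor": "Mr Lee", "tutor_room": "B12"},
    {"student_id": 2, "student": "Ben", "course": "IT", "tutor": "Mr Lee", "tutor_room": "B12"},
    {"student_id": 1, "student": "Asha", "course": "Art", "tutor": "Ms Roy", "tutor_room": "A03"},
]

def normalize(rows):
    """Split the flat rows into tables where each fact is stored once."""
    # 2NF: move fields that depend on only part of the key into their own tables.
    students = {r["student_id"]: r["student"] for r in rows}
    courses = {r["course"]: r["tutor"] for r in rows}
    # 3NF: move the field that depends on a non-key field (tutor) into its own table.
    tutors = {r["tutor"]: r["tutor_room"] for r in rows}
    # What remains is just the composite key linking students to courses.
    enrolments = sorted((r["student_id"], r["course"]) for r in rows)
    return students, courses, tutors, enrolments

students, courses, tutors, enrolments = normalize(flat)
```

Each tutor's room now appears in exactly one place, so changing it is a single update.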

18

Define Big Data.

Extremely large data sets, typically exceeding 1 TB, analyzed computationally to reveal patterns, trends, and associations.

19

What are the five V's of Big Data?

Velocity, Volume, Value, Variety, and Veracity.

20

What is the difference between a Data Lake and a Data Warehouse regarding schema?

A Data Warehouse is schema-on-write (the schema is designed before data is loaded), while a Data Lake is schema-on-read (raw data is stored as-is and a schema is applied at the time of analysis).
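
The contrast can be sketched in a few lines of Python. Everything here is illustrative: the schema `WAREHOUSE_SCHEMA`, the function names, and the use of plain lists as stand-ins for a warehouse table and a lake are assumptions.

```python
import json

# Hypothetical warehouse schema, fixed before any data is loaded.
WAREHOUSE_SCHEMA = {"sensor_id": int, "temperature": float}

def write_to_warehouse(table, record):
    """Schema-on-write: reject a record that does not fit the schema before storing it."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError("unexpected or missing fields")
    for field, expected in WAREHOUSE_SCHEMA.items():
        if not isinstance(record[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    table.append(record)

def write_to_lake(lake, raw_text):
    """Schema-on-read: store the raw text untouched, whatever its shape."""
    lake.append(raw_text)

def read_temperatures(lake):
    """Apply a schema only at analysis time, skipping items that do not fit."""
    readings = []
    for raw in lake:
        try:
            readings.append(float(json.loads(raw)["temperature"]))
        except (ValueError, KeyError, TypeError):
            continue  # not JSON, or no usable temperature field
    return readings
```

The warehouse pays the validation cost on every write, while the lake accepts anything and defers the cost, and the risk of unusable data, to the reader.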

21

What is the core difference between TCP and UDP protocols?

TCP is a reliable transport protocol with integrity checking and guaranteed delivery, while UDP is an unreliable but faster protocol used for time-sensitive transmissions like video streaming.
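
The difference shows up directly in how sockets are used. The sketch below, using Python's standard `socket` module on the loopback address, sends a UDP datagram with no set-up and then sends the same bytes over TCP, which needs a connection first. The function names are hypothetical, and on loopback the UDP datagram will normally arrive even though the protocol does not guarantee it.

```python
import socket

def udp_round_trip(message: bytes) -> bytes:
    """Send one datagram over loopback with no connection set-up."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    receiver.settimeout(2.0)         # UDP gives no delivery guarantee, so do not wait forever
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()

def tcp_round_trip(message: bytes) -> bytes:
    """Send the same bytes over a TCP connection, which is established first."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    listener.settimeout(2.0)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        client.connect(listener.getsockname())  # the TCP handshake happens here
        server_side, _addr = listener.accept()
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # signal that nothing more will be sent
        chunks = []
        while True:
            chunk = server_side.recv(4096)
            if not chunk:  # empty read means the sender has finished
                break
            chunks.append(chunk)
        server_side.close()
        return b"".join(chunks)
    finally:
        client.close()
        listener.close()
```

The UDP path is a single `sendto` call, while the TCP path needs `listen`, `connect`, and `accept` before any data moves, which is part of the extra overhead that buys reliability.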

22

What is Hadoop?

An open source framework based on Java that uses distributed storage and parallel processing to handle Big Data across clusters of hardware.
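
Hadoop's processing model, MapReduce, can be imitated on one machine to show the idea. The sketch below is plain Python and does not use any Hadoop API; the function names are hypothetical, and on a real cluster each chunk would be stored and mapped on a different node in parallel.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(chunks):
    pairs = []
    for chunk in chunks:  # run sequentially here; a cluster would do this in parallel
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))
```

For example, `word_count(["big data big", "data lake"])` counts "big" twice, "data" twice, and "lake" once, even though the words are spread across two chunks.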

23

Compare Lambda and Kappa architectures.

Lambda architecture uses separate layers for batch and streaming, while Kappa architecture uses a unified layer for both.
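
A toy sketch of the two designs, counting page views, is given below. It is a simplification under stated assumptions: lists stand in for stored history and a live stream, and the function names are hypothetical.

```python
def count_events(events):
    """Count how many times each page appears in a sequence of events."""
    counts = {}
    for page in events:
        counts[page] = counts.get(page, 0) + 1
    return counts

def lambda_query(historical, recent):
    """Lambda: merge a batch view of old data with a speed-layer view of new data."""
    batch_view = count_events(historical)  # batch layer, recomputed periodically
    speed_view = count_events(recent)      # speed layer, updated as events arrive
    merged = dict(batch_view)
    for page, n in speed_view.items():
        merged[page] = merged.get(page, 0) + n
    return merged

def kappa_query(event_log):
    """Kappa: one streaming code path processes the whole log, old and new alike."""
    return count_events(event_log)
```

Both designs should give the same answer for the same events. The trade-off is that Lambda maintains two code paths and a merge step, while Kappa keeps one path and replays the log when results need recomputing.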

24

Define Data Mining.

The process of extracting value from data stored in a data warehouse, changing raw data into information using techniques like clustering or regression.
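
As one example of the regression technique mentioned above, the sketch below fits a straight line to data by ordinary least squares in plain Python. The function name `fit_line` and the sample figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (x) against units sold (y).
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

With this data the fitted line is y = 2x + 1, which turns four raw data points into information: each extra unit of spend is associated with two more units sold.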

25

What are the four types of Data Analytics?

Descriptive (what happened), Diagnostic (why it happened), Predictive (what is likely to happen), and Prescriptive (what action to take).