Big Data
Datasets that are too large or complex for traditional data processing applications to handle.
The 3 V's of Big Data
Volume (amount), Velocity (speed of generation), and Variety (types of data like text, video, and audio).
Structured Data
Data that fits into fixed fields or tables (e.g., spreadsheets, CSV files).
Unstructured Data
Data that does not have a predefined model (e.g., social media posts, videos, raw sensor data).
Information vs. Data
Data is the raw, unprocessed facts; information is the meaning or patterns extracted by processing that data.
Scalability
The ability of a system to maintain performance and handle growth as the amount of data increases.
Sequential Processing
A method where tasks are completed one after another in order; slow for Big Data because each task must wait for the previous one to finish.
Parallel Processing
Splitting a large task into smaller parts that are processed simultaneously by multiple processors to save time.
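The split-process-combine idea can be sketched in Python as below. This is a minimal illustration, not a benchmark: the helper names `chunk` and `parallel_sum` are made up for this example, and a thread pool is used only to keep the sketch short. For CPU-heavy work in CPython, a process pool (`ProcessPoolExecutor`) is usually what gives a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(data, n_parts):
    """Split data into at most n_parts slices of roughly equal size."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sum(data, n_workers=4):
    """Sum each slice on a separate worker, then combine the partial sums."""
    parts = chunk(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, parts))
    return sum(partial_sums)

numbers = list(range(1, 1001))
sequential_total = sum(numbers)          # one task, start to finish
parallel_total = parallel_sum(numbers)   # four smaller tasks, then a merge
```

Both approaches give the same total; the parallel version only changes how the work is divided.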
Distributed Systems
A network of independent computers that work together as a single system to process massive datasets.
Cloud Computing
Using remote servers hosted on the internet to store and process data rather than a local server or PC.
Data Cleaning
The process of fixing or removing incomplete, duplicate, or incorrectly formatted records to ensure accuracy.
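A small cleaning pass might look like the following sketch. The records and field names (`name`, `year`) are invented for illustration; real datasets would need rules suited to their own fields.

```python
# Invented sample records with typical problems.
raw_records = [
    {"name": " Ada ", "year": "2024"},   # stray whitespace
    {"name": "Ada", "year": "2024"},     # duplicate once whitespace is trimmed
    {"name": "Grace", "year": None},     # incomplete: missing year
    {"name": "linus", "year": "2023"},   # inconsistent capitalization
]

def clean(records):
    """Drop incomplete and duplicate records and normalize formatting."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec["name"] is None or rec["year"] is None:
            continue  # remove incomplete records
        name = rec["name"].strip().title()  # fix formatting
        year = int(rec["year"])             # store the year as a number
        key = (name, year)
        if key in seen:
            continue  # remove duplicates
        seen.add(key)
        cleaned.append({"name": name, "year": year})
    return cleaned

cleaned_records = clean(raw_records)
```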
Data Filtering
Narrowing down a dataset to a specific subset based on certain criteria (e.g., only looking at data from 2024).
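As a quick sketch of the 2024 example, filtering can be as simple as keeping only the rows that meet a condition. The records and field names here are invented for illustration.

```python
# Invented sample data.
records = [
    {"city": "Lima", "year": 2023, "sales": 120},
    {"city": "Oslo", "year": 2024, "sales": 95},
    {"city": "Pune", "year": 2024, "sales": 143},
]

# Keep only the rows whose year matches the criterion.
only_2024 = [row for row in records if row["year"] == 2024]
```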
Classification
A data mining technique that assigns data into predefined categories (e.g., sorting emails into 'Spam' or 'Inbox').
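One simple way to illustrate classification is a nearest-neighbor rule: a new item gets the label of the most similar labeled example. The sketch below assumes two made-up email features (number of links and number of exclamation marks) and an invented training set; it is not how any real spam filter is implemented.

```python
# Invented labeled examples: (links, exclamation marks) -> category.
training = [
    ((0, 0), "Inbox"),
    ((1, 0), "Inbox"),
    ((5, 4), "Spam"),
    ((7, 6), "Spam"),
]

def squared_distance(a, b):
    """Squared straight-line distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(features):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    nearest = min(training, key=lambda ex: squared_distance(ex[0], features))
    return nearest[1]

label_a = classify((6, 5))  # many links and exclamation marks
label_b = classify((0, 1))  # almost none
```

The key point is that the categories ('Spam', 'Inbox') exist before the new email arrives.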
Clustering
Grouping similar data points together without pre-existing labels to find natural patterns.
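Unlike classification, clustering starts with no labels. The sketch below is a bare-bones k-means with two clusters on one-dimensional data; the function name `kmeans_1d`, the fixed cluster count, and the sample values are all assumptions made for this illustration.

```python
def kmeans_1d(values, iterations=10):
    """Group numbers into two clusters by repeatedly re-centering."""
    centers = [min(values), max(values)]  # simple deterministic starting guess
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the closer of the two centers.
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

values = [1, 2, 3, 20, 21, 22]  # invented data with two obvious groups
clusters = kmeans_1d(values)
```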
Data Visualization
Using charts or graphs to help humans identify trends or patterns in processed data.
Correlation
A statistical relationship where two variables tend to change together (in the same or opposite directions), but one does not necessarily cause the other.
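Correlation is commonly measured with the Pearson coefficient, which ranges from -1 to 1. The sketch below computes it by hand on invented numbers; the variable names are hypothetical, and a likely hidden factor (hot weather) is the reason neither variable should be read as causing the other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)

# Invented data: both rise together, most plausibly because of hot weather.
ice_cream_sales = [10, 20, 30, 40, 50]
sunburn_cases = [3, 5, 7, 9, 11]
r = pearson(ice_cream_sales, sunburn_cases)  # close to 1: strong positive correlation
```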
Causation
A relationship where a change in one event or variable directly produces a change in the other.
Digital Divide
The gap between those who have access to modern technology and the internet and those who do not; it can lead to bias in data when people without access are underrepresented.
Bias in Data
When the data collection method excludes certain groups, leading to results that don't accurately represent the whole population.
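A rough sketch of how this happens: if a survey is run online only, people without internet access are never sampled, and the result can overstate the true rate. The population below is invented for illustration.

```python
# Invented population of 10 people.
population = (
    [{"has_internet": True, "shops_online": True}] * 4
    + [{"has_internet": True, "shops_online": False}] * 2
    + [{"has_internet": False, "shops_online": False}] * 4
)

def rate(people):
    """Fraction of people in the group who shop online."""
    return sum(p["shops_online"] for p in people) / len(people)

true_rate = rate(population)  # everyone is counted

# An online-only survey can reach only people who have internet access.
online_survey = [p for p in population if p["has_internet"]]
survey_rate = rate(online_survey)  # higher than the true rate
```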
Re-identification
The process of matching anonymous data with other available information to discover an individual's identity (a major privacy risk).
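The matching step can be sketched as a join on shared attributes. Everything below is invented for illustration: the two datasets, the names, and the choice of ZIP code, birth year, and sex as the linking fields.

```python
# Invented "anonymous" dataset: names removed, other details kept.
anonymous_health = [
    {"zip": "02139", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip": "94110", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
]

# Invented public dataset that includes names.
public_list = [
    {"name": "Alice Example", "zip": "02139", "birth_year": 1985, "sex": "F"},
    {"name": "Bob Example", "zip": "10001", "birth_year": 1972, "sex": "M"},
]

LINK_FIELDS = ("zip", "birth_year", "sex")  # fields that appear in both datasets

def reidentify(anonymous, public):
    """Return (name, diagnosis) pairs where the linking fields match one person."""
    index = {}
    for person in public:
        key = tuple(person[f] for f in LINK_FIELDS)
        index.setdefault(key, []).append(person["name"])
    matches = []
    for record in anonymous:
        names = index.get(tuple(record[f] for f in LINK_FIELDS), [])
        if len(names) == 1:  # a unique match reveals the identity
            matches.append((names[0], record["diagnosis"]))
    return matches

revealed = reidentify(anonymous_health, public_list)
```

Removing names alone does not protect privacy when the remaining fields are distinctive enough to single someone out.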
Open Data
Publicly available datasets that anyone can access, use, and share (often used by 'Citizen Scientists').