Big Data
Very large collections of structured, semi-structured, and unstructured data, characterized by high volume, high velocity, and high variety, that traditional data processing methods cannot handle efficiently
Big Data Analytics
The use of advanced analytic techniques on very large, diverse datasets drawn from different sources, ranging in size from terabytes to zettabytes
3 V's of Big Data
The three main characteristics of big data: Volume (the amount of data), Velocity (the speed at which data is generated), and Variety (the different types and sources of data)
Structured Data
Data that is organized in a predefined format, typically stored in relational databases and easily searchable
Unstructured Data
Data that does not have a predefined format or organization, such as social media posts, videos, and sensor data
Data Analytics
The process of examining data sets to uncover hidden patterns, correlations, trends, and other useful information to turn raw data into actionable insights
Descriptive Analytics
A type of big data analytics that summarizes historical data into easily readable, interpretable reports and visualizations of information such as company profits and sales, answering the question "what happened"
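As a minimal sketch of descriptive analytics, the Python below totals a few hypothetical sales records by region and by month, the kind of summary that feeds a "what happened" report. The records and the `summarize_by` helper are illustrative assumptions, not part of any particular tool.

```python
from collections import defaultdict

# Hypothetical sales records: (region, month, amount)
sales = [
    ("North", "Jan", 1200.0),
    ("North", "Feb", 950.0),
    ("South", "Jan", 800.0),
    ("South", "Feb", 1100.0),
]

def summarize_by(records, key_index):
    """Total the sale amounts, grouped by the field at key_index."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)

print(summarize_by(sales, 0))  # {'North': 2150.0, 'South': 1900.0}
print(summarize_by(sales, 1))  # {'Jan': 2000.0, 'Feb': 2050.0}
```

In practice the same grouping would usually run in a query engine or BI tool over far more rows; the logic of "group, then aggregate" is the same.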
Diagnostic Analytics
A type of big data analytics that helps companies understand why a problem occurred by mining and drilling down into data to dissect issues and prevent them from recurring, answering the question "why did it happen"
Predictive Analytics
A type of big data analytics that examines past and present data using AI, machine learning, and data mining to forecast future trends and outcomes, answering the question "what will happen"
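Production predictive analytics usually relies on machine learning models, but the core idea of learning from past data to forecast the next value can be sketched with an ordinary least-squares trend line. The sales history below is hypothetical, and a straight-line fit is a deliberately simple stand-in for the AI and machine learning methods mentioned in the definition.

```python
def fit_linear_trend(values):
    """Fit y = a + b*t by ordinary least squares, with t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope

def forecast(values, steps_ahead=1):
    """Extrapolate the fitted trend line steps_ahead periods past the last value."""
    intercept, slope = fit_linear_trend(values)
    t_next = len(values) - 1 + steps_ahead
    return intercept + slope * t_next

monthly_sales = [100.0, 110.0, 120.0, 130.0]  # hypothetical history
print(forecast(monthly_sales))  # 140.0
```

A trend line only captures a steady linear pattern; seasonal or nonlinear behavior is where the machine learning approaches named above become necessary.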
Prescriptive Analytics
A type of big data analytics that recommends actions to solve problems, relying on AI and machine learning to support data-driven decisions such as risk management, answering the question "what should be done about it"
Batch Processing
A data processing method that collects data over time and then processes it in large blocks, suited to cases where a longer turnaround between collecting and analyzing data is acceptable
Stream Processing
A data processing method that analyzes data continuously, one record or small batch at a time as it arrives, shortening the delay between collection and analysis for quicker decision-making; it is more complex and expensive than batch processing
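The contrast between batch and stream processing can be sketched in Python: the batch function waits for the full block of readings before computing an average, while the streaming generator emits an updated average after every reading. This is an illustrative assumption, not how any particular engine works; real stream processors often group records into small micro-batches rather than handling them strictly one at a time, and the readings are hypothetical.

```python
from collections.abc import Iterable, Iterator

def batch_average(readings: list[float]) -> float:
    """Batch style: wait for the whole block of data, then analyze it in one pass."""
    if not readings:
        raise ValueError("no readings to process")
    return sum(readings) / len(readings)

def streaming_averages(readings: Iterable[float]) -> Iterator[float]:
    """Stream style: update the running result as each reading arrives."""
    count = 0
    mean = 0.0
    for x in readings:
        count += 1
        mean += (x - mean) / count  # incremental mean; past readings are not stored
        yield mean

readings = [10.0, 20.0, 30.0, 40.0]  # hypothetical sensor values
print(batch_average(readings))             # 25.0
print(list(streaming_averages(readings)))  # [10.0, 15.0, 20.0, 25.0]
```

Both reach the same final answer; the streaming version simply has a usable result after the first reading, which is the latency advantage the definition describes.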
Data Mining
A big data analysis method that sorts through large datasets to identify patterns, relationships, and anomalies, for example by grouping data into clusters
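Clustering is one common data mining technique. The sketch below runs plain k-means on one-dimensional values; the purchase amounts and starting centroids are hypothetical, and real workloads would typically use a library implementation on multi-dimensional data.

```python
def assign(points, centroids):
    """Assignment step: put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans_1d(points, initial_centroids, max_iterations=100):
    """Plain k-means on 1-D data, starting from the given centroids."""
    centroids = list(initial_centroids)
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged: assignments will no longer change
        centroids = updated
    return centroids, assign(points, centroids)

purchases = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # hypothetical order values
centroids, clusters = kmeans_1d(purchases, [0.0, 5.0])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

A point far from every centroid after convergence is one simple signal of the anomalies the definition mentions.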
Data Lake
A storage system where raw or unstructured data that is too diverse or complex for a warehouse is stored in its native format and tagged with metadata
Data Warehouse
A storage system for large amounts of data collected from many different sources, typically using predefined schemas
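The difference between a data lake and a data warehouse is often described as schema-on-read versus schema-on-write: a warehouse validates data against a predefined schema before storing it, while a lake stores raw data with metadata and interprets it only when queried. The sketch below illustrates that contrast with in-memory lists; the schema, field names, and helper functions are hypothetical.

```python
import json

# Hypothetical predefined warehouse schema: column name -> required Python type.
ORDERS_SCHEMA = {"order_id": int, "amount": float}

def warehouse_insert(table, row):
    """Schema-on-write: reject any row that does not match the schema."""
    if set(row) != set(ORDERS_SCHEMA) or not all(
        isinstance(row[col], typ) for col, typ in ORDERS_SCHEMA.items()
    ):
        raise ValueError(f"row does not match schema: {row!r}")
    table.append(row)

def lake_put(lake, payload, source, fmt):
    """Store the raw bytes untouched, tagged with descriptive metadata."""
    lake.append({"metadata": {"source": source, "format": fmt}, "payload": payload})

def lake_read_json(lake):
    """Schema-on-read: parse payloads only at query time, using metadata to select them."""
    return [json.loads(obj["payload"]) for obj in lake if obj["metadata"]["format"] == "json"]

orders, lake = [], []
warehouse_insert(orders, {"order_id": 1, "amount": 9.99})
try:
    warehouse_insert(orders, {"order_id": "2", "amount": 5.0})
except ValueError as err:
    print("rejected:", err)

lake_put(lake, b'{"clicks": 3}', source="web", fmt="json")
lake_put(lake, b"\x89PNG...", source="camera", fmt="png")
print(lake_read_json(lake))  # [{'clicks': 3}]
```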
Hadoop
An open-source framework that stores and processes big data sets, capable of handling and analyzing both structured and unstructured data
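Hadoop's processing layer follows the MapReduce model: mappers emit key-value pairs from each input split, a shuffle groups them by key, and reducers combine each group. The single-process Python sketch below imitates those three phases with a word count. It is not Hadoop's API; Hadoop distributes the phases across cluster nodes and typically reads its input from HDFS. The function names and input text are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: combine the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data big insights", "data lakes and data warehouses"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(counts["data"])  # 3
print(counts["big"])   # 2
```

Because each mapper only sees its own split and each reducer only sees one key's values, the work can be spread across many machines, which is what lets Hadoop scale.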
Spark
An open-source cluster computing framework used for fast processing and analysis of data, including near-real-time stream processing
NoSQL Databases
Non-relational data management systems ideal for dealing with raw and unstructured data
Distributed Storage
Databases, such as Cassandra, that split data across multiple servers and can identify lost or corrupt data
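Cassandra, for example, places nodes on a token ring and stores each row on several replicas. The sketch below is a much simplified illustration of that idea, not Cassandra's implementation: keys are hashed onto a ring, each key is written to the next `replication_factor` nodes, and a checksum stored with each copy lets the store detect a replica that is missing or corrupted. The class, method, and node names are hypothetical.

```python
import bisect
import hashlib

def ring_position(value: str) -> int:
    """Deterministic position on the ring (MD5 is used only to spread keys, not for security)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ReplicatedStore:
    """Hypothetical toy store: hash-partitioned keys, N replicas, checksummed values."""

    def __init__(self, nodes, replication_factor=2):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.rf = replication_factor
        self.ring = sorted((ring_position(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]
        self.disks = {n: {} for n in nodes}  # simulated per-node storage

    def replicas(self, key: str):
        """Walk clockwise from the key's position and take the next rf distinct nodes."""
        start = bisect.bisect(self.positions, ring_position(key))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def put(self, key: str, data: bytes):
        for node in self.replicas(key):
            self.disks[node][key] = (data, checksum(data))

    def healthy_replicas(self, key: str):
        """Replicas whose copy exists and still matches its stored checksum."""
        healthy = []
        for node in self.replicas(key):
            entry = self.disks[node].get(key)
            if entry is not None and checksum(entry[0]) == entry[1]:
                healthy.append(node)
        return healthy

store = ReplicatedStore(["node-a", "node-b", "node-c"], replication_factor=2)
store.put("user:42", b"profile data")
first, second = store.replicas("user:42")
# Simulate corruption on one replica: the bytes change but the old checksum stays.
store.disks[first]["user:42"] = (b"corrupted", store.disks[first]["user:42"][1])
print(store.healthy_replicas("user:42") == [second])  # True
```

A real system would go on to repair the bad copy from a healthy replica; detecting it is the first step.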
Stream Analytics Tools
Systems, such as Kafka, that filter, aggregate, and analyze data that may be stored on different platforms and in different formats
Data Integration Software
Programs that integrate and streamline big data across different platforms, such as MongoDB, Apache Hadoop, and Amazon EMR
Dirty Data
Data that contains duplicates, errors, missing values, conflicts, and inconsistencies that can obscure and mislead, producing flawed insights
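A minimal data-cleaning sketch, assuming hypothetical contact records with `name` and `email` fields, shows how several kinds of dirty data can be handled in one pass: inconsistent formatting is normalized, rows with missing values or a malformed email are dropped, and duplicates are removed. The "first occurrence wins" rule for conflicting duplicates and the crude `@` check are illustrative assumptions, not a standard.

```python
def clean_contacts(records):
    """Normalize formatting, then drop incomplete, malformed, and duplicate rows."""
    seen_emails = set()
    cleaned = []
    for rec in records:
        # Inconsistencies: normalize whitespace and letter case.
        name = (rec.get("name") or "").strip().title()
        email = (rec.get("email") or "").strip().lower()
        if not name or not email:
            continue  # missing value: a required field is absent
        if "@" not in email:
            continue  # error: malformed email address (crude check)
        if email in seen_emails:
            continue  # duplicate: the first occurrence wins if rows conflict
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned

raw = [
    {"name": "ada lovelace", "email": " ADA@example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},    # duplicate after normalizing
    {"name": "Alan Turing", "email": None},                  # missing value
    {"name": "Grace Hopper", "email": "grace.example.com"},  # malformed email
    {"name": " grace hopper ", "email": "grace@example.com"},
]
print(clean_contacts(raw))
# [{'name': 'Ada Lovelace', 'email': 'ada@example.com'}, {'name': 'Grace Hopper', 'email': 'grace@example.com'}]
```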