
chapter 8

Flashcard 1

Q: What is Analytics? 

A: Processes, technologies, frameworks, and algorithms used to extract meaningful, actionable insights from data.


Flashcard 2

Q: What are the Seven Giants in the context of big data analytics? 

A: Basic Statistics, Generalized N-Body Problems, Linear Algebraic Computations, Graph-Theoretic Computations, Optimization, Integration, and Alignment Problems.


Flashcard 3

Q: Name the Basic Statistics used in analytics. 

A: Mean, Median, Variance, Counts, Top-N, Distinct.
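The Basic Statistics listed above take only a few lines to compute. The sketch below is a minimal, single-machine illustration in Python that assumes the data fits in a list. It treats Top-N as the N largest values and variance as the sample variance; both are assumptions. At big data scale, a distributed framework would compute the same statistics instead.

```python
import heapq
import statistics


def basic_stats(values, n=3):
    """Compute the Basic Statistics for a list of numbers held in memory."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1 denominator)
        "count": len(values),
        "top_n": heapq.nlargest(n, values),  # assumption: Top-N = the N largest values
        "distinct": len(set(values)),  # number of distinct values
    }


print(basic_stats([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]))
```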


Flashcard 4

Q: What is Descriptive Analytics? 

A: Analyzes past data and presents it in a summarized form to answer "What happened?"


Flashcard 5

Q: What do Diagnostic, Predictive, and Prescriptive Analytics each aim to do?

A: They represent different levels of data analysis:

  • Diagnostic: Understand why something happened.

  • Predictive: Forecast what is likely to happen.

  • Prescriptive: Determine what actions to take to achieve desired outcomes.


Flashcard 6

Q: What are Generalized N-Body Problems in analytics? 

A: Computational tasks based on distances, kernels, or similarities between pairs of points, such as nearest-neighbor search, clustering, and kernel SVMs.
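Here is a minimal sketch of a Generalized N-Body computation. It computes all pairwise Euclidean distances and runs a brute-force k-nearest-neighbor search. The function names are illustrative and do not come from any particular library. The all-pairs loop grows quadratically with the number of points, which is why these problems become expensive on large datasets.

```python
import math


def pairwise_distances(points):
    """All-pairs Euclidean distance matrix: O(n^2) distance evaluations."""
    return [[math.dist(p, q) for q in points] for p in points]


def k_nearest(points, query, k=1):
    """Brute-force k-nearest-neighbor search: measure the query against every point."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]


points = [(0, 0), (1, 1), (5, 5), (6, 5)]
print(k_nearest(points, (0.9, 1.2), k=2))  # the two points closest to the query
```

`math.dist` requires Python 3.8 or later.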


Flashcard 7

Q: What are Linear Algebraic Computations used for in analytics? 

A: Matrix and vector computations such as general linear algebra routines, Linear Regression, and Principal Component Analysis (PCA), used for Descriptive, Diagnostic, and Predictive Analytics.
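Linear regression is one of the listed computations. The minimal sketch below assumes a single input feature and fits a line by ordinary least squares, using the closed-form slope and intercept formulas. Regression on many features, and PCA, would normally use a matrix library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```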


Flashcard 8

Q: Define Graph-Theoretic Computations in analytics.

A: Computations involving graph search, betweenness, centrality, commute distance, shortest path, and minimum spanning tree for Diagnostic, Predictive, and Prescriptive Analytics.
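Shortest path is one of the listed graph computations. The sketch below uses Dijkstra's algorithm, one standard shortest-path method. It runs on a small weighted graph stored as a dict of dicts, which is an assumed representation, and it requires non-negative edge weights.

```python
import heapq


def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm: shortest distance from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]  # (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry; a shorter path to this node was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
print(shortest_path_lengths(graph, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```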


Flashcard 9

Q: What is Optimization in the context of analytics? 

A: Techniques like Minimization, Maximization, Linear Programming, Quadratic Programming, and Gradient Descent used in Prescriptive Analytics.
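Gradient descent, one of the listed techniques, minimizes a function by repeatedly stepping against its gradient. The minimal sketch below uses a hypothetical objective, f(x, y) = (x - 3)^2 + (y + 1)^2, whose minimum is at (3, -1). It also assumes a fixed learning rate and step count.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=500):
    """Minimize a function by repeatedly stepping against its gradient."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        point = [p - learning_rate * gi for p, gi in zip(point, g)]
    return point


def grad_f(point):
    """Gradient of the example objective f(x, y) = (x - 3)^2 + (y + 1)^2."""
    x, y = point
    return [2 * (x - 3), 2 * (y + 1)]


print(gradient_descent(grad_f, start=[0.0, 0.0]))  # approaches [3, -1]
```

If the learning rate is too large, the updates overshoot and can diverge, so it is usually tuned for each problem.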


Flashcard 10

Q: What does Integration refer to in analytics? 

A: Methods such as Bayesian Inference, Expectations, and Markov Chain Monte Carlo used for Predictive and Prescriptive Analytics.
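Markov Chain Monte Carlo estimates expectations (integrals) by drawing correlated samples from a distribution. The sketch below is a random-walk Metropolis sampler, one of the simplest MCMC methods. It targets a standard normal so that the result can be checked: the estimate of E[X^2] should come out close to 1. The step size, burn-in length, and seed are illustrative assumptions.

```python
import math
import random


def metropolis(log_density, n_samples, step=1.0, start=0.0, burn_in=1000, seed=0):
    """Random-walk Metropolis sampler; log_density may be unnormalized."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    x = start
    samples = []
    for i in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        log_ratio = log_density(proposal) - log_density(x)
        # Accept with probability min(1, p(proposal) / p(x)), computed in log space
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        if i >= burn_in:  # discard the warm-up portion of the chain
            samples.append(x)
    return samples


def log_std_normal(x):
    return -0.5 * x * x  # log of an unnormalized N(0, 1) density


samples = metropolis(log_std_normal, n_samples=50_000)
estimate = sum(s * s for s in samples) / len(samples)  # Monte Carlo estimate of E[X^2]
print(round(estimate, 2))  # should be close to 1.0
```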


Flashcard 11

Q: What are Alignment Problems in analytics? 

A: Tasks such as finding matches between datasets (text, images, sequences), along with Hidden Markov Models, used for Predictive and Prescriptive Analytics.
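Levenshtein edit distance is one simple alignment technique for text or sequences. It counts the minimum number of insertions, deletions, and substitutions needed to turn one sequence into another. The minimal dynamic-programming sketch below does not cover Hidden Markov Models, which align sequences probabilistically.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences via dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```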


Flashcard 12

Q: What are the Types of Analytics? 

A: Descriptive, Diagnostic, Predictive, and Prescriptive Analytics.


Flashcard 13

Q: What does Descriptive Analytics aim to answer? 

A: "What happened?" by analyzing past data and presenting it in a summarized form using reports and alerts.


Flashcard 14

Q: How does Diagnostic Analytics differ from Descriptive Analytics? 

A: It seeks to understand "Why did it happen?" by analyzing past data through queries and data mining.


Flashcard 15

Q: What question does Predictive Analytics aim to answer? 

A: "What is likely to happen?" by predicting future events based on patterns and trained models using forecasts and simulations.



Flashcard 16

Q: What is the goal of Prescriptive Analytics?

A: To determine "What can we do to make it happen?" by using predictive analyses to figure out the best course of action through planning and optimization.


Flashcard 17

Q: How is Big Data defined? 

A: Collections of datasets so large that they are difficult to manage, process, and analyze using traditional means.


Flashcard 18

Q: According to IBM, how much data is created every day? 

A: 2.5 quintillion bytes of data.


Flashcard 19

Q: What is Big Data Analytics?

A: The collection, storage, processing, and analysis of massive-scale data, involving steps like data cleansing, munging, processing, and visualization.


Flashcard 20

Q: Why are special tools needed for Big Data Analytics? 

A: Because the volume, velocity, and variety of the data make it difficult to store, process, and analyze on a single machine.



Flashcard 21

Q: List some Big Data examples.

A:

  • Data from social networks (text, images, audio, video)

  • Click-stream data from web applications

  • Machine sensor data from industrial and energy systems

  • Healthcare data from electronic health records (EHR)

  • Logs from web applications

  • Stock market data

  • Transactional data from banking and financial applications


Flashcard 22

Q: What are the Five V's of Big Data?

A: Volume, Velocity, Variety, Veracity, Value.


Flashcard 23

Q: Define Volume in Big Data characteristics. 

A: The large size of data that requires specialized tools and frameworks for storage, processing, and analysis.


Flashcard 24

Q: What does Velocity refer to in Big Data? 

A: The speed at which data is generated and needs to be processed, often in real time.


Flashcard 25

Q: Explain Variety in Big Data. 

A: The different forms of data, including structured, unstructured, and semi-structured data like text, images, audio, video, and sensor data.


Flashcard 26

Q: What is Veracity in the context of Big Data? 

A: The accuracy and trustworthiness of the data, which often requires cleaning to remove noise and ensure quality.


Flashcard 27

Q: What does Value signify in Big Data characteristics? 

A: The usefulness of data for its intended purpose, aiming to extract meaningful insights and benefits.


Flashcard 28

Q: What are the main steps in the Analytic Flow for Big Data?

A:

  1. Data Collection

  2. Data Preparation

  3. Analysis Types

  4. Analysis Modes

  5. Visualizations


Flashcard 29

Q: What activities are involved in Data Preparation?

A: Cleaning the data by fixing corrupt records, handling missing values, removing duplicates, standardizing abbreviations and units, and correcting typos, misspellings, and formatting errors.
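The cleaning steps above can be sketched on a few records. Everything in this example is hypothetical: the field names, the city-abbreviation table, and the decision to drop corrupt or missing weights instead of imputing them.

```python
# Hypothetical abbreviation table and unit factor for this illustration
CITY_ABBREVIATIONS = {"NYC": "New York", "SF": "San Francisco"}
LB_TO_KG = 0.45359237


def clean_records(records):
    """Drop corrupt/missing values, standardize text and units, and remove duplicates."""
    cleaned, seen = [], set()
    for rec in records:
        try:
            weight = float(rec["weight"])
        except (KeyError, TypeError, ValueError):
            continue  # corrupt or missing weight: dropped here (imputation is an alternative)
        if str(rec.get("unit", "kg")).lower() in ("lb", "lbs"):
            weight *= LB_TO_KG  # standardize units to kilograms
        name = " ".join(str(rec.get("name", "")).split()).title()  # fix spacing and casing
        city = str(rec.get("city", "")).strip()
        city = CITY_ABBREVIATIONS.get(city.upper(), city.title())  # expand abbreviations
        row = (name, city, round(weight, 2))
        if row not in seen:  # remove duplicates after normalization
            seen.add(row)
            cleaned.append({"name": name, "city": city, "weight_kg": row[2]})
    return cleaned


raw = [
    {"name": "alice  smith", "city": "nyc", "weight": "60", "unit": "kg"},
    {"name": "Alice Smith", "city": "NYC", "weight": 60, "unit": "kg"},  # duplicate
    {"name": "Bob", "city": "sf", "weight": "220.462", "unit": "lb"},
    {"name": "Carol", "city": "SF", "weight": "n/a"},  # corrupt value
    {"name": "Dan", "city": "SF"},  # missing value
]
print(clean_records(raw))
```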


Flashcard 30

Q: What are the Analysis Modes in Big Data Analytics? 

A: Batch, real-time, or interactive analysis.


Flashcard 31

Q: What is the Big Data Stack?

A: A layered framework consisting of Raw Data Sources, Data Access Connectors, Data Storage, Batch Analytics, Real-Time Analytics, Interactive Querying, and Serving Database/Web & Visualization Frameworks.


Flashcard 32

Q: What are Raw Data Sources in the Big Data Stack? 

A: The origins from which data is captured.


Flashcard 33

Q: What are Data Access Connectors?

A: Tools and frameworks used for collecting data from various sources.


Flashcard 34

Q: Where is data stored in the Big Data Stack? 

A: In distributed file systems and NoSQL databases.


Flashcard 35

Q: What is Batch Analytics?

A: Analyzing data in large chunks or batches, typically not in real time.
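MapReduce, as used in Hadoop, is one widely used batch-processing model. The sketch below imitates its map, shuffle, and reduce phases inside a single Python process to count words across a batch of text chunks. A real framework would run each phase in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_outputs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key (here, by summing counts)."""
    return {key: sum(values) for key, values in groups.items()}


def batch_word_count(chunks):
    return reduce_phase(shuffle_phase(map_phase(c) for c in chunks))


print(batch_word_count(["the cat sat", "The dog sat"]))
```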



Flashcard 36

Q: What are examples of Real-Time Analytics tools? 

A: Apache Storm and Spark Streaming.


Flashcard 37

Q: What does Interactive Querying involve? 

A: Using SQL-like languages to perform queries on data interactively.
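The sketch below uses Python's built-in sqlite3 module with an in-memory table as a stand-in for SQL-like engines such as Hive or Spark SQL. The clicks table and its columns are hypothetical.

```python
import sqlite3


def top_pages(rows, limit=2):
    """Load (user_id, page) click rows into SQLite and query the most-visited pages."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE clicks (user_id TEXT, page TEXT)")
        conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
        query = """
            SELECT page, COUNT(*) AS hits
            FROM clicks
            GROUP BY page
            ORDER BY hits DESC, page
            LIMIT ?
        """
        return conn.execute(query, (limit,)).fetchall()
    finally:
        conn.close()


clicks = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart"),
          ("u3", "/home"), ("u2", "/cart"), ("u3", "/about")]
print(top_pages(clicks))  # [('/home', 3), ('/cart', 2)]
```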


Flashcard 38

Q: What are Serving Database, Web & Visualization Frameworks used for? 

A: To present and visualize the analyzed data for end-users.


Flashcard 39

Q: Describe the Alpha Pattern in Analytic Patterns. 

A: Batch Analysis used to ingest large amounts of data.


Flashcard 40

Q: What is the Beta Pattern in Analytic Patterns? 

A: Real-Time Analysis focused on ingesting streaming data.





Flashcard 41

Q: Explain the Gamma Pattern in Analytic Patterns. 

A: Combines Batch and Real-Time Analysis by ingesting streaming data into the big data stack.


Flashcard 42

Q: What is the Delta Pattern in Analytic Patterns? 

A: Interactive Querying: data is ingested through source-sink connectors or SQL connectors and then queried with SQL-like languages.


Flashcard 43

Q: What are Visualizations in the Analytic Flow? 

A: Tools used to present data visually, which can be static, dynamic, or interactive.


Flashcard 44

Q: Why is Data Cleansing important in Big Data Analytics? 

A: To remove noise and ensure data quality, which is essential for extracting accurate insights.


Flashcard 45

Q: What types of data does Big Data encompass? 

A: Structured, unstructured, and semi-structured data.





Flashcard 46

Q: How does Big Data Analytics handle the Variety of data? 

A: By using specialized tools and frameworks that can process different data formats like text, images, audio, video, and sensor data.


Flashcard 47

Q: What role do NoSQL databases play in the Big Data Stack? 

A: They provide scalable and flexible storage solutions for large and diverse datasets.


Flashcard 48

Q: What is the primary goal of any Big Data Analytics system? 

A: To extract value from the data by uncovering meaningful insights and supporting decision-making.


Flashcard 49

Q: How does Real-Time Analytics differ from Batch Analytics?

A: Real-Time Analytics processes data as it arrives, enabling immediate insights, whereas Batch Analytics processes data in large groups at scheduled intervals.
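Computing a mean shows the difference. A batch job sees the whole dataset at once, while a streaming job updates its answer incrementally as each value arrives. Below is a minimal sketch of both, using an incremental mean update for the streaming case.

```python
class RunningMean:
    """Streaming mean: O(1) memory, updated as each value arrives."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count  # incremental update
        return self.mean


def batch_mean(values):
    """Batch mean: needs the whole dataset before producing an answer."""
    return sum(values) / len(values)


stream = RunningMean()
print([stream.update(v) for v in [10, 20, 30, 40]])  # [10.0, 15.0, 20.0, 25.0]
print(batch_mean([10, 20, 30, 40]))  # 25.0
```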


Flashcard 50

Q: What are Reports and Alerts used for in Descriptive Analytics? 

A: To present summarized data and notify users of important events or thresholds.
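The minimal sketch below shows both ideas: a report that summarizes past readings for each sensor, and alerts for readings above a threshold. The sensor names, data layout, and threshold are hypothetical.

```python
from statistics import mean


def report_and_alerts(readings, threshold):
    """Summarize each sensor's past readings and flag any that exceed the threshold."""
    report, alerts = {}, []
    for sensor, values in readings.items():
        peak = max(values)
        report[sensor] = {"count": len(values), "mean": mean(values), "max": peak}
        if peak > threshold:
            alerts.append(f"ALERT: {sensor} reached {peak}, above threshold {threshold}")
    return report, alerts


report, alerts = report_and_alerts({"s1": [20, 22, 21], "s2": [30, 45, 50]}, threshold=40)
print(report)
print(alerts)  # ['ALERT: s2 reached 50, above threshold 40']
```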
