Chapter 8
Q: What is Analytics?
A: Processes, technologies, frameworks, and algorithms used to extract meaningful, actionable insights from data.
Q: What are the Seven Giants in the context of big data analytics?
A: Basic Statistics, Generalized N-Body Problems, Linear Algebraic Computations, Graph-Theoretic Computations, Optimization, Integration, and Alignment Problems.
Q: Name the Basic Statistics used in analytics.
A: Mean, Median, Variance, Counts, Top-N, Distinct.
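As a rough illustration, the basic statistics listed above can be computed with Python's standard library. The `ratings` sample and the `summary` dictionary are made-up names for this sketch, not part of the chapter.

```python
from collections import Counter
from statistics import mean, median, pvariance

# Hypothetical sample data, invented for this sketch.
ratings = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": mean(ratings),
    "median": median(ratings),
    "variance": pvariance(ratings),             # population variance
    "count": len(ratings),
    "distinct": len(set(ratings)),              # number of distinct values
    "top_2": Counter(ratings).most_common(2),   # Top-N with N = 2
}
```

On a real big data system the same quantities would be computed in a distributed fashion; this single-machine version only shows what each statistic means.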
Q: What is Descriptive Analytics?
A: Analyzes past data and presents it in a summarized form to answer "What happened?"
Q: What types of analytics are Diagnostic, Predictive, and Prescriptive?
A: They represent different levels of data analysis:
Diagnostic: Understand why something happened.
Predictive: Forecast what is likely to happen.
Prescriptive: Determine what actions to take to achieve desired outcomes.
Q: What are Generalized N-Body Problems in analytics?
A: Computations over pairs of points, such as distances, kernels, and similarities, which underlie nearest-neighbor search, clustering, and kernel SVMs.
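A minimal sketch of one such task, a brute-force nearest-neighbor search over pairwise Euclidean distances. The `points` data and the function name `nearest_neighbor` are assumptions made for illustration only.

```python
import math

# Hypothetical 2-D points, invented for this sketch.
points = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 1.0)}

def nearest_neighbor(name, pts):
    """Return the key of the point closest to pts[name] (brute force)."""
    others = (other for other in pts if other != name)
    return min(others, key=lambda other: math.dist(pts[name], pts[other]))
```

The brute-force search compares every pair of points, which is why these problems become expensive at scale and are usually handled with specialized index structures or distributed computation.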
Q: What are Linear Algebraic Computations used for in analytics?
A: Linear-algebra-based methods such as linear regression and Principal Component Analysis (PCA), used for Descriptive, Diagnostic, and Predictive Analytics.
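To make linear regression concrete, here is a small sketch of ordinary least squares for a single input variable, using the closed-form slope and intercept. The function name `fit_line` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations divided by sum of squared x-deviations gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

With many input variables the same idea is expressed with matrices, which is where the linear algebra comes in.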
Q: Define Graph-Theoretic Computations in analytics.
A: Computations involving graph search, betweenness, centrality, commute distance, shortest path, and minimum spanning tree for Diagnostic, Predictive, and Prescriptive Analytics.
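As one example of a graph-theoretic computation, the sketch below finds a shortest path in an unweighted graph with breadth-first search. The adjacency-list `graph` and the function name `shortest_path` are assumptions for illustration.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Return a shortest path (fewest edges) from start to goal, or None."""
    previous = {start: None}   # maps each visited node to its predecessor
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk predecessors back to start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical directed graph as an adjacency list.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
```

Breadth-first search only applies when every edge has the same cost; weighted graphs would need an algorithm such as Dijkstra's.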
Q: What is Optimization in the context of analytics?
A: Techniques like Minimization, Maximization, Linear Programming, Quadratic Programming, and Gradient Descent used in Prescriptive Analytics.
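A minimal sketch of gradient descent, assuming a simple one-dimensional objective f(x) = (x - 3)^2 chosen only for illustration. The function name, learning rate, and step count are hypothetical choices.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# The gradient of f(x) = (x - 3)**2 is 2 * (x - 3); the minimum is at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Each step moves a fraction of the way toward the minimum, so the result approaches 3 without necessarily reaching it exactly.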
Q: What does Integration refer to in analytics?
A: Methods such as Bayesian Inference, Expectations, and Markov Chain Monte Carlo used for Predictive and Prescriptive Analytics.
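To illustrate computing an expectation by sampling, the sketch below uses plain Monte Carlo integration (not Markov Chain Monte Carlo, which is needed when direct sampling is not possible). The function name, sample count, and seed are assumptions.

```python
import random

def monte_carlo_expectation(f, sampler, n_samples=100_000, seed=0):
    """Estimate E[f(X)] by averaging f over random draws of X."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(n_samples):
        total += f(sampler(rng))
    return total / n_samples

# For X uniform on [0, 1), E[X**2] is the integral of x**2 from 0 to 1, i.e. 1/3.
estimate = monte_carlo_expectation(lambda x: x * x, lambda rng: rng.random())
```

The estimate should be close to 1/3, and its error shrinks roughly with the square root of the number of samples.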
Q: What are Alignment Problems in analytics?
A: Tasks that match items between data sets (text, images, sequences), along with techniques such as Hidden Markov Models, used for Predictive and Prescriptive Analytics.
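One simple instance of matching two sequences is edit (Levenshtein) distance, sketched below with dynamic programming. It is offered as an illustrative alignment measure, not as the specific method the chapter has in mind; the function name `edit_distance` is an assumption.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous_row = list(range(len(b) + 1))   # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current_row = [i]                    # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current_row.append(min(
                previous_row[j] + 1,                      # delete char_a
                current_row[j - 1] + 1,                   # insert char_b
                previous_row[j - 1] + substitution_cost,  # match or substitute
            ))
        previous_row = current_row
    return previous_row[-1]
```

For example, `edit_distance("kitten", "sitting")` is 3: two substitutions and one insertion.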
Q: What are the Types of Analytics?
A: Descriptive, Diagnostic, Predictive, and Prescriptive Analytics.
Q: What does Descriptive Analytics aim to answer?
A: "What happened?" by analyzing past data and presenting it in a summarized form using reports and alerts.
Q: How does Diagnostic Analytics differ from Descriptive Analytics?
A: It seeks to understand "Why did it happen?" by analyzing past data through queries and data mining.
Q: What question does Predictive Analytics aim to answer?
A: "What is likely to happen?" by predicting future events based on patterns and trained models using forecasts and simulations.
Q: What is the goal of Prescriptive Analytics?
A: To determine "What can we do to make it happen?" by using predictive analyses to figure out the best course of action through planning and optimization.
Q: How is Big Data defined?
A: Collections of datasets so large that they are difficult to manage, process, and analyze using traditional means.
Q: According to IBM, how much data is created every day?
A: 2.5 quintillion bytes of data (2.5 × 10^18 bytes, or about 2.5 exabytes).
Q: What is Big Data Analytics?
A: The collection, storage, processing, and analysis of massive-scale data, involving steps like data cleansing, munging, processing, and visualization.
Q: Why are special tools needed for Big Data Analytics?
A: Because the volume, velocity, and variety of the data make it difficult to store, process, and analyze on a single machine.
Q: List some Big Data Examples.
A:
Data from social networks (text, images, audio, video)
Click-stream data from web applications
Machine sensor data from industrial and energy systems
Healthcare data from electronic health records (EHR)
Logs from web applications
Stock market data
Transactional data from banking and financial applications
Q: What are the Five V's of Big Data?
A: Volume, Velocity, Variety, Veracity, Value.
Q: Define Volume in Big Data characteristics.
A: The large size of data that requires specialized tools and frameworks for storage, processing, and analysis.
Q: What does Velocity refer to in Big Data?
A: The speed at which data is generated and needs to be processed, often in real-time.
Q: Explain Variety in Big Data.
A: The different forms of data, including structured, unstructured, and semi-structured data like text, images, audio, video, and sensor data.
Q: What is Veracity in the context of Big Data?
A: The accuracy and trustworthiness of the data, which often requires cleaning to remove noise and ensure quality.
Q: What does Value signify in Big Data characteristics?
A: The usefulness of data for its intended purpose, aiming to extract meaningful insights and benefits.
Q: What are the main steps in the Analytic Flow for Big Data?
A:
Data Collection
Data Preparation
Analysis Types
Analysis Modes
Visualizations
Q: What activities are involved in Data Preparation?
A: Cleaning data by fixing corrupt records, handling missing values, removing duplicates, standardizing abbreviations and units, correcting typos, spellings, and formatting.
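A small sketch of a few of these preparation steps: removing duplicates, standardizing abbreviations and units, and handling missing values explicitly. The record layout, the `STATE_ABBREVIATIONS` table, and the function names are all hypothetical.

```python
# Hypothetical raw records, invented for this sketch.
raw_records = [
    {"id": 1, "state": "calif.", "temp": "68F"},
    {"id": 2, "state": "CA", "temp": "20C"},
    {"id": 2, "state": "CA", "temp": "20C"},    # duplicate of id 2
    {"id": 3, "state": "N.Y.", "temp": None},   # missing reading
]

STATE_ABBREVIATIONS = {"calif.": "CA", "ca": "CA", "n.y.": "NY", "ny": "NY"}

def to_celsius(reading):
    """Convert a reading such as '68F' or '20C' to degrees Celsius."""
    if reading is None:
        return None   # keep missing values explicit rather than guessing
    value, unit = float(reading[:-1]), reading[-1].upper()
    return round((value - 32) * 5 / 9, 1) if unit == "F" else value

def clean(records):
    """Drop duplicate ids and standardize state names and temperature units."""
    seen_ids = set()
    cleaned = []
    for record in records:
        if record["id"] in seen_ids:
            continue
        seen_ids.add(record["id"])
        state = STATE_ABBREVIATIONS.get(record["state"].lower(), record["state"])
        cleaned.append({"id": record["id"], "state": state,
                        "temp_c": to_celsius(record["temp"])})
    return cleaned
```

Whether a missing value should be kept, dropped, or imputed depends on the analysis; this sketch simply preserves it as `None`.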
Q: What are the Analysis Modes in Big Data Analytics?
A: Batch, real-time, or interactive analysis.
Q: What is the Big Data Stack?
A: A layered framework consisting of Raw Data Sources, Data Access Connectors, Data Storage, Batch Analytics, Real-Time Analytics, Interactive Querying, and Serving Database/Web & Visualization Frameworks.
Q: What are Raw Data Sources in the Big Data Stack?
A: The sources from which data is captured.
Q: What are Data Access Connectors?
A: Tools and frameworks used for collecting data from various sources.
Q: Where is data stored in the Big Data Stack?
A: In distributed file systems and NoSQL databases.
Q: What is Batch Analytics?
A: Analyzing data in large chunks or batches, typically not in real-time.
Q: What are examples of Real-Time Analytics tools?
A: Apache Storm and Spark Streaming.
Q: What does Interactive Querying involve?
A: Using SQL-like languages to perform queries on data interactively.
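To show what such a query looks like, the sketch below uses Python's built-in `sqlite3` module as a single-machine stand-in for a distributed SQL-like query engine. The `clicks` table and its contents are made up for illustration.

```python
import sqlite3

# In-memory database standing in for a big data query engine (an assumption).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE clicks (page TEXT, user_id INTEGER)")
connection.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [("home", 1), ("home", 2), ("pricing", 1), ("home", 3)],
)

# An interactive-style query: page views per page, most viewed first.
rows = connection.execute(
    "SELECT page, COUNT(*) AS views FROM clicks "
    "GROUP BY page ORDER BY views DESC"
).fetchall()
```

The query itself would look much the same in a SQL-like language on a big data stack; only the engine executing it changes.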
Q: What are Serving Database, Web & Visualization Frameworks used for?
A: To present and visualize the analyzed data for end-users.
Q: Describe the Alpha Pattern in Analytic Patterns.
A: Batch Analysis used to ingest large amounts of data.
Q: What is the Beta Pattern in Analytic Patterns?
A: Real-Time Analysis focused on ingesting streaming data.
Q: Explain the Gamma Pattern in Analytic Patterns.
A: Combines Batch and Real-Time Analysis by ingesting streaming data into the big data stack.
Q: What is the Delta Pattern in Analytic Patterns?
A: Interactive Querying using source-sink connectors or SQL connectors, followed by using SQL-like languages.
Q: What are Visualizations in the Analytic Flow?
A: Visual presentations of the analysis results, which can be static, dynamic, or interactive.
Q: Why is Data Cleansing important in Big Data Analytics?
A: To remove noise and ensure data quality, which is essential for extracting accurate insights.
Q: What types of data does Big Data encompass?
A: Structured, unstructured, and semi-structured data.
Q: How does Big Data Analytics handle the Variety of data?
A: By using specialized tools and frameworks that can process different data formats like text, images, audio, video, and sensor data.
Q: What role do NoSQL databases play in the Big Data Stack?
A: They provide scalable and flexible storage solutions for large and diverse datasets.
Q: What is the primary goal of any Big Data Analytics system?
A: To extract value from the data by uncovering meaningful insights and supporting decision-making.
Q: How does Real-Time Analytics differ from Batch Analytics?
A: Real-Time Analytics processes data as it arrives, enabling immediate insights, whereas Batch Analytics processes data in large groups at scheduled intervals.
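The difference can be sketched with a simple average: a batch job computes it once over the full data set, while a streaming job updates it as each event arrives. The class and function names here are hypothetical, and real systems would use a framework rather than plain Python.

```python
def batch_mean(values):
    """Batch style: process the whole collected data set at once."""
    return sum(values) / len(values)

class RunningMean:
    """Real-time style: update the result incrementally per event."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean

# Hypothetical event values, invented for this sketch.
events = [10, 20, 30, 40]
stream = RunningMean()
running_results = [stream.update(event) for event in events]
```

The streaming version has an answer available after every event, while the batch version only produces one after all the data has been collected; on the same data the final values agree.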
Q: What are Reports and Alerts used for in Descriptive Analytics?
A: To present summarized data and notify users of important events or thresholds.
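A minimal sketch of a descriptive report with threshold-based alerts. The function name `build_report`, the sample readings, and the threshold value are assumptions made for illustration.

```python
def build_report(readings, threshold):
    """Summarize past readings and list those that exceed the threshold."""
    return {
        "count": len(readings),
        "max": max(readings),
        "average": sum(readings) / len(readings),
        "alerts": [reading for reading in readings if reading > threshold],
    }

# Hypothetical sensor readings and alert threshold.
report = build_report([70, 72, 95, 68, 101], threshold=90)
```

The summary fields answer "What happened?", and the `alerts` list flags the readings a user would be notified about.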