Stats And Analysis Finals Reviewer (copy)

Description and Tags

SA2, SA3

100 Terms

1
New cards

Analyze data types: A column contains numeric values with some missing values (NaN). What data type will Pandas default to?

Float64
string

Int64

object


Float64
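A quick way to check this card, as a minimal sketch that assumes pandas is installed; the sample values are made up for illustration. pandas reports the default dtype in lowercase as `float64`, while the capitalized `Int64` is a separate nullable integer type that has to be requested explicitly.

```python
import pandas as pd

# Hypothetical sample column: whole numbers with one missing value.
with_gap = pd.Series([1, 2, None])

# The missing entry is stored as NaN, which forces a floating-point dtype.
default_dtype = str(with_gap.dtype)

# Opting in to the nullable integer type keeps integers and stores the gap as pd.NA.
nullable = pd.Series([1, 2, None], dtype="Int64")
nullable_dtype = str(nullable.dtype)

print(default_dtype, nullable_dtype)
```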

2
New cards

Synthesize the relationship between matplotlib and seaborn. Which statement MOST accurately describes their connection?

Seaborn replaces matplotlib entirely in modern data science
Seaborn is a low-level library that matplotlib builds upon

Seaborn is a higher-level API based on matplotlib with better default settings

Matplotlib and Seaborn are completely independent libraries

Seaborn is a higher-level API based on matplotlib with better default settings

3
New cards

Evaluate which library is BEST suited for machine learning tasks including classification, regression, clustering, and model validation:

A. Pandas

D. SciPy

B. Scikit-learn

C. Matplotlib

Scikit-learn

4
New cards

Analyze the query optimization: PostgreSQL's query optimizer is described as "more sophisticated" for analytical queries. What does this mean in practice?


It requires manual optimization for every query

It handles complex joins and subqueries more efficiently than simpler optimizers

It makes all queries slower

It only works with small datasets


It handles complex joins and subqueries more efficiently than simpler optimizers

5
New cards

Evaluate materialized views: What is their primary advantage for analytical workloads?

C. They slow down all queries

A. They increase storage requirements only

D. They eliminate the need for base tables

B. They provide pre-computed results and caching, improving query performance for complex aggregations

They provide pre-computed results and caching, improving query performance for complex aggregations.

6
New cards

Synthesize advanced aggregation: How does the FILTER clause enhance aggregation capabilities?

C. It removes all NULL values

D. It only works with text data

A. It slows down queries

B. It enables multiple conditional aggregations in a single query for different segments simultaneously

It enables multiple conditional aggregations in a single query for different segments simultaneously
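To make the idea concrete, here is a small sketch of conditional aggregation with FILTER. It runs against Python's built-in sqlite3 module (SQLite accepts the same FILTER syntax from version 3.30 onward) only so that the example is self-contained. The `orders` table and its rows are invented for illustration, and the same SELECT statement should also work in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 100.0), ("north", 50.0), ("south", 70.0), ("south", 30.0)],
)

# One pass over the table, three aggregates, two of them restricted to a segment.
row = conn.execute(
    """
    SELECT
        COUNT(*)                                    AS all_orders,
        SUM(amount) FILTER (WHERE region = 'north') AS north_total,
        SUM(amount) FILTER (WHERE region = 'south') AS south_total
    FROM orders
    """
).fetchone()
conn.close()

print(row)
```

Without FILTER, the same result would need either separate queries per segment or CASE expressions inside each aggregate.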

7
New cards

Apply your understanding: If you need to work with semi-structured data in a relational database, which PostgreSQL feature would be MOST useful?

It eliminates the need for programming languages

It makes the database larger

It allows seamless integration with Python, R, and machine learning libraries

It only works with cloud platforms


It allows seamless integration with Python, R, and machine learning libraries

8
New cards

Apply your understanding: If you need to create spiders/bots to scan websites and collect structured data, which library would you use?

C. Scrapy

D. BeautifulSoup

A. Gensim

B. NLTK

Scrapy

9
New cards

Evaluate the scenario: A team needs to build a deep learning model for image recognition with GPU acceleration and automatic gradient calculation. Which framework would be MOST suitable?

A. Scikit-learn

B. Pandas

D. NLTK

C. PyTorch

PyTorch

10
New cards

Evaluate the ACID compliance: Why is PostgreSQL's full ACID compliance across all storage engines important for analytical databases?

It reduces storage space

It makes queries run faster

It ensures data integrity and reliable transactions, which is crucial for accurate analysis

It eliminates the need for backups

It ensures data integrity and reliable transactions, which is crucial for accurate analysis

11
New cards

Evaluate regression functions: If you need to fit a linear model and calculate the R-squared coefficient directly in SQL, which function would you use?

A. AVG()

B. CORR()

D. SUM()

C. REGR_R2()

C. REGR_R2()

12
New cards

Compare the primary purposes of Pandas and NumPy. Which statement BEST differentiates their core functionalities?

B. Pandas is for visualization while NumPy is for machine learning

C. Pandas handles only numerical data while NumPy handles text data

D. Pandas is slower than NumPy in all operations

A. Pandas focuses on tabular data manipulation while NumPy focuses on array and matrix operations

A. Pandas focuses on tabular data manipulation while NumPy focuses on array and matrix operations

13
New cards

Analyze query optimization: What is the primary purpose of the EXPLAIN ANALYZE command?

C. To create new indexes automatically

B. To reveal execution plans and actual performance metrics for query optimization

A. To delete slow queries

D. To backup the database

B. To reveal execution plans and actual performance metrics for query optimization

14
New cards

Apply your understanding: If you need to measure the relationship between two variables to identify patterns, which function is MOST appropriate?

D. CONCAT()

B. CORR() for correlation coefficient

A. COUNT()

C. MAX()

B. CORR() for correlation coefficient
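For intuition about what CORR() returns, the Pearson correlation coefficient can be computed by hand. This is a plain-Python sketch rather than PostgreSQL's implementation; the function name `pearson_corr` and the sample numbers are assumptions made for illustration.

```python
import math


def pearson_corr(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two variables around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Variation of each variable on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: sales rise in lockstep with ad spend, then the reverse case.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
r_positive = pearson_corr(ad_spend, sales)
r_negative = pearson_corr(ad_spend, list(reversed(sales)))
print(r_positive, r_negative)
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear relationship.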

15
New cards

Compare PostgreSQL and MySQL regarding data types. Which statement is MOST accurate?

Both have identical data type support

PostgreSQL provides advanced data types including arrays, hstore, and JSONB

PostgreSQL only supports basic integer and string types

MySQL has more advanced data types than PostgreSQL

PostgreSQL provides advanced data types including arrays, hstore, and JSONB

16
New cards

Analyze statistical functions: Which PostgreSQL function calculates the 50th percentile (median) of a dataset?

A. AVG()

D. STDDEV()

C. MODE()

B. PERCENTILE_CONT(0.5) (PostgreSQL has no built-in MEDIAN() function)

B. PERCENTILE_CONT(0.5) (PostgreSQL has no built-in MEDIAN() function)

17
New cards

Apply your knowledge: If you need to read an Excel file named "sales.xlsx" from Sheet2 with missing values represented as "N/A", which command is MOST appropriate?

B. pd.read_excel('sales.xlsx', sheet_name='Sheet2', na_values=['N/A'])

C. pd.read_csv('sales.xlsx', sheet='Sheet2')

A. pd.read_excel('sales.xlsx', sheet_name='Sheet1', na_values=['N/A'])

D. pd.read_stata('sales.xlsx')

B. pd.read_excel('sales.xlsx', sheet_name='Sheet2', na_values=['N/A'])

18
New cards

Evaluate which library combination would be MOST effective for a project requiring data manipulation, statistical analysis, and linear algebra operations:

D. TensorFlow and Keras

B. NumPy and SciPy

A. Pandas only

C. Matplotlib and Seaborn

B. NumPy and SciPy

19
New cards

Synthesize integration concepts: Why is PostgreSQL's integration with Python (psycopg2, SQLAlchemy, Pandas) valuable for data science workflows?

B. It minimizes data movement, allows prototyping in SQL, and enables seamless scaling without architectural changes

C. It makes PostgreSQL slower

A. It eliminates the need for SQL knowledge

D. It only works for small datasets

B. It minimizes data movement, allows prototyping in SQL, and enables seamless scaling without architectural changes

20
New cards

Compare the memory allocation: What does the "64" in Int64 and Float64 data types represent?

C. The memory (in bits) allocated to hold each value

B. The number of decimal places

D. The version number of Pandas

A. The maximum value that can be stored

C. The memory (in bits) allocated to hold each value

21
New cards

Analyze the use case: A data scientist needs to perform natural language processing tasks including tokenization, tagging, and information extraction. Which library is MOST appropriate?

D. Matplotlib

C. NumPy

A. Scrapy

B. NLTK

B. NLTK

22
New cards

Compare PERCENTILE_CONT() and PERCENTILE_DISC(). Which statement is MOST accurate?

D. PERCENTILE_DISC() only works with text data

A. They perform identical operations

C. PERCENTILE_CONT() is always faster

B. PERCENTILE_CONT() provides continuous percentiles while PERCENTILE_DISC() provides discrete percentiles

B. PERCENTILE_CONT() provides continuous percentiles while PERCENTILE_DISC() provides discrete percentiles
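The difference is easiest to see on a small even-sized dataset. The two functions below are a plain-Python sketch of the behavior as described in the PostgreSQL documentation (interpolation for the continuous version, first value at or past the fraction for the discrete version). The Python names and sample values are assumptions for illustration, not PostgreSQL code.

```python
import math


def percentile_cont(values, fraction):
    """Continuous percentile: interpolates between adjacent ordered values."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def percentile_disc(values, fraction):
    """Discrete percentile: first ordered value whose rank / n reaches the fraction."""
    ordered = sorted(values)
    index = max(math.ceil(fraction * len(ordered)) - 1, 0)
    return ordered[index]


data = [10, 20, 30, 40]
median_cont = percentile_cont(data, 0.5)  # falls between 20 and 30
median_disc = percentile_disc(data, 0.5)  # always an actual value from the data
print(median_cont, median_disc)
```

The continuous result may be a value that never occurs in the data, while the discrete result is always one of the input values.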

23
New cards

Apply your knowledge: If you need to maintain accurate query execution plans, which PostgreSQL command should you run regularly?

A. DELETE

C. DROP

D. TRUNCATE

B. ANALYZE to update table statistics

B. ANALYZE to update table statistics

24
New cards

Apply your knowledge of visualization libraries: If you need to create interactive web-based visualizations with JavaScript widgets, which library should you choose?

B. Seaborn

A. Matplotlib

C. Bokeh

D. Plotly

C. Bokeh

25
New cards

Compare TensorFlow and Keras in terms of their relationship and functionality. Which statement is MOST accurate?

B. Keras is a high-level library that can run on top of TensorFlow, simplifying neural network tasks

A. TensorFlow is a high-level library running on top of Keras

C. TensorFlow and Keras are competing frameworks with no relationship

D. Keras is only used for data visualization

B. Keras is a high-level library that can run on top of TensorFlow, simplifying neural network tasks

26
New cards

Compare index types: Which index type would be MOST efficient for equality comparisons and range queries on ordered data?

D. No index

C. GIN index

A. Hash index

B. B-tree index

B. B-tree index
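A B-tree keeps its keys in sorted order, which is why a single index can serve both equality lookups and range scans. The sketch below is not a B-tree; it uses a sorted Python list with `bisect` as a stand-in for the ordered-key idea, and the keys are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical indexed keys; the list must stay sorted for bisect to work.
keys = [3, 8, 15, 15, 22, 40, 57]


def equal_to(sorted_keys, value):
    """Equality lookup: binary search for the run of matching keys."""
    return sorted_keys[bisect_left(sorted_keys, value):bisect_right(sorted_keys, value)]


def between(sorted_keys, low, high):
    """Range lookup: find both ends by binary search, then read the slice in order."""
    return sorted_keys[bisect_left(sorted_keys, low):bisect_right(sorted_keys, high)]


matches = equal_to(keys, 15)
in_range = between(keys, 10, 40)
print(matches, in_range)
```

A hash index, by contrast, does not preserve order, so it can answer the equality lookup but not the range lookup.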

27
New cards

Evaluate the purpose of the head() function in Pandas. Which statement BEST describes its primary use?

To delete the first rows of a DataFrame

To view the first few records of a DataFrame for initial data exploration

To sort the DataFrame in ascending order

To view the last few records of a DataFrame

To view the first few records of a DataFrame for initial data exploration

28
New cards

Analyze the following scenario: You have a dataset with mixed data types (numbers and strings) in a column. What Pandas data type will be assigned to this column?

D. datetime64

A. Int64

B. Float64

C. object

C. object

29
New cards

Analyze the characteristics: What makes PostgreSQL particularly suitable for data analysis tasks compared to MySQL?

D. PostgreSQL only works with Python

C. PostgreSQL is easier to install

B. PostgreSQL offers advanced features like window functions, CTEs, and better query optimization for analytical workloads

A. PostgreSQL is faster for all web applications

B. PostgreSQL offers advanced features like window functions, CTEs, and better query optimization for analytical workloads

30
New cards

Analyze the scenario: A data scientist needs to perform complex linear algebra operations on large multidimensional arrays with high performance. Which Python library would be MOST appropriate and why?

B. NumPy, because it is designed for multidimensional array operations with high-level mathematical functions

D. Scikit-learn, because it provides machine learning algorithms

C. Matplotlib, because it can visualize mathematical operations

A. Pandas, because it handles tabular data efficiently

B. NumPy, because it is designed for multidimensional array operations with high-level mathematical functions
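As a small illustration of the kind of operation this card refers to, the sketch below solves a linear system and multiplies matrices with NumPy. It assumes NumPy is installed, and the 2x2 system is a made-up example.

```python
import numpy as np

# Hypothetical system of equations: 2x + y = 5 and x + 3y = 10.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])

# Solve A @ x = b without forming the inverse explicitly.
solution = np.linalg.solve(A, b)

# Matrix multiplication with the @ operator.
product = A @ A.T

print(solution)
print(product)
```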

31
New cards

Which type of analytics examines historical data to understand past performance?

 

Descriptive Analytics

Predictive Analytics

Prescriptive Analytics

Diagnostic Analytics

Descriptive Analytics

32
New cards

What is the primary question that Prescriptive Analytics aims to answer?

 

What should we do?

What will happen?

Why did it happen?

What happened?

What should we do?

33
New cards

If a company uses an algorithm to recommend the optimal pricing strategy for its products, which type of analytics is it primarily employing?

 

Prescriptive Analytics

Descriptive Analytics

Diagnostic Analytics

Predictive Analytics

Prescriptive Analytics

34
New cards

In the context of the evolution of analytics, why is Prescriptive Analytics considered the pinnacle of data-driven decision making?

 

Because it recommends actions to optimize outcomes, building upon previous stages.

Because it focuses solely on historical data.

Because it identifies causes and relationships in data.

Because it forecasts future outcomes based on patterns.

Because it recommends actions to optimize outcomes, building upon previous stages.

35
New cards

Which of the following is NOT listed as a key technology or technique leveraged by prescriptive analytics?

 

Statistical Process Control

Simulation Modeling

Optimization Algorithms

Machine Learning & AI

Statistical Process Control

36
New cards

What is the role of 'Predictive Modeling' in the process of how Prescriptive Analytics works?

 

To create models to forecast potential future outcomes

To gather relevant historical and real-time data.

To evaluate multiple possible scenarios and their implications.

To deliver actionable insights to decision-makers.

To create models to forecast potential future outcomes

37
New cards

A hospital uses prescriptive analytics to optimize patient treatment plans and resource allocation. In which business application area does this fall?

 

Healthcare

Supply Chain

Financial Services

Manufacturing

Healthcare

38
New cards

How does prescriptive analytics enhance decision-making within an organization?

 

By simplifying complex decisions and improving decision quality.

By increasing decision complexity.

By relying solely on intuition-based decisions.

By decreasing decision speed

By simplifying complex decisions and improving decision quality.

39
New cards

Which of the following best describes a solution to the 'Skill Gaps' challenge in implementing prescriptive analytics?

 

Investing in training programs and cross-functional teams.

Implementing robust data governance and ETL processes.

Starting with high-impact use cases and demonstrating ROI.

Beginning with pilot projects to evaluate technology fit.

Investing in training programs and cross-functional teams.

40
New cards

What is a future trend in prescriptive analytics where AI-powered tools make advanced analytics accessible to non-technical business users?

 

Augmented Analytics

Autonomous Decision Systems

Edge Computing

Explainable AI

Augmented Analytics

41
New cards

What is the core question that Predictive Analytics seeks to answer?

 

What does the data say will happen?

What happened?

Why did it happen?

What should we do?

What does the data say will happen?

42
New cards

In the Predictive Analytics Process, what is the purpose of the 'Data Sampling' phase?

 

To extract, clean, and transform data.

To define the modeling objective.

To apply different modeling techniques.

To review model performance.

To extract, clean, and transform data.

43
New cards

A data scientist is building a model to forecast next month's sales based on advertising spending. Which type of analytics is primarily being used?

 

Predictive Analytics

Descriptive Analytics

Diagnostic Analytics

Prescriptive Analytics

 

Predictive Analytics

44
New cards

How does the 'Data Exploration' phase contribute to the overall Predictive Analytics Process?

 

It helps identify trends, anomalies, and data dependencies before modeling.

It defines the project scope and objectives

It focuses on selecting the final modeling methodology.

It involves the final validation of the model's performance.

It helps identify trends, anomalies, and data dependencies before modeling.

45
New cards

In a simple linear regression equation y = α + βx + ε, what does y represent?

 

 

Dependent variable

Independent variable

Error term

Alpha intercept

Dependent variable

46
New cards

What is the significance of the beta coefficient (β) in a simple linear regression model?

 

It indicates the rate at which the dependent variable changes for a unit change in the independent variable.

 

It represents the baseline figure for the dependent variable.

 

It accounts for unexplained variability.

 

It is the balancing figure in the equation.

It indicates the rate at which the dependent variable changes for a unit change in the independent variable.
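The roles of α and β can be seen by fitting y = α + βx with ordinary least squares. This is a minimal plain-Python sketch; the function name `fit_simple_regression` and the data points are assumptions for illustration, and the data were chosen to lie exactly on the line y = 1 + 2x.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx                 # change in y per unit change in x
    alpha = mean_y - beta * mean_x   # baseline value of y when x is zero
    return alpha, beta


# Hypothetical data lying exactly on y = 1 + 2x, so the error term is zero here.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
alpha, beta = fit_simple_regression(xs, ys)
print(alpha, beta)
```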

47
New cards

If a regression model has an R² value of 0.85, what does this imply about the model?

85% of the variation in the independent variable is explained by the dependent variable.

The model is 85% accurate in its predictions.

85% of the variability in the response is explained by the independent variables.

The model has an 85% error rate.

85% of the variability in the response is explained by the independent variables.

48
New cards

Why is an error term (ε) included in linear regression models?

 

To account for unexplained variability and measurement error.

To ensure the model is perfectly accurate.

To make the model more complex.

To account for unexplained variability and measurement error.

49
New cards

When assessing the goodness-of-fit of a regression model, why is a higher R² generally preferred?

 

It means the model explains more of the variation in the response variable.

It indicates that the model is simpler.

It suggests that the model has fewer independent variables.

It implies that the model is less prone to overfitting.

It means the model explains more of the variation in the response variable.

50
New cards

What is the primary purpose of Analysis of Variance (ANOVA) in the context of regression analysis?

 

To decompose the total variation in the response into explained and unexplained parts.

To determine the significance of individual predictors.

To calculate the mean absolute error of the model.

To identify trends in the data.

To decompose the total variation in the response into explained and unexplained parts.
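The decomposition can be written as SST = SSR + SSE: total variation equals explained variation plus unexplained variation, with R² = SSR / SST. The sketch below checks this on a small made-up dataset using a least squares line; the variable names and numbers are assumptions for illustration.

```python
# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares fit of y = alpha + beta * x.
beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
alpha = mean_y - beta * mean_x
fitted = [alpha + beta * x for x in xs]

# Total, explained (regression), and unexplained (error) sums of squares.
sst = sum((y - mean_y) ** 2 for y in ys)
ssr = sum((f - mean_y) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))

r_squared = ssr / sst
print(sst, ssr, sse, r_squared)
```

Here R² is the share of the variation in the response that the predictor accounts for; the remainder is left in the error term.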

51
New cards

If a t-test for a predictor variable yields a p-value of 0.03, what can be concluded?

 

The predictor is significant.

The predictor is insignificant.

The model is not a good fit.

The dependent variable is not well-explained.

The predictor is significant.

52
New cards

If a t-test for a predictor variable yields a p-value of 0.03, what can be concluded?

 

The predictor is significant.

The predictor is insignificant.

The model is not a good fit

The dependent variable is not well-explained.

The predictor is significant.

53
New cards

How does the concept of 'explained variability' in ANOVA relate to the independent variables in a regression model?

 

It is the amount of variation in the response variable that can be attributed to the predictors explicitly stated in the model.

It represents the variation in the response due to random error.

It measures the overall accuracy of the model.

It indicates the correlation between the dependent and independent variables.

It is the amount of variation in the response variable that can be attributed to the predictors explicitly stated in the model.

54
New cards

Why is it important to investigate a model's goodness-of-fit and predictor significance before using it for prediction purposes?

 

To confirm the model's reliability and validity for making accurate forecasts.

To ensure the model is overly complex.

To reduce the number of independent variables.

To increase the model's computational efficiency.

To confirm the model's reliability and validity for making accurate forecasts.

55
New cards

Which of the following is a forecast accuracy measure?

 

Mean Absolute Error (MAE)

R-squared

ANOVA

t-test

Mean Absolute Error (MAE)

56
New cards

What does the alpha intercept (α) represent in a simple linear regression equation?

 

The baseline figure for the dependent variable when the independent variable is zero.

The rate of change of the dependent variable.

The unexplained variability in the model.

The value driving the prediction.

The baseline figure for the dependent variable when the independent variable is zero.

57
New cards

A company wants to predict customer churn based on their usage patterns. Which analytical technique would be most appropriate for this task?

 

Predictive Analytics

Descriptive Analytics

Diagnostic Analytics

Prescriptive Analytics

Predictive Analytics

58
New cards

How does Multiple Linear Regression differ from Simple Linear Regression?

 

Multiple Linear Regression allows for more than one predictor variable.

Multiple Linear Regression only uses one independent variable.

Simple Linear Regression can have more than one predictor variable.

They are essentially the same, just different names.

Multiple Linear Regression allows for more than one predictor variable.

59
New cards

Why is it crucial to perform data cleaning and transformation during the Data Sampling phase of the Predictive Analytics Process?

 

To ensure the data is in a suitable format and quality for modeling.

To reduce the number of variables in the dataset.

To directly generate predictions without further analysis.

To identify external data sources.

To ensure the data is in a suitable format and quality for modeling.

60
New cards

Which of the following is NOT a forecast accuracy measure mentioned in the slides?

 

Standard Deviation

Mean Absolute Error (MAE)

Mean Squared Error (MSE)

Root Mean Squared Error (RMSE)

Standard Deviation

61
New cards

What is the primary goal of the 'Optimization' step in the Prescriptive Analytics process?

 

To determine the best course of action among alternatives.

To collect and integrate data.

To forecast potential future outcomes.

To deliver actionable insights.

To determine the best course of action among alternatives.

62
New cards

A manufacturing company uses sensors to collect real-time data from its machinery to predict potential equipment failures. This is an example of which type of analytics in action?

 

Predictive Analytics

Descriptive Analytics

Diagnostic Analytics

Prescriptive Analytics

Predictive Analytics

63
New cards

How does 'Simulation Modeling' contribute to Prescriptive Analytics?

 

By creating virtual scenarios to test different strategies and understand potential outcomes.

By learning from data patterns to make predictions.

By finding the best solution among alternatives considering constraints.

By processing data as it's created to provide immediate insights.

By creating virtual scenarios to test different strategies and understand potential outcomes.

64
New cards

Why is 'Data Integration' considered a significant challenge in implementing Prescriptive Analytics?

 

Because it involves combining data from disparate sources with varying formats and quality.

Because it requires advanced machine learning algorithms.

Because it focuses on forecasting future outcomes.

Because it only applies to real-time data.

Because it involves combining data from disparate sources with varying formats and quality.

65
New cards

Which future trend in prescriptive analytics involves systems that not only recommend actions but can implement them with minimal human intervention?

 

Autonomous Decision Systems

Augmented Analytics

Edge Computing

Explainable AI

 

Autonomous Decision Systems

66
New cards

What is the primary function of the 'Project Design' phase in the Predictive Analytics Process?

 

To define the modeling objective and acceptance criteria.

To perform exploratory data analysis.

To select the final modeling methodology.

To review the model's performance.

To define the modeling objective and acceptance criteria.

67
New cards

A financial institution uses predictive analytics to identify potential fraudulent transactions. Which business application area does this fall under?

 

Financial Services

Healthcare

Supply Chain

Energy & Utilities

Financial Services

68
New cards

How does the 'Validation' phase ensure the quality of a predictive model?

 

By selecting the best model and reviewing its performance against acceptance criteria.

By focusing on data extraction and cleaning.

By defining the project's scope.

By identifying data dependencies and correlations.

By selecting the best model and reviewing its performance against acceptance criteria.

69
New cards

Why is it important to consider 'Organizational Change' as a challenge when implementing prescriptive analytics?

 

Because it involves shifting from an intuition-based to a data-driven decision-making culture.

Because it primarily deals with technology selection.

Because it focuses on finding talent with expertise.

Because it is only relevant for small organizations.

Because it involves shifting from an intuition-based to a data-driven decision-making culture.

70
New cards

What does NLP stand for in the context of key technologies and techniques for prescriptive analytics?

 

Natural Language Processing

Natural Language Programming

Neural Linguistic Processing

New Learning Paradigms

 

Natural Language Processing

71
New cards

What is the main difference between Descriptive Analytics and Diagnostic Analytics?

 

Descriptive Analytics tells what happened, while Diagnostic Analytics explains why it happened.

Descriptive Analytics predicts future outcomes, while Diagnostic Analytics recommends actions.

Descriptive Analytics focuses on real-time data, while Diagnostic Analytics uses historical data.

Descriptive Analytics tells what happened, while Diagnostic Analytics explains why it happened.

72
New cards

A retail company analyzes past sales data to identify which products were most popular during the holiday season. This is an example of which type of analytics?

 

Descriptive Analytics

Predictive Analytics

Prescriptive Analytics

Diagnostic Analytics

 

Descriptive Analytics

73
New cards

How does the iterative nature of the Prescriptive Analytics process contribute to its effectiveness?

 

It continuously improves as more data becomes available and outcomes are measured.

It allows for a one-time implementation without further adjustments.

It ensures that the process is rigid and unchanging.

It reduces the need for data collection.

It continuously improves as more data becomes available and outcomes are measured.

74
New cards

When considering the implementation challenges of prescriptive analytics, why is it crucial to address the issue of 'Technology Selection'?

 

Because choosing the right tools and platforms is essential for successful implementation and achieving specific business needs.

Because all technologies are equally effective for all business needs.

Because it only impacts the initial setup cost.

Because it is a one-time decision that does not require re-evaluation.

Because choosing the right tools and platforms is essential for successful implementation and achieving specific business needs.

75
New cards

Which forecast accuracy measure calculates the average of the absolute differences between predicted and actual values?

 

Mean Absolute Error (MAE)

Mean Squared Error (MSE)

Root Mean Squared Error (RMSE)

Mean Absolute Percentage Error (MAPE)

Mean Absolute Error (MAE)
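The four measures listed in this card differ only in how each error is transformed before averaging. The sketch below computes them side by side in plain Python; the actual and predicted values are made up for illustration.

```python
import math

# Hypothetical actual and forecast values.
actual = [100, 200, 300]
predicted = [110, 190, 330]

errors = [p - a for a, p in zip(actual, predicted)]
n = len(errors)

mae = sum(abs(e) for e in errors) / n    # average absolute error
mse = sum(e ** 2 for e in errors) / n    # average squared error
rmse = math.sqrt(mse)                    # back in the units of the data
# Average absolute error as a percentage of each actual value.
mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n

print(mae, mse, rmse, mape)
```

Because MSE and RMSE square the errors, the single large miss (30) weighs more heavily in them than it does in MAE.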

76
New cards

What is the significance of the p-value in a t-test when assessing the significance of an individual predictor?

 

A p-value less than 0.05 typically suggests the predictor is significant.

It indicates the strength of the relationship between variables.

It represents the error margin of the prediction.

It determines the type of regression model to use.

A p-value less than 0.05 typically suggests the predictor is significant.

77
New cards

If a business wants to optimize its inventory levels by recommending specific reorder points based on demand forecasts and supply chain constraints, which type of analytics is most directly involved?

 

Prescriptive Analytics

Predictive Analytics

Diagnostic Analytics

Descriptive Analytics

Prescriptive Analytics

78
New cards

How does the concept of 'unexplained variability' in ANOVA differ from 'explained variability'?

 

Unexplained variability is attributed to predictors, while explained variability is due to random error.

Unexplained variability is due to random error, while explained variability is attributed to predictors.

Both refer to the same concept, just different terminology.

Unexplained variability is only present in simple linear regression, while explained variability is in multiple linear regression.

Unexplained variability is due to random error, while explained variability is attributed to predictors.

79
New cards

Why is it beneficial for organizations to move towards a data-driven decision-making culture, as facilitated by prescriptive analytics?

 

It reduces the need for human input in decision-making.

It simplifies the data collection process.

It eliminates the need for any form of analytics.

It reduces the need for human input in decision-making.

80
New cards

Which of the following is a key component of the 'Data Sampling' phase in the Predictive Analytics Process?

 

Data cleaning and transformation

Model validation

Project kickoff meeting

Feedback based on business knowledge

Data cleaning and transformation
