SCM 380 Exam 2

Description and Tags

A compilation of discussion post questions from the class

54 Terms

New cards

True or False: Qualitative data use labels or names to identify categories of like items.

True

New cards

Multiple Choice: Heat maps are used for:

A) Visualizations within data mining for correlations and missing data

B) Creating a histogram

C) Making a Holt's model

D) Showing scatter plots for variable pairs

A) Visualizations within data mining for correlations and missing data

New cards

Which of the following is the most popular rotational method?

A. Quartimax rotations

B. Varimax rotations

C. Promax rotations

D. Equamax rotations

B. Varimax rotations

New cards

Which of the following is not a predictive measure of error?

A. MAD

B. Average Error

C. Total SSE

D. ROC Curve

New cards

Which distribution plot uses color to show correlation amongst variables?

A) Scatterplots

B) Heat maps

C) Box plots

D) Histograms

B) Heat maps
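
As a rough illustration of what a correlation heat map encodes, the sketch below computes pairwise Pearson correlations for a small made-up data set and maps each value to a coarse text shade. The data, the variable names, and the `shade` helper are all hypothetical; a real heat map would use color in a plotting library.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical toy data: three numeric variables, five records each.
data = {
    "price": [10, 12, 14, 16, 18],
    "demand": [50, 45, 41, 34, 30],
    "rating": [3, 5, 2, 4, 3],
}

def shade(r):
    """Map a correlation to a coarse text 'color' (stronger = darker)."""
    strength = abs(r)
    return "##" if strength > 0.7 else "++" if strength > 0.3 else ".."

names = list(data)
corr = {(a, b): pearson(data[a], data[b]) for a in names for b in names}

# One row per variable; each cell is the shade for that pair.
for a in names:
    print(a.ljust(7), " ".join(shade(corr[(a, b)]) for b in names))
```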

New cards

Can a dummy variable hold any numeric value?

A) True

B) False

New cards

"High separation of records" means that using predictor variables attains _____

A. High Error

B. No Error

C. Low Error

New cards

Measures of Variability are the tabular, graphical, and numerical methods used to summarize and present data. (True/False)

False

New cards

The goal of principal components analysis is to increase a set of numerical variables. (T/F)

New cards

What does jittering do?

A. Moves markers by a small random amount

B. Moves the data closer together

C. Uncrowds the data by allowing more markers to be seen

D. Both A and C
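
A minimal sketch of jittering as the term is commonly used: each marker is shifted by a small random amount so that overlapping points become visible. The function name `jitter`, the `amount` parameter, and the sample data are assumptions made for illustration.

```python
import random

def jitter(values, amount=0.1, seed=0):
    """Return a copy of values, each shifted by a small random offset."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [v + rng.uniform(-amount, amount) for v in values]

# Hypothetical data: several records share an x value and would overlap.
x = [1, 1, 1, 2, 2, 3]
x_jittered = jitter(x)
```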

New cards

The curse of dimensionality is the “affliction caused by adding variables to multivariate data”. Why is this a problem for data mining exercises?

A: Too many variables will only allow for one type of data mining technique which is not that accurate: regressions.

B: Can be compared to a chess board, adding a third dimension to chess boards would increase location options by 800%.

C: Too many variables make box plots insufferable to observe.

D: Too many dimensions leave too much remaining noise to perform data mining

B and D
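
The chessboard comparison in option B is simple arithmetic, sketched below under the assumption of a standard 8-by-8 board: adding a third dimension of the same size gives eight times as many locations, which is the jump that option B describes as 800%.

```python
side = 8                  # squares per side on a standard chessboard
cells_2d = side ** 2      # locations on the flat board
cells_3d = side ** 3      # locations after adding a third dimension
growth_factor = cells_3d / cells_2d
print(cells_2d, cells_3d, growth_factor)
```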

New cards

Which of the following would not be considered a measure of variability?

a) Standard Deviation

b) Percentiles

c) Interquartile Range

d) Range

b) Percentiles
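
The distinction can be sketched with Python's standard `statistics` module on a made-up sample. The quartiles returned below are themselves percentiles (positions in the data), while the range, standard deviation, and interquartile range summarize spread. The sample values are hypothetical.

```python
import statistics

values = [2, 4, 4, 5, 7, 9, 11]  # hypothetical sample

value_range = max(values) - min(values)
std_dev = statistics.stdev(values)               # sample standard deviation
q1, q2, q3 = statistics.quantiles(values, n=4)   # quartile cut points (positions)
iqr = q3 - q1                                    # spread of the middle 50%
```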

New cards

Both sensitivity and specificity can address the question, 'How often is the test right?' (True/False)

True
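
A small sketch of how the two measures are computed from validation results. The labels below are made up, and class 1 is assumed to be the class of interest.

```python
# Hypothetical validation labels: 1 = class of interest, 0 = other class.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)  # true positives
fn = sum(a == 1 and p == 0 for a, p in pairs)  # false negatives
tn = sum(a == 0 and p == 0 for a, p in pairs)  # true negatives
fp = sum(a == 0 and p == 1 for a, p in pairs)  # false positives

sensitivity = tp / (tp + fn)  # share of actual 1s classified correctly
specificity = tn / (tn + fp)  # share of actual 0s classified correctly
```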

New cards

T/F Descriptive statistics are the tabular, graphical, and numerical methods used to summarize and present data

True

New cards

T/F Qualitative data are numerical values that indicate how much or how many, and quantitative data use labels or names to identify categories of like items

False

New cards

The naive rule classifies all records as belonging to the most prevalent class (True/False).

True
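
The naive rule can be sketched in a few lines: find the most prevalent class and assign it to every record, ignoring all predictors. The function name `naive_rule` and the class labels are hypothetical.

```python
from collections import Counter

def naive_rule(training_labels):
    """Return the most prevalent class; every record gets this label."""
    return Counter(training_labels).most_common(1)[0][0]

# Hypothetical training labels: 7 nonbuyers and 3 buyers.
labels = ["nonbuyer"] * 7 + ["buyer"] * 3
majority = naive_rule(labels)
predictions = [majority] * len(labels)  # predictors are never consulted
```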

New cards

Which of the following types of data uses labels or names to identify categories of like items?

a. Ordinal

b. Qualitative

c. Interval

d. Quantitative

b. Qualitative

New cards

Which option is not a step in the cutoff-for-classification process? Choose one.

a. Compare to cutoff value, and classify accordingly

b. Compute the probability of belonging to class "0"

c. Compute the probability of belonging to class "1"

b. Compute the probability of belonging to class "0"
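
The two steps that do belong to the process can be sketched as follows. The probabilities are made up, the function name `classify` is hypothetical, and treating a probability exactly equal to the cutoff as class 1 is an assumption of this sketch.

```python
def classify(prob_class_1, cutoff=0.5):
    """Compare the class-1 probability to the cutoff and classify accordingly."""
    # Assumption: a probability exactly at the cutoff is assigned to class 1.
    return 1 if prob_class_1 >= cutoff else 0

# Hypothetical class-1 probabilities produced by some fitted model.
probs = [0.92, 0.48, 0.51, 0.10]
labels_default = [classify(p) for p in probs]
labels_strict = [classify(p, cutoff=0.75) for p in probs]
```

Raising the cutoff makes the classifier more reluctant to assign class 1, which is why the third record changes class in `labels_strict`.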

New cards

Which model should one use for supervised learning with a continuous outcome?

a. Logistic Regression

b. Regression

c. Cluster Analysis

d. Principal Components

b. Regression

New cards

Which of the following is not a measurement of error?

A: Tracking Signal

B: Bias

C: Mean Absolute Deviation

D: Mean Squared Error

B: Bias

New cards

The goal of a Principal Component Analysis is to increase the number of numerical variables (True/False).

False
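
To illustrate why principal components reduce rather than increase the number of variables, here is a sketch for the two-variable case, where the eigenvalues of the covariance matrix have a closed form. The data are made up; with correlated variables, the first component captures most of the total variance, so one new variable can stand in for two.

```python
import math
import statistics

# Hypothetical correlated measurements on five records.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

var_x = statistics.variance(x)
var_y = statistics.variance(y)
mx, my = statistics.mean(x), statistics.mean(y)
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Eigenvalues of the 2x2 covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
center = (var_x + var_y) / 2
half_gap = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
eig_1, eig_2 = center + half_gap, center - half_gap

# Share of total variance captured by the first principal component.
share_first = eig_1 / (eig_1 + eig_2)
```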

New cards

Is a histogram a basic plot or a distribution plot?

A. Basic plot

B. Distribution plot

New cards

Which of the following is not a measure of error?

A. Mean absolute deviation (MAD)

B. Mean absolute percent error (MAPE)

C. Tracking signal

D. Tracing error

E. Mean squared error (MSE)

D. Tracing error

New cards

For Predictions, which of the following is NOT a metric for performance?

A. Average Error

B. GALE

C. MAPE

D. RMSE

B. GALE
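
The three genuine metrics in this card can be computed directly from prediction errors. The sketch below uses made-up actual and predicted values and also includes MAD for comparison; the error is taken as actual minus predicted.

```python
import math

# Hypothetical actual and predicted values for four validation records.
actual = [100, 200, 300, 400]
predicted = [110, 190, 330, 380]

errors = [a - p for a, p in zip(actual, predicted)]  # e_i = actual - predicted
n = len(errors)

average_error = sum(errors) / n                       # signed average of the errors
mad = sum(abs(e) for e in errors) / n                 # mean absolute deviation
mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n  # in percent
rmse = math.sqrt(sum(e ** 2 for e in errors) / n)     # root mean squared error
```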

New cards

Bar Charts are useful for comparing multiple statistics like average, count, percentage, etc. across groups (True/False)

False

New cards

Which of the following is NOT considered a basic plot for data exploration?

a. Line Graphs

b. Scatter Plots

c. Bar Charts

d. Histograms

New cards

Graphical Methods for Categorical data include dot plots, histograms, and scatter diagrams (True/False)

False

New cards

Distribution plots display “how many” of each value occur in a data set (True/False).

True
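
The "how many of each value" idea can be sketched with a frequency count, which is the information a histogram or bar of counts displays. The sample values are hypothetical.

```python
from collections import Counter

values = [3, 1, 2, 3, 3, 2, 5]  # hypothetical data
counts = Counter(values)        # value -> number of occurrences

# Crude text histogram: one '#' per occurrence.
for v in sorted(counts):
    print(v, "#" * counts[v])
```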

New cards

Percentage of misclassified records out of the total records in the validation data.

A. Error

B. Accuracy

C. Error rate

D. Naive

C. Error rate
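
A minimal sketch of the calculation, using made-up validation labels: count the misclassified records and divide by the total number of records.

```python
# Hypothetical validation data: actual classes and a classifier's predictions.
actual    = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]

misclassified = sum(a != p for a, p in zip(actual, predicted))
error_rate = misclassified / len(actual)
accuracy = 1 - error_rate
```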

New cards

A single categorical variable with m categories is typically transformed into m+1 dummy variables (True/False)

False
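
A sketch of the usual m-1 encoding: one category is dropped and acts as the reference level, because its dummy would be redundant given the others. The helper `to_dummies`, the `is_` prefix, and the choice to drop the alphabetically first category are assumptions of this example.

```python
def to_dummies(values, drop_first=True):
    """Encode a categorical variable as 0/1 dummy variables."""
    categories = sorted(set(values))
    # Dropping one category leaves m - 1 dummies; the dropped one is the reference.
    kept = categories[1:] if drop_first else categories
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]

colors = ["red", "green", "blue", "green"]  # m = 3 categories
rows = to_dummies(colors)
```

Here "blue" is the reference level, so a blue record has zeros in both dummies.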

New cards

The ROC curve was first used in what war?

a. Cold War

b. WW2

c. Vietnam War

d. WW1

b. WW2

New cards

Error is classifying all records as belonging to the most prevalent class (True/False)

False

New cards

Fill in the blank: We simplify decision trees by _____ peripheral branches to avoid overfitting.

A. Collecting

B. Pruning

C. Avoid

D. Eliminating

B. Pruning

New cards

The MAPE is a measure of the percentage of how much predictions deviate from the actual values (True/False)

True

New cards

Multiple choice: Two important charts that visualize distribution of data are boxplots and _________

a. histograms

b. line charts

c. bar charts

d. scatter plots

a. histograms

New cards

The Naïve rule is to classify all records as belonging to the most prevalent class (True/False)

True

New cards

What is the second step in factor analysis?

A. The correlation matrix for all variables is computed

B. Factor extraction

C. Factor rotation

D. Make final decisions about the number of underlying factors

B. Factor extraction

New cards

We select the split that most increases the Gini Index (True/False)

False

New cards

What is the process of recursive partitioning?

a. Graft two branches together

b. Repeatedly split the records into two parts

c. Simplify the tree by removing branches

d. Create a new branch

b. Repeatedly split the records into two parts
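
Repeated two-way splitting can be sketched for a single numeric predictor. This is an illustrative toy, not a full tree algorithm: the function names, the use of Gini impurity to pick the split, and the `max_depth` stopping rule are assumptions of the example.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best binary split on x."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds: x <= t vs x > t
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (t, score)
    return best

def partition(xs, ys, depth=0, max_depth=2):
    """Recursively split the records into two parts until pure or too deep."""
    if gini(ys) == 0 or depth == max_depth or len(set(xs)) < 2:
        return max(set(ys), key=ys.count)  # leaf: majority class
    t, _ = best_split(xs, ys)
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "split": t,
        "left": partition([x for x, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": partition([x for x, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

# Hypothetical records: one predictor and a two-class outcome.
tree = partition([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"])
```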

New cards

Binary Logistic Regression results in a V-shaped distribution function. (True/False)

False
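
For comparison, the logistic function used in binary logistic regression can be evaluated at a few points. The sketch below assumes the standard form 1 / (1 + e^(-z)); the outputs rise steadily from near 0 to near 1 and pass through 0.5 at z = 0, so the curve never turns back as a V shape would.

```python
import math

def logistic(z):
    """Standard logistic function; maps any real z into (0, 1)."""
    return 1 / (1 + math.exp(-z))

zs = [-6, -3, 0, 3, 6]
probs = [logistic(z) for z in zs]
```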

New cards

Multiple Choice: What are the "Odds" in step 2 of The Logit?

a. Ratio

b. Question

c. Quantity

d. Linear

a. Ratio
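
The odds are the ratio of the probability of belonging to class 1 to the probability of not belonging to it, and the logit is the natural log of that ratio. A minimal sketch, with hypothetical function names:

```python
import math

def odds(p):
    """Odds of class 1: ratio of p to (1 - p)."""
    return p / (1 - p)

def logit(p):
    """Log of the odds."""
    return math.log(odds(p))

even_odds = odds(0.5)     # equal chance of either class
strong_odds = odds(0.8)   # class 1 is four times as likely as class 0
```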

New cards

The goal of trees and rules is to classify or predict an outcome based on a set of predictors. (True/False)

True

New cards

Logit can be modeled as a linear function of the ____

A. Probabilities

B. Outcomes

C. Variables

D. Predictors

New cards

Which of the below are advantages of regression trees?

A. Can work without extensive handling of missing data

B. Produce rules that are easy to interpret & implement

C. Variable selection & reduction is automatic

D. All of the above

New cards

Simple linear regression is a relation between 3 continuous variables. (True/False)

False
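
Simple linear regression relates one continuous predictor to one continuous outcome. A least-squares sketch on made-up data, with a hypothetical helper `fit_line`:

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data generated from y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
intercept, slope = fit_line(x, y)
```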

New cards

What is the proper definition of pruning as used for decision trees?

a. The process of dividing a node into two smaller nodes.

b. The process of adding a whole section of a tree.

c. The process of cutting down the tree.

d. None of the above.

c. The process of cutting down the tree.

New cards

The logistic distribution is an S-shaped distribution function (True/False)

New cards

When determining the cutoff value, the popular initial choice is 0.45. (True/False)

False

New cards

Which of the following is not a term used when talking about decision trees?

a. Splitting

b. Pruning

c. Toning

d. Grafting

c. Toning

New cards

The two most popular ways to measure Impurity are the Gini Index and Entropy Measure (True/False)

True
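
Both measures can be written as short functions of the class proportions in a node. The sketch below uses the standard definitions (Gini as one minus the sum of squared proportions, entropy with log base 2); the function names are hypothetical.

```python
import math

def gini(proportions):
    """Gini index: 0 for a pure node, larger when classes are mixed."""
    return 1 - sum(p ** 2 for p in proportions)

def entropy(proportions):
    """Entropy in bits: 0 for a pure node, larger when classes are mixed."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

mixed = [0.5, 0.5]   # two classes, evenly mixed
pure = [1.0, 0.0]    # all records in one class
```

For two classes, both measures peak at the even mix and fall to zero for a pure node.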

New cards

Which of the following is NOT an example of a categorical class for stock acquisition?

A. Sell

B. Hold

C. Consider

D. Buy

New cards

Multiple linear regression is the relation between 2 continuous variables (True/False)

False

New cards

When referring to tree structure, split points become nodes on the tree (True/False)

True