SCM 380 Exam 2

Description and Tags

A compilation of discussion post questions from the class

54 Terms

1

True or False: Qualitative data use labels or names to identify categories of like items. 

True

2

Multiple Choice: Heat maps are used for:

A) Visualizations within data mining for correlations and missing data

B) Creating a histogram

C) Making a Holt's model

D) Showing scatter plots for variable pairs

A) Visualizations within data mining for correlations and missing data

3

Which of the following is the most popular rotational method?

A. Quartimax rotations

B. Varimax rotations

C. Promax rotations

D. Equamax rotations

B. Varimax rotations

4

Which of the following is not a predictive measure of error?

A. MAD

B. Average Error

C. Total SSE

D. ROC Curve

D. ROC Curve

5

Which distribution plot uses color to show correlation amongst variables?

A) Scatterplots

B) Heat maps

C) Box plots

D) Histograms

B) Heat maps

6

True or False: A dummy variable can take any numeric value.

A) True

B) False

B) False

7

"High separation of records" means that using predictor variables attains _____

     A. High Error

     B. No Error

     C. Low Error

C. Low Error

8

Measures of Variability are the tabular, graphical, and numerical methods used to summarize and present data. (True/False)

False

9

The goal of principal components analysis is to increase the number of numerical variables. (T/F)

F

10

What does jittering do?

A. Moves markers by a small random amount 

B. Moves the data closer together

C. Uncrowds the data by allowing more markers to be seen 

D. Both A and C

D. Both A and C
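
A minimal sketch of jittering, assuming a hypothetical `jitter` helper and a made-up `amount` of 0.1 (neither comes from the course material). Each value is moved by a small random amount so that overlapping markers become visible:

```python
import random

def jitter(values, amount=0.1, seed=None):
    """Return a copy of values, each shifted by a small random amount."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return [v + rng.uniform(-amount, amount) for v in values]

# Five records with the same coordinate would be drawn on top of each other.
crowded = [3.0, 3.0, 3.0, 3.0, 3.0]
spread = jitter(crowded, amount=0.1, seed=42)
```

The shifted values stay within `amount` of the originals, so the overall pattern is preserved while the crowded markers separate.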

11

12

The curse of dimensionality is the “affliction caused by adding variables to multivariate data”. Why is this a problem for data mining exercises?

A: Too many variables will only allow for one type of data mining technique which is not that accurate: regressions.

B: Can be compared to a chess board: adding a third dimension would increase the location options eightfold (from 64 to 512).

C: Too many variables make box plots insufferable to observe.

D: Too many dimensions leave too much remaining noise to perform data mining

B and D
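
The chess board comparison can be checked with a few lines of arithmetic. This is only an illustration, assuming a standard 8 x 8 board extended to an 8 x 8 x 8 cube:

```python
side = 8

squares_2d = side ** 2  # locations on a flat board
cells_3d = side ** 3    # locations after adding a third dimension

# The 3D count is 8 times the 2D count (800% of it),
# which corresponds to a 700% increase over the original.
growth_factor = cells_3d / squares_2d
percent_increase = (cells_3d - squares_2d) / squares_2d * 100
```

Each added dimension multiplies the number of locations again, so the same number of records becomes increasingly sparse.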

13

Which of the following would not be considered a measure of variability?

     a) Standard Deviation

     b) Percentiles

     c) Interquartile Range

     d) Range

     b) Percentiles
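
As an illustration, the three variability measures in the options can be computed with Python's standard `statistics` module. The data values are made up. Quartiles are themselves percentiles (measures of location); only the difference between them, the interquartile range, measures spread:

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 12]  # made-up sample

# Range: distance between the largest and smallest value.
data_range = max(data) - min(data)

# Sample standard deviation.
std_dev = statistics.stdev(data)

# Quartiles are the 25th, 50th, and 75th percentiles (location, not spread).
q1, q2, q3 = statistics.quantiles(data, n=4)

# Interquartile range: the spread of the middle half of the data.
iqr = q3 - q1
```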

14

Both sensitivity and specificity can address the question, 'How often is the test right?' (True/False)

True

15

T/F Descriptive statistics are the tabular, graphical, and numerical methods used to summarize and present data

True

16

T/F Qualitative data are numerical values that indicate how much or how many, and quantitative data use labels or names to identify categories of like items

False

17

Naive Rule classifies all records as belonging to the most prevalent class (True/False).

True
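
A minimal sketch of the naive rule as a baseline classifier. The function name `naive_rule` and the labels are hypothetical. It ignores all predictors and assigns every record to the most prevalent class in the training labels:

```python
from collections import Counter

def naive_rule(training_labels):
    """Build a classifier that always predicts the most prevalent class."""
    most_common_class, _ = Counter(training_labels).most_common(1)[0]
    return lambda record: most_common_class

# Made-up labels: 7 records in class "0" and 3 records in class "1".
labels = ["0"] * 7 + ["1"] * 3
classify = naive_rule(labels)

predictions = [classify(record=None) for _ in labels]
naive_error_rate = sum(p != a for p, a in zip(predictions, labels)) / len(labels)
```

The resulting error rate (the share of the minority class) is the benchmark that a model using predictors should beat.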

18

Which of the following types of data uses labels or names to identify categories of like items?

a. Ordinal

b. Qualitative

c. Interval

d. Quantitative

b. Qualitative

19

Which option is not a step in the cutoff-based classification process? Choose one.

a. Compare to cutoff value, and classify accordingly

b. Compute the probability of belonging to class "0"

c. Compute the probability of belonging to class "1"

b. Compute the probability of belonging to class "0"
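
The two actual steps can be sketched as follows, assuming the probability of belonging to class "1" has already been estimated by some model. The function name `classify_with_cutoff` and the probabilities are made up, and 0.5 is used here as the default cutoff:

```python
def classify_with_cutoff(prob_class_1, cutoff=0.5):
    """Assign class 1 when the estimated probability meets the cutoff, else class 0."""
    return 1 if prob_class_1 >= cutoff else 0

# Hypothetical estimated probabilities of belonging to class "1".
probabilities = [0.92, 0.48, 0.50, 0.15]

default_classes = [classify_with_cutoff(p) for p in probabilities]
stricter_classes = [classify_with_cutoff(p, cutoff=0.75) for p in probabilities]
```

Raising the cutoff moves borderline records from class 1 to class 0, which is why the choice of cutoff changes the error rate.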

20

Which model should one use for supervised learning with a continuous outcome?

a. Logistic Regression

b. Regression

c. Cluster Analysis

d. Principal Components

b. Regression

21

Which of the following is not a measurement of error? 

A: Tracking Signal 

B: Bias

C: Mean Absolute Deviation

D: Mean Squared Error

B: Bias

22

The goal of a Principal Component Analysis is to increase the number of numerical variables (True/False).

False

23

Is a histogram a basic plot or a distribution plot?

A. Basic plot

B. Distribution plot

B. Distribution plot

24

Which of the following is not a measure of error?

A. Mean absolute deviation (MAD)

B. Mean absolute percent error (MAPE)

C. Tracking signal

D. Tracing error

E. Mean squared error (MSE)

D. Tracing error

25

For Predictions, which of the following is NOT a metric for performance?

A. Average Error

B. GALE

C. MAPE

D. RMSE

B. GALE
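
For reference, a sketch of the real metrics among the options, using the common textbook definitions with error = actual minus predicted. The function name `prediction_errors` and the numbers are made up. MAD (mean absolute deviation) is included because it appears in related cards:

```python
import math

def prediction_errors(actual, predicted):
    """Compute common prediction error metrics from paired values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        # Average error keeps the sign, so over- and under-predictions cancel.
        "average_error": sum(errors) / n,
        # MAD: mean absolute deviation of the errors.
        "mad": sum(abs(e) for e in errors) / n,
        # MAPE: mean absolute error as a percentage of the actual value.
        "mape": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        # RMSE: square root of the mean squared error.
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }

metrics = prediction_errors(actual=[100, 200, 400], predicted=[110, 190, 400])
```

In this example the average error is zero even though two of the three predictions miss, which is why the absolute and squared measures are reported alongside it.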

26

Bar Charts are useful for comparing multiple statistics like average, count, percentage, etc. across groups (True/False)

False

27

Which of the following is NOT considered a basic plot for data exploration?

    a. Line Graphs

    b. Scatter Plots

    c. Bar Charts

    d. Histograms

    d. Histograms

28

Graphical Methods for Categorical data include dot plots, histograms, and scatter diagrams (True/False)

False

29

Distribution plots display “how many” of each value occur in a data set (True/False). 

True

30

Percentage of misclassified records out of the total records in the validation data.

A. Error

B. Accuracy 

C. Error rate

D. Naive 

C. Error rate
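
A sketch of how the error rate relates to accuracy, sensitivity, and specificity for a two-class problem. The function name `classification_metrics` and the validation labels are hypothetical, with class 1 treated as the class of interest:

```python
def classification_metrics(actual, predicted):
    """Compute error rate, accuracy, sensitivity, and specificity for classes 0/1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    total = len(pairs)
    return {
        "error_rate": (fp + fn) / total,   # misclassified / total records
        "accuracy": (tp + tn) / total,     # 1 - error rate
        "sensitivity": tp / (tp + fn),     # share of actual 1s classified as 1
        "specificity": tn / (tn + fp),     # share of actual 0s classified as 0
    }

# Made-up validation data.
actual =    [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
result = classification_metrics(actual, predicted)
```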

31

A single categorical variable with m categories is typically transformed into m+1 dummy variables (True/False)

False
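
The usual convention is m-1 dummy variables, since the dropped category is implied when all the others are 0. A minimal sketch, assuming a hypothetical `to_dummies` helper that drops the first category alphabetically:

```python
def to_dummies(values, drop_first=True):
    """Encode a categorical variable as 0/1 dummy variables, one dict per record."""
    categories = sorted(set(values))
    # With m categories, keeping m-1 of them avoids a redundant column.
    kept = categories[1:] if drop_first else categories
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]

# Made-up variable with m = 3 categories.
recommendation = ["buy", "hold", "sell", "hold"]
rows = to_dummies(recommendation)
```

Here "buy" is the reference category: a record with `is_hold` and `is_sell` both equal to 0 must be a "buy". Each dummy holds only 0 or 1.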

32

The ROC curve was first used in what war?

a. Cold War

b. WW2

c. Vietnam War

d. WW1

b. WW2

33

Error is classifying all records as belonging to the most prevalent class (True/False)

False

34

Fill in the blank: we simplify decision trees by _____ peripheral branches to avoid overfitting.

A. Collecting

B. Pruning

C. Avoid

D. Eliminating

B. Pruning

35

The MAPE is a measure of the percentage of how much predictions deviate from the actual values (True/False)

True

36

Multiple choice: Two important charts that visualize distribution of data are boxplots and _________

a. histograms

b. line charts

c. bar charts

d. scatter plots

a. histograms

37

The Naïve rule is to classify all records as belonging to the most prevalent class (True/False)

True

38

What is the second step in factor analysis? 

A. the correlation matrix for all variables is computed

B. Factor extraction

C. Factor rotation

D. Make final decisions about the number of underlying factors

B. Factor extraction

39

We select the split that most increases the Gini Index (True/False)

False
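
The split chosen is the one that most reduces impurity. A minimal sketch for a single numeric predictor, assuming hypothetical helpers `gini` and `best_split` and candidate thresholds at the midpoints between consecutive values:

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the split x <= threshold with lowest impurity."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        # Impurity after the split: size-weighted average of the two child nodes.
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or weighted < best[1]:
            best = (threshold, weighted)
    return best

# Made-up data: small x values belong to class 0, large ones to class 1.
xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(xs, ys)
```

Recursive partitioning repeats this search inside each resulting part until a stopping rule is met.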

40

What is the process of recursive partitioning?

a. Graft two branches together

b. Repeatedly split the records into two parts

c. Simplify the tree by removing branches

d. Create a new branch

b. Repeatedly split the records into two parts

41

Binary Logistic Regression results in a V-shaped distribution function. (True/False)

False

42

Multiple Choice: What are the "Odds" in step 2 of The Logit?

a. Ratio

b. Question

c. Quantity

d. Linear

a. Ratio

43

The goal of trees and rules is to classify or predict an outcome based on a set of predictors. (True/False)

True

44

Logit can be modeled as a linear function of the ____

    A. Probabilities

    B. Outcomes

    C. Variables

    D. Predictors

D. Predictors
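
The chain from probability to odds to logit can be sketched as below. The coefficient values `b0` and `b1` are invented for illustration and do not come from any fitted model:

```python
import math

def odds(p):
    """Odds are the ratio of the probability of class 1 to the probability of class 0."""
    return p / (1 - p)

def logit(p):
    """The logit is the natural log of the odds."""
    return math.log(odds(p))

def logistic(z):
    """Inverse of the logit: maps any real number to a probability (S-shaped curve)."""
    return 1 / (1 + math.exp(-z))

# The logit is modeled as a linear function of the predictors.
b0, b1 = -2.0, 0.5  # hypothetical coefficients
x = 6.0             # hypothetical predictor value
predicted_probability = logistic(b0 + b1 * x)
```

Because the logit is linear in the predictors while the probability is not, the model can take any real value on the logit scale and still return a probability between 0 and 1.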

45

Which of the below are advantages of regression trees?

A. Can work without extensive handling of missing data 

B. Produce rules that are easy to interpret & implement 

C. Variable selection & reduction is automatic 

D. All of the above

D. All of the above

46

Simple linear regression is a relation between 3 continuous variables. (True/False)

False

47

What is the proper definition of pruning as used for decision trees?

a. The process of dividing a node into two smaller nodes.

b. The process of adding a whole section of a tree.

c. The process of cutting down the tree.

d. None of the above.

c. The process of cutting down the tree.

48

The logistic distribution is an S-shaped distribution function (True/False)

True

49

When determining the cutoff value, the popular initial choice is 0.45. (True/False)

False

50

Which of the following is not a term used when talking about decision trees?

a. Splitting

b. Pruning

c. Toning

d. Grafting

c. Toning

51

The two most popular ways to measure Impurity are the Gini Index and Entropy Measure (True/False)

True
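
Both impurity measures can be computed from the class proportions in a node. A minimal sketch using the standard formulas, with hypothetical function names `gini_index` and `entropy`:

```python
import math

def gini_index(proportions):
    """Gini index: 1 minus the sum of squared class proportions."""
    return 1 - sum(p * p for p in proportions)

def entropy(proportions):
    """Entropy: negative sum of p * log2(p), skipping empty classes."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

pure_node = [1.0, 0.0]   # all records in one class
mixed_node = [0.5, 0.5]  # two classes equally represented
```

Both measures are 0 for a pure node and reach their maximum when the classes are equally represented (0.5 for Gini and 1.0 for entropy with two classes).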

52

Which of the following is NOT an example of a categorical class for stock acquisition?

A. Sell

B. Hold

C. Consider

D. Buy

C. Consider

53

Multiple linear regression is the relation between 2 continuous variables (True/False)

False

54

When referring to tree structure, split points become nodes on the tree (True/False)

True