SCM 380 Exam 2 new


Description and Tags

A compilation of discussion post questions from the class.

Last updated 12:21 AM on 11/8/23

72 Terms

1

True or False: Qualitative data use labels or names to identify categories of like items. 

True

2

Multiple Choice: Heat maps are used for:

A) Visualizations within data mining for correlations and missing data

B) Creating a histogram

C) Making a Holt's model

D) Showing scatter plots for variable pairs

A) Visualizations within data mining for correlations and missing data

3

Which of the following is the most popular rotational method?

A. Quartimax rotations

B. Varimax rotations

C. Promax rotations

D. Equamax rotations

B. Varimax rotations

4

Which of the following is not a predictive measure of error?

A. MAD

B. Average Error

C. Total SSE

D. ROC Curve

D. ROC Curve

5

Which distribution plot uses color to show correlation amongst variables?

A) Scatterplots

B) Heat maps

C) Box plots

D) Histograms

B) Heat maps

6

A dummy variable can hold a value of any number.

A) True

B) False

B) False

7

"High separation of records" means that using predictor variables attains _____

     A. High Error

     B. No Error

     C. Low Error

C. Low Error

8

Measures of Variability are the tabular, graphical, and numerical methods used to summarize and present data. (True/False)

False

9

The goal of principal components analysis is to increase the number of numerical variables. (T/F)

F

10

What does jittering do?

A. Moves markers by a small random amount 

B. Moves the data closer together

C. Uncrowds the data by allowing more markers to be seen 

D. Both A and C

D. Both A and C
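As a rough illustration of jittering, the sketch below shifts each marker by a small random amount so that overlapping points become visible. The function name `jitter`, the `amount` parameter, and the sample data are assumptions made for this example, not part of the course material.

```python
import random


def jitter(values, amount=0.1, seed=None):
    """Return a copy of values, each shifted by a small uniform random offset."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


# Five markers that overlap at only two x positions (made-up data).
x = [1, 1, 1, 2, 2]
x_jittered = jitter(x, amount=0.1, seed=0)

# Each marker stays within 0.1 of its original position,
# but the markers no longer sit exactly on top of each other.
print(x_jittered)
```

The offsets are kept small relative to the spacing of the data so the overall pattern is preserved.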

11
12

The curse of dimensionality is the “affliction caused by adding variables to multivariate data”. Why is this a problem for data mining exercises?

A: Too many variables will only allow for one type of data mining technique which is not that accurate: regressions.

B: Can be compared to a chess board, adding a third dimension to chess boards would increase location options by 800%.

C: Too many variables make box plots insufferable to observe.

D: Too many dimensions leave too much remaining noise to perform data mining.

B and D

13

Which of the following would not be considered a measure of variability?

     a) Standard Deviation

     b) Percentiles

     c) Interquartile Range

     d) Range

     b) Percentiles

14

Both sensitivity and specificity can address the question, 'how often is the test right?' (True/False)

True
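For reference, a minimal sketch of how sensitivity and specificity are computed from confusion-matrix counts. The counts below are made up for illustration, and the function names are hypothetical.

```python
def sensitivity(tp, fn):
    """Share of actual positives that the test correctly flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of actual negatives that the test correctly clears."""
    return tn / (tn + fp)


# Made-up counts: 50 actual positives and 50 actual negatives.
tp, fn = 40, 10
tn, fp = 45, 5

sens = sensitivity(tp, fn)  # right on 40 of the 50 positives
spec = specificity(tn, fp)  # right on 45 of the 50 negatives
print(sens, spec)
```

Sensitivity answers "how often is the test right?" for the positive class, and specificity answers it for the negative class.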

15

T/F Descriptive statistics are the tabular, graphical, and numerical methods used to summarize and present data

True

16

T/F Qualitative data are numerical values that indicate how much or how many, and quantitative data use labels or names to identify categories of like items

False

17

Naive Rule classifies all records as belonging to the most prevalent class (True/False).

True
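A small sketch of the naive rule used as a benchmark: every record is assigned to the most prevalent class, and the resulting error rate is what a real classifier should beat. The labels and helper names here are assumptions for illustration only.

```python
from collections import Counter


def naive_rule(labels):
    """Return the most prevalent class among the given labels."""
    return Counter(labels).most_common(1)[0][0]


def error_rate(actual, predicted):
    """Fraction of records whose predicted class differs from the actual class."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


# Made-up data: seven records in class 0, three in class 1.
actual = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

majority = naive_rule(actual)
predicted = [majority] * len(actual)  # classify everything as the majority class
naive_error = error_rate(actual, predicted)
print(majority, naive_error)
```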

18

Which of the following types of data uses labels or names to identify categories of like items?

a. Ordinal

b. Qualitative

c. Interval

d. Quantitative

b. Qualitative

19

Which option is not a step in the cutoff-for-classification process? Choose one.

a. Compare to cutoff value, and classify accordingly

b. Compute the probability of belonging to class "0"

c. Compute the probability of belonging to class "1"

b. Compute the probability of belonging to class "0"

20

Which model should one use for supervised learning with a continuous outcome?

a. Logistic Regression

b. Regression

c. Cluster Analysis

d. Principal Components

b. Regression

21

Which of the following is not a measurement of error? 

A: Tracking Signal 

B: Bias

C: Mean Absolute Deviation

D: Mean Squared Error

B: Bias

22

The goal of a Principal Component Analysis is to increase the number of numerical variables (True/False).

False

23

Is a histogram a basic plot or a distribution plot?

A. Basic plot

B. Distribution plot

B. Distribution plot

24

Which of the following is not a measure of error?

A. Mean absolute deviation (MAD)

B. Mean absolute percent error (MAPE)

C. Tracking signal

D. Tracing error

E. Mean squared error (MSE)

D. Tracing error

25

For Predictions, which of the following is NOT a metric for performance?

A. Average Error

B. GALE

C. MAPE

D. RMSE

B. GALE
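The real metrics named in these cards can be computed in a few lines. This is a sketch under stated assumptions: the error is taken as actual minus predicted (some texts use the opposite sign), MAPE is expressed in percent, and the data are made up.

```python
import math


def prediction_errors(actual, predicted):
    """Compute common prediction-error metrics for paired values."""
    errors = [a - p for a, p in zip(actual, predicted)]  # assumed sign convention
    n = len(errors)
    return {
        "average_error": sum(errors) / n,
        "mad": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }


# Made-up actual and predicted values.
actual = [100, 200, 400]
predicted = [110, 190, 400]

metrics = prediction_errors(actual, predicted)
print(metrics)
```

In this example the positive and negative errors cancel, so the average error is zero even though MAD, MAPE, and RMSE are not. That is why average error is read as a signed summary of over- or under-prediction rather than of error size.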

26

Bar Charts are useful for comparing multiple statistics like average, count, percentage, etc. across groups (True/False)

False

27

Which of the following is NOT considered a basic plot for data exploration?

    a. Line Graphs

    b. Scatter Plots

    c. Bar Charts

    d. Histograms

    d. Histograms

28

Graphical Methods for Categorical data include dot plots, histograms, and scatter diagrams (True/False)

False

29

Distribution plots display “how many” of each value occur in a data set (True/False). 

True

30

Percentage of misclassified records out of the total records in the validation data.

A. Error

B. Accuracy 

C. Error rate

D. Naive 

C. Error rate

31

A single categorical variable with m categories is typically transformed into m+1 dummy variables (True/False)

False
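To see why the usual count is m-1 rather than m+1, here is a minimal sketch of dummy coding in plain Python. The helper `dummy_encode`, the `is_` column prefix, and the color data are hypothetical; dropping the alphabetically first category as the reference level is just one possible convention.

```python
def dummy_encode(values, drop_first=True):
    """Encode a categorical variable as 0/1 dummy variables.

    With drop_first=True, a variable with m categories yields m-1 dummies;
    the dropped category is the reference level (all dummies equal 0).
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]


# Made-up variable with m = 3 categories: blue, green, red.
colors = ["red", "green", "blue", "green"]
rows = dummy_encode(colors)

for color, row in zip(colors, rows):
    print(color, row)
```

Each dummy takes only the values 0 or 1, and the reference category ("blue" here) is identified by zeros in both remaining columns.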

32

The ROC curve was first used in what war?

a. Cold War

b. WW2

c. Vietnam War

d. WW1

b. WW2

33

Error is classifying all records as belonging to the most prevalent class (True/False)

False

34

Fill in the blank: we simplify decision trees by_____ peripheral branches to avoid overfitting.

A. Collecting

B. Pruning

C. Avoid

D. Eliminating

B. Pruning

35

The MAPE is a measure of the percentage of how much predictions deviate from the actual values (True/False)

True

36

Multiple choice: Two important charts that visualize distribution of data are boxplots and _________

a. histograms

b. line charts

c. bar charts

d. scatter plots

a. histograms

37

The Naïve rule is to classify all records as belonging to the most prevalent class (True/False)

True

38

What is the second step in factor analysis? 

A. the correlation matrix for all variables is computed

B. Factor extraction

C. Factor rotation

D. Make final decisions about the number of underlying factors

B. Factor extraction

39

We select the split that most increases the Gini Index (True/False)

False

40

What is the process of recursive partitioning?

a. Graft two branches together

b. Repeatedly split the records into two parts

c. Simplify the tree by removing branches

d. Create a new branch

b. Repeatedly split the records into two parts

41

Binary Logistic Regression results in a V-shaped distribution function. (True/False)

False

42

Multiple Choice: What are the "Odds" in step 2 of The Logit?

a. Ratio

b. Question

c. Quantity

d. Linear

a. Ratio

43

The goal of trees and rules is to classify or predict an outcome based on a set of predictors. (True/False)

True

44

Logit can be modeled as a linear function of the ____

    A. Probabilities

    B. Outcomes

    C. Variables

    D. Predictors

D. Predictors

45

Which of the below are advantages of regression trees?

A. Can work without extensive handling of missing data 

B. Produce rules that are easy to interpret & implement 

C. Variable selection & reduction is automatic 

D. All of the above

D. All of the above

46

Simple linear regression is a relation between 3 continuous variables. (True/False)

False

47

What is the proper definition of pruning as used for decision trees?

a. The process of dividing a node into two smaller nodes.

b. The process of adding a whole section of a tree.

c. The process of cutting down the tree.

d. None of the above.

c. The process of cutting down the tree.

48

The logistic distribution is an S-shaped distribution function (True/False)

True

49

When determining the cutoff value, the popular initial choice is 0.45. (True/False)

False

50

Which of the following is not a term used when talking about decision trees?

a. Splitting

b. Pruning

c. Toning

d. Grafting

c. Toning

51

The two most popular ways to measure Impurity are the Gini Index and Entropy Measure (True/False).

True
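Both impurity measures can be sketched from class proportions alone. The formulas used are the standard ones (Gini = 1 - sum of squared proportions, entropy = -sum of p * log2(p)); the function names and the buy/sell labels are assumptions for illustration.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Proportion of records in each class that is present."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini_index(labels):
    """Gini impurity: 0 for a pure node, larger for a more mixed node."""
    return 1 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy in bits: 0 for a pure node, larger for a more mixed node."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


pure = ["buy", "buy", "buy", "buy"]
mixed = ["buy", "buy", "sell", "sell"]

print(gini_index(pure), entropy(pure))
print(gini_index(mixed), entropy(mixed))
```

Both measures are lowest for a pure node and rise together as the classes mix, so a split is preferred when it lowers impurity under either measure.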

52

Which of the following is NOT an example of a categorical class for stock acquisition?

A. Sell

B. Hold

C. Consider

D. Buy


C

53

Multiple linear regression is the relation between 2 continuous variables (True/False)

False

54

When referring to tree structure, split points become nodes on the tree (True/False)

True

55

T/F: Gini Index and Entropy are inversely correlated, and it is preferable that the tree has high Gini Index and low Entropy.

False

56

In Binary Logistic Regression, the cutoff value is used to:

A: determine if the observation belongs to class 0 if it is greater than or equal to the cutoff value

B: determine if the observation belongs to class 1 if it is greater than or equal to the cutoff value

C: determine if the observation belongs to class 1 if it is less than or equal to the cutoff value

D: determine if the observation belongs to class 0 if it is less than or equal to the cutoff value

B
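The cutoff step can be sketched as follows, assuming the model has already produced a probability of belonging to class 1 for each record. The default of 0.5 is the commonly used starting cutoff; the probabilities and the function name `classify` are made up for this example.

```python
def classify(probabilities, cutoff=0.5):
    """Assign class 1 when P(class 1) >= cutoff, otherwise class 0."""
    return [1 if p >= cutoff else 0 for p in probabilities]


# Made-up predicted probabilities of belonging to class 1.
probs = [0.10, 0.50, 0.49, 0.93]

labels = classify(probs)                     # default cutoff of 0.5
strict_labels = classify(probs, cutoff=0.9)  # a higher cutoff yields fewer 1s
print(labels, strict_labels)
```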

57

For regression trees each node can be split into two parent nodes.

A. True

B. False

B. False

58

With Categorical Variables (Predictors) what are all possible ways in which the categories A, B and C can be split?

A. (A) and (B, C)

B. (B) and (A, C)

C. (C) and (A, B)

D. All of the above

D
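The three splits listed in this card can be enumerated programmatically. The sketch below is one possible approach, not a method taken from the course: it fixes the first category on the left side so that each two-way split is listed only once. The function name `binary_splits` is hypothetical.

```python
from itertools import combinations


def binary_splits(categories):
    """List every way to divide the categories into two non-empty groups."""
    cats = list(categories)
    first, rest = cats[0], cats[1:]
    splits = []
    # Keep `first` on the left so mirrored duplicates are never generated;
    # stop before taking all of `rest`, which would leave the right side empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = (first,) + extra
            right = tuple(c for c in cats if c not in left)
            splits.append((left, right))
    return splits


splits = binary_splits(["A", "B", "C"])
for left, right in splits:
    print(left, "and", right)
```

With m categories this produces 2^(m-1) - 1 candidate splits, which is 3 for the categories A, B, and C.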

59

With regard to decision tree terminology, _____ is the process in which we repeatedly split records into two parts in order to achieve homogeneity within the new parts.

     a. Pruning

     b. Recursive Partitioning  

     c. Grafting

     d. Splitting

B

60

Logistic regression is in no way similar to linear regression.

F

61
Which of the following is not an advantage of utilizing regression trees?

    • a. They are easy to use and understand.

    • b. They produce rules that are easy to interpret and implement.

    • c. They capture interactions between variables.

    • d. They do not require the assumptions of statistical models.

    • e. None of the above; they are all advantages

C

62
A problem with linear regression is the predicted probabilities can be greater than 1 or less than 0, creating an issue for subsequent analysis.

True

63

We select the split that most increases the Gini Index

False

64

Regressions are:

A. Used with continuous outcome variable 

B.  Procedure similar to classification tree 

C.  Do not require the assumptions of statistical models 

D.  A and B

D

65

Pruning is where you let the tree grow as large as possible

F

66

The logistic regression model is a linear transformation of the linear regression

False

67

Logistic regression is similar to linear regression, except that it is used with a categorical response 

True

68

Stopping Tree Growth: Natural end of process is 100% purity in each leaf

True

69

The final node of a decision tree is called a Root Node. 

False

70

_______ Linear Regression is a relation between 2 continuous variables.

A. Simple

B. Multiple 

C. Partial 

D. Binary 

A

71
T/F: Recursive partitioning is repeatedly splitting records into two parts to achieve maximum homogeneity within the new parts

T

72

p = odds / (1 - odds)

F

The correct formula is p = odds / (1 + odds)
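The relations among probability, odds, and the logit can be checked numerically. This is a small sketch using the standard definitions (odds = p / (1 - p), logit = natural log of the odds); the function names and the value p = 0.8 are chosen only for illustration.

```python
import math


def odds_from_p(p):
    """Odds of an event: ratio of its probability to the complement."""
    return p / (1 - p)


def p_from_odds(odds):
    """Recover the probability from the odds: p = odds / (1 + odds)."""
    return odds / (1 + odds)


def logit(p):
    """Log of the odds, the quantity modeled as linear in the predictors."""
    return math.log(odds_from_p(p))


p = 0.8
odds = odds_from_p(p)         # about 4, i.e. "4 to 1"
p_back = p_from_odds(odds)    # returns to about 0.8
print(odds, p_back, logit(0.5))
```

Dividing by (1 - odds) instead would give a negative value for odds of 4, which cannot be a probability.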
