SCM 380 Exam 2

Description and Tags

A compilation of discussion post questions from the class

54 Terms

1

True or False: Qualitative data use labels or names to identify categories of like items. 

True

2

Multiple Choice: Heat maps are used for:

A) Visualizations within data mining for correlations and missing data

B) Creating a histogram

C) Making a Holt's model

D) Showing scatter plots for variable pairs

A) Visualizations within data mining for correlations and missing data

3

Which of the following is the most popular rotational method?

A. Quartimax rotations

B. Varimax rotations

C. Promax rotations

D. Equamax rotations

B. Varimax rotations

4

Which of the following is not a predictive measure of error?

A. MAD

B. Average Error

C. Total SSE

D. ROC Curve

D. ROC Curve

5

Which distribution plot uses color to show correlation amongst variables?

A) Scatterplots

B) Heat maps

C) Box plots

D) Histograms

B) Heat maps

6

True or False: A dummy variable can hold any numerical value.

A) True

B) False

B) False

7

"High separation of records" means that using predictor variables attains _____

A. High Error

B. No Error

C. Low Error

C. Low Error

8

Measures of Variability are the tabular, graphical, and numerical methods used to summarize and present data. (True/False)

False

9

The goal of principal components analysis is to increase the number of numerical variables. (T/F)

F

10

What does jittering do?

A. Moves markers by a small random amount 

B. Moves the data closer together

C. Uncrowds the data by allowing more markers to be seen 

D. Both A and C

D. Both A and C
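Jittering can be sketched in Python as follows. This is an illustrative sketch, not any particular plotting tool's implementation: the `jitter` helper and its `scale` parameter (the maximum shift in either direction) are hypothetical names, and uniform noise is an assumption.

```python
import random

def jitter(values, scale=0.1, seed=None):
    """Return a copy of values with each point shifted by a small uniform random amount.

    `scale` (hypothetical) is the largest shift allowed in either direction.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-scale, scale) for v in values]

# Identical values would be drawn as one overlapping marker; jittering spreads them out.
counts = [3, 3, 3, 3, 5, 5]
print(jitter(counts, scale=0.2, seed=1))
```

The shift is kept small relative to the gaps between real values so the markers separate without changing how the data is read.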

11
12

The curse of dimensionality is the “affliction caused by adding variables to multivariate data”. Why is this a problem for data mining exercises?

A: Too many variables will only allow for one type of data mining technique which is not that accurate: regressions.

B: It can be compared to a chessboard: adding a third dimension to an 8 x 8 board would multiply the number of locations by 8 (from 64 to 512 squares).

C: Too many variables make box plots insufferable to observe.

D: Too many dimensions leave too much remaining noise to perform data mining

B and D
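The chessboard analogy is simple arithmetic: with 8 squares per side, a board in d dimensions has 8^d cells. A minimal sketch, assuming a cubic board with the same side length in every dimension (the `board_cells` helper is hypothetical):

```python
def board_cells(side=8, dims=2):
    """Number of cells in a board with `side` squares along each of `dims` dimensions."""
    return side ** dims

two_d = board_cells(dims=2)    # 64
three_d = board_cells(dims=3)  # 512
print(two_d, three_d, three_d / two_d)  # 64 512 8.0
```

Each added dimension multiplies the number of cells by 8, so a fixed number of records covers the space more and more sparsely as variables are added.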

13

Which of the following would not be considered a measure of variability?

a) Standard Deviation

b) Percentiles

c) Interquartile Range

d) Range

b) Percentiles
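The three measures of variability in this card can be computed with Python's standard `statistics` module, as in this sketch on made-up data. One assumption to note: `statistics.quantiles` uses its default "exclusive" method, and other tools (Excel, pandas) may interpolate quartiles differently, so IQR values can vary slightly between tools.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

data_range = max(data) - min(data)            # spread between the extremes
std_dev = statistics.pstdev(data)             # population standard deviation
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                                 # spread of the middle 50% of the data

print(data_range, std_dev, iqr)  # 7 2.0 2.5
```

Percentiles (such as q1 and q3 here) are positions in the data; the IQR turns two of them into a measure of spread.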

14

Both sensitivity and specificity can address the question, 'How often is the test right?' (True/False)

True
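Sensitivity and specificity each answer "how often is the test right?" for one class: sensitivity for the actual positives, specificity for the actual negatives. A minimal sketch using hypothetical confusion-matrix counts (tp, fn, tn, fp are illustrative numbers, not course data):

```python
def sensitivity(tp, fn):
    """Share of actual positives the test labels positive (true positive rate)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of actual negatives the test labels negative (true negative rate)."""
    return tn / (tn + fp)

# Hypothetical counts for illustration only.
tp, fn, tn, fp = 40, 10, 45, 5
print(sensitivity(tp, fn))  # 0.8
print(specificity(tn, fp))  # 0.9
```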

15

T/F: Descriptive statistics are the tabular, graphical, and numerical methods used to summarize and present data

True

16

T/F: Qualitative data are numerical values that indicate how much or how many, and quantitative data use labels or names to identify categories of like items

False

17

Naive Rule classifies all records as belonging to the most prevalent class (True/False).

True
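The naive rule needs no predictors at all, which is why it is often used as a baseline for judging classifiers. A minimal sketch, assuming class labels are stored in a plain list (`naive_rule` is a hypothetical helper name):

```python
from collections import Counter

def naive_rule(train_labels, n_new):
    """Classify every new record as the most prevalent class in the training labels."""
    majority_class, _ = Counter(train_labels).most_common(1)[0]
    return [majority_class] * n_new

labels = ["no", "no", "yes", "no", "yes", "no"]
print(naive_rule(labels, 3))  # ['no', 'no', 'no']
```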

18

Which of the following types of data uses labels or names to identify categories of like items?

a. Ordinal

b. Qualitative

c. Interval

d. Quantitative

b. Qualitative

19

Which of the following is not a step in the cutoff-based classification process? Choose one.

a. Compare to cutoff value, and classify accordingly

b. Compute the probability of belonging to class "0"

c. Compute the probability of belonging to class "1"

b. Compute the probability of belonging to class "0"
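The two real steps (compute the probability of class "1", then compare it to the cutoff) can be sketched as below. Assumptions: the probabilities stand in for a model's output, a probability exactly equal to the cutoff is assigned to class 1, and `classify_by_cutoff` is a hypothetical helper. A cutoff of 0.5 is a common initial choice.

```python
def classify_by_cutoff(prob_class1, cutoff=0.5):
    """Assign class 1 when P(class 1) is at or above the cutoff, otherwise class 0."""
    return [1 if p >= cutoff else 0 for p in prob_class1]

# Step 1: probabilities of belonging to class "1" (hypothetical model output).
probs = [0.92, 0.47, 0.50, 0.13]
# Step 2: compare each probability to the cutoff and classify accordingly.
print(classify_by_cutoff(probs))               # [1, 0, 1, 0]
print(classify_by_cutoff(probs, cutoff=0.45))  # [1, 1, 1, 0]
```

Lowering the cutoff labels more records as class 1, trading fewer missed positives for more false alarms.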

20

Which method should one use for a supervised learning task with a continuous outcome?

a. Logistic Regression

b. Regression

c. Cluster Analysis

d. Principal Components

b. Regression

21

Which of the following is not a measurement of error? 

A: Tracking Signal 

B: Bias

C: Mean Absolute Deviation

D: Mean Squared Error

B: Bias

22

The goal of a Principal Component Analysis is to increase the number of numerical variables (True/False).

False

23

Is a histogram a basic plot or a distribution plot?

A. Basic plot

B. Distribution plot

B. Distribution plot

24

Which of the following is not a measure of error?

A. Mean absolute deviation (MAD)

B. Mean absolute percent error (MAPE)

C. Tracking signal

D. Tracing error

E. Mean squared error (MSE)

D. Tracing error
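The real error measures listed in this card can be computed side by side. A sketch under stated assumptions: error is defined as actual minus forecast (some texts use the opposite sign), the tracking signal is taken as the running sum of forecast errors divided by MAD, actual values are nonzero so MAPE is defined, and the numbers and the `forecast_errors` helper are hypothetical.

```python
def forecast_errors(actual, forecast):
    """Return (MAD, MSE, MAPE in percent, tracking signal) for paired actuals and forecasts."""
    errors = [a - f for a, f in zip(actual, forecast)]  # error = actual - forecast
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n               # mean absolute deviation
    mse = sum(e ** 2 for e in errors) / n               # mean squared error
    mape = 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / n
    tracking_signal = sum(errors) / mad                 # running sum of errors / MAD
    return mad, mse, mape, tracking_signal

mad, mse, mape, ts = forecast_errors([100, 110, 120, 130], [98, 112, 115, 125])
print(mad, mse)  # 3.5 14.5
print(round(mape, 2), round(ts, 2))
```

A tracking signal far from zero suggests the forecast is consistently high or low rather than randomly off.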

25

For Predictions, which of the following is NOT a metric for performance?

A. Average Error

B. GALE

C. MAPE

D. RMSE

B. GALE
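Two of the genuine prediction metrics, average error and RMSE, can be sketched in a few lines. Assumptions: residuals are actual minus predicted, the data are made up, and the helper names are hypothetical.

```python
import math

def average_error(actual, predicted):
    """Mean of (actual - predicted); the sign shows systematic over- or under-prediction."""
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [10, 12, 15, 20]
predicted = [11, 11, 15, 18]
print(average_error(actual, predicted))  # 0.5
print(rmse(actual, predicted))
```

Average error can be near zero even when individual predictions are far off, because positive and negative residuals cancel; RMSE squares them first, so it cannot.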

26

Bar Charts are useful for comparing multiple statistics like average, count, percentage, etc. across groups (True/False)

False

27

Which of the following is NOT considered a basic plot for data exploration?

a. Line Graphs

b. Scatter Plots

c. Bar Charts

d. Histograms

d. Histograms

28

Graphical Methods for Categorical data include dot plots, histograms, and scatter diagrams (True/False)

False

29

Distribution plots display “how many” of each value occur in a data set (True/False). 

True

30

Which term describes the percentage of misclassified records out of the total records in the validation data?

A. Error

B. Accuracy 

C. Error rate

D. Naive 

C. Error rate
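The error rate is a simple count over the validation records, and accuracy is its complement. A minimal sketch with made-up actual and predicted classes (`error_rate` is a hypothetical helper):

```python
def error_rate(actual, predicted):
    """Share of validation records whose predicted class differs from the actual class."""
    misclassified = sum(1 for a, p in zip(actual, predicted) if a != p)
    return misclassified / len(actual)

actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
err = error_rate(actual, predicted)
print(err)      # 0.2 (2 of 10 records misclassified)
print(1 - err)  # accuracy
```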

31

A single categorical variable with m categories is typically transformed into m+1 dummy variables (True/False)

False
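A categorical variable with m categories is typically turned into m - 1 dummy variables, each holding only 0 or 1; the omitted category (the reference level) is represented by all zeros, which avoids redundant columns in models such as regression. A minimal sketch, assuming the first listed category is the reference level (`to_dummies` and the `is_` column prefix are hypothetical):

```python
def to_dummies(values, categories):
    """Encode a categorical variable with m categories as m - 1 binary (0/1) dummy columns."""
    others = categories[1:]  # categories[0] is the reference level and gets no column
    return [{f"is_{c}": int(v == c) for c in others} for v in values]

rows = to_dummies(["low", "high", "medium"], ["low", "medium", "high"])
print(rows)
# [{'is_medium': 0, 'is_high': 0}, {'is_medium': 0, 'is_high': 1}, {'is_medium': 1, 'is_high': 0}]
```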

32

The ROC curve was first used in what war?

a. Cold War

b. WW2

c. Vietnam War

d. WW1

b. WW2

33

Error is classifying all records as belonging to the most prevalent class (True/False)

False

34

Fill in the blank: We simplify decision trees by _____ peripheral branches to avoid overfitting.

A. Collecting

B. Pruning

C. Avoiding

D. Eliminating

B. Pruning

35

MAPE measures, as a percentage, how much predictions deviate from the actual values (True/False)

True

36

Multiple choice: Two important charts that visualize distribution of data are boxplots and _________

a. histograms

b. line charts

c. bar charts

d. scatter plots

a. histograms

37

The Naïve rule is to classify all records as belonging to the most prevalent class (True/False)

True

38

What is the second step in factor analysis? 

A. The correlation matrix for all variables is computed

B. Factor extraction

C. Factor rotation

D. Make final decisions about the number of underlying factors

B. Factor extraction

39

We select the split that most increases the Gini Index (True/False)

False
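The Gini index measures impurity, so a good split decreases it rather than increases it. A minimal sketch on made-up class labels, assuming a split's impurity is the Gini index of each side weighted by that side's share of the records (`gini` and `split_gini` are hypothetical helpers):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 means pure)."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Impurity after a split: each side's Gini weighted by its share of the records."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = ["A", "A", "A", "B", "B", "B"]
split_1 = (["A", "A", "A"], ["B", "B", "B"])  # separates the classes perfectly
split_2 = (["A", "A", "B"], ["A", "B", "B"])  # leaves both sides mixed

print(gini(parent))          # 0.5
print(split_gini(*split_1))  # 0.0
best = min([split_1, split_2], key=lambda s: split_gini(*s))  # lowest impurity wins
```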

40

What is the process of recursive partitioning?

a. Graft two branches together

b. Repeatedly split the records into two parts

c. Simplify the tree by removing branches

d. Create a new branch

b. Repeatedly split the records into two parts

41

Binary Logistic Regression results in a V-shaped distribution function. (True/False)

False

42

Multiple Choice: What are the "Odds" in step 2 of the logit?

a. Ratio

b. Question

c. Quantity

d. Linear

a. Ratio

43

The goal of trees and rules is to classify or predict an outcome based on a set of predictors. (True/False)

True

44

Logit can be modeled as a linear function of the ____

A. Probabilities

B. Outcomes

C. Variables

D. Predictors

D. Predictors
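The chain from predictors to logit to odds to probability can be sketched for a single predictor. Assumptions: the coefficients b0 and b1 are hypothetical, and the helper names are illustrative. The logit is linear in the predictor, the odds (a ratio, p / (1 - p)) equal exp(logit), and the logistic function maps the logit to an S-shaped probability between 0 and 1.

```python
import math

def logit(b0, b1, x):
    """Logit modeled as a linear function of the predictor x."""
    return b0 + b1 * x

def odds_from_logit(z):
    """Odds p / (1 - p) recovered from the logit: exp(z)."""
    return math.exp(z)

def prob_from_logit(z):
    """Logistic (S-shaped) function turning a logit into a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

b0, b1 = -2.0, 0.5  # hypothetical coefficients
for x in [0, 4, 8]:
    z = logit(b0, b1, x)
    print(x, z, odds_from_logit(z), prob_from_logit(z))
```

At x = 4 the logit is 0, so the odds are 1 and the probability is 0.5; moving x in either direction pushes the probability toward 0 or 1 along the S-curve.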

45

Which of the below are advantages of regression trees?

A. Can work without extensive handling of missing data 

B. Produce rules that are easy to interpret & implement 

C. Variable selection & reduction is automatic 

D. All of the above

D. All of the above

46

Simple linear regression is a relation between 3 continuous variables (True/False)

False
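Simple linear regression relates exactly two continuous variables: one predictor and one outcome. A minimal least-squares sketch on made-up data that lie exactly on y = 1 + 2x (`simple_linear_regression` is a hypothetical helper):

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x for one predictor x and one outcome y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx             # slope
    b0 = mean_y - b1 * mean_x  # intercept
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
print(simple_linear_regression(x, y))  # (1.0, 2.0)
```

Multiple linear regression extends the same idea to two or more predictors.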

47

What is the proper definition of pruning as used for decision trees?

a. The process of dividing a node into two smaller nodes.

b. The process of adding a whole section of a tree.

c. The process of cutting down the tree.

d. None of the above.

c. The process of cutting down the tree.

48

The logistic distribution is an S-shaped distribution function (True/False)

True

49

When determining the cutoff value, the popular initial choice is 0.45. (True/False)

False

50

Which of the following is not a term used when talking about decision trees?

a. Splitting

b. Pruning

c. Toning

d. Grafting

c. Toning

51

The two most popular ways to measure impurity are the Gini Index and the Entropy Measure (True/False)

True
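The entropy measure, the second common impurity measure, can be sketched the same way as the Gini index: it is 0 for a pure node and, for two classes, reaches its maximum of 1 at a 50/50 mix. Assumptions: log base 2 and hypothetical class labels; `entropy` is an illustrative helper name.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy impurity: sum of -p * log2(p) over the classes present in the node."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["A", "A", "A", "A"]))  # pure node: zero impurity
print(entropy(["A", "A", "B", "B"]))  # 1.0 (evenly mixed, two classes)
print(entropy(["A", "A", "A", "B"]))  # about 0.811
```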

52

Which of the following is NOT an example of a categorical class for stock acquisition?

A. Sell

B. Hold

C. Consider

D. Buy

C. Consider

53

Multiple linear regression is the relation between 2 continuous variables (True/False)

False

54

When referring to tree structure, split points become nodes on the tree (True/False)

True
