AN 420 Final Exam UKY

114 Terms

1

What is the algebraic formula used to create predictions in a linear regression model?

y = mx + b

2

True or false: Value ranges for all attributes for every observation in a scoring data set must be within the value ranges for the corresponding attributes in the training data set in a linear regression model.

True
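
A minimal sketch of the range check this card describes, assuming each observation is a dict of numeric attributes. The function name `out_of_range` is hypothetical and is not part of RapidMiner or any library.

```python
def out_of_range(training_rows, scoring_row):
    """Return the attribute names whose scoring value lies outside the training min/max."""
    flagged = []
    for name, value in scoring_row.items():
        # Collect the training values for this attribute and compare against their range.
        column = [row[name] for row in training_rows]
        if value < min(column) or value > max(column):
            flagged.append(name)
    return flagged
```

An observation with any flagged attribute would be removed or reviewed before scoring.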

3

In linear regression, the sum of the confidence level and alpha (α) is always ________.

1

4

In linear regression, the p-values for each independent variable must be smaller than ________.

alpha

5

In linear regression, the m variable is the independent variable's ________.

Coefficient

6

In linear regression, the x variable is the independent variable's ________.

value

7

In linear regression, the b variable is the model's ________.

Intercept coefficient
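
The roles of m, x, and b can be sketched in a few lines. This is an illustration only; the function names are hypothetical, and the multi-variable version assumes one coefficient per independent variable.

```python
def predict(m, x, b):
    """y = mx + b: coefficient times value, plus the intercept."""
    return m * x + b


def predict_multi(coefficients, values, intercept):
    """Same formula with several independent variables: sum of m_i * x_i, plus b."""
    return sum(m * x for m, x in zip(coefficients, values)) + intercept
```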

8

In linear regression, p-values larger than alpha indicate that their corresponding independent variables are __________.

Not statistically significant
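
A small sketch of the alpha comparison from the cards above, assuming the p-values have already been produced by a modeling tool. The function name and the example attribute names are hypothetical.

```python
def significant_variables(p_values, confidence_level=0.95):
    """Flag each independent variable as significant when its p-value is below alpha."""
    alpha = 1 - confidence_level  # confidence level + alpha = 1
    return {name: p < alpha for name, p in p_values.items()}
```

With the default 95% confidence level, a p-value of 0.03 passes and a p-value of 0.20 does not.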

9

The data type of the dependent variable in linear regression must be ________.

Continuous numeric

10

The data types of all independent variables in linear regression must be _________.

Numeric

11

In linear regression, the R-squared value indicates the __________ between the dependent variable and the independent variable(s).

Percent of shared variability
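
One common way to compute R-squared is 1 minus the ratio of residual variation to total variation. The sketch below assumes plain lists of actual and predicted values; the function name is hypothetical.

```python
def r_squared(actual, predicted):
    """Share of the dependent variable's variability explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    # Variation left unexplained by the model.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the dependent variable around its mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```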

12

True or false: If you attempt to make a prediction for an out-of-range scoring observation in a linear regression model in RapidMiner, the software will throw an error.

false

13

True or false: In logistic regression, the smaller the p-Value for an independent variable, the more predictive power that variable has relative to the dependent variable.

True

14

The data type of the dependent variable in logistic regression must be ________.

Options: Binary; A date or date/time; Continuously numeric; A string

Binary

15

The data types of all independent variables in logistic regression must be _________.

Numeric

16

In a logistic regression model, the confidence values that indicate how sure you can be that the binary prediction is correct are called _________ percentages.

Post-probability

17

True or false: The values true/false or 0/1 would both be valid combinations for the dependent variable in a logistic regression model.

True

18

True or false: Unlike in linear regression, it is possible to have more than one dependent variable in a logistic regression model.

False

19

In a logistic regression model, if all p-Values are rounded to zero, you can determine the relative predictive power of independent variables using the _________.

z-Value

20

The default confidence percent used for logistic regression models is _______.

95%

21

In logistic regression models, if the predicted confidence percent is 50% or greater, the class prediction will be __________.

Options: True; False; 0; Neither True nor False

True
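
A sketch of how a confidence percent turns into a class prediction, assuming the standard logistic (sigmoid) function and a 50% cutoff. The function names are hypothetical and the coefficients would come from a fitted model.

```python
import math


def confidence_true(intercept, coefficients, values):
    """Confidence (0 to 1) that the outcome is True, via the logistic function."""
    z = intercept + sum(c * v for c, v in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))


def classify(confidence, threshold=0.5):
    """Predict True when the confidence is at or above the threshold."""
    return confidence >= threshold
```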

22

True or false: There is more than one algorithm available for use when producing logistic regression models.

True

23

In RapidMiner, the data type of the dependent variable in a logistic regression model must be __________.

Binominal

24

In RapidMiner, a logistic regression model will produce _________ when applied to a scoring data set using the Apply Model operator.

A binary class prediction and confidence percentages for the positive and negative dependent variable outcomes

25

True or false: In RapidMiner, the label (dependent variable) can be coded either alphabetically (e.g., true/false) or numerically (e.g., 0/1).

True

26

True or false: The Logistic Regression operator in RapidMiner offers only one algorithm for model creation.

False

27

If a training data set in RapidMiner contains a non-predictive, numeric identification column, how must this be handled when creating logistic regression models?

The role for the identification column must be set to "ID."

28

In logistic regression in R, the glm command must include a parameter setting the family equal to ___________.

binomial

29

In a decision tree, the independent variable found at each branch of the tree is known as a _________.

Node

30

True or false: In decision tree models, no independent variable can be used more than once.

false

31

True or false: In a decision tree model, not all training observations that follow a specific path through the tree must have the same dependent variable outcome.

true

32

Data types for independent variables in a decision tree model must be ___________.

Options: Numeric; Binary; Text; Any of the above

Any of the above

33

If a data analyst finds that a decision tree model has too many nodes or leaves to be meaningful, the analyst should apply _________ to the tree.

pruning

34

True or false: Unlike some other predictive modeling techniques, decision tree models do not provide confidence percentages alongside their predictions.

false

35

In decision trees, CART is an acronym that stands for ________.

Classification and Regression Trees

36

True or false: In decision tree models, all independent variables are given equal weight when making predictions.

False

37

The data type for the dependent variable in a classification decision tree model must be __________.

Nominal

38

In a decision tree, the dependent variable value found at the end of each path through the tree is known as a _________.

Leaf

39

In a decision tree model represented visually in RapidMiner, the first predictive independent variable is represented __________.

At the top

40

Increasing which parameter of the Decision Tree operator in RapidMiner would reduce the size of the tree?

Minimal Leaf Size

41

How many algorithm options for constructing tree models are there in the Decision Tree operator in RapidMiner?

5

42

Neural networks build probability pathways between combinations of independent variable values and dependent variable outcomes through a process called forward and back _________.

Propagation

43

The data type required for independent variables in a neural network model must be _________.

Numeric

44

The space between the independent variables and the dependent variable where a neural network model gets trained is called the ____________.

Hidden layer

45

Inferential, probability-based approaches to data comparisons allowing inference based on probabilities to determine the strength of the relationship between attributes in data sets is known as __________.

Fuzzy logic

46

When training a neural network, the process of mapping independent variables forward to the dependent variable and then backward to the independent variables will repeat until __________ is reached.

Convergence

47

In neural networks, the pathways between independent variables and dependent variables are called __________.

Synapses

48

What is calculated in the nodes of the hidden layer of a neural network?

Independent variable weights

49

In a neural network model, how many nodes will the output layer always have?

The number of distinct values in the dependent variable

50

Which formula represents the general guideline for choosing the size of a neural network's hidden layer?

((count(independent variables) + count(dependent variable levels)) / 2) + 1
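
The guideline can be worked through as a one-line function. This is a sketch of the formula on this card only; the function name is hypothetical, and rounding is left to the analyst.

```python
def hidden_layer_size(n_independent, n_dependent_levels):
    """Guideline size: average of the input count and outcome-level count, plus one."""
    return (n_independent + n_dependent_levels) / 2 + 1
```

For example, 8 independent variables and a 2-level dependent variable give (8 + 2) / 2 + 1 = 6 nodes.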

51

What do we call the process of tracing a neural network pathway from a dependent variable outcome to its independent variable inputs?

Backpropagation

52

In RapidMiner, if the number of hidden layers is not specified by the analyst, how many hidden layers will be used to train a neural network?

1

53

In RapidMiner, which of the following will automatically be generated when the Apply Model operator applies a neural network model to a scoring data set?

Both class predictions and confidence percentages

54

In the Neural Net operator in RapidMiner, which of the following parameters will cause the model to stop the training process if its value is reached?

Options: Training cycles; Momentum; Learning rate; All of these would stop training if their value is reached; None of these would stop training if their value is reached

All of these would stop training if their value is reached.

55

In RapidMiner, if one or more independent variables has a non-numeric data type, what would be required for the Neural Net operator to work correctly?

The non-numeric independent variables could be recoded to numeric values or excluded from the model.

56

In R, the type parameter required to see category predictions from a neural network using the predict function is __________.

class

57

The process of converting words from text into data points in text mining is called _________.

Tokenization

58

Words required for sentences to make grammatical sense in written language, but that are not helpful in text mining results are called __________.

Stopwords

59

Phrases consisting of two or more words that are combined together in text mining results in order to retain context are called __________.

n-grams

60

Combining similar words such as "nation," "nations," "national," and "nationality" into a single data point in text mining is called _________.

Stemming

61

To ensure that text mined terms such as "Complaint" and "complaint" are identified as the same data point, an analyst should __________.

Transform cases
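
The text mining steps on the preceding cards (tokenization, case transformation, stopword removal, n-grams) can be sketched with the standard library alone. The stopword list and function names below are hypothetical, and stemming is left out because it normally relies on a dedicated library.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Split text into word tokens (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text)


def preprocess(text, stopwords=STOPWORDS):
    """Tokenize, transform cases to lowercase, then drop stopwords."""
    tokens = [t.lower() for t in tokenize(text)]
    return [t for t in tokens if t not in stopwords]


def n_grams(tokens, n=2):
    """Join each run of n consecutive tokens into one phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

After preprocessing, "Complaint" and "complaint" become the same token.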

62

True or false: In text mining, if the data analyst wants to treat similar data elements as one (e.g., car, truck, van = vehicle), it is possible to replace these items with a single representative item.

True

63

True or false: Because text mining deals with words in paragraphs, it is not possible to create charts or graphs to visualize results.

False

64

True or false: Data analysts can create their own lists of words to be removed during text mining activities.

True

65

The text from one or more documents that is analyzed during text mining is referred to as the _________.

Corpus

66

Which category of data mining and analytics is most descriptive of text mining activities?

Unstructured data analysis

67

The process of checking for the likelihood of false positives in predictive models is called _________.

Cross-validation

68

True or false: A false positive is when a model predicts an expected outcome incorrectly.

True

69

In order to test a predictive model's accuracy, apply the model to the _________ data, then compare predicted values to the dependent variable.

training

70

True or false: Cross-validation is not necessary if a model produces usable predictions without it.

False

71

True or false: It is not possible to validate a classification model such as k-Means.

False

72

True or false: In addition to accuracy rates, cross-validation can also provide a data analyst with error rates.

True
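
Accuracy and error rates are complements of each other, which a short sketch makes concrete. It assumes two equal-length lists of actual and predicted class values; the function name is hypothetical.

```python
def accuracy_and_error(actual, predicted):
    """Return (accuracy rate, error rate) for a set of class predictions."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    accuracy = correct / len(actual)
    return accuracy, 1 - accuracy
```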

73

The "k" in k-folds indicates ___________.

The number of groups the training data is segmented into
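
A minimal sketch of k-folds segmentation, assuming the training data is a plain list and that folds are assigned round-robin rather than at random. The function name is hypothetical and this is not how any particular tool implements it.

```python
def k_fold_splits(observations, k):
    """Yield (train, test) pairs, using each of the k groups once as the test set."""
    # Segment the data into k groups by taking every k-th observation.
    folds = [observations[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test
```

Each observation lands in exactly one test group, so every row is used for validation once.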

74

True or false: When using k-folds cross-validation, k should be set to 10.

False

75

Cross-validation can determine the predictive accuracy of all of the following except __________.

k-Means clustering

76

Cross-validation can determine predicted categorical outcomes for all of the following except __________.

Linear regression

77

True or false: In RapidMiner, the performance operator you choose is dictated by the type of modeling technique you are validating.

True

78

True or false: When cross-validating a model in RapidMiner, the appropriate Performance operator to be used in the subprocess will depend on the type of dependent variable in the training data.

True

79

A Performance (Classification) operator in RapidMiner will automatically generate which of the following validation outputs?

A model's predictive accuracy

80

Requiring user authentication before allowing access to digital data is an example of which of Lawrence Lessig's mechanisms for governing ethical behavior?

code

81

The set of moral codes above and beyond the legally required minimums that an individual uses to make right and respectful decisions is called _________.

ethics

82

Publicly disclosing the kinds and extents of data collected by a mobile app and allowing people to have input on the use of such is an example of which of Lawrence Lessig's mechanisms for governing ethical behavior?

social norms

83

People refusing to use a given social media platform because of the sale of user data for data mining is an example of which of Lawrence Lessig's mechanisms for governing ethical behavior?

markets

84

Governmental requirements to report unauthorized access to hospital patient information is an example of which of Lawrence Lessig's mechanisms for governing ethical behavior?

laws

85

Which ethical framework is defined in the following quote?

"Unless a person can take a given action repeatedly without causing harm, that person should not take that action even once."

Options: Descartes' rule of change; Kant's categorical imperative; Thoreau's value expectation; Voltaire's moral standard

Descartes' rule of change

86

Which ethical framework is defined in the following quote?

"Unless all members of a society can take a given action without causing harm, then no members of that society should take that action."

Options: Thoreau's value expectation; Descartes' rule of change; Kant's categorical imperative; Voltaire's moral standard

Kant's categorical imperative

87

Which of the following is a professional organization that provides a code of ethics that can be used by data miners and analysts?

Association for Computing Machinery

88

Data mining ethics includes respect for __________.

Options: Privacy; Accuracy; Confidentiality; All of the above

All of the above

89

True or false: Because privacy and confidentiality are so important, it is in everyone's best interest to collect and analyze data quietly in the background of an organization.

False

90

Which of the following is NOT an organization that maintains a professional code of ethics that is relevant to data analysts?

ANA

91

Linear Regression

a predictive data mining method that uses the algebraic formula for calculating the slope of a line to predict where a given observation will likely fall along that line

92

statistically significant

the measure of whether or not the model has yielded any results that are mathematically reliable enough to be used

93

confidence interval

the probability that an estimated value in an analytic model, created using a data sample, is also true for the population that the sample represents

94

alpha

the probability of rejecting a null hypothesis that is actually true. Alpha is usually 5%, leaving a confidence level (CL) of 95%.

95

logistic regression

a predictive data mining method that uses the logistic (sigmoid) function to predict one of a set of possible outcomes, along with a probability that the prediction will be the actual outcome

96

neural network

a predictive data mining methodology that tries to mimic human brain processes by comparing the values of all attributes in a data set to one another through the use of a hidden layer of nodes

97

fuzzy logic

data mining concept associated with neural networks where predictions are made using a training data set

98

stop words

In database searching, "stop words" are small and frequently occurring words like and, or, in, of that are often ignored when keyed as search terms. Sometimes putting them in quotes " " will allow you to search them.

99

stemming

finding terms that share a common root and mean the same thing and combining them into one attribute

100

n-grams

a phrase or combination of words that may take on a meaning that is different from or greater than the meaning of each word individually