Tidymodels framework ordering
Recipe, model, workflow
K-fold cross-validation is used for
parameter tuning
The response variable in a logistic regression is
binary (yes/no)
The response variable in a linear regression is
quantitative (e.g., test score, miles per gallon)
For a logistic regression model, you look at (blank) on the left-hand side
The log odds of a particular class
In a logistic regression model you are trying to predict
probability
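A minimal sketch of how the log odds on the left-hand side map back to a probability. It is written in plain Python rather than R, and the intercept, beta, and x values are hypothetical numbers chosen only for illustration.

```python
import math

def log_odds_to_probability(log_odds):
    """Invert the logit: p = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def probability_to_log_odds(p):
    """The logit: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, not taken from any fitted model.
intercept = -1.5
beta = 0.8
x = 2.0

log_odds = intercept + beta * x          # what the linear part of the model produces
p = log_odds_to_probability(log_odds)    # what you actually report as a prediction
print(round(log_odds, 3), round(p, 3))
```

A log odds of 0 corresponds to a probability of 0.5, which is why 50% is the natural default threshold.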
The process by which a second sample group is given a test to ensure it is applicable to more than one group
Cross validation
What are common values of k in k-fold cross-validation?
3, 5, 10
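A rough sketch of what k-fold splitting does with row indices, assuming no shuffling (real tooling usually shuffles first). The function names here are hypothetical and plain Python, not part of tidymodels.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds of near-equal size."""
    indices = list(range(n_rows))
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    return folds

def train_validation_splits(n_rows, k):
    """Each fold serves once as the validation set; the rest is training data."""
    folds = k_fold_indices(n_rows, k)
    splits = []
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits

splits = train_validation_splits(n_rows=10, k=5)
for train, validation in splits:
    print("train:", train, "validation:", validation)
```

Each candidate parameter value is scored on all k validation folds and the scores are averaged, which is what makes the splits useful for tuning.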
What does cross-validation help with?
tuning
Response variable is categorical (qualitative), predicting a binary variable; beta is a rate of change in the log odds; fit with "glm"
logistic regression model
In classification, you want AIC to be
low
In regression, you want R-squared to be
high
Response variable is numerical (quantitative); fit with "lm"
linear regression model
confusion matrix
Predictions vs. Actual
Developing probabilities/predictions
50% is the default threshold; moving it trades sensitivity against specificity (important in healthcare, credit card fraud, insurance)
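A small sketch of how the threshold turns probabilities into predictions, and how sensitivity and specificity follow from the confusion matrix counts. The labels and probabilities are made-up illustration data, and the function names are hypothetical.

```python
def confusion_counts(actual, probabilities, threshold=0.5):
    """Count true/false positives and negatives at a given threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(actual, probabilities):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

def sensitivity(c):
    """Proportion of actual positives correctly identified."""
    return c["tp"] / (c["tp"] + c["fn"])

def specificity(c):
    """Proportion of actual negatives correctly identified."""
    return c["tn"] / (c["tn"] + c["fp"])

# Made-up data for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
probabilities = [0.9, 0.6, 0.45, 0.4, 0.55, 0.1, 0.2, 0.8]

at_default = confusion_counts(actual, probabilities)
at_lower = confusion_counts(actual, probabilities, threshold=0.35)
print(sensitivity(at_default), specificity(at_default))
print(sensitivity(at_lower), specificity(at_lower))
```

On this toy data, lowering the threshold catches every positive but misclassifies more negatives, which is the trade-off the card describes.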
splitting data helps with
overfitting
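Naive accuracy
The accuracy from always predicting the most common class; shown as "No Information Rate" in the Confusion Matrix and Statistics output, next to the accuracy of the model (usually fit with all variables)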
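A brief sketch of the comparison behind this card: the naive rate is the share of the most common class, and a useful model should beat it. The labels and predictions are made-up illustration data.

```python
from collections import Counter

def no_information_rate(actual):
    """Accuracy obtained by always predicting the most common class."""
    counts = Counter(actual)
    return max(counts.values()) / len(actual)

def accuracy(actual, predicted):
    """Share of predictions that match the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Made-up data for illustration only.
actual = ["no"] * 7 + ["yes"] * 3
predicted = ["no", "no", "no", "no", "no", "no", "yes", "yes", "yes", "no"]

naive = no_information_rate(actual)
model = accuracy(actual, predicted)
print(naive, model)
```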
The counterpart of R-squared for a classification model is
AIC
In a ROC curve plotted with ROCR, you are looking for
the curve closest to the top left corner
Classification trees use the
rpart package
The rpart parameter that controls tree complexity is
cp
Lower cp means
the tree is bigger (more splits)
Higher cp means
the tree is smaller (fewer splits)
A tree with no splits
terminal (leaf) node
Random forest tuning parameters (tidymodels)
min_n and mtry
In clustering, you do not know
the dependent variable
In clustering, who specifies the number of clusters?
the user
In clustering, what function do you use?
mbclust
In clustering, what algorithm do you use?
kmeans
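A minimal sketch of the k-means idea (Lloyd's algorithm) in plain Python, assuming 2-D points and starting centers supplied by the user. It is not the R kmeans() implementation, and the points are made-up illustration data.

```python
def k_means(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging centers."""
    centers = list(centers)
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the closest center for each point.
        assignments = [
            min(
                range(len(centers)),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centers, assignments

# Made-up data: two visibly separate groups, no labels supplied.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, assignments = k_means(points, centers=[points[0], points[3]])
print(centers, assignments)
```

The number of clusters is fixed by how many starting centers the user passes in, and no dependent variable appears anywhere in the input.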
What does the "cp" parameter control in rpart()?
The complexity of the classification tree
Which function is used to develop predictions from a classification tree in caret?
predict()
What is the term used to describe a tree without any splits?
leaf node
What does "min.node.size" control in the ranger function?
The minimum number of observations in a terminal node
Which measure indicates the proportion of true negative instances that are correctly identified by the model?
specificity
What is the key difference between clustering and previous subjects?
dependent variable is unknown
Which package is used to determine the optimal threshold for classification?
ROCR
Which package is used to evaluate probabilities from logistic regression?
ROCR (the probabilities themselves come from predict() with type = "response")