Mode
most frequent value in the dataset
Standard deviation
spread of a dataset relative to the mean
Quartiles
divide the ordered data into four equal parts; the cut points are the 1st quartile, the median (2nd quartile), and the 3rd quartile
Boxplots
min and max at the ends of the whiskers; 1st and 3rd quartiles form the ends of the box; the median is the line inside the box
Histogram
bar graph showing frequency on the y-axis and value on the x-axis
Probability of an event
P(A) = (outcomes in which A occurs) / (total possible outcomes), for equally likely outcomes; a value between 0 and 1
Probability of independent events
P(A & B) = P(A) x P(B)
Conditional probability
P(B|A) = P(B & A)/P(A)
Posterior Probability
P(A|B) = P(B|A) x P(A) / P(B)
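The conditional and posterior formulas above can be combined in a short sketch; the spam-filter numbers here are made up purely for illustration:

```python
# Hypothetical spam-filter probabilities (invented for illustration)
p_spam = 0.2             # P(spam): prior probability a message is spam
p_free_given_spam = 0.5  # P("free" | spam): likelihood
p_free = 0.15            # P("free"): overall probability of the word

# Posterior via Bayes' theorem: P(A|B) = P(B|A) x P(A) / P(B)
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 3))  # 0.667
```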
Likelihood Table
table of conditional probabilities of each feature value for each class; used in naive Bayes
Laplace estimator
add 1 to each count in the likelihood table to eliminate zero probabilities
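A minimal sketch of why the Laplace estimator matters, using invented word counts; without it, a single zero count zeroes out the whole naive Bayes product:

```python
# Hypothetical word counts within the spam class; "prize" never appeared
counts = {"free": 30, "win": 10, "prize": 0}
total = sum(counts.values())

# Without smoothing, P(prize | spam) = 0
raw = {w: c / total for w, c in counts.items()}

# Laplace estimator: add 1 to every count so no probability is exactly zero
k = len(counts)
smoothed = {w: (c + 1) / (total + k) for w, c in counts.items()}

print(raw["prize"], smoothed["prize"])  # 0.0 vs. a small positive value
```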
Entropy
measure of randomness; for two classes, entropy = 1 when the split is purely random (50/50) and 0 when a segment is pure
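The entropy definition above can be written as a one-line function over class proportions:

```python
import math

def entropy(proportions):
    """Shannon entropy in bits of a set of class proportions."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 — purely random two-class split
print(entropy([1.0]))       # 0.0 — perfectly pure segment
```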
SSE
Sum of squared errors
Covariance matrix
matrix of covariances between each pair of features; the diagonal holds each feature's variance
Correlation matrix
1 = strong positive relationship; 0 = no relationship; -1 = strong negative relationship
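The relationship between the two matrices can be sketched for a single pair of features (toy data; correlation is just covariance rescaled to [-1, 1]):

```python
# Toy data: y is perfectly linear in x, so correlation should be 1
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sample covariance: average co-deviation from the means
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Correlation divides out each feature's standard deviation
sd_x = (sum((a - mean_x) ** 2 for a in x) / (n - 1)) ** 0.5
sd_y = (sum((b - mean_y) ** 2 for b in y) / (n - 1)) ** 0.5
corr = cov / (sd_x * sd_y)

print(cov, corr)  # cov = 5.0, corr ≈ 1.0
```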
Confusion Matrix
table of true positives, false positives, true negatives, and false negatives
Matthews Correlation Coefficient
correlation between predicted and actual classes; ranges from -1 (total disagreement) to 1 (perfect prediction)
Kappa statistic
Model performance compared to random guessing
Sensitivity
proportion of actual positives that are correctly classified: TP / (TP + FN)
Specificity
proportion of actual negatives that are correctly classified: TN / (TN + FP)
Precision
proportion of predicted positives that are truly positive: TP / (TP + FP)
Recall
same as sensitivity: TP / (TP + FN); in retrieval terms, the proportion of relevant results that are returned
F-measure
harmonic mean of precision and recall: 2 x precision x recall / (precision + recall); balances the two
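The four metrics above can be computed together from confusion-matrix counts; the counts here are invented for illustration:

```python
# Hypothetical confusion-matrix counts for a binary classifier
tp, fp, tn, fn = 40, 10, 45, 5

sensitivity = tp / (tp + fn)  # actual positives correctly found (= recall)
specificity = tn / (tn + fp)  # actual negatives correctly found
precision = tp / (tp + fp)    # predicted positives that are truly positive
f_measure = 2 * precision * sensitivity / (precision + sensitivity)

print(round(sensitivity, 3), round(specificity, 3),
      round(precision, 3), round(f_measure, 3))  # 0.889 0.818 0.8 0.842
```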
ROC curve
plot of the true positive rate against the false positive rate as the classification threshold varies
AUC
area under the ROC curve; 1.0 = perfect classifier, 0.5 = no better than random guessing
subset()
return the rows of a data frame that match a condition
lapply()
Apply a function to a list
sample()
draw a random sample; commonly used to generate random row indices, e.g. sample(1000, 900)
tm_map()
apply cleanup transformations to a text corpus (tm package), e.g. removing stop words and stemming words
DocumentTermMatrix()
build a sparse matrix from a corpus with one row per document and one column per term; cells hold term counts (tm package)
prop.table()
Return percentages of each category
model()
generic pattern for model-training functions that use R's formula interface: model(class ~ predictors, data = train)
pairs()
Plot distribution between pairs of features in a dataset
pairs.panels()
from the psych package; like pairs() but adds correlation coefficients and histograms to the scatterplot matrix
Normalization
rescale features so that those with larger values do not dominate (bias) the classification
Min-Max Normalization
rescale each value to [0, 1]: (x - min) / (max - min)
Z-Score Standardization
shift the mean to 0 and scale the standard deviation to 1: (x - mean) / sd
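Both rescaling formulas above can be sketched as small functions (toy data; a sample standard deviation is assumed for the z-score):

```python
def min_max(values):
    """Min-max normalization: rescale values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Z-score standardization: mean 0, sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

ages = [20, 30, 40, 50, 60]
print(min_max(ages))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```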
Dummy Coding
Take nominal values and set up binary choices assigned 0 and 1.
Binning (discretization)
Convert numerical data into a limited number of levels
Thresholding
numerical values above a cutoff are coded 1, the rest 0
Imputation
Use statistics from one feature to predict missing values for another feature
Enhancing Performance
Meta-learning
Set up model to learn how to learn
Bagging
train multiple copies of an unstable learner (e.g. a decision tree) on bootstrap samples of the training data; the bagged models then vote on the class
Boosting
Generate multiple weak learners; Each learner is trained on a complementary portion of the data to capture examples that are difficult to classify (adaptive boosting); Use a weighted vote between models based on past performance
Random Forest
Train an ensemble of decision trees using different combinations of features within the dataset; Allow models to vote on the predicted class.
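The bagging-and-voting recipe above can be sketched in miniature; the "models" here are trivial stand-ins (each just predicts the majority class of its own bootstrap sample), not real decision trees:

```python
import random
from collections import Counter

random.seed(0)

def bootstrap(data):
    """Sample with replacement, same size as the original dataset."""
    return [random.choice(data) for _ in data]

def majority_vote(predictions):
    """Each model casts one vote; the most common class wins."""
    return Counter(predictions).most_common(1)[0][0]

# Stand-in for training a real unstable learner on each bootstrap sample
train = ["spam", "ham", "spam", "spam", "ham"]
models = [majority_vote(bootstrap(train)) for _ in range(11)]

# The ensemble prediction is the majority vote across the bagged models
print(majority_vote(models))
```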