Mode
most frequent value in the dataset
Standard deviation
spread of a dataset relative to the mean
Quartiles
divide the ordered data into four equal parts; the cut points are the 1st quartile, the median (2nd quartile), and the 3rd quartile
Boxplots
min and max at the ends of the whiskers; 1st and 3rd quartiles form the ends of the box; the median is the line inside the box
Histogram
bar graph showing frequency on the y-axis and value on the x-axis
Probability of an event
P(A) = (outcomes in which A occurs) / (total possible outcomes), for equally likely outcomes; a value between 0 and 1
Probability of independent events
P(A & B) = P(A) x P(B)
Conditional probability
P(B|A) = P(B & A)/P(A)
Posterior Probability
P(A|B) = P(B|A) x P(A) / P(B)
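The conditional and posterior formulas above can be combined in a short sketch; the spam-filter numbers here are made up purely for illustration:

```python
# Hypothetical spam-filter probabilities (invented for illustration)
p_spam = 0.2             # P(spam): prior probability a message is spam
p_free_given_spam = 0.5  # P("free" | spam): likelihood
p_free = 0.15            # P("free"): overall probability of the word

# Posterior via Bayes' theorem: P(A|B) = P(B|A) x P(A) / P(B)
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 3))  # 0.667
```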
Likelihood Table
table of conditional probabilities of each feature value for each class; used in naive Bayes
Laplace estimator
add 1 to each count in the likelihood table to eliminate zero probabilities
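A minimal sketch of why the Laplace estimator matters, using invented word counts; without it, a single zero count zeroes out the whole naive Bayes product:

```python
# Hypothetical word counts within the spam class; "prize" never appeared
counts = {"free": 30, "win": 10, "prize": 0}
total = sum(counts.values())

# Without smoothing, P(prize | spam) = 0
raw = {w: c / total for w, c in counts.items()}

# Laplace estimator: add 1 to every count so no probability is exactly zero
k = len(counts)
smoothed = {w: (c + 1) / (total + k) for w, c in counts.items()}

print(raw["prize"], smoothed["prize"])  # 0.0 vs. a small positive value
```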
Entropy
measure of randomness; for two classes, entropy = 1 when the split is purely random (50/50) and 0 when a segment is pure
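The entropy definition above can be written as a one-line function over class proportions:

```python
import math

def entropy(proportions):
    """Shannon entropy in bits of a set of class proportions."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 — purely random two-class split
print(entropy([1.0]))       # 0.0 — perfectly pure segment
```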
SSE
Sum of squared errors
Covariance matrix
matrix of covariances between each pair of features; the diagonal holds each feature's variance
Correlation matrix
1 = strong positive relationship; 0 = no relationship; -1 = strong negative relationship
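The relationship between the two matrices can be sketched for a single pair of features (toy data; correlation is just covariance rescaled to [-1, 1]):

```python
# Toy data: y is perfectly linear in x, so correlation should be 1
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sample covariance: average co-deviation from the means
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Correlation divides out each feature's standard deviation
sd_x = (sum((a - mean_x) ** 2 for a in x) / (n - 1)) ** 0.5
sd_y = (sum((b - mean_y) ** 2 for b in y) / (n - 1)) ** 0.5
corr = cov / (sd_x * sd_y)

print(cov, corr)  # cov = 5.0, corr ≈ 1.0
```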
Confusion Matrix
table of true positives, false positives, true negatives, and false negatives
Matthews Correlation Coefficient
correlation between predicted and actual classes; ranges from -1 (total disagreement) to 1 (perfect prediction)
Kappa statistic
Model performance compared to random guessing
Sensitivity
proportion of actual positives that are correctly classified: TP / (TP + FN)
Specificity
proportion of actual negatives that are correctly classified: TN / (TN + FP)
Precision
proportion of predicted positives that are truly positive: TP / (TP + FP)
Recall
same as sensitivity: TP / (TP + FN); in retrieval terms, the proportion of relevant results that are returned
F-measure
harmonic mean of precision and recall: 2 x precision x recall / (precision + recall); balances the two
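The four metrics above can be computed together from confusion-matrix counts; the counts here are invented for illustration:

```python
# Hypothetical confusion-matrix counts for a binary classifier
tp, fp, tn, fn = 40, 10, 45, 5

sensitivity = tp / (tp + fn)  # actual positives correctly found (= recall)
specificity = tn / (tn + fp)  # actual negatives correctly found
precision = tp / (tp + fp)    # predicted positives that are truly positive
f_measure = 2 * precision * sensitivity / (precision + sensitivity)

print(round(sensitivity, 3), round(specificity, 3),
      round(precision, 3), round(f_measure, 3))  # 0.889 0.818 0.8 0.842
```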
ROC curve
plot of the true positive rate against the false positive rate as the classification threshold varies
AUC
area under the ROC curve; 1.0 = perfect classifier, 0.5 = no better than random guessing
subset()
return the rows of a data frame that match a condition
lapply()
Apply a function to a list
sample()
draw a random sample; commonly used to generate random row indices, e.g. sample(1000, 900)
tm_map()
apply cleanup transformations to a text corpus (tm package), e.g. removing stop words and stemming words
DocumentTermMatrix()
build a sparse matrix from a corpus with one row per document and one column per term; cells hold term counts (tm package)
prop.table()
Return percentages of each category
model()
generic pattern for model-training functions that use R's formula interface: model(class ~ predictors, data = train)
pairs()
Plot distribution between pairs of features in a dataset
pairs.panels()
from the psych package; like pairs() but adds correlation coefficients and histograms to the scatterplot matrix
Normalization
rescale features so that those with larger values do not dominate (bias) the classification
Min-Max Normalization
rescale each value to [0, 1]: (x - min) / (max - min)
Z-Score Standardization
shift the mean to 0 and scale the standard deviation to 1: (x - mean) / sd
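Both rescaling formulas above can be sketched as small functions (toy data; a sample standard deviation is assumed for the z-score):

```python
def min_max(values):
    """Min-max normalization: rescale values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Z-score standardization: mean 0, sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

ages = [20, 30, 40, 50, 60]
print(min_max(ages))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```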
Dummy Coding
Take nominal values and set up binary choices assigned 0 and 1.
Binning (discretization)
Convert numerical data into a limited number of levels
Thresholding
numerical values above a cutoff are coded 1, the rest 0
Imputation
Use statistics from one feature to predict missing values for another feature
Enhancing Performance
Meta-learning
Set up model to learn how to learn
Bagging
train multiple copies of an unstable learner (e.g. a decision tree) on bootstrap samples of the training data; the bagged models then vote on the class
Boosting
Generate multiple weak learners; Each learner is trained on a complementary portion of the data to capture examples that are difficult to classify (adaptive boosting); Use a weighted vote between models based on past performance
Random Forest
Train an ensemble of decision trees using different combinations of features within the dataset; Allow models to vote on the predicted class.
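The bagging-and-voting recipe above can be sketched in miniature; the "models" here are trivial stand-ins (each just predicts the majority class of its own bootstrap sample), not real decision trees:

```python
import random
from collections import Counter

random.seed(0)

def bootstrap(data):
    """Sample with replacement, same size as the original dataset."""
    return [random.choice(data) for _ in data]

def majority_vote(predictions):
    """Each model casts one vote; the most common class wins."""
    return Counter(predictions).most_common(1)[0][0]

# Stand-in for training a real unstable learner on each bootstrap sample
train = ["spam", "ham", "spam", "spam", "ham"]
models = [majority_vote(bootstrap(train)) for _ in range(11)]

# The ensemble prediction is the majority vote across the bagged models
print(majority_vote(models))
```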