Flashcards based on lecture notes about data imputation, regression analysis, neural networks, clustering, text mining, and prescriptive analytics.
Data Imputation
Assigning a value to missing data, often using methods like mean, mode, or predictive models.
Random Imputation
Choosing a value at random from existing records to fill in the missing data.
Hot Deck Imputation
Selecting a similar record to impute missing data.
Predictive Imputation
Using a predictive model to estimate missing values.
Cold Deck Imputation
Using a separate dataset to find a similar value for imputation.
Value Substitution
Replacing missing values with a predetermined value like zero or a 'Missing' label.
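A minimal pandas sketch, using made-up columns, that illustrates two of the methods above: mean imputation for a numeric column and value substitution with a 'Missing' label for a categorical one.

```python
import pandas as pd
import numpy as np

# Hypothetical dataset with missing values
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "city": ["Lima", None, "Quito", "Lima"],
})

# Mean imputation for a numeric column
df["age"] = df["age"].fillna(df["age"].mean())

# Value substitution: replace missing categories with a 'Missing' label
df["city"] = df["city"].fillna("Missing")

print(df)
```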
Discretization
Converting continuous values into discrete values, often 1 and 0.
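A short sketch, assuming a threshold of 0.5 chosen purely for illustration, showing continuous scores converted into 1s and 0s.

```python
import pandas as pd

# Hypothetical continuous scores discretized into binary values:
# 1 if the score is at or above the chosen threshold, 0 otherwise
scores = pd.Series([0.2, 0.7, 0.55, 0.1])
binary = (scores >= 0.5).astype(int)
print(binary.tolist())  # [0, 1, 1, 0]
```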
Data Reduction
Discarding variables that are not useful for the analysis.
Supervised Methods
Methods used when there is a target variable to predict; the dataset is typically divided into training and testing sets.
Unsupervised Methods
Methods that seek patterns in data without a target variable.
Overfitting
A model that performs well on training data but poorly on new data.
Imbalanced Dataset
A dataset where one class has significantly more data than another.
Balancing a Dataset
Adjusting the dataset to have a more even distribution of classes, or generating artificial data to reach class parity.
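A hedged pandas sketch of one balancing approach, random oversampling of the minority class with replacement; the toy labels are made up for illustration.

```python
import pandas as pd

# Hypothetical imbalanced dataset: few positive cases
df = pd.DataFrame({"x": range(10), "label": [0] * 8 + [1] * 2})

majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Randomly oversample the minority class (sampling with replacement)
# until both classes have the same number of records
minority_upsampled = minority.sample(n=len(majority), replace=True, random_state=42)
balanced = pd.concat([majority, minority_upsampled]).reset_index(drop=True)

print(balanced["label"].value_counts())
```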
Regression Analysis
A statistical method used to discover relationships between variables to make predictions.
Simple Linear Regression
Defines the relationship between a dependent variable and a single independent variable; the goal is to find the best-fitting regression line.
Model Review
Examining residual values, model coefficients, standard error, R-squared values, adjusted R-squared values, and F-statistics.
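A minimal sketch covering the two cards above, assuming statsmodels is available and using made-up data: it fits a simple linear regression and prints the review statistics (coefficients, standard errors, R-squared, adjusted R-squared, F-statistic) along with the residuals.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: one independent variable x and a dependent variable y
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1, 11.8])

X = sm.add_constant(x)        # adds the intercept term
model = sm.OLS(y, X).fit()    # ordinary least squares fit

# Summary includes coefficients, standard errors, R-squared,
# adjusted R-squared, and the F-statistic
print(model.summary())
print(model.resid)            # residual values for inspection
```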
Logistic Regression
Used for classification with a binary categorical target variable and numerical independent variables.
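A hedged scikit-learn sketch with invented numbers (hours studied vs. pass/fail) showing a binary categorical target predicted from a numerical independent variable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied (numerical) vs. pass/fail (binary target)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# Predicted class and class probabilities for a new observation
print(clf.predict([[4.5]]), clf.predict_proba([[4.5]]))
```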
Ensemble Learners
Combines multiple models to improve prediction accuracy and robustness.
Advantages of Ensemble Learners
Reduce overfitting and perform well with many or few data points by unifying the predictions of different models.
Bagging
Bootstrap aggregating; changes the training set for each model using sampling with replacement.
Random Forests
An ensemble of decision trees that uses random feature selection.
Boosting
Trains base models sequentially, assigning weights to training records and focusing on records difficult to classify.
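A hedged scikit-learn sketch comparing the three ensemble approaches from the cards above (bagging, random forests, boosting) on a synthetic dataset generated only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification data for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "bagging": BaggingClassifier(random_state=0),              # bootstrap samples of the training set
    "random forest": RandomForestClassifier(random_state=0),   # bagging plus random feature selection
    "boosting": AdaBoostClassifier(random_state=0),            # sequential models that reweight hard records
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())
```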
Neural Networks
Simulates the behavior of neurons for classification and regression tasks.
Uses of Neural Networks
Pattern recognition, computer vision, and natural language processing.
Perceptron
A simple form of neural network with no cycles and two layers.
Activation Function
Decides whether a neuron activates, helping the network learn patterns in the data.
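A toy sketch of the two cards above: a perceptron that combines weighted inputs and passes them through a step activation function. The weights and bias are chosen by hand (to implement a logical AND) purely for illustration.

```python
import numpy as np

def step_activation(z):
    # Decides whether the neuron "fires": output 1 if the weighted sum exceeds 0
    return 1 if z > 0 else 0

def perceptron(inputs, weights, bias):
    # Weighted sum of inputs followed by the activation function
    z = np.dot(inputs, weights) + bias
    return step_activation(z)

# Illustrative weights implementing a logical AND of two binary inputs
weights = np.array([1.0, 1.0])
bias = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), weights, bias))
```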
Backpropagation
Calibrates weights based on the error rate of the previous iteration, with initial weights chosen randomly.
Gradient Descent
Used in training to find the optimal combination of weights, adjusting them until the error is acceptable.
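A minimal gradient descent sketch, assuming a single weight and a made-up dataset where the true relationship is y = 2x: the weight is adjusted against the gradient of the squared error until it settles near the optimal value.

```python
# Fit y = w * x by repeatedly adjusting w in the direction that reduces the squared error
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]    # true relationship is y = 2x

w = 0.0                      # initial weight (here zero; in practice often random)
learning_rate = 0.01

for epoch in range(200):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad  # step against the gradient

print(w)  # close to 2.0 once the error is acceptably small
```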
Types of Neural Networks
Feedforward, convolutional, transfer learning, recurrent, generative, and autoencoders.
What Clustering Means
Using clustering to solve problems through descriptive statistics, describing groupings in the data.
Clustering
Finds natural groupings in a dataset.
Types of Clusters
Exclusive, overlapping, hierarchical, and probabilistic.
Types of Clustering Algorithms
Prototype-based, density-based, and hierarchical.
K-means
Divides the space into K partitions using measures of proximity.
How K-means Works
Determining K, assigning each point to its nearest centroid, recalculating the centroids, and repeating until the assignments no longer change.
Evaluation of K-means
Sum of squared errors, Davies-Bouldin index, and cluster-to-cluster distance.
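A scikit-learn sketch of the K-means cards above, using a handful of invented 2-D points: it assigns points to K = 2 centroids and reports the sum of squared errors used for evaluation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming two natural groupings
X = np.array([[1, 1], [1.5, 2], [1, 0],
              [8, 8], [8.5, 9], [9, 8]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)           # cluster assigned to each point
print(kmeans.cluster_centers_)  # final centroids
print(kmeans.inertia_)          # sum of squared errors (SSE) used for evaluation
```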
DBSCAN
Identifies clusters by measuring the distribution of density in a space.
DBSCAN Point Types
Core points, border points, and noise points.
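A hedged scikit-learn sketch of DBSCAN on invented points, with eps and min_samples chosen only for illustration; the isolated point is labeled -1, i.e. noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one isolated point that should become noise
X = np.array([[1, 1], [1.2, 1.1], [0.9, 1.0],
              [8, 8], [8.1, 8.2], [7.9, 8.0],
              [50, 50]])

db = DBSCAN(eps=1.0, min_samples=2).fit(X)

# Labels: cluster index per point; -1 marks noise points
print(db.labels_)
```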
Text Mining
Finding patterns and extracting knowledge from unstructured text data.
Uses of Text Mining
Classification, clustering, extracting meaning, and extracting information.
Difficulties of Text Mining
Ambiguity, homonyms, synonyms, misspellings, high dimensionality, and variety of languages.
Difference Between Text Mining and NLP
NLP focuses on the interaction between human language and computers, while text mining focuses on finding patterns and trends in text.
Process of Text Mining
Collecting, preprocessing, and analyzing data.
Preprocessing Steps in Text Mining
Lowercasing, removing stop words, punctuation, symbols, numbers, and short words; stemming; and considering synonyms and phrases.
Document Representation
Bag of words, TF-IDF, word embeddings, and context-specific vectors.
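A scikit-learn sketch of two of the representations above, bag of words and TF-IDF, applied to two made-up sentences, with lowercasing and English stop-word removal from the preprocessing card applied along the way.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "The delivery was fast and the product is great",
    "Terrible product, the delivery was slow",
]

# Bag of words: raw term counts (lowercasing and stop-word removal applied)
bow = CountVectorizer(lowercase=True, stop_words="english")
print(bow.fit_transform(docs).toarray())
print(bow.get_feature_names_out())

# TF-IDF: weights terms by frequency in a document and rarity across documents
tfidf = TfidfVectorizer(lowercase=True, stop_words="english")
print(tfidf.fit_transform(docs).toarray().round(2))
```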
Visualization and Analysis of Text Mining
Frequencies, word clouds, and dendrograms.
Lexical-Syntactic Processing
Recognizes tokens and normalizes words.
Semantic Processing
Extracting meaning, identifying sentiments, and analyzing emotions.
Types of Sentiment Analysis
Analysis at different levels (e.g., document or sentence), aspect-based analysis, emotion detection, and intention analysis.
Importance of Sentiment Analysis
Market research, customer service, product analytics, and public relations.
Descriptive Analytics
Analyzes what has already happened.
Predictive Analytics
Oriented towards the future; uses historical data to estimate what is likely to happen.
Prescriptive Analytics
Indicates a decision or suggests the best course of action.
Uses of Prescriptive Analytics
Optimizes the supply chain, demand planning, production costs, delivery times, resource planning, and inventory management.
Components of Prescriptive Analytics
Good quality data, neural networks, hardware, and software.
Types of Prescriptive Analytics Algorithms
Heuristics and exact algorithms.