Flashcards based on lecture notes about data imputation, regression analysis, neural networks, clustering, text mining, and prescriptive analytics.
Data Imputation
Assigning a value to missing data, often using methods like mean, mode, or predictive models.
Random Imputation
Choosing a value at random from existing records to fill in the missing data.
Hot Deck Imputation
Selecting a similar record to impute missing data.
Predictive Imputation
Using a predictive model to estimate missing values.
Cold Deck Imputation
Using a separate dataset to find a similar value for imputation.
Value Substitution
Replacing missing values with a predetermined value like zero or a 'Missing' label.
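A minimal pandas sketch, using made-up columns, that illustrates two of the methods above: mean imputation for a numeric column and value substitution with a 'Missing' label for a categorical one.

```python
import pandas as pd
import numpy as np

# Hypothetical dataset with missing values
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "city": ["Lima", None, "Quito", "Lima"],
})

# Mean imputation for a numeric column
df["age"] = df["age"].fillna(df["age"].mean())

# Value substitution: replace missing categories with a 'Missing' label
df["city"] = df["city"].fillna("Missing")

print(df)
```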
Discretization
Converting continuous values into discrete values, often 1 and 0.
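A short sketch, assuming a threshold of 0.5 chosen purely for illustration, showing continuous scores converted into 1s and 0s.

```python
import pandas as pd

# Hypothetical continuous scores discretized into binary values:
# 1 if the score is at or above the chosen threshold, 0 otherwise
scores = pd.Series([0.2, 0.7, 0.55, 0.1])
binary = (scores >= 0.5).astype(int)
print(binary.tolist())  # [0, 1, 1, 0]
```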
Data Reduction
Discarding variables that are not useful for the analysis.
Supervised Methods
Methods used when there is a target variable to predict; the dataset is typically divided into training and testing sets.
Unsupervised Methods
Methods that seek patterns in data without a target variable.
Overfitting
A model that performs well on training data but poorly on new data.
Imbalanced Dataset
A dataset where one class has significantly more data than another.
Balancing a Dataset
Adjusting the dataset to have a more even distribution of classes, or generating artificial data to reach class parity.
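A hedged pandas sketch of one balancing approach, random oversampling of the minority class with replacement; the toy labels are made up for illustration.

```python
import pandas as pd

# Hypothetical imbalanced dataset: few positive cases
df = pd.DataFrame({"x": range(10), "label": [0] * 8 + [1] * 2})

majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Randomly oversample the minority class (sampling with replacement)
# until both classes have the same number of records
minority_upsampled = minority.sample(n=len(majority), replace=True, random_state=42)
balanced = pd.concat([majority, minority_upsampled]).reset_index(drop=True)

print(balanced["label"].value_counts())
```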
Regression Analysis
A statistical method used to discover relationships between variables to make predictions.
Simple Linear Regression
Defines the relationship between a dependent variable and a single independent variable; the goal is to find the best-fitting regression line.
Model Review
Examining residual values, model coefficients, standard error, R-squared values, adjusted R-squared values, and F-statistics.
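A minimal sketch covering the two cards above, assuming statsmodels is available and using made-up data: it fits a simple linear regression and prints the review statistics (coefficients, standard errors, R-squared, adjusted R-squared, F-statistic) along with the residuals.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: one independent variable x and a dependent variable y
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1, 11.8])

X = sm.add_constant(x)        # adds the intercept term
model = sm.OLS(y, X).fit()    # ordinary least squares fit

# Summary includes coefficients, standard errors, R-squared,
# adjusted R-squared, and the F-statistic
print(model.summary())
print(model.resid)            # residual values for inspection
```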
Logistic Regression
Used for classification with a binary categorical target variable and numerical independent variables.
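A hedged scikit-learn sketch with invented numbers (hours studied vs. pass/fail) showing a binary categorical target predicted from a numerical independent variable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied (numerical) vs. pass/fail (binary target)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# Predicted class and class probabilities for a new observation
print(clf.predict([[4.5]]), clf.predict_proba([[4.5]]))
```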
Ensemble Learners
Combines multiple models to improve prediction accuracy and robustness.
Advantages of Ensemble Learners
Reduce overfitting and perform well with many or few data points by unifying the predictions of different models.
Bagging
Bootstrap aggregating; changes the training set for each model using sampling with replacement.
Random Forests
An ensemble of decision trees that uses random feature selection.
Boosting
Trains base models sequentially, assigning weights to training records and focusing on records difficult to classify.
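A hedged scikit-learn sketch comparing the three ensemble approaches from the cards above (bagging, random forests, boosting) on a synthetic dataset generated only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification data for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "bagging": BaggingClassifier(random_state=0),              # bootstrap samples of the training set
    "random forest": RandomForestClassifier(random_state=0),   # bagging plus random feature selection
    "boosting": AdaBoostClassifier(random_state=0),            # sequential models that reweight hard records
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())
```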
Neural Networks
Simulates the behavior of neurons for classification and regression tasks.
Uses of Neural Networks
Pattern recognition, computer vision, and natural language processing.
Perceptron
A simple form of neural network with no cycles and two layers.
Activation Function
Decides whether a neuron activates, helping the network learn patterns in the data.
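A toy sketch of the two cards above: a perceptron that combines weighted inputs and passes them through a step activation function. The weights and bias are chosen by hand (to implement a logical AND) purely for illustration.

```python
import numpy as np

def step_activation(z):
    # Decides whether the neuron "fires": output 1 if the weighted sum exceeds 0
    return 1 if z > 0 else 0

def perceptron(inputs, weights, bias):
    # Weighted sum of inputs followed by the activation function
    z = np.dot(inputs, weights) + bias
    return step_activation(z)

# Illustrative weights implementing a logical AND of two binary inputs
weights = np.array([1.0, 1.0])
bias = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), weights, bias))
```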
Backpropagation
Calibrates weights based on the error rate of the previous iteration, with initial weights chosen randomly.
Gradient Descent
Used in training to find the optimal combination of weights, adjusting them until the error is acceptable.
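A minimal gradient descent sketch, assuming a single weight and a made-up dataset where the true relationship is y = 2x: the weight is adjusted against the gradient of the squared error until it settles near the optimal value.

```python
# Fit y = w * x by repeatedly adjusting w in the direction that reduces the squared error
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]    # true relationship is y = 2x

w = 0.0                      # initial weight (here zero; in practice often random)
learning_rate = 0.01

for epoch in range(200):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad  # step against the gradient

print(w)  # close to 2.0 once the error is acceptably small
```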
Types of Neural Networks
Feedforward, convolutional, transfer learning, recurrent, generative, and autoencoders.
What Clustering Means
Using clustering to solve problems through descriptive statistics, describing groupings in the data.
Clustering
Finds natural groupings in a dataset.
Types of Clusters
Exclusive, overlapping, hierarchical, and probabilistic.
Types of Clustering Algorithms
Prototype-based, density-based, and hierarchical.
K-means
Divides the space into K partitions using measures of proximity.
How K-means Works
Determining K, assigning each point to its nearest centroid, recalculating the centroids, and repeating until the assignments no longer change.
Evaluation of K-means
Sum of squared errors, Davies-Bouldin index, and cluster-to-cluster distance.
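A scikit-learn sketch of the K-means cards above, using a handful of invented 2-D points: it assigns points to K = 2 centroids and reports the sum of squared errors used for evaluation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming two natural groupings
X = np.array([[1, 1], [1.5, 2], [1, 0],
              [8, 8], [8.5, 9], [9, 8]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)           # cluster assigned to each point
print(kmeans.cluster_centers_)  # final centroids
print(kmeans.inertia_)          # sum of squared errors (SSE) used for evaluation
```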
DBSCAN
Identifies clusters by measuring the distribution of density in a space.
DBSCAN Point Types
Core points, border points, and noise points.
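A hedged scikit-learn sketch of DBSCAN on invented points, with eps and min_samples chosen only for illustration; the isolated point is labeled -1, i.e. noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one isolated point that should become noise
X = np.array([[1, 1], [1.2, 1.1], [0.9, 1.0],
              [8, 8], [8.1, 8.2], [7.9, 8.0],
              [50, 50]])

db = DBSCAN(eps=1.0, min_samples=2).fit(X)

# Labels: cluster index per point; -1 marks noise points
print(db.labels_)
```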
Text Mining
Finding patterns and extracting knowledge from unstructured text data.
Uses of Text Mining
Classification, clustering, extracting meaning, and extracting information.
Difficulties of Text Mining
Ambiguity, homonyms, synonyms, misspellings, high dimensionality, and variety of languages.
Difference Between Text Mining and NLP
NLP focuses on the interaction between human language and computers, while text mining focuses on finding patterns and trends in text.
Process of Text Mining
Collecting, preprocessing, and analyzing data.
Preprocessing Steps in Text Mining
Lowercasing, removing stop words, punctuation, symbols, numbers, and short words; stemming; and considering synonyms and phrases.
Document Representation
Bag of words, TF-IDF, word embeddings, and context-specific vectors.
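A scikit-learn sketch of two of the representations above, bag of words and TF-IDF, applied to two made-up sentences, with lowercasing and English stop-word removal from the preprocessing card applied along the way.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "The delivery was fast and the product is great",
    "Terrible product, the delivery was slow",
]

# Bag of words: raw term counts (lowercasing and stop-word removal applied)
bow = CountVectorizer(lowercase=True, stop_words="english")
print(bow.fit_transform(docs).toarray())
print(bow.get_feature_names_out())

# TF-IDF: weights terms by frequency in a document and rarity across documents
tfidf = TfidfVectorizer(lowercase=True, stop_words="english")
print(tfidf.fit_transform(docs).toarray().round(2))
```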
Visualization and Analysis of Text Mining
Frequencies, word clouds, and dendrograms.
Lexical-Syntactic Processing
Recognizes tokens and normalizes words.
Semantic Processing
Extracting meaning, identifying sentiments, and analyzing emotions.
Types of Sentiment Analysis
Analysis at different levels (e.g., document or sentence), aspect-based analysis, emotion detection, and intention analysis.
Importance of Sentiment Analysis
Market research, customer service, product analytics, and public relations.
Descriptive Analytics
Analyzes what has already happened.
Predictive Analytics
Oriented towards the future; uses historical data to estimate what is likely to happen.
Prescriptive Analytics
Indicates a decision or suggests the best course of action.
Uses of Prescriptive Analytics
Optimizes the supply chain, demand planning, production costs, delivery times, resource planning, and inventory management.
Components of Prescriptive Analytics
Good quality data, neural networks, hardware, and software.
Types of Prescriptive Analytics Algorithms
Heuristics and exact algorithms.