CPSC 375: Introduction to Data Science and Big Data Analytics

Description and Tags

These flashcards cover key vocabulary and concepts in data science, with a focus on regression analysis and model evaluation.

18 Terms

New cards

Data Science Process

A systematic approach for collecting, structuring, and analyzing data to gain insights.

Linear Regression Model

A statistical method to model the relationship between a dependent variable and one or more explanatory variables by fitting a linear equation.

Coefficients

Estimated values that quantify how much the dependent variable is expected to change for a one-unit change in each independent variable, holding the other variables constant.

Residuals

The differences between observed and predicted values in a regression model.

Sum of Squares Error (SSE)

The sum of the squared differences between the observed response values and the values predicted by the fitted model; it measures the variation that the model leaves unexplained.
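
As a minimal sketch of this definition, the snippet below computes residuals as observed minus predicted and then sums their squares. The function names and the sample numbers are illustrative assumptions, not part of the flashcard set.

```python
def residuals(observed, predicted):
    """Return observed minus predicted for each pair of values."""
    return [o - p for o, p in zip(observed, predicted)]


def sum_squared_error(observed, predicted):
    """Return the sum of squared residuals (SSE)."""
    return sum(r ** 2 for r in residuals(observed, predicted))


# Hypothetical data, chosen only to illustrate the arithmetic.
observed = [2.0, 4.0, 6.0]
predicted = [2.5, 3.5, 6.0]

res = residuals(observed, predicted)          # [-0.5, 0.5, 0.0]
sse = sum_squared_error(observed, predicted)  # 0.25 + 0.25 + 0.0 = 0.5
```

A smaller SSE means the predictions sit closer to the observed values.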

Adjusted R-squared

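A modified version of R-squared that penalizes the number of predictors in a model, so that adding a predictor raises the value only if it improves the fit by more than chance would; this discourages overfitting but does not prevent it.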
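
The usual formula is 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below applies it; the function name and the example values are assumptions made for illustration.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Same R-squared and sample size, different numbers of predictors.
few_predictors = adjusted_r_squared(0.90, n_obs=11, n_predictors=2)   # about 0.875
many_predictors = adjusted_r_squared(0.90, n_obs=11, n_predictors=5)  # about 0.80
```

With R-squared held fixed, the model that uses more predictors gets the lower adjusted value.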

Dummy Variables

Binary variables created to represent categories of a qualitative variable, used in regression models.
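
One possible way to build dummy variables by hand is sketched below. The helper name `dummy_encode`, the `is_` column prefix, and the choice to drop the first level (sorted alphabetically) as the reference category are all assumptions made for illustration.

```python
def dummy_encode(values, drop_first=True):
    """Turn a list of category labels into rows of 0/1 indicator columns.

    With drop_first=True the first level (in sorted order) becomes the
    reference category and gets no column, which avoids perfect
    collinearity with the intercept in a regression model.
    """
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]


colors = ["red", "green", "blue", "green"]
rows = dummy_encode(colors)
# "blue" is the reference level, so a blue row has zeros in every column.
```

A variable with k categories therefore needs only k - 1 dummy columns when the model includes an intercept.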

Least Squares Method

A statistical technique used to estimate the parameters of a linear regression model by minimizing the sum of squared residuals.
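
For a single predictor, the least squares estimates have a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the means. The sketch below assumes this one-predictor case; the function name and the data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_ols(x, y)  # intercept 1.0, slope 2.0
```

Because the points lie exactly on a line, every residual is zero here; with noisy data the same formulas give the line with the smallest possible sum of squared residuals.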

Prediction Interval

A range within which a single new observation is expected to fall with a specified probability, given the values of the predictors.

Confidence Interval

A range within which the true mean of the dependent variable, at given values of the predictors, is expected to fall at a specified confidence level.
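
The difference between a confidence interval and a prediction interval can be seen in the textbook formulas for simple linear regression (one predictor, with independent, normally distributed, constant-variance errors assumed). The symbols below are notation introduced here, not taken from the flashcards: x_0 is the predictor value of interest, b_0 and b_1 are the fitted coefficients, and t is the relevant quantile of the t distribution with n - 2 degrees of freedom.

```latex
\hat{y}_0 = b_0 + b_1 x_0, \qquad
s = \sqrt{\frac{\mathrm{SSE}}{n-2}}, \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2

\text{Confidence interval for the mean response at } x_0:\quad
\hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}

\text{Prediction interval for a new observation at } x_0:\quad
\hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}
```

The extra 1 under the square root accounts for the random error of the individual new observation, so the prediction interval is always wider than the confidence interval at the same x_0.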

Outlier

An observation that deviates significantly from the other data points, which may indicate an error or an unusual occurrence.
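
The definition does not say how large a deviation counts as significant. One common convention, assumed here purely for illustration, flags values more than 1.5 interquartile ranges below the first quartile or above the third quartile. The function name and the data are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR].

    Quartiles come from statistics.quantiles with its default method;
    other quartile conventions can give slightly different fences.
    """
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < lower or v > upper]


values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
flagged = iqr_outliers(values)  # only 100 falls outside the fences
```

A flagged value is a candidate for inspection, not automatically an error; it may be a genuine but unusual observation.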

Nonlinear Regression

A form of regression analysis in which the data are fit to a model described by a nonlinear equation, that is, one that is not linear in its parameters.

Exploratory Data Analysis (EDA)

The process of analyzing data sets to summarize their main characteristics, often with visualizations, in order to gain insights.

Model Fit

A term that refers to how well a regression line approximates the real data points.
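
One widely used summary of model fit is R-squared, the share of the total variation in the response that the model explains: 1 - SSE/SST, where SST is the sum of squared deviations of the observed values from their mean. The sketch below assumes that measure; the function name and the data are illustrative.

```python
def r_squared(observed, predicted):
    """Return R-squared = 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed values must not all be equal")
    return 1.0 - sse / sst


# Hypothetical observed values and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
r2 = r_squared(observed, predicted)  # SSE = 1.0, SST = 5.0, so about 0.8
```

Values near 1 indicate predictions close to the data; values near 0 indicate a model that does little better than predicting the mean.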

Variance Inflation Factor (VIF)

A measure of how much the variance of a regression-coefficient estimate is increased due to multicollinearity.
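
For predictor j, the VIF is 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the other predictors. The sketch below assumes the simplest case of exactly two predictors, where R_j² equals the squared Pearson correlation between them; the function names and the data are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r2 = pearson_r(x1, x2) ** 2
    if r2 >= 1.0:
        return math.inf  # perfect collinearity
    return 1.0 / (1.0 - r2)


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
vif = vif_two_predictors(x1, x2)  # correlation 0.6, so about 1 / 0.64
```

A VIF of 1 means no inflation; the value grows without bound as the predictors approach perfect collinearity.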

Homoscedasticity

A condition in which the variance of the errors is constant across all levels of the independent variable.

Heteroscedasticity

A condition in which the variance of errors differs across levels of the independent variable.

P-value

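The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. A small p-value is evidence against the null hypothesis and is used to judge statistical significance; it is not the probability that the null hypothesis is true.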
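
As a rough illustration, the sketch below computes a two-sided p-value for a test statistic that is assumed to follow a standard normal distribution under the null hypothesis (a z-test). The function name is hypothetical; regression software typically reports p-values based on the t distribution instead.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


p_typical = two_sided_p_value(0.0)   # 1.0: the result is not extreme at all
p_border = two_sided_p_value(1.96)   # close to 0.05
p_extreme = two_sided_p_value(3.0)   # well below 0.05
```

The further the statistic lies from zero, the smaller the p-value and the stronger the evidence against the null hypothesis.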