Principles of Data Analytics Exam 2

85 Terms

1
Bernoulli Distribution

The probability distribution of a random variable with two possible outcomes, each with a constant probability of occurrence.

2
Binomial Distribution

The distribution that models the number of successes in n independent replications of a Bernoulli experiment, each with a probability p of success.
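
A minimal Python sketch of the binomial probability of exactly k successes in n trials. The function name and the example values (10 trials, p = 0.5) are illustrative assumptions, not part of the card set.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: 10 trials, success probability 0.5 on each trial.
probabilities = [binomial_pmf(k, 10, 0.5) for k in range(11)]
p_exactly_five = binomial_pmf(5, 10, 0.5)
```

The probabilities over k = 0, ..., n sum to 1, which is a quick sanity check on any binomial calculation.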

3
Chi-square critical value

A threshold for statistical significance for certain hypothesis tests; it also defines confidence intervals for certain parameters.

4
Chi-square goodness of fit test

A statistical hypothesis test used to determine whether a variable is likely to come from a specified distribution or not.

5
Chi-square statistic

The sum, over all cells, of the squared difference between the observed frequency, f_o, and the expected frequency, f_e, divided by the expected frequency for that cell.
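
A short Python sketch of this calculation. The observed and expected counts are made-up numbers used only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical counts for five categories, each expected 20 times.
observed = [18, 22, 20, 15, 25]
expected = [20, 20, 20, 20, 20]
statistic = chi_square_statistic(observed, expected)
```

The resulting statistic would then be compared against a chi-square critical value to decide whether the observed counts fit the specified distribution.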

6
Combination

A selection of items from a set in which the order of the items does not matter.

7
Complement

The set of all outcomes in the sample space that are not included in the event.

8
Conditional Probability

The probability of occurrence of one event A, given that another event B is known to be true or has already occurred.

9
Continuous Random Variable

A random variable that has outcomes over one or more continuous intervals of real numbers.

10
Cumulative Distribution Function

A specification of the probability that the random variable X assumes a value less than or equal to a specified value x.

11
Discrete Random Variable

A random variable for which the number of possible outcomes can be counted.

12
Discrete Uniform Distribution

A variation of the uniform distribution for which the random variable is restricted to integer values between a and b (also integers).

13
Empirical Probability Distribution

An approximation of the probability distribution of the associated random variable.

14
Event

A collection of one or more outcomes from a sample space.

15
Expected Value

The notion of the mean or average of a random variable; the weighted average of all possible outcomes, where the weights are the probabilities.
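
As a small illustration, the probability-weighted average can be sketched in Python. The fair six-sided die is an assumed example.

```python
# Hypothetical example: outcomes of a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
probabilities = [1 / 6] * 6

# Weighted average of the outcomes, with the probabilities as weights.
expected_value = sum(x * p for x, p in zip(outcomes, probabilities))
```

For the fair die this gives 3.5, a value the die can never actually show, which is why the expected value is best read as a long-run average.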

16
Experiment

A process that results in an outcome.

17
Exponential Distribution

A continuous distribution that models the time between randomly occurring events.

18
Goodness of Fit

A procedure that attempts to draw a conclusion about the nature of a distribution.

19
Independent Events

Events that do not affect the occurrence of each other.

20
Intersection

A composition with all outcomes belonging to both events.

21
Joint Probability

The probability of the intersection of two events.

22
Joint Probability Table

A table that summarizes joint probabilities.

23
Marginal Probability

The probability of an event irrespective of the outcome of the other joint event.

24
Multiplication Law of Probability

The probability that two events A and B both occur is the product of the probability of A given B and the probability of B or, equivalently, the product of the probability of B given A and the probability of A.
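
A Python sketch that ties together joint, marginal, and conditional probabilities and checks the multiplication law numerically. The joint probability table below is hypothetical.

```python
# Hypothetical joint probability table for two events A and B.
joint = {
    ("A", "B"): 0.12,
    ("A", "not B"): 0.28,
    ("not A", "B"): 0.18,
    ("not A", "not B"): 0.42,
}

# Marginal probabilities: sum the joint probabilities over the other event.
p_a = sum(p for (a, _), p in joint.items() if a == "A")
p_b = sum(p for (_, b), p in joint.items() if b == "B")

# Conditional probabilities from the joint and marginal probabilities.
p_a_and_b = joint[("A", "B")]
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Multiplication law, both ways round.
product_via_b = p_a_given_b * p_b
product_via_a = p_b_given_a * p_a
```

In this made-up table P(A | B) equals P(A), so A and B also happen to be independent events.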

25
Mutually Exclusive

Events with no outcomes in common.

26
Normal Distribution

A continuous distribution described by the familiar bell-shaped curve; it is perhaps the most important distribution used in statistics.

27
Outcome

A result that can be observed.

28
Permutation

A mathematical technique that determines the number of possible arrangements in a set when the order of the arrangements matters.
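
A brief Python sketch contrasting permutations (order matters) with combinations (order does not). It assumes Python 3.8 or later for `math.perm` and `math.comb`; the choice of 3 items from 5 is illustrative.

```python
from math import comb, perm

# Hypothetical example: choose 3 items from a set of 5.
n, r = 5, 3

ordered_arrangements = perm(n, r)    # n! / (n - r)!
unordered_selections = comb(n, r)    # n! / (r! * (n - r)!)
```

Each unordered selection of 3 items can be arranged in 3! = 6 ways, which is why the permutation count is six times the combination count here.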

29
Poisson Distribution

A discrete distribution used to model the number of occurrences in some unit of measure.
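
A minimal Python sketch of the Poisson probability of observing k occurrences when the average number per unit is `rate`. The rate of 3 per unit is an assumed value.

```python
from math import exp, factorial


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k occurrences when the mean count is rate."""
    return exp(-rate) * rate**k / factorial(k)


# Hypothetical example: an average of 3 occurrences per unit of measure.
rate = 3.0
p_zero = poisson_pmf(0, rate)
p_at_most_two = sum(poisson_pmf(k, rate) for k in range(3))
```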

30
Probability

The likelihood that an outcome occurs.

31
Probability Density Function

The distribution that characterizes outcomes of a continuous random variable.

32
Probability Distribution

The characterization of the possible values that a random variable may assume along with the probability of assuming these values.

33
Probability Mass Function

The probability distribution of the discrete outcomes for a discrete random variable X.

34
Random Variable

A numerical description of the outcome of an experiment.

35
Sample Space

The collection of all possible outcomes of an experiment.

36
Standard Normal Distribution

A normal distribution with mean 0 and standard deviation 1.
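
One common use of the standard normal distribution is to convert a value from any normal distribution into a z-score and look up its cumulative probability. A Python sketch using `statistics.NormalDist` (Python 3.8 or later); the mean of 100 and standard deviation of 15 are assumed values.

```python
from statistics import NormalDist

# Hypothetical normal distribution and a value of interest.
mean, std_dev = 100.0, 15.0
x = 130.0

# z-score: how many standard deviations x lies from the mean.
z = (x - mean) / std_dev

# Cumulative probability from the standard normal (mean 0, std dev 1) ...
standard_normal = NormalDist()
p_below = standard_normal.cdf(z)

# ... which should match the probability computed on the original scale.
p_direct = NormalDist(mean, std_dev).cdf(x)
```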

37
Tree Diagram

A visualization of a multistep experiment. It can help with counting the outcomes.

38
Triangular Distribution

Defined by three parameters: the minimum, a; maximum, b; and most likely, c.

39
Uniform Distribution

A function that characterizes a continuous random variable for which all outcomes between some minimum and maximum value are equally likely.

40
Union

A composition of all outcomes that belong to either of two events.

41
Autocorrelation

Correlation among successive observations over time, identified by residual plots having clusters of residuals with the same sign. Autocorrelation can be evaluated more formally using a statistical test based on the Durbin-Watson statistic.
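
The Durbin-Watson statistic is commonly computed as the sum of squared differences between successive residuals divided by the sum of squared residuals. A Python sketch with made-up residual series:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e**2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals: signs alternate (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]
# Hypothetical residuals: same-sign clusters (positive autocorrelation).
clustered = [1.0, 1.0, -1.0, -1.0]

dw_alternating = durbin_watson(alternating)
dw_clustered = durbin_watson(clustered)
```

The statistic ranges from 0 to 4. Values near 2 suggest little autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.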

42
Coefficient of Determination (R²)

The proportion of variation in the dependent variable that is explained by the independent variable of the regression model; its value lies between 0 and 1.

43
Coefficient of multiple determination

Similar to simple linear regression, the tool explains the percentage of variation in the dependent variable. The coefficient of multiple determination in the context of multiple regression indicates the strength of association between the dependent and independent variables.

44
Curvilinear Regression Model

The model is used in forecasting when the independent variable is time.

45
Dummy Variables

A numerical variable used in regression analysis to represent subgroups of the sample in the study.

46
Exponential Function

Exponential functions have the property that y rises or falls at constantly increasing rates. y = ab^x

47
Homoscedasticity

The assumption means that the variation about the regression line is constant for all values of the independent variable. The data is evaluated by examining the residual plot and looking for large differences in the variances at different values of the independent variable.

48
Interaction

Occurs when the effect of one variable (i.e., the slope) is dependent on another variable.

49
Least-squares regression

The mathematical basis for the best-fitting regression line.
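
A minimal Python sketch of a least-squares fit for one independent variable, using the standard closed-form slope and intercept and then computing R² from the residuals. The five data points are invented for illustration.

```python
# Hypothetical data: one independent variable x, one dependent variable y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Slope and intercept that minimize the sum of squared residuals.
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Residuals: actual values minus values estimated by the regression line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# R-squared: share of the variation in y explained by the line.
sse = sum(e**2 for e in residuals)
sst = sum((yi - y_mean) ** 2 for yi in y)
r_squared = 1 - sse / sst
```

For these numbers the fitted line is y = 2.2 + 0.6x with an R² of 0.6.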

50
Linear function

Linear functions show a steady increase or decrease over the range of x and are used in predictive models. y = a + bx

51
Logarithmic Function

Used when the rate of change in a variable increases or decreases quickly and then levels out, such as with diminishing returns to scale. y = ln(x)

52
Multicollinearity

A condition occurring when two or more independent variables in the same regression model contain high levels of the same information and, consequently, are strongly correlated with one another and can predict each other better than the dependent variable.

53
Multiple Correlation Coefficient

Multiple R and R Square (or R²) in the context of multiple regression indicate the strength of association between the dependent and independent variables.

54
Multiple linear regression

A linear regression model with more than one independent variable. Simple linear regression is just a special case of multiple linear regression.

55
Overfitting

If too many terms are added to the model, then the model may not adequately predict other values from the population. Overfitting can be mitigated by using good logic, intuition, physical or behavioral theory, and parsimony.

56
Parsimony

A model with the fewest number of explanatory variables that will provide an adequate interpretation of the dependent variable.

57
Partial Regression coefficient

Represents the expected change in the dependent variable when the associated independent variable is increased by one unit while the values of all other independent variables are held constant.

58
Polynomial function

y = ax² + bx + c (second order—quadratic function), y = ax³ + bx² + dx + e (third order—cubic function), and so on. A second order polynomial is parabolic in nature and has only one hill or valley; a third order polynomial has one or two hills or valleys. Revenue models that incorporate price elasticity are often polynomial functions.

59
Power Function

Defines phenomena that increase at a specific rate. Learning curves that express improving times in performing a task are often modeled with a power function having a > 0 and b < 0. y = ax^b

60
R² (R-squared)

A measure of the “fit” of the line to the data; the value of R² will be between 0 and 1. The larger the value of R², the better the fit.

61
Regression analysis

A tool for building mathematical and statistical models that characterize relationships between a dependent variable and one or more independent, or explanatory, variables, all of which are numerical.

62
Residuals

Observed errors which are the differences between the actual values and the estimated values of the dependent variable using the regression equation.

63
Significance of Regression

A simple hypothesis test that checks whether the regression coefficient is zero.

64
Simple Linear Regression

A tool used to find a linear relationship between one independent variable, X, and one dependent variable, Y.

65
Standard Error of the Estimate, S_YX

The variability of the observed Y-values from the predicted values.

66
Standard Residuals

Residuals divided by their standard deviation. Standard residuals describe how far each residual is from its mean in units of standard deviations.
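
A short Python sketch of standardizing residuals. The residual values are hypothetical, and using the sample standard deviation (`statistics.stdev`) is an assumption; some tools use a slightly different divisor.

```python
from statistics import stdev

# Hypothetical residuals from a fitted regression line.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

# Divide each residual by the standard deviation of the residuals.
residual_std_dev = stdev(residuals)
standard_residuals = [e / residual_std_dev for e in residuals]
```

After this rescaling the standard residuals have a standard deviation of 1, so a value of 2 means the residual is two standard deviations from the mean residual.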

67
Cyclical Effect

Characteristic of a time series that describes ups and downs over a much longer time frame, such as several years.

68
Delphi Method

A forecasting approach that uses a panel of experts, whose identities are typically kept confidential from one another, to respond to a sequence of questionnaires to converge to an opinion of a future forecast.

69
Double Exponential Smoothing

A forecasting approach similar to simple exponential smoothing used for time series with a linear trend and no significant seasonal components.

70
Econometric Model

Explanatory/causal models that seek to identify factors that explain statistically the patterns observed in the variable being forecast.

71
Historical Analogy

A forecasting approach in which a forecast is obtained through a comparative analysis with a previous situation.

72
Holt-Winters Models

Forecasting models similar to exponential smoothing models in that smoothing constants are used to smooth out variations in the level and seasonal patterns over time.

73
Index

A single measure that weights multiple indicators, thus providing a measure of overall expectation.

74
Indicator

Measures that are believed to influence the behavior of a variable we wish to forecast.

75
Mean Absolute Deviation (MAD)

The absolute difference between the actual value and the forecast, averaged over a range of forecasted values.

76
Mean Absolute Percentage Error (MAPE)

The average of the absolute errors, each divided by the corresponding actual observation value; usually expressed as a percentage.

77
Mean Square Error (MSE)

The average of the square of the differences between the actual value and the forecast.

78
Root Mean Square Error (RMSE)

The square root of mean square error (MSE).
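
The four forecast error measures (MAD, MAPE, MSE, RMSE) can be computed side by side. A Python sketch with invented actual and forecast values:

```python
from math import sqrt

# Hypothetical actual values and the forecasts made for the same periods.
actual = [100.0, 110.0, 120.0, 130.0]
forecast = [90.0, 115.0, 120.0, 140.0]

errors = [a - f for a, f in zip(actual, forecast)]
n = len(errors)

mad = sum(abs(e) for e in errors) / n                               # mean absolute deviation
mse = sum(e**2 for e in errors) / n                                 # mean square error
rmse = sqrt(mse)                                                    # root mean square error
mape = 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / n    # percent
```

MAD and RMSE are in the units of the data, MSE is in squared units, and MAPE is unit-free, which makes it easier to compare across series of different scale.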

79
Seasonal Effect

Characteristic of a time series that repeats at fixed intervals of time, typically a year, month, week, or day.

80
Simple Exponential Smoothing

An approach for short-range forecasting that is a weighted average of the most recent forecast and actual value.
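
A minimal Python sketch of the usual update rule, next forecast = alpha × actual + (1 − alpha) × previous forecast, where alpha is a weight between 0 and 1. Starting the first forecast at the first actual value is an assumption (other initializations are used in practice), and the data are made up.

```python
def exponential_smoothing(actuals, alpha):
    """Return forecasts for periods 0..len(actuals); the last is for the next period."""
    forecast = actuals[0]  # assumed initialization: first forecast = first actual
    forecasts = [forecast]
    for actual in actuals:
        # Weighted average of the latest actual value and the latest forecast.
        forecast = alpha * actual + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts


# Hypothetical series with alpha = 0.5.
forecasts = exponential_smoothing([10.0, 20.0, 30.0], alpha=0.5)
```

A larger alpha makes the forecast react faster to recent values; a smaller alpha smooths more heavily.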

81
Simple Moving Average

A smoothing method based on the idea of averaging random fluctuations in the time series to identify the underlying direction in which the time series is changing.
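
A Python sketch of a k-period simple moving average. The function name, the window of 3 periods, and the series are illustrative assumptions.

```python
def moving_averages(values, k):
    """Average of every window of k consecutive observations."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


# Hypothetical time series and a 3-period window.
series = [10, 20, 30, 40, 50]
averages = moving_averages(series, 3)

# The most recent average serves as the forecast for the next period.
next_period_forecast = averages[-1]
```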

82
Smoothing Constant

A value between 0 and 1 used to weight exponential smoothing forecasts.

83
Stationary Time Series

A time series that does not have trend, seasonal, or cyclical effects but is relatively constant and exhibits only random behavior.

84
Time Series

A stream of historical data.

85
Trend

A gradual upward or downward movement of a time series over time.