Intro to business data and analytics NIU comprehensive exam
Business Analytics
The process of analyzing data to gain valuable insights & inform business decisions
Descriptive Analytics
What has happened in the past?
Predictive Analytics
What could happen in the future?
Prescriptive Analytics
What should we do now?
For descriptive analytics we…
Gather, organize, visualize, and tabulate
For predictive analytics we…
Use statistical models to predict specific outcomes or their likelihood
Data Privacy
Branch of data security concerned with the proper collection, usage, and transmission of data
What are the 3 key principles of data privacy?
Confidentiality, transparency, and accountability
Data Ethics
Studies moral problems related to data
AI
Aims to create machines capable of performing tasks that require human intelligence
Generative AI
AI techniques that focus on creating new content
Data
A compilation of facts, figures, and other content
Cross-sectional data
Records a characteristic of many individuals at the same point in time
Time series data
Collected over several time periods, focusing on the same individuals or groups
Structured/Tabular data
Has a predefined row-column format
Unstructured data
Does not conform to a predefined row-column format
Characteristics of Big Data
Volume, velocity, variety
Interval
0 is just another point on the scale; it does not mean absence
Ratio
0 is a true zero point and represents complete absence
Nominal
No natural order
Ordinal
Contains a natural order
Discrete
Countable
Continuous
Measured, not counted
Data Wrangling
Transforms raw data into a format that is easier to analyze
Data management
Process that is used to acquire, organize, store, manipulate, and distribute data
Mean
Sum of all observations divided by the number of observations
Median
The middle value of sorted data
Mode
The most frequent observation
Percentile
Value below which a given percentage of the data falls
5 number summary
Minimum, Q1 (25th percentile), Q2 (50th percentile, the median), Q3 (75th percentile), Maximum
Range
= Maximum-Minimum
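A minimal sketch of the five-number summary and range, using Python's standard `statistics` module. The data values are hypothetical, and quartile conventions differ between textbooks and tools, so the `method="inclusive"` choice here is an assumption, not the only valid one.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
data = [7, 1, 9, 3, 5, 2, 8, 4, 6]

# With n=4, statistics.quantiles returns the three quartile cut points.
# method="inclusive" is one convention; others can give slightly
# different quartiles for the same data.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
data_range = max(data) - min(data)  # Range = Maximum - Minimum
iqr = q3 - q1                       # Interquartile range = Q3 - Q1

print(five_number_summary)
print(data_range, iqr)
```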
Mean absolute deviation (MAD)
The average of the absolute differences from the mean
Variance
Calculates the average of squared differences from mean
Standard Deviation
Square root of variance
Coefficient of variation
Standard deviation / mean; used to compare relative dispersion across data sets
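The measures of center and dispersion defined so far can be sketched on one small data set. The values are hypothetical. The variance here divides by n (the "average of squared differences" on the card); a sample variance would divide by n - 1 instead.

```python
import math
import statistics

# Hypothetical data, chosen so the results are easy to check by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # sum of observations / number of observations
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent observation

# Mean absolute deviation: average absolute distance from the mean.
mad = sum(abs(x - mean) for x in data) / len(data)

# Population variance: average squared distance from the mean.
variance = statistics.pvariance(data)
std_dev = math.sqrt(variance)     # standard deviation = square root of variance

# Coefficient of variation: standard deviation relative to the mean.
cv = std_dev / mean

print(mean, median, mode, mad, variance, std_dev, cv)
```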
Skewness coefficient
Measures asymmetry about the mean (focus on the tail)
Kurtosis Coefficient
Measures tailedness (short tails vs. long tails)
Excess Kurtosis
Kurtosis minus 3; compares the tails to those of a normal distribution
Covariance
Measures the direction in which two variables vary together
Correlation Coefficient
Measures the direction and strength of the linear relationship between two variables
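A rough sketch of how covariance and the correlation coefficient relate. The function names and data are hypothetical, and the sample (n - 1) form of covariance is assumed.

```python
import math

def covariance(xs, ys):
    """Sample covariance: average product of deviations, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Correlation: covariance scaled by both standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # covariance of x with itself is its variance
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical data: one series rises with x, the other falls.
hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]
score_down = [10, 8, 6, 4, 2]

print(covariance(hours, score_up), correlation(hours, score_up))
print(covariance(hours, score_down), correlation(hours, score_down))
```

The sign of the covariance gives the direction only; the correlation rescales it so that strength is comparable across data sets.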
Sample Space
the set of all possible outcomes
Event
Subset of a sample space
Mutually exclusive events
Events that share no outcome
Mutually exclusive & exhaustive
Events that are mutually exclusive but together cover the entire sample space
Exhaustive events
Events that together cover the entire sample space
The union of events
Contains outcomes in A OR B
Intersection of events
Outcomes must be both in A AND B
Complement of an event
Set of outcomes NOT in the event
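The event definitions above map directly onto Python sets. This sketch assumes a single roll of a six-sided die; the event names are hypothetical.

```python
# Sample space for one roll of a six-sided die (an assumed example).
sample_space = {1, 2, 3, 4, 5, 6}

even = {2, 4, 6}          # event: roll is even
at_least_four = {4, 5, 6} # event: roll is 4 or more
odd = {1, 3, 5}           # event: roll is odd

union = even | at_least_four         # outcomes in A OR B
intersection = even & at_least_four  # outcomes in both A AND B
complement = sample_space - even     # outcomes NOT in the event

# Mutually exclusive: the events share no outcome.
mutually_exclusive = even.isdisjoint(odd)
# Exhaustive: together the events cover the whole sample space.
exhaustive = (even | odd) == sample_space

print(union, intersection, complement, mutually_exclusive, exhaustive)
```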
What is the range of a probability?
0 ≤ p ≤ 1
What are the 3 ways that probability is estimated?
Subjective, empirical, and classical
Correlation Coeff. will always be between ?
-1 & 1
Complement Rule
P(A^c) = 1 - P(A)
The Additive Rule
P(A or B) = P(A) + P(B) - P(A and B)
Conditional Probability
the probability that one event happens given that another event is already known to have happened
Conditional Prob Rule
P(A|B) = P(A and B) / P(B)
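The complement, additive, and conditional probability rules can be checked on a small example. This sketch assumes classical probability (equally likely outcomes) on one roll of a fair six-sided die; the helper `prob` is a hypothetical name.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability: favorable outcomes / all outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is 4 or more

# Complement rule: P(A^c) = 1 - P(A)
p_not_a = 1 - prob(A)

# Additive rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = prob(A) + prob(B) - prob(A & B)

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

print(p_not_a, p_a_or_b, p_a_given_b)
```

Knowing that B happened shrinks the sample space to {4, 5, 6}, of which two outcomes are even, which is why P(A|B) differs from P(A).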
Normal Distribution has a kurtosis of…
3
Linear Regression
Assumes a linear relationship; draws the straight line that fits the data points most closely (least squares minimizes the sum of squared vertical distances to the line)
Linear regression model
y = β0 + β1X
X:
Predictor, independent variable, feature
Y:
Outcome, dependent variable, label
If the coefficient β1 is positive…
The relationship is positive: y increases as X increases
If the coefficient β1 is negative…
The relationship is negative: y decreases as X increases
If the coefficient β1 is zero…
There is no linear relationship; predicted y is constant at β0
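A minimal sketch of fitting the simple linear regression model by ordinary least squares. The function name and data are hypothetical; real analyses would typically use a statistics package rather than hand-written formulas.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of products of deviations / sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b0, b1 = fit_line(xs, ys)
print(b0, b1)  # a positive slope: y increases as x increases
```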
Multiple Linear Regression
Incorporates more than one predictor and coefficient
Multiple Linear Regression model
y = β0 + β1X1 + β2X2 + ... + βnXn
Goodness of the fit
Evaluates the closeness between observed values and the values expected under a model
Coefficient of Determination also known as…
R²
R² definition
The proportion of variation in y that the model explains
If R² equals 1
Means the model explains all the variation in y perfectly
If R² equals 0
Means the model explains no variation in y (useless)
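As an illustration of the two extremes, R² can be computed as 1 minus the ratio of unexplained variation to total variation. The function name and data are hypothetical.

```python
def r_squared(observed, predicted):
    """Proportion of the variation in y that the predictions explain."""
    mean_y = sum(observed) / len(observed)
    # Unexplained variation: squared gaps between observed and predicted.
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Total variation: squared gaps between observed values and their mean.
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_residual / ss_total

observed = [3, 5, 7, 9, 11]

perfect_fit = [3, 5, 7, 9, 11]  # predictions match every observation
mean_only = [7, 7, 7, 7, 7]     # predictions ignore x and always guess the mean

print(r_squared(observed, perfect_fit))
print(r_squared(observed, mean_only))
```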
If P-value < Threshold
Significant
If P-value > Threshold
Not significant
P-value threshold is usually…
0.05
Interaction Equation
ŷ = b0 + b1X1 + b2X2 + b3X1X2
To model interaction…
We add a new term to the regression which is the product of two interacting variables
Interaction
Focuses on the combined effect of variables on an outcome: the effect of one variable can depend on the level of another variable.
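A small sketch of why the product term matters. Here b3 denotes the coefficient on the product X1·X2, and all coefficient values are made up for illustration.

```python
def predict(x1, x2, b0=1.0, b1=2.0, b2=3.0, b3=0.5):
    """Regression prediction with an interaction (product) term."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Effect of raising x1 by one unit, measured at two different levels of x2.
effect_when_x2_is_0 = predict(1, 0) - predict(0, 0)  # equals b1
effect_when_x2_is_4 = predict(1, 4) - predict(0, 4)  # equals b1 + b3 * 4

print(effect_when_x2_is_0, effect_when_x2_is_4)
```

Without the product term, both differences would equal b1; with it, the effect of x1 changes with the level of x2.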
Dummy Variable
A numerical variable, coded 0 or 1, used to represent categorical data
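For instance, a two-category variable can be coded as a single dummy variable. The category labels below are hypothetical.

```python
# Hypothetical categorical data with two categories.
regions = ["urban", "rural", "urban", "rural", "rural"]

# Dummy variable: 1 if the observation is "urban", 0 otherwise.
is_urban = [1 if region == "urban" else 0 for region in regions]

print(is_urban)
```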
Quadratic Equation
y= ax²+bx+c
Quadratic Phenomenon
When the direction of the effect changes from + to - or - to +
Quadratic Regression Model
y = β0 + β1x + β2x²
If β2 < 0
Inverted U-shape, so y has a maximum
If β2 > 0
U-shape, so y has a minimum
To find the exact point for optimization
x = -β1 / (2β2)
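A short sketch of locating the turning point of a fitted quadratic, where b1 is the coefficient on x and b2 the coefficient on x². The function name and coefficient values are hypothetical.

```python
def turning_point(b1, b2):
    """Return (x, kind) for the quadratic y = b0 + b1*x + b2*x**2."""
    if b2 == 0:
        raise ValueError("b2 is zero: the model is linear and has no turning point")
    x_star = -b1 / (2 * b2)
    # Negative b2 gives an inverted U (maximum); positive b2 gives a U (minimum).
    kind = "maximum" if b2 < 0 else "minimum"
    return x_star, kind

print(turning_point(12.0, -2.0))  # inverted U
print(turning_point(-8.0, 1.0))   # U-shaped
```

Note that the whole product 2·b2 is in the denominator; dividing by 2 and then multiplying by b2 would give a different, incorrect value.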
What is the main limitation of linear regression?
It assumes the relationships are linear
What is a measure of dispersion?
IQR (interquartile range, Q3 − Q1)