BUSN 3000 1-4

Last updated 11:20 PM on 3/26/26

165 Terms

New cards

Response Variable

What we would like to predict

New cards

Explanatory Variable

Variables used to calculate predictions

New cards

Example of Response Variable

Amount spent by online customers

New cards

Example of Explanatory Variables

Number of employees, type of industry, etc.

New cards

Rows are

Rows are Horizontal

New cards

Columns are

Columns are vertical

New cards

Cases are in

Cases are in rows

New cards

Variables are in

Columns

New cards

Quantitative Variable

Tells us how much of something was measured and quantifies exactly how far apart individual items are

New cards

Examples of Quantitative Variables

Height, weight, salary, score, distance, time, GPA

New cards

Categorical Variable

Separates cases into distinct categories; it cannot specify exactly how far apart two items are, and it makes no sense to do math on it, such as computing an average

New cards

Examples of Categorical Variables

Gender, race, nationality, hair color, student ID, class grade, zip code

New cards

Identifier

A unique code assigned to each individual or item, listed in the first column of the data table

New cards

Examples of Identifiers

Social security, student ID numbers, transaction numbers

New cards

Time Series Data

Data that consists of the same item measured repeatedly over time

New cards

Example of Time Series Data

The price of Bitcoin at the end of each day for a year, Monthly Inflation Rate

New cards

To qualify as time series, we should

To qualify as time series, we should be able to plot the data as a line with time on the X-axis

New cards

Cross Sectional Data

Data that is measured only once, at a single point in time, across many individuals or items

New cards

Examples of Cross Sectional Data

A household income study for one year, health snapshots

New cards

What is EDA?

Exploratory data analysis: examines data for patterns, underlying structure, trends, deviations from the trend, etc.

New cards

To Display Categorical Data in R use:

Bar Charts or Pie Charts

New cards

Bar Charts

Shows the counts for each category

New cards

How to make a Bar Chart:

barplot(table(your_variable), main = "Your Title", xlab = "Your Label")
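
As a minimal sketch of the call above, the snippet below counts categories with table() and passes the counts to barplot(). The `industry` vector is made-up data for illustration only.

```r
# Hypothetical categorical data (made up for illustration)
industry <- c("Retail", "Tech", "Retail", "Finance", "Tech", "Retail")

# table() counts how many cases fall in each category
counts <- table(industry)
print(counts)

# Bar chart of the counts; pie(counts) would show the same data as shares
barplot(counts, main = "Companies by Industry", xlab = "Industry", ylab = "Count")
```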

New cards

Pie Charts

Pie charts should be used when the focus is on percentages rather than actual counts (e.g., market share)

New cards

How to make a Pie Chart:

pie(table(your_variable), main = "Your Title")

New cards

To display Quantitative Variables

To display Quantitative Variables, use a Histogram or Boxplot

New cards

How to make a Histogram

hist(your_variable, main = "Your Title", xlab = "Your Label")

New cards

How to make a Boxplot

boxplot(your_variable, main = "Your Title", xlab = "Your Label")
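
A short sketch of both displays on the same quantitative variable. The `amount` vector is hypothetical data chosen to include one large value, so the right skew shows up in both plots.

```r
# Hypothetical order amounts in dollars (made up for illustration)
amount <- c(12, 15, 18, 20, 22, 25, 27, 30, 34, 95)

# Histogram: shows the shape (modes, symmetry, skewness)
h <- hist(amount, main = "Order Amounts", xlab = "Amount ($)")

# Boxplot: shows the five-number summary and flags possible outliers
boxplot(amount, main = "Order Amounts", xlab = "Amount ($)", horizontal = TRUE)
```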

New cards

Modes

Peaks or humps seen in a histogram are called the modes

New cards

Unimodal

A distribution whose histogram has one main peak

New cards

Bimodal

A distribution whose histogram has two main peaks

New cards

Multimodal

A distribution whose histogram has three or more peaks

New cards

Uniform Histogram

All the bars are approximately the same height, so there is no mode

New cards

Symmetric

A distribution is symmetric if the halves on either side of the center look approximately like mirror images

New cards

Skewness

If one tail stretches out longer than the other, the distribution is said to be skewed to the side of the longer tail

New cards

Mean

The average, used to measure the typical value for unimodal, symmetric distributions

New cards

Median

The middle value of the sorted data. If the data set is skewed or contains outliers, it is better to use the median as a measure of the “typical value”

New cards

Consider the following salaries:

56, 46, 48, 60, 150. Which is the best measure of the typical salary?

The median of 56
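
The salary example can be checked directly in R. This sketch uses only the five values from the card; the outlier of 150 pulls the mean well above the typical salary, while the median stays put.

```r
salaries <- c(56, 46, 48, 60, 150)

mean(salaries)    # 72: pulled upward by the outlier 150
median(salaries)  # 56: the middle value of 46, 48, 56, 60, 150
```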

New cards

If a distribution is roughly symmetric, the ___ and ____

If a distribution is roughly symmetric, the mean and median will be reasonably similar

New cards

But in a skewed distribution

But in a skewed distribution, the mean typically gets pulled towards the longer tail

New cards

The more ___ the values the…

The more spread out the values, the bigger the prediction errors and the less accurate the statistical models

New cards

The two main measures of spread are:

Standard deviation and IQR

New cards

Standard Deviation

A measure of the average distance of points from the mean or center

New cards

The more spread out the points the farther the…

The more spread out the points, the farther the average distance from the mean and the greater the standard deviation

New cards

If all points are close to the center

If all points are close to the center, the standard deviation will be small

New cards

Standard deviation is very sensitive to

Outliers

New cards

IQR

The IQR, Q3-Q1, indicates how far apart the middle 50% is spread out

New cards

When should mean and standard deviation be used?

The mean and standard deviation should be used when the shape is unimodal, symmetric, and no outliers are present

New cards

When should the Median and IQR be used?

The median and IQR should be used if the shape is skewed, or if there are outliers

New cards

If ALL the data points increase/decrease by a constant value

The mean and median will increase/decrease. The IQR and SD will stay the same
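
A quick check of this rule, reusing the salary values from the earlier card and an arbitrary shift of 10 (the shift amount is an assumption for illustration):

```r
x <- c(56, 46, 48, 60, 150)
shifted <- x + 10   # every data point increases by the same constant

# Measures of center move by the constant
mean(shifted) - mean(x)      # 10
median(shifted) - median(x)  # 10

# Measures of spread do not change
all.equal(sd(shifted), sd(x))    # TRUE
all.equal(IQR(shifted), IQR(x))  # TRUE
```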

New cards

If SOME of the data points increase/decrease

The median and IQR stay the same as long as none of the points cross the median, Q1 or Q3

New cards

Five Number Summary

Min, Q1, Median, Q3, Max
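
In R, fivenum() returns these five values in order. A sketch using the salary values from the earlier card; note that fivenum() and quantile() use slightly different quartile conventions and can disagree on other data sets.

```r
salaries <- c(56, 46, 48, 60, 150)

# Min, Q1, Median, Q3, Max
five <- fivenum(salaries)
five                      # 46 48 56 60 150

# IQR = Q3 - Q1
five[4] - five[2]         # 12
```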

New cards

Boxplot: If the median is exactly halfway between Q1 and Q3 the data is

If the median is exactly halfway between Q1 and Q3, the middle 50% of the data is roughly symmetric

New cards

Boxplot: If the median isn’t exactly halfway between Q1 and Q3

The data is skewed in the direction of the longer distance

New cards

When the whiskers have different lengths, the longer whisker also indicates the direction of the

When the whiskers have different lengths, the longer whisker also indicates the direction of the skewness

New cards

Parallel Boxplots provide a good method of…

Parallel boxplots provide a good method of comparing a quantitative variable across different categories of another variable

New cards

tapply function

tapply(variable to be analyzed, grouping variable, function)
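
A minimal sketch of tapply() on made-up data: `spend` is the variable to be analyzed, `region` is the grouping variable, and `mean` is the function. Both vectors are hypothetical.

```r
# Hypothetical data: amount spent and the region of each customer
spend  <- c(20, 30, 40, 10, 60, 50)
region <- c("East", "West", "East", "West", "East", "West")

# Mean spend within each region
group_means <- tapply(spend, region, mean)
group_means   # East 40, West 30

# The same grouping drawn as parallel boxplots
boxplot(spend ~ region, main = "Spend by Region", xlab = "Region", ylab = "Spend")
```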

New cards

Z-Score

Tells us how many standard deviations a data point is from its mean

New cards

Formula for Z score

z = (value - mean) / SD
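
A worked example with hypothetical numbers (a score of 85, a mean of 70, and an SD of 10). The parentheses matter: without them, R divides first and gives a different answer.

```r
# Hypothetical values for illustration
value <- 85
mu    <- 70
sigma <- 10

z <- (value - mu) / sigma
z                      # 1.5: the score is 1.5 SDs above the mean

# Common mistake: division happens before subtraction
value - mu / sigma     # 78, which is not a z-score
```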

New cards

Time Series Plot

A graph of a time series data set, a special type of line graph in which the X-axis is time

New cards

Probability

A useful tool to quantify uncertainty, providing an objective rationale for decision-making

New cards

Example of Probability in Business Sales

Sales: determine which factors increase the probability of making a sale

New cards

Example of Probability in Business Accounting:

Accounting: Identify scenarios most likely to involve fraud to efficiently allocate investigative resources

New cards

Example of Probability in Risk Management:

Risk Management: Calculate the probability of various disruptions and how severely they would affect the company

New cards

Probability can be interpreted as

Probability can be interpreted as the long-run relative frequency with which an event occurs

New cards

Basic Probability Rules: #1

Probability is a number between 0 and 1

New cards

Basic Probability Rules: #2

The probabilities of all possible outcomes sum to 1

New cards

Basic Probability Rules: #3

The Complement rule: P(A) = 1 - P(A^c)

New cards

Basic Probability Rules: #4

The Addition rule: P(A or B) = P(A) + P(B), which holds only when A and B are mutually exclusive (they cannot both occur)

New cards

Basic Probability Rules: #5

The General Addition rule: P(A or B) = P(A) + P(B) - P(A intersection B)
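
A worked example with assumed probabilities (the three input values are hypothetical). Subtracting the overlap avoids counting the outcomes in both A and B twice.

```r
# Hypothetical probabilities for illustration
p_a  <- 0.5   # P(A)
p_b  <- 0.3   # P(B)
p_ab <- 0.1   # P(A and B)

p_a_or_b <- p_a + p_b - p_ab
p_a_or_b      # 0.7
```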

New cards

Conditional Probability

Conditional probability is the probability of one event (A), given that another event (B) is known to have occurred

New cards

Conditional Probability Formula

P(A|B) = P(A ∩ B) / P(B)

New cards

Example of Conditional Probability Scenario

Suppose you run an online retail business. You want to understand how likely a customer is to make a purchase given that they have added items to their shopping cart
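
The scenario can be worked through with the formula P(A|B) = P(A ∩ B) / P(B). The visitor counts below are made up for illustration; A is "made a purchase" and B is "added items to the cart".

```r
# Hypothetical counts for illustration
visitors       <- 1000
added_to_cart  <- 200   # event B
cart_and_bought <- 50   # events A and B together

p_b  <- added_to_cart / visitors      # P(B) = 0.20
p_ab <- cart_and_bought / visitors    # P(A and B) = 0.05

p_a_given_b <- p_ab / p_b
p_a_given_b   # 0.25: a quarter of cart users go on to purchase
```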

New cards

Independent Events

Events are said to be independent if the occurrence of one event has no effect on the probability of the other

New cards

Multiplication Rule

The Multiplication Rule says that A and B are independent if: P(A and B) = P(A) x P(B)
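
A small sketch of how the rule can be used as a check. The probabilities are hypothetical; all.equal() is used rather than == to allow for floating-point rounding.

```r
# Hypothetical probabilities for illustration
p_a  <- 0.4   # P(A)
p_b  <- 0.5   # P(B)
p_ab <- 0.2   # P(A and B)

# Independent if P(A and B) equals P(A) x P(B)
independent <- isTRUE(all.equal(p_ab, p_a * p_b))
independent   # TRUE
```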

New cards

Random Variables

A random variable assigns a numerical value to each outcome of a random process, where the outcome is not known with certainty

New cards

Example of Random Variables

An inflation rate in 5 years’ time, the monthly sales of a particular cellphone in 2 years’ time

New cards

Discrete Variables

A discrete variable is a numerical variable that takes only specific, countable values

New cards

Discrete Random Variable

A discrete random variable is a variable that counts outcomes of a random process and can take only specific, separate values

New cards

Continuous Variable

A continuous variable is a numerical variable that can take infinitely many possible values within a given interval

New cards

Normal Distribution

A normal distribution is a bell-shaped, symmetric distribution where most values cluster around the average, and fewer values occur as you move away from the center

New cards

Normal Distributions follow the

Normal distributions follow the 68-95-99.7 Rule

New cards

68-95-99.7 Rule

68% of the values fall within 1 SD of the mean

95% of the values fall within 2 SDs of the mean

99.7% of the values fall within 3 SDs of the mean
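
The rule can be verified with pnorm() on the standard normal distribution (mean 0, SD 1), by taking the area between -k and +k SDs:

```r
# Probability of falling within k SDs of the mean
within <- function(k) pnorm(k) - pnorm(-k)

round(within(1), 4)   # 0.6827, about 68%
round(within(2), 4)   # 0.9545, about 95%
round(within(3), 4)   # 0.9973, about 99.7%
```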

New cards

Normal Probabilities in R

pnorm(value, mean, SD) gives the lower-tail probability, P(X ≤ value), by default

New cards

How do we get the upper probability of a Normal Distribution Model

To get the upper probability, we can either subtract the answer from 1, or include the option lower.tail = FALSE
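
Both approaches give the same result. The sketch below assumes a hypothetical normal model with mean 100 and SD 15 and asks for the probability of a value above 130.

```r
# Hypothetical model: mean 100, SD 15
lower <- pnorm(130, mean = 100, sd = 15)   # P(X <= 130)

# Option 1: subtract the lower probability from 1
upper_a <- 1 - lower

# Option 2: ask for the upper tail directly
upper_b <- pnorm(130, mean = 100, sd = 15, lower.tail = FALSE)

round(upper_a, 4)   # 0.0228
round(upper_b, 4)   # 0.0228
```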

New cards

Normal Distribution Cutoff Values

A cutoff value is the value of a variable corresponding to a specified percentile or probability in a normal distribution

New cards

Example of Normal Distribution Cutoff Values

Finding top 10% or bottom 5% of performers

New cards

Cutoff Values in R

qnorm(left tail probability, mean, SD)
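
Because qnorm() expects a left-tail probability, a "top 10%" cutoff is the 90th percentile. The sketch assumes a hypothetical normal model with mean 100 and SD 15.

```r
# Hypothetical model: mean 100, SD 15
top_10    <- qnorm(0.90, mean = 100, sd = 15)   # 90% of values fall below this
bottom_5  <- qnorm(0.05, mean = 100, sd = 15)   # 5% of values fall below this

round(top_10, 1)     # about 119.2
round(bottom_5, 1)   # about 75.3

# pnorm() undoes qnorm(): the cutoff maps back to its probability
pnorm(top_10, mean = 100, sd = 15)   # 0.9
```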

New cards

Expected Value

The expected value, E(X), of a random variable X is the mean or average value of X over all possible outcomes

New cards

The Standard Deviation of a Random Variable

The standard deviation of a random variable is the square root of its variance. It measures the typical long-run deviation from the mean, where each deviation is weighted by its probability of occurring

New cards

Variance of a Random Variable

The variance of a random variable is the average squared deviation from the mean, with each squared deviation weighted by its probability of occurring
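
A worked sketch for a discrete random variable, tying together expected value, variance, and standard deviation. The payoffs and probabilities are made up for illustration.

```r
# Hypothetical payoffs and their probabilities (must sum to 1)
x <- c(0, 100, 500)
p <- c(0.5, 0.4, 0.1)

ev       <- sum(x * p)             # expected value: 0 + 40 + 50 = 90
variance <- sum((x - ev)^2 * p)    # probability-weighted squared deviations: 20900
std_dev  <- sqrt(variance)         # about 144.57

ev
variance
std_dev
```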

New cards

Law of Large Numbers

The law of large numbers states that as the sample size increases, the sample mean will converge to the mean of the population. Thus, larger sample sizes tend to produce results that are close to the population, while smaller sample sizes might have a mean that is considerably different
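
A small simulation can illustrate the idea, assuming a fair six-sided die whose population mean is 3.5. The seed and sample size are arbitrary choices; exact values depend on the random draws.

```r
set.seed(1)   # arbitrary seed so the run is repeatable

# Simulate 10,000 rolls of a fair die (population mean = 3.5)
rolls <- sample(1:6, 10000, replace = TRUE)

# Sample mean after each additional roll
running_mean <- cumsum(rolls) / seq_along(rolls)

# Early means can be far from 3.5; later means settle close to it
running_mean[c(10, 100, 1000, 10000)]

plot(running_mean, type = "l", main = "Running Mean of Die Rolls",
     xlab = "Number of rolls", ylab = "Sample mean")
abline(h = 3.5, lty = 2)   # population mean for reference
```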

New cards

Empirical Distribution

An empirical distribution is the distribution of a dataset based on observed values and their frequencies or proportions

New cards

Probability Distribution Basis

Theoretical Model

New cards

Probability Distribution Information

Provides complete probability information regarding all outcomes

New cards

Probability Distribution Constant

Yes: based on theoretical assumptions