BUSN 3000 1-4

Last updated 11:20 PM on 3/26/26

165 Terms

1

Response Variable

The variable we would like to predict

2

Explanatory Variable

Variables used to calculate predictions

3

Example of Response Variable

Amount spent by online customers

4

Example of Explanatory Variables

Number of employees, type of industry, etc.

5

Rows are

Rows are horizontal

6

Columns are

Columns are vertical

7

Cases are in

Cases are in rows

8

Variables are in

Columns

9

Quantitative Variable

Tells us how much of something was measured and quantifies exactly how far apart individual items are

10

Examples of Quantitative Variables

Height, weight, salary, score, distance, time, GPA

11

Categorical Variable

Places items into separate, distinct categories; we can't say exactly how far apart two items are or do math such as computing an average

12

Examples of Categorical Variables

Gender, race, nationality, hair color, student ID, class grade, zip code

13

Identifier

A unique code assigned to each individual or item, listed in the first column of the data table

14

Examples of Identifiers

Social Security numbers, student ID numbers, transaction numbers

15

Time Series Data

Data that consists of the same item measured repeatedly over time

16

Example of Time Series Data

The price of Bitcoin at the end of each day for a year, the monthly inflation rate

17

To qualify as time series, we should

To qualify as time series, we should be able to plot the data as a line with time on the X-axis

18

Cross Sectional Data

Data that is measured only once, at a single point in time, rather than repeatedly over time

19

Examples of Cross Sectional Data

A household income study for one year, health snapshots

20

What is EDA?

Exploratory data analysis: examines data for patterns, underlying structure, trends, deviations from the trend, etc.

21

To Display Categorical Data in R use:

Bar Charts or Pie Charts

22

Bar Charts

Shows the counts for each category

23

How to make a Bar Chart:

barplot(table(your_variable), main = "Your Title", xlab = "Your Label")

24

Pie Charts

Pie charts should be used when the focus is on percentages rather than actual counts (for example, market share)

25

How to make a Pie Chart:

pie(table(your_variable), main = "Your Title")
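
A minimal sketch of the bar chart and pie chart calls, assuming a made-up character vector `industry` (hypothetical data, not from the course):

```r
# Hypothetical categorical data: the industry of six companies
industry <- c("Tech", "Retail", "Tech", "Finance", "Retail", "Tech")

# table() counts the cases in each category
counts <- table(industry)

# Bar chart: bar heights are the counts
barplot(counts, main = "Companies by Industry", xlab = "Industry")

# Pie chart: slices show each category's share of the whole
pie(counts, main = "Share of Companies by Industry")

# prop.table() converts the counts to proportions that sum to 1
shares <- prop.table(counts)
print(shares)
```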

26

To display Quantitative Variables

To display Quantitative Variables, use a Histogram or Boxplot

27

How to make a Histogram

hist(your_variable, main = "Your Title", xlab = "Your Label")

28

How to make a Boxplot

boxplot(your_variable, main = "Your Title", xlab = "Your Label")
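
A short sketch of both displays for one quantitative variable, assuming a hypothetical vector `spend` (invented values for illustration):

```r
# Hypothetical quantitative data: dollars spent by ten online customers
spend <- c(12, 18, 21, 25, 27, 30, 34, 41, 58, 95)

# Histogram: shows the shape of the distribution (modes, symmetry, skewness).
# hist() also returns the bin counts, which are stored here in h.
h <- hist(spend, main = "Amount Spent", xlab = "Dollars")

# Boxplot: shows the five-number summary and flags possible outliers.
# horizontal = TRUE lays the box along the x-axis so xlab labels the values.
boxplot(spend, main = "Amount Spent", xlab = "Dollars", horizontal = TRUE)
```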

29

Modes

Peaks or humps seen in a histogram are called the modes

30

Unimodal

A distribution whose histogram has one main peak

31

Bimodal

A distribution whose histogram has two main peaks

32

Multimodal

A distribution whose histogram has three or more peaks

33

Uniform Histogram

All the bars are approximately the same height; there is no mode

34

Symmetric

A distribution is symmetric if the halves on either side of the center look approximately like mirror images

35

Skewness

If one tail stretches out longer than the other, the distribution is said to be skewed to the side of the longer tail

36

Mean

The average, used to measure the typical value for unimodal, symmetric distributions

37

Median

The middle value of the sorted data. If the data set is skewed or contains outliers, it is better to use the median as a measure of the “typical value”

38

Consider the following salaries:

56, 46, 48, 60, 150. Which is the best measure of the typical salary?

The median of 56
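
The salary card can be checked in R. This is just a quick sketch; the variable name `salaries` is chosen for illustration:

```r
# The five salaries from the card
salaries <- c(56, 46, 48, 60, 150)

salary_mean   <- mean(salaries)    # 72, pulled upward by the outlier 150
salary_median <- median(salaries)  # 56, the middle of 46 48 56 60 150

print(c(mean = salary_mean, median = salary_median))
```

Four of the five salaries are below the mean of 72, which is why the median is the better "typical" value here.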

39

If a distribution is roughly symmetric, the ___ and ____

If a distribution is roughly symmetric, the mean and median will be reasonably similar

40

But in a skewed distribution

But in a skewed distribution, the mean always gets pulled towards the longer tail

41

The more ___ the values the…

The more spread out the values, the bigger the prediction errors and the less accurate the statistical models

42

The two main measures of spread are:

Standard deviation and IQR

43

Standard Deviation

A measure of the average distance of points from the mean or center

44

The more spread out the points the farther the…

The more spread out the points, the farther the average distance from the mean and the greater the standard deviation

45

If all points are close to the center

If all points are close to the center, the standard deviation will be small

46

Standard deviation is very sensitive to

Outliers

47

IQR

The IQR (interquartile range), Q3 - Q1, indicates how spread out the middle 50% of the data is

48

When should mean and standard deviation be used?

The mean and standard deviation should be used when the shape is unimodal, symmetric, and no outliers are present

49

When should the Median and IQR be used?

The median and IQR should be used if the shape is skewed, or if there are outliers

50

If ALL the data points increase/decrease by a constant value

The mean and median will increase/decrease. The IQR and SD will stay the same
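
A small check of this rule, assuming made-up values and a shift of +10 (both chosen only for illustration):

```r
# Hypothetical data, then the same data with 10 added to every point
x <- c(56, 46, 48, 60, 150)
shifted <- x + 10

# Measures of center move by the constant
c(mean(x), mean(shifted))      # 72 82
c(median(x), median(shifted))  # 56 66

# Measures of spread do not change
c(sd(x), sd(shifted))          # the same value twice
c(IQR(x), IQR(shifted))        # 12 12
```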

51

If SOME of the data points increase/decrease

The median and IQR stay the same as long as none of the points cross the median, Q1 or Q3

52

Five Number Summary

Min, Q1, Median, Q3, Max
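
One way to get the five-number summary in R is `fivenum()`, sketched here on made-up values. Note that `fivenum()` and `quantile()` use different quartile conventions and can disagree slightly on some data sets.

```r
# Hypothetical data
x <- c(56, 46, 48, 60, 150)

# Returns Min, Q1, Median, Q3, Max in that order
five <- fivenum(x)
print(five)  # 46 48 56 60 150
```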

53

Boxplot: If the median is exactly halfway between Q1 and Q3 the data is

If the median is exactly halfway between Q1 and Q3 the data is symmetric

54

Boxplot: If the median isn’t exactly halfway between Q1 and Q3

The data is skewed in the direction of the longer distance

55

When the whiskers have different lengths, the longer whisker also indicates the direction of the

When the whiskers have different lengths, the longer whisker also indicates the direction of the skewness

56

Parallel Boxplots provide a good method of…

Parallel boxplots provide a good method of comparing a quantitative variable across different categories of another variable

57

tapply function

tapply(variable to be analyzed, grouping variable, function)
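
A minimal `tapply` sketch, assuming two hypothetical vectors `spend` and `region` of equal length (invented for illustration):

```r
# Hypothetical data: amount spent and the region of each customer
spend  <- c(20, 30, 50, 10, 50, 50)
region <- c("East", "West", "East", "West", "East", "West")

# Mean amount spent within each region
group_means <- tapply(spend, region, mean)
print(group_means)  # East 40, West 30
```

Swapping `mean` for `median`, `sd`, or `IQR` gives the other summaries by group.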

58

Z-Score

Tells us how many standard deviations a data point is from its mean

59

Formula for Z score

z = (value - mean) / SD
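
A quick worked example, assuming a hypothetical exam score of 85 where the class mean is 70 and the SD is 10. The subtraction has to happen before the division, so the parentheses matter:

```r
# Hypothetical values
value      <- 85
mean_score <- 70
sd_score   <- 10

# Correct: subtract the mean first, then divide by the SD
z <- (value - mean_score) / sd_score  # 1.5

# Without parentheses R divides first: 85 - (70 / 10) = 78, not a z-score
wrong <- value - mean_score / sd_score
```

The score is 1.5 standard deviations above the mean.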

60

Time Series Plot

A graph of a time series data set, a special type of line graph in which the X-axis is time

61

Probability

A useful tool to quantify uncertainty, providing an objective rationale for decision-making

62

Example of Probability in Business Sales

Sales: determine which factors increase the probability of making a sale

63

Example of Probability in Business Accounting:

Accounting: Identify scenarios most likely to involve fraud to efficiently allocate investigative resources

64

Example of Probability in Risk Management:

Risk Management: Calculate the probability of various disruptions and how severely they would affect the company

65

Probability can be interpreted as

Probability can be interpreted as the long-run relative frequency with which an event occurs

66

Basic Probability Rules: #1

Probability is a number between 0 and 1

67

Basic Probability Rules: #2

The probabilities of all possible outcomes sum to 1

68

Basic Probability Rules: #3

The Complement rule: P(A) = 1 - P(A^c)

69

Basic Probability Rules: #4

The Addition rule: P(A or B) = P(A) + P(B), when A and B are disjoint (cannot both occur)

70

Basic Probability Rules: #5

The General Addition rule: P(A or B) = P(A) + P(B) - P(A ∩ B)
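
A worked example of the general addition rule, using a standard 52-card deck (an illustration chosen here, not from the course): the chance that one drawn card is a heart or a king.

```r
# One card drawn from a standard 52-card deck
p_heart          <- 13 / 52  # 13 hearts
p_king           <- 4 / 52   # 4 kings
p_heart_and_king <- 1 / 52   # the king of hearts is counted in both

# Subtract the overlap so the king of hearts is not counted twice
p_heart_or_king <- p_heart + p_king - p_heart_and_king  # 16/52
print(p_heart_or_king)
```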

71

Conditional Probability

Conditional probability is the probability of one event (A), given that another event (B) is known to have occurred

72

Conditional Probability Formula

P(A|B) = P(A ∩ B) / P(B)

73

Example of Conditional Probability Scenario

Suppose you run an online retail business. You want to understand how likely a customer is to make a purchase given that they have added items to their shopping cart
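
The shopping-cart scenario can be sketched with made-up counts (all three numbers below are hypothetical, chosen only to show the arithmetic):

```r
# Hypothetical counts for an online store
n_visitors          <- 1000  # all visitors
n_cart              <- 200   # visitors who added items to the cart
n_cart_and_purchase <- 50    # visitors who added items AND purchased

p_cart <- n_cart / n_visitors               # P(B) = 0.2
p_both <- n_cart_and_purchase / n_visitors  # P(A and B) = 0.05

# P(purchase | cart) = P(purchase and cart) / P(cart)
p_purchase_given_cart <- p_both / p_cart    # 0.25
print(p_purchase_given_cart)
```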

74

Independent Events

Events are said to be independent if the occurrence of one event has no effect on the probability of the other

75

Multiplication Rule

The Multiplication Rule says that A and B are independent if P(A and B) = P(A) × P(B)
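
A quick independence check, assuming hypothetical probabilities for two events A and B:

```r
# Hypothetical probabilities
p_a       <- 0.5
p_b       <- 0.4
p_a_and_b <- 0.2

# Independent if P(A and B) equals P(A) * P(B).
# all.equal() compares with a small tolerance, which is safer for decimals.
independent <- isTRUE(all.equal(p_a_and_b, p_a * p_b))
print(independent)  # TRUE
```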

76

Random Variables

A random variable is a numerical quantity whose value depends on outcomes which are random (not known with certainty); its distribution specifies the probability of each outcome

77

Example of Random Variables

An inflation rate in 5 years’ time, the monthly sales of a particular cellphone in 2 years’ time

78

Discrete Variables

A discrete variable is a numerical variable that takes only specific, countable values

79

Discrete Random Variable

A discrete random variable is a variable that counts outcomes of a random process and can take only specific, separate values

80

Continuous Variable

A continuous variable is a numerical variable that can take infinitely many possible values within a given interval

81

Normal Distribution

A normal distribution is a bell-shaped, symmetric distribution where most values cluster around the average, and fewer values occur as you move away from the center

82

Normal Distributions follow the

Normal distributions follow the 68-95-99.7 Rule

83

68-95-99.7 Rule

68% of the values fall within 1 SD of the mean

95% of the values fall within 2 SDs of the mean

99.7% of the values fall within 3 SDs of the mean
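
The three percentages can be checked with `pnorm()` on the standard normal distribution (mean 0, SD 1, which are the function's defaults):

```r
# Probability of falling within k SDs of the mean = area between -k and +k
within_1 <- pnorm(1) - pnorm(-1)  # about 0.683
within_2 <- pnorm(2) - pnorm(-2)  # about 0.954
within_3 <- pnorm(3) - pnorm(-3)  # about 0.997

print(round(c(within_1, within_2, within_3), 3))
```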

84

Normal Probabilities in R

pnorm(value, mean, SD) gives the lower (left-tail) probability, P(X ≤ value), by default

85

How do we get the upper probability of a Normal Distribution Model

To get the upper probability, we can either subtract the answer from 1, or include the option lower.tail = FALSE
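
A sketch of the lower and upper probabilities, assuming a hypothetical normal model with mean 100 and SD 15 and a value of 130:

```r
# Lower probability: P(X <= 130)
lower <- pnorm(130, mean = 100, sd = 15)  # about 0.977

# Upper probability, two equivalent ways: P(X > 130)
upper_by_subtraction <- 1 - lower
upper_by_option      <- pnorm(130, mean = 100, sd = 15, lower.tail = FALSE)

print(c(lower, upper_by_subtraction, upper_by_option))
```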

86

Normal Distribution Cutoff Values

A cutoff value is the value of a variable corresponding to a specified percentile or probability in a normal distribution

87

Example of Normal Distribution Cutoff Values

Finding top 10% or bottom 5% of performers

88

Cutoff Values in R

qnorm(left tail probability, mean, SD)
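
A sketch of two cutoffs, again assuming a hypothetical normal model with mean 100 and SD 15. Because `qnorm()` takes the left-tail probability, the top 10% cutoff uses 0.90:

```r
# Top 10% of performers: 90% of values lie below this cutoff
top_10_cutoff <- qnorm(0.90, mean = 100, sd = 15)   # about 119.2

# Bottom 5% of performers: 5% of values lie below this cutoff
bottom_5_cutoff <- qnorm(0.05, mean = 100, sd = 15) # about 75.3

print(round(c(top_10_cutoff, bottom_5_cutoff), 1))
```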

89

Expected Value

The expected value, E(X), of a random variable X is the mean or average value of X over all possible outcomes, with each outcome weighted by its probability

90

The Standard Deviation of a Random Variable

The standard deviation of a random variable is the square root of its variance; it measures the typical long-run deviation from the mean, where each deviation is weighted by its probability of occurring

91

Variance of a Random Variable

The variance of a random variable is the average squared deviation from the mean, with each deviation weighted by its probability of occurring
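
Expected value, variance, and standard deviation can be computed by hand in R. This sketch assumes a hypothetical discrete random variable X (say, the number of sales in an hour) with made-up probabilities:

```r
# Hypothetical outcomes and their probabilities (must sum to 1)
x <- c(0, 1, 2)
p <- c(0.5, 0.3, 0.2)

# Expected value: each outcome weighted by its probability
ev <- sum(x * p)                 # 0.7

# Variance: squared deviations from the mean, weighted by probability
variance <- sum((x - ev)^2 * p)  # 0.61

# Standard deviation: square root of the variance
sdev <- sqrt(variance)           # about 0.781

print(c(ev, variance, sdev))
```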

92

Law of Large Numbers

The law of large numbers states that as the sample size increases, the sample mean will converge to the mean of the population. Thus, larger sample sizes tend to produce results that are close to the population, while smaller sample sizes might have a mean that is considerably different
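
A small simulation sketch of this idea, assuming fair die rolls as the population (mean 3.5). The seed and sample sizes are arbitrary choices for illustration:

```r
# Fix the random seed so the simulation is repeatable
set.seed(1)

# Simulate 10,000 rolls of a fair six-sided die (population mean is 3.5)
rolls <- sample(1:6, size = 10000, replace = TRUE)

# Sample mean from only the first 10 rolls versus all 10,000 rolls
small_sample_mean <- mean(rolls[1:10])
large_sample_mean <- mean(rolls)

print(c(small_sample_mean, large_sample_mean))
```

The large-sample mean should land close to 3.5, while the 10-roll mean can be noticeably off.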

93

Empirical Distribution

An empirical distribution is the distribution of a dataset based on observed values and their frequencies or proportions

94

Probability Distribution Basis

Theoretical Model

95

Probability Distribution Information

Provides complete probability information regarding all outcomes

96

Probability Distribution Constant?

Yes: based on theoretical assumptions

97

Empirical Distribution Basis

Observed data from past observations

98

Empirical Distribution Information

Based on previously observed data. Future outcomes could differ from the past

99

Empirical Distribution Constant?

No: will change when collecting different data sets

100

Covariance

Measures the degree to which two random variables move in the same or opposite directions
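
A short sketch of the sign of covariance using R's `cov()`, which computes the sample covariance. The three vectors are made up for illustration:

```r
# Hypothetical paired data
x      <- c(1, 2, 3, 4, 5)
y_up   <- c(2, 4, 6, 8, 10)  # rises when x rises
y_down <- c(10, 8, 6, 4, 2)  # falls when x rises

cov_same     <- cov(x, y_up)    # 5: the variables move in the same direction
cov_opposite <- cov(x, y_down)  # -5: the variables move in opposite directions

print(c(cov_same, cov_opposite))
```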
