CH 1, CH 2, CH 3

71 Terms

New cards

Statistics

statistics is the science that deals with the collection, preparation, analysis, presentation and interpretation of data

New cards

What are the 3 steps to good statistics?

Find the right data and prepare for analysis
choose the appropriate techniques for analyzing data
interpreting data into verbal and written form

New cards

data analysis

data analysis allows companies to effectively target and understand their customers

New cards

What two terms fall under the umbrella of data analysis?

data analysis
- data privacy
- data ethics

New cards

data privacy

data privacy is a branch of data security related to the proper collection, usage, and transmission of data, focusing on…

how data is legally collected and stored
if and how data is being shared with third parties
how data usage meets regulatory obligations

New cards

What are the key principles of data privacy?

confidentiality- customer’s data and identity remain private
transparency- data collection and its risks must be transparent to the customer
accountability- data collection must have reasonable use and protection of the customer

New cards

What are the key principles of data ethics?

Human first- human interest should always come before commercial gain
no biases

New cards

What are the two types of statistics? What makes them different?

two types:

descriptive
inferential

What makes them different?

Descriptive statistics summarize and describe data, while inferential statistics analyze data to make predictions or draw conclusions about a larger population.

New cards

descriptive statistics

Descriptive statistics refers to the summary of important aspects of a data set

New cards

inferential statistics

Inferential statistics refers to drawing conclusions about a larger set of data (population) based on a smaller set of data (sample)

New cards

A population consists of all items/members of ____.

A population consists of all items/members of interest

New cards

sample

sample is a subset of a population

New cards

What are the types of data collection? What makes them different?

types:

cross-sectional data
time series data

What makes them different?

Cross-sectional data captures information at a single point in time across multiple subjects (e.g., income levels of different households in 2025)
time series data tracks information about one subject over a period of time (e.g., monthly sales of a store from 2020 to 2025).

New cards

What are the types of data format? What makes them different?

types:

structured data
unstructured data

What makes them different?

Structured data resides in pre-defined tables and lists, while unstructured data does not conform to a pre-defined table format (e.g., free text or social media content).

New cards

variables

a characteristic of interest that differs in kind or degree among various observations

New cards

What are the two types of variables? What makes them different?

types:

categorical data (qualitative)
numeric data (quantitative)

what makes them different?

Categorical data represents labels or groups (e.g., colors, types), while numeric data represents measurable quantities or numbers (e.g., height, temperature).

New cards

What are the 4 types of major scales for variables?

Nominal- simplest type of scale, used to label or categorize things without order or value
Ordinal- a way to categorize and rank things in a specific order, although the differences between ranks are not meaningful.
- Ex. Very happy, happy, neutral, unhappy, very unhappy
Interval- can tell how much larger or smaller one value is than another. The scale does not have a true zero point.
- Ex. temperature scale
Ratio- the most “powerful” type of scale. Numbers show how much or how many of something, and there is a true zero point, so ratios and other calculations are meaningful.
- Ex. length, width, age

New cards

What types of scales are categorical?

nominal and ordinal

New cards

what types of scales are numerical?

interval and ratio

New cards

What are the two strategies to deal with missing values in a data set?

omission strategy- missing values are excluded from the analysis of data
imputation strategy- missing values are replaced with a reasonable value
- numeric variables: replace with average
- categorical variables: replace with predominant category
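The two strategies above can be sketched in Python. This is only an illustration: the lists `ages` and `colors` are made-up data, and `None` is assumed to mark a missing value.

```python
from statistics import mean, mode

# Hypothetical data: None marks a missing value.
ages = [25, None, 31, 40, None]
colors = ["red", "blue", None, "red"]

# Omission strategy: drop the missing values before analysis.
ages_omitted = [a for a in ages if a is not None]

# Imputation strategy (numeric): replace missing values with the average
# of the values that are present.
age_avg = mean(ages_omitted)
ages_imputed = [a if a is not None else age_avg for a in ages]

# Imputation strategy (categorical): replace missing values with the
# predominant (most frequent) category.
top_color = mode([c for c in colors if c is not None])
colors_imputed = [c if c is not None else top_color for c in colors]

print(ages_omitted)
print(ages_imputed)
print(colors_imputed)
```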

New cards

subsetting

subsetting is the process of extracting a portion of the data set, for example to compare two subsets of data

New cards

relative frequency

relative frequency is the fraction or percentage of items in each group

New cards

function: COUNTA

COUNTA- counts all cells that are not empty in a range

New cards

function: COUNTIF

COUNTIF- counts the cells that meet a specific condition you set
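For readers who think in code, here is a rough Python analogue of what COUNTA and COUNTIF do. The list `cells` is hypothetical, empty cells are assumed to be represented by `""` or `None`, and the condition "greater than 10" is just an example; this mimics the idea, not Excel's exact behavior.

```python
# Hypothetical range of cells; "" and None stand in for empty cells.
cells = [12, "", 7, None, 30]

# COUNTA analogue: count the cells that are not empty.
counta = sum(1 for c in cells if c not in ("", None))

# COUNTIF analogue: count the cells that meet a condition (here, value > 10).
countif_gt_10 = sum(
    1 for c in cells if isinstance(c, (int, float)) and c > 10
)

print(counta, countif_gt_10)
```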

New cards

method to visualize a categorical variable

summarize the data with frequency distribution (fancy way to say table)
1. sort the data into groups and count how many items are in each group
2. then add relative frequency to the table
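The two steps above can be sketched in Python with `collections.Counter`. The survey answers are made-up data used only for illustration.

```python
from collections import Counter

# Hypothetical categorical data.
answers = ["yes", "no", "yes", "yes", "maybe", "no", "yes", "no"]

# Step 1: sort the data into groups and count the items in each group.
freq = Counter(answers)

# Step 2: relative frequency = group count / total count.
total = sum(freq.values())
rel_freq = {group: count / total for group, count in freq.items()}

for group in freq:
    print(group, freq[group], rel_freq[group])
```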

New cards

method to visualize numerical variables

use a frequency distribution to summarize a numerical variable. Instead of categories, we group the data into intervals

New cards

What are the decisions to make with intervals?

the total number of intervals
- try to use as few intervals as practical

New cards

approximation formula

approximate interval width = (max - min) / number of intervals wanted
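A quick worked example of the formula, using made-up values. Rounding the result up to a convenient whole number is an assumption added here, not something the card states.

```python
import math

# Hypothetical numerical data and a chosen number of intervals.
values = [12, 45, 7, 33, 28, 50, 19]
num_intervals = 5

# (max - min) / number of intervals wanted
width = (max(values) - min(values)) / num_intervals

# Assumption: round up so the intervals cover the whole range.
rounded_width = math.ceil(width)

print(width, rounded_width)
```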

New cards

cumulative frequency, cumulative relative frequency, and cumulative percent frequency

cumulative frequency: a running total of the number of observations up to and including each group
- For example, if 3 people scored 10, 5 people scored 20, and 7 people scored 30, the cumulative frequencies are:
  - Up to 10: 3
  - Up to 20: 3 + 5 = 8
  - Up to 30: 3 + 5 + 7 = 15
cumulative relative frequency: adds up the proportion of observations for each group based on the total
- Example: If the total is 20 observations:
  - Up to 10: 3/20 =0.15
  - Up to 20: 8/20 =0.40
  - Up to 30:15/20 =0.75
cumulative percent frequency: adds up the percentage of observations for each group
- Example (continuing from above):
  - Up to 10: 0.15 ×100 =15%
  - Up to 20: 0.4 ×100=40%
  - Up to 30: 0.75 ×100=75%
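The same numbers can be reproduced with a short Python sketch. The counts (3, 5, 7) and the total of 20 observations come from the example above; the variable names are illustrative.

```python
from itertools import accumulate

scores = [10, 20, 30]
counts = [3, 5, 7]   # people at each score, from the example above
total = 20           # total observations, as assumed in the example

# Cumulative frequency: running total of the counts.
cum_freq = list(accumulate(counts))

# Cumulative relative frequency: running total divided by the overall total.
cum_rel = [c / total for c in cum_freq]

# Cumulative percent frequency: the same proportion expressed as a percent.
cum_pct = [100 * c / total for c in cum_freq]

for s, f, r, p in zip(scores, cum_freq, cum_rel, cum_pct):
    print(f"Up to {s}: {f}, {r}, {p}%")
```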

New cards

charts used to visualize a categorical variable

bar charts and pie charts

New cards

charts used to visualize a numerical variable

histogram and stem-and-leaf diagram

New cards

explain how to calculate a relative frequency for a frequency distribution?

Count the total number of entries, then divide the count in each group by the total number of entries.

New cards

In general, the shape of most distributions can be categorized as…

In general, the shape of most distributions can be categorized as symmetric or skewed

New cards

A line chart with three lines requires how many variables?

4
You need one variable for each line's y-value and one more variable for the common x-axis value.

New cards

Heat maps are especially useful to identify combinations of the categorical variables that have economic significance.

New cards

central location

central location is how numerical data tends to cluster around some middle or central value

New cards

arithmetic mean (mean) and how do you calculate?

arithmetic mean is the primary measure of central location
calculate by adding all the observations and dividing by the total number of observations

New cards

what are the types of measures of central location?

types of measures for central location:

mean
median
mode

New cards

population mean symbol

μ

New cards

sample mean symbol

X̄

New cards

median

median is the middle value of the data once it is sorted. The median is preferred when there are outliers in the data set because outliers distort the mean.
- odd number of values: the number in the middle is the median
- even number of values: add the two middle values and divide by 2 to get the median

New cards

mode

mode is the value that appears most often in data. There can be one or more modes, or even no mode. Mode is the measure of central location for categorical values.

one mode: unimodal
two modes: bimodal
two or more: multimodal
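The three measures of central location can be checked with Python's standard `statistics` module. The data are made up; the second list adds one outlier to show why the median is preferred when outliers are present.

```python
from statistics import mean, median, multimode

# Hypothetical data (even number of values).
data = [2, 3, 3, 5, 7, 10]

data_mean = mean(data)        # sum of observations / number of observations
data_median = median(data)    # average of the two middle values: (3 + 5) / 2
data_modes = multimode(data)  # list of the most frequent value(s)

# Add one outlier: the mean shifts a lot, the median barely moves.
with_outlier = data + [100]
mean_out = mean(with_outlier)
median_out = median(with_outlier)

print(data_mean, data_median, data_modes)
print(mean_out, median_out)
```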

New cards

weighted mean

weighted mean is a mean in which some observations contribute more than others. It is used to calculate the mean for a frequency distribution
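A minimal sketch of a weighted mean for a frequency distribution, assuming each value is weighted by its frequency. The values and frequencies are hypothetical.

```python
# Hypothetical frequency distribution: each value and how often it occurs.
values = [1, 2, 3]
frequencies = [2, 3, 5]

# Weighted mean = sum(value * weight) / sum(weights)
weighted_sum = sum(v * f for v, f in zip(values, frequencies))
weighted_mean = weighted_sum / sum(frequencies)

print(weighted_mean)
```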

New cards

histogram: symmetric & skewed

symmetric: if one side of the histogram is a mirror image of the other
positively skewed: mean is greater than the median (mean > median)
negatively skewed: the mean is less than the median (mean < median)

New cards

percentiles

percentiles are a way to show how a number compares to the rest of the data. A percentile is a measure of location. It’s common to divide the data into four quarters using the quartiles (the 25th, 50th, and 75th percentiles)

Ex. if you are in the 90th percentile for height then you are taller than 90% of people

New cards

boxplots

boxplots are a visual representation of particular percentiles. They are a way to graphically display the five-number summary. They can also be used to informally gauge the shape of the distribution.

symmetric: median is in the center of the box, whiskers are about equal in length
positively skewed: median is left of center, right whisker is longer
negatively skewed: median is right of center, left whisker is longer

New cards

measures of dispersion

measures of dispersion tell how much data varies from the average

0= all observations are identical
increase: the observations are more diverse

New cards

range and formula

range is the simplest form of measure and is the difference between largest and smallest number. However, it’s not considered a good measure of dispersion because it focuses solely on the extreme values

range = max - min

New cards

interquartile range (IQR) and formula

interquartile range (IQR) is the difference between the third quartile (75th percentile) and the first quartile (25th percentile). IQR shows how spread out the middle 50% of the values is without being affected by extremely high or low numbers.

IQR= Q₃ - Q₁

New cards

mean absolute deviation (MAD)

mean absolute deviation (MAD) is the average absolute difference of all values from the mean in a data set. We use absolute values because otherwise the negative and positive differences would cancel out while calculating the average
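A small worked example of MAD on made-up data shows why the absolute values matter: the raw differences from the mean always sum to zero.

```python
# Hypothetical data.
data = [2, 4, 6, 8, 10]
avg = sum(data) / len(data)

# Raw differences cancel out: their sum is zero.
raw_sum = sum(x - avg for x in data)

# MAD: average of the absolute differences from the mean.
mad = sum(abs(x - avg) for x in data) / len(data)

print(raw_sum, mad)
```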

New cards

what are the two most widely used ways to measure dispersion?

the two most widely used ways to measure dispersion:

variance and standard deviation

New cards

how to calculate variance and standard deviation

find the difference between each value and the mean
square each difference (this emphasizes larger differences)
calculate the average of the squared differences to find the variance (for a sample, divide by n - 1 instead of n)

to return to the original units, we take the positive square root of the variance, which gives us the standard deviation
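The steps above can be followed by hand in Python and cross-checked against the standard `statistics` module. The data are hypothetical; note that the sample versions divide by n - 1, while the population versions divide by n.

```python
import math
from statistics import pvariance, variance, stdev

# Hypothetical data.
data = [2, 4, 6, 8, 10]
n = len(data)
avg = sum(data) / n

# Steps 1 and 2: difference from the mean, then square each difference.
squared_diffs = [(x - avg) ** 2 for x in data]

# Step 3: average the squared differences.
population_var = sum(squared_diffs) / n        # divide by n
sample_var = sum(squared_diffs) / (n - 1)      # divide by n - 1

# Positive square root returns to the original units.
sample_sd = math.sqrt(sample_var)

# Cross-check with the standard library.
print(population_var, pvariance(data))
print(sample_var, variance(data))
print(sample_sd, stdev(data))
```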

New cards

Excel commands for measures of dispersion

range: MAX - MIN
MAD: AVEDEV
variance and standard deviation: VAR.S and STDEV.S

New cards

coefficient of variation (CV)

coefficient of variation (CV) is a way to measure and compare how spread out data is, even if the data sets have different means and units. It is a relative measure of dispersion.

sample CV: s / X̄
population CV: σ / μ
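A short sketch of the sample CV, comparing two made-up data sets measured in different units. Both have the same standard deviation, but the CV shows which one is more spread out relative to its mean.

```python
from statistics import mean, stdev

# Hypothetical samples in different units.
heights_cm = [160, 170, 180]
weights_kg = [50, 60, 70]

# Sample CV = sample standard deviation / sample mean
cv_heights = stdev(heights_cm) / mean(heights_cm)
cv_weights = stdev(weights_kg) / mean(weights_kg)

print(cv_heights, cv_weights)
```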

New cards

sample size symbol

n

New cards

population size symbol

N

New cards

population variance symbol

σ²

New cards

population standard deviation

σ

New cards

in a distribution the mean, the median, and the mode are equal when…

in a distribution the mean, the median, and the mode are equal when it is symmetric and unimodal

New cards

The pth percentile divides a variable into two parts. What percentage of the observations is greater than the pth percentile?

(100 - p)%

New cards

five-number summary

five-number summary is a way to describe a dataset by focusing on five key values. These five numbers give you a quick snapshot of the spread and center of the data.

Minimum: The smallest number in the set.
Q1 (First Quartile): The middle of the lower half of the data.
Median (Q2): The middle number of the entire dataset.
Q3 (Third Quartile): The middle of the upper half of the data.
Maximum: The largest number in the set.
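The five values can be computed with a small helper that follows the "middle of the lower half / upper half" description above. The function name `five_number_summary` and the data are illustrative, and it assumes at least two values; other quartile conventions (including the ones in spreadsheet software) can give slightly different Q1 and Q3 values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # With an odd count, the middle value is excluded from both halves.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return min(data), median(lower), median(data), median(upper), max(data)

# Hypothetical data.
summary = five_number_summary([7, 1, 13, 5, 9, 3, 11])
print(summary)
```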

New cards

In a boxplot, when is a data point considered an outlier

In a boxplot, a data point is considered an outlier when it is more than 1.5 × IQR below Q₁ or above Q₃
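A minimal sketch of the 1.5 × IQR rule on made-up data. The quartiles here are computed as the medians of the lower and upper halves, which is one common convention and an assumption of this example.

```python
from statistics import median

# Hypothetical data with one unusually large value.
data = sorted([1, 3, 5, 7, 9, 11, 40])

# Quartiles as medians of the lower and upper halves (one convention).
half = len(data) // 2
q1 = median(data[:half])
q3 = median(data[half + 1:] if len(data) % 2 else data[half:])
iqr = q3 - q1

# Points beyond these fences are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```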

New cards

What is the relationship between the variance and the standard deviation?

The standard deviation is the positive square root of the variance.

New cards

total sum symbol

∑

New cards

standard deviation symbol

New cards

A summary measure that is computed to describe a characteristic of a sample taken from a population is called

sample- statistic

population- parameter

New cards

When investigating one categorical variable and one numeric variable, what type of graph should you use?

Create a histogram for the numeric variable for each level of the categorical variable.

New cards

When investigating two categorical variables, what type of graphs should you use?

Create either two pie charts or two bar graphs to compare the categories.

New cards

When investigating two numeric variables, you should create a …..

Create a scatterplot to visualize the relationship between the two numeric variables.

New cards

How to get Descriptive Statistics for numeric data in excel

Click on the Data tab and choose Data Analysis,
select Descriptive Statistics,
highlight your data to define the input range,
check Labels in First Row,
check Summary statistics, then click OK

New cards

width

width is the range of values in a class