CH 1, CH 2, CH 3

71 Terms

1
New cards

Statistics

statistics is the science that deals with the collection, preparation, analysis, presentation and interpretation of data

2
New cards

What are the 3 steps to good statistics?

  1. find the right data and prepare it for analysis

  2. choose the appropriate techniques for analyzing the data

  3. interpret the results and communicate them in verbal and written form

3
New cards

data analysis

data analysis allows companies to effectively target and understand their customers

4
New cards

What two terms fall under the umbrella of data analysis?

  • data analysis

    • data privacy

    • data ethics

5
New cards

data privacy

data privacy is a branch of data security related to the proper collection, usage, and transmission of data, focusing on…

  • how data is legally collected and stored

  • if and how data is being shared with third parties

  • how data usage meets regulatory obligations

6
New cards

What are the key principles of data privacy?

  • confidentiality- the customer’s data and identity remain private

  • transparency- data collection and its risks must be transparent to the customer

  • accountability- data collection must involve reasonable use of the data and protection of the customer

7
New cards

What are the key principles of data ethics?

  • Human first- human interest should always come before commercial gain

  • no biases

8
New cards

What are the two types of statistics? What makes them different?

two types:

  • descriptive

  • inferential

What makes them different?

Descriptive statistics summarize and describe data, while inferential statistics analyze data to make predictions or draw conclusions about a larger population.

9
New cards

descriptive statistics

Descriptive statistics refers to the summary of important aspects of a data set

10
New cards

inferential statistics

Inferential statistics refers to drawing conclusions about a larger set of data (population) based on a smaller set of data (sample)

11
New cards

A population consists of all items/members of ____.

A population consists of all items/members of interest

12
New cards

sample

sample is a subset of a population

13
New cards

What are the types of data collection? What makes them different?

types:

  • cross-sectional data

  • time series data

What makes them different?

  • Cross-sectional data captures information at a single point in time across multiple subjects (e.g., income levels of different households in 2025)

  • time series data tracks information about one subject over a period of time (e.g., monthly sales of a store from 2020 to 2025).

14
New cards

What are the types of data format? What makes them different?

types:

  • structured data

  • unstructured data

What makes them different?

Structured data resides in pre-defined tables and lists, while unstructured data does not conform to a pre-defined row-and-column format (e.g., free text or social media posts).

15
New cards

variables

a variable is a characteristic of interest that differs in kind or degree among various observations

16
New cards

What are the two types of variables? What makes them different?

types:

  • categorical data (qualitative)

  • numeric data (quantitative)

what makes them different?

Categorical data represents labels or groups (e.g., colors, types), while numeric data represents measurable quantities or numbers (e.g., height, temperature).

17
New cards

What are the 4 types of major scales for variables?

  • Nominal- the simplest type of scale; used to label or categorize things without order or value

  • Ordinal- a way to rank things in a specific order, though the gaps between ranks are not measurable

    • Ex. Very happy, happy, neutral, unhappy, very unhappy

  • Interval- can tell how much larger or smaller one value is than another. The scale does not have a true zero point.

    • Ex. temperature scale

  • Ratio- the most “powerful” type of scale. Numbers show how much or how many of something, and there is a true zero point, so ratios between values are meaningful

    • Ex. length, width, age

18
New cards

What types of scales are categorical?

nominal and ordinal

19
New cards

what types of scales are numerical?

interval and ratio

20
New cards

What are the two strategies for dealing with missing values in a data set?

  • omission strategy- missing values are excluded from the analysis of data

  • imputation strategy- missing values are replaced with a reasonable imputed value

    • numeric variables: replace with average

    • categorical variables: replace with predominant category
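
A minimal Python sketch of both strategies. The data values and variable names are made up for illustration, and `None` marks a missing value.

```python
from statistics import mean, mode

# Hypothetical survey columns; None marks a missing value.
ages = [25, 31, None, 40, 28]
colors = ["red", "blue", "red", None, "red"]

# Omission strategy: exclude the missing values from the analysis.
ages_omitted = [a for a in ages if a is not None]

# Imputation strategy (numeric variable): replace with the average.
age_avg = mean(ages_omitted)
ages_imputed = [a if a is not None else age_avg for a in ages]

# Imputation strategy (categorical variable): replace with the predominant category.
top_color = mode([c for c in colors if c is not None])
colors_imputed = [c if c is not None else top_color for c in colors]

print(ages_omitted)
print(ages_imputed)
print(colors_imputed)
```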

21
New cards

subsetting

subsetting is the process of extracting a portion of the data set that is relevant for analysis, for example to compare two subsets of data

22
New cards

relative frequency

relative frequency is the fraction or percentage of items in each group

23
New cards

function: COUNTA

COUNTA- counts all cells that are not empty in a range

24
New cards

function: COUNTIF

COUNTIF- counts the cells that meet a specific condition you set
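
For comparison, a rough Python analogue of what COUNTA and COUNTIF do. This is only a sketch: the list of cells is hypothetical, and `None` stands in for an empty cell.

```python
# Hypothetical column of cells; None stands in for an empty cell.
cells = ["yes", "no", None, "yes", None, "maybe"]

# Like COUNTA: count the cells that are not empty.
counta = sum(1 for c in cells if c is not None)

# Like COUNTIF(range, "yes"): count the cells that meet a condition.
countif_yes = sum(1 for c in cells if c == "yes")

print(counta, countif_yes)
```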

25
New cards

method to visualize a categorical variable

  • summarize the data with a frequency distribution (fancy way to say table)

    1. sort the data into groups and count how many items are in each group

    2. then add relative frequency to the table
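
The two steps above can be sketched in Python. The responses are invented sample data, not from the course material.

```python
from collections import Counter

# Hypothetical categorical variable: how ten people commute.
responses = ["bus", "car", "car", "bike", "car",
             "bus", "car", "bike", "car", "bus"]

# Step 1: sort the data into groups and count the items in each group.
freq = Counter(responses)

# Step 2: add relative frequency (each count divided by the total).
total = sum(freq.values())
rel_freq = {group: count / total for group, count in freq.items()}

for group, count in freq.most_common():
    print(f"{group:<6}{count:>4}{rel_freq[group]:>8.2f}")
```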

26
New cards

method to visualize numerical variables

  • use a frequency distribution to summarize a numerical variable. Instead of categories, we group the data into intervals

27
New cards

What are the decisions to make with intervals?

  • the total number of intervals

    • try to use as few intervals as practical

28
New cards

approximation formula

approximate interval width = (max - min) / number of intervals wanted
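
A short Python sketch of the formula. The values are hypothetical, and rounding the width up to a whole number is an assumption made here for convenience.

```python
import math

# Hypothetical numerical variable.
values = [12, 47, 23, 35, 8, 41, 29, 16]
k = 4  # number of intervals wanted

# Approximation formula: (max - min) / number of intervals wanted.
approx_width = (max(values) - min(values)) / k

# Assumption: round up so the k intervals cover every value.
width = math.ceil(approx_width)

# Count how many values fall into each interval, starting at the minimum.
lower = min(values)
counts = [0] * k
for v in values:
    index = min((v - lower) // width, k - 1)
    counts[index] += 1

print(approx_width, width, counts)
```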

29
New cards

cumulative frequency, cumulative relative frequency, and cumulative percent frequency

  • cumulative frequency: a running total of the number of observations up to and including each group

    • For example, if 3 people scored 10, 5 people scored 20, and 7 people scored 30, the cumulative frequencies are:

      • Up to 10: 3

      • Up to 20: 3 + 5 = 8

      • Up to 30: 3 + 5 + 7 = 15

  • cumulative relative frequency: adds up the proportion of observations for each group based on the total

    • Example: If the total is 20 observations:

      • Up to 10: 3/20 = 0.15

      • Up to 20: 8/20 = 0.40

      • Up to 30: 15/20 = 0.75

  • cumulative percent frequency: adds up the percentage of observations for each group

    • Example (continuing from above):

      • Up to 10: 0.15 ×100 =15%

      • Up to 20: 0.4 ×100=40%

      • Up to 30: 0.75 ×100=75%
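
The same numbers can be reproduced with a short Python sketch. The variable names are illustrative; the total of 20 observations is taken from the example above.

```python
from itertools import accumulate

scores = [10, 20, 30]
freqs = [3, 5, 7]   # people at each score
total = 20          # total observations used in the example

# Cumulative frequency: running total of the counts.
cum_freq = list(accumulate(freqs))

# Cumulative relative frequency: running total divided by the overall total.
cum_rel = [c / total for c in cum_freq]

# Cumulative percent frequency: the same proportion expressed as a percentage.
cum_pct = [100 * c / total for c in cum_freq]

for score, cf, cr, cp in zip(scores, cum_freq, cum_rel, cum_pct):
    print(f"Up to {score}: {cf}  {cr:.2f}  {cp:.0f}%")
```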

30
New cards

charts used to visualize a categorical variable

bar charts and pie charts

31
New cards

charts used to visualize a numerical variable

histogram and stem-and-leaf diagram

32
New cards

Explain how to calculate a relative frequency for a frequency distribution.

Count the total number of observations, then divide the frequency of each category by that total.

33
New cards

In general, the shape of most distributions can be categorized as…

In general, the shape of most distributions can be categorized as symmetric or skewed

34
New cards

A line chart with three lines requires how many variables?

  • 4

  • You need one variable for each line's y-value and one more variable for the common x-axis value.

35
New cards

Heat maps are especially useful to identify combinations of the categorical variables that have economic significance.

36
New cards

central location

central location is how numerical data tends to cluster around some middle or central value

37
New cards

arithmetic mean (mean) and how do you calculate?

  • arithmetic mean is the primary measure of central location

  • calculate by adding all the observations and dividing by the total number of observations

38
New cards

what are the types of measure of central location

types of measures for central location:

  • mean

  • median

  • mode

39
New cards

population mean symbol

μ

40
New cards

sample mean symbol

x̄

41
New cards

median

  • median is the middle value of the sorted data. The median is preferred when there are outliers in the data set because outliers distort the mean.

    • odd number of values: the number in the middle is the median

    • even number of values: add the two middle values and divide by 2 to get the median

42
New cards

mode

mode is the value that appears most often in the data. There can be one mode, more than one mode, or even no mode. The mode is the only meaningful measure of central location for categorical variables.

  • one mode: unimodal

  • two modes: bimodal

  • two or more: multimodal
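
A small Python sketch of the three measures of central location, using Python's built-in `statistics` module. The data set is invented and includes one outlier (100) to show how it pulls the mean but not the median.

```python
from statistics import mean, median, multimode

# Hypothetical data with one outlier.
data = [2, 3, 3, 5, 7, 100]

data_mean = mean(data)        # sum of observations / number of observations
data_median = median(data)    # even count: average of the two middle values
data_modes = multimode(data)  # list of the most frequent value(s)

print(data_mean, data_median, data_modes)
```

Here the mean is 20 while the median is 4, which is why the median is the better summary when outliers are present.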

43
New cards

weighted mean

weighted mean is a mean in which some observations contribute more than others. It is used to calculate the mean for a frequency distribution
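
As a sketch, the weighted mean of a frequency distribution uses each frequency as the weight. The values and frequencies below are hypothetical.

```python
# Hypothetical frequency distribution: each value and how often it occurs.
values = [1, 2, 3]
freqs = [2, 5, 3]

# Weighted mean: sum of (value * weight) divided by the sum of the weights.
weighted_mean = sum(v * f for v, f in zip(values, freqs)) / sum(freqs)

print(weighted_mean)
```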

44
New cards

histogram: symmetric & skewed

  • symmetric: if one side of the histogram is a mirror image of the other

  • positively skewed: mean is greater than the median (mean > median)

  • negatively skewed: the mean is less than the median (mean < median)

45
New cards

percentiles

percentiles are a way to show how a number compares to the rest of the data. They are a measure of location. It’s common to divide the data into 4 quarters using the 25th, 50th, and 75th percentiles (the quartiles)

  • Ex. if you are in the 90th percentile for height then you are taller than 90% of people

46
New cards

boxplots

boxplots are a visual representation of particular percentiles. They are a way to graphically display the five-number summary. They can also be used to informally gauge the shape of the distribution.

  • symmetric: median is in the center of the box, whiskers are equal in length

  • positively skewed: median is left of center, right whisker is longer

  • negatively skewed: median is right of center, left whisker is longer

47
New cards

measures of dispersion

measures of dispersion tell how much data varies from the average

  • 0= all observations are identical

  • increase: the observations are more diverse

48
New cards

range and formula

range is the simplest measure of dispersion and is the difference between the largest and smallest values. However, it’s not considered a good measure of dispersion because it focuses solely on the extreme values

  • range = max - min

49
New cards

interquartile range (IQR) and formula

interquartile range (IQR) is the difference between the third quartile (75th percentile) and the first quartile (25th percentile). IQR helps show how spread out the central values are without being affected by extremely high or low values.

  • IQR= Q3 - Q1
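
A minimal Python sketch of the range and IQR formulas. The data are made up, and the quartiles come from `statistics.quantiles` with its default method; Excel and other tools may use a slightly different quartile convention, so results can differ a little.

```python
from statistics import quantiles

# Hypothetical data set.
returns = [3, 7, 8, 5, 12, 14, 21]

# range = max - min
data_range = max(returns) - min(returns)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(returns, n=4)

# IQR = Q3 - Q1
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```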

50
New cards

mean absolute deviation (MAD)

mean absolute deviation (MAD) is the average absolute difference of all values from the mean in a data set. We use absolute values because otherwise the negative and positive differences would cancel out when calculating the average
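
A short Python sketch of the MAD calculation on invented data. Without the absolute value, the differences from the mean would sum to zero.

```python
# Hypothetical data set.
data = [2, 4, 6, 8]
n = len(data)
mean = sum(data) / n

# The raw differences from the mean cancel out.
raw_sum = sum(x - mean for x in data)

# MAD: average of the absolute differences from the mean.
mad = sum(abs(x - mean) for x in data) / n

print(raw_sum, mad)
```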

51
New cards

what are the two most widely used ways to measure dispersion?

the two most widely used ways to measure dispersion:

  • variance and standard deviation

52
New cards

how to calculate variance and standard deviation

  1. find the difference between each value and the mean

  2. square each difference (this emphasizes larger differences)

  3. calculate the average of the squared differences to find the variance (divide by N for a population; for a sample, divide by n - 1)

  • to return to the original units, take the positive square root of the variance, which gives the standard deviation
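
The steps can be sketched in Python as follows. The data are hypothetical. The sample versions divide by n - 1, which is what Excel's VAR.S and STDEV.S do.

```python
import math

# Hypothetical data set.
data = [2, 4, 6, 8]
n = len(data)
mean = sum(data) / n

# Step 1: difference between each value and the mean.
deviations = [x - mean for x in data]

# Step 2: square each difference.
squared = [d ** 2 for d in deviations]

# Step 3: average the squared differences.
pop_variance = sum(squared) / n            # population: divide by N
sample_variance = sum(squared) / (n - 1)   # sample: divide by n - 1

# Standard deviation: positive square root of the variance.
pop_sd = math.sqrt(pop_variance)
sample_sd = math.sqrt(sample_variance)

print(pop_variance, sample_variance, pop_sd, sample_sd)
```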

53
New cards

Excel commands for growth and value funds

  • range: MAX - MIN

  • MAD: AVEDEV

  • variance and standard deviation: VAR.S and STDEV.S

54
New cards

coefficient of variation (CV)

coefficient of variation (CV) is a way to measure and compare how spread out data is, even if the data sets have different means and units. It is a relative measure of dispersion.

  • sample CV: s / X̄

  • population CV: σ / μ
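
A minimal Python sketch of the sample CV, comparing two invented data sets measured in different units. Both have the same standard deviation, but relative to their means the spread differs.

```python
from statistics import mean, stdev

# Hypothetical samples in different units.
heights_cm = [160, 170, 180]
weights_kg = [50, 60, 70]

# Sample CV: sample standard deviation divided by the sample mean.
cv_heights = stdev(heights_cm) / mean(heights_cm)
cv_weights = stdev(weights_kg) / mean(weights_kg)

print(f"{cv_heights:.3f} {cv_weights:.3f}")
```

The weights have the larger CV, so they are more spread out relative to their average even though both samples have a standard deviation of 10.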

55
New cards

sample size symbol

n

56
New cards

population size symbol

N

57
New cards

population variance symbol

σ²

58
New cards

population standard deviation

σ

59
New cards

in a distribution the mean, the median, and the mode are equal when…

in a distribution the mean, the median, and the mode are equal when it is symmetric and unimodal

60
New cards

The pth percentile divides a variable into two parts. What percentage is greater than p?

(100 - p)

61
New cards

five-number summary

five-number summary is a way to describe a dataset by focusing on five key values. These five numbers give you a quick snapshot of the spread and center of the data.

  • Minimum: The smallest number in the set.

  • Q1 (First Quartile): The middle of the lower half of the data.

  • Median (Q2): The middle number of the entire dataset.

  • Q3 (Third Quartile): The middle of the upper half of the data.

  • Maximum: The largest number in the set.
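
A short Python sketch of the five-number summary on hypothetical data. The quartiles use the default method of `statistics.quantiles`; other tools, including Excel, may compute quartiles slightly differently.

```python
from statistics import quantiles

# Hypothetical data set (already sorted for readability).
data = [4, 8, 15, 16, 23, 42, 50]

# Q1, median (Q2), and Q3 are the three quartile cut points.
q1, med, q3 = quantiles(data, n=4)

five_number = (min(data), q1, med, q3, max(data))
print(five_number)
```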

62
New cards

In a boxplot, when is a data point considered an outlier

In a boxplot, a data point is considered an outlier when it lies more than 1.5 × IQR below Q1 or above Q3
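
The rule can be sketched in Python. The data are invented, and the quartiles again come from the default method of `statistics.quantiles`, so the cutoffs may differ slightly from Excel's.

```python
from statistics import quantiles

# Hypothetical data set with one unusually large value.
data = [10, 12, 13, 14, 15, 16, 40]

q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Cutoffs: 1.5 x IQR below Q1 and 1.5 x IQR above Q3.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_limit or x > upper_limit]
print(lower_limit, upper_limit, outliers)
```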

63
New cards

What is the relationship between the variance and the standard deviation?

The standard deviation is the positive square root of the variance.

64
New cards

total sum symbol

Σ

65
New cards

standard deviation symbol

s (sample standard deviation)

66
New cards

A summary measure that is computed to describe a characteristic of a sample taken from a population is called

for a sample: statistic

for a population: parameter

67
New cards

When investigating one categorical variable and one numeric variable, what type of graph should you use?

Create a histogram for the numeric variable for each level of the categorical variable.

68
New cards

When investigating two categorical variables, what type of graphs should you use?

Create either two pie charts or two bar graphs to compare the categories.

69
New cards

When investigating two numeric variables, you should create a …..

Create a scatterplot to visualize the relationship between the two numeric variables.

70
New cards

How to get Descriptive Statistics for numeric data in excel

  1. on the Data tab, click Data Analysis

  2. select Descriptive Statistics,

  3. highlight your data to define the input range,

  4. check off Labels in First Row,

  5. check off Summary statistics, then click OK

71
New cards

width

width is the range of values in a class