CH 1, CH 2, CH 3


71 Terms

1

Statistics

statistics is the science that deals with the collection, preparation, analysis, presentation and interpretation of data

2

What are the 3 steps to good statistics?

  1. Find the right data and prepare it for analysis

  2. Choose the appropriate techniques for analyzing the data

  3. Interpret the results and communicate them in verbal and written form

3

data analysis

data analysis allows companies to effectively target and understand their customers

4

What two terms fall under the umbrella of data analysis?

  • data analysis

    • data privacy

    • data ethics

5

data privacy

data privacy is a branch of data security related to the proper collection, usage, and transmission of data, focusing on…

  • how data is legally collected and stored

  • if and how data is being shared with third parties

  • how data usage meets regulatory obligations

6

What are the key principles of data privacy?

  • confidentiality- the customer’s data and identity remain private

  • transparency- data collection and its risks must be transparent to the customer

  • accountability- data collection must involve reasonable use and protection of the customer’s data

7

What are the key principles of data ethics?

  • Human first- human interest should always come before commercial gain

  • no biases

8

What are the two types of statistics? What makes them different?

two types:

  • descriptive

  • inferential

What makes them different?

Descriptive statistics summarize and describe data, while inferential statistics analyze data to make predictions or draw conclusions about a larger population.

9

descriptive statistics

Descriptive statistics refers to the summary of important aspects of a data set

10

inferential statistics

Inferential statistics refers to drawing conclusions about a larger set of data (population) based on a smaller set of data (sample)

11

A population consists of all items/members of ____.

A population consists of all items/members of interest

12

sample

sample is a subset of a population

13

What are the types of data collection? What makes them different?

types:

  • cross-sectional data

  • time series data

What makes them different?

  • Cross-sectional data captures information at a single point in time across multiple subjects (e.g., income levels of different households in 2025)

  • time series data tracks information about one subject over a period of time (e.g., monthly sales of a store from 2020 to 2025).

14

What are the types of data format? What makes them different?

types:

  • structured data

  • unstructured data

What makes them different?

Structured data resides in pre-defined tables and lists, while unstructured data does not conform to a pre-defined table format and instead takes forms such as free text or social media content.

15

variables

a characteristic of interest that differs in kind or degree among various observations

16

What are the two types of variables? What makes them different?

types:

  • categorical data (qualitative)

  • numeric data (quantitative)

what makes them different?

Categorical data represents labels or groups (e.g., colors, types), while numeric data represents measurable quantities or numbers (e.g., height, temperature).

17

What are the 4 types of major scales for variables?

  • Nominal- the simplest type of scale, used to label or categorize things without any order or value

  • Ordinal- categories that can be ranked in a specific order, although the gaps between ranks are not measurable.

    • Ex. Very happy, happy, neutral, unhappy, very unhappy

  • Interval- differences between values are meaningful, so you can tell how much larger or smaller one value is than another. The scale does not have a true zero point, so ratios are not meaningful.

    • Ex. temperature scale (Celsius or Fahrenheit)

  • Ratio- the most “powerful” type of scale. Numbers show how much or how many of something, and there is a true zero point, so both differences and ratios are meaningful.

    • Ex. length, width, age

18

What types of scales are categorical?

nominal and ordinal

19

what types of scales are numerical?

interval and ratio

20

What are the two strategies to deal with missing values in a data set?

  • omission strategy- missing values are excluded from the analysis of data

  • imputation strategy- missing values are replaced with a reasonable imputed value

    • numeric variables: replace with average

    • categorical variables: replace with predominant category
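
The two strategies above can be sketched in a few lines of Python. This is only an illustration: the data is made up, missing values are assumed to be stored as `None`, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean

def omit_missing(values):
    """Omission strategy: drop missing entries (represented here as None)."""
    return [v for v in values if v is not None]

def impute_numeric(values):
    """Imputation for a numeric variable: replace None with the average of the observed values."""
    avg = mean(omit_missing(values))
    return [avg if v is None else v for v in values]

def impute_categorical(values):
    """Imputation for a categorical variable: replace None with the predominant category."""
    predominant = Counter(omit_missing(values)).most_common(1)[0][0]
    return [predominant if v is None else v for v in values]

ages = [20, None, 30, 40]               # made-up numeric variable
colors = ["red", "blue", None, "red"]   # made-up categorical variable

ages_omitted = omit_missing(ages)           # the missing age is dropped
ages_imputed = impute_numeric(ages)         # the missing age becomes the average, 30
colors_imputed = impute_categorical(colors) # the missing color becomes "red"
```

Omission shrinks the data set, while imputation keeps every row but fills it with an estimate, so which one is appropriate depends on how much data is missing.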

21

subsetting

subsetting is the process of extracting a portion of the data set, for example to compare two subsets of data

22

relative frequency

relative frequency is the fraction or percentage of items in each group

23

function: COUNTA

COUNTA- counts all cells that are not empty in a range

24

function: COUNTIF

COUNTIF- counts the cells that meet a specific condition you set

25

method to visualize a categorical variable

  • summarize the data with frequency distribution (fancy way to say table)

    1. sort the data into groups and count how many items are in each group

    2. then add relative frequency to the table
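
As a rough Python counterpart to the two steps above, the sketch below counts the items in each group and then adds the relative frequency. The survey answers are made up and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

def frequency_table(items):
    """Return {category: (frequency, relative frequency)} for a categorical variable."""
    counts = Counter(items)   # step 1: count how many items fall in each group
    total = len(items)
    # step 2: relative frequency = group count / total count
    return {category: (count, count / total) for category, count in counts.items()}

answers = ["yes", "no", "yes", "yes", "no", "maybe", "yes", "no", "yes", "maybe"]
table = frequency_table(answers)

for category, (count, relative) in table.items():
    print(f"{category:<6} {count:>3} {relative:.2f}")
```

The relative frequencies always add up to 1, which is a quick check that no item was missed.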

26

method to visualize numerical variables

  • use a frequency distribution to summarize a numerical variable. Instead of categories, we group the data into intervals (classes)

27

What are the decisions to make with intervals?

  • the total number of intervals

    • try to use the smallest number of intervals that still shows the shape of the data

28

approximation formula

approximate interval width = (max - min) / number of intervals wanted
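
A minimal sketch of the formula in Python, using made-up prices. Rounding the width up to a convenient whole number and letting the last interval include the maximum are assumptions of this example, not rules from the card.

```python
import math

def interval_width(data, num_intervals):
    """Approximate interval width: (max - min) / number of intervals wanted."""
    return (max(data) - min(data)) / num_intervals

def bin_counts(data, num_intervals, width):
    """Count observations per interval, starting at the minimum value.

    Each interval includes its lower limit; the last one also includes the maximum.
    """
    start = min(data)
    counts = [0] * num_intervals
    for x in data:
        index = min(int((x - start) // width), num_intervals - 1)
        counts[index] += 1
    return counts

prices = [12, 15, 21, 22, 28, 30, 34, 41, 45, 52]   # made-up data
raw_width = interval_width(prices, 4)    # (52 - 12) / 4 = 10.0
width = math.ceil(raw_width)             # round up to a convenient width
counts = bin_counts(prices, 4, width)    # intervals 12-22, 22-32, 32-42, 42-52
```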

29

cumulative frequency, cumulative relative frequency, and cumulative percent frequency

  • cumulative frequency: the running total of observations up to and including each group

    • For example, if 3 people scored 10, 5 people scored 20, and 7 people scored 30, the cumulative frequencies are:

      • Up to 10: 3

      • Up to 20: 3 + 5 = 8

      • Up to 30: 3 + 5 + 7 = 15

  • cumulative relative frequency: the running total of the proportion of observations up to and including each group, based on the overall total

    • Example: If the total is 20 observations (so 5 observations lie above 30):

      • Up to 10: 3/20 = 0.15

      • Up to 20: 8/20 = 0.40

      • Up to 30: 15/20 = 0.75

  • cumulative percent frequency: the cumulative relative frequency expressed as a percentage

    • Example (continuing from above):

      • Up to 10: 0.15 × 100 = 15%

      • Up to 20: 0.40 × 100 = 40%

      • Up to 30: 0.75 × 100 = 75%
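
The same running totals can be reproduced in Python. The sketch reuses the card's numbers; the fourth group (5 people scoring 40) is an assumption added here only so that the total comes to the 20 observations used in the example.

```python
from itertools import accumulate

scores = [10, 20, 30, 40]
frequencies = [3, 5, 7, 5]   # last group is assumed, so that the total is 20

total = sum(frequencies)

# Running total of counts
cumulative = list(accumulate(frequencies))

# Running total as a proportion of all observations
cumulative_relative = [c / total for c in cumulative]

# Running total as a percentage of all observations
cumulative_percent = [100 * c / total for c in cumulative]

for score, c, r, p in zip(scores, cumulative, cumulative_relative, cumulative_percent):
    print(f"Up to {score}: {c:>2}  {r:.2f}  {p:.0f}%")
```

The last cumulative relative frequency is always 1 (100%), because by then every observation has been counted.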

30

charts used to visualize a categorical variable

bar charts and pie charts

31

charts used to visualize a numerical variable

histogram and stem-and-leaf diagram

32

Explain how to calculate a relative frequency for a frequency distribution.

Count the total number of entries, then divide the frequency (count) of each category by that total.

33

In general, the shape of most distributions can be categorized as…

In general, the shape of most distributions can be categorized as symmetric or skewed

34

A line chart with three lines requires how many variables?

  • 4

  • You need one variable for each line's y-value and one more variable for the common x-axis value.

35

Heat maps are especially useful to identify combinations of the categorical variables that have economic significance.

36

central location

central location is how numerical data tends to cluster around some middle or central value

37

arithmetic mean (mean) and how do you calculate?

  • arithmetic mean is the primary measure of central location

  • calculate by adding all the observations and dividing the sum by the total number of observations

38

what are the types of measure of central location

types of measures for central location:

  • mean

  • median

  • mode

39

population mean symbol

μ

40

sample mean symbol

x̄

41

median

  • median is the middle value of the data once it is sorted. The median is preferred when there are outliers in the data set because outliers distort the mean.

    • odd number of values: the number in the middle is the median

    • even number of values: add the two middle values and divide by 2 to get the median
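
The odd and even cases can be written out as a short Python function. This is a sketch for illustration (Python's built-in `statistics.median` does the same job), and the salary figures are made up to show how an outlier pulls the mean but not the median.

```python
def median(values):
    """Middle value of the sorted data; the average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

odd_case = median([7, 1, 5])        # sorted: 1, 5, 7
even_case = median([7, 1, 5, 3])    # sorted: 1, 3, 5, 7

salaries = [30, 32, 35, 38, 400]    # made-up data with one outlier (400)
salary_mean = sum(salaries) / len(salaries)
salary_median = median(salaries)
```

Here the mean is 107 while the median stays at 35, which is much closer to what a typical observation looks like.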

42

mode

mode is the value that appears most often in the data. There can be one mode, several modes, or no mode at all. The mode is the measure of central location used for categorical variables.

  • one mode: unimodal

  • two modes: bimodal

  • two or more: multimodal

43

weighted mean

weighted mean is a mean in which some observations contribute more than others. It is used to calculate the mean of a frequency distribution, with each value weighted by its frequency.
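
A small Python sketch of a weighted mean for a frequency distribution. The scores and frequencies are made-up example values, and `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(values, weights):
    """Sum of value × weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

scores = [10, 20, 30]
frequencies = [3, 5, 7]   # how many observations took each score

result = weighted_mean(scores, frequencies)   # (30 + 100 + 210) / 15

# Same answer as the ordinary mean of the fully listed data
expanded = [10] * 3 + [20] * 5 + [30] * 7
plain_mean = sum(expanded) / len(expanded)
```

Using the frequencies as weights avoids having to list every observation individually.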

44

histogram: symmetric & skewed

  • symmetric: if one side of the histogram is a mirror image of the other

  • positively skewed: mean is greater than the median (mean > median)

  • negatively skewed: the mean is less than the median (mean < median)
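
The mean-versus-median comparison can be sketched in Python as below. It is a rule of thumb only: equal mean and median suggests symmetry but does not prove it. The data sets are made up and `describe_skew` is a hypothetical helper.

```python
from statistics import mean, median

def describe_skew(data):
    """Rough shape check based on comparing the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "positively skewed"   # mean pulled up by large values
    if m < med:
        return "negatively skewed"   # mean pulled down by small values
    return "symmetric"               # mean equals median (suggestive, not conclusive)

balanced = describe_skew([1, 2, 3, 4, 5])       # mean 3, median 3
right_tail = describe_skew([1, 2, 3, 4, 20])    # mean 6, median 3
left_tail = describe_skew([1, 17, 18, 19, 20])  # mean 15, median 18
```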

45

percentiles

percentiles are a way to show how a value compares to the rest of the data; they are a measure of location. It is common to divide the data into four quarters using the 25th, 50th, and 75th percentiles (the quartiles)

  • Ex. if you are in the 90th percentile for height then you are taller than 90% of people

46

boxplots

boxplots are a visual representation of particular percentiles. They are a way to graphically display the five-number summary. They can also be used to informally gauge the shape of the distribution.

  • symmetric: median in the center of the box, whiskers about equal in length

  • positively skewed: median left of center, right whisker is longer

  • negatively skewed: median right of center, left whisker is longer

47

measures of dispersion

measures of dispersion tell how much data varies from the average

  • 0= all observations are identical

  • increase: the observations are more diverse

48

range and formula

range is the simplest measure of dispersion: the difference between the largest and smallest values. However, it is not considered a good measure of dispersion because it focuses solely on the extreme values

  • range = max - min

49

interquartile range (IQR) and formula

interquartile range (IQR) is the difference between the third quartile (75th percentile) and the first quartile (25th percentile). IQR shows how spread out the middle 50% of the values are without being affected by extreme high or low values.

  • IQR= Q3 - Q1

50

mean absolute deviation (MAD)

mean absolute deviation (MAD) is the average of the absolute differences between each value and the mean of the data set. We use absolute values because otherwise the negative and positive differences would cancel out when calculating the average
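
A short Python sketch of the MAD calculation, using made-up data. It also shows why the absolute value is needed: the raw differences from the mean always sum to zero.

```python
def mean_absolute_deviation(data):
    """Average of the absolute differences between each value and the mean."""
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

data = [2, 4, 6, 8, 10]   # mean is 6

data_mean = sum(data) / len(data)
raw_sum = sum(x - data_mean for x in data)   # -4 - 2 + 0 + 2 + 4 cancels out
mad = mean_absolute_deviation(data)          # (4 + 2 + 0 + 2 + 4) / 5
```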

51

what are the two most widely used ways to measure dispersion

the two most widely used ways to measure dispersion:

  • variance and standard deviation

52

how to calculate variance and standard deviation

  1. find the difference between each value and the mean

  2. square each difference (this emphasizes larger differences)

  3. calculate the average of the squared differences to find the variance (divide by N for a population, or by n - 1 for a sample)

  • to return to the original units, we take the positive square root of the variance, which gives us the standard deviation
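
The steps above translate directly into Python. This is a sketch with made-up data; the `sample` flag is an illustrative parameter that switches between dividing by n - 1 (sample) and by N (population).

```python
import math

def variance(data, sample=True):
    """Average squared difference from the mean (n - 1 divisor for a sample, N for a population)."""
    n = len(data)
    m = sum(data) / n
    squared_diffs = [(x - m) ** 2 for x in data]   # steps 1 and 2
    divisor = n - 1 if sample else n
    return sum(squared_diffs) / divisor            # step 3

def standard_deviation(data, sample=True):
    """Positive square root of the variance, back in the original units."""
    return math.sqrt(variance(data, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5, squared differences sum to 32

population_variance = variance(data, sample=False)       # 32 / 8
population_sd = standard_deviation(data, sample=False)   # square root of 4
sample_variance = variance(data)                         # 32 / 7
sample_sd = standard_deviation(data)
```

The sample versions are slightly larger than the population versions because they divide by a smaller number.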

53

Excel functions for measures of dispersion

  • range: MAX - MIN

  • MAD: AVEDEV

  • sample variance and standard deviation: VAR.S and STDEV.S

54

coefficient of variation (CV)

coefficient of variation (CV) is a way to measure and compare how spread out data is, even if the data sets have different means and units. It is a relative measure of dispersion.

  • sample CV: s / x̄

  • population CV: σ / μ
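
As an illustration of comparing dispersion across different units, the sketch below computes the sample CV for two made-up variables. `coefficient_of_variation` is a hypothetical helper built on Python's `statistics` module.

```python
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """Sample CV: sample standard deviation divided by the sample mean."""
    return stdev(sample) / mean(sample)

heights_cm = [150, 160, 170, 180, 190]   # made-up data
weights_kg = [50, 60, 70, 80, 90]        # made-up data

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(f"height CV: {cv_height:.3f}, weight CV: {cv_weight:.3f}")
```

Both variables have the same standard deviation here, but the weights have the smaller mean, so their CV is larger: relative to their average, the weights are more spread out.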

55

sample size symbol

n

56

population size symbol

N

57

population variance symbol

σ²

58

population standard deviation symbol

σ

59

in a distribution the mean, the median, and the mode are equal when…

in a distribution the mean, the median, and the mode are equal when it is symmetric and unimodal

60

The pth percentile divides a variable into two parts. What percentage of the observations is greater than the pth percentile?

approximately (100 - p)%

61

five-number summary

five-number summary is a way to describe a dataset by focusing on five key values. These five numbers give you a quick snapshot of the spread and center of the data.

  • Minimum: The smallest number in the set.

  • Q1 (First Quartile): The middle of the lower half of the data.

  • Median (Q2): The middle number of the entire dataset.

  • Q3 (Third Quartile): The middle of the upper half of the data.

  • Maximum: The largest number in the set.
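
A sketch of the five-number summary in Python, using made-up data. It relies on `statistics.quantiles` with `method="inclusive"`, which is one of several quartile conventions; textbooks and Excel may give slightly different Q1 and Q3 for the same data.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    # n=4 asks for the three cut points that split the data into quarters
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

data = [9, 1, 5, 3, 7, 2, 8, 4, 6]   # the numbers 1 to 9 in mixed order

summary = five_number_summary(data)
minimum, q1, med, q3, maximum = summary
iqr = q3 - q1   # spread of the middle half of the data
```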

62

In a boxplot, when is a data point considered an outlier

In a boxplot, a data point is considered an outlier when it is more than 1.5 × IQR below Q1 or above Q3
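
The outlier rule can be sketched in Python as two "fences" placed 1.5 × IQR below Q1 and above Q3. The data is made up, `boxplot_outliers` is a hypothetical helper, and the quartiles use the `inclusive` method of `statistics.quantiles`, which is only one of several conventions.

```python
from statistics import quantiles

def boxplot_outliers(data):
    """Return the values lying more than 1.5 × IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]   # made-up data with one extreme value

outliers = boxplot_outliers(data)     # Q1 = 3, Q3 = 7, fences at -3 and 13
```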

63

What is the relationship between the variance and the standard deviation?

The standard deviation is the positive square root of the variance.

64

total sum symbol

Σ

65

sample standard deviation symbol

s

66

A summary measure that is computed to describe a characteristic of a sample taken from a population is called

a statistic

  • sample- statistic

  • population- parameter

67

When investigating one categorical variable and one numeric variable, what type of graph should you use?

Create a histogram for the numeric variable for each level of the categorical variable.

68

When investigating two categorical variables, what type of graphs should you use?

Create either two pie charts or two bar graphs to compare the categories.

69

When investigating two numeric variables, you should create a …..

Create a scatterplot to visualize the relationship between the two numeric variables.

70

How to get Descriptive Statistics for numeric data in Excel

  1. On the Data tab, click Data Analysis

  2. select Descriptive Statistics,

  3. highlight your data to define the input range,

  4. check Labels in First Row (if the first row of the range holds the variable name),

  5. check Summary statistics, then click OK

71

width

width is the range of values in a class
