Stats 201 notes

138 Terms

1

What is Statistics?

The art and science of designing studies and analyzing data

2

Data analysis

The process of organizing, displaying, summarizing, and asking questions about data

3

Steps of data analysis

step 1: pose a question that can be answered by data

step 2: determine a plan to collect the data

step 3: summarize the data with graphs and numerical summaries

step 4: answer the question posed in step 1 using the data and summaries

4

What is data?

The information we gather with experiments and surveys. That is, data are numbers with a context.

5

What are the 3 components of statistics

1) Design

2) Description

3) Inference

6

What is design?

Planning how to obtain data to answer the question of interest

7

What is descriptive?/Descriptive Statistics

Summarizing and analyzing the data obtained, using numbers or graphs

8

What is inference?/Statistical Inference

Making decisions/conclusions and predictions based on the data to answer the statistical questions

9

Individuals (the who)

objects (people, animals, things) described by a set of data

10

Variables (the what)

Any characteristic of an individual. A variable can take different values for different individuals. There are 2 types.

11

What are the 2 types of variables in stats?

categorical/qualitative and quantitative

12

Categorical/Qualitative variable

Places individuals into one of several groups or categories

13

Numerical/ Quantitative variable

Takes numerical values for which arithmetic operations such as addition and averaging make sense

14

Population

The entire group of subjects or people we wish to study. ALL is the keyword.

15

Sample

A collection/ subset/ part of objects or people taken from the population of interest

16

Census

a survey that measures every member of a population

17

Statistic

A numerical measure/value that characterizes/describes some aspect of the sample

18

Description of a population, sample, parameter, a statistic

19

We use sample statistics to what?

Estimate population parameter values

20

A polling agency takes a sample of 1500 American citizens and asks them if they are lactose intolerant. 12% say yes. This is interesting, since it has been shown that 15% of the population is lactose intolerant.

What is the population

All American citizens

21

A polling agency takes a sample of 1500 American citizens and asks them if they are lactose intolerant. 12% say yes. This is interesting, since it has been shown that 15% of the population is lactose intolerant.

What is the sample

1500 American citizens surveyed

22

A polling agency takes a sample of 1500 American citizens and asks them if they are lactose intolerant. 12% say yes. This is interesting, since it has been shown that 15% of the population is lactose intolerant.

What are the individuals in the survey?

American citizens

23

A polling agency takes a sample of 1500 American citizens and asks them if they are lactose intolerant. 12% say yes. This is interesting, since it has been shown that 15% of the population is lactose intolerant.

what is the variable

lactose intolerance

24

A polling agency takes a sample of 1500 American citizens and asks them if they are lactose intolerant. 12% say yes. This is interesting, since it has been shown that 15% of the population is lactose intolerant.

is the variable qualitative (categorical) or quantitative (numerical)

Categorical

25

A polling agency takes a sample of 1500 American citizens and asks them if they are lactose intolerant. 12% say yes. This is interesting, since it has been shown that 15% of the population is lactose intolerant.

What is the parameter

15%

26

A polling agency takes a sample of 1500 American citizens and asks them if they are lactose intolerant. 12% say yes. This is interesting, since it has been shown that 15% of the population is lactose intolerant.

What is the statistic

12%

27

Descriptive statistics

Methods of organizing, picturing, and summarizing information from samples or populations

-graphs and numbers such as averages and percentages

-reduces data to simple summaries without distorting or losing much information

28

Inferential statistics

Methods of making decisions or predictions about a population, based on data obtained from a sample of that population

-used when data are available from a sample only, but we want to make a decision or prediction about the entire population

29

Randomness in stats

refers to the inherent uncertainty in the outcome of a process, even when the process is well understood

-Individual outcomes are unpredictable

-long-run patterns may still be stable
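Both ideas show up in a small Python simulation. This is a sketch, assuming a fair coin modeled with the standard `random` module (the seed 201 is an arbitrary choice so the run is repeatable): the first few tosses look unpredictable, but the proportion of heads over many tosses settles near 0.5.

```python
import random

def toss_proportions(n_tosses, seed=201):
    """Simulate fair coin tosses; return the first 10 outcomes and the overall proportion of heads."""
    rng = random.Random(seed)                       # seeded so the run is reproducible
    outcomes = [rng.choice("HT") for _ in range(n_tosses)]
    proportion_heads = outcomes.count("H") / n_tosses
    return outcomes[:10], proportion_heads

first_ten, p_heads = toss_proportions(10_000)
print("First 10 tosses:", "".join(first_ten))       # individual outcomes: unpredictable
print(f"Proportion of heads: {p_heads:.3f}")        # long-run pattern: close to 0.5
```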

30

Examples of randomness in stats

Tossing a coin, the daily number of ER visits, selecting a random sample, etc.

31

Variability in stats

describes how much data values differ from one another

-measures the spread in a dataset

-exists even when data are collected carefully

32

Randomness in a process leads to

variability in observed data

33

Observations

data values observed for a variable

34

In a data table, variables are in

the columns (vertically)

35

In a data table, observations are in

rows (horizontally)

36

Two Types of variables/data

categorical and quantitative

37

Categorical

data/variable that places an individual into one of several categories

38

Quantitative

data/variable takes numerical values for which arithmetic operations such as adding and averaging make sense.

39

To determine if a variable is categorical or quantitative

40

Two types of quantitative variables are

discrete and continuous

41

Discrete

Quantitative variables whose possible values form a set of separate numbers. Key phrase: "the number of"

-outcomes are counts (for example 0,1,2,3)

-no decimals allowed

-possible values can be listed as separate numbers (0, 1, 2, ...), even if there is no fixed upper limit

42

Examples of discrete

number of pets

number of siblings

number of friends

43

Continuous

Quantitative variables whose possible values form an interval

-outcomes are measurements

-Decimals are allowed but not required

-infinite number of possible values

44

Examples of continuous

Height, weight, age, time taken to complete an exam

45

Two types of categorical variables

nominal and ordinal

46

Nominal

a categorical variable that has two or more categories, but there is no intrinsic ordering to the categories

47

Examples of nominal

Hair color, gender, country

48

Ordinal

A categorical variable that has a clear ordering of the categories

49

Example of ordinal

Economic status (low, medium, and high)

Level of education (elementary, high school, college, etc.)

Financial happiness (very happy, happy, neutral, unhappy, very unhappy)

50

What is the distribution of a variable?

It tells us what values the variable takes and how often it takes these values

51

What to look for with quantitative variables

shape

center

spread

52

Shape

Do observations cluster in certain intervals, and/or are they spread thin in other areas?

53

Center

Where does a typical observation fall?

54

Spread or variability

How tightly are the observations clustered around the center?

55

Exploratory data analysis

statistical tools (such as graphs to display a variable) and ideas to examine the data in order to describe their main features

56

What are the types of display for categorical data / variables?

1) Frequency table

2) Bar charts and pie charts

3) Pareto chart

57

Frequency Table

A table that lists the number of cases in each category along with its name.

58

Frequency

Number of observations for each value, the counts

59

Relative Frequency (R.F.) / Proportion

Number of observations in each category divided by the total number of observations

60

Percent proportion

Proportion multiplied by 100 (changing the decimal to a percentage)
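The frequency, relative frequency, and percent columns of a frequency table can be computed directly. A minimal Python sketch, assuming a small made-up list of survey answers (the `responses` data are hypothetical):

```python
from collections import Counter

# Hypothetical categorical data: favorite season reported by 10 students
responses = ["summer", "fall", "summer", "winter", "spring",
             "summer", "fall", "winter", "summer", "fall"]

counts = Counter(responses)   # frequency: number of observations in each category
n = len(responses)            # total number of observations

print(f"{'Category':<10}{'Freq':>6}{'Rel. freq':>11}{'Percent':>9}")
for category, freq in counts.items():
    rel_freq = freq / n       # relative frequency (proportion)
    percent = rel_freq * 100  # percent = proportion x 100
    print(f"{category:<10}{freq:>6}{rel_freq:>11.2f}{percent:>8.0f}%")
```

The relative frequencies add up to 1 (100%), apart from rounding.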

61

Example of a frequency table

62

Pie chart

A circle with a slice of the pie for each category, where the size of each slice corresponds to the percentage of observations in that category

63

Bar chart

Displays a vertical bar for each category. The height of the bar shows the percentage of observations in the category. Usually the bars are separated by gaps.

64

Pareto chart

A bar chart in order from largest to smallest frequency or relative frequency
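Ordering categories for a Pareto chart is just a sort by frequency. A short Python sketch with hypothetical data, using text bars in place of a real chart; `Counter.most_common()` with no argument returns (category, count) pairs from largest to smallest count:

```python
from collections import Counter

# Hypothetical categorical data: reasons given for late homework
reasons = ["forgot", "sick", "forgot", "work", "forgot", "sick", "other", "work", "forgot"]

pareto = Counter(reasons).most_common()   # largest frequency first
for category, freq in pareto:
    print(f"{category:<7}{'#' * freq} ({freq})")
```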

65

What are the types of display for quantitative data / variables?

1) Dot plot

2) Stem and leaf plot

3) Histogram

66

Dot plot

Shows a dot for each observation, placed just above the value on a number line for that observation.

-each dot represents one observation

-stacked dots indicate repeated values

-best for small to moderate-sized datasets

67

What do dot plots show?

center (typical value)

spread (range, clustering)

shape (skewness, symmetry)

outliers

68

Dot plot advantages

simple and easy to interpret

preserves individual data values

69

Stem and leaf plot

Organizes numerical data by separating each value into a stem (leading digit(s)) and a leaf (final digit)

-stems are listed vertically

-leaves are listed horizontally in ascending order

-original data values can be reconstructed
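Building a stem and leaf plot is mechanical enough to sketch in Python. This assumes non-negative whole numbers, with the stem being every digit except the last and the leaf being the last digit; the exam scores are hypothetical:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (the last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):           # sorting first keeps each row's leaves in ascending order
        stem, leaf = divmod(v, 10)     # e.g. 72 -> stem 7, leaf 2
        plot[stem].append(leaf)
    # list every stem from smallest to largest, even empty ones, so gaps stay visible
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}

# Hypothetical exam scores
scores = [72, 85, 91, 68, 77, 85, 94, 63, 79, 88]
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because each value is stored as 10 × stem + leaf, the original scores can be rebuilt from the plot.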

70

Stem and leaf plot advantages

71

Stem and leaf plot disadvantages

not suitable for data with wide ranges or many digits

difficult to compare multiple data sets

72

Histogram

A graph that uses bars to represent the frequencies or the relative frequencies of the possible outcomes (or intervals of outcomes) for a quantitative variable. It is the most common graph for quantitative data.

*The bars touch, and exact data values are not visible; however, histograms are effective for large datasets

*You can describe the overall pattern of a histogram by its shape, center, and variability
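The bar heights of a histogram are just counts of observations per interval. A minimal Python sketch, assuming equal-width bins that include their left endpoint but not their right (a common convention, though software differs) and hypothetical integer scores:

```python
def histogram_counts(values, start, width, n_bins):
    """Count observations in equal-width bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)   # which bin v falls into
        if 0 <= k < n_bins:             # values outside the chosen bins are ignored
            counts[k] += 1
    return counts

# Hypothetical exam scores, binned 60-69, 70-79, 80-89, 90-99
scores = [72, 85, 91, 68, 77, 85, 94, 63, 79, 88]
counts = histogram_counts(scores, start=60, width=10, n_bins=4)
for k, c in enumerate(counts):
    low = 60 + 10 * k
    print(f"{low}-{low + 9}: {'#' * c}")   # bar length = frequency
```

Changing `width` changes the histogram's look, so it is worth trying a few bin widths before describing the shape.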

73

How is the shape of a distribution described for a histogram?

By its number of peaks and whether it is symmetric or skewed

74

Symmetric distribution

A distribution where the right and left sides of the histogram are approximately mirror images of each other (a bell shape is a common example)

75

Skewed right / positive skew

a distribution where the right side extends farther out than the left side

76

skewed left / negative skew

a distribution where the left side extends farther out than the right side

77

Mode

most common value in a data set

78

unimodal

one peak in the data

79

Bimodal

Two peaks in the data

80

Measures of center for quantitative data

1) Mean

2) Median

81

1) Mean

The average of the data. The most commonly known and frequently used measure of center.

To find the mean, divide the sum of observed values by the number of observations
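The recipe above (sum of the values divided by n) translates directly into Python. A minimal sketch with hypothetical data:

```python
def mean(values):
    """Sum of the observed values divided by the number of observations."""
    return sum(values) / len(values)

# Hypothetical data: hours of sleep reported by 5 students
sleep_hours = [6, 7, 8, 5, 9]
print(mean(sleep_hours))  # 35 / 5 = 7.0
```

Here the mean (7) happens to be one of the observed values, but in general it does not have to be.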

82

Sample mean (mean of a sample) symbol

x̄ (read "x-bar")

83

Population mean symbol

μ (the Greek letter mu)

84

Basic properties of the mean

-also known as the balancing point

-If the collection consists of values of a variable measured in specified units, then the mean has the same units too

-The mean often does not equal any value that was actually observed in the sample

-for skewed distributions, the mean is pulled in the direction of the longer tail

-MEAN IS SENSITIVE TO OUTLIERS (unusually large or unusually small observations)

85

2) Median

The midpoint of a distribution: the middle value when the observations are ordered from smallest to largest

*If there is one center observation (n is odd), the median is that center observation in the ordered list

*If there are two center observations (n is even), the median is the average of the two center observations
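The two cases (one center observation vs. two) look like this in a minimal Python sketch; the small lists are made up for illustration:

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                 # odd n: one center observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two center observations

print(median([3, 1, 7]))       # ordered 1, 3, 7 -> 3
print(median([3, 1, 7, 10]))   # ordered 1, 3, 7, 10 -> (3 + 7) / 2 = 5.0
```

Sorting first matters: the median is the middle of the ordered list, not the middle of the list as it was recorded.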

86

Property of median

It is a resistant (robust) measure of center: extreme observations, such as outliers, have little, if any, influence on its value.

good choice of measure of center when outliers are present
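A quick way to see resistance is to swap one value for an outlier and recompute both measures. A sketch using Python's standard `statistics` module with hypothetical income data (in $1,000s):

```python
from statistics import mean, median

# Hypothetical household incomes in $1,000s
incomes = [40, 45, 50, 55, 60]
with_outlier = [40, 45, 50, 55, 600]   # same data, but the largest value is now extreme

for label, data in [("original", incomes), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean = {mean(data):.1f}, median = {median(data):.1f}")
# the mean jumps from 50 to 158, while the median stays at 50
```

The data with the outlier are skewed right, and the mean is pulled toward the long tail, ending up far above the median.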

87

Mean vs Median example

88

If a distribution shape is perfectly symmetric, the mean and median are?

The mean equals the median

89

If a distribution shape is skewed to the left, the mean and median are?

The mean is less than the median

90

If a distribution shape is skewed to the right, the mean and median are?

The mean is greater than the median

91

Which choice of measure of center is best if the distribution is highly skewed

The median

92

Which choice of measure of center is best if the distribution is symmetric or only mildly skewed?

The mean

93

Which choice of measure of center is best when comparing two distributions, one symmetric and one skewed?

Median

94

Mode

The value that occurs most frequently

*for categorical data, it is the category with the highest frequency

*for discrete quantitative data, the value that occurs most often

*for continuous quantitative data, it is usually not meaningful to look for the mode, because there can be multiple modes or no mode at all
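Finding the mode means counting each value and keeping every value tied for the top count, which also covers the bimodal case. A minimal Python sketch with made-up data (the standard library's `statistics.multimode`, Python 3.8+, does the same job):

```python
from collections import Counter

def modes(values):
    """Every value tied for the highest frequency (a data set can have more than one mode)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes(["red", "blue", "red", "green"]))  # categorical: ['red']
print(modes([2, 3, 3, 1, 2, 0]))               # discrete, two modes: [2, 3]
```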

95

Example of mode

96

Measures of variability of quantitative data

1) Range

2) Standard deviation

3) Variance

4) IQR (interquartile range)

97

Variability

describes how far apart data points lie from each other and from the center of a distribution

98

Range

The difference between the largest and smallest observations in a data set.

*However, it is not a good measure of spread because it ignores the other values in the data set and is affected by outliers; that is, the range is not a resistant statistic
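A short Python sketch shows how a single outlier changes the range. The scores are hypothetical, and the function is named `data_range` only to avoid shadowing Python's built-in `range`:

```python
def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

# Hypothetical quiz scores
scores = [62, 70, 71, 73, 75]
print(data_range(scores))           # 75 - 62 = 13
print(data_range(scores + [20]))    # one low outlier: 75 - 20 = 55
```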

99

Deviation of an observation x from the mean x-bar is …

x - x-bar (the observation minus the mean)

100

Sample Variance

Roughly the average squared deviation from the mean: the sum of the squared deviations divided by n - 1
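Deviations, the sample variance, and the standard deviation fit together in a few lines of Python. A minimal sketch with hypothetical data; dividing by n - 1 rather than n is the usual convention for a sample variance, which is why it is only "roughly" an average:

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    deviations = [x - x_bar for x in values]   # deviation = x - x_bar (these always sum to 0)
    return sum(d ** 2 for d in deviations) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical data, mean = 5
s2 = sample_variance(data)                    # squared deviations sum to 32, so 32 / 7
s = math.sqrt(s2)                             # standard deviation = square root of variance
print(f"variance = {s2:.4f}, standard deviation = {s:.4f}")
```

The standard deviation is back in the original units of the data, which makes it easier to interpret than the variance.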