STATS Chapter 1


148 Terms

1
New cards

What is statistics?

  • The science of planning studies and experiments, obtaining data, and organizing, summarizing, presenting, analyzing, and interpreting those data and then drawing conclusions based on them

 


2
New cards

Key Concept

  • The process involved in conducting a statistical study consists of “prepare, analyze, and conclude.”

3
New cards

Statistical Thinking

  • Demands so much more than the ability to execute complicated calculations. It involves critical thinking and the ability to make sense of results 

4
New cards

Data

  • Collections of observations, such as measurements, genders, or survey responses

5
New cards

Population

  • The complete collection of all measurements or data that are being considered. Typically, a population is the complete collection of data about which we would like to make inferences  

6
New cards

Parameter

  •  Any numerical measurement that describes some characteristic of the population

7
New cards

Sample

  • It is a subcollection or subset of measurements, objects, or individuals from the population 

8
New cards

Statistic

  •  A numerical measurement that describes some characteristic of the sample

9
New cards

Prepare

One common but generally bad sampling practice: Voluntary Response Sample

  • A Voluntary Response Sample (or Self-Selected Sample) is one in which the respondents themselves decide whether to be included.

  • It’s bad because people who choose to respond may differ from those who do not, so the sample may not represent the population.

10
New cards

Analyze: Graph and Explore

  • An analysis should begin with appropriate graphs and explorations of the data.

11
New cards

Analyze: Apply stats methods

  • A good statistical analysis does not require strong computational skills. A good statistical analysis does require using common sense and paying careful attention to sound statistical methods.  


12
New cards

Analyze: Conclude

  • We need to distinguish between statistical significance and practical significance. 

13
New cards

Conclude: Statistical Significance

  • Achieved in a study if the likelihood of an event occurring by chance is 5% or less. 

14
New cards

Conclude: Practical Significance

  • It is possible that some treatment or finding is effective, but common sense might suggest that the treatment or finding does not make enough of a difference to justify its use or to be practical.  


15
New cards

Misleading conclusions

When forming a conclusion based on a statistical analysis, we should make statements that are clear even to those who have no understanding of statistics.

16
New cards

Sample Data Reported Instead of Measured

When collecting data from people it is

17
New cards

Loaded Questions

If survey questions are not worded carefully, the results of a study can be misleading.

18
New cards

Order of Questions

Sometimes survey questions are unintentionally loaded by the order of the items being considered.

19
New cards

Nonresponse

A nonresponse occurs when someone

either refuses to respond or is unavailable.

20
New cards

Percentages

Some studies cite misleading percentages.

Note that 100% of some quantity is all of it,

but if there are references made to

percentages that exceed 100%, such

references are often not justified

21
New cards

Types of Data: Quantitative

(numerical) data consists of numbers that are

measurements or counts.

• Grams of sugar in a cookie

• Number of books a student owns

• Age (in years) when a person first drove a car

22
New cards

Types of Data: Categorical

(qualitative/attribute) data consists of labels or names

that do not represent counts or measurements.

• Favourite Colour

• Nationality

• Student Number

23
New cards

Types of Quantitative Data: Discrete

data occurs when there is a finite or “countable” number

of values that the data can have (i.e. the number of possible values is

0, 1, 2, 3, . . .).

• Number of books a student owns

• Age (in years) when a person first drove a car

24
New cards

Types of Quantitative Data: Continuous

Data occurs when there are infinitely many possible values, such that it is not possible to count the number of values.

• Grams of sugar in a cookie

• Amount of milk that a cow produces

25
New cards

Levels of Measurement

NOIR: Nominal, Ordinal, Interval, Ratio

26
New cards

Levels of Measurement N

Nominal consists of names, labels, categories only. Has no logical

order, like from low to high.

• Favourite Colour

• Country of birth

27
New cards

Levels of Measurement O

Ordinal data can be placed in a logical order, but differences

between values cannot be obtained.

• E.g. letter grades in a course (A, B, C, D, or F)

• T-shirt sizes

28
New cards

Levels of Measurement I

Interval data can be placed in order and the differences between any two data values are meaningful. However, there is no natural zero starting point (where none of the quantity is present).

• Years 1863, 1867, 2001, 1953

• Temperature in Celsius

29
New cards

Levels of Measurement R

Ratio data is interval data with the additional property that there

is also a natural zero starting point (where zero indicates none of

the quantity is present); This means that ratios between values are

meaningful.

• Amount of money in a bank account

• Height

30
New cards

Missing Data

A data value is missing completely at random if the likelihood of its being missing is

independent of its value or any of the other values in the data set. That is, any data

value is just as likely to be missing as any other data value

31
New cards

Big Data

Data sets so large and so complex that their analysis is beyond the capabilities of traditional software tools (may require software simultaneously running in parallel).

32
New cards

Data Science

Involves applications of statistics, computer science, and software engineering, along with some other relevant fields (such as sociology or finance).

33
New cards

Collecting Sample Data: Observational Study

data is collected without modifying or

interfering with the study subjects

34
New cards

Collecting Sample Data: Experiment

Involves applying some treatment and then observing its effects on the subjects.

35
New cards

Simple Random Sample

A sample of n subjects is selected in

such a way that every possible sample

of the same size n has the same

chance of being chosen

36
New cards

Systematic Sampling

Begin at some starting

point then select every

k-th object in the

population

37
New cards

Convenience Sampling

Gather data in an easy way

38
New cards

Stratified sampling

Divide the population into at least two strata (subgroups) such that objects in each subgroup have similar characteristics. Then randomly sample some objects from within each stratum.

39
New cards

Cluster Sampling

Divide the population into sections or clusters, then randomly select some clusters and use every person/object in those clusters.
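
The four random sampling methods on the preceding cards can be compared side by side in a short Python sketch. The population of ID numbers, the strata and cluster names, and the helper function names are all hypothetical and chosen only for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    """Every possible sample of size n has the same chance of being chosen."""
    return rng.sample(population, n)

def systematic_sample(population, k, start=0):
    """Begin at index `start`, then select every k-th object."""
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Randomly sample some objects from within each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters, then use every object in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [obj for name in chosen for obj in clusters[name]]

rng = random.Random(1)            # fixed seed so the sketch is repeatable
population = list(range(1, 21))   # hypothetical ID numbers 1..20
strata = {"low": population[:10], "high": population[10:]}
clusters = {"a": population[0:5], "b": population[5:10],
            "c": population[10:15], "d": population[15:20]}

srs = simple_random_sample(population, 5, rng)
systematic = systematic_sample(population, k=4, start=2)
stratified = stratified_sample(strata, 2, rng)
clustered = cluster_sample(clusters, 2, rng)
print(srs, systematic, stratified, clustered, sep="\n")
```

Note the difference in the last two: stratified sampling takes a few objects from every subgroup, while cluster sampling takes every object from a few subgroups.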

40
New cards

Voluntary-response sampling

Individuals self-select to be in a study or survey

41
New cards

Random sampling error

A discrepancy between a sample

result and the true population

result; such an error results from

chance sample fluctuations.

42
New cards

Non-sampling error

Sample data incorrectly collected, recorded,

or analyzed (such as by selecting a biased

sample, using a defective instrument,

copying the data incorrectly, or applying

statistical methods not appropriate for the

circumstances)

43
New cards

Non-random sampling error

Result of using a sampling method that is

not random, such as using a convenience

sample or a voluntary response sample

44
New cards

Frequency Distribution

shows how a data set is partitioned among several

classes (categories) by listing all categories along with

the number (frequency) of data values in each

45
New cards

Lower class limits

Are the smallest numbers that belong to each class

46
New cards

Upper class limits

Are the largest numbers that belong to each class

47
New cards

Class boundaries

are the numbers used to separate

classes, but without the gaps created

by class limits

48
New cards

Class midpoints

are the values in the middle of the

classes and can be found by adding

the lower class limit to the upper class

limit and dividing the sum by 2

49
New cards

Class width

Is the difference between two consecutive lower class limits or two consecutive upper class limits

50
New cards

Relative Frequency Distribution

includes the same class limits as a frequency distribution, but the

frequency of a class is replaced with a relative frequency (a proportion or a

percentage) ( relative frequency = class frequency/sum of all frequencies)

51
New cards

Cumulative frequency distribution

The frequency for each class is the sum of the frequencies for that class and all previous classes, so the last cumulative frequency should equal the total number of data values.
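
The definitions of class limits, midpoints, class width, relative frequency, and cumulative frequency on the preceding cards can be checked with a small Python sketch. The data values and class limits are made up for illustration, and the rule "upper limit = next lower limit minus 1" assumes whole-number data.

```python
from itertools import accumulate

data = [52, 55, 61, 64, 67, 70, 71, 73, 78, 85]  # hypothetical whole-number scores
lower_limits = [50, 60, 70, 80]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]
# Upper class limits for whole-number data (an assumption of this sketch).
upper_limits = [lo + class_width - 1 for lo in lower_limits]
# Class midpoint: (lower class limit + upper class limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Frequency: number of data values in each class.
frequencies = [sum(lo <= x <= hi for x in data)
               for lo, hi in zip(lower_limits, upper_limits)]
total = sum(frequencies)
# Relative frequency = class frequency / sum of all frequencies.
relative = [f / total for f in frequencies]
# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

for row in zip(lower_limits, upper_limits, midpoints,
               frequencies, relative, cumulative):
    print(row)
```

The last cumulative frequency equals the number of data values, and the relative frequencies add up to 1.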

52
New cards

Important characteristics of Data

Center, variation, distribution, outliers, time

53
New cards

Centre

A representative value that indicates where the middle of the

data set is located

54
New cards

Variation

measure of the amount of spread in the data

55
New cards

Distribution

The nature or shape of the spread of the data over a range of values (such as bell-shaped, uniform, or skewed)

56
New cards

Outliers

Sample values that lie very far from the vast majority of

other sample values

57
New cards

Time

Changing characteristics of the data over time

58
New cards

Histogram

We can use a visual tool called a histogram to determine the shape of a distribution of data.

• A histogram provides a visual display (graph) of a frequency distribution

60
New cards

Shapes of distribution: Normal

Bell curve; most of the data is in the center, with tails on either side.

61
New cards

Shapes of distribution: Uniform

Different

possible values occur

with approximately

the same frequency

62
New cards

Shapes of distribution: Skewed right (positive)

The data is mostly

on the left, with a

longer tail to the

right

63
New cards

Shapes of distribution: Skewed left (negative)

The data is mostly on

the right, with a

longer tail to the left

64
New cards

Dotplot

Consists of a graph in which each data value is plotted as a point (or dot) along a scale

of values. Dots that are stacked represent multiple observations of the same values.

65
New cards

Stem-and-leaf plot

Used to display quantitative data by separating values into a “stem” and “leaf”.

• Helps in sorting data and provides a simple visualization of the distribution

66
New cards

Pie Chart

A graph depicting qualitative data as slices of a circle, in which

the size of each slice is proportional to frequency count

67
New cards

4 Measures of Centre

Mean, Median, Mode, Midrange

68
New cards

Mean

The mean (or arithmetic mean) of a set of data is the measure

of centre found by adding all data values and dividing the total

by the number of data values

69
New cards

n

represents the number of data

values in a sample.

70
New cards

N

represents the number of data

values in a population.

71
New cards

Resistant

A statistic is resistant if the presence of extreme values

(outliers) does not cause it to change very much.

72
New cards

Mean disadvantage

A disadvantage of the mean is that just one extreme value (outlier) can change the value of the mean substantially. (Using the definition of resistant, we say that the mean is not resistant.)

73
New cards

Median (resistant)

The median of a data set is the middle value when the

original data values are arranged in order of increasing (or

decreasing) magnitude.

74
New cards

Mode (resistant)

The mode of a data set is the value(s) that occur(s) with the greatest frequency. It can be found for qualitative data. Bimodal: two modes; multimodal: three or more modes.

75
New cards

Midrange

The midrange of a data set is the value midway

between the maximum and minimum values in

the original data set. It is found by adding the

maximum data value to the minimum data

value and then dividing the sum by 2
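
The four measures of centre, and the idea of resistance, can be tried out with Python's standard `statistics` module. The data values below are hypothetical; the single large value is included on purpose to show how an outlier affects each measure.

```python
import statistics

without_outlier = [2, 3, 3, 5, 7]          # hypothetical data
with_outlier = without_outlier + [100]     # same data plus one extreme value

def midrange(values):
    # (maximum data value + minimum data value) / 2
    return (max(values) + min(values)) / 2

for label, values in (("without outlier", without_outlier),
                      ("with outlier", with_outlier)):
    print(label,
          "mean =", statistics.mean(values),
          "median =", statistics.median(values),
          "mode(s) =", statistics.multimode(values),
          "midrange =", midrange(values))
```

Adding the outlier moves the mean from 4 to 20 and the midrange from 4.5 to 51.0, while the median only moves from 3 to 4.0 and the mode stays at 3.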

76
New cards

Round-Off Rules

For the mean, median, and midrange, carry one more decimal

place than is present in the original set of values.

• For the mode, leave the value as is without rounding (because

values of the mode are the same as some of the original data

values).

77
New cards

Range (not resistant)

The range of a set of data values is the difference

between the maximum data value and the minimum

data value.

Range = (maximum data value) − (minimum data value)

78
New cards

STDV

The standard deviation of a set of sample values, denoted by s, is

a measure of how much data values deviate away from the mean.

Notation

s = sample standard deviation

σ = population standard deviation

79
New cards

Important properties

• The standard deviation is a

measure of how much data values

deviate away from the mean.

• The value of the standard deviation

s is never negative. It is zero only

when all of the data values are

exactly the same.

• Larger values of s indicate greater

amounts of variation.

• The standard deviation s can increase

dramatically with one or more outliers

(not resistant).

• The units of the standard deviation s

(such as minutes, feet, pounds) are the

same as the units of the original data

values.

• The sample standard deviation s is a

biased estimator of the population

standard deviation σ, which means that

values of the sample standard deviation

s do not center around the value of σ

81
New cards

Population STDV

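The population standard deviation σ divides the sum of squared deviations from the mean by N; the sample standard deviation s divides by n − 1.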

82
New cards

Variance of a sample and a population

The variance of a set of values is a measure of

variation equal to the square of the standard

deviation.

• Sample variance:

s² = square of the standard deviation s.

• Population variance:

σ² = square of the population standard

deviation σ
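
The measures of variation on the preceding cards (range, standard deviation, variance) can be computed with Python's standard `statistics` module. The data set is hypothetical, and it is treated both as a sample and as a population here only to show the effect of dividing by n − 1 versus N.

```python
import statistics

data = [1, 3, 5, 7, 9]  # hypothetical values; mean is 5

# Range = (maximum data value) - (minimum data value)
data_range = max(data) - min(data)

# Sum of squared deviations from the mean: 16 + 4 + 0 + 4 + 16 = 40
sample_variance = statistics.variance(data)        # 40 / (n - 1) = 10
population_variance = statistics.pvariance(data)   # 40 / N = 8

# The standard deviation is the square root of the variance.
s = statistics.stdev(data)
sigma = statistics.pstdev(data)

print(data_range, sample_variance, population_variance, s, sigma)
```

Because the sample formula divides by the smaller number n − 1, s is slightly larger than σ for the same values.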

83
New cards

z scores

A z-score is the number of standard deviations that a given value x lies

above or below the mean

84
New cards

Properties of z score

• A z-score is a unitless measurement.

• A data value is “unusual” if its z-score is less than −2 or greater than +2.

• A negative z-score indicates that the observation lies below the mean; a positive z-score indicates that the observation lies above the mean.
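
A z-score is computed as z = (x − mean) / standard deviation. The Python sketch below applies that formula and the "less than −2 or greater than +2" rule; the mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / std_dev

def is_unusual(z):
    """A value is 'unusual' if its z-score is less than -2 or greater than +2."""
    return z < -2 or z > 2

mean, std_dev = 100, 15  # hypothetical mean and standard deviation

for x in (85, 130, 135):
    z = z_score(x, mean, std_dev)
    print(x, round(z, 2), "unusual" if is_unusual(z) else "not unusual")
```

Here 85 has z = −1.0 (below the mean), 130 has z = 2.0 (exactly on the cutoff, so not unusual), and 135 has z ≈ 2.33 (unusual).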

85
New cards

Percentiles

Percentiles are measures of

location, denoted P1, P2, . . . ,

P99, which divide a set of data

into 100 groups with about 1%

of the values in each group

86
New cards

To find the percentile k of a data value x

Percentile of x = (number of values less than x / total number of values) × 100

87
New cards

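If you are given a percentile

L = (k/100) × n, where k = the percentile and n = the total number of values. L is the locator: the position of the corresponding value in the sorted data.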
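
Both percentile formulas can be written as small Python functions. The data set and function names are hypothetical; the sketch only computes the locator L and does not apply any rounding rule for turning L into a data value, since that rule is not stated on these cards.

```python
def percentile_of_value(x, data):
    """Percentile of x = (number of values less than x / total number) * 100."""
    count_below = sum(value < x for value in data)
    return 100 * count_below / len(data)

def locator(k, n):
    """L = (k / 100) * n, the position of the k-th percentile in sorted data."""
    return k * n / 100

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # hypothetical sorted values

print(percentile_of_value(70, data))  # 6 of 10 values are below 70
print(locator(25, len(data)))         # position for the 25th percentile
```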

88
New cards

Quartiles

are measures of location,

denoted Q1, Q2, and Q3, which

divide a set of data into four groups

with about 25% of the values in

each group. In short:

• Q1 = P25

• Q2 = P50 = MEDIAN!

• Q3 = P75

89
New cards

IQR

IQR= Q3-Q1

90
New cards

5-number summary

• Minimum

• First quartile, Q1

• Second quartile, Q2 (same as the median)

• Third quartile, Q3

• Maximum

91
New cards

Outlier

A data value is an outlier if it is above Q3 by an amount greater than 1.5 × IQR, or below Q1 by an amount greater than 1.5 × IQR.
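
The quartiles, IQR, 5-number summary, and the 1.5 × IQR outlier rule can be combined in one Python sketch. The data values are hypothetical. Quartile conventions differ between textbooks and software, so the use of Python's "inclusive" method here is an assumption and may not match a hand calculation done with a different rule.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical values

# Assumption: Python's "inclusive" quartile method; other rules can differ.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                       # IQR = Q3 - Q1

lower_fence = q1 - 1.5 * iqr        # below this is an outlier
upper_fence = q3 + 1.5 * iqr        # above this is an outlier

five_number_summary = (min(data), q1, q2, q3, max(data))
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(five_number_summary, iqr, (lower_fence, upper_fence), outliers)
```

With these values Q1 = 3, Q2 = 5, and Q3 = 7, so the IQR is 4 and anything above 13 or below −3 counts as an outlier, which flags only the value 30.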

92
New cards

Sample space

Is the collection of all possible simple

events for a procedure, i.e., the set of all possible outcomes

93
New cards

Probability Notation

• P denotes a probability.

• A,B,C,…denote specific events.

• P(A) denotes the “probability of event A occurring.”

94
New cards

Three main ways to determine probability: Classical Approach

Assume that a procedure has n different

simple events, and that each simple event has an equal

chance of occurring

  • P(A) = Number of ways A occurs/ Number of different simple events
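
As an illustration of the classical approach, the Python sketch below lists the sample space for rolling two fair dice and counts the simple events in which the sum is 7. The two-dice procedure is a hypothetical example, not one taken from these cards.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (first die, second die) outcomes.
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the two dice sum to 7.
event_a = [outcome for outcome in sample_space if sum(outcome) == 7]

# P(A) = number of ways A occurs / number of different simple events
p_a = Fraction(len(event_a), len(sample_space))

print(len(event_a), len(sample_space), p_a)
```

Event A occurs in 6 of the 36 simple events, so P(A) = 6/36 = 1/6.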

95
New cards

Three main ways to determine probability: Relative Frequency

Conduct/observe a

procedure, count the number of times the event occurred

  • P (A) = Number of times A occurred/ number of times the experiment was repeated

96
New cards

Three main ways to determine probability: Subjective probabilities

The probability of an event is

estimated by using knowledge of relevant circumstances.

97
New cards

Law of large numbers

As a procedure is repeated again and again, the relative frequency

probability of an event tends to approach the actual probability.
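
The law of large numbers can be explored with a simple simulation. The Python sketch below repeats a simulated fair coin flip (actual probability of heads 0.5, an assumption built into the code) and reports the relative frequency of heads for increasing numbers of trials. The seed is fixed only so that repeated runs give the same numbers.

```python
import random

def relative_frequency_of_heads(trials, rng):
    """Number of times heads occurred / number of times the flip was repeated."""
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

rng = random.Random(0)  # fixed seed so the sketch is repeatable

for trials in (10, 100, 1_000, 100_000):
    print(trials, relative_frequency_of_heads(trials, rng))
```

With few trials the relative frequency can be far from 0.5; with many trials it tends to settle close to 0.5.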

98
New cards

Important Principles for probability

• The probability of an event is a number between 0 and 1, inclusive.

• If it is impossible for an event to occur, then the probability it will occur is 0.

• If an event is certain to occur, then the probability it will occur is 1.

99
New cards

Complement of an Event

Definition: The complement of an event A consists of all outcomes that do not belong to A and is denoted by Ā (A with a line over it).

100
New cards

2 important rules

1. Rule of Complementary Events: P(A) + P(Ā) = 1

or P(A) = 1 − P(Ā)

or P(Ā) = 1 − P(A)

2. The sum of all probabilities in a sample space must equal