Statistics chapters 7-9

Chapter 7

Understanding confidence intervals

A sample is a selection of objects or observations taken from the population of interest

Inference = when we draw conclusions about the population from the sample

Different samples will always give different results; this is called sampling error or variation due to sampling.

Confidence intervals / when we express an estimate of a population parameter, it is good practice to give it as a confidence interval. A confidence interval communicates how accurate our estimate is likely to be.

For lower bound / change the percent to a decimal, then subtract the decimal from 1, then multiply the decimal percent by the answer you got (keep all the digits), then divide that by the sample size, then square root the answer (this is the standard error), then multiply that by 1.645 (the z number for 90% confidence) to get the margin of error, then subtract the margin of error from the original decimal, then move the decimal point 2 places to the right to make it a percent, then round it

For upper bound / change the percent to a decimal, then subtract the decimal from 1, then multiply the decimal percent by the answer you got, then divide that by the sample size, then square root the answer, then multiply by 1.645 [z number] to get the margin of error, then add the margin of error to the original decimal, then move the decimal point 2 places to the right, then round it
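The two recipes above look like the usual normal-based interval for a proportion. Written as one formula (the symbols are labels added here, not from the notes: p̂ is the sample percent as a decimal, n is the sample size, z is the z number, 1.645 for 90% confidence):

```latex
\text{lower bound} = \hat{p} - z\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}},
\qquad
\text{upper bound} = \hat{p} + z\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
```

The square-root part is the standard error, and z times the standard error is the margin of error.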

Example: let’s say we want to find out how big the apples are on our tree. We put this as an investigation question:

What is the mean weight of all apples on the tree? We take a sample and calculate the sample mean. Once we have our sample mean, we use a confidence interval to express the range in which we are pretty sure the population parameter lies.

In this case the population parameter is the mean weight of all the apples on the tree

What affects the width of a confidence interval?

The width of a confidence interval depends on two things, the variation within the population of interest and the size of the sample.

If all values in the population were almost the same, then our sample will also have little variation. Any sample we take is likely to be pretty similar to any other sample.

Population with low variation - leads to similar samples with low variation - leads to narrow confidence interval

But a more varied population will lead to a more varied sample

Different samples taken from the same population will differ more, and our confidence interval will be wider

Population with lots of variation - leads to varied samples with high variation - leads to wider confidence interval

Sample size / affects the width of a confidence interval. Small samples vary more from each other and have less information, which leads to wider confidence intervals

Large samples are more similar to each other and have more information which leads to narrow confidence intervals

Calculating confidence intervals - informal, traditional normal based, bootstrapping

Traditional normal based / the level of confidence affects the width of the confidence interval

All estimates of parameters, such as means, medians, differences of means and differences of medians, should be expressed as confidence intervals

Mean

Sample size

Confidence

First find the z. Then square root the sample size and divide the SD by it (this is the standard error, keep about four digits). Multiply that by the z to get the margin of error (keep about three digits). Then subtract it from the mean for the lower bound and add it to the mean for the upper bound, then round

Point estimates / sample statistics used to estimate the exact value of a population parameter

Interval estimates (confidence intervals) / we use a range of values within which the population parameter may fall
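A minimal Python sketch of the mean recipe above, assuming a normal-based (z) interval. The function name and the numbers in the example are made up for illustration.

```python
import math

def mean_confidence_interval(mean, sd, n, z):
    """Return (lower, upper) for a normal-based interval around a sample mean."""
    standard_error = sd / math.sqrt(n)    # SD divided by the square root of n
    margin_of_error = z * standard_error  # z times the standard error
    return mean - margin_of_error, mean + margin_of_error

# Made-up numbers: mean 10, SD 4, sample size 16, 95% confidence (z = 1.96)
lower, upper = mean_confidence_interval(10, 4, 16, 1.96)
print(round(lower, 2), round(upper, 2))  # prints 8.04 11.96
```

Here the standard error is 4 / 4 = 1, so the margin of error is just the z value.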

Two requirements for constructing meaningful confidence intervals of the population proportion

1: the size of your sample is no more than 5% of the size of the population it was drawn from.

2: the sample must also satisfy n × p × (1-p) ≥ 10; if it meets this requirement, it means that it has an approximately normal distribution
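A small Python sketch of checking both requirements, assuming the common textbook form of the conditions: n ≤ 0.05 × N and n × p × (1 - p) ≥ 10. The function name and the numbers are made up for illustration.

```python
def proportion_ci_conditions_met(n, population_size, p):
    """Check the two usual conditions for a proportion confidence interval."""
    small_enough = n <= 0.05 * population_size  # sample is at most 5% of the population
    roughly_normal = n * p * (1 - p) >= 10      # assumed form of the normality condition
    return small_enough and roughly_normal

print(proportion_ci_conditions_met(200, 10000, 0.40))  # 200 * 0.4 * 0.6 = 48, prints True
print(proportion_ci_conditions_met(30, 10000, 0.05))   # 30 * 0.05 * 0.95 = 1.425, prints False
```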

Find the Z score then go to your Z table

The variation in the sample will be an indicator of the variation in the population

We use the standard deviation of the sample as a measure of variation in the population. The standard deviation tells us roughly the average distance of the values from the mean in the sample

We take the standard deviation and divide by the square root of the sample size; this = the standard error

The standard error is how spread out we would expect the means of different samples to be

90% = 1.645

95% = 1.96

99%= 2.576
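These z numbers do not have to be memorized from a table. A quick Python sketch using the standard library's `statistics.NormalDist`, assuming a two-sided interval (the function name is made up):

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided z value for a confidence level given as a decimal, e.g. 0.95."""
    alpha = 1 - confidence                      # level of significance
    return NormalDist().inv_cdf(1 - alpha / 2)  # half of alpha goes in each tail

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```

Rounded to three decimals this gives 1.645, 1.96 and 2.576, matching the values listed above.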

The more confident we want to be, the wider our confidence interval will be

T distribution - the t value depends on the sample size and the chosen level of confidence. The bigger the t, the wider our confidence interval will be

We multiply the standard error by the t value to get the margin of error. Then we add and subtract the margin of error from the sample mean to find the confidence interval.

This confidence interval gives us a range of values that we can be confident contains the population mean

Turn percent into decimal

Write down your sample size N

Find the Z score

Step 1: change the percent to a decimal

Step 2: subtract the decimal from 1

Step 3: multiply the decimal percent by the answer you got

Step 4: divide that by the sample size

Step 5: square root the answer

Step 6: multiply that by 1.645 (this gives the margin of error)

Step 7: subtract the margin of error from the original decimal percent (lower bound), or add it (upper bound)

Step 8: move the decimal point two places to the right to make it a percent

Step 9: round it
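The nine steps can be sketched in Python like this. It is only an illustration: the function name and the example numbers (40% from a sample of 100) are made up, and z = 1.645 assumes 90% confidence.

```python
import math

def proportion_confidence_interval(percent, n, z=1.645):
    """Follow the nine steps and return (lower %, upper %)."""
    p = percent / 100                        # Step 1: percent to decimal
    q = 1 - p                                # Step 2: subtract the decimal from 1
    product = p * q                          # Step 3: multiply the two
    standard_error = math.sqrt(product / n)  # Steps 4-5: divide by n, square root
    margin = z * standard_error              # Step 6: multiply by the z number
    lower = p - margin                       # Step 7: subtract for the lower bound
    upper = p + margin                       #         add for the upper bound
    # Steps 8-9: back to a percent, then round
    return round(lower * 100, 2), round(upper * 100, 2)

print(proportion_confidence_interval(40, 100))
```

With these made-up numbers the interval comes out at roughly 32% to 48%.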

Estimation: a process whereby we select a random sample from a population and use a sample statistic to estimate a population parameter

Chapter 8

The Student's t distribution

Looks like a normal distribution but has fatter tails = a higher dispersion of values, as there is more uncertainty

Similar to the z statistic

The t statistic uses n - 1 degrees of freedom, so with a sample size of 20 the degrees of freedom is 19

Student t table = each row indicates a different number of degrees of freedom. After about 30 degrees of freedom the t table becomes almost the same as the z table = normal distribution

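To find degrees of freedom (df) for two samples, you find the degrees of freedom of each sample and add them: (n1 - 1) + (n2 - 1), so the -1 and -1 make -2

So you add the sizes of your 2 different samples

Then you subtract 2 from the answer: df = n1 + n2 - 2

Use the number of people in each sample, not their data values: add the number in one group to the number in the other group, then subtract 2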

If a sample has 50 observations we use a z table instead of a t table

Hypothesis testing

Hypothesis = A premise or claim that we want to test or to investigate

Null hypothesis = H0 (h zero) - the currently accepted value for a parameter = what people consider to be true based on previous studies. Then someone comes along and presents an ALTERNATIVE HYPOTHESIS, which that person tests with studies or surveys to support a different claim

Alternative hypothesis - h a - also called the research hypothesis = involves the claim to be tested

H0 & Ha are mathematical opposites

The alternative does not always have to claim a greater number than the null hypothesis: it can say the parameter is greater than, less than, or not equal to the null value

You assume that the null hypothesis is true unless you have evidence against it

Possible outcomes = reject the null hypothesis or fail to reject the null hypothesis - according to the evidence

If we fail to reject the null hypothesis H0, then we keep treating the null hypothesis (the first statement) as true. If we reject it, then we say the evidence supports the alternative hypothesis

test statistic - calculated from sample data 📊 and used to decide if we reject or fail to reject

Ex = sample of 50 bars

Null hypothesis: the bars weigh 5 g

Monday avg 5.12 g, Wednesday avg 5.72 g. This is getting close to 6, so we start to think about whether we should reject, since it is farther from what the hypothesis says. Friday avg 7.23 g

Then we reject the claim that we’re making bars that are 5 g, because we sampled 50 bars and got 7.23 g, which is so far away from 5 that there’s no way the null hypothesis is true

We get average value of the mass of the bar

We calculate a test statistic and use it to help determine whether the data you have is enough to reject the null hypothesis or not

Statistically significant - where we draw the line for whether we should reject the null hypothesis or not

Level of confidence - c - 95%, 99%

How confident are we in our decision

Level of significance - alpha (α) = 1 - confidence

So if the level of confidence is 95%

Then c is 0.95

Then α = 1 - 0.95

α = 0.05

Both say the same thing: 95% confidence goes with a 5% level of significance

The = sign is the "Null" hypothesis (the boring idea that age doesn't matter). The > sign is the "Research" hypothesis

1. The "Equals" Sign (μ1 = μ2)

This symbol means "is exactly the same as."

Which hypothesis is it? This is almost always the Null Hypothesis. It’s the "boring" version where nothing special is happening.

If the research hypothesis only says the two are different (μ1 ≠ μ2), it is a two-tailed test.

2. The "Greater Than" Sign (μ1 > μ2)

This symbol means "is bigger than."

What it means: One group has a higher average than the other.

Which hypothesis is it? This is a Research Hypothesis. It is "directional" (a one-tailed test) because you are predicting a specific direction.

TYPE 1 error TYPE 2 error

TYPE 1 error = reject a true null hypothesis - false positive - the probability of this error is alpha - THE LEVEL OF SIGNIFICANCE

TYPE 2 error = fail to reject a false null hypothesis (you "accept" a false null) - false negative - the probability of this error is beta (β) - beta mainly depends on sample size and population variance

The probability of rejecting a false null hypothesis = 1 - β

Power of the test = 1 - β; it increases when the sample size increases

-2 5

16.66

Degrees of freedom

Sample size minus 1: df = n - 1

Step	Z-Statistic	T-Statistic
1. Subtract	Sample Mean - Population Mean	Sample Mean - Population Mean (test value)
2. Divide	Divide by Population SD / √n	Divide by Sample SD / √n
3. Result	This is your Z	This is your T
4. Final Step	Look at Z-table	Look at T-table using n−1

Example

If your sample mean is 10, the population mean is 8, your standard deviation is 4, and your sample size is 16:

Subtract: 10−8=2
Square Root: √16 = 4
Divide SD: 4÷4=1 (This is your Standard Error)
Final Math: 2÷1=2.0

If you used population SD, your Z = 2.0.
If you used sample SD, your T = 2.0.
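The same worked example as a short Python sketch. The function name is made up; the arithmetic is the same for Z and T, only the SD you plug in (population or sample) and the table you look at differ.

```python
import math

def standardized_statistic(sample_mean, population_mean, sd, n):
    """(sample mean - population mean) divided by the standard error."""
    standard_error = sd / math.sqrt(n)  # SD divided by the square root of n
    return (sample_mean - population_mean) / standard_error

# Numbers from the example: mean 10, population mean 8, SD 4, n = 16
print(standardized_statistic(10, 8, 4, 16))  # prints 2.0
```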

Difference between a z statistic and a t statistic

Aspect	Z-Statistic	T-Statistic
When to use	When you know the Population Standard Deviation (σ).	When you only have the Sample Standard Deviation (s).
Sample Size	Best for large samples (n>30).	Used for smaller samples (n<30), but works for any size.
The Curve	A perfect, fixed Bell Curve.	A "flatter" Bell Curve with thicker tails.
Confidence	You are more certain about the results.	You are being "cautious" because you're estimating the spread.

Chapter 9

A sample is a selection of objects or observations taken from the population of interest. Inference is when we draw conclusions about the population from the sample. Different samples can yield different results, known as sampling error or variation due to sampling.

Confidence Intervals

A confidence interval provides an estimate of a population parameter.
It is good practice to express this estimate as a confidence interval; this communicates how accurate our estimate is likely to be.

Calculating the Bounds of a Confidence Interval

Lower Bound:

Convert the percent to decimal.
Subtract the decimal from 1.
Multiply the decimal from step 1 by the answer obtained in step 2.
Divide that by the sample size.
Take the square root of the result.
Multiply it by 1.645 (the z-value for a 90% confidence level).
Subtract this margin of error from the original estimate and convert back to a percentage, rounding to two decimal places.

Upper Bound:

Convert the percent to decimal.
Subtract the decimal from 1.
Multiply the decimal from step 1 by the answer from step 2.
Divide that by the sample size, square root the answer, and multiply by 1.645.
Add this margin to the original estimate and convert back to a percentage, rounding to two decimal places.

Example:
To find the mean weight of all apples on a tree:

Ask: What is the mean weight of all apples?
Take a sample and calculate the sample mean.
Use a confidence interval to express the estimated range for the population parameter (the mean weight).

Width of Confidence Intervals

The width depends on:
- Variation within the population: Low variation leads to similar samples and narrower confidence intervals. High variation leads to varied samples and wider confidence intervals.
- Sample Size: Smaller samples have more variability and less information, while larger samples provide more reliable estimates leading to narrower confidence intervals.

Calculating Confidence Intervals
There are various methods: informal, traditional normal-based, and bootstrapping.

For traditional normal-based intervals (mean calculation):
- Begin with the z-value.
- Calculate the standard error using the standard deviation (SD) divided by the square root of sample size (n).
- Multiply the standard error by the z-value and find the confidence interval by adding and subtracting from the mean.
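The three steps above amount to one formula. The symbols are labels added here for illustration: x̄ is the sample mean, s is the standard deviation, n is the sample size, and z is the z-value for the chosen confidence level.

```latex
\text{confidence interval} = \bar{x} \pm z \cdot \frac{s}{\sqrt{n}}
```

The fraction is the standard error, and z times the standard error is the margin of error.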

Point Estimates and Interval Estimates

Point estimates are single sample statistics used to estimate a population parameter.
Interval estimates (confidence intervals) provide a range of values within which the population parameter may fall.

Requirements for Confidence Intervals of Population Proportion

The sample size should not exceed 5% of the population size.
The sample should also satisfy n × p × (1 - p) ≥ 10, which ensures an approximately normal distribution.

Understanding Variation and Standard Deviation

The standard deviation indicates how much variation exists from the mean.
The standard error represents how spread out the sample means would be from sample to sample.

Confidence Levels:

90% confidence level corresponds to a z-value of 1.645.
95% confidence level corresponds to a z-value of 1.96.
99% confidence level corresponds to a z-value of 2.576.

T Distribution:

Similar to the normal distribution but has wider tails (more uncertainty).
Depends on sample size and confidence levels used.

Hypothesis Testing:

Hypothesis: A claim to test.
Null Hypothesis (H0): Accepted value for a parameter.
Alternative Hypothesis (Ha): Claim to be tested.
Outcomes can be rejecting or failing to reject the null hypothesis based on evidence.