Statistics chapters 7-9
Chapter 7
Understanding confidence intervals
A sample is a selection of objects or observations taken from the population of interest.
Inference = when we draw conclusions about the population from the sample.
Different samples will always give somewhat different results; this is called sampling error, or variation due to sampling.
Confidence intervals / when we express an estimate of a population parameter, it is good practice to give it as a confidence interval. A confidence interval communicates how accurate our estimate is likely to be.
For lower bound / change the percent to a decimal, then subtract the decimal from 1, then multiply the decimal percent by the answer you got (keep the full unrounded number), then divide that by the sample size, then square root the answer, then multiply that by 1.645 [the z number for 90% confidence]; this is the margin of error. Subtract the margin of error from the original decimal, then move the decimal point two places to the right to get a percent, then round it.
For upper bound / change the percent to a decimal, then subtract the decimal from 1, then multiply the decimal percent by the answer you got, then divide that by the sample size, then square root the answer, then multiply by 1.645 [the z number]; this is the margin of error. Add the margin of error to the original decimal, then move the decimal point two places to the right to get a percent, then round it.
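The lower and upper bound steps can be sketched in Python as below. This is only an illustration: the function name `proportion_interval` and the example numbers (40% of a sample of 100) are made up for the sketch and are not from the chapter.

```python
import math

def proportion_interval(percent, n, z=1.645):
    """Return (lower, upper) bounds, in percent, for a sample proportion."""
    p = percent / 100                      # change percent to decimal
    q = 1 - p                              # subtract the decimal from 1
    standard_error = math.sqrt(p * q / n)  # multiply, divide by n, square root
    margin = z * standard_error            # multiply by the z number
    lower = (p - margin) * 100             # subtract, then move two decimal places
    upper = (p + margin) * 100             # add, then move two decimal places
    return round(lower, 2), round(upper, 2)

# Hypothetical example: 40% of a sample of 100, at 90% confidence (z = 1.645)
lower, upper = proportion_interval(40, 100)
print(lower, upper)
```

Only the last step differs between the two bounds: the margin of error is subtracted for the lower bound and added for the upper bound.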
Example: let's say we want to find out how big the apples on our tree are. We put this as an investigation question:
What is the mean weight of all apples on the tree? We take a sample and calculate the sample mean. The sample mean is our estimate of the population mean, and we use a confidence interval to express the range in which we are pretty sure the population parameter lies.
In this case the population parameter is the mean weight of all the apples on the tree.
What affects the width of a confidence interval?
The width of a confidence interval depends on two things, the variation within the population of interest and the size of the sample.
If all values in the population were almost the same, then our sample will also have little variation. Any sample we take is likely to be pretty similar to any other sample.
Population with low variation - leads to similar samples with low variation - leads to a narrow confidence interval
But a more varied population will lead to a more varied sample
Different samples taken from the same population will differ more, and our confidence interval will be wider
Population with lots of variation - leads to varied samples with high variation - leads to a wider confidence interval
Sample size / affects the width of a confidence interval. Small samples vary more from each other and have less information, which leads to wider confidence intervals
Large samples are more similar to each other and have more information, which leads to narrower confidence intervals
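A small sketch of both effects, assuming the normal-based formula for a mean (width = 2 × z × SD / √n). The function name and the SD and sample-size values are invented for illustration.

```python
import math

def interval_width(sd, n, z=1.96):
    """Width of a normal-based interval for a mean: 2 * z * sd / sqrt(n)."""
    return 2 * z * sd / math.sqrt(n)

# Same variation (SD = 10), growing sample size: the interval narrows
widths_by_n = {n: round(interval_width(sd=10, n=n), 2) for n in (25, 100, 400)}

# Same sample size (n = 100), growing variation: the interval widens
widths_by_sd = {sd: round(interval_width(sd=sd, n=100), 2) for sd in (5, 10, 20)}

print(widths_by_n)
print(widths_by_sd)
```

In this sketch, making the sample four times bigger cuts the width in half, while doubling the standard deviation doubles it.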
Calculating confidence intervals - informal, traditional normal based, bootstrapping
Traditional normal-based confidence interval / the level of confidence affects the width of the confidence interval
All estimates of parameters, such as means, medians, differences of means and differences of medians, should be expressed as confidence intervals
To calculate the interval for a mean you need:
Mean
SD
Sample size
Confidence level
First find the z value for the confidence level. Then square root the sample size and divide the SD by that answer (keep about four digits); this is the standard error. Multiply the standard error by the z to get the margin of error (about three digits is enough). Subtract the margin from the mean for the lower bound and add it to the mean for the upper bound, then round.
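The calculation for a mean can be sketched as follows. The function name `mean_interval` and the apple numbers (mean 150 g, SD 20 g, 64 apples) are hypothetical and only there to show the order of the steps.

```python
import math

def mean_interval(mean, sd, n, z):
    """Normal-based confidence interval for a mean."""
    standard_error = sd / math.sqrt(n)  # SD divided by the square root of n
    margin = z * standard_error         # multiply by the z value
    return round(mean - margin, 2), round(mean + margin, 2)

# Hypothetical apple sample: mean 150 g, SD 20 g, 64 apples, 95% confidence
low, high = mean_interval(150, 20, 64, 1.96)
print(low, high)
```

Here the standard error is 20 / 8 = 2.5 and the margin is 1.96 × 2.5 = 4.9, so the interval runs from 145.1 g to 154.9 g.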
Point estimate / a sample statistic used to estimate the exact value of a population parameter
Interval estimate (confidence interval) / a range of values within which the population parameter may fall
Two requirements for constructing meaningful confidence intervals of the population proportion
1: the size of your sample is no more than 5% of the size of the population it was drawn from.
2: n × p × (1 - p) ≥ 10. If the sample meets this requirement, it means the sample proportion has an approximately normal distribution.
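A quick check of the two requirements might look like this, assuming the second requirement is the usual n × p × (1 - p) ≥ 10 rule. The function name and the example numbers are invented.

```python
def proportion_conditions_met(n, population_size, p):
    """Check both requirements for a proportion confidence interval."""
    small_enough = n <= 0.05 * population_size  # sample is at most 5% of the population
    normal_enough = n * p * (1 - p) >= 10       # approximately normal
    return small_enough and normal_enough

# Hypothetical checks
ok_case = proportion_conditions_met(n=100, population_size=10_000, p=0.4)
bad_case = proportion_conditions_met(n=30, population_size=10_000, p=0.1)
print(ok_case, bad_case)
```

The second case fails because 30 × 0.1 × 0.9 = 2.7, which is below 10.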
Find the z score, then look it up in your z table
The variation in the sample will be an indicator of the variation in the population
We use the standard deviation of the sample as a measure of variation in the population. The standard deviation tells us the average distance the values are from the mean in the sample
We take the standard deviation and divide by the square root of the sample size; this = the standard error
The standard error is how spread out we would expect the means of different samples to be
90% = 1.645
95% = 1.96
99%= 2.576
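These z values do not have to be memorised; they can be recovered from the standard normal distribution. A minimal sketch using Python's standard library (the helper name `z_for_confidence` is made up):

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided z value that leaves (1 - confidence) / 2 in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

z_values = {c: round(z_for_confidence(c), 3) for c in (0.90, 0.95, 0.99)}
print(z_values)
```

The results match the table values 1.645, 1.96 and 2.576 to three decimal places.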
The more confident we want to be, the wider our confidence interval will be
T distribution - the t value depends on the sample size and the chosen level of confidence. The bigger the t, the wider our confidence interval will be
We multiply the standard error by the t value to get the margin of error. Then we add and subtract the margin of error from the sample mean to find the confidence interval.
This confidence interval gives us a range of values that we can be confident contains the mean of the population
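A sketch of the t-based interval, with the t value passed in by hand because it comes from a t table. The sample numbers are hypothetical; 2.131 is the table value for 95% confidence with 15 degrees of freedom.

```python
import math

def t_interval(mean, sd, n, t_value):
    """Confidence interval for a mean using a t value looked up for n - 1 df."""
    standard_error = sd / math.sqrt(n)
    margin = t_value * standard_error
    return mean - margin, mean + margin

# Hypothetical sample: mean 10, sample SD 4, n = 16, so df = 15
t_low, t_high = t_interval(10, 4, 16, t_value=2.131)

# Same data with the 95% z value, for comparison only
z_low, z_high = t_interval(10, 4, 16, t_value=1.96)

print(round(t_low, 3), round(t_high, 3))
print(round(z_low, 3), round(z_high, 3))
```

Because the t value is larger than the z value for the same confidence level, the t-based interval comes out wider.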
Turn percent into decimal
Write down your sample size N
Find the Z score
Step 1: change the percent to a decimal
Step 2: subtract the decimal from 1
Step 3: multiply the decimal percent by the answer you got
Step 4: divide that by the sample size
Step 5: square root the answer
Step 6: multiply that by 1.645 (the z number for 90% confidence) to get the margin of error
Step 7: subtract the margin of error from the original decimal percent for the lower bound (add it for the upper bound)
Step 8: move the decimal point two places to make it a percent
Step 9: round it
Estimation: a process whereby we select a random sample from a population and use a sample statistic to estimate a population parameter
Chapter 8
The Student's t distribution
Looks like a normal distribution but has fatter tails = more dispersion, because there is more uncertainty
Similar to the z statistic
The t statistic has n - 1 degrees of freedom; for example, with a sample size of 20 the degrees of freedom are 19
Student t table = each row indicates a different number of degrees of freedom. After 30 degrees of freedom the t table becomes almost the same as the z table = normal distribution
To find the degrees of freedom (df) for two samples, find the df of each sample and add them: (n1 - 1) + (n2 - 1), so the -1 and -1 make -2
So you add your 2 different sample sizes
Then you subtract 2 from the answer
In other words, use the number of people in each sample, not their values: add the number in one group to the number in the other group, then subtract 2
If a sample has 50 observations we use a z table instead of a t table
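The degrees-of-freedom rules can be written as two tiny functions. The names and the group sizes (12 and 15) are illustrative only.

```python
def df_one_sample(n):
    """Degrees of freedom for one sample: n - 1."""
    return n - 1

def df_two_samples(n1, n2):
    """Degrees of freedom for two samples: (n1 - 1) + (n2 - 1) = n1 + n2 - 2."""
    return (n1 - 1) + (n2 - 1)

print(df_one_sample(20))       # a sample of 20 gives 19 degrees of freedom
print(df_two_samples(12, 15))  # hypothetical groups of 12 and 15 people
```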
Hypothesis testing
Hypothesis = a premise or claim that we want to test or investigate
Null hypothesis = H0 (h zero) - the currently accepted value for a parameter = what people consider to be true based on previous studies. Then someone comes and presents an ALTERNATIVE HYPOTHESIS, which that person tests with studies or surveys to support a different claim
Alternative hypothesis - Ha - also called the research hypothesis = involves the claim to be tested
H0 & Ha are mathematical opposites
The alternative does not always have to be a greater number than the null: it can claim the parameter is greater than, less than, or not equal to the null value
You assume that the null hypothesis is true unless you have evidence against it
Possible outcomes = reject the null hypothesis (according to the evidence) or fail to reject the null hypothesis
If we fail to reject H0, we keep treating the null hypothesis (the first statement) as true; if we reject it, we say we believe the alternative hypothesis
Test statistic - calculated from sample data 📊 and used to decide if we reject or fail to reject
Ex = sample of 50 bars
The bars are supposed to weigh 5 g (this is the null hypothesis)
Monday avg 5.12 g. Wednesday avg 5.72 g: this is close to 6, so we start to wonder whether we should reject, since it is farther from what the hypothesis says. Friday avg 7.23 g
Then we reject the claim that we're making bars that weigh 5 g, because we sampled 50 bars and got 7.23 g, which is so far away from 5 g that there's no way the null hypothesis is true
We get the average value of the mass of the bars
We calculate the test statistic and use it to help determine whether the data you have is enough to reject the null hypothesis or not
Statistically significant - where we draw the line on whether we should reject the null hypothesis or not
Level of confidence - c - for example 95% or 99%
How confident are we in our decision
Level of significance - alpha (α) = 1 - confidence
So if the level of confidence is 95%
Then c is 0.95
Then α = 1 - 0.95
α = 0.05
Both say the same thing: 95% confidence is the same cutoff as a 0.05 level of significance
The = sign is the "Null" hypothesis (the boring idea that age doesn't matter). The > sign is the "Research" hypothesis.
1. The "Equals" Sign (μ1 = μ2)
This symbol means "is exactly the same as."
Which hypothesis is it? This is almost always the Null Hypothesis. It's the "boring" version where nothing special is happening.
When the research hypothesis only says the groups are different (μ1 ≠ μ2), without a direction, it is a two-tailed test.
2. The "Greater Than" Sign (μ1 > μ2)
This symbol means "is bigger than."
What it means: One group has a higher average than the other.
Which hypothesis is it? This is a Research Hypothesis. It is "directional" (a one-tailed test) because you are predicting a specific direction.
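One practical difference between the two kinds of test is the cutoff (critical) z value. A sketch using the standard library; the helper name `critical_z` is an assumption for this example, and α = 0.05 is used because it matches the 95% confidence level.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Cutoff z value for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha between both ends of the curve
    return NormalDist().inv_cdf(1 - alpha / tails)

one_tailed = round(critical_z(0.05, tails=1), 3)
two_tailed = round(critical_z(0.05, tails=2), 3)
print(one_tailed, two_tailed)
```

With α = 0.05 the one-tailed cutoff is about 1.645 and the two-tailed cutoff is about 1.96, because the two-tailed test puts only 0.025 in each tail.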
Type 1 error and Type 2 error
TYPE 1 error = reject a true null hypothesis - false positive - the probability of this error is alpha - THE LEVEL OF SIGNIFICANCE
TYPE 2 error = fail to reject a false null hypothesis - false negative - the probability of this error is beta - beta mainly depends on sample size and population variance
Probability of rejecting a false null hypothesis = 1 - beta
Power of the test = 1 - beta; increasing the sample size increases the power
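To see how sample size affects power, here is a rough sketch for a one-tailed z test where the true mean is higher than the null value. Everything in it is an assumption made for illustration: the function name, the known SD of 1.5, and the true mean of 5.5 against a null of 5.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu_true, sd, n, alpha=0.05):
    """Power (1 - beta) of a one-tailed z test of H0: mu = mu0 vs Ha: mu > mu0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)          # cutoff for rejecting H0
    shift = (mu_true - mu0) / (sd / math.sqrt(n))   # true difference in standard errors
    beta = std_normal.cdf(z_crit - shift)           # chance of a Type 2 error
    return 1 - beta

# Hypothetical: null mean 5, true mean 5.5, SD 1.5, three sample sizes
powers = {n: power_one_sided_z(5, 5.5, 1.5, n) for n in (10, 30, 100)}
for n, power in powers.items():
    print(n, round(power, 3))
```

Under these assumptions beta shrinks as n grows, so the power climbs toward 1.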
-2 5
16.66
Degrees of freedom
Sample size (n) - 1
Step | Z-test | T-test |
1. Subtract | Sample mean - population mean | Sample mean - population mean (test value) |
2. Divide | Divide by population SD / √n | Divide by sample SD / √n |
3. Result | This is your Z | This is your T |
4. Final step | Look at Z-table | Look at T-table using n−1 |
Example
If your sample mean is 10, the population mean is 8, your standard deviation is 4, and your sample size is 16:
Subtract: 10−8=2
Square Root: √16 = 4
Divide SD: 4÷4=1 (This is your Standard Error)
Final Math: 2÷1=2.0
If you used population SD, your Z = 2.0.
If you used sample SD, your T = 2.0.
Difference between a z statistic and a t statistic
Feature | Z-Statistic | T-Statistic |
When to use | When you know the Population Standard Deviation (σ). | When you only have the Sample Standard Deviation (s). |
Sample Size | Best for large samples (n>30). | Used for smaller samples (n<30), but works for any size. |
The Curve | A perfect, fixed Bell Curve. | A "flatter" Bell Curve with thicker tails. |
Confidence | You are more certain about the results. | You are being "cautious" because you're estimating the spread. |
Chapter 9
A sample is a selection of objects or observations taken from the population of interest. Inference is when we draw conclusions about the population from the sample. Different samples can yield different results, known as sampling error or variation due to sampling.
Confidence Intervals
- A confidence interval provides an estimate of a population parameter.
- It is good practice to express this estimate as a confidence interval; this communicates how accurate our estimate is likely to be.
Calculating the Bounds of a Confidence Interval
Lower Bound:
- Convert the percent to decimal.
- Subtract the decimal from 1.
- Multiply the answer obtained in step 2 by the decimal percent.
- Divide that by the sample size.
- Take the square root of the result.
- Multiply it by 1.645 (the z-value for a 90% confidence level).
- Subtract this margin of error from the original estimate and convert back to a percentage, rounding to two decimal places.
Upper Bound:
- Convert the percent to decimal.
- Subtract the decimal from 1.
- Multiply the original decimal by the answer from step 2.
- Divide that by the sample size, square root the answer, and multiply by 1.645.
- Add this margin to the original estimate and convert back to a percentage, rounding to two decimal places.
Example:
To find the mean weight of all apples on a tree:
- Ask: What is the mean weight of all apples?
- Take a sample and calculate the sample mean.
- Use a confidence interval to express the estimated range for the population parameter (the mean weight).
Width of Confidence Intervals
- The width depends on:
- Variation within the population: Low variation leads to similar samples and narrower confidence intervals. High variation leads to varied samples and wider confidence intervals.
- Sample Size: Smaller samples have more variability and less information, while larger samples provide more reliable estimates leading to narrower confidence intervals.
Calculating Confidence Intervals
There are various methods: informal, traditional normal-based, and bootstrapping.
- For traditional normal-based intervals (mean calculation):
- Begin with the z-value.
- Calculate the standard error using the standard deviation (SD) divided by the square root of sample size (n).
- Multiply the standard error by the z-value and find the confidence interval by adding and subtracting from the mean.
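Bootstrapping is listed as a method but not worked through in these notes. As a rough illustration only, a percentile bootstrap for a mean resamples the data with replacement many times and takes the middle 95% of the resampled means. The function name, the apple weights, and the choice of 2000 resamples are all assumptions for this sketch.

```python
import random
import statistics

def bootstrap_mean_interval(sample, confidence=0.95, resamples=2000, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = sorted(
        statistics.fmean(rng.choices(sample, k=len(sample)))  # resample with replacement
        for _ in range(resamples)
    )
    tail = (1 - confidence) / 2
    lower = means[int(tail * resamples)]
    upper = means[int((1 - tail) * resamples) - 1]
    return lower, upper

# Hypothetical apple weights in grams
apples = [142, 155, 149, 160, 151, 147, 158, 153, 145, 150]
boot_low, boot_high = bootstrap_mean_interval(apples)
print(round(boot_low, 1), round(boot_high, 1))
```

Unlike the normal-based method, this approach does not need a z value; the interval comes straight from the spread of the resampled means.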
Point Estimates and Interval Estimates
- Point estimates are single sample statistics used to estimate a population parameter.
- Interval estimates (confidence intervals) provide a range of values within which the population parameter may fall.
Requirements for Confidence Intervals of Population Proportion
- The sample size should not exceed 5% of the population size.
- n × p × (1 - p) ≥ 10; a sample that meets this requirement has an approximately normal sampling distribution.
Understanding Variation and Standard Deviation
- The standard deviation indicates how much variation exists from the mean.
- The standard error represents how spread out the means of different samples would be.
Confidence Levels:
- 90% confidence level corresponds to a z-value of 1.645.
- 95% confidence level corresponds to a z-value of 1.96.
- 99% confidence level corresponds to a z-value of 2.576.
T Distribution:
- Similar to the normal distribution but has wider tails (more uncertainty).
- Depends on sample size and confidence levels used.
Hypothesis Testing:
- Hypothesis: A claim to test.
- Null Hypothesis (H0): Accepted value for a parameter.
- Alternative Hypothesis (Ha): Claim to be tested.
- Outcomes can be rejecting or failing to reject the null hypothesis based on evidence.