Statistics (the science)
The science of planning studies and experiments, obtaining data, and organizing,
summarizing, analyzing, and interpreting those data and then drawing conclusions based on them
1st step of conducting a statistical analysis
Prepare: Consider the population, data types, and sampling method
2nd step of conducting a statistical analysis
Analyze: Describe the data you collected and use appropriate statistical methods to help with drawing conclusions
3rd step of conducting a statistical analysis
Conclude: Using statistical inference, make reasonable judgments and answer broad questions
Data
Collections of observations, such as measurements, counts, descriptions, or survey responses
Population
The complete collection of all data that we would like to better understand or describe
Sample
A subset of members selected from a population
Parameter
a numerical measurement describing some characteristic of a population
Statistic
a numerical measurement describing some characteristic of a sample
Quantitative Data
consists of numbers representing counts or measurements
Qualitative/Categorical Data
consists of names or labels (not numbers that represent counts or measurements)
Discrete Data
result when the data values are quantitative and the number of values is finite or “countable.”
Continuous Data
result from infinitely many possible quantitative values, where the collection of values is not countable.
Biased Samples
Samples that are more likely to produce some outcomes than others. The resulting statistics may be too high or too low
Convenience Samples
Samples that are easy to collect but often have some bias or do not represent the population in general
Volunteer Responses
a self-selected sample of people who respond to a general appeal
Simple Random Sample (SRS)
A sample of n subjects is selected in such a way that every possible sample of the
same size n has the same chance (or probability) of being chosen.
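A minimal sketch of drawing an SRS with Python's standard library; the population of 100 labeled subjects is hypothetical:

```python
# Simple random sample: every possible sample of size n is equally likely.
import random

population = list(range(1, 101))  # hypothetical population of 100 labeled subjects
random.seed(0)                    # fixed seed for reproducibility
srs = random.sample(population, k=10)  # each possible 10-subject sample is equally likely
print(srs)
```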
Stratified Sample
Subdivide the population into at least two different subgroups (or strata) so that the subjects within the same subgroup share the same characteristics. Then draw a sample from each subgroup (or stratum). The number sampled from each stratum may be done proportionally with respect to the size of the population.
Cluster Sample
Divide the population area into naturally occurring sections (or clusters) then randomly select some of those clusters and choose all the members from those selected clusters.
Systematic Sample
Select some starting point and then select every kth element in the population. This works
well when units are in some order (assembly lines, houses on a block, etc.).
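A sketch of systematic sampling under the same hypothetical population; the starting point is random, then every kth unit is taken:

```python
# Systematic sample: random starting point, then every kth element.
import random

population = list(range(1, 101))  # hypothetical ordered units, e.g. houses on a block
k = 10
random.seed(1)
start = random.randrange(k)   # random starting point among the first k units
sample = population[start::k] # every kth element after the start
print(sample)
```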
Multistage Sample
Collect data by using some combination of the basic sampling methods.
Bad Sampling Frame
When attempting to list all members of a population, some subjects are missing. It
can be difficult to obtain a full, complete list.
Undercoverage
The sampling frame is missing groups from the population or the groups have smaller
representation in the sample than in the population.
Non-response Bias
Some part of the population chooses not to respond, or subjects were selected but
are not able to be contacted.
Response Bias
Responses given to questions or surveys are not truthful. This may occur when people
are unwilling to reveal personal matters, admit to illegal activity, or otherwise tailor their responses to “please” the investigator.
Wording and Order
The way questions are worded may be leading or inflammatory to elicit a particular
response. The order in which questions are asked may influence the answers.
x-bar
the sample mean
Mu
the population mean
Mean
Uses every data value
Highly affected by outliers
Not good for skewed data sets (but is best for symmetric data!)
Median
Not affected by outliers
Can use with any data set
Mode
Not necessarily in the center
Not affected by outliers
Most useful for qualitative data or multimodal data sets
Histogram
A graph with a horizontal scale representing classes of quantitative data values and a vertical scale representing frequency.
Dotplot
shows each value in a dataset as a dot above a number line, no y-axis
Standard Deviation (sigma)
a measure of how much data values deviate from the mean
Variance
Standard Deviation squared
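A sketch using the standard `statistics` module on a small hypothetical sample, showing that the variance is the standard deviation squared:

```python
import statistics

data = [4, 8, 6, 5, 3, 7]        # hypothetical sample
mean = statistics.mean(data)
sd = statistics.stdev(data)      # sample standard deviation (divides by n - 1)
var = statistics.variance(data)  # sample variance
assert abs(var - sd**2) < 1e-9   # variance = standard deviation squared
print(mean, round(sd, 3), round(var, 3))
```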
Experiment
The process of applying some treatment and then observing its effects; has a control group and a treatment group.
Observational Study
The process of observing and measuring specific characteristics without attempting to
modify the individuals being studied
Response Variable
measures an outcome of a study
Explanatory Variable
explains or influences changes in the response variable
Reasons for variability in responses
Treatment effects
Experimental error
Confounding variables
Lurking variables
Control in Experimental Design
control the effects of lurking/confounding variables and other sources of variability on the
response by carefully planning the study
Randomization in Experimental Design
randomly assign experimental units to treatments to reduce or eliminate bias
Replication in Experimental Design
measure the effect of each treatment on many units to reduce chance variation in the
results
Completely Randomized Design
participants are randomly assigned to treatments (including control
groups)
Randomized Block Design
the experimenter divides participants into subgroups called blocks, such that the variability within blocks is less than the variability between blocks. Then, participants within each block are randomly assigned to treatment groups
Matched Pairs Design
used when the experiment has only two treatment groups; and participants can be grouped into pairs, based on one or more blocking variables. Then, within each pair, participants are randomly assigned to different treatments.
Bias of the Subjects
subjects may want to please the researcher or hope for a specific outcome
Hawthorne Effect
When people behave differently because they know they are being watched
Bias of the Researcher
They may assign subjects to groups or report results in a biased way, and may treat people or animals differently when holding certain expectations of their treatment
Blinding
when individuals associated with an experiment are not aware of how subjects have been assigned
Single Blind Study
those who could influence the results are blinded
Double Blind Study
those who evaluate the results are blinded, as well as those who could influence them
z-score
the number of standard deviations away from the mean a certain data value is
Positive z-score
data value is above average
Negative z-score
data value is below average
Standardizing
The process of converting a data value (x) to a z-score
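A sketch of standardizing with hypothetical values (an IQ-style scale):

```python
# z-score: the number of standard deviations a data value lies from the mean.
mu, sigma = 100, 15   # hypothetical population mean and standard deviation
x = 130
z = (x - mu) / sigma  # standardizing: z = (x - mu) / sigma
print(z)              # positive z: the value is above the mean
```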
Significantly low values
considered significant or unusual if they are (µ − 2σ) or lower
Significantly high values
considered significant or unusual if they are (µ + 2σ) or higher
Values not significant
Between (µ − 2σ) and (µ + 2σ)
Density Curve
Probability is represented by the area underneath it
Normal Distribution properties
Mean, median, and mode are equal
Normal curve is bell-shaped and symmetric about the mean
Total area under the curve is equal to 1
Normal curve approaches, but never touches, the x-axis
Standard Normal Distribution
distribution of z-scores
Percentile
To find an x-value when given a probability (percentile), solve the z-score formula for x: x = μ + zσ
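A sketch of finding an x-value from a given probability using `statistics.NormalDist`; the test-score distribution below is hypothetical:

```python
# Percentile: get z from the standard normal, then solve x = mu + z*sigma.
from statistics import NormalDist

mu, sigma = 500, 100            # hypothetical test-score distribution
z = NormalDist().inv_cdf(0.90)  # z-score for the 90th percentile
x = mu + z * sigma              # solve the z-score formula for x
print(round(z, 4), round(x, 1))
```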
Probability Distribution
describes how likely the values of the variable are to occur
Binomial Random Variable four criteria
There are a fixed number of trials/observations (n)
The trials are independent of each other
Each outcome is either a success (s), the outcome being counted, or a failure (f)
The probability of a success P(S) = p is constant for each trial
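When the four criteria hold, binomial probabilities follow P(X = k) = C(n, k)·p^k·(1−p)^(n−k). A sketch with a hypothetical fair-coin example:

```python
from math import comb

n, p = 10, 0.5  # hypothetical: 10 independent fair-coin flips, success = heads
k = 4
prob = comb(n, k) * p**k * (1 - p)**(n - k)  # probability of exactly 4 heads
print(round(prob, 4))
```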
Summarize the shapes of Binomial Distributions
For small n, the shape tends to be skewed
As n increases, we see more bell-shaped/symmetric distributions (for any p).
When p is closer to 0 or 1, the shape starts to skew
p
population proportion
p-hat
sample proportion
In order to look at the distribution of a statistic, we need to know
the possible values of the random variable and how likely they are to occur
Standard Error
The standard deviation of the sample mean; it gets smaller as the sample size increases
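A sketch (with a hypothetical population standard deviation) showing the standard error σ/√n shrinking as n grows:

```python
from math import sqrt

sigma = 12.0  # hypothetical population standard deviation
ses = [sigma / sqrt(n) for n in (25, 100, 400)]  # standard errors of the sample mean
print(ses)    # each quadrupling of n halves the standard error
```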
Point Estimate of a Parameter
the value of the sample statistic that corresponds to that parameter
Level of Confidence (C)
the probability that the interval estimate contains / captures the population parameter
Confidence Interval (CI)
a range/interval of values used to estimate the true value of a population parameter
Margin of Error (MOE)
tells us the amount of random sampling error in our results and how far we might be off
How to narrow a confidence interval
Decrease the confidence level
Increase the sample size (the standard error gets smaller as the sample size increases)
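A sketch (hypothetical values) of a z-based confidence interval for a mean, x̄ ± z*·σ/√n; lowering the confidence level or raising n narrows the interval:

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 50.0, 8.0, 64  # hypothetical sample mean, sigma, and sample size
conf = 0.95
z_star = NormalDist().inv_cdf((1 + conf) / 2)  # critical value (about 1.96 for 95%)
moe = z_star * sigma / sqrt(n)                 # margin of error
ci = (x_bar - moe, x_bar + moe)
print(ci)
```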
Null Hypothesis
H0: Only claims using =. We assume the equality value in the null hypothesis is true and conduct the test under this assumption.
Alternative Hypothesis
HA: The complement of the null. Uses only <, >, or ≠, never equality
Type I Error
if the null hypothesis is rejected when it is actually true
Type II Error
if the null hypothesis is not rejected when it is actually false
Left-Tailed Test
we are only interested in showing that the parameter is less than a particular value
Right-Tailed Test
we are only interested in showing that the parameter is more than a particular value
Two-Tailed Test
we are interested in showing that the parameter is not equal to a particular value (less than or more than)
P-value (probability value)
the probability of observing this value or something more extreme, under the assumed distribution of the null hypothesis
If the p-value is less than α
reject the null
If the p-value is greater than α
fail to reject the null
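A sketch of the p-value decision rule for a two-tailed z-test; the test statistic below is hypothetical:

```python
from statistics import NormalDist

alpha = 0.05
z = 2.31                                       # hypothetical test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value
decision = "reject the null" if p_value < alpha else "fail to reject the null"
print(round(p_value, 4), decision)
```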
If 0 is not included in the confidence interval for the difference of means
Then the means are significantly different
Confidence intervals and 2-sided hypothesis tests are
equivalent
Correlation Coefficient: r
a measure of the strength and the direction of a linear relationship between two variables
Strong r values
values greater than 0.8 or smaller than -0.8
Moderate r values
values between 0.5 and 0.8 or between -0.8 and -0.5
Weak r values
values between -0.5 and 0.5 (closer to 0)
Residual
observed y (points) - predicted y (points on line)
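A sketch of computing a residual; the fitted slope and intercept are hypothetical:

```python
# Residual = observed y minus the y predicted by the regression line.
b0, b1 = 2.0, 0.5          # hypothetical intercept and slope: y_hat = b0 + b1*x
x, y_observed = 10, 8.0
y_predicted = b0 + b1 * x  # point on the line
residual = y_observed - y_predicted
print(residual)            # positive: the observed point lies above the line
```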
Regression line
Best fitting straight line of the sample data
β0
intercept of the population regression model and is the expected value (mean) of Y when x=0
β1
the slope of the population regression model, and is the expected change in Y relative to one unit change in x
The smaller the SSE (sum of squares error),
the better the line fits
Condition 1 for Linear Regression: Linear data
If the data do not have a linear association/correlation, then a linear regression model is not a good choice
Condition 2 for Linear Regression: Constant Variance
The errors/deviations around the regression line should be the same at each value of x
Coefficient of Determination: r-squared
the proportion of observed y variation that can be explained by the simple linear regression model