Statistics Normal, Binomial, Poisson, & Chi-Square

Statistics Final Flashcards (ST 311 NCSU)

Description and Tags

Statistics

Normal, Binomial, Poisson, & Chi-Square

st 311

correlation and regression

statistics

nc state

University/Undergrad

197 Terms

Statistics

the science of planning studies and experiments, obtaining data, and organizing, summarizing, analyzing, and interpreting those data and then drawing conclusions based on them

New cards

Conducting a statistical study includes 3 phases:

Prepare: consider the population, data types, and sampling method
Analyze: describe the data you collected and use appropriate statistical methods to help with drawing conclusions
Conclude: using statistical inference, make reasonable judgements and answer broad questions

New cards

Data

collections of observations, such as measurements, counts, descriptions, or survey responses

New cards

Population

the complete collection of all measurements or data that are being considered. Typically, a population is the complete collection of all data we would like to better understand or describe. We also call it the population of interest

New cards

Sample

a subset of members selected from a population (random)

New cards

Parameter

a numerical measurement describing some characteristic of a population

New cards

Statistic

a numerical measurement describing some characteristic of a sample

New cards

Quantitative (numerical) data

consists of numbers representing counts or measurements (2 types: discrete or continuous)

New cards

Categorical data (qualitative)

consists of names or labels (NOT numbers)

New cards

Discrete data (quantitative)

result when the data values are quantitative and the number of values is finite or countable (ex: # of tosses of a coin before getting tails)

New cards

Continuous data (numerical)

result from infinitely many possible quantitative values where the collection of values is not countable (ex: the arm spans in inches of high school seniors)

New cards

Our goal is to answer a question about a ___

population

New cards

We want our sample to be random and ___ of the population

representative

New cards

Simple Random Sample (SRS)

A sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance (probability) of being chosen

New cards

Stratified Sample

Subdivide the population into 2+ subgroups (or strata) so that the subjects in the same subgroup share the same characteristics. Then draw a sample from each subgroup. The number sampled from each stratum may be set proportionally to that stratum's share of the population.

New cards

Cluster Sample

Divide the population area into naturally occurring sections (or clusters), then randomly select some of these clusters and choose all the members from those selected clusters

New cards

Systematic Sample

select some starting point and then select every kth element in the population. Works well when units are in the same order like an assembly line
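As a rough illustration of the sampling cards above, the sketch below draws a simple random, systematic, stratified, and cluster sample from a made-up population using Python's standard `random` module. The population, strata, cluster names, seed, and helper function names are all hypothetical and chosen only for this example.

```python
import random

def simple_random_sample(population, n, rng):
    """Every possible sample of size n is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, k, start):
    """Pick a starting index, then take every k-th element."""
    return population[start::k]

def stratified_sample(strata, fraction, rng):
    """Draw from every stratum, proportionally to its size."""
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(311)            # fixed seed so the run is repeatable
population = list(range(1, 101))    # hypothetical subject IDs 1..100

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, 3)   # IDs 4, 14, 24, ...

strata = {"freshman": list(range(1, 61)), "senior": list(range(61, 101))}
stratified = stratified_sample(strata, 0.1, rng)    # 6 freshmen + 4 seniors

clusters = {"dorm_a": [1, 2, 3], "dorm_b": [4, 5], "dorm_c": [6, 7, 8, 9]}
clustered = cluster_sample(clusters, 2, rng)
```

Note the difference between the last two: the stratified sample takes some members from every subgroup, while the cluster sample takes every member from some subgroups.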

New cards

Multistage sample

Collect data by using some combination of the basic sampling methods

New cards

Convenience Sampling

Select the first k # of subjects that you come across

New cards

Bad Sampling Frame

When attempting to list all members of a population, some subjects are missing. It can be difficult to make a complete list

New cards

Non-response bias

Some part of the population chooses not to respond, or subjects were selected but are not able to be contacted

New cards

Response bias

Responses to questions are not truthful. This may occur when people are unwilling to reveal personal matters or admit to illegal activity, or when they tailor their responses to “please” the investigator

New cards

Wording and Order Bias

The way questions are worded may be leading or inflammatory in order to elicit a particular response, or the order of the questions may influence the answers.

New cards

Measure of center

a value at or near the center or middle of a data set, “typical” values for a group

EX: mean, median, mode

New cards

Σ

denotes a sum, “sigma”

New cards

x

denotes an individual data value

New cards

n

denotes # of values in a sample, “sample size”

New cards

N

denotes number of values in a population

New cards

x̄

denotes the sample mean, “x bar”

New cards

μ

denotes the population mean, “mu”

New cards

Mean

found by adding all values and dividing by the number of values in the set. A sample mean is the mean of a sample. A population mean is the mean of an entire population.

New cards

Median

the value that is in the middle when listed in ascending order. Shows what # separates the bottom 50% of the data from the top 50%. Roughly half of all values are below, and half are above it.

New cards

Mode

the value that occurs with the greatest frequency. Could be no mode. One mode: unimodal, two modes: bimodal, more than two modes: multimodal
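A minimal sketch of the three measures of center, using Python's standard `statistics` module on a small made-up data set (the numbers are hypothetical):

```python
import statistics

data = [2, 3, 3, 5, 7, 10]          # hypothetical sample values

mean = statistics.mean(data)        # (2+3+3+5+7+10) / 6 = 5
median = statistics.median(data)    # middle of the sorted list: (3+5)/2 = 4.0
modes = statistics.multimode(data)  # most frequent value(s): [3]

# A data set with two equally frequent values is bimodal.
bimodal = statistics.multimode([1, 1, 2, 2, 3])   # [1, 2]
```

One caveat: when every value occurs equally often, `multimode` returns all of them, whereas the card above would describe that data set as having no mode.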

New cards

Histogram

the graph of a frequency distribution, a graph of bars of equal width drawn adjacent to each other, a horizontal scale representing classes of quantitative data values, a vertical scale (height) represents frequency

New cards

Dotplot

shows each value in a dataset as a dot above a number line

New cards

Measures of variation (or spread)

Range, IQR, variance, standard deviation

New cards

Range

max data value - min data value (highly affected by outliers)

New cards

Interquartile Range (IQR)

uses quartiles to provide a range of values that are not as affected by potential outliers as the range

(Q1, Q2, Q3)…1/4 of the data lies between 2 consecutive quartiles

IQR= Q3-Q1

New cards

The 3 quartiles together with the min and max values constitute the 5-number summary:

minimum
Q1 (median of the first half of the dataset)
Median
Q3 (median of the second half of the dataset)
Maximum
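The 5-number summary, range, and IQR can be sketched as follows. This follows the card's "median of each half" definition of Q1 and Q3; as an assumption of this sketch, the overall median is left out of both halves when the number of values is odd (textbooks and software differ on that detail). The function name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: with an odd count, skip the middle value itself.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )

data = [1, 3, 4, 6, 7, 9, 12, 15, 40]   # 40 is a deliberate outlier
minimum, q1, med, q3, maximum = five_number_summary(data)

data_range = maximum - minimum   # 39, stretched by the outlier
iqr = q3 - q1                    # 13.5 - 3.5 = 10.0, unaffected by the 40
```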

New cards

Variance

(Standard deviation)²

New cards

Standard deviation

sqrt(variance)

Defined as a measure of how much data values deviate from the mean. Its value is never negative and is zero ONLY when all data values are the same; larger values indicate greater amounts of variation. SD can increase a lot with one or more outliers, and its units are the same as the units of the original data values.
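A short sketch of variance and standard deviation with Python's standard `statistics` module. Note that `variance` and `stdev` treat the data as a sample (dividing by n − 1), while `pvariance` and `pstdev` treat it as a whole population (dividing by N); the data values are hypothetical.

```python
import statistics

data = [4, 8, 6, 5, 7]   # hypothetical values, mean = 6

sample_var = statistics.variance(data)   # 10 / (5 - 1) = 2.5
sample_sd = statistics.stdev(data)       # sqrt(2.5)
pop_var = statistics.pvariance(data)     # 10 / 5 = 2
pop_sd = statistics.pstdev(data)         # sqrt(2)

# Identical values give a standard deviation of zero.
no_spread = statistics.pstdev([3, 3, 3, 3])

# A single outlier inflates the standard deviation noticeably.
with_outlier_sd = statistics.stdev(data + [40])
```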

New cards

σ²

Population variance

New cards

σ or s

standard deviation (σ for a population, s for a sample)

New cards

s²

sample variance

New cards

Experiment

the process of applying some treatment and then observing the effect

almost always compares 2+ groups: treatment and control group
the individuals in an experiment are called units

New cards

Control group

no treatment

New cards

Units

the individuals in an experiment

New cards

Observational study

the process of observing and measuring specific characteristics without attempting to modify the individuals studied

tell “what’s happening” and can’t describe cause-effect relationships
accessing reliable records counts as observational

New cards

Response variable

measures outcome of a study

New cards

explanatory variable

explains/influences changes in the response variable

New cards

Design of experiment

plan for collecting the sample

New cards

Treatment

a specific experimental condition applied to the units/subjects

New cards

Variability in Experiments

There will be variability from treatment effects, experimental error, lurking variables, and confounding variables

New cards

Treatment effects

different treatments cause different outcomes

New cards

Experimental error

variability among observed values of the response variable for units receiving the same treatment; it should be kept as small as possible

New cards

Lurking variables

a variable that is not among the explanatory variables in a study but still has an impact on the response

New cards

Confounding variables

2 variables are confounded when their effects on the response variable can’t be distinguished

New cards

Principles of Experiment Design

Control, randomization, and replication

New cards

Control

Control the effects of lurking/confounding variables by carefully planning

New cards

Randomization

randomly assign experimental units to treatments to decrease bias

New cards

Replication

measure the effect of each treatment on many units to reduce chance variation

New cards

Completely Randomized Design

participants randomly assigned to treatments, so lurking variables affect each group equally

New cards

Randomized Block Design

the experimenter divides participants into subgroups called blocks, so that variability within blocks is less than variability between blocks. Then, participants within each block are randomly assigned to treatment groups.
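The two designs above can be sketched as follows, assuming equal-sized treatment groups. The unit labels, block names, seed, and function names are hypothetical and only illustrate the idea of randomizing over everyone versus randomizing inside each block.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle all units, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

def randomized_block(blocks, treatments, rng):
    """Randomize separately inside each block, then pool by treatment."""
    groups = {t: [] for t in treatments}
    for members in blocks.values():
        within = completely_randomized(members, treatments, rng)
        for t in treatments:
            groups[t].extend(within[t])
    return groups

rng = random.Random(311)                       # fixed seed for repeatability
treatments = ["treatment", "control"]
units = [f"unit{i}" for i in range(1, 13)]     # 12 hypothetical participants

crd = completely_randomized(units, treatments, rng)

# Hypothetical blocking variable: age group.
blocks = {"younger": units[:6], "older": units[6:]}
rbd = randomized_block(blocks, treatments, rng)
```

In the block design every treatment group is guaranteed three younger and three older participants, while the completely randomized design only balances the age groups on average.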

New cards

Matched Pairs Design

a special case of randomized block design; used when only 2 treatment groups are present. Participants are grouped in pairs on one or more blocking variables. Then, in each pair, participants are randomly assigned to different treatments

New cards

Placebo

false drug that subjects believe is real

New cards

Placebo effect

tendency to react to a drug/treatment regardless of function

New cards

Bias of Subjects

subjects may want to please researcher/hope for specific outcome (Hawthorne Effect, when people behave differently b/c they know they are being watched)

New cards

Bias of Researchers

people behave in ways that favor what they believe; researchers may assign subjects to groups/report results in a biased way

New cards

Blinding

when individuals in experiments are not aware of how subjects are assigned, so they are less likely to respond with bias

New cards

Single-blind study

those who could influence the results are blinded

New cards

Double-blind study

those who evaluate the results are blinded too

New cards

z-score

the number of standard deviations away from the mean a certain data value is

New cards

positive z-score

data value is above average

New cards

negative z-score

data value is below average

New cards

Standardizing

the process of converting a data value (often labeled x) to a z-score

New cards

𝑧 = (𝑥−𝜇) / 𝜎

converting x-value to z-score
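A minimal sketch of standardizing, assuming a hypothetical exam with mean 70 and standard deviation 10 (the function name and numbers are made up for illustration):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

above = z_score(85, 70, 10)    # 1.5  -> above average
below = z_score(55, 70, 10)    # -1.5 -> below average
at_mean = z_score(70, 70, 10)  # 0.0  -> exactly average
```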

New cards

Empirical Rule

When a distribution is bell-shaped/normal, the mean and standard deviation have the following relationship:

About 99.7% of the data is within 3 standard deviations of the mean, 95% of the data is within 2 SD’s, and 68% of the data is within 1 SD of the mean (34% lies between the mean and 1 SD below it, and 34% lies between the mean and 1 SD above it).

The 34, 14, 2.5 rule
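The percentages in the Empirical Rule are rounded areas under the standard normal curve. A quick check with `statistics.NormalDist` from Python's standard library (variable names are hypothetical):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area between -k and +k standard deviations, for k = 1, 2, 3.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```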

New cards

Significantly low value

values are generally considered significant or unusual if they are (μ − 2σ) or lower

New cards

Significantly high value

values are generally considered significant or unusual if they are (μ + 2σ) or higher

New cards

Values not significant

between (μ − 2σ) and (μ + 2σ)
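The three cards above amount to a simple cutoff rule at two standard deviations from the mean. A sketch, with a hypothetical function name and hypothetical values for the mean (70) and standard deviation (10):

```python
def classify(x, mu, sigma):
    """Label a value using the 2-standard-deviation cutoffs."""
    if x <= mu - 2 * sigma:
        return "significantly low"
    if x >= mu + 2 * sigma:
        return "significantly high"
    return "not significant"

# With mu = 70 and sigma = 10 the cutoffs are 50 and 90.
low = classify(45, 70, 10)       # "significantly low"
high = classify(92, 70, 10)      # "significantly high"
typical = classify(75, 70, 10)   # "not significant"
```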

New cards

We will use a significance % of ___ as a general guide for significant values

New cards

Density curve

the curve we get if we scale the bell curve model so that the total area under the curve = 1

New cards

Probability, in a continuous probability distribution, is consequently the ____ the density curve.

area under

New cards

Probability Statement

P(smaller # ≤ x ≤ bigger #)
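For a normal model, a probability statement of this form is the area under the density curve between the two numbers, which can be computed as a difference of cumulative areas. The sketch below assumes a hypothetical variable that is normal with mean 100 and standard deviation 10, and also checks that converting the endpoints to z-scores gives the same area under the standard normal curve.

```python
from statistics import NormalDist

def prob_between(a, b, mu, sigma):
    """P(a <= X <= b) for a normal X: area under the curve from a to b."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

mu, sigma = 100, 10                  # hypothetical mean and SD
p_x = prob_between(90, 120, mu, sigma)

# Same question after standardizing: z = (x - mu) / sigma.
z_low = (90 - mu) / sigma            # -1.0
z_high = (120 - mu) / sigma          # 2.0
p_z = prob_between(z_low, z_high, 0, 1)

print(round(p_x, 4), round(p_z, 4))  # both about 0.8186
```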

New cards

The graph of a normal distribution is called the

normal curve

New cards

In a normal curve…

The mean, median, and mode are EQUAL

The normal curve is bell-shaped and is symmetric about the mean.

The total area under the normal curve is EQUAL TO 1.

The normal curve approaches, but never touches, the x-axis as it extends further away from the mean.

New cards

Distribution of z-scores

Standard normal distribution

New cards

Notation

X ~ N(μ, σ) where the ~ symbol reads “is distributed”

New cards

The random variable X is distributed normally with mean μ and SD σ, and

Z ~ N(0,1)

New cards

Distribution

describes the possible values of a variable, how often they occur, and what pattern they create

New cards

Probability distribution

does the same thing as other distributions but describes how likely (instead of how often) the values of the variable are to occur

New cards

Continuous Random Variable

has an uncountable number of possible outcomes, represented by an interval on the number line

New cards

Discrete Random Variable

has a finite or countable number of possible outcomes that can be listed. Countable refers to the fact that there might be infinitely many values, but they can be associated with a counting process.

New cards

Criteria for Binomial Distribution

There are a fixed number of trials/observations, labeled n.
The trials are independent (the outcome of any individual trial doesn’t affect the probabilities in the other trials)
Each outcome can be classified as a success or failure. The outcome that a random variable is counting is labeled the success.
The probability of a success is constant for each trial. The probability of success is denoted by P(S) = p.

New cards

Binomial Notation

X ~ Bin (n,p)

New cards

parameters of the distribution

number of trials (n), probability of success (p)
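Given the two parameters n and p, the probability of exactly k successes is the standard binomial formula C(n, k) · p^k · (1 − p)^(n − k). The cards do not state this formula, so treat the sketch below as a supplement; the function name and the coin-flip example are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) when X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 fair coin flips, success = heads.
n, p = 10, 0.5
p_three = binomial_pmf(3, n, p)     # 120 / 1024 = 0.1171875

# Probabilities over all possible counts 0..n add up to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
```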

New cards

Expected Value

E(X), the mean of a random variable

New cards

The expected value of a random variable is a ___

weighted mean of the outcomes
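As a weighted mean, the expected value of a discrete random variable multiplies each outcome by its probability and adds the results. A minimal sketch, with a hypothetical function name and two made-up distributions:

```python
def expected_value(distribution):
    """Weighted mean: sum of outcome * probability over all outcomes."""
    return sum(x * p for x, p in distribution.items())

# Fair six-sided die: each face has probability 1/6.
die = {face: 1 / 6 for face in range(1, 7)}
ev_die = expected_value(die)        # (1+2+3+4+5+6) / 6, about 3.5

# Unequal weights: 0*0.5 + 1*0.3 + 2*0.2, about 0.7.
skewed = {0: 0.5, 1: 0.3, 2: 0.2}
ev_skewed = expected_value(skewed)
```

The expected value need not be a possible outcome: no die roll ever shows 3.5.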