Statistics
When referring to the field, this is the science of planning studies and experiments; obtaining data; and organizing, summarizing, analyzing, and interpreting those data.
Prepare
First phase of constructing a statistical study. Consider the population, data types, and sampling method.
Analyze
Second phase of constructing a statistical study. Describe the data you collected and use appropriate statistical methods to help with drawing conclusions.
Conclude
Third phase of constructing a statistical study. Using statistical inference, make reasonable judgments and answer broad questions.
Data
Collections of observations, such as measurements, counts, descriptions, or survey responses. Data help us understand our world. They vary and are imperfect.
Population
The complete collection of all measurements or data that are being considered. Typically is the complete collection of all data that we would like to better understand or describe. Also called the population of interest.
Sample
A subset of members selected from a population. For good results, this should be random and representative of the population.
Parameter
A numerical measurement describing some characteristic of a population. Note: both this and population begin with p!
Statistic
A numerical measurement describing some characteristic of a sample. Different from the field.
Quantitative data
Also known as numerical data. Consists of numbers representing counts or measurements. Examples: Age of a professional athlete, weight of a letter.
Categorical data
Also known as qualitative or attribute data. Consists of names or labels. When numbers are used as labels, it still pertains to this. Examples: college major, hometown.
Discrete data
Results when the data values are quantitative and the number of possible values is finite or “countable.” Example: the number of tosses of a coin before getting tails.
Continuous data
Results from infinitely many possible quantitative values, where the collection of values is not countable.
Important information
What you want to know and who you want to know it about.
Biased samples
Samples that are more likely to produce some outcomes than others. The resulting statistic may be too high or too low.
Convenience samples
Samples that are easy to collect. Often have some bias or do not represent the population in general.
Volunteer response sample
A self-selected sample of people who respond to a general appeal.
Random samples
Samples in which chance determines which units are selected. They lead to results that follow a predictable pattern.
Simple random sample
A sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen. Compile a numbered list of the units in the population. Then use a computer, calculator, or table to generate random numbers. Those whose numbers are generated are selected to be in the sample.
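A minimal sketch of the numbered-list procedure above, using Python's standard `random` module as the random number generator. The population of 100 numbered units and the sample size of 10 are made up for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n units chosen so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # the seed is only for reproducibility
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical numbered list of the units in the population
units = list(range(1, 101))
srs = simple_random_sample(units, 10, seed=1)
print(srs)
```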
Stratified sample
Subdivide the population into at least two different subgroups (strata) so that the subjects within the same subgroup share the same characteristics. Then draw a sample from each subgroup. The number sampled from each stratum may be chosen in proportion to that stratum's share of the population.
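A rough sketch of proportional allocation, assuming the strata are already known. The stratum names and sizes are hypothetical, and rounding means the total can differ slightly from the requested size in general.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample from every stratum, sized proportionally."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation, rounded; keep at least one unit per stratum.
        k = max(1, round(total_n * len(members) / population_size))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population of 100 students split into three strata
strata = {
    "first_year": [f"F{i}" for i in range(60)],
    "second_year": [f"S{i}" for i in range(30)],
    "third_year": [f"T{i}" for i in range(10)],
}
chosen = stratified_sample(strata, total_n=20, seed=2)
stratum_sizes = {name: len(units) for name, units in chosen.items()}
print(stratum_sizes)  # {'first_year': 12, 'second_year': 6, 'third_year': 2}
```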
Cluster sample
Divide the population area into naturally occurring sections (clusters), then randomly select some of those clusters and choose all the members from the selected clusters.
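A small sketch of the same idea: the random choice is made among clusters, and then every member of a chosen cluster is kept. The classroom names and members are invented for the example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep all members of each one."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen_names}

# Hypothetical clusters: classrooms in a school
classrooms = {
    "room_101": ["Ana", "Ben", "Cy"],
    "room_102": ["Dee", "Eli"],
    "room_103": ["Fay", "Gus", "Hal", "Ida"],
    "room_104": ["Jo", "Kit"],
}
picked_rooms = cluster_sample(classrooms, n_clusters=2, seed=3)
print(picked_rooms)
```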
Systematic sample
Select some starting point and then select every kth element in a population. Works well when units are in some order.
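A minimal sketch of taking every kth element, assuming the units are already in order. Here the starting point is chosen at random among the first k units, which is one common choice and not the only one.

```python
import random

def systematic_sample(ordered_units, k, seed=None):
    """Pick a random start among the first k units, then take every kth unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)           # index 0 .. k-1
    return ordered_units[start::k]

# Hypothetical ordered list of 100 units
ordered = list(range(1, 101))
every_tenth = systematic_sample(ordered, k=10, seed=4)
print(every_tenth)
```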
Multistage sample
Collect data by using some combination of the basic sampling methods.
Bad sampling frame
When attempting to list all members of a population, some subjects are missing. It can be difficult to obtain a full, complete list.
Undercoverage
The sampling frame is missing groups from the population or the groups have smaller representation in the sample than in the population.
Non-response bias
Some part of the population chooses not to respond, or subjects were selected but are not able to be contacted.
Response bias
Responses given to questions or surveys are not truthful. This may occur when people are unwilling to reveal personal matters, admit to illegal activity, or otherwise tailor their responses to “please” the investigator.
Wording and order
The way questions are worded may be leading or inflammatory to elicit a particular response. The order in which questions are asked may influence the answers.
Experiment
The process of applying some treatment and then observing its effects. Almost always compares two (or more) groups, typically a treatment group and a control group. The type of study best suited to establishing causation.
Experimental units
The individuals in experiments. Called subjects when referring to people. Object upon which the response is measured or individuals on which the treatment is done.
Observational study
The process of observing and measuring specific characteristics without attempting to modify the individuals being studied. Tells what is happening but cannot establish cause-and-effect relationships. Good for establishing whether two variables are related, or for learning characteristics of a population.
Response variable
Measures the outcome of a study.
Explanatory variable
Explains or influences changes in the response variable.
Design of experiment
The plan for assigning treatments to experimental units and collecting the data.
Treatment
A specific experimental condition applied to the units or subjects.
Treatment effects
What we’re looking for in an experiment. Different treatments causing different outcomes.
Experimental error
Variability among observed values of the response variable for experimental units that receive the same treatment. We want this to be as small as possible.
Lurking variables
A variable that is not among the explanatory variables in a study and yet may influence the interpretation of the relationship among response and explanatory variables.
Confounding variables
Two variables are confounded when their effects on the response variable cannot be distinguished from each other.
Control group
A group that receives no treatment and is used as a baseline or comparison for the treatment group.
Randomization
Randomly assign experimental units already in a sample to a treatment group to reduce or eliminate bias.
Replication
Measure the effect of each treatment on many units to reduce chance variation in the result.
Completely randomized design
Participants are randomly assigned to treatments (including control groups). By randomly assigning subjects to treatments, the experimenter assumes that, on average, lurking variables will affect each treatment group equally. Any significant differences between groups can be fairly attributed to the explanatory variable.
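A sketch of random assignment for a completely randomized design: shuffle the subjects, then deal them out to the groups in turn. The subject labels and the two group names are hypothetical.

```python
import random

def completely_randomized_design(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical sample of 12 subjects and two groups
subjects = [f"subject_{i}" for i in range(1, 13)]
groups = completely_randomized_design(subjects, ["treatment", "control"], seed=5)
print(groups)
```

Dealing out in turn keeps the group sizes equal (or within one of each other), while the shuffle makes the assignment random.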
Randomized block design
The experimenter divides participants into subgroups called blocks, such that the variability within blocks is less than the variability between blocks. Then, participants within each block are randomly assigned to treatments.
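A sketch of the blocking idea, assuming the blocks have already been formed: the randomization is done separately inside each block. The block names (an age split) and subject labels are made up.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks based on one blocking variable (age group)
blocks = {
    "under_40": ["A", "B", "C", "D"],
    "40_and_over": ["E", "F", "G", "H"],
}
assignment = randomized_block_design(blocks, ["treatment", "control"], seed=6)
print(assignment)
```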
Matched pairs designs
A special case of the randomized block design. It is used when the experiment has only two treatment groups and participants can be grouped into pairs based on one or more blocking variables. Then, within each pair, participants are randomly assigned to different treatments. This can also be done as a before-and-after experiment, where the same subject is recorded before and after the treatment.
Placebo
A false drug or treatment that the subjects believe is real. Examples include sugar pills, saline solutions, fake treatments, etc.
The placebo effect
The tendency to react to a drug or treatment regardless of its actual physical function. People believe that a drug will make them better, so they get better whether the drug is real or not.
Bias of the subjects
Similar to response bias in sampling: subjects may want to please the researcher or hope for a specific outcome.
Hawthorne effect
When people behave differently because they know they are being watched.
Bias of the researcher
People subconsciously behave in ways that favor what they believe. Researchers, even when following a protocol, are no different. They may assign subjects to groups or report results in a biased way. They may treat people or animals differently when holding certain expectations of their research.
Blinding
When individuals associated with an experiment (as a subject or experimenter) are not aware of how subjects are assigned (treatment or control, treatment or placebo). Without this knowledge, the subjects are less likely to respond with bias and the researchers are less likely to allow their biases to influence the study.
Single-blind study
Those who could influence the results (subjects, administrators, technicians, etc.) are blinded.
Double-blind study
Those who evaluate the results (judges, physicians, analysts, etc.) are blinded as well.
Frequency distribution
Shows how data are divided among categories (or classes) by listing each one along with its count. Helpful for organizing and summarizing large data sets. Ex: In a sample, how many people say each flavor of ice cream is their favorite?
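The ice cream example can be tallied with `collections.Counter`. The responses below are invented purely to show the mechanics.

```python
from collections import Counter

# Hypothetical survey responses: each person's favorite flavor
responses = ["vanilla", "chocolate", "vanilla", "strawberry",
             "chocolate", "vanilla", "mint", "chocolate", "vanilla"]

frequency = Counter(responses)

# Print the frequency distribution, most common flavor first
for flavor, count in frequency.most_common():
    print(f"{flavor:<12}{count}")
```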
Measure of center
A value at or near the center or middle of a data set. Often interpreted as “typical” values of a group. The most common ones are mean, median, and mode.
Σ
Denotes a sum. The capital Greek letter sigma.
x
denotes an individual data value
n
denotes the number of values in a sample
N
denotes the number of values in a population
x̄
Denotes the sample mean. Pronounced “x-bar”
μ
Denotes the population mean. Pronounced “mew”
Mean
Found by adding all values and dividing by the number of values in the set. Is highly affected by outliers and is not a good measure of center for skewed data sets.
Median
The value that is in the middle when the data are listed in ascending order. Separates the bottom 50% from the top 50%. Largely unaffected by outliers and can be used with any quantitative data set.
Mode
The value that occurs with the greatest frequency in a data set. Not necessarily in the center, not affected by outliers. The only measure of center that can be used with qualitative data.
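A quick check of how the three measures of center react to an outlier, using Python's `statistics` module. The data values are made up.

```python
import statistics

values = [2, 3, 3, 4, 5]
with_outlier = values + [100]   # same data plus one extreme value

mean_before = statistics.mean(values)            # 3.4
mean_after = statistics.mean(with_outlier)       # 19.5, pulled up by the outlier
median_before = statistics.median(values)        # 3
median_after = statistics.median(with_outlier)   # 3.5, barely moves
mode_before = statistics.mode(values)            # 3
mode_after = statistics.mode(with_outlier)       # 3, unchanged

# multimode lists every most-frequent value, e.g. for a bimodal data set
modes = statistics.multimode([1, 1, 2, 2, 3])    # [1, 2]
print(mean_before, mean_after, median_before, median_after, mode_after, modes)
```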
Unimodal
A data set with one mode
Bimodal
A data set with two modes.
Multimodal
A data set with more than two modes.
Histogram
The graph of a frequency distribution. Consists of bars of equal width drawn adjacent to each other, a horizontal scale representing classes of quantitative data values, and a vertical scale representing frequency.
Normal distribution
Unimodal and symmetric, the bell curve. A continuous probability distribution for a random variable, x. Historically, it is a very important distribution in statistics.
Right-skewed distribution
Positively skewed. Typically mode < median < mean. Outliers appear on the right side.
Left-skewed distribution
Negatively skewed. Typically mean < median < mode. Outliers appear on the left side.
Uniform distribution
Equal spread, no peaks.
Symmetric distribution
The left and right halves are mirror images. mean = median = mode (when the distribution is unimodal).
Variability
The extent to which data points in a statistical distribution or data set diverge from the average value, as well as the extent to which these data points differ from each other.
Range
The difference between the maximum and minimum. Since this is calculated using only the two most extreme data values, it is highly affected by outliers.
Interquartile range
Uses what are called quartiles to provide a range of values that are not as affected by potential outliers as the range. The difference between the third and first quartiles.
Quartiles
values that separate a data set into fourths.
Q1
The first quartile
Q2
The second quartile, a.k.a. the median
Q3
The third quartile
Five number summary
1: Minimum, 2: Q1, 3: Median, 4: Q3, 5: Maximum
Boxplot
A visual representation of the 5 number summary and also helps to identify outliers. Can be displayed vertically or horizontally.
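A sketch of the range, quartiles, interquartile range, and five number summary for a small made-up data set. Textbooks and software use slightly different conventions for locating quartiles; this uses `statistics.quantiles` with `method="inclusive"`, which is only one of them, so hand calculations may differ a little.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using inclusive quartiles."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

# Hypothetical data with one extreme value (95)
data = [12, 15, 17, 20, 22, 25, 28, 30, 95]

minimum, q1, median, q3, maximum = five_number_summary(data)
data_range = maximum - minimum   # uses only the two extremes, so the outlier inflates it
iqr = q3 - q1                    # based on the middle half, so the outlier has no effect here
print(minimum, q1, median, q3, maximum, data_range, iqr)
```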
Variance
(standard deviation)²
Standard deviation
How much data values deviate from the mean. Never negative. Zero only when all the data values are exactly the same. Can increase dramatically with one or more outliers. Units are the same as those of the original data values.
Population variance
σ² (sigma squared)
Population standard deviation
σ (sigma)
Sample variance
s² (s-squared)
Sample standard deviation
s. Used to estimate the population standard deviation σ (for example, in confidence intervals for a mean), since σ is often not known.
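A short comparison of the population and sample versions using Python's `statistics` module. The population functions divide by N, while the sample functions divide by n - 1. The data are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5, squared deviations sum to 32

pop_variance = statistics.pvariance(data)   # sigma squared: 32 / 8
pop_std_dev = statistics.pstdev(data)       # sigma: square root of the above
sample_variance = statistics.variance(data) # s squared: 32 / 7
sample_std_dev = statistics.stdev(data)     # s: square root of the above

print(pop_variance, pop_std_dev, sample_variance, sample_std_dev)
```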
Z-score
The number of standard deviations a certain data value is away from the mean.
Positive z-score
Data value is above average
Negative z-score
Data value is below average.
Standardizing
The process of converting a data value (which is often labelled x) to a z-score. The formula used to do this is
z = (x - μ)/σ
z = (value of interest - mean)/standard deviation
Significantly low
When values are (μ-2σ) or lower (beyond z=-2)
Significantly high
When values are (μ+2σ) or higher (beyond z=+2)
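A small sketch of standardizing a value and applying the cutoffs of z = -2 and z = +2 described above. The mean of 100 and standard deviation of 15 are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

def classify(x, mean, std_dev):
    """Label a value using the cutoffs z <= -2 and z >= +2."""
    z = z_score(x, mean, std_dev)
    if z <= -2:
        return "significantly low"
    if z >= 2:
        return "significantly high"
    return "not significant"

mu, sigma = 100, 15   # hypothetical population mean and standard deviation

print(z_score(130, mu, sigma), classify(130, mu, sigma))  # 2.0 significantly high
print(z_score(85, mu, sigma), classify(85, mu, sigma))    # -1.0 not significant
print(classify(65, mu, sigma))                            # significantly low
```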
Density curve
A curve with a total area under the curve equal to one.
Area under the density curve
represents probability in a continuous probability distribution.
Probability statement
P(A < x < B). States that the probability of observing a random value between A and B is some number.
Normal curve
The graph of a normal distribution.
Properties:
The mean, median, and mode are equal
Bell shaped and symmetric about the mean
Total area under the curve is equal to one
Approaches, but never touches the x-axis as it extends farther and farther away from the mean.
X ~ N(mean, standard deviation)
The random variable x is distributed normally with mean μ and standard deviation σ.
Standard Normal Distribution
The distribution of z-scores. Has a mean, μ, of 0 and a standard deviation, σ, of 1.
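A sketch of a probability statement P(A < x < B) as an area under a normal curve, using `statistics.NormalDist` (Python 3.8+). The distribution N(100, 15) and the bounds 85 and 115 are hypothetical. Because 85 and 115 are exactly one standard deviation from the mean, the same area is found on the standard normal distribution between z = -1 and z = +1.

```python
from statistics import NormalDist

# Hypothetical random variable: X ~ N(100, 15)
X = NormalDist(mu=100, sigma=15)

# P(85 < x < 115) is the area under the density curve between 85 and 115
p_x = X.cdf(115) - X.cdf(85)

# Standard normal distribution: mean 0, standard deviation 1
Z = NormalDist()
p_z = Z.cdf(1) - Z.cdf(-1)

print(round(p_x, 4), round(p_z, 4))   # both about 0.6827
```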