Definitions
Data
information about a specified collection of individuals or objects
Statistics
the science of planning studies and experiments, obtaining data, organizing and summarizing those data, and then drawing conclusions.
Population
the collection of ALL potential persons, objects, etc. under study.
Sample
the portion (subset) of the population under study for which data has been gathered.
For best results, a sample should have…
the same characteristics as the population it is representing.
Variable
a specific type of measurement
Parameter
is a numerical summary for a variable of a population
Statistic
is a numerical summary for a variable of a sample
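To make the parameter/statistic distinction concrete, here is a minimal sketch using a hypothetical population of the numbers 1 to 1000. The population mean is a parameter. The mean of one random sample of 50 is a statistic that estimates it and changes from sample to sample.

```python
import random
import statistics

# Hypothetical population: the values 1, 2, ..., 1000.
population = list(range(1, 1001))
mu = statistics.mean(population)   # parameter: summary of the whole population

rng = random.Random(0)             # fixed seed so the run is repeatable
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)    # statistic: summary of one sample, used to estimate mu

print(mu, x_bar)
```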
Simple Random
each individual in the population has an equal chance of being chosen. Such samples are hard to obtain in real life; even a process like picking a ball from a bin blindfolded may deviate from true randomness if not done carefully. This type of sample is the most likely to reflect the characteristics of the population it is drawn from.
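A computer can approximate the "equal chance" requirement with a pseudorandom generator. The sketch below assumes the population is available as a list. It uses the standard library's `random.Random.sample`, which draws distinct individuals with every subset of size n equally likely. The function name and the 100-ID population are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

population = list(range(1, 101))  # hypothetical population of 100 ID numbers
print(simple_random_sample(population, 5, seed=42))
```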
Systematic
taking items from a sorted list at regular intervals. For example, for quality control, checking every 100th item made along a production line.
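The production-line example amounts to taking every k-th item from an ordered list. This is a minimal sketch with hypothetical names. The starting position is chosen at random within the first k items unless one is supplied, which is one common way to do it.

```python
import random

def systematic_sample(items, k, start=None, seed=None):
    """Take every k-th item from an ordered list, beginning at index `start`."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if start is None:
        start = random.Random(seed).randrange(k)  # random start among the first k items
    return items[start::k]

line = [f"item{i}" for i in range(1, 501)]        # hypothetical run of 500 items
checked = systematic_sample(line, 100, start=99)  # the 100th, 200th, ... item
print(checked)  # ['item100', 'item200', 'item300', 'item400', 'item500']
```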
Stratified
First, divide the whole population into categories (strata), then sample proportionally from each category. The categories must not overlap, and no one may be left out: every individual belongs to exactly one category.
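Here is a sketch of proportional stratified sampling, assuming each individual's category can be read off by a function. The hypothetical 60 freshmen and 40 seniors are each sampled at 10%, giving 6 and 4. The function and data are illustrative. Proportional allocation with rounding is one common choice among several.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Split into non-overlapping strata, then sample the same fraction from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)  # each person lands in exactly one stratum
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)         # proportional allocation
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 freshmen and 40 seniors.
population = [(i, "freshman") for i in range(60)] + [(i, "senior") for i in range(60, 100)]
sample = stratified_sample(population, stratum_of=lambda p: p[1], fraction=0.1, seed=7)
print(len(sample))  # 10
```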
Cluster
First, identify clusters within a population; these may overlap and/or leave some individuals out. Then choose a sample of clusters and gather samples from within each chosen cluster. For example, choose 4 coffee shops around town and then ask 10 people at each shop a survey question.
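The coffee-shop example is a two-stage procedure. Stage one picks clusters at random. Stage two samples individuals inside each chosen cluster. This sketch represents the hypothetical clusters as a dictionary from shop name to the customers present, which is an assumption made only for illustration.

```python
import random

def cluster_sample(clusters, num_clusters, per_cluster, seed=None):
    """Stage 1: pick whole clusters at random. Stage 2: sample within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: rng.sample(clusters[name], per_cluster) for name in chosen}

# Hypothetical town: 6 coffee shops, each with 30 customers present.
shops = {f"shop{s}": [f"shop{s}-customer{c}" for c in range(30)] for s in range(1, 7)}
surveyed = cluster_sample(shops, num_clusters=4, per_cluster=10, seed=3)
for shop, people in surveyed.items():
    print(shop, len(people))
```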
Convenience
A sample that is taken quickly without any particular coordination. For example, a marketer may ask people leaving a nearby store for their opinions on a new product. Such samples are quick and cheap, but the goal is usually not high-quality information.
Categorical
data consist of names or labels. Such data are also called qualitative.
Numerical
data are numbers that actually represent the amount or measurement of something. Such data are also called quantitative.
Discrete
data have gaps between possible values. The number of possible values within any bounded interval is finite.
Continuous
data have no gaps between possible values. The number of possible values within any interval is infinite.
nominal
a variable whose values are only names, with no natural order or other structure.
ordinal
if its values are labels with a natural order
A variable is called interval if its values…
differences have consistent meaning.
A variable is called ratio if its values…
quotients (division) have consistent meaning.
Ratio values have a…
natural zero, which indicates a complete absence of the quantity being measured.
A measure of center…
is a value (or values) toward the middle of a data set, which tends to be close to the other data values.
Mean
the sum of all the data values divided by the number of values
Median
of a data set is a value such that the count of smaller data values equals the count of larger data values
Mode
of a data set is the value (or values) that occurs most often.
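The three measures of center can be computed on a small hypothetical data set as follows. The mean is written out by hand to match its definition. The median and mode use the standard library's `statistics.median` and `statistics.multimode`, where `multimode` returns every value tied for most frequent.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical data set

mean = sum(data) / len(data)        # sum of values divided by number of values
median = statistics.median(data)    # middle value; average of the two middle values when n is even
modes = statistics.multimode(data)  # all values that occur most often

print(mean, median, modes)  # 5.0 4.0 [3]
```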
A summary is consistent…
if its value varies little between samples from the same population.
A summary is accurate…
if its value tends to be close to the associated population parameter.
A summary is robust…
if its value changes little when extreme (large or small) values are added to or removed from a data set.
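Robustness is easiest to see by adding one extreme value to a hypothetical data set and watching each summary. In this sketch the median moves only from 4 to 5. The mean jumps from 5 to about 147, so the median is the more robust measure of center here.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]    # hypothetical data set
with_extreme = data + [1000]  # the same data plus one extreme value

mean_before, mean_after = statistics.mean(data), statistics.mean(with_extreme)
median_before, median_after = statistics.median(data), statistics.median(with_extreme)

print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```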
Measure of Spread
is a value that indicates how far data points tend to lie either from each other or from a measure of center
Range
the difference between the maximum and minimum values: range = max - min
Interquartile range (IQR)
the difference between the third and first quartiles: IQR = Q3 - Q1
Variance
of a data set is the average squared deviation from the mean. For a sample, the sum of squared deviations is usually divided by n - 1 rather than n.
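The definition translates directly into code: find the mean, square each deviation from it, and average. This sketch lets the caller choose the n divisor (population) or the n - 1 divisor (sample). It checks the result against the standard library's `statistics.pvariance` and `statistics.variance`. The data set is hypothetical.

```python
import statistics

def variance(data, sample=False):
    """Average squared deviation from the mean; divide by n - 1 instead of n for a sample."""
    n = len(data)
    mean = sum(data) / n
    squared_devs = [(x - mean) ** 2 for x in data]
    return sum(squared_devs) / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set with mean 5; squared deviations sum to 32
print(variance(data))               # 32 / 8 = 4.0
print(variance(data, sample=True))  # 32 / 7, about 4.571
```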
Skew
of a data set measures how far data values tend to lean in either direction, relative to the center.
Skewed down/left…
if data values smaller than center are farther from center than data values larger than center (negative skew)
Skewed up/right…
if data values smaller than center are closer to center than data values larger than center (positive skew)
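Skew has several numerical measures. This sketch uses the Fisher-Pearson moment coefficient g1 = m3 / m2^1.5, where m2 and m3 are the average squared and cubed deviations from the mean. It is one common choice, not necessarily the one this course uses. Its sign matches the (+) and (-) labels above, as the hypothetical data and its mirror image show.

```python
def moment_skewness(data):
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5 (one common measure of skew).
    Negative: skewed down/left. Positive: skewed up/right."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]  # hypothetical data, long tail toward large values
left_skewed = [-x for x in right_skewed]  # mirror image, long tail toward small values
print(moment_skewness(right_skewed), moment_skewness(left_skewed))
```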
z-score
(or standard score or standardized value) is the number of standard deviations that a given value x lies above or below the mean.
If a z-score is less than -2
the data value is considered unusually small
If a z-score is more than 2
the data value is considered unusually large
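A z-score is computed as z = (x - mean) / standard deviation. Values are then flagged using the cutoff of 2 standard deviations on either side of the mean. The exam mean of 70 and standard deviation of 5 below are hypothetical, and so are the function names.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def classify(z, cutoff=2.0):
    """Label a value by its z-score, using a cutoff of 2 standard deviations by default."""
    if z < -cutoff:
        return "unusually small"
    if z > cutoff:
        return "unusually large"
    return "ordinary"

# Hypothetical exam with mean 70 and standard deviation 5.
for x in (58, 73, 82):
    z = z_score(x, mean=70, sd=5)
    print(x, round(z, 2), classify(z))
```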
A frequency table
shows how data are partitioned among the different categories, called classes
Frequency
of a class is the count of data values that lie in the class
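Two sketches of a frequency table follow. For categorical data, each label is a class, and `collections.Counter` counts them. For numerical data, this sketch assumes equal-width classes [start + i*width, start + (i+1)*width) and silently drops values outside the chosen classes. Both helper names and the data are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Categorical data: (class, frequency) pairs, most frequent first."""
    return Counter(values).most_common()

def binned_frequency_table(values, start, width, num_classes):
    """Numerical data: counts in classes [start + i*width, start + (i+1)*width).
    Values outside all classes are ignored."""
    counts = [0] * num_classes
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < num_classes:
            counts[i] += 1
    return [((start + i * width, start + (i + 1) * width), c) for i, c in enumerate(counts)]

responses = ["coffee", "tea", "coffee", "juice", "coffee", "tea"]  # hypothetical survey
for cls, freq in frequency_table(responses):
    print(f"{cls:<8}{freq}")
```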
Bar graph
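has gaps between its bars (used for categorical data)
Histogram
has NO gaps between adjacent bars (used for numerical data grouped into classes)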
Statistical Study Approach
ask questions
determine how to gather the sample and which variables to measure
analyze the data and create appropriate summaries
create a picture of the results and draw conclusions where appropriate
Sampling error
arises directly from the sampling methodology and varies between samples of the same population
Sampling bias
occurs when some individuals from the population are more/less likely to be chosen than others.
Measurement error
arises from the data collection process as a result of how information is measured from an individual. There are two types: random and systematic.
random
errors occur independently (afresh) from one observation to the next
systematic
errors occur in a related fashion between observations
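A simulation shows why the two kinds of measurement error behave differently. This sketch makes assumptions for illustration: a hypothetical scale has a constant +0.5 offset (systematic error) plus fresh normal noise on each reading (random error). Averaging many readings cancels most of the random error, but the average still sits about 0.5 above the true value.

```python
import random

TRUE_WEIGHT = 70.0  # hypothetical true value being measured

def measure(true_value, bias, noise_sd, rng):
    """One reading: the systematic error (bias) is identical every time;
    the random error is drawn fresh for each observation."""
    return true_value + bias + rng.gauss(0, noise_sd)

rng = random.Random(1)
readings = [measure(TRUE_WEIGHT, bias=0.5, noise_sd=0.2, rng=rng) for _ in range(1000)]
avg = sum(readings) / len(readings)
print(round(avg, 3))  # close to 70.5, not 70.0
```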
Outlier
an observation that lies very far from the other values in a sample; the exact definition depends on context.
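Because the definition of an outlier depends on context, any rule is a convention. One widely taught rule is Tukey's 1.5 × IQR fences: flag values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. This sketch uses the median-of-halves quartile convention, and the sample is hypothetical. Other quartile methods can move the fences slightly.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Flag values below Q1 - k*IQR or above Q3 + k*IQR (Tukey's fences; needs n >= 2).
    Quartiles are medians of the lower/upper halves, one common convention."""
    s = sorted(data)
    half = len(s) // 2
    q1 = statistics.median(s[:half])
    q3 = statistics.median(s[half + 1:] if len(s) % 2 else s[half:])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

data = [12, 14, 15, 15, 16, 17, 18, 45]  # hypothetical sample
print(iqr_outliers(data))  # [45]  (Q1 = 14.5, Q3 = 17.5, fences at 10.0 and 22.0)
```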