variable
a characteristic that can vary in value among subjects in a sample or a population - its values should be mutually exclusive and collectively exhaustive
values
categories, possible options for responses
mutually exclusive
events that cannot happen at the same time - only belong to one category, no ambiguity
collectively exhaustive
everyone should be able to fit into one category, no one left over
qualitative (categorical)
scale of measurement is a set of unordered categories that differ in quality, not quantity or magnitude
quantitative (numerical)
scale of measurement is a set of ordered categories that differ in quantity or magnitude (can be ranked)
discrete variable
can only assume integer values - no fractional values
continuous variable
can assume any real value, including fractions - only limited by precision of instrument
nominal variable
qualitative/categorical, unordered and discrete (ex: hair, religion, color)
ordinal variable
qualitative/categorical, ordered and discrete (ex: preference for food, army ranks)
interval variable
quantitative/numerical, discrete or continuous - uniform intervals between adjacent values, arbitrary 0, subtraction and addition make sense (ex: calendar year, degrees Celsius)
ratio variable
quantitative/numerical - has non-arbitrary true zero that means a complete absence of something, multiplication and division make sense (ex: height, number of siblings)
cross-sectional data
observations on different individual units at the same point in time (ex: the current presidential approval rating)
time series data
observations on a variable over time (ex: how does the Amazon stock price vary year after year)
pooled cross sections
data from multiple years based on different cross-sectional samples of the same population - take a cross section of individuals and ask them a question, and do this year after year with new cross sections each time
panel or longitudinal survey
time series for each cross-sectional member in a data set - choose a cross-section of individuals and ask them the same questions over a time period (ex: Terman's termites)
tables
units of analysis are placed in rows, variables in columns
bar charts
qualitative data, use categories
ogive
uses first column of table as x-axis and cumulative frequency or percentage as vertical axis - will always trend upward or plateau, will never dip down
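A minimal sketch of how the vertical axis of an ogive is built, assuming a small made-up frequency table (the class bounds and counts below are hypothetical). It shows why the curve can plateau (a class with zero frequency) but never dip.

```python
from itertools import accumulate

# Hypothetical frequency table: upper bound of each class and its frequency
class_upper_bounds = [10, 20, 30, 40]
frequencies = [3, 7, 0, 5]

# Cumulative frequency is a running total of the frequencies
cumulative_freq = list(accumulate(frequencies))      # [3, 10, 10, 15]
total = cumulative_freq[-1]
cumulative_pct = [100 * c / total for c in cumulative_freq]

# Each cumulative value is >= the previous one, so the ogive never dips
never_dips = all(b >= a for a, b in zip(cumulative_freq, cumulative_freq[1:]))
```

The third class has a frequency of 0, so the cumulative value stays at 10 there: that is the plateau.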
stem and leaf plots
no loss of data, can be rotated to show spread of data
histograms
quantitative data, gives frequency of ordered data - all bins on horizontal axis should have the same width, use Sturges' rule to calculate the number of bins
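A rough sketch of picking the number of bins and the equal-width bin edges. It assumes the common form of the rule, k = 1 + log2(n) rounded up; some courses use 1 + 3.3*log10(n) and round differently, so treat the exact rounding as an assumption. The function names are made up for illustration.

```python
import math

def sturges_bins(n: int) -> int:
    """Suggested number of bins for n observations: 1 + log2(n), rounded up."""
    if n < 1:
        raise ValueError("need at least one observation")
    return 1 + math.ceil(math.log2(n))

def equal_width_edges(data, k):
    """Bin edges for k bins of identical width spanning min(data)..max(data)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]

bins_for_100 = sturges_bins(100)             # 8
edges = equal_width_edges([0, 3, 7, 10], 5)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```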
things to watch out for in visual displays
dramatic title, 3D and rotated graphs, gratuitous effects, appeal to authority figures, vague/no source, estimated data, funky axis scaling, non-zero origin
descriptive research questions
describe the problem (how many, what)
explanatory research questions
explain why/how the problem is occurring
theory
answer to a "how" or "why" question or speculative idea offered as an explanation - somewhat contested, becomes a law after it has been repeatedly verified
concepts
abstract ideas that are linked together to form a theory (ex: religion, success)
hypothesis
theory that has been made concrete (replace concepts in a theory with variables)
instrument
measurement device like a survey, test, scale, ruler
unit of analysis
the entity about which we collect information - characteristics/properties of these entities are called variables
unit of measurement
units used to record measurements of a variable (ex: dollars, inches)
robust/resistant statistics
statistics not affected by outliers (median, mode)
mode
most common value - can be determined for nominal, ordinal, and interval-ratio data, may have more than one mode for a set of data
median
the 50th percentile, can only be determined for ordinal and interval-ratio data
mean
average - can only be calculated for interval-ratio data, takes into account the value of each item in a set of data (not resistant, can be affected by outliers), cannot be determined for grouped data if there's an open class
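A small illustration of resistance, using Python's standard `statistics` module and a made-up data set. Adding one extreme value moves the mean a lot while the median and mode stay put.

```python
import statistics

# Hypothetical values; the last list adds a single outlier
values = [30, 32, 35, 35, 40]
with_outlier = values + [1000]

mean_before = statistics.mean(values)            # 34.4
mean_after = statistics.mean(with_outlier)       # about 195.3

median_before = statistics.median(values)        # 35
median_after = statistics.median(with_outlier)   # 35.0

mode_before = statistics.mode(values)            # 35
mode_after = statistics.mode(with_outlier)       # 35
```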
trimmed means
calculate mean after getting rid of the lowest and highest numbers (ex: remove lowest three and highest three numbers)
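One way to sketch a trimmed mean, assuming the same number of values `k` is dropped from each end (the function name and the `k` parameter are illustrative, not from the original).

```python
def trimmed_mean(values, k):
    """Mean after dropping the k lowest and k highest values."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must leave at least one value to average")
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# The outlier 100 is dropped along with the lowest value 1
result = trimmed_mean([1, 2, 3, 4, 100], k=1)   # 3.0
```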
range
max-min, can be misleading if there are outliers
average deviation from the mean
calculate how far away on average the values are from the mean - numbers below the mean will have a negative distance, so this value will always equal zero because positive and negative deviations cancel out, wrongly suggesting that there is no variation
average absolute deviation from the mean
take distance of each value from the mean, but put it into absolute value before averaging - solves sign problem and gives you a non-negative number in the original units
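A short worked example with made-up numbers, showing the signed deviations cancelling to zero and the absolute deviations giving a usable measure of spread.

```python
data = [2, 4, 6, 8]                       # hypothetical values
mean = sum(data) / len(data)              # 5.0

deviations = [x - mean for x in data]     # [-3.0, -1.0, 1.0, 3.0]

# Signed deviations cancel out
avg_deviation = sum(deviations) / len(data)                       # 0.0

# Absolute values remove the sign before averaging
avg_abs_deviation = sum(abs(d) for d in deviations) / len(data)   # 2.0
```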
variance
take distance of each value from the mean and then square it before averaging - solves sign problem but gives you squared units
standard deviation
the square root of variance - take distance of each value from the mean and then square it before averaging, then take square root of your result
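A sketch of both calculations on made-up data. It assumes the "averaging" above means dividing by n (the population form); a sample variance would divide by n - 1 instead.

```python
import math

data = [2, 4, 6, 8]                              # hypothetical values
mean = sum(data) / len(data)                     # 5.0

squared_devs = [(x - mean) ** 2 for x in data]   # [9.0, 1.0, 1.0, 9.0]

# Population variance: average of the squared deviations (units squared)
variance = sum(squared_devs) / len(data)         # 5.0

# Standard deviation: square root brings us back to the original units
std_dev = math.sqrt(variance)                    # about 2.236
```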
coefficient of variation
std dev/mean *100%, helps us assess which of two or more interval-ratio variables has more variation (smaller CV = less variation)
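A minimal sketch comparing the relative variation of two variables measured in different units. The data are made up, and the population standard deviation is used as an assumption; the function name is illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

heights_cm = [160, 170, 180]   # hypothetical
weights_kg = [50, 70, 90]      # hypothetical

cv_height = coefficient_of_variation(heights_cm)   # about 4.8
cv_weight = coefficient_of_variation(weights_kg)   # about 23.3

# Weight varies more relative to its mean, even though the units differ
more_variable = "weight" if cv_weight > cv_height else "height"
```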
standard unit (z-score)
(x-mean)/std dev, tells us by how many standard deviations a value lies above or below the mean of the data set - helps standardize data and makes it easier to compare
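A small example of using standard units to compare values from two different scales. The scores, means, and standard deviations are invented for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Hypothetical: the same student on two differently scaled tests
z_test_a = z_score(85, mean=70, std_dev=10)   # 1.5
z_test_b = z_score(60, mean=50, std_dev=5)    # 2.0

# The lower raw score is actually the more exceptional result
more_exceptional = "test B" if z_test_b > z_test_a else "test A"
```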
use CV when
comparing two or more variables and want to know which has more variation, or when comparing two or more groups with respect to a single variable and want to know which has relatively more dispersion
use z-score when
comparing two or more individual values of different variables and want to know which value is relatively more extreme or exceptional
empirical rule
works well for bell-shaped distributions - about 68% of data fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three
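As a sanity check, the share of a perfectly normal (bell-shaped) distribution within k standard deviations of the mean can be computed with `statistics.NormalDist` from the standard library (Python 3.8+). Real data will only match these figures approximately.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_1 = share_within(1)   # about 0.68
within_2 = share_within(2)   # about 0.95
within_3 = share_within(3)   # about 0.997
```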
histogram skew
based on where the tail of histogram lies - if tail goes to left, you have a left skew
combinations
order doesn't matter (AB=BA)
permutation
order matters (AB ≠ BA)
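A quick illustration of the difference using the standard library (`math.comb` and `math.perm` need Python 3.8+). The letters are just placeholder items.

```python
import math
from itertools import combinations, permutations

items = "ABC"

# Choosing 2 of 3 when order doesn't matter: AB, AC, BC
combos = list(combinations(items, 2))
# Choosing 2 of 3 when order matters: AB and BA are counted separately
perms = list(permutations(items, 2))

n_combos = math.comb(3, 2)   # 3
n_perms = math.perm(3, 2)    # 6
```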
random experiment
experiment must have two or more outcomes, and there must be uncertainty as to which outcome will occur (ex: flipping a coin, drawing a card)
sample space (s)
set of all basic outcomes (ex: heads, tails)
basic outcome
one of the possible results from a random experiment (ex: getting heads)
events
a combination of one or more basic outcomes, typically represented by uppercase letters (ex: Event A = rolling an even number on a die)
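A sketch of a sample space and an event as Python sets, using the die example. The probability line assumes a fair die, so every basic outcome is equally likely.

```python
from fractions import Fraction

# Sample space: all basic outcomes of rolling one die
sample_space = {1, 2, 3, 4, 5, 6}

# Event A = rolling an even number (a set of basic outcomes)
event_a = {outcome for outcome in sample_space if outcome % 2 == 0}

# Assuming equally likely outcomes: favorable outcomes / all outcomes
p_a = Fraction(len(event_a), len(sample_space))   # 1/2
```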
empirical estimation
necessary when we have no prior knowledge of events, hard to figure out with just logic but can be done with data
law of large numbers
as the number of trials (sample size) increases, the observed relative frequency of an event gets closer to its true probability
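A small simulation sketch of the law of large numbers: with more coin flips, the relative frequency of heads tends to settle near 0.5. The fair coin, the seed, and the function name are assumptions made for illustration.

```python
import random

def relative_frequency_of_heads(n_flips, seed=0):
    """Simulate n_flips fair coin flips and return the share that came up heads."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

few_flips = relative_frequency_of_heads(10)
many_flips = relative_frequency_of_heads(100_000)
```

With only 10 flips the result can sit far from 0.5; with 100,000 flips it is typically very close.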
classical probability
don't need actual data, can reason it out logically
subjective probability
necessary when a repeatable random experiment is not available, reflects personal judgement or expert opinion about the likelihood of an event - often when an event is new and we don't have past data to work from
probability tree
to find the probability of a basic outcome, multiply the probabilities along the branches leading to that outcome
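A worked sketch of a two-level probability tree, assuming two cards are drawn without replacement from a standard 52-card deck. Each leaf is the product of the probabilities on the branches that lead to it.

```python
from fractions import Fraction

# First branch: ace or not on the first draw
p_ace_1 = Fraction(4, 52)
p_not_1 = Fraction(48, 52)

# Second branch: depends on what was drawn first (51 cards remain)
p_ace_2_given_ace = Fraction(3, 51)
p_ace_2_given_not = Fraction(4, 51)

# Multiply along each path to get the leaf probabilities
leaves = {
    ("ace", "ace"): p_ace_1 * p_ace_2_given_ace,            # 1/221
    ("ace", "not"): p_ace_1 * (1 - p_ace_2_given_ace),
    ("not", "ace"): p_not_1 * p_ace_2_given_not,
    ("not", "not"): p_not_1 * (1 - p_ace_2_given_not),
}

# The leaves cover every basic outcome, so they sum to 1
total = sum(leaves.values())
```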