data analysis midterm


Last updated 6:36 AM on 4/18/26


30 Terms

New cards

how to identify research questions

specific events → general patterns (greater applicability)

New cards

hypothesis

theory-based statement about what we would expect to observe if our theory is correct

New cards

how to develop theories

examine previous research on the topic

what other causes of DV did previous research miss
can their theory be applied elsewhere

New cards

hypothesis testing

measurement of variables
data collection
data analysis
judge whether the results favor hypothesis or null hypothesis

New cards

types of causality

deterministic: if x occurs, then y will occur
probabilistic (focus of social sciences): increases in x are associated w increases/decreases in the probability of y occurring

New cards

steps to establish causal relationship

develop credible causal mechanism linking x to y (how does x cause y? what is it specifically abt increases in x that will likely lead to increases/decreases in y?)
consider possibility that y causes x
think of any other causes of y
evaluate if x and y covary even after controlling for other causes (z) (if not, the rltshp btwn x and y is spurious)
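A minimal sketch of the last step, assuming simulated (hypothetical) data in which z causes both x and y and x has no effect on y. The helper `pearson` and the use of a partial correlation to "control for z" are illustrative choices, not something the notes prescribe.

```python
import random
from math import sqrt

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical data: z drives both x and y; x does not cause y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / sqrt(va * vb)

r_xy = pearson(x, y)
r_xz = pearson(x, z)
r_yz = pearson(y, z)

# Correlation between x and y after controlling for z (partial correlation).
partial_xy_z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print(f"corr(x, y)               = {r_xy:.2f}")
print(f"corr(x, y | z held fixed) = {partial_xy_z:.2f}")
```

x and y covary clearly on their own, but the relationship all but disappears once z is held fixed, which is what a spurious relationship looks like.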

New cards

planning data analysis

research design (experiment or observational) → setup (cross-sectional or time series) → measurement (reliability and validity)

New cards

research design

strategies to test the suggested causal relationship btwn IV and DV

New cards

types of research design

experimental: control/treatment group; can control for confounding variables
observational: no control over IV; can still lead to informed evaluations of causality when accounting for reverse causality and confounding variables

New cards

types of observational studies

cross-sectional: focus on variation btwn individuals or spatial units in the DV
time-series: comparison over time w/i a single unit

New cards

operationalization

process of translating an abstract concept into an observable measure

New cards

qualities of a good measure

reliability (consistency): applying the measurement to the same case will produce identical results (consistent responses from the same respondents regardless of when or how the question is asked)
validity: the measure accurately represents the concept

New cards

types of variables by measurement metric

categorical: variables that take a set of fixed and known values
- nominal: categorical variables with NO ranking distinctions (ex. religious identification, regime type)
- ordinal: variables w/ values that can be ordered (ex. likert scale: strongly disagree - disagree ...)
continuous: variables that can take on any value w/i a certain range
- equal-unit difference; one-unit increase in the value always means the same thing (ex. age in years)

New cards

frequency table

table showing the values the variable takes and the number of times each value appears in the variable
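A small sketch of building a frequency table, assuming a made-up nominal variable (`regime`) with invented values.

```python
from collections import Counter

# Hypothetical observations of a nominal variable (regime type).
regime = ["democracy", "autocracy", "democracy", "hybrid", "democracy", "autocracy"]

# Counter maps each value to the number of times it appears.
freq = Counter(regime)

for value, count in freq.most_common():
    print(f"{value:<10} {count}")
```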

New cards

descriptive statistics

numerical summary of main traits of the distribution of the data

New cards

measure of central tendency

typical values for a variable at the center of its distribution

mean
median

New cards

mean (aka expected value)

of a non-binary variable: average
of a binary variable: proportion of the value 1
zero-sum property: sum of the differences btwn each observation and the mean is equal to 0
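The three properties above can be checked on toy data. The variables `ages` and `voted` are hypothetical.

```python
from statistics import mean

ages = [19, 22, 25, 30, 34]   # hypothetical non-binary variable
voted = [1, 0, 1, 1, 0]       # hypothetical binary variable (1 = voted)

mean_age = mean(ages)         # ordinary average
share_voted = mean(voted)     # mean of a 0/1 variable = proportion of 1s

# Zero-sum property: deviations from the mean add up to 0.
deviations = [a - mean_age for a in ages]
total_deviation = sum(deviations)

print(mean_age, share_voted, total_deviation)
```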

New cards

measure of spread

summarizes amt of variation of distribution relative to its center

variance: [sum of (y_i - mean)²]/(n - 1) for a sample (divide by n for a population)
sd: sqrt of variance
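A worked sketch of variance and SD on invented data, assuming the usual conventions: divide the sum of squared deviations by n - 1 for a sample and by n for a population. The result is cross-checked against Python's `statistics` module.

```python
from math import sqrt
from statistics import mean, stdev, variance

y = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

y_bar = mean(y)
sum_sq = sum((yi - y_bar) ** 2 for yi in y)  # sum of squared deviations

sample_var = sum_sq / (len(y) - 1)  # sample variance (n - 1 in the denominator)
pop_var = sum_sq / len(y)           # population variance (n in the denominator)
sample_sd = sqrt(sample_var)        # SD is the square root of the variance

print(sample_var, pop_var, sample_sd)
```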

New cards

visualizing data

categorical variable: bar graph
continuous variable: box and whiskers
- iqr = q3-q1
- outliers
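A rough sketch of the numbers behind a box-and-whiskers plot. Two assumptions that are not in the notes: quartiles come from `statistics.quantiles` with its default method (other software may give slightly different quartiles), and outliers are flagged with the common 1.5 × IQR rule.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # hypothetical continuous variable

q1, q2, q3 = quantiles(data, n=4)  # quartiles; q2 is the median
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print(f"q1={q1}, median={q2}, q3={q3}, iqr={iqr}, outliers={outliers}")
```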

New cards

why probability plays an important role in inferential statistics

tells us how we can generalize from sample to population and helps us decide whether the relationships in the sample occurred by chance

New cards

multiplication law for independent events

if A and B are independent: P(A and B) = P(A) × P(B)
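A quick check of the law P(A and B) = P(A) × P(B) for independent events, assuming a hypothetical experiment: one fair coin flip plus one fair die roll, with all 12 combined outcomes equally likely.

```python
from fractions import Fraction
from itertools import product

# All equally likely (coin, die) outcomes.
outcomes = list(product(["H", "T"], range(1, 7)))

def prob(event):
    """Exact probability of an event as favorable / total outcomes."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

p_heads = prob(lambda o: o[0] == "H")
p_six = prob(lambda o: o[1] == 6)
p_both = prob(lambda o: o[0] == "H" and o[1] == 6)

print(p_heads, p_six, p_both, p_heads * p_six)
```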

New cards

probability distribution

list of outcomes and their associated probabilities

New cards

discrete probability function

probability that x can take a SPECIFIC value, a, is p(a): P[X=a] = p(a)
p(x) is non-negative for all real x
sum of p_j = 1 where j is all possible values that x can have
0 <= p(x) <= 1
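The conditions above can be checked mechanically. This sketch assumes a fair six-sided die as the example; `is_valid_pmf` is an illustrative helper name.

```python
from fractions import Fraction

# p(a) = P[X = a] for a fair die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def is_valid_pmf(p):
    """True if every probability is in [0, 1] and they sum to exactly 1."""
    in_range = all(0 <= v <= 1 for v in p.values())
    return in_range and sum(p.values()) == 1

print(pmf[3], is_valid_pmf(pmf))
```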

New cards

continuous probability distribution

when a variable is continuous, its probability distribution will be a smooth continuous curve
probabilities are measures over an interval of values, not single point (ex. p(-1<x<1) instead of p(x=1))
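To make "probability over an interval" concrete, here is a sketch that assumes a simple case not taken from the notes: X uniform on [-2, 2], so the density is a flat line of height 1/4 and probability is just area (width × height).

```python
def uniform_interval_prob(a, b, low=-2.0, high=2.0):
    """P(a < X < b) for X uniform on [low, high], computed as an area."""
    height = 1.0 / (high - low)            # constant density f(x)
    left, right = max(a, low), min(b, high)
    width = max(0.0, right - left)         # overlap of (a, b) with the support
    return width * height

p_interval = uniform_interval_prob(-1, 1)  # an interval has positive probability
p_point = uniform_interval_prob(1, 1)      # a single point has zero width, so 0

print(p_interval, p_point)
```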

New cards

continuous probability function

f(x) is non-negative for all real x


New cards

normal distribution

N(μ, σ²)
mean = median = mode
about 68% of data falls within mean ± 1 SD
about 95% of data falls within mean ± 2 SD
about 99.7% of data falls within mean ± 3 SD
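The 68 / 95 / 99.7 figures can be reproduced from the standard normal CDF. A short sketch using `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within k SDs of the mean, for k = 1, 2, 3.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} SD: {share:.4f}")
```

The same shares hold for any normal distribution, since only the distance from the mean in SD units matters.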


New cards

z-score

number of SDs an observed value lies from the mean: z = (x - mean)/SD
tells us how likely it is to get an observed value given that the data follows a normal distribution
useful bc it converts any normal dist. into the standard normal N(0,1), making values across different distributions directly comparable
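A small sketch of the standard formula z = (x - mean)/SD, used to compare two scores from different distributions. The exam scores, means, and SDs are invented for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical exams with different means and SDs.
z_math = z_score(85, mu=70, sigma=10)
z_reading = z_score(90, mu=80, sigma=10)

# Share of a normal distribution falling below the math score.
share_below_math = NormalDist().cdf(z_math)

print(z_math, z_reading, round(share_below_math, 4))
```

The raw reading score is higher, but the math score sits further above its own mean, which is the kind of comparison z-scores make possible.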


New cards

sampling distribution

probability distribution of a statistic drawn from repeated sampling

New cards

sampling distribution of sample mean

mean of distribution = population mean
SD of distribution (standard error) is population SD/sqrt of n (sample size)
approximately normal dist (when n is large or the population itself is normal)
variance of distribution = (population SD)² / n
but we don’t know the population SD, so we estimate it using s (sample SD)
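A minimal sketch of estimating the standard error from one sample, plugging the sample SD s in for the unknown population SD. The sample values are invented.

```python
from math import sqrt
from statistics import mean, stdev

sample = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical sample

n = len(sample)
x_bar = mean(sample)   # estimate of the population mean
s = stdev(sample)      # sample SD (n - 1 in the denominator)
se = s / sqrt(n)       # estimated standard error of the sample mean

print(f"mean={x_bar}, s={s}, se={se:.4f}")
```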


New cards

central limit theorem

for random sampling w n >= 30, the sampling distribution of the sample mean is approximately normal, regardless of the population data’s distribution shape

useful bc we can still use characteristics of normal distribution for the mean’s distribution to build confidence intervals and perform significance tests even when population distribution is skewed.
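A simulation sketch of the theorem, assuming a deliberately skewed population (exponential with mean 1 and SD 1) and 2,000 repeated samples of n = 30. The seed and sizes are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)          # fixed seed so the run is repeatable
n = 30                          # size of each sample
num_samples = 2000              # number of repeated samples
pop_mean, pop_sd = 1.0, 1.0     # exponential(rate=1): right-skewed population

# Draw many samples and record each sample's mean.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

se = pop_sd / sqrt(n)  # theoretical standard error
within_2se = sum(abs(m - pop_mean) <= 2 * se for m in sample_means) / num_samples

print(f"mean of sample means: {mean(sample_means):.3f} (population mean {pop_mean})")
print(f"SD of sample means:   {stdev(sample_means):.3f} (theory {se:.3f})")
print(f"share within 2 SE:    {within_2se:.3f}")
```

Even though the population is skewed, the sample means cluster around the population mean with spread close to SD/sqrt(n), and roughly 95% fall within 2 standard errors, as a normal curve would predict.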