data analysis midterm

Last updated 6:36 AM on 4/18/26

30 Terms

1

how to identify research questions

specific events → general patterns (greater applicability)

2

hypothesis

theory-based statement about what we would expect to observe if our theory is correct

3

how to develop theories

examine previous research on the topic

  • what other causes of DV did previous research miss

  • can their theory be applied elsewhere

4

hypothesis testing

  1. measurement of variables

  2. data collection

  3. data analysis

  4. judge whether the results favor hypothesis or null hypothesis

5

types of causality

  • deterministic: if x occurs, then y will occur

  • probabilistic (focus of social sciences): increases in x are associated w increases/decreases in the probability of y occurring

6

steps to establish causal relationship

  1. develop credible causal mechanism linking x to y (how does x cause y? what is it specifically abt increases in x that will likely lead to increases/decreases in y?)

  2. consider possibility that y causes x

  3. think of any other causes of y

  4. evaluate if x and y covary even after controlling for other causes (z) (if not, the rltshp btwn x and y is spurious)
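
Step 4 can be illustrated with a small simulation. This is only a sketch under made-up assumptions: z causes both x and y, x has no effect on y, and "controlling for z" is done by correlating what is left of x and y after a simple linear regression on z. The helper names `corr` and `residuals` are hypothetical.

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

def residuals(y, z):
    """Part of y left over after a simple linear regression of y on z."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    slope = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
             / sum((zi - mz) ** 2 for zi in z))
    return [(yi - my) - slope * (zi - mz) for yi, zi in zip(y, z)]

rng = random.Random(0)                      # fixed seed so the run is repeatable
z = [rng.gauss(0, 1) for _ in range(5000)]  # the confounder
x = [zi + rng.gauss(0, 1) for zi in z]      # z causes x
y = [zi + rng.gauss(0, 1) for zi in z]      # z causes y; x plays no role

raw = corr(x, y)                                      # clearly positive
partial = corr(residuals(x, z), residuals(y, z))      # close to zero
print(f"raw r = {raw:.2f}, r after controlling for z = {partial:.2f}")
```

Here x and y covary only because of z, so the rltshp disappears once z is controlled for. That is the spurious case.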

7

planning data analysis

research design (experiment or observational) → setup (cross-sectional or time series) → measurement (reliability and validity)

8

research design

strategies to test the suggested causal relationship btwn IV and DV

9

types of research design

  • experimental: control/treatment group; can control for confounding variables

  • observational: no control over IV; can still lead to informed evaluations of causality when accounting for reverse causality and confounding variables

10

types of observational studies

  • cross-sectional: focus on variation btwn individuals or spatial units in the DV

  • time-series: comparison over time w/i a single unit

11

operationalization

process of translating an abstract concept into an observable measure

12

qualities of a good measure

  • reliability (consistency): applying the measurement to the same case will produce identical results (consistent responses from the same respondents regardless of when or how the question is asked)

  • validity: the measure accurately represents the concept

13

types of variables by measurement metric

  • categorical: variables that take a set of fixed and known values

    • nominal: categorical variables with NO ranking distinctions (ex. religious identification, regime type)

    • ordinal: variables w/ values that can be ordered (ex. likert scale: strongly disagree - disagree ...)

  • continuous: variables that can take on any value w/i a certain range

    • equal-unit difference; one-unit increase in the value always means the same thing (ex. age in years)

14

frequency table

table showing the values the variable takes and the number of times each value appears in the data
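
A minimal sketch of a frequency table in Python, using `collections.Counter`. The regime-type values are invented for illustration.

```python
from collections import Counter

# Hypothetical nominal variable: regime type for eight made-up cases.
regime = ["democracy", "autocracy", "democracy", "hybrid",
          "democracy", "autocracy", "hybrid", "democracy"]

freq = Counter(regime)   # maps each value to the number of times it appears

# Print value, count, and share of all cases, most frequent first.
for value, count in freq.most_common():
    print(f"{value:<10} {count:>3} {count / len(regime):>6.1%}")
```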

15

descriptive statistics

numerical summary of main traits of the distribution of the data

16

measure of central tendency

typical values for a variable at the center of its distribution

  • mean

  • median

17

mean (aka expected value)

  • of a non-binary variable: average

  • of a binary variable: proportion of the value 1

  • zero-sum property: the sum of the differences btwn each observation and the mean is equal to 0
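
The three points above can be checked on toy data. The ages and the voted/did-not-vote values below are made up.

```python
from statistics import mean

ages = [19, 22, 25, 30, 34]          # non-binary variable
voted = [1, 0, 1, 1, 0, 1, 0, 1]     # binary variable (1 = voted)

mean_age = mean(ages)                # ordinary average: 130 / 5
share_voted = mean(voted)            # mean of a 0/1 variable = proportion of 1s

# Zero-sum property: deviations from the mean cancel out.
deviations = [a - mean_age for a in ages]
print(mean_age, share_voted, sum(deviations))
```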

18

measure of spread

summarizes amt of variation of distribution relative to its center

  • variance: [sum of (yi − mean)²]/(n − 1)

  • sd: sqrt of variance
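
A short worked example, assuming the sample-variance convention (divide by n − 1), which is also what Python's `statistics.variance` uses. The data are made up.

```python
import math
from statistics import mean, stdev, variance

y = [2, 4, 4, 6, 9]
y_bar = mean(y)                                    # 5

# Squared deviations from the mean: 9, 1, 1, 1, 16 (sum = 28).
manual_var = sum((yi - y_bar) ** 2 for yi in y) / (len(y) - 1)   # 28 / 4
manual_sd = math.sqrt(manual_var)

print(manual_var, variance(y))   # both use the n - 1 denominator
print(manual_sd, stdev(y))
```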

19

visualizing data

  • categorical variable: bar graph

  • continuous variable: box and whiskers

    • iqr = q3-q1

    • outliers
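
A sketch of the numbers behind a box-and-whiskers plot. The 1.5 × IQR cutoff for flagging outliers is a common convention and an assumption here (the card does not state a rule), and quartile values can differ slightly across software depending on the interpolation method.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40]   # made-up values; 40 looks extreme

q1, q2, q3 = quantiles(data, n=4)            # quartiles (default "exclusive" method)
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < low_fence or v > high_fence]

print(q1, q2, q3, iqr, outliers)
```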

20

why probability plays an important role in inferential statistics

tells us how to generalize from sample to population and helps us decide whether the relationships in the sample occurred by chance

21

multiplication law for independent events

if A and B are independent, P(A and B) = P(A) × P(B)

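A quick check of the law, P(A and B) = P(A) × P(B) for independent events, by listing every outcome of two fair dice. The two events are hypothetical examples; exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely rolls

def prob(event):
    """Share of outcomes for which the event is true."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_a = prob(lambda o: o[0] == 6)                        # A: first die shows 6
p_b = prob(lambda o: o[1] % 2 == 0)                    # B: second die is even
p_both = prob(lambda o: o[0] == 6 and o[1] % 2 == 0)   # A and B together

print(p_a, p_b, p_both)   # p_both equals p_a * p_b
```
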
22

probability distribution

list of outcomes and their associated probabilities

23

discrete probability function

  • probability that x can take a SPECIFIC value, a, is p(a): P[X=a] = p(a)

  • p(x) is non-negative for all real x

  • sum of p(xj) = 1, where the xj are all the possible values that x can take

  • 0 <= p(x) <= 1
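
These properties can be checked on a tiny example. The variable below (number of heads in two fair coin flips) is a hypothetical illustration.

```python
from fractions import Fraction

# X = number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def p(a):
    """P[X = a]; values X cannot take get probability 0."""
    return pmf.get(a, Fraction(0))

total = sum(pmf.values())                            # must equal 1
in_range = all(0 <= v <= 1 for v in pmf.values())    # each p(x) is between 0 and 1

print(p(1), p(3), total, in_range)
```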

24

continuous probability distribution

  • when a variable is continuous, its probability distribution will be a smooth continuous curve

  • probabilities are measured over an interval of values, not at a single point (ex. p(-1<x<1) instead of p(x=1))
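
A rough sketch of "probability = area over an interval", using a hypothetical Uniform(0, 10) variable and a simple midpoint-rule approximation of the area under its density. The function names are illustrative.

```python
def f(x):
    """Density of a hypothetical Uniform(0, 10) variable."""
    return 0.1 if 0 <= x <= 10 else 0.0

def prob_between(a, b, steps=10_000):
    """Approximate P(a < X < b) as the area under f (midpoint rule)."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

p_interval = prob_between(2, 5)    # area over an interval: about 0.3
p_point = prob_between(5, 5)       # zero-width interval: 0
total_area = prob_between(0, 10)   # whole curve: about 1

print(p_interval, p_point, total_area)
```

A single point has zero width, so its area (probability) is 0 even though the density there is positive.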

25

continuous probability function

  • f(x) is non-negative for all real x

26

normal distribution

  • N(μ, σ²)

  • mean = median = mode

  • about 68% of data fall within mean ± 1 SD

  • about 95% of data fall within mean ± 2 SD

  • about 99.7% of data fall within mean ± 3 SD
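
The 68 / 95 / 99.7 figures can be reproduced with `statistics.NormalDist`. The mean of 100 and SD of 15 are arbitrary; the shares are the same for any normal distribution.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)   # hypothetical N(100, 15²)

def share_within(k):
    """Share of the distribution within k SDs of the mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower

within = {k: share_within(k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} SD: {share:.4f}")
```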

27

z-score

  • number of SDs an observed value lies from the mean: z = (x − μ)/σ; tells us how likely it is to get the observed value given that the data follow a normal distribution

  • useful bc it converts any normal dist. into the standard normal N(0,1), making values across different distributions directly comparable
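
A small sketch of the comparability point, using the standard formula z = (x − μ)/σ. The two exams and their means and SDs are invented.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical scores on two exams with very different scales.
z_a = z_score(82, mu=70, sigma=8)        # 12 / 8
z_b = z_score(650, mu=500, sigma=100)    # 150 / 100

# Probability of a value at least this far above the mean under N(0, 1).
upper_tail = 1 - NormalDist().cdf(z_a)

print(z_a, z_b, round(upper_tail, 4))
```

Both scores sit 1.5 SDs above their own means, so they are equally unusual even though the raw numbers look nothing alike.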

28

sampling distribution

probability distribution of a statistic drawn from repeated sampling

29

sampling distribution of sample mean

  • mean of distribution = population mean

  • SD of distribution (standard error) is population SD/sqrt of n (sample size)

  • approximately normal dist when n is large (exactly normal if the population itself is normal)

  • variance of distribution = (population SD)² / n

  • but we don’t know the population SD, so we estimate it using s (sample SD)
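
A simulation sketch of these properties. The population (normal, mean 50, SD 10), the sample size, and the number of repetitions are all made-up choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)        # fixed seed so the run is repeatable
mu, sigma, n = 50, 10, 25     # hypothetical population mean, SD, and sample size

def draw_sample():
    """One random sample of size n from the population."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Repeated sampling: record the mean of each of 4000 samples.
sample_means = [sum(s) / n for s in (draw_sample() for _ in range(4000))]

theoretical_se = sigma / n ** 0.5                # population SD / sqrt(n) = 2.0
simulated_se = stdev(sample_means)               # SD of the simulated sample means
estimated_se = stdev(draw_sample()) / n ** 0.5   # s / sqrt(n) from a single sample

print(mean(sample_means), theoretical_se, simulated_se, estimated_se)
```

The first three numbers should land close to 50, 2, and 2. The last one is what we can actually compute in practice, and it varies from sample to sample.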

30

central limit theorem

for random sampling w n >= 30, the sampling distribution of the sample mean is approximately normal, regardless of the population data’s distribution shape

  • useful bc we can still use characteristics of normal distribution for the mean’s distribution to build confidence intervals and perform significance tests even when population distribution is skewed.
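
A simulation sketch of the theorem with a strongly right-skewed population. The exponential population (mean 1, SD 1), n = 30, and the 5000 repetitions are assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(2)     # fixed seed so the run is repeatable
n, reps = 30, 5000

# Each entry is the mean of one sample of size n from an exponential population.
sample_means = [sum(rng.expovariate(1.0) for _ in range(n)) / n
                for _ in range(reps)]

se = 1 / n ** 0.5                     # population SD / sqrt(n)
simulated_se = stdev(sample_means)

# If the sample means are roughly normal, about 95% fall within 1.96 SE of the mean.
share_within = sum(abs(m - 1) <= 1.96 * se for m in sample_means) / reps

print(mean(sample_means), se, simulated_se, share_within)
```

Even though single observations are heavily skewed, the share of sample means within 1.96 standard errors comes out near 95%, which is what the normal approximation predicts.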