STSCI 2100: Intro to Statistics & Data Science Unit 1

36 Terms

1

Statistics

A body of principles for designing the process of data collection and for making inferences about the population from information in the sample.

2

Data analysis steps

  1. Set clearly defined goals for study

  2. Make a plan of what data to collect & how to collect it

  3. Collect the data

  4. Data summary & preliminary analysis

  5. Apply appropriate methods for formal data analysis

  6. Interpret info & draw conclusions

3

Observational units (cases)

The entity on which characteristics are observed and measured; each observational unit is a row in a dataset.

4

Variables

The characteristics that have been measured or observed; each variable is a column in a dataset.

5

Wide format

Each repeated measurement or variable gets its own column, so each observational unit occupies a single row.

6

Long format

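Repeated measurements are stacked in rows: one column names the measurement (or time point) and another holds its value, so each observational unit spans several rows.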
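
To make the wide vs long distinction concrete, here is a minimal Python sketch that reshapes a tiny wide table into long format. The data (students and quiz scores) and the helper name `wide_to_long` are made up for illustration; the course itself may use R tools for this.

```python
# Hypothetical wide-format data: one row per student, one column per quiz.
wide = [
    {"student": "A", "quiz1": 8, "quiz2": 9},
    {"student": "B", "quiz1": 6, "quiz2": 7},
]

def wide_to_long(rows, id_col, value_cols):
    """Stack the measurement columns into one (id, measurement, value) row each."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            long_rows.append(
                {id_col: row[id_col], "measurement": col, "value": row[col]}
            )
    return long_rows

# Long format: each student now spans two rows, one per quiz.
long_rows = wide_to_long(wide, "student", ["quiz1", "quiz2"])
for r in long_rows:
    print(r)
```

Two wide rows with two quiz columns become four long rows, one per student-quiz pair.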

7

Missing data is usually handled by...

Listwise deletion: removing the entire case (row) that contains a missing value from the dataset.
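
A minimal Python sketch of listwise deletion, assuming missing values are stored as `None`. The rows and the helper name `listwise_delete` are hypothetical.

```python
# Hypothetical dataset; None marks a missing value.
rows = [
    {"id": 1, "height": 170, "weight": 65},
    {"id": 2, "height": None, "weight": 72},
    {"id": 3, "height": 165, "weight": None},
    {"id": 4, "height": 180, "weight": 80},
]

def listwise_delete(rows):
    """Keep only the cases (rows) with no missing value in any variable."""
    return [row for row in rows if all(v is not None for v in row.values())]

complete_cases = listwise_delete(rows)
print([row["id"] for row in complete_cases])
```

Cases 2 and 3 are dropped entirely, even though each is missing only one value, which is the cost of this approach.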

8

Continuous variable

Quantitative NON-COUNTABLE variable that can take any value in an interval, so there are infinitely many possible values within a given range. Ex. temperature of patients w/ flu, length, mass, price, time

9

Discrete variable

Quantitative COUNTABLE variable that takes values from a set of separate, distinct values (often whole numbers). Ex. counts, age (if recorded in whole years)

10

Ordinal variable

Qualitative variable whose categories have a natural order. Ex. an agreement scale from 1 to 5

11

Nominal variable

Qualitative variable whose categories have no order. Ex. blood type. (Movie ratings such as G/PG/R or 1-5 stars are ordered, so they fit ordinal better.)

12

Population of units vs population vs sample

Sample: the subset of measurements from the population that is actually collected

Population: the set of all measurements (or records of some qualitative trait), one for each unit in the population of units

  • Census: a study that looks at every unit in the population

Population of units: the collection of units in which we have scientific interest; this is what we normally mean by “population”

Ex. Population of units: all the undergrad students @Cornell; population: heights of all undergrad students @Cornell; sample: heights of undergrad students taking ILRST/STSCI2100 (continuous variable)

Ex. Population of units: all the trees in Sapsucker Woods; population: species of all trees in Sapsucker Woods; sample: species of all trees within 100 feet of Sapsucker Woods pond (nominal variable)

13

Sample size (n)

the number of observations in the sample

14

Types of bias (selection, measurement/response, nonresponse)

Selection bias: bias incurred when a sample systematically excludes some part of the population (ex. only picking the first few rows)

Measurement or response bias: bias incurred when a method of observation produces values diff from the true value of the observational unit (ex. thermometer giving wonky values)

Nonresponse bias: bias incurred when data are not collected from all observational units selected for the study

15

Observational study

A study that observes characteristics of an existing population.

16

Simple random sample

A sample selected in a way that gives every different sample of size n an equal chance of being selected.
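
As a small worked example of "every different sample of size n has an equal chance," the Python sketch below lists every possible sample of size 2 from a made-up population of 4 units. Under simple random sampling each of those samples has the same probability.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of 4 units and a sample size of 2.
population = ["A", "B", "C", "D"]
n = 2

# Every distinct sample of size n (order does not matter).
all_samples = list(combinations(population, n))

# Under an SRS, each distinct sample is equally likely.
prob_each = Fraction(1, len(all_samples))

print(all_samples)
print(prob_each)
```

There are 6 possible samples here, so each one is selected with probability 1/6.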

17

Stratified sampling

Dividing a population into subgroups (strata) and then taking a separate random sample from each stratum.
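
A minimal Python sketch of stratified sampling, assuming an equal number of units is drawn from each stratum (other allocations are possible). The student data and the helper name `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(units, stratum_of, n_per_stratum, seed=0):
    """Split units into strata, then take a separate random sample in each."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for name in sorted(strata):
        # Sampling without replacement inside each stratum.
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample

# Hypothetical population: 10 first-years and 10 seniors.
students = [("fresh", i) for i in range(10)] + [("senior", i) for i in range(10)]
picked = stratified_sample(students, lambda s: s[0], 3)
print(picked)
```

Every stratum is guaranteed to appear in the sample, which an SRS of the same size does not promise.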

18

Cluster sampling

Dividing a population into subgroups (clusters) and forming a sample by randomly selecting clusters and including all individuals or objects in the selected clusters in the sample.
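
A minimal Python sketch of cluster sampling: whole clusters are chosen at random and everyone in a chosen cluster is included. The dorm data and the helper name `cluster_sample` are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly pick clusters, then include every unit in the picked clusters."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [unit for name in chosen for unit in clusters[name]]
    return chosen, sample

# Hypothetical clusters: students grouped by dorm.
dorms = {
    "dorm1": ["a", "b", "c"],
    "dorm2": ["d", "e"],
    "dorm3": ["f", "g", "h", "i"],
    "dorm4": ["j"],
}
chosen, sample = cluster_sample(dorms, 2)
print(chosen, sample)
```

Note the contrast with stratified sampling: here the randomness is in which groups are chosen, not in who is chosen within a group.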

19

1 in k systematic sampling

A sample selected from an ordered arrangement of a population by choosing a starting point at random from the first k individuals on the list and then selecting every kth individual thereafter.
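
A minimal Python sketch of 1 in k systematic sampling, assuming the population is already in a list with a fixed order. The roster and the helper name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(ordered_units, k, seed=0):
    """Pick a random start among the first k units, then take every kth unit."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    start = rng.randrange(k)   # index 0..k-1, i.e. one of the first k units
    return ordered_units[start::k]

# Hypothetical ordered roster of 100 ID numbers, sampled 1 in 10.
roster = list(range(1, 101))
picked = systematic_sample(roster, 10)
print(picked)
```

With 100 units and k = 10 the sample always has 10 units, spaced exactly 10 apart.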

20

Confounding variable

A variable that is related both to group membership and to the response variable.

21

Extraneous variable

A variable that is not an explanatory variable in the study but is thought to affect the response variable.

22

Direct control

Holding extraneous variables constant so that their effects are not confounded with those of the experimental conditions.

23

Blocking

Using extraneous variables to create groups that are similar with respect to those variables and then assigning treatments at random within each block, thereby filtering out the effect of the blocking variables.
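
A minimal Python sketch of blocking, assuming one extraneous variable (light level) defines the blocks and two treatments are split evenly within each block. The plant data and the helper name `randomize_within_blocks` are hypothetical.

```python
import random

def randomize_within_blocks(units, block_of, treatments, seed=0):
    """Group units into blocks, then assign treatments at random inside each."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    assignment = {}
    for name in sorted(blocks):
        members = blocks[name][:]
        rng.shuffle(members)  # random order within the block
        for i, unit in enumerate(members):
            # Cycle through treatments so each block gets a balanced split.
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical units: 4 plants in a sunny spot and 4 in a shady spot.
plants = [("sunny", i) for i in range(4)] + [("shady", i) for i in range(4)]
assignment = randomize_within_blocks(plants, lambda p: p[0], ["A", "B"])
print(assignment)
```

Because each block contains both treatments in equal numbers, a difference between sunny and shady plants cannot be mistaken for a treatment effect.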

24

Replication

A strategy for ensuring that there is an adequate number of observations on each experimental treatment.

25

Placebo treatment

A treatment that resembles the other treatments in an experiment in all apparent ways but that has no active ingredients.

26

Control group

A group that receives no treatment (or only a placebo), used as a baseline for comparison.

27

Single-blind experiment

An experiment in which the subjects do not know which treatment they received but the individuals measuring the response do know which treatment was received, or an experiment in which the subjects do know which treatment they received but the individuals measuring the response do not know which treatment was received.

28

Double-blind experiment

An experiment in which neither the subjects nor the individuals who measure the response know which treatment was received.

29

Simple Random Sample

Sampling method where 1) every set of n units of the population has the same chance of being sampled and 2) selections are independent.

Typically generated in R with sample().

30

Sampling with/without replacement

Sampling without replacement: Selecting units so that once a unit is chosen it cannot be chosen again (common in SRS).

Sampling with replacement: Selecting units where chosen units are put back and can be selected again.

R example: sample(1:103, 11) picks 11 unique integers from 1 to 103 (an SRS without replacement).
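
For comparison, here is a rough Python counterpart of the R call above, shown as a sketch rather than the course's own code. `random.sample` draws without replacement and `random.choices` draws with replacement (in R, the latter would be sample(1:103, 11, replace = TRUE)). The seed value is arbitrary.

```python
import random

rng = random.Random(2100)  # arbitrary seed, only so the sketch is repeatable
units = range(1, 104)      # the integers 1..103, like 1:103 in R

# Without replacement: a unit cannot be chosen twice, so all 11 are distinct.
without_replacement = rng.sample(units, 11)

# With replacement: each draw is from the full list, so repeats are possible.
with_replacement = rng.choices(units, k=11)

print(without_replacement)
print(with_replacement)
```

The first list never contains duplicates; the second may, although with only 11 draws from 103 units repeats will not show up every time.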

31

Pros/cons of SRS

Pros: simple to analyze; equal chance for each unit; conceptually straightforward.

Cons: can be costly or logistically hard; may still give high variability or unbalanced samples by chance.

32

Observational study

A study where the researcher does not control exposures/treatments and instead observes naturally occurring conditions. Harder to establish causality bc of confounding variables.

33

Experiment

Study where researcher applies treatments under controlled conditions & randomizes assignments to evaluate causal effects

34

Response variable

Main outcome measured in a study; dependent variable (e.g., amount of bacteria)

35

Treatment (factor)

A condition applied in an experiment (e.g., packaging conditions: a, b, c, d)

36

Experimental unit

The physical entity to which a treatment is assigned (the unit of randomization).
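
To tie experimental units, treatments, and randomization together, here is a minimal Python sketch of a completely randomized design. It assumes 12 hypothetical packages as the experimental units and reuses the four packaging conditions a, b, c, d as treatments; the helper name `completely_randomized` is made up.

```python
import random

def completely_randomized(units, treatments, seed=0):
    """Randomly assign treatments to units, as evenly as the counts allow."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)      # random order decides who gets which treatment
    return {
        unit: treatments[i % len(treatments)]
        for i, unit in enumerate(shuffled)
    }

# Hypothetical experimental units: 12 packages, 4 treatments, 3 per treatment.
packages = [f"pkg{i}" for i in range(1, 13)]
design = completely_randomized(packages, ["a", "b", "c", "d"])
print(design)
```

Each package is the unit of randomization, and having 3 packages per treatment is a simple form of replication.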