Data Science
Basic Workflow
ASK a question
PLAN an analysis or model to answer the question
COLLECT the data
CLEAN and organize data
INTERPRET and communicate the results
Four important aspects of data science:
Exploration - identifying patterns in data (visualization)
Inference - Using data to draw reliable conclusions about the world (statistics)
Prediction - Making informed guesses about unobserved data (Machine Learning)
Communication - Presenting the results: what story can you craft? How you communicate depends on who you’re talking to.
Cause and Effect
Experiments and Observational studies:
Y = Response variable: The primary variable of interest
X = Explanatory variable(s) or factors: Other information about each subject that may explain why the subjects differ with respect to the response.
Data: Known cases where (X,Y) are observed
The objective is to use the data to predict new observations of Y using X. To do this, the data may be used to fit a model relating X and Y.
This data can come from either an experiment or an observational study:
Observational: you have no control over X; you can only observe the values that occur
Experiment: you can set the X’s (or the number of X’s) to be whatever you want
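Example (sketch only): the idea of fitting a model relating X and Y, then predicting a new Y from X, might look like the following. The data values and the helper names fit_line and predict are made up for illustration, and a straight-line model is just one possible choice.

```python
# Hypothetical known cases (X, Y); the numbers are invented for illustration
# and happen to lie exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x_new):
    """Predict Y for a new value of X using the fitted line."""
    intercept, slope = model
    return intercept + slope * x_new


model = fit_line(xs, ys)          # fit the model on the known (X, Y) cases
prediction = predict(model, 6.0)  # predict Y for an X that was not observed
print(model, prediction)
```

Real data will not fall exactly on a line, so predictions would carry some error; the sketch only shows the fit-then-predict pattern.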
Confounding variable: A confounding variable is a variable that is related to both X and Y. We say that the effect X has on Y cannot be separated from the effect of the confounding variable.
The presence of possible confounding variables makes drawing a conclusion of causality very difficult. Randomization (assigning X at random in an experiment) allows much greater control, because random assignment breaks the link between X and any confounding variable.
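Example (sketch only): a small simulation can show why confounding is a problem and why randomization helps. Everything here is an assumption made for illustration: the confounder z, the effect sizes, and the function name group_difference. In this made-up world X has NO effect on Y at all, yet the observational comparison still shows a large gap.

```python
import random


def mean(values):
    return sum(values) / len(values)


def group_difference(n, randomize, seed=0):
    """Mean Y of subjects with X=1 minus mean Y of subjects with X=0."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hypothetical confounder
        if randomize:
            # Experiment: X is assigned by a coin flip, unrelated to z.
            x = rng.random() < 0.5
        else:
            # Observational: subjects with high z are more likely to have X=1.
            x = z + rng.gauss(0, 1) > 0
        # Y depends on z only; X has no effect on Y in this simulation.
        y = 2.0 * z + rng.gauss(0, 1)
        (treated if x else untreated).append(y)
    return mean(treated) - mean(untreated)


observational_gap = group_difference(20000, randomize=False)
experimental_gap = group_difference(20000, randomize=True)
print("observational:", observational_gap)  # clearly far from 0
print("experimental: ", experimental_gap)   # close to 0
```

The observational gap comes entirely from z, not from X. With random assignment the two groups have similar z on average, so the gap shrinks toward zero.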
Interacting variable: A variable that changes the way the explanatory variable affects the response.
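Example (sketch only): an interaction can be pictured as a product term in a model. The response function and its coefficients below are hypothetical, chosen only to show the effect of x changing with z.

```python
def response(x, z):
    # Hypothetical model: the x * z term is the interaction.
    return 1.0 + 2.0 * x + 3.0 * x * z


def effect_of_x(z):
    """Change in the response when x goes from 0 to 1, holding z fixed."""
    return response(1, z) - response(0, z)


effect_when_z0 = effect_of_x(0)  # 2.0
effect_when_z1 = effect_of_x(1)  # 5.0
print(effect_when_z0, effect_when_z1)
```

Without the x * z term, the effect of x would be the same at every value of z. Here z changes how strongly x affects the response, which is what makes it an interacting variable.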
Meta-analysis: combines results from multiple separate studies.