Donoho argues that a “data science” framework is needed beyond classical statistics primarily because:
The real work of learning from data includes many essential activities outside modeling, such as data cleaning, transformation, computing, and communication.
Donoho proposes bringing the Common Task Framework (CTF) into today’s statistical and data science training. Which of the following would be an example of that?
Designing a course project where all students work to answer the same question using the same data set.
Which pairing best describes one benefit and one limitation of the Common Task Framework (CTF)?
Benefit: provides objective performance measurement and enables cumulative progress in the field.
Limitation: can lead to overemphasis on leaderboard rankings rather than understanding the underlying problem or developing generalizable methods.
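As a rough illustration of the benefit and limitation above, here is a minimal sketch of a CTF-style scoring referee, assuming accuracy as the shared metric. The hidden labels, team names, and the functions `score` and `leaderboard` are all hypothetical and exist only for this example.

```python
# Hypothetical held-out labels kept by the referee; competitors never see them.
HIDDEN_TEST_LABELS = [1, 0, 1, 1, 0, 0, 1, 0]


def score(predictions, truth=HIDDEN_TEST_LABELS):
    """Return accuracy: the fraction of predictions that match the hidden labels."""
    if len(predictions) != len(truth):
        raise ValueError("submission must have one prediction per test item")
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


def leaderboard(submissions):
    """Rank (team, score) pairs from best to worst score."""
    scored = {team: score(preds) for team, preds in submissions.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Every team answers the same question on the same data (made-up submissions).
submissions = {
    "team_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "team_b": [1, 1, 1, 1, 1, 1, 1, 1],
}
ranking = leaderboard(submissions)
print(ranking)
```

The single shared metric is what makes the comparison objective, and it is also the source of the limitation: a team can climb this ranking without understanding why its predictions work.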
Data Exploration and Preparation
Using detective-like work to identify artifacts
Data Representation and Transformation
Restructuring the data or variables within the dataset to suit the needs of the project
Computing with Data
Knowledge of a number of programming languages/packages as well as approaches to computational efficiency
Data Modeling
Using inferential and predictive models to answer interesting questions
Data Visualization and Presentation
Generation of plots that help you explore your data as well as those that help you effectively communicate your findings
Science about Data Science
Studying how data science is actually practiced by others
Donoho discusses the concepts of “reproducibility” and “replicability” in data science. Why does he argue that computational reproducibility (being able to re-run someone’s code and get the same results) is important for data science as a science?
It allows others to verify results, understand exactly what was done, and build upon previous work
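One small, concrete ingredient of computational reproducibility is fixing the random seed so that a re-run gives the same numbers. The sketch below assumes a toy analysis (the mean of simulated draws); the function `run_analysis` and the seed value are hypothetical.

```python
import random
import statistics


def run_analysis(seed):
    """Toy analysis: the mean of 1000 simulated standard-normal draws."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    sample = [rng.gauss(0, 1) for _ in range(1000)]
    return statistics.mean(sample)


# Re-running with the same seed reproduces the result exactly.
first = run_analysis(seed=42)
second = run_analysis(seed=42)
print(first == second)
```

Seeding covers only one source of variation; sharing the code, the data, and the software versions is also needed before someone else can re-run the work.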
Donoho says we should study the “science of data science,” that is, how data science is actually done (tools, workflows, habits, error sources). Why does that matter for trust in results?
It measures which practices actually improve reliability (e.g., reproducibility, fewer mistakes), so we can build evidence-based standards and be more confident in conclusions.
Donoho discusses the “Two Cultures” of data science: the generative/inferential culture that focuses on modeling for explanation and interpretation, and the predictive culture, which focuses on predicting outcomes.
Which of the following is an example of the generative culture of modeling for explanation/interpretation?
A study examining which factors (like income, education, and location) are most strongly associated with voter turnout.
Donoho discusses the “Two Cultures” of data science: the generative/inferential culture that focuses on modeling for explanation and interpretation, and the predictive culture, which focuses on predicting outcomes.
Which of the following is an example of modeling for prediction?
A hospital using a model to predict which patients need surgery within 30 days.
What is Donoho’s main concern about the dominance of predictive culture in modern data science?
Predictive culture ignores the importance of understanding mechanisms and interpretability.