SI unit 2

1. Define “data science,” “data/information analysis,” and “big data”

Q: How does Field Cady define data science?
A: A process-driven discipline: framing problems → understanding messy data → extracting features → modeling → presenting/deploying results. Distinct from software engineering because it is iterative and exploratory. At its core, it is turning observations into knowledge.
Source: The Data Science Road Map – Field Cady

Q: How does Max Shron describe data analysis vs. statistics?
A: Data analysis is inductive, predictive, and designed to generate new knowledge from messy real-world data. Statistics is deductive and largely descriptive. Data analysis transforms insights into actionable meaning for decisions.
Source: Thinking with Data – Max Shron

Q: What is big data according to Tiell & O’Connor?
A: More than “large datasets.” Big data involves volume (scale), velocity (real-time generation), and variety (different structures like text, social relationships, sensors). It creates risks of bias amplification, consent violations, and inequity.
Source: Building Digital Trust – Tiell & O’Connor

Q: What historical example shows that “big data” is not new?
A: The 1890 U.S. Census was tabulated with Herman Hollerith’s punched-card machines, an early form of large-scale data processing.
Source: 50 Years of Data Science – David Donoho


2. Characterize what a data scientist does & the history of the field

Q: What are the two main types of deliverables data scientists produce?
A: (1) For humans – reports, visualizations, presentations for decision-making.
(2) For machines – production code/models for automation, batch analytics, or real-time systems.
Source: The Data Science Road Map – Field Cady

Q: What key skills should a data scientist have?
A: Technical: programming, quantitative analysis, feature engineering, modeling, visualization.
Soft: scoping problems, communication, storytelling, teamwork, product intuition.
Source: The Data Science Road Map – Field Cady; Lecture notes

Q: What is the “snarky definition” of a data scientist from consensus curricula?
A: “A data scientist is better at stats than any software engineer, and better at software engineering than any statistician.”
Source: 50 Years of Data Science – David Donoho

Q: How did John Tukey (1962) shape the history of data science?
A: He argued that “data analysis” should be its own science, with workflows, visualization, and empirical validation.
Source: 50 Years of Data Science – David Donoho

Q: How did Leo Breiman’s “Two Cultures” paper (2001) influence data science?
A: He distinguished between generative modeling (traditional statistics) and predictive modeling (machine learning), arguing that academic statistics focused almost entirely on the former and neglected prediction.
Source: 50 Years of Data Science – David Donoho


3. Main considerations for any data science project

Q: What is Max Shron’s main warning about starting data projects?
A: Don’t start with a dataset or tools. First scope the project using CoNVO: Context, Need, Vision, Outcome.
Source: Thinking with Data – Max Shron

Q: What are the key stages of the data science roadmap?
A: Frame the problem → Understand the data → Extract features → Model → Present results or Deploy code.
Source: The Data Science Road Map – Field Cady

Q: Why is iteration essential in data science?
A: Unknowns in data/features/models require constant back-and-forth. Early results and refinements save time and help avoid sunk costs.
Source: The Data Science Road Map – Field Cady

Q: What tools does Shron suggest for aligning teams early?
A: Mockups (visual previews of results) and argument sketches (reasoning outlines). These clarify expectations and sharpen the project scope.
Source: Thinking with Data – Max Shron

Q: Why is communication central in data science?
A: Results go to mixed audiences (business, technical, policy), so clear storytelling is needed to counter bias and misleading intuition.
Source: The Data Science Road Map – Field Cady; Thinking with Data – Max Shron


4. Ethical considerations of data science

Q: What do Tiell & O’Connor mean by “digital trust”?
A: The belief that an organization is safe, transparent, reliable, and truthful in its data practices. It is hard to build but easy to lose.
Source: Building Digital Trust – Tiell & O’Connor

Q: Why do they argue ethics now extend “beyond cybersecurity”?
A: Risks today include bias, consent violations, and unfair outcomes, not just technical breaches. Ethics must be integrated across the data supply chain.
Source: Building Digital Trust – Tiell & O’Connor

Q: What principles are included in a new code of data ethics?
A:
  • Respect people behind the data
  • Consider downstream uses
  • Ensure transparency & accountability
  • Treat compliance as the floor, not the ceiling
Source: Building Digital Trust – Tiell & O’Connor

Q: What example illustrates harm from unethical data use?
A: A dating app amplified racial/ethnic bias in its algorithm to drive engagement, scaling social harm.
Source: Building Digital Trust – Tiell & O’Connor

1. Define “machine learning,” “artificial intelligence,” and “algorithms”

Q: What is an algorithm?
A: A set of instructions or rules to follow to complete a task. Examples: recipes, GPS routes, spam filters.
Source: Notes on Machine Learning, AI, and Algorithms (SI110)
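
To make "a set of instructions or rules" concrete, here is a minimal sketch of a rule-based spam filter, one of the examples above. The keyword list, the threshold, and the function name `is_spam` are illustrative assumptions, not taken from the course notes.

```python
# Hypothetical rule-based spam filter; every name and value is illustrative.
SPAM_KEYWORDS = {"free", "winner", "prize", "urgent"}


def is_spam(message: str, threshold: int = 2) -> bool:
    """Flag a message when it contains at least `threshold` spam keywords."""
    # Step 1: normalize the text and split it into words.
    words = message.lower().split()
    # Step 2: count the words that match the keyword list (ignoring punctuation).
    hits = sum(1 for word in words if word.strip(".,!?:;") in SPAM_KEYWORDS)
    # Step 3: apply the fixed decision rule.
    return hits >= threshold


print(is_spam("You are a WINNER! Claim your FREE prize now"))  # True
print(is_spam("Lunch at noon tomorrow?"))                      # False
```

Every step is written by a person and never changes, which is what makes this an algorithm but not machine learning.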

Q: What is machine learning (ML)?
A: A subset of AI where algorithms learn from data to make predictions/classifications, improving automatically with experience.
Source: Notes on Machine Learning, AI, and Algorithms (SI110); Machine Learning for Everyone
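
A very small sketch of "learning from data": instead of a person choosing a decision rule, the code below picks the cutoff that best fits labeled examples. The training data and the names `fit_threshold` and `predict` are hypothetical, and real ML systems use far richer models than a single cutoff.

```python
def fit_threshold(values, labels):
    """Return the cutoff that classifies the most training examples correctly."""
    best_cutoff, best_correct = None, -1
    for cutoff in sorted(set(values)):
        # Predict "positive" when value >= cutoff, then count correct predictions.
        correct = sum((v >= cutoff) == label for v, label in zip(values, labels))
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def predict(value, cutoff):
    """Classify a new example with the learned cutoff."""
    return value >= cutoff


# Hypothetical training data: spam-keyword count per message and its spam label.
keyword_counts = [0, 1, 1, 2, 3, 4, 5]
is_spam_label = [False, False, False, False, True, True, True]

cutoff = fit_threshold(keyword_counts, is_spam_label)
print(cutoff)              # 3
print(predict(4, cutoff))  # True
print(predict(1, cutoff))  # False
```

Giving `fit_threshold` more labeled examples can shift the cutoff, which is the sense in which the model "improves with experience" without anyone rewriting the rule.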

Q: What is artificial intelligence (AI)?
A: The broad field of replicating human-like behaviors (planning, learning, reasoning, perception). Includes narrow AI, artificial general intelligence (AGI), and potential superintelligence.
Source: Notes on Machine Learning, AI, and Algorithms (SI110)


2. Benefits and concerns of AI/ML

Q: What are the benefits of AI/ML?
A: Efficiency, pattern recognition, automation, personalization. Examples: fraud detection, healthcare diagnostics, personalized learning, autonomous driving.
Source: Notes on Machine Learning, AI, and Algorithms (SI110)

Q: What are the main concerns of AI/ML?
A: Bias (racial, gender, class), privacy violations, job loss, deepfakes/misinformation, weaponization, lack of accountability, surveillance.
Source: Notes on Machine Learning, AI, and Algorithms (SI110); Biases in AI Systems – Srinivasan & Chander

Q: How do Srinivasan & Chander classify AI bias?
A: Bias can occur at all stages: data creation (sampling/labeling), problem formulation, analysis (proxies, confounding), and evaluation/validation.
Source: Biases in AI Systems – Srinivasan & Chander (2021)

Q: How does Safiya Noble argue algorithms can reproduce inequality?
A: Search engines are ad-driven, not neutral. They amplify racism and sexism (e.g., pornification of “Black girls” search results; radicalization pathways).
Source: Algorithms of Oppression – Safiya Noble


3. Different forms of information visualization & why you’d use them

Q: What is information visualization?
A: Computer-supported visual representations of abstract data to amplify cognition (e.g., dashboards, scatterplots).
Source: Information Visualization Lecture Notes

Q: What are the key functions of visualization?
A: To record, analyze, aid memory, communicate/persuade.
Source: Information Visualization Lecture Notes

Q: What is the difference between information visualization and infographics?
A: Visualizations are for analysis/interaction; infographics are for storytelling and communication (often simplified).
Source: Information Visualization Lecture Notes

Q: Why might visualization matter more than statistics alone?
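A: Summary statistics can hide structure. Example: Anscombe’s Quartet, four datasets with nearly identical summary statistics (means, variances, correlation, regression line) that look very different when graphed.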
Source: Information Visualization Lecture Notes
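
The numeric half of the Anscombe example can be checked directly. The sketch below uses the published quartet values and only the standard library; the helper name `pearson` and the dictionary layout are my own choices, not something from the lecture notes.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Anscombe's quartet (datasets I-III share the same x values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Mean of x, mean of y, and correlation for each dataset, rounded to 2 places.
summary = {
    name: (round(mean(xs), 2), round(mean(ys), 2), round(pearson(xs, ys), 2))
    for name, (xs, ys) in QUARTET.items()
}
for name, stats in summary.items():
    print(name, stats)
```

All four rows come out essentially the same (mean x of 9, mean y of about 7.5, correlation of about 0.82). Only a plot reveals that one dataset is a curve, one is a line with a single outlier, and one has all but one point at the same x value.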

Q: What are “aesthetics” in visualization?
A: Visual properties like position, size, shape, color that encode data values.
Source: Visualizing Data: Mapping Data onto Aesthetics – SI Textbook
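
Mapping data onto aesthetics amounts to applying a scale to each variable. A minimal sketch, assuming made-up city records, pixel ranges, and a color table (none of these come from the textbook):

```python
def linear_scale(value, domain, output_range):
    """Map `value` from the data domain onto a visual range, linearly."""
    d0, d1 = domain
    r0, r1 = output_range
    fraction = (value - d0) / (d1 - d0)
    return r0 + fraction * (r1 - r0)


# Hypothetical categorical scale: region -> color.
REGION_COLORS = {"north": "blue", "south": "orange"}


def encode(record):
    """Turn one data record into the visual properties of one plotted mark."""
    return {
        # Temperature (0-40 °C) -> horizontal position (0-400 px).
        "x": linear_scale(record["temp_c"], (0, 40), (0, 400)),
        # Population (0-1,000,000) -> marker radius (2-22 px).
        "size": linear_scale(record["population"], (0, 1_000_000), (2, 22)),
        # Region (category) -> color.
        "color": REGION_COLORS[record["region"]],
    }


# Hypothetical record for illustration only.
city = {"name": "A", "population": 100_000, "temp_c": 10, "region": "north"}
print(encode(city))
```

Quantitative variables get continuous aesthetics (position, size), while the categorical variable gets a discrete one (color). Shape would be handled the same way as color, with a lookup table.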


4. Basics of prompt engineering

Q: What is prompt engineering?
A: The process of refining inputs to generative AI so outputs are useful, accurate, and aligned.
Source: AI Made Simple (Kapur)

Q: What are some strategies for effective prompting?
A: Be specific and clear; provide context; use step-by-step instructions; rephrase when needed; avoid bias; set constraints.
Source: AI Made Simple (Kapur)
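
One way to apply these strategies consistently is to assemble prompts from labeled parts. The sketch below is only an illustration: the function `build_prompt`, its parameters, and the sample text are assumptions, and it builds a string without calling any AI service.

```python
def build_prompt(task, context, steps, constraints):
    """Assemble a prompt that is specific, contextual, stepwise, and constrained."""
    lines = [
        f"Task: {task}",        # be specific and clear
        f"Context: {context}",  # provide context
        "Follow these steps:",  # step-by-step instructions
    ]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines.append("Constraints:")  # set constraints
    lines += [f"- {item}" for item in constraints]
    return "\n".join(lines)


# Hypothetical usage.
prompt = build_prompt(
    task="Summarize the attached article on social media and mental health.",
    context="The summary is for first-year students reviewing for an exam.",
    steps=["List the main claims.", "Give one piece of evidence for each claim."],
    constraints=["Use at most 150 words.", "Use neutral language."],
)
print(prompt)
```

If the output misses the mark, rephrasing means changing one labeled part (for example, tightening a constraint) and trying again, without rewriting the whole prompt.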

Q: What is prompt chaining?
A: Iteratively refining prompts (using follow-ups) to guide the AI toward a desired result.
Source: AI Made Simple (Kapur)


5. Making data visualizations more accessible

Q: What is the equity issue in data visualization?
A: Most visualizations assume able-bodied users, excluding people with visual, cognitive, or motor disabilities. Disability affects more than 1 billion people globally.
Source: Inclusive Data Visualization for People with Disabilities: A Call to Action – Marriott et al. (2021)

Q: What are barriers for users with visual impairments?
A: Limited alt text and a lack of tactile/sonification tools; current solutions (like SAS Graphics Accelerator) are often incomplete or costly.
Source: Inclusive Data Visualization… – Marriott et al.

Q: What are barriers for users with cognitive/learning disabilities?
A: Struggles with abstraction and symbolic conventions; suggested fixes include chunking, coupling visuals with text, and guided exploration.
Source: Inclusive Data Visualization… – Marriott et al.

Q: What are barriers for users with motor disabilities?
A: Difficulty with interactive features like zoom or lassoing; alternatives include eye tracking, speech input, or custom timing controls.
Source: Inclusive Data Visualization… – Marriott et al.

Q: What is the main call to action in Marriott et al.’s article?
A: Build multimodal, evidence-based, inclusive visualization tools; involve disability communities; treat accessibility as central to design.
Source: Inclusive Data Visualization… – Marriott et al. (2021)

1. Characterize arguments on the benefits and drawbacks of social media to the individual

Q: What benefits of social media are highlighted in Social Media for Public Health (Jafar et al., 2023)?
A: Provides direct access to health info, peer support, and community; enhances patient–provider connection; enables rapid alerts in crises; reduces isolation; and fosters positive mental health interactions.
Source: Jafar et al., Social Media for Public Health (2023)

Q: What drawbacks of social media are identified in Jafar et al. (2023)?
A: Amplifies misinformation, polarizes users, harms youth mental health (anxiety, depression, body image issues), fuels overuse, promotes unsafe self-diagnosis, and exposes teens to cyberbullying and harmful challenges.
Source: Jafar et al., Social Media for Public Health (2023)

Q: What do Khalaf et al. (2023) find about social media and adolescent mental health?
A: Dual role: builds social connection, access to resources, and identity expression; but correlates with depression, anxiety, sleep disruption, body dissatisfaction, and cyberbullying. Effects depend on individual vulnerabilities.
Source: Khalaf et al., Impact of Social Media on Mental Health of Adolescents and Young Adults (2023)

Q: According to Owens & Lenhart (2020), what myths distort how we think about social media’s impact?
A: Myths like “social media is addictive,” “tech companies can fix harms with tech,” and “less screen time = healthier” oversimplify experiences and ignore structural inequities.
Source: Owens & Lenhart, Good Intentions, Bad Inventions (2020)

Q: How does Sherry Turkle describe the benefits of social media for teens?
A: Provides connection, intimacy, collaborative identity, and new ways to manage fear/loneliness; allows identity play via avatars and profiles.
Source: Turkle, Adolescents & Social Media (lecture/reading notes)

Q: What are the drawbacks Turkle emphasizes?
A: Fragile identities dependent on validation, anxiety from constant texting, risks like texting while driving, reduced privacy, parental surveillance, and exhaustion from self-curation.
Source: Turkle, Adolescents & Social Media


2. Define “social media,” “social capital,” and “the networked self”

Q: Define social media.
A: Platforms that enable user-generated content, sharing, and direct user-to-user interaction (e.g., Facebook, TikTok, Reddit). Also serve as stages for identity performance and self-presentation.
Source: Lampe lecture; Turkle, Adolescents & Social Media

Q: Define social capital in the context of social media.
A: The resources and benefits individuals gain through online networks, such as support, validation, and status. Includes both strong-tie support and weak-tie informational access.
Source: Lampe lecture; Turkle, Adolescents & Social Media

Q: Define the networked self.
A: Identity shaped through constant digital connectivity and peer validation. Adolescents co-create selfhood in dialogue with others, often fragmented across multiple profiles.
Source: Papacharissi (concept), reinforced in Turkle readings


3. Characterize Lampe’s view on the two sides of the social media “debate”

Q: How does Lampe frame the “negative narrative” about social media?
A: Critics claim social media replaces authentic intimacy with shallow ties, creates envy/anxiety, and fosters harassment. These echo historical moral panics (jazz, telephones, writing).
Source: Lampe, Social Media Is Good for You

Q: What benefits of social media does Lampe highlight?
A: Builds social capital, strengthens weak ties, enables grassroots organizing, lowers mobilization costs, fosters lightweight but meaningful “social grooming,” and empowers users as their own media.
Source: Lampe, Social Media Is Good for You

Q: What is Lampe’s overall stance in the debate?
A: Social media is not inherently good or bad—it’s what we make of it. With media literacy, intentional design, and user responsibility, it can be a force for positive connection.
Source: Lampe, Social Media Is Good for You

Q: How do Khalaf et al. (2023) echo Lampe’s framing of the debate?
A: They emphasize a balanced view: social media can build connection and resilience but also creates risks. Outcomes depend on context, use, and individual vulnerabilities—not a simple “good vs bad.”
Source: Khalaf et al., Impact of Social Media on Mental Health

Q: How does Turkle’s perspective connect to Lampe’s debate framing?
A: Like Lampe, she rejects binary thinking: technology affords both benefits (connection, visibility, identity play) and drawbacks (anxiety, fragile selves, surveillance). The impact depends on cultural context and use.
Source: Turkle, Adolescents & Social Media