Q: What types of private data does data science often collect?
Medical records, census data, browser history, location data, social media data.
Q: What can be learned from credit card metadata?
Sensitive patterns about a person’s location, identity, and shopping behavior.
Q: What did the study by De Montjoye et al. (2015) find about credit card data?
Four spatiotemporal points can reidentify 90% of individuals.
Q: How does knowing transaction price affect reidentification risk?
Increases reidentification risk by 22% on average.
Q: Are women more or less reidentifiable than men in credit card metadata?
More reidentifiable.
Q: How can medical data combined with voter data breach privacy?
Linking the two datasets on shared attributes such as birth date, ZIP code, and sex can reveal identities even if the medical data is anonymized.
Q: What is one method to anonymize location data?
Coarsening latitude/longitude into larger regions such as ZIP codes.
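A minimal sketch of this kind of coarsening, assuming a simple 0.1-degree grid (the cell size and the `coarsen` helper are illustrative, not from the source):

```python
def coarsen(lat: float, lon: float, cell_deg: float = 0.1) -> tuple:
    """Snap a coordinate to the corner of a coarse grid cell so that
    many nearby users report the same approximate location."""
    def snap(x: float) -> float:
        return round((x // cell_deg) * cell_deg, 6)
    return snap(lat), snap(lon)

print(coarsen(40.7812, -73.9665))  # (40.7, -74.0)
```

Larger cells give stronger anonymity but less useful data, the same trade-off as reporting only a ZIP code.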
Q: What is randomized response used for?
Protecting individual privacy in sensitive surveys.
Q: How does randomized response work?
Each individual answers truthfully with a known probability and randomly otherwise, so any single answer carries plausible deniability.
Q: What inequality is linked to high-probability accuracy in randomized response?
Chebyshev’s inequality.
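The mechanism and its concentration behavior can be simulated; a minimal sketch, where the truth probability `p`, the true rate, and the sample size are all illustrative choices:

```python
import random

def respond(truth: bool, p: float, rng: random.Random) -> bool:
    """With probability p answer truthfully; otherwise answer
    uniformly at random, which hides any individual's true answer."""
    if rng.random() < p:
        return truth
    return rng.random() < 0.5

def estimate(responses, p: float) -> float:
    """E[yes rate] = p*pi + (1 - p)/2, so inverting that affine map
    gives an unbiased estimate of the true rate pi."""
    rate = sum(responses) / len(responses)
    return (rate - (1 - p) / 2) / p

rng = random.Random(0)
p, true_pi, n = 0.7, 0.3, 100_000
truths = [rng.random() < true_pi for _ in range(n)]
answers = [respond(t, p, rng) for t in truths]
# Chebyshev's inequality bounds how far the estimate strays from
# true_pi with high probability; with n this large it lands close.
print(round(estimate(answers, p), 3))
```

No single response reveals anything definite about its respondent, yet the aggregate estimate recovers the population rate.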
Q: What is differential privacy?
A formal guarantee that adding or removing any single individual's data point changes a query's outcome distribution by only a bounded amount.
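One standard way to realize this guarantee for counting queries is the Laplace mechanism; a sketch, where the epsilon value and the data are illustrative:

```python
import math
import random

def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """A count query has sensitivity 1 (adding or removing one person
    changes it by at most 1), so Laplace noise of scale 1/epsilon
    masks any single individual's contribution."""
    true_count = sum(1 for v in values if predicate(v))
    u = rng.random() - 0.5                  # uniform on [-0.5, 0.5)
    scale = 1.0 / epsilon
    # Inverse-transform sample from Laplace(0, scale)
    noise = -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 38, 27, 45]
print(private_count(ages, lambda a: a < 40, epsilon=1.0, rng=rng))
```

Smaller epsilon means more noise and stronger privacy; the noisy answer is unbiased, so repeated queries average toward the true count (which is why real deployments also budget the number of queries).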
Q: What does adversarial machine learning study?
How to secure ML models against attacks like input perturbations.
Q: What famous example shows adversarial ML vulnerabilities?
A panda image, slightly perturbed, that a neural network confidently misclassifies (famously as a gibbon).
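The technique behind that example is the fast gradient sign method (FGSM); a toy version, where the random linear "model" and the epsilon stand in for a real network and are purely illustrative:

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=50)        # fixed "model" weights
x = rng.normal(size=50)        # a clean input with true label y = 1
y = 1.0

# Gradient of the logistic loss with respect to the *input*, not the weights
grad_x = (sigmoid(w @ x) - y) * w
eps = 0.25
x_adv = x + eps * np.sign(grad_x)   # small, worst-case perturbation

# The perturbed input always lowers the model's confidence in y = 1
print(sigmoid(w @ x), sigmoid(w @ x_adv))
```

Each coordinate moves only by eps, yet the shifts all align with the loss gradient, so their effect on the output compounds, which is why imperceptible pixel changes can flip a prediction.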
Q: What are three types of attacks on ML models?
Inversion, extraction, and data poisoning.
Q: What is an inversion attack?
Reconstructing sensitive input data from model outputs.
Q: What is an extraction attack?
Stealing model parameters or training data.
Q: What is data poisoning?
Maliciously injecting bad data into training to corrupt a model.
Q: What is federated learning?
Training models across decentralized devices without transferring raw data.
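A minimal numerical sketch of the idea, in the spirit of federated averaging, with three simulated clients fitting a shared linear model (the data, learning rate, and round count are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Raw data stays on each of three simulated devices
clients = []
for _ in range(3):
    X = rng.normal(size=(20, 2))
    y = X @ true_w + 0.1 * rng.normal(size=20)
    clients.append((X, y))

w = np.zeros(2)                          # shared global model
for _ in range(200):                     # communication rounds
    updates = []
    for X, y in clients:                 # local training on-device
        grad = 2 * X.T @ (X @ w - y) / len(y)
        updates.append(w - 0.1 * grad)   # one local gradient step
    w = np.mean(updates, axis=0)         # server averages the models

print(w)  # approaches true_w; no client ever shared X or y
```

Only model parameters cross the network, which is the privacy point of the approach, though attacks on shared updates are themselves an active research area.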