Data Anonymity
A process of protecting the privacy and confidentiality of individuals or entities whose data is being used or shared
Data Privacy Considerations in Handling Personal Data
Defining Private Data
Preprocessing Data
Minimizing Disclosure Risk
Internal Data Risks
Ethical Data Analysis
Privacy-Preserving Data Publishing (PPDP)
A set of techniques, methods, and practices used to publish or share data while ensuring the privacy and confidentiality of individuals or entities whose data is included in the dataset
Common Privacy Protecting Methods
Data Suppression
Data Grouping
Perturbation
Bias Mitigation
Preprocessing analyses that detect and remove bias that might be present in the data
Data Suppression
Removing entire data instances (rows) or specific variables (columns) from a dataset
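A minimal sketch of both kinds of suppression, using plain Python dictionaries. The records, the column names, and the age threshold of 89 are made-up assumptions for illustration, not part of the definition above.

```python
# Hypothetical toy records; names and values are illustrative only.
records = [
    {"name": "Alice", "zip": "10115", "age": 34, "diagnosis": "flu"},
    {"name": "Bob", "zip": "10117", "age": 36, "diagnosis": "asthma"},
    {"name": "Carol", "zip": "20095", "age": 91, "diagnosis": "flu"},
]


def suppress_columns(rows, columns):
    """Return copies of the rows without the given variables (columns)."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def suppress_rows(rows, predicate):
    """Return only the rows (instances) for which predicate(row) is False."""
    return [row for row in rows if not predicate(row)]


# Column suppression: drop the explicit identifier "name".
no_names = suppress_columns(records, {"name"})

# Row suppression: drop instances with a rare, easily recognized age (assumed cutoff).
suppressed = suppress_rows(no_names, lambda row: row["age"] > 89)
```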
Explicit Identifiers
Pieces of information in a dataset that directly and unequivocally identify individuals
Quasi-identifiers
Attributes that do not identify a person on their own but can form a unique combination, and can hence be used to re-identify (or de-anonymize) persons
Data Grouping
Generalizing individual information by clustering at the level of individual data points (instances) or of data categories (variables)
Values of continuous variables are grouped into categories
Values of nominal variables are grouped into higher-level concepts
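The two grouping rules above can be sketched as follows. The 10-year band width and the city-to-country lookup table are assumptions chosen for illustration only.

```python
def age_to_band(age, width=10):
    """Group a continuous value (age) into a category such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical hierarchy: city (nominal value) -> country (higher-level concept).
CITY_TO_COUNTRY = {"Berlin": "Germany", "Munich": "Germany", "Lyon": "France"}


def generalize_city(city):
    """Replace a city with its higher-level concept, or 'Other' if unknown."""
    return CITY_TO_COUNTRY.get(city, "Other")
```

For example, `age_to_band(34)` gives `"30-39"` and `generalize_city("Munich")` gives `"Germany"`.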
Perturbation
Adds noise to data while preserving overall statistics and patterns
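One possible way to perturb a numeric variable is additive Gaussian noise. In this sketch the noise is re-centered so that the mean of the column stays the same; the noise scale, the seed, and the re-centering step are assumptions of this example, not a prescribed method.

```python
import random


def perturb(values, scale=1.0, seed=0):
    """Add Gaussian noise to each value while keeping the overall mean unchanged."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, scale) for _ in values]
    # Subtract the average noise so the noise sums to (approximately) zero.
    offset = sum(noise) / len(noise)
    return [v + n - offset for v, n in zip(values, noise)]


salaries = [42000.0, 51000.0, 38500.0, 60250.0]  # hypothetical values
noisy = perturb(salaries, scale=500.0)
```

Individual values change, but the average of `noisy` matches the average of `salaries` up to floating-point error.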
Concern with applying multiple data protection methods
Applying too many data protection methods results in a completely randomized dataset or a dataset where all data instances have the same general value for all remaining variables
K-Anonymity
A property of a dataset where, for each combination of quasi-identifiers in the dataset, there are at least k-1 other instances with the same value combination
Goal of K-Anonymity
Protect the privacy of individuals by ensuring that no individual can be distinguished from at least k-1 other individuals based on their quasi-identifiers while also preserving the utility of the data for analysis purposes
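The property can be checked by counting how often each combination of quasi-identifier values occurs. This is a minimal sketch; the function name `is_k_anonymous` and the toy rows are hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


# Hypothetical, already generalized rows.
rows = [
    {"zip": "101**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "101**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "200**", "age": "40-49", "diagnosis": "flu"},
]

# The combination ("200**", "40-49") occurs only once, so the data is not 2-anonymous.
result = is_k_anonymous(rows, ["zip", "age"], k=2)
```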
Sensitive Attribute
Any attribute in a dataset that could potentially reveal sensitive or confidential information about individuals
Microaggregation
Grouping similar records and applying aggregation functions within these groups to create representative values
Microaggregation Process
Identify Quasi-Identifiers
Choose K for K-Anonymity
Sort the Data
Create Microgroups
Apply Aggregation Functions
Replace Original Data
Finalize the Dataset
Common Aggregation Functions
Mean (average) for numeric values
Median for dates and other ordered data
Mode for categorical data
Generalization for attributes like ZIP codes or age
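The microaggregation process can be sketched for a single numeric quasi-identifier as follows: sort the values, form microgroups of at least k records, and replace each value with its group mean. This is a simplified univariate illustration under those assumptions; letting the last group absorb leftover records is one possible choice, not the only one.

```python
def microaggregate(values, k):
    """Replace each value with the mean of its microgroup of at least k records."""
    if k < 1 or len(values) < k:
        raise ValueError("need at least k values and k >= 1")
    # Sort record indices by value so that similar records end up adjacent.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [None] * len(values)
    for g in range(n_groups):
        start = g * k
        # The last group absorbs any leftover records, so every group has >= k members.
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean  # written back in the original record order
    return result


ages = [21, 45, 23, 47, 22, 46, 60]  # hypothetical values
aggregated = microaggregate(ages, k=3)
# → [22.0, 49.5, 22.0, 49.5, 22.0, 49.5, 49.5]
```

Each output value is now shared by at least three records, so the aggregated column on its own satisfies 3-anonymity.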
Transforming a Dataset to a K-Anonymous Dataset
Automated k-anonymization
NP-hard (nondeterministic polynomial-time hard) problem for large datasets
Formalizing privacy guarantee
Issues with K-Anonymity
Homogeneity Attacks
Background Knowledge Attack
Linkage Attacks
Homogeneity Attacks
An attacker exploits the lack of diversity within a K-Anonymous group
When the sensitive values in a group are all (or nearly all) the same, the attacker can make educated guesses about an individual's sensitive information without having to single that individual out
Background Knowledge Attack
The attacker leverages external information or background knowledge to re-identify individuals within K-Anonymous groups
Linkage Attacks
An adversary with access to two datasets, with the sensitive variable present in both, links records across them to re-identify individuals
Solution to Homogeneity Attacks
Limit Dataset Size
What is the problem with limiting data size to solve homogeneity attacks?
Reducing the dataset's size may result in a loss of valuable information
What extension of K-Anonymity deals with Homogeneity Attacks?
L-diversity
L-diversity
Promotes diversity among sensitive values within each group of k (or more) data instances that are indistinguishable with respect to their quasi-identifiers
Anonymizes data and ensures that sensitive information within groups is sufficiently varied
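A minimal sketch of a check for the simplest variant, often called distinct l-diversity, where each group must contain at least l different sensitive values. The function name and the toy rows are hypothetical.

```python
from collections import defaultdict


def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every quasi-identifier group has at least l distinct sensitive values."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical 2-anonymous rows: the second group is homogeneous ("flu" only).
rows = [
    {"zip": "101**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "101**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "200**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "200**", "age": "40-49", "diagnosis": "flu"},
]

# Anyone known to be in the ("200**", "40-49") group must have "flu",
# so the dataset is 2-anonymous but not 2-diverse.
diverse = is_l_diverse(rows, ["zip", "age"], "diagnosis", l=2)
```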