data200 midterm 2 - data anonymity

26 Terms


Data Anonymity

A process of protecting the privacy and confidentiality of individuals or entities whose data is being used or shared


Data Privacy Considerations in Handling Personal Data

Defining Private Data

Preprocessing Data

Minimizing Disclosure Risk

Internal Data Risks

Ethical Data Analysis


Privacy-Preserving Data Publishing (PPDP)

A set of techniques, methods, and practices used to publish or share data while ensuring the privacy and confidentiality of individuals or entities whose data is included in the dataset


Common Privacy Protecting Methods

Data Suppression

Data Grouping

Perturbation


Bias Mitigation

Preprocessing analyses that detect and remove bias that might be present in the data


Data Suppression

Removing entire data instances (rows) or specific variables (columns) from a dataset
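
A minimal sketch of both kinds of suppression, using only the Python standard library. The records, field names, and the age threshold are hypothetical and chosen purely for illustration.

```python
# Hypothetical toy records; the field names are illustrative only.
records = [
    {"name": "Ana", "zip": "94110", "age": 34, "diagnosis": "flu"},
    {"name": "Ben", "zip": "94110", "age": 36, "diagnosis": "asthma"},
    {"name": "Cy", "zip": "10001", "age": 91, "diagnosis": "flu"},
]


def suppress_columns(rows, columns):
    """Return copies of the rows without the given variables (columns)."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def suppress_rows(rows, predicate):
    """Return only the rows for which predicate(row) is False."""
    return [row for row in rows if not predicate(row)]


# Drop the explicit identifier column, then drop rare (easily recognized) instances.
without_names = suppress_columns(records, {"name"})
suppressed = suppress_rows(without_names, lambda row: row["age"] > 89)
```

Here the `name` column is removed for every instance, and the single very old individual is removed as a whole row because that age value alone would make the record stand out.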


Explicit Identifiers

Pieces of information in a dataset that directly and unequivocally identify individuals


Quasi-identifiers

Attributes that do not identify a person on their own but whose combination can be unique, and which can hence be used to re-identify (or de-anonymize) persons


Data Grouping

Generalizing individual information by clustering at the level of individual data points (instances) or of data categories (variables)

Values of continuous variables are grouped into categories

Values of nominal variables are grouped into higher-level concepts
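
A small sketch of both grouping rules. The bin width, the city-to-region mapping, and the function names are assumptions made for this example, not part of the course material.

```python
def age_to_range(age, width=10):
    """Group a continuous value (age) into a fixed-width category."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical hierarchy mapping a nominal value to a higher-level concept.
CITY_TO_REGION = {"Oakland": "West", "Seattle": "West", "Boston": "East"}


def city_to_region(city):
    """Replace a city with its region; unknown cities fall back to 'Other'."""
    return CITY_TO_REGION.get(city, "Other")


grouped = {"age": age_to_range(34), "region": city_to_region("Oakland")}
# grouped == {"age": "30-39", "region": "West"}
```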


Perturbation

Adds noise to data while preserving overall statistics and patterns
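
One possible way to perturb a numeric column, sketched with the standard library. Gaussian noise and the re-centering step are assumptions of this example; the card does not prescribe a particular noise distribution.

```python
import random
from statistics import fmean


def perturb(values, scale, seed=0):
    """Add zero-centered Gaussian noise to each value.

    The noise is re-centered so that the mean of the column is preserved
    (up to floating-point error) even though individual values change.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0, scale) for _ in values]
    mean_noise = fmean(noise)
    return [v + n - mean_noise for v, n in zip(values, noise)]


salaries = [52000.0, 61000.0, 58000.0, 75000.0]  # hypothetical data
noisy = perturb(salaries, scale=1000.0)
```

A larger `scale` gives stronger protection for individual values but distorts statistics other than the mean (for example the variance) more heavily.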


Concern with applying multiple data protection methods

Applying too many data protection methods results in a completely randomized dataset or a dataset where all data instances have the same general value for all remaining variables


K-Anonymity

A property of a dataset where, for each combination of quasi-identifier values in the dataset, there are at least k-1 other instances with the same value combination
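
The definition can be checked mechanically by counting how often each quasi-identifier combination occurs. This is a sketch; the function name `is_k_anonymous` and the sample rows are hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


# Hypothetical, already generalized records.
rows = [
    {"zip": "941**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "941**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "100**", "age": "60-69", "diagnosis": "flu"},
    {"zip": "100**", "age": "60-69", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(rows, ["zip", "age"], k=2)    # True
three_anonymous = is_k_anonymous(rows, ["zip", "age"], k=3)  # False
```

Each combination appears exactly twice, so every instance has one (k-1 = 1) other instance with the same values, which satisfies k = 2 but not k = 3.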


Goal of K-Anonymity

Protect the privacy of individuals by ensuring that no individual can be distinguished from at least k-1 other individuals based on their quasi-identifiers, while also preserving the utility of the data for analysis purposes


Sensitive Attribute

Any attribute in a dataset that could potentially reveal sensitive or confidential information about individuals


Microaggregation

Grouping similar records and applying aggregation functions within these groups to create representative values


Microaggregation Process

Identify Quasi-Identifiers

Choose K for K-Anonymity

Sort the Data

Create Microgroups

Apply Aggregation Functions

Replace Original Data

Finalize the Dataset
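
The steps above can be sketched for a single numeric quasi-identifier. This is a simplified, univariate illustration: sorting on one column, fixed-size groups, and the mean as the aggregation function are assumptions of the example, not the only way to perform microaggregation.

```python
from statistics import fmean


def microaggregate(values, k):
    """Replace each value with the mean of its microgroup of at least k values."""
    if k < 1 or len(values) < k:
        raise ValueError("need at least k values and k >= 1")
    # Sort the data (by index, so results can be written back in place).
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Create microgroups of k neighbouring values.
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # A too-small final group is merged into the previous one.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    # Apply the aggregation function and replace the original data.
    result = [None] * len(values)
    for group in groups:
        representative = fmean(values[i] for i in group)
        for i in group:
            result[i] = representative
    return result


ages = [23, 52, 31, 25, 35]  # hypothetical data
aggregated = microaggregate(ages, k=2)
```

With `k=2` the sorted ages form the groups (23, 25) and (31, 35, 52), so every released value is shared by at least two instances.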


Common Aggregation Functions

Mean (average) for numeric values

Median for dates and other ordered data

Mode for categorical data

Generalization for attributes like ZIP codes or age
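
A brief illustration of these functions applied to one microgroup, using Python's `statistics` module. The data and the `generalize_zip` helper are hypothetical; `median_low` is used for dates because it always returns an actual member of the group and so never needs to average two dates.

```python
from datetime import date
from statistics import fmean, median_low, mode

# One hypothetical microgroup.
salaries = [50000, 54000, 61000]
visits = [date(2024, 1, 5), date(2024, 3, 1), date(2024, 2, 10), date(2024, 4, 2)]
departments = ["HR", "IT", "IT"]


def generalize_zip(zip_code, keep=3):
    """Keep the first `keep` characters of a ZIP code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


representative = {
    "salary": fmean(salaries),      # mean for numeric values
    "visit": median_low(visits),    # median for dates / ordered data
    "department": mode(departments),  # mode for categorical data
    "zip": generalize_zip("94110"),   # generalization for ZIP codes
}
```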


Transforming a Dataset to a K-Anonymous Dataset

Automated k-anonymization

Finding an optimal k-anonymization is an NP-hard (nondeterministic polynomial-time hard) problem, which makes it computationally expensive for large datasets

Formalizing privacy guarantee


Issues with K-Anonymity

Homogeneity Attacks

Background Knowledge Attack

Linkage Attacks


Homogeneity Attacks

An attacker exploits the lack of diversity within a K-Anonymous group

When the sensitive values in a group are all (or nearly all) the same, the attacker can make educated guesses about the sensitive information of individuals in that group without singling anyone out
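
A sketch of how such a weakness could be detected: list every quasi-identifier group whose members all share one sensitive value. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def homogeneous_groups(rows, quasi_identifiers, sensitive):
    """Map each quasi-identifier group with a single sensitive value to that value."""
    values = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        values[key].add(row[sensitive])
    return {key: next(iter(vals)) for key, vals in values.items() if len(vals) == 1}


# Hypothetical 2-anonymous table: the first group is homogeneous.
rows = [
    {"zip": "941**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "941**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "100**", "age": "60-69", "diagnosis": "flu"},
    {"zip": "100**", "age": "60-69", "diagnosis": "diabetes"},
]

leaks = homogeneous_groups(rows, ["zip", "age"], "diagnosis")
# leaks == {("941**", "30-39"): "flu"}
```

Anyone known to be in the `941**` / `30-39` group can be inferred to have the flu, even though the table is 2-anonymous.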


Background Knowledge Attack

The attacker leverages external information or background knowledge to re-identify individuals within K-Anonymous groups


Linkage Attacks

Occur when an adversary has access to two datasets, with the sensitive variable present in both, and links records across them to re-identify individuals
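
A minimal sketch of one common form of linkage, under the assumption that the two datasets share some columns (here `zip` and `birth_year`) that can be used as a join key. The datasets, column names, and the `link` function are hypothetical.

```python
def link(released, public, shared):
    """Join released rows to public rows when the shared columns match uniquely."""
    index = {}
    for row in public:
        index.setdefault(tuple(row[c] for c in shared), []).append(row)
    matches = []
    for row in released:
        candidates = index.get(tuple(row[c] for c in shared), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append({**candidates[0], **row})
    return matches


# Hypothetical "de-identified" release and a public register.
released = [
    {"zip": "94110", "birth_year": 1990, "diagnosis": "asthma"},
    {"zip": "10001", "birth_year": 1985, "diagnosis": "flu"},
]
public = [
    {"name": "Ana", "zip": "94110", "birth_year": 1990},
    {"name": "Ben", "zip": "10001", "birth_year": 1985},
    {"name": "Cy", "zip": "10001", "birth_year": 1985},
]

reidentified = link(released, public, ["zip", "birth_year"])
```

Only the first released record matches exactly one person, so it is re-identified; the second matches two people and stays ambiguous.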


Solution to Homogeneity Attacks

Limit Dataset Size


What is the problem with limiting data size to solve homogeneity attacks?

Reducing the dataset's size may result in a loss of valuable information


What extension of K-Anonymity deals with Homogeneity Attacks?

L-diversity


L-diversity

Promotes diversity among sensitive values within each group of k (or more) data instances that are indistinguishable with respect to their quasi-identifiers

Anonymizes data and ensures that sensitive information within groups is sufficiently varied
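
A sketch of a check for the simplest reading of this property, often called distinct l-diversity: every quasi-identifier group must contain at least l different sensitive values. The function name and the sample rows are hypothetical, and stricter variants of l-diversity exist that this check does not cover.

```python
from collections import defaultdict


def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every quasi-identifier group has at least l distinct sensitive values."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical 2-anonymous table.
rows = [
    {"zip": "941**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "941**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "100**", "age": "60-69", "diagnosis": "flu"},
    {"zip": "100**", "age": "60-69", "diagnosis": "flu"},
]

diverse = is_l_diverse(rows, ["zip", "age"], "diagnosis", l=2)  # False
```

The second group contains only `flu`, so the table is 2-anonymous but not 2-diverse, which is exactly the situation a homogeneity attack exploits.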