data200 midterm 2 - data anonymity

26 Terms

1

Data Anonymity

A process of protecting the privacy and confidentiality of individuals or entities whose data is being used or shared

2

Data Privacy Considerations in Handling Personal Data

Defining Private Data

Preprocessing Data

Minimizing Disclosure Risk

Internal Data Risks

Ethical Data Analysis

3

Privacy-Preserving Data Publishing (PPDP)

A set of techniques, methods, and practices used to publish or share data while ensuring the privacy and confidentiality of individuals or entities whose data is included in the dataset

4

Common Privacy Protecting Methods

Data Suppression

Data Grouping

Perturbation

5

Bias Mitigation

Preprocessing analyses that detect and remove bias that might be present in the data

6

Data Suppression

Removing entire data instances (rows) or specific variables (columns) from a dataset
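
As an illustration of both kinds of suppression, here is a minimal sketch in plain Python. The records, the column names, and the helper functions `suppress_columns` and `suppress_rows` are hypothetical and chosen only for this example.

```python
# Hypothetical toy records; the column names are assumptions for illustration.
records = [
    {"name": "Alice", "age": 34, "zip": "47401", "diagnosis": "flu"},
    {"name": "Bob", "age": 36, "zip": "47405", "diagnosis": "asthma"},
    {"name": "Carol", "age": 91, "zip": "47408", "diagnosis": "flu"},
]

def suppress_columns(rows, columns):
    """Drop the given variables (columns) from every record."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]

def suppress_rows(rows, predicate):
    """Drop every data instance (row) for which predicate(row) is true."""
    return [row for row in rows if not predicate(row)]

# Column suppression: remove the explicit identifier.
no_names = suppress_columns(records, {"name"})
# Row suppression: remove an outlier record that would be easy to single out.
suppressed = suppress_rows(no_names, lambda r: r["age"] > 89)
print(suppressed)
```

The age threshold of 89 is an arbitrary choice for the sketch, not a rule taken from the cards.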

7

Explicit Identifiers

Pieces of information in a dataset that directly and unequivocally identify individuals

8

Quasi-identifiers

Attributes that do not identify a person on their own but can be unique in combination, and can hence be used to re-identify (or de-anonymize) persons

9

Data Grouping

Generalizing individual information by clustering at the level of individual data points (instances) or data categories (variables)

Values of continuous variables are grouped into categories

Values of nominal variables are grouped into higher-level concepts
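
The two kinds of grouping can be sketched as follows. The bin width, the occupation hierarchy, and the function names `group_age` and `group_occupation` are assumptions made for illustration.

```python
def group_age(age, width=10):
    """Map a continuous age onto a fixed-width range label."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Hypothetical hierarchy: specific occupations mapped to higher-level concepts.
OCCUPATION_GROUPS = {
    "nurse": "healthcare",
    "surgeon": "healthcare",
    "teacher": "education",
    "professor": "education",
}

def group_occupation(occupation):
    """Map a nominal value onto its higher-level concept, or 'other' if unknown."""
    return OCCUPATION_GROUPS.get(occupation, "other")

print(group_age(34))                # 30-39
print(group_occupation("surgeon"))  # healthcare
```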

10

Perturbation

Adds noise to data while preserving overall statistics and patterns
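
One possible way to perturb a numeric column is sketched below. It assumes zero-mean Gaussian noise followed by a re-centering step so that the mean is preserved; both the noise model and the helper `perturb` are illustrative choices, not a method prescribed by the cards.

```python
import random
import statistics

def perturb(values, scale, seed=0):
    """Add zero-mean Gaussian noise, then re-center so the mean is preserved."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    noisy = [v + rng.gauss(0, scale) for v in values]
    shift = statistics.fmean(values) - statistics.fmean(noisy)
    return [v + shift for v in noisy]

# Hypothetical salaries, for illustration only.
salaries = [42000, 51000, 58000, 63000, 75000]
noisy_salaries = perturb(salaries, scale=2000)

print(statistics.fmean(salaries), statistics.fmean(noisy_salaries))
```

Individual values change, while the mean stays the same up to floating-point error. Other statistics, such as the variance, are only approximately preserved under this scheme.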

11

Concern with applying multiple data protection methods

Applying too many data protection methods results in a completely randomized dataset or a dataset where all data instances have the same general value for all remaining variables

12

K-Anonymity

A property of a dataset where, for each combination of quasi-identifier values in the dataset, there are at least k-1 other instances with the same value combination
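
The definition can be checked mechanically by counting how often each quasi-identifier combination occurs. The sketch below assumes records stored as dictionaries; the function name `is_k_anonymous` and the toy table are hypothetical.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(c >= k for c in counts.values())

# Hypothetical, already generalized table.
table = [
    {"age": "30-39", "zip": "474**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "474**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "474**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "474**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(table, ["age", "zip"], k=2))  # True
print(is_k_anonymous(table, ["age", "zip"], k=3))  # False
```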

13

Goal of K-Anonymity

Protect the privacy of individuals by ensuring that no individual can be distinguished from at least k-1 other individuals based on their quasi-identifiers, while also preserving the utility of the data for analysis purposes

14

Sensitive Attribute

Any attribute in a dataset that could potentially reveal sensitive or confidential information about individuals

15

Microaggregation

Grouping similar records and applying aggregation functions within these groups to create representative values

16

Microaggregation Process

Identify Quasi-Identifiers

Choose K for K-Anonymity

Sort the Data

Create Microgroups

Apply Aggregation Functions

Replace Original Data

Finalize the Dataset

17

Common Aggregation Functions

Mean (average) for numeric values

Median for dates and other ordered data

Mode for categorical data

Generalization for attributes like ZIP codes or age
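
The microaggregation steps can be sketched for a single numeric quasi-identifier, using the mean as the aggregation function. The function `microaggregate`, the rule of merging a too-small final group into the previous one, and the sample ages are assumptions for illustration.

```python
import statistics

def microaggregate(values, k):
    """Replace each value with the mean of its microgroup of at least k sorted values."""
    if len(values) < k:
        raise ValueError("need at least k values")
    # Sort the data (keep indices so results can be written back in place).
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Create microgroups of k consecutive records.
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # A leftover group smaller than k is merged into the previous group.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    # Apply the aggregation function and replace the original data.
    result = [None] * len(values)
    for group in groups:
        mean = statistics.fmean([values[i] for i in group])
        for i in group:
            result[i] = mean
    return result

ages = [23, 45, 25, 47, 24, 46, 60]
print(microaggregate(ages, k=3))
# [24.0, 49.5, 24.0, 49.5, 24.0, 49.5, 49.5]
```

Every output value is shared by at least k records, so this column on its own satisfies k-anonymity for k=3.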

18

Transforming a Dataset to a K-Anonymous Dataset

Automated k-anonymization

NP-hard (nondeterministic polynomial-time hard) problem for large datasets

Formalizing privacy guarantee

19

Issues with K-Anonymity

Homogeneity Attacks

Background Knowledge Attack

Linkage Attacks

20

Homogeneity Attacks

An attacker exploits the lack of diversity within a K-Anonymous group

When all (or most) members of a group share the same sensitive value, the attacker can make educated guesses about the sensitive information of individuals in that group without singling anyone out

21

Background Knowledge Attack

The attacker leverages external information or background knowledge to re-identify individuals within K-Anonymous groups

22

Linkage Attacks

Occur when an adversary has access to two datasets, with the sensitive variable present in both, and links records across them to re-identify individuals
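
A minimal sketch of how linking works is shown below. It assumes the two datasets share some attributes (here age and ZIP code) that the adversary can join on; the records, the column names, and the function `link` are hypothetical.

```python
# Hypothetical released table with names removed.
released = [
    {"age": 34, "zip": "47401", "diagnosis": "flu"},
    {"age": 52, "zip": "47405", "diagnosis": "diabetes"},
]
# Hypothetical second dataset available to the adversary.
public = [
    {"name": "Alice", "age": 34, "zip": "47401"},
    {"name": "Bob", "age": 52, "zip": "47405"},
    {"name": "Carol", "age": 52, "zip": "47408"},
]

def link(released_rows, public_rows, shared):
    """Return (name, diagnosis) pairs where exactly one public record matches."""
    matches = []
    for r in released_rows:
        key = tuple(r[c] for c in shared)
        hits = [p for p in public_rows if tuple(p[c] for c in shared) == key]
        if len(hits) == 1:  # a unique match re-identifies the record
            matches.append((hits[0]["name"], r["diagnosis"]))
    return matches

print(link(released, public, ["age", "zip"]))
# [('Alice', 'flu'), ('Bob', 'diabetes')]
```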

23

Solution to Homogeneity Attacks

Limit Dataset Size

24

What is the problem with limiting data size to solve homogeneity attacks?

Reducing the dataset's size may result in a loss of valuable information

25

What extension of K-Anonymity deals with Homogeneity Attacks?

L-diversity

26

L-diversity

Promotes diversity among sensitive values within each group of k (or more) data instances that are indistinguishable with respect to their quasi-identifiers

Anonymizes data and ensures that sensitive information within groups is sufficiently varied
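
As a rough illustration, the sketch below checks the simplest variant of this idea, often called distinct l-diversity: every group must contain at least l different sensitive values. The function `is_l_diverse` and the toy table are assumptions for this example.

```python
from collections import defaultdict

def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every quasi-identifier group holds at least l distinct sensitive values."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return all(len(values) >= l for values in groups.values())

# Hypothetical 2-anonymous table; the first group is homogeneous (both "flu").
table = [
    {"age": "30-39", "zip": "474**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "474**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "474**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "474**", "diagnosis": "diabetes"},
]

print(is_l_diverse(table, ["age", "zip"], "diagnosis", l=2))  # False
print(is_l_diverse(table, ["age", "zip"], "diagnosis", l=1))  # True
```

The table is 2-anonymous yet fails the check for l=2, which is exactly the situation a homogeneity attack exploits.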