Privacy Review

Description and Tags

CSC 533 Privacy Final Exam Review

62 Terms

1
New cards

Privacy Attacks on Data

Attribute Inference

Property Inference

2
New cards

Kinds of Privacy Attacks on Model

Membership Inference
Model Extraction

3
New cards

Attribute Inference Attack/Model Inversion Attack

Given an output of a machine learning model, infer something about the input

Example: Given a patient's dosage of a drug, infer something about their genome

4
New cards

Property Inference Attack

The ability to extract dataset properties that were not explicitly encoded as features and are not correlated to the learning task

Information that the model learned unintentionally

Example: A classifier identifying gender can also be used to infer information about whether someone wears glasses

5
New cards

Meta Classifier

Predicts if the target model was trained on a dataset that has property P or not

6
New cards

Model Extraction Attack

Learning a close approximation of the model using as few queries as possible

Example: A logistic regression model has n+1 unknown parameters, so it can be queried n+1 times and the parameters recovered by solving a system of linear equations
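
The equation-solving attack in the example above can be sketched as follows; the secret model parameters and the choice of query points are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical target model with secret parameters (n = 3 features)
w_secret = np.array([0.5, -1.2, 2.0])
b_secret = 0.3

def query(x):
    """Black-box API: returns the model's floating-point confidence."""
    return sigmoid(w_secret @ x + b_secret)

# Attack: logit(f(x)) = w.x + b is linear in the n+1 unknowns (w, b),
# so n+1 well-chosen queries give a solvable linear system.
n = 3
X = np.vstack([np.eye(n), np.zeros(n)])           # n+1 query points
A = np.hstack([X, np.ones((n + 1, 1))])           # [x | 1] design matrix
y = np.array([np.log(query(x) / (1 - query(x))) for x in X])  # logits
params = np.linalg.solve(A, y)
w_stolen, b_stolen = params[:n], params[n]
```

This also illustrates why returning only the class decision (the countermeasure below) matters: the attack needs the floating-point confidence to recover the logits.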

7
New cards

How many unknowns in a logistic regression model

n+1
W is a vector of size n
b is a scalar

8
New cards

Model Extraction Countermeasure

Only return the class decision, not the floating-point confidence score

9
New cards

Model Extraction Countermeasure Attack

Query until you find two points on opposite sides of the decision boundary, then search along the line between them to locate points on the boundary

10
New cards

Membership Inference

Determining whether a given record was part of the training set, which is possible because cloud models are often overfit to their training data.

Target model is queried as a black box
Shadow models mimic the target model's inputs and outputs
Attack model takes the shadow model's classification (confidence) distribution as input and returns a binary member/non-member prediction
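
A heavily simplified sketch of the idea, using a confidence-threshold attack in place of a learned shadow/attack model; the confidence distributions are made up to stand in for an overfit target:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simplified stand-in for an overfit model: it returns very high
# confidence on its (hypothetical) training members, lower elsewhere.
train_conf = rng.uniform(0.9, 1.0, size=200)    # confidences on members
test_conf  = rng.uniform(0.4, 0.95, size=200)   # confidences on non-members

# A threshold attack (in practice this decision is learned from shadow
# models): predict "member" when confidence is suspiciously high.
threshold = 0.9
pred_member_train = train_conf >= threshold
pred_member_test  = test_conf  >= threshold

tpr = pred_member_train.mean()   # fraction of members correctly flagged
fpr = pred_member_test.mean()    # fraction of non-members wrongly flagged
```

The attack works exactly to the extent the model behaves differently on training data, i.e., to the extent it is overfit.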

11
New cards

Federated Learning

Server has an untrained model

Sends a copy of that model to the local nodes

Nodes train on their own data

Each node sends the trained model back to the server

The server combines them by taking an average

The server now has a general model
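
The steps above can be sketched as a toy federated-averaging loop; the linear model, per-node data, round count, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
nodes = []
for _ in range(4):                 # each node has its own private data
    X = rng.normal(size=(20, 3))
    nodes.append((X, X @ true_w))

def local_train(w, data, lr=0.1):
    """One local gradient step on the node's own data."""
    X, y = data
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def avg_loss(w):
    return np.mean([np.mean((X @ w - y) ** 2) for X, y in nodes])

global_w = np.zeros(3)             # server's untrained model
for _ in range(200):               # communication rounds
    local_models = [local_train(global_w, d) for d in nodes]
    global_w = np.mean(local_models, axis=0)   # server averages them
```

The raw data never leaves the nodes; only model parameters travel to the server.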

12
New cards

Federated Learning Benefits

Every user has different data

Some users have more data than others

Distributed between many nodes

Limited Communication between nodes

13
New cards

SGD

Stochastic Gradient Descent

Updates the weights in proportion to the error, using one randomly chosen sample (or mini-batch) at a time
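
A minimal sketch of the update rule on a made-up one-parameter regression problem (fit y = w*x, true w = 3.0):

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(size=100)
ys = 3.0 * xs                      # true weight is 3.0

w, lr = 0.0, 0.05
for _ in range(1000):
    i = rng.integers(len(xs))      # pick one sample (the "stochastic" part)
    error = w * xs[i] - ys[i]
    w -= lr * error * xs[i]        # update proportional to the error
```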

14
New cards

DSSGD

Distributed Selective Stochastic Gradient Descent

Each node locally trains the model and computes weights

Selects gradients to upload, but not necessarily all of them

Server averages uploaded weights and updates parameters for the next iteration

15
New cards

DSSGD Privacy Properties

Participants’ Data remains private

Full control over parameter selection

Known learning objective

Resulting model available to all parties

16
New cards

Secure Aggregation

Server aggregates users' updates without inspecting any individual update

17
New cards

Secure Aggregation Noise

Random noise of positive and negative pairs that cancel each other out and don’t influence the model
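
A sketch of how pairwise masks cancel in the aggregate; the update vectors and mask generation are illustrative (real protocols derive the shared masks from key agreement between users):

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, dim = 4, 3
updates = [rng.normal(size=dim) for _ in range(n_users)]

# Each pair (i, j) with i < j shares a random mask r_ij;
# user i adds +r_ij, user j adds -r_ij, so the pair cancels in the sum.
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_users) for j in range(i + 1, n_users)}

masked = []
for u in range(n_users):
    m = updates[u].copy()
    for (i, j), r in masks.items():
        if u == i:
            m += r
        elif u == j:
            m -= r
    masked.append(m)
```

The server sees only the masked updates, yet their sum equals the sum of the true updates, so the aggregate model is unaffected.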

18
New cards

Differentially Private Aggregation

Each node adds its own noise to its update before sending it back to the server. Some utility is lost because the noise impacts the model.
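
A sketch of that local noising with made-up updates and an arbitrary noise scale: each individual update is perturbed, while the average across many users stays close to the true average:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, dim, sigma = 100, 3, 0.5
updates = [rng.normal(loc=1.0, size=dim) for _ in range(n_users)]

# Each node perturbs its own update with Gaussian noise before upload;
# the server never sees a clean individual update.
noisy = [u + rng.normal(scale=sigma, size=dim) for u in updates]

true_avg  = np.mean(updates, axis=0)
noisy_avg = np.mean(noisy, axis=0)   # close to true_avg; noise averages out
```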

19
New cards

Fairness through blindness

Ignore all irrelevant/protected attributes

Issue: You don’t need to see an attribute to be able to predict it

20
New cards

Statistical Parity

S: Protected Subset

Sc: Rest of population

Want Pr(Outcome | S) = Pr(Outcome | Sc)
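
A toy check of the condition above, on hypothetical outcomes and group labels:

```python
import numpy as np

# Hypothetical decisions and protected-group membership (1 = member of S)
outcomes  = np.array([1, 0, 1, 1, 0, 1, 0, 1])
protected = np.array([1, 1, 1, 1, 0, 0, 0, 0])

p_s  = outcomes[protected == 1].mean()   # Pr(Outcome | S)
p_sc = outcomes[protected == 0].mean()   # Pr(Outcome | Sc)
parity_gap = abs(p_s - p_sc)             # 0 means perfect statistical parity
```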

21
New cards

Quantitative Input Influence (QII)

A technique for measuring the influence of an input of a system on its outputs

Replaces feature with random values from population and examines distribution over outcomes
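
A sketch of that intervention on a made-up two-feature model: replace one feature with values drawn from the population (here, a permutation of the population column) and measure how often the outcome changes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scoring model: feature 0 matters a lot, feature 1 barely.
def model(X):
    return (2.0 * X[:, 0] + 0.1 * X[:, 1] > 1.0).astype(int)

population = rng.normal(size=(1000, 2))
baseline = model(population)

def qii(feature):
    """Randomize one feature from the population marginal and measure
    how often the model's outcome changes."""
    X = population.copy()
    X[:, feature] = rng.permutation(population[:, feature])
    return np.mean(model(X) != baseline)

influence = [qii(0), qii(1)]   # feature 0 should have far more influence
```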

22
New cards

k-Eidetic Memorization

If a string is extractable from the model and appears in at most k samples from the training data

Acceptable for large k when the string is common text (e.g., ordinary words)

Bad when the string is an address or a name and k is small

23
New cards

How to Mitigate Privacy Leakage in LLMs

Train with differential privacy so that any single entry or row doesn't result in a significantly different model; the model then can't memorize any single training sample

Curate training data from trusted sources

Deduplicate training data

24
New cards

HIPAA

Health Insurance Portability & Accountability Act

Establishment of nationwide protection for patient confidentiality

Fines range from $100 to $50,000 per incident

25
New cards

FERPA

Family Educational Rights and Privacy Act

Gives rights to students enrolled at an educational institution to inspect, review, and amend their records and control disclosure.

26
New cards

COPPA

Children's Online Privacy Protection Act

Grants parents control over the information collected from children online and designed with consumer protection in mind

27
New cards

GDPR

General Data Protection Regulation (EU)

Providing uniform data protection regulations and is one of the highest standards of privacy and data protection in the world

28
New cards

CCPA

California Consumer Privacy Act

Applies to for-profit entities that have $25M+ in annual revenue, hold PII on 50K+ consumers, or mainly sell consumer data.

Provides rights to request information and deletion of their data

29
New cards

OECD

Organization for Economic Co-operation and Development

Framework that dictates how data should be collected, limited, safeguarded, and made transparent

Most commonly used privacy framework

30
New cards

APEC

Asia-Pacific Economic Cooperation

Similar to OECD but mainly for Asia-Pacific Region

31
New cards

NIST

National Institute of Standards and Technology

Identify, Govern, Control, Communicate, and Protect

32
New cards

IAPP

International Association of Privacy Professionals

Proactive not Reactive and privacy as the default

Privacy all the way through

33
New cards

Privacy Nutrition Label

Emulate nutrition label with what data, what purpose, and who is it being shared with

34
New cards

Privacy Rating Labels

Rate websites on scales inspired by energy labels, which show a rating compared to specific alternatives

35
New cards

Privacy Notice Timing

At setup

Just in time

Context-dependent (checkup)

Periodic (do you want to continue to allow this)

Persistent (an icon shown for the lifetime of the data practice)

On Demand (opting out through settings)

36
New cards

Privacy Notice Channel

Primary (the service itself presents the policy)

Secondary (another channel delivers it, e.g., an email)

Public (a sign or public notice)

37
New cards

Privacy Notice Modality

Visual

Auditory

Haptic

Machine Readable

38
New cards

Privacy Notice Control

Blocking (blocked by default)

Non blocking (allowed by default)

Decoupled (relies on a third party setting)

39
New cards

Platform for Privacy Preferences Project (P3P)

An easy way for websites to communicate about their privacy policies in a standard machine-readable format

40
New cards

Labelling Privacy Practices (Food Label)

Shows the types of data collected, general data collection practices and the sharing practices.

Each policy received an evaluation of YES, NO, or UNCLEAR

41
New cards

Terms of Service Didn’t Read

ToS;DR

Terms are divided into small points

Each point gets assigned one or several topics

Topics are then scored

42
New cards

Privee Privacy Extension

Policy Rating Extension

Uses NLP techniques to find the presence or absence of topics

43
New cards

Automated Policy Analysis

Extract websites data practices through natural language processing and machine learning

44
New cards

Privacy Policy Annotation Tool

Policy is segmented into paragraphs

Paragraphs are then categorized

The tool then asks questions about whether each paragraph describes a given practice or not

45
New cards

Westin’s Privacy Index Survey

Asks 3 questions which people either agree or disagree with

1) Consumers have lost all control over how personal information is collected and used

2) Most businesses handle the personal information they collect in a proper and confidential way

3) Existing laws and organizational practices provide a reasonable level of protection for consumer privacy

46
New cards

Westin’s Privacy Segmentation

Fundamentalist: Consumers lost control, most businesses don’t care about consumers, and existing laws are not enough

Unconcerned: Consumers haven’t lost control, businesses care, and existing laws are enough

Pragmatist: Anyone else

47
New cards

Factors that increase privacy concerns

Data aggregation

Data distortion

Data sharing

Data breaches

48
New cards

Factors that reduce privacy concerns

Privacy policy, License agreements

Privacy Laws

Anonymizing all data

Technical Details

Details on usage

49
New cards

Distributed Ledger

A book of all transactions, where the copy with the most pages is deemed the accurate one

50
New cards

Blockchain

A linked list with hash pointers: literally a chain of blocks, each of which references the hash of the previous block to prove work on the chain
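
A minimal hash-chain sketch showing how each block's hash pointer commits to the previous block (the block contents are made up; real blockchains also include proof-of-work):

```python
import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 hash of a block's contents."""
    return hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()

# Build a tiny chain: each block stores the hash of its predecessor.
chain = [{"index": 0, "data": "genesis", "prev_hash": "0" * 64}]
for i, data in enumerate(["tx: alice->bob", "tx: bob->carol"], start=1):
    chain.append({"index": i, "data": data,
                  "prev_hash": block_hash(chain[-1])})

def is_valid(chain):
    """Every block's prev_hash must match its predecessor's actual hash."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

assert is_valid(chain)
chain[1]["data"] = "tx: alice->mallory"   # tamper with a middle block
assert not is_valid(chain)                # every later pointer now breaks
```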

51
New cards

Deanonymization Attack

A transaction graph plus some side-channel information can be used to link pseudonyms to real identities in blockchain transactions

52
New cards

Multiple Input Transactions (Conjoining)

Having multiple senders and receivers in a single transaction so that nobody knows exactly which payment went where

53
New cards

Zcash

Uses zero-knowledge proofs, but has issues with requiring a trusted setup

54
New cards

Remote Device Identification

Works by looking at the clocks on a machine and measuring the differences (skew) between them.

Can identify machines even after they change location or ISP

55
New cards

Website Fingerprinting

A traffic-analysis attack that identifies which website a user visits over an encrypted connection (e.g., Tor) by matching patterns in the traffic, such as packet sizes, ordering, and timing

56
New cards

kNN Fingerprinting

Uses kNN over packet-level features, tuning the feature weights in the distance calculation, to determine which website a user visited. Achieves 90-95% accuracy

57
New cards

CUMUL Fingerprinting

A website fingerprinting attack that uses the cumulative packet size of a data flow to identify the content of encrypted web traffic

90-93% accurate
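
A sketch of the cumulative-size feature on a made-up packet trace (signed sizes: + for outgoing, - for incoming), interpolated to a fixed length so traces of different lengths can be compared, in the spirit of CUMUL:

```python
import numpy as np

# Hypothetical packet trace: signed packet sizes in bytes.
trace = np.array([+565, -1500, -1500, +120, -1500, -740])

def cumul_features(trace, n_points=10):
    """Cumulative sum of packet sizes, resampled to n_points values."""
    csum = np.cumsum(trace)
    xs = np.linspace(0, len(csum) - 1, n_points)
    return np.interp(xs, np.arange(len(csum)), csum)

features = cumul_features(trace)   # fixed-length fingerprint vector
```

A classifier would then compare these fixed-length vectors across known sites to label the encrypted flow.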

58
New cards

K fingerprinting

Uses a random forest to build a fingerprint of a user's traffic, even when encrypted.

Then performs kNN on that fingerprint against known fingerprints

90% accurate on onion services

59
New cards

Site level feature analysis

Sites have features that distinguish them from one another, such as the number of links, fonts, videos, etc.

60
New cards

Website Fingerprinting Countermeasures

Network Layer

  • Add padding

  • Add latency

  • Make packets look similar

Page Design

  • Small size

  • Dynamic Pages

61
New cards

Side Channel Attack

Any attack based on information gained from physical implementation of a system rather than a weakness in an algorithm

Unintentional leakage

62
New cards

Acoustic Side Channel Attack

Using frequencies or other sounds from a physical device (like a keyboard or microphone) to interact with a system or make unintentional deductions.

What key is pressed on a keyboard or talking to a smart device in a frequency humans can’t hear