Database
A collection of tables containing records (rows) and fields (columns).
Record/Entity
A group of related data (row).
Field/Element
A column in a table (e.g., name, address).
Schema
The logical structure of a database.
Subschema
The part of the schema visible to a particular user or application; a structure for part of the database.
Attribute
A column in a database table.
Relation
A table: a set of rows (tuples) that share the same columns (attributes). Relations can reference one another through keys.
DBMS
Database Management System: software that lets users and applications create, query, and manage databases.
Primary Key
A unique identifier for each record in a table.
Foreign Key
A reference that links rows in one table to rows in another table.
NoSQL
A non-relational database (e.g., MongoDB, Cassandra, Redis).
SQL
Structured Query Language, used to query and manipulate relational databases (keywords include SELECT, FROM, WHERE).
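The Database, Primary Key, Foreign Key, and SQL cards can be tied together in a small sketch using Python's built-in sqlite3 module. The patients and visits tables, their columns, and the helper names are hypothetical, chosen only for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a primary key / foreign key pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute(
        "CREATE TABLE patients ("
        " id INTEGER PRIMARY KEY,"  # primary key: unique identifier per record (row)
        " name TEXT NOT NULL,"
        " zip TEXT)"
    )
    conn.execute(
        "CREATE TABLE visits ("
        " id INTEGER PRIMARY KEY,"
        " patient_id INTEGER NOT NULL REFERENCES patients(id),"  # foreign key
        " diagnosis TEXT)"
    )
    conn.executemany(
        "INSERT INTO patients (id, name, zip) VALUES (?, ?, ?)",
        [(1, "Alice", "46060"), (2, "Bob", "46061")],
    )
    conn.executemany(
        "INSERT INTO visits (patient_id, diagnosis) VALUES (?, ?)",
        [(1, "flu"), (2, "asthma"), (1, "sprain")],
    )
    conn.commit()
    return conn


def diagnoses_for(conn: sqlite3.Connection, name: str) -> list:
    """SELECT ... FROM ... WHERE, joining rows of two tables through the foreign key."""
    rows = conn.execute(
        "SELECT v.diagnosis FROM visits AS v"
        " JOIN patients AS p ON p.id = v.patient_id"
        " WHERE p.name = ? ORDER BY v.id",
        (name,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(diagnoses_for(conn, "Alice"))
    try:
        # No patient has id 99, so the foreign key rejects this row.
        conn.execute("INSERT INTO visits (patient_id, diagnosis) VALUES (99, 'x')")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

The foreign key is what keeps the two tables consistent: a visit cannot point at a patient that does not exist.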
Physical Integrity
Protects data against physical problems (e.g., power failures, damaged disks) and allows it to be recovered if destroyed.
Logical Integrity
Preserves the logical structure (schema) of the database, e.g., by preventing unauthorized schema changes.
Element Integrity
Ensures each data element is accurate and is not changed by unauthorized users.
Auditability
The ability to track who has accessed or modified data, supported by regular reviews of access and usage.
Access Control
Mechanisms, similar to operating-system access controls, that restrict users to the data they are authorized to access.
User Authentication
Verifies the identity of users accessing the database.
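One common way to verify identity without storing passwords in the clear is to keep only a salted, deliberately slow hash. A minimal sketch using the standard library's hashlib.pbkdf2_hmac; the function names and the iteration count are assumptions, not something the card specifies.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumption: tune to your hardware and security policy


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)  # per-user random salt defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Only the salt and digest are stored; at login the database recomputes the digest from the submitted password and compares.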
Availability
Ensures the data is available when needed.
Inherently Sensitive Data
Data that is sensitive by nature (e.g., passwords, location).
From Sensitive Sources
Data coming from confidential sources (e.g., confidential informants).
Declared Sensitive
Data marked as sensitive (e.g., classified documents).
Exact Data
Disclosure of the exact value of a sensitive data item.
Bounds
Disclosure that a sensitive value lies within a range (between a lower and an upper bound).
Negative Result
Disclosure that a sensitive item is not some value (e.g., learning that a person's felony-conviction count is not 0 reveals a conviction).
Probable Value
Disclosure of the probability that a sensitive element has a particular value.
Direct Inference
Determining a sensitive value directly, e.g., with a query on non-sensitive fields that happens to match only one record.
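A minimal sketch of the idea, assuming a hypothetical table where sex and dept are non-sensitive and aid is sensitive. The query only asks for an aggregate, but its predicate matches a single person.

```python
# Hypothetical records: "sex" and "dept" are non-sensitive; "aid" is sensitive.
RECORDS = [
    {"sex": "F", "dept": "Math", "aid": 5000},
    {"sex": "M", "dept": "Math", "aid": 2500},
    {"sex": "F", "dept": "Art", "aid": 3000},
    {"sex": "M", "dept": "Art", "aid": 4000},
    {"sex": "F", "dept": "Art", "aid": 1000},
]


def count_and_avg(predicate):
    """Statistical query over the sensitive field: returns (count, average)."""
    values = [r["aid"] for r in RECORDS if predicate(r)]
    if not values:
        return 0, None
    return len(values), sum(values) / len(values)


# The predicate uses only non-sensitive fields, yet it matches one record,
# so the "average" is exactly that student's aid.
count, avg = count_and_avg(lambda r: r["sex"] == "M" and r["dept"] == "Math")
```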
Inference by Arithmetic
Combining the results of several statistical queries (sums, counts, means) to calculate a sensitive value.
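A sketch of the classic differencing attack, using a hypothetical salary table: each query returns only a sum, but two sums whose query sets differ by one person reveal that person's value.

```python
# Hypothetical data; the database only ever answers with totals.
SALARIES = {"Ann": 52000, "Ben": 61000, "Cho": 58000, "Dev": 75000}


def total_salary(names):
    """Aggregate query: returns only a sum, never an individual value."""
    return sum(SALARIES[n] for n in names)


everyone = list(SALARIES)
all_but_dev = [n for n in everyone if n != "Dev"]

# Subtracting two "harmless" aggregates isolates a single record.
dev_salary = total_salary(everyone) - total_salary(all_but_dev)
```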
Aggregation
Combining many less sensitive items (such as statistics) into a result that is sensitive as a whole.
Data Mining
Uses machine learning and statistics to discover patterns in large datasets.
Challenges
Mistakes in data, privacy issues, secure data storage, and real-time monitoring.
Privacy vs Security
Differential privacy aims to protect individual privacy in aggregated data.
Suppression
Remove or hide sensitive information.
Tracking
Monitor past queries to detect potential leaks.
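Suppression and tracking can be combined in a query guard. A minimal sketch, assuming a hypothetical GuardedDB class and a min_size threshold: results over fewer than min_size records are suppressed, and a query whose record set differs from an earlier answered one by fewer than min_size records is refused, since subtracting the two answers would isolate those records.

```python
class GuardedDB:
    """Hypothetical guard combining small-result suppression with query tracking."""

    def __init__(self, records, min_size=2):
        self.records = records
        self.min_size = min_size
        self.history = []  # query sets (frozensets of row indices) already answered

    def sum_query(self, field, predicate):
        qset = frozenset(i for i, r in enumerate(self.records) if predicate(r))
        # Suppression: hide results computed over too few records.
        if len(qset) < self.min_size:
            return None
        # Tracking: refuse if this set differs from a past one by too few records.
        for past in self.history:
            diff = len(qset ^ past)  # symmetric difference of the two query sets
            if 0 < diff < self.min_size:
                return None
        self.history.append(qset)
        return sum(self.records[i][field] for i in qset)
```

This only checks pairs of queries; an attacker who combines three or more answers can still get around it, which is why tracking every past query is hard in practice.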
Disguise Data
Conceal data to prevent inferences.
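Two simple disguise techniques are rounding a released value and perturbing it with small random noise. A sketch under assumed parameters; base, spread, and the function names are illustrative choices, not from the source.

```python
import random


def round_to(value, base=1000):
    """Disguise by rounding: report the value to the nearest multiple of base."""
    return base * round(value / base)


def perturb(value, spread, rng):
    """Disguise by random perturbation: add uniform noise in [-spread, spread]."""
    return value + rng.uniform(-spread, spread)


rng = random.Random(0)  # seeded only so the example is reproducible
released = perturb(round_to(61499), 500, rng)
```

Either disguise blurs the exact answer, so subtracting two released totals no longer yields one record's exact value.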
What is anonymized data?
Data that has been processed to remove or obscure personal identifiers.
What is the purpose of anonymized data?
To allow for research and analysis without compromising individual privacy.
What are some applications of anonymized data in healthcare?
Anonymized data in healthcare is used to improve patient outcomes and support medical research.
How is anonymized data used in public planning?
Anonymized data in public planning helps improve decision-making and resource allocation.
What are challenges in data anonymization?
Chiefly the risk of re-identification: linking anonymized records back to the individuals they describe.
What is a re-identification risk?
The risk of identifying individuals from anonymized data.
What case demonstrated re-identification risks in data anonymization?
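The Massachusetts Governor's data case: the governor's records in supposedly anonymized state health-insurance data were re-identified by linking them to public voter rolls on ZIP code, birth date, and sex.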
Which company's data was used to demonstrate re-identification risks?
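Netflix: its Netflix Prize movie-rating dataset was partly re-identified by matching it with public IMDb ratings.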
What is k-Anonymity?
A property of anonymized data: every record is indistinguishable from at least k - 1 other records on its quasi-identifiers.
What does k-Anonymity ensure for restricted queries?
The result set contains at least k records.
What is the purpose of k-Anonymity?
To make it difficult to identify any individual.
What are quasi-identifiers?
Attributes that can partially identify a subject but do not uniquely identify them.
What can combining quasi-identifiers lead to?
Unique identification.
Examples of Quasi-identifiers
Common quasi-identifiers include Zip Code, City, Gender, and Race.
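Checking k-anonymity amounts to grouping records by their quasi-identifier values and confirming that every group has at least k members. A minimal sketch; the function name and sample rows are hypothetical. The sample shows how ZIP code and gender can each be 2-anonymous on their own while their combination singles everyone out.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values appears in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


ROWS = [
    {"zip": "46060", "gender": "F", "diagnosis": "flu"},
    {"zip": "46060", "gender": "M", "diagnosis": "cold"},
    {"zip": "46061", "gender": "F", "diagnosis": "asthma"},
    {"zip": "46061", "gender": "M", "diagnosis": "flu"},
]
```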
What is generalization in the context of data anonymization?
A technique used to make specific information less revealing.
How does generalization work with Zip Codes?
It converts exact Zip Codes into ranges.
Example of Generalization
Zip Codes: 46060, 46061 → 46**; Phone Numbers:
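Generalization of ZIP codes can be sketched as masking trailing digits so nearby codes collapse into one range. The helper generalize_zip and its keep parameter (how many leading digits survive) are hypothetical.

```python
def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` digits and replace the rest with '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


# 46060 and 46061 both become "460**", so they can no longer be told apart.
generalized = [generalize_zip(z) for z in ["46060", "46061"]]
```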
What is suppression in data reporting?
The process of removing information that cannot be generalized.
When is gender information suppressed?
When the number of men or women is fewer than k.
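A minimal sketch of that rule: any value of a field that occurs fewer than k times is replaced with "*". The function name is hypothetical, and the input rows are left unmodified.

```python
from collections import Counter


def suppress_rare(rows, field, k):
    """Return copies of rows with `field` set to '*' where its value occurs fewer than k times."""
    counts = Counter(row[field] for row in rows)
    return [
        {**row, field: row[field] if counts[row[field]] >= k else "*"}
        for row in rows
    ]


rows = [{"gender": g, "zip": "460**"} for g in ["F", "F", "F", "M"]]
released = suppress_rare(rows, "gender", k=3)  # the lone "M" is suppressed
```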
3-Anonymous Data
k-anonymous data with k = 3: any query on the quasi-identifiers matches at least three individuals, making each one harder to single out.
Differential Privacy
A privacy standard that aims to reveal aggregate properties of a dataset while protecting individual data through the addition of noise.
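One standard way to add that noise is the Laplace mechanism (a technique not named in the card): a count query changes by at most 1 when one person is added or removed, so adding Laplace noise with scale 1 / epsilon masks any individual's presence. A sketch; the function name and the epsilon values are assumptions. A Laplace sample is drawn as the difference of two exponential samples.

```python
import random


def private_count(rows, predicate, epsilon, rng):
    """Noisy count: true count plus Laplace(0, 1/epsilon) noise (count sensitivity is 1)."""
    true_count = sum(1 for r in rows if predicate(r))
    scale = 1.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


rng = random.Random(42)
people = [{"age": a} for a in range(100)]
answer = private_count(people, lambda r: r["age"] >= 50, epsilon=1.0, rng=rng)
```

Smaller epsilon means more noise and stronger privacy; individual answers are fuzzy, but averages over many people remain useful.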
Example of Differential Privacy
Motivating example: without differential privacy, off-by-one results are easy to generate, such as singling out the only male over 30 who dislikes football.
What is the first step in Differential Privacy?
Add probabilistic noise to the dataset for plausible deniability.
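Randomized response (a classic technique, not named in the card) shows how probabilistic noise gives plausible deniability: each respondent flips a coin, answers truthfully on heads, and gives a random answer on tails, so any single "yes" might come from the coin. Since P(yes) = 0.5p + 0.25, the true rate p can still be estimated across many answers. A sketch with hypothetical function names.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Answer honestly on heads; on tails, let a second coin decide the answer."""
    if rng.random() < 0.5:  # first coin: heads -> tell the truth
        return truth
    return rng.random() < 0.5  # tails -> random yes/no


def estimate_true_rate(answers) -> float:
    """Invert P(yes) = 0.5 * p + 0.25 to estimate the true 'yes' rate p."""
    observed = sum(answers) / len(answers)
    return 2 * (observed - 0.25)
```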
What is the second step in Differential Privacy?
Generate a release dataset based on the noisy data.
What is the third step in Differential Privacy?
Ensure the dataset changes sufficiently to protect future releases.
Probabilistic Inferences
Inferences about a subject's characteristics that can be made based on statistical data, such as the likelihood of male pattern baldness.
Perfect Deniability
The concept that some details, like specific personal health issues, may still be inferred even when deniability is provided.
Database Security Requirements
Essential elements for securing databases, including physical, logical, and element integrity, auditability, access control, user authentication, and availability.
Sensitive Data Disclosure
The various ways sensitive data can be inadvertently disclosed, highlighting the lack of a single solution for prevention.
Challenges in Data Mining & Big Data
Numerous open privacy and security challenges that arise in the context of data mining and big data analytics.