Secure Database Design
Unique Identifiers
Such as primary keys
Are unique and easily identifiable
UUID is the most widely used standard for this (RFC 9562, formerly RFC 4122)
UUID
UUID - Universally Unique IDentifier
128-Bit Integer
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx (Hex format)
M indicates the UUID version; N indicates the variant
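The M and N positions can be read straight from the hex string. A minimal sketch using Python's standard uuid module, with a version 1 UUID as input; the variable names are only illustrative:

```python
import uuid

text = "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"

# M is the first hex digit of the third group, N the first of the fourth.
m_digit = text[14]
n_digit = text[19]

parsed = uuid.UUID(text)

print(m_digit, parsed.version)          # version read by hand and by the library
print(n_digit, parsed.variant)          # the top bits of N select the variant
print(parsed.int.bit_length() <= 128)   # the whole value fits in 128 bits
```

Only the top bits of N carry the variant, so several hex digits (8, 9, a, b) map to the same variant.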
Versions:
UUID version 1: Timestamp + MAC address of the node.
f81d4fae-7dec-11d0-a765-00a0c91e6bf6
UUID version 3: namespace UUID + name, hashed using MD5
Common in legacy systems; MD5 is broken, so modern systems avoid it
UUID version 5: same as version 3, but hashed with SHA-1
Often avoided for the same reason: SHA-1 collisions have also been demonstrated
UUID version 4: 122 random bits.
Pros: simple, widely supported, no information to leak.
Cons: no structure (not sortable); only as good as its random source, which must be cryptographically secure.
UUID version 7: Combines a Unix timestamp in milliseconds (48 bits) with random bits (74 bits).
Common in modern systems: time-ordered like version 1 and random like version 4, without exposing a MAC address.
UUID version 8: Custom implementation; 122 of the 128 bits can be used for anything and come from anywhere; uncommon, since every system must agree on its own layout
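The versions above can be compared side by side. A sketch in Python: uuid4 and uuid5 come from the standard uuid module, while uuid7_sketch is a hypothetical helper written here to show the bit layout (48-bit timestamp, version, 74 random bits split around the variant). It is an illustration of the layout, not a reference implementation.

```python
import os
import time
import uuid

# Version 4: 122 random bits from the operating system's random source.
u4 = uuid.uuid4()

# Version 5: namespace UUID + name hashed with SHA-1, so it is repeatable.
u5_a = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
u5_b = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def uuid7_sketch() -> uuid.UUID:
    """Hypothetical helper: 48-bit Unix ms timestamp + 74 random bits."""
    ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 used
    rand_a = (rand >> 62) & 0xFFF                  # 12 random bits
    rand_b = rand & ((1 << 62) - 1)                # 62 random bits
    value = (ts_ms & ((1 << 48) - 1)) << 80        # timestamp in the top 48 bits
    value |= 0x7 << 76                             # version nibble (M)
    value |= rand_a << 64
    value |= 0b10 << 62                            # variant bits (top of N)
    value |= rand_b
    return uuid.UUID(int=value)


u7 = uuid7_sketch()
print(u4.version, u5_a == u5_b, u7.version)
```

Because the timestamp sits in the most significant bits, version 7 values generated later compare greater, which is what makes them index-friendly as primary keys.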
Hashing
One-way function that takes input of arbitrary size and produces a fixed-length, random-looking output
Hashing algorithms must be deterministic: the same input always produces the same output
Must be destructive to the input: the original data cannot be recovered by working backwards from the hash, which protects confidentiality; comparing hashes supports integrity checks
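The fixed-length and deterministic properties are easy to check. A small sketch using SHA-256 from Python's hashlib; the choice of SHA-256 and the helper name sha256_hex are assumptions for illustration, not something these notes prescribe:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


short = sha256_hex(b"a")               # 1 byte of input
long_ = sha256_hex(b"a" * 1_000_000)   # 1,000,000 bytes of input

print(len(short), len(long_))          # both digests have the same length
print(sha256_hex(b"a") == short)       # hashing the same input again matches
```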
Types of Hashes
Cryptographic Hashes
Low chance of collision
2 inputs unlikely to produce the same output
Small changes to input drastically change the output
Fuzzy Hashes
Small changes in the input have small changes in the output
Can measure the similarity of different input
There are more than these two types, but these are the most common
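The "small change, drastically different output" behaviour of a cryptographic hash can be measured by counting how many output bits flip. A sketch with SHA-256; the helper differing_bits and the sample messages are made up for illustration. A fuzzy hash would behave the opposite way, but the standard library has none, so only the cryptographic case is shown.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")


# The two inputs differ in a single character.
changed = differing_bits(b"transfer 100 to alice", b"transfer 900 to alice")
print(f"{changed} of 256 output bits differ")
```

For a well-behaved cryptographic hash, roughly half of the output bits are expected to change.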
When to Hash
When you store passwords or verification data
Do not store passwords or other verification data in plain text; store the hash instead
Take input, hash, compare to stored hash
When verifying the integrity of other data
Is the sensitive data likely to be modified?
Digital signatures: the signature is computed over a hash of the data
Want to ensure major changes go through an approval process
Store fuzzy hash
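The "take input, hash, compare to stored hash" step applied to data integrity can be sketched as follows. SHA-256 and the names digest, is_unmodified, and record are assumptions for illustration:

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Hash the data; this value is what gets stored for later checks."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, stored_digest: str) -> bool:
    # compare_digest compares in constant time, avoiding timing leaks.
    return hmac.compare_digest(digest(data), stored_digest)


record = b"account=42;limit=500"
stored = digest(record)

print(is_unmodified(record, stored))                    # True
print(is_unmodified(b"account=42;limit=900", stored))   # False
```

A plain hash only detects changes if the stored digest itself is protected from whoever can modify the data.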
Adding Salt
Salt: random data combined with the input before it goes through the hashing algorithm
Prevents attackers from comparing stored hashes against pre-generated hash tables (rainbow tables)
Unique Salts
The modern approach is to use a unique salt value for every entry.
Store the salt in plain text in the database alongside each entry
Pros:
No pre-computed hashes
Users with the same password end up with different hashes
Cons:
Takes up more space in the database
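A per-entry salt can be sketched with the standard library. This example uses PBKDF2 (hashlib.pbkdf2_hmac) as the hash, which is a substitution for the plain "append salt, then hash" described above: PBKDF2 mixes in the salt itself and is deliberately slow. The iteration count and function names are assumptions; tune the work factor for your own hardware.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor, not taken from these notes


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, digest); both are stored in the database row."""
    salt = secrets.token_bytes(16)  # unique random salt for this entry
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Re-hash the candidate with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, stored)
```

Calling hash_password twice with the same password gives two different salts and therefore two different digests, so a leaked table does not reveal which users share a password.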
Encryption
Process of disguising input (plaintext) as unreadable output (ciphertext)
Encryption is reversible with the correct key
Data at rest should be encrypted
Encrypt the disk the database is stored on
Encrypt the files the data is stored in
Encrypt the data inside the files
Types of Encryption
Symmetric
One shared key is used to both encrypt and decrypt data
AES, ChaCha20
Asymmetric
Uses a pair of keys: a public key for encryption and a private key for decryption; the public key can be shared freely, the private key must stay secret
RSA, ECC
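The difference between the two key models can be shown with toy code. Both halves below are teaching sketches and must not be used to protect real data: the symmetric part is a home-made XOR keystream built from SHA-256 (not AES or ChaCha20), and the asymmetric part is textbook RSA with tiny primes and no padding. Real systems should use a vetted cryptography library.

```python
import hashlib
import secrets


# Symmetric: the same shared key encrypts and decrypts.
def toy_stream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256 counter keystream. Not for real use."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


shared_key = secrets.token_bytes(32)
nonce = secrets.token_bytes(12)
ciphertext = toy_stream_xor(shared_key, nonce, b"meet at noon")
plaintext = toy_stream_xor(shared_key, nonce, ciphertext)  # same key reverses it

# Asymmetric: textbook RSA with tiny primes.
p, q = 61, 53
n = p * q                    # part of both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, shared freely with n
d = pow(e, -1, phi)          # private exponent, kept secret (Python 3.8+)

message = 65                          # must be smaller than n
encrypted = pow(message, e, n)        # anyone holding (n, e) can encrypt
decrypted = pow(encrypted, d, n)      # only the holder of d can decrypt

print(plaintext, decrypted)
```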
Which is better?
Symmetric
More performant
Doesn’t require the recipient’s public key to be available
Asymmetric
Relies less on trust
You don’t have to pre-exchange keys
Shadowing
Refers to different things depending on context
Confidentiality: Securing sensitive data separately from main database
Availability: Data that exists, but not formally tracked
Integrity: Creating copies of data to verify changes
Confidential Data Shadowing
One database for storing general data and another for storing sensitive data
The sensitive database has additional monitoring and stricter policies
/etc/passwd - basic user data, most users can see
/etc/shadow - stores user password hashes, readable by root only
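The same split can be sketched with two databases. This example uses two in-memory SQLite connections to stand in for the general and sensitive stores; the table names, columns, and helper functions are hypothetical, and real deployments would enforce the separation with distinct credentials and access policies rather than by convention.

```python
import sqlite3

general = sqlite3.connect(":memory:")     # stands in for the main database
sensitive = sqlite3.connect(":memory:")   # stands in for the restricted database

general.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, shell TEXT)")
sensitive.execute(
    "CREATE TABLE credentials (user_id INTEGER PRIMARY KEY, password_hash TEXT)"
)

general.execute("INSERT INTO users VALUES (1, 'alice', '/bin/bash')")
sensitive.execute("INSERT INTO credentials VALUES (1, '<salted hash goes here>')")


def public_profile(user_id: int):
    # Ordinary code paths only ever touch the general connection.
    return general.execute(
        "SELECT name, shell FROM users WHERE id = ?", (user_id,)
    ).fetchone()


def credential_for(user_id: int):
    # Hypothetical privileged path; in practice guarded by stricter access control.
    row = sensitive.execute(
        "SELECT password_hash FROM credentials WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


print(public_profile(1))
```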
Integrity Data Shadowing
Real-time duplication of data
Pros:
Provides backup option from shadowed database
Verify transactions, valid interactions will update both
Cons:
Doubles resource usage
Increase overhead and complexity
Widens the attack surface
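The "valid interactions will update both" idea can be sketched as a dual write plus a comparison. Two in-memory SQLite databases stand in for the primary and shadow copies; the ledger table and the helpers record, fingerprint, and copies_match are hypothetical names for illustration.

```python
import hashlib
import sqlite3


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE ledger (id INTEGER PRIMARY KEY, amount INTEGER)")
    return db


primary, shadow = make_db(), make_db()


def record(entry_id: int, amount: int) -> None:
    # A valid interaction goes through this path and updates both copies.
    for db in (primary, shadow):
        db.execute("INSERT INTO ledger VALUES (?, ?)", (entry_id, amount))


def fingerprint(db: sqlite3.Connection) -> str:
    # Hash the ordered rows so the two copies can be compared cheaply.
    rows = db.execute("SELECT id, amount FROM ledger ORDER BY id").fetchall()
    return hashlib.sha256(repr(rows).encode()).hexdigest()


def copies_match() -> bool:
    return fingerprint(primary) == fingerprint(shadow)


record(1, 500)
print(copies_match())   # True: both copies saw the same write

# A change that bypasses record() only reaches the primary copy.
primary.execute("UPDATE ledger SET amount = 900 WHERE id = 1")
print(copies_match())   # False: the mismatch exposes the change
```

The sketch also makes the costs visible: every write happens twice, and the shadow copy is one more store to secure.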
Availability Data Shadowing
Rogue data that is generated but never tracked
Is not easily available to audit
Auditing useful for debugging
Auditing is useful for monitoring
Is not being monitored for data leaks
What is in that rogue data?
Soft Deletion
Deleting data takes time and can fragment storage
Soft deletion - don’t delete data, instead identify it as deleted
Typically implemented by adding a column that records when the data was deleted
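Soft deletion can be sketched with a nullable timestamp column. The notes table, the deleted_at column name, and the helper functions are assumptions for illustration, using an in-memory SQLite database:

```python
import sqlite3
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT, deleted_at TEXT)")
db.executemany(
    "INSERT INTO notes (id, body) VALUES (?, ?)", [(1, "keep"), (2, "remove")]
)


def soft_delete(note_id: int) -> None:
    # Mark the row as deleted instead of removing it.
    now = datetime.now(timezone.utc).isoformat()
    db.execute("UPDATE notes SET deleted_at = ? WHERE id = ?", (now, note_id))


def active_notes():
    # Normal queries must filter out rows that carry a deletion timestamp.
    return db.execute(
        "SELECT id, body FROM notes WHERE deleted_at IS NULL ORDER BY id"
    ).fetchall()


soft_delete(2)
print(active_notes())   # [(1, 'keep')]
```

The row for id 2 is still in the table, so it can be restored or audited later. Every query that should ignore deleted data needs the deleted_at filter.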