1/57
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
Database
Structured collection of data, stored electronically in a computer system
Controlled by a database management system (DBSM)
Flat file database
Contains individual, unrelated tables (collections of data)
e.g., excel
Relational database
Contains multiple tables that share related fields (ID, customer)
Use structured query language (SQL) to store, manipulate and retrieve data
e.g., SQLite
Data v
Raw, unorganised facts that need to be processed
Info
Data that is processed, organised, structured or presented in a given context to make it useful
SQL data types
INTEGER - whole
REAL - decimal
Boolean - true/false
TEXT - string
date
TEXT as ISO8601 (“yyyy-mm-dd hh:mm:ss.sss”)
INTEGER as Unix Time (number of seconds since 1970-01-01 00:00:00 UTC
Relationships
Associations or connections between tables in a relational database
Help organise and structure data, allowing for efficient data retrieval and maintaining data integrity
Related table
Tables that share a relationship
Established using foreign keys, which are columns in a table that refer to the primary key in another table
One-to-one
Each record in Table A is associated with one and only one record in Table B, and vice versa
e.g., Tables ‘users’ and ‘user_profiles’, where each user has a single corresponding profile
One-to-many
Each record in Table A can be associated with multiple records in Table B, but each record in Table B is associated with only one record in Table A
e.g., Tables ‘departments’ and ‘employees’, where each department can have multiple employees, but each employee belongs to one department
Many-to-many
Each record in Table A can be associated with multiple records in Table B, and vice versa
e.g., Tables ‘students’ and ‘courses’, where each student can enroll in multiple courses, and each course can have multiple students
Data anomalies
Inconsistency in data resulting from an operation like an update, insertion or deletion
Normalisation
To reduce the chance of creating data anomalies when modifying database
Minimise or eliminate redundancy
Redundancy
Several, unrelated copies of the same data in the database
Update anomaly
Results from data redundancy and partial update
Delete anomaly
Unintended loss of data due to deletion of other data
Insert anomaly
Inability to add data to the database due to the absence of other data
Composite keys
Creating a primary key by combining more than one attribute
Constraints
Used to limit the type of data that can go into a table
Ensures accuracy and reliability of data
Specified when table is created
e.g., primary key, autoincrement, not null, unique
Data integrity/quality
Accuracy, completeness and quality of data as it’s maintained over time and across formats
e.g., currency, authenticity, relevance, accuracy, outliers
Entity integrity
Feature of relational DB
Each entity (row) must have a unique and non-null PK
Ensures that data can’t be listed multiple times
Domain integrity
Enforced by types and constraints
Ensures the accuracy of values of data within a domain (entity)
Referential integrity
Relationship between tables
Each table in a DB must have a PK which can appear in other tables
Relationship between FK leads to hierarchy of dependencies
Updating or deleting rows can break this
To prevent use ON UPDATE/DELETE CASCADE/SET NULL
Currency
Accurate, up to date and relevant
Authenticity
Trustworthy info collected from real life events
Relevance
Degree to which data provides insight into the problem or purpose being addressed
Accuracy
Percentage of errors across the total number of records
Outliers
Data point that differs significantly from other observations that can distort statistical results
Accuracy vs validity
Both seek to describe the quality of usability of a data set
Accuracy - how well the data corresponds to the real world or true value
Validity - how well data values are consistent, based on defined rules
Referential integrity
Making sure that the relationships between tables remain valid, even if data is modified or deleted
Foreign keys join tables and establish dependencies - if change or delete row in one table, meaning of rows in other table is destroyed
Aggregate function
Performs a calculation on a set of values and returns a single value
e.g., COUNT, SUM AVG, MAX, MIN
Aliases
Give table or column in table a temporary name, making them more readable
Use AS
Concatenated fields
Created by combining the values of multiple fields using the | | operator, often with additional characters for formatting
Calculated fiedls
Columns of a table whose values are a function of other columns in the same row
Open database connectivity (ODBC)
Open standard API used for accessing a database
Allows programs to use SQL requests that access databases without knowing the proprietary interfaces to the databases
Enables a developer to cover and effectively work with many database systems whereby for relational database it enables a developer to make connections with tabular data in ODBC server
ACID
Ensures reliable and consistent database transactions
Atomicity, consistency, isolation, durability
Role of ACID
Provide a mechanism to ensure the correctness and consistency of a database in a way such that each transaction is a group of operations that acts as a single unit, produces consistent results, acts in isolation from other operations, and updates that it makes are durably stored
Atomicity
Entire transaction takes place at once or doesn’t happen at all
Consistency
Database must be consistent before and after the transaction
Isolation
Multiple transactions occur independently without interference
Durability
The changes of a successful transaction occurs even if the system failure occurs
Normalisation
Process of organising data in a database
Creating tables and establishing relationships between them according to rules designed to protect data and make it more flexible
Eliminates redundancy and inconsistent dependency
Form of a relation
No repeated attributes
All cells must be atomic (only contain single piece of data)
1NF
In the form of a relation
Each record must have a primary or composite key
2NF
In 1NF
Have no partial dependencies - when a non-key attribute is only dependent on part of the composite key
3NF
In 2NF
Have no transitive dependencies - when a non-key field is dependent on a field other than the primary key
Data quality
Assesses whether information can serve its purpose in a particular context
Cleaning outliers
Deleting the outlier values, transforming the data or treating them as missing values
Ethics in data collection
Entities in DBs are people which contain their personal info
Important to obtained informed consent for collection, collect only necessary info and ensure data is secured
Privacy concerns
As DBs store personal info, they are attractive targets for cyberattacks
Protecting data privacy is crucial to maintain trust and comply with regs
Appropriate use of data
DBs enable data analysis and decision making based on store info (can lead to discrimination)
Personal info datasets are often ‘on sold’ without knowledge or consent of people whose info that’s sold
Reliability of data sources
DBs often aggregate data from various sources
Ensure the reliability of these sources is crucial for data quality and validity of subsequent analyses or decisions
Acknowledgement of data sources
DBs often contain data complied or curated by others
Acknowledging data sources is important for respecting IP rights, ensuring transparency and enabling reproducibility of research
Use of data mining
DBs are the raw material for data mining algorithms, which extract patterns and insights from large datasets
Can be used for various purposes (e.g., market research, scientific discovery)
Keep personal data private practices
Store personal data in encrypted form
Collect only min necessary data
Implement strict access controls based on roles and responsibilities
Define clear policies for how long it’s retained and how to dispose of it
Backups of organisational data practices
Schedule regular backups
Store it in multiple, separate locations
Regularly test the restoration process to ensure they’re usable and valid
Maintain multiple versions
Restricting access to data practices
Role-based access control (RBAS): assign permissions based on job roles and responsibilities
Principle of least privilege (POLP): grant users only min necessary access to perform their tasks
Use strong authentication
Log in and monitor data base access to detect unauthorised activities
Ownership and control of data practices
Policies and procedures to define data ownership, roles and responsibilities and processes for handling data
Compliance with data protection regs
Establish legal contracts that specify data ownership, usage, restrictions and liabilities when dealing with third parties