Task 6: databases (CS)

0.0(0)

Studied by 1 person

Knowt Play

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Card Sorting

1/57

There's no tags or description

Looks like no tags are added yet.

Study Analytics

Name	Mastery	Learn	Test	Matching	Spaced

No study sessions yet.

58 Terms

New cards

Database

Structured collection of data, stored electronically in a computer system

Controlled by a database management system (DBSM)

New cards

Flat file database

Contains individual, unrelated tables (collections of data)

e.g., excel

New cards

Relational database

Contains multiple tables that share related fields (ID, customer)

Use structured query language (SQL) to store, manipulate and retrieve data

e.g., SQLite

New cards

Data v

Raw, unorganised facts that need to be processed

New cards

Info

Data that is processed, organised, structured or presented in a given context to make it useful

New cards

SQL data types

INTEGER - whole

REAL - decimal

Boolean - true/false

TEXT - string

date

TEXT as ISO8601 (“yyyy-mm-dd hh:mm:ss.sss”)
INTEGER as Unix Time (number of seconds since 1970-01-01 00:00:00 UTC

New cards

Relationships

Associations or connections between tables in a relational database

Help organise and structure data, allowing for efficient data retrieval and maintaining data integrity

New cards

Related table

Tables that share a relationship

Established using foreign keys, which are columns in a table that refer to the primary key in another table

New cards

One-to-one

Each record in Table A is associated with one and only one record in Table B, and vice versa

e.g., Tables ‘users’ and ‘user_profiles’, where each user has a single corresponding profile

New cards

One-to-many

Each record in Table A can be associated with multiple records in Table B, but each record in Table B is associated with only one record in Table A

e.g., Tables ‘departments’ and ‘employees’, where each department can have multiple employees, but each employee belongs to one department

New cards

Many-to-many

Each record in Table A can be associated with multiple records in Table B, and vice versa

e.g., Tables ‘students’ and ‘courses’, where each student can enroll in multiple courses, and each course can have multiple students

New cards

Data anomalies

Inconsistency in data resulting from an operation like an update, insertion or deletion

New cards

Normalisation

To reduce the chance of creating data anomalies when modifying database

Minimise or eliminate redundancy

New cards

Redundancy

Several, unrelated copies of the same data in the database

New cards

Update anomaly

Results from data redundancy and partial update

New cards

Delete anomaly

Unintended loss of data due to deletion of other data

New cards

Insert anomaly

Inability to add data to the database due to the absence of other data

New cards

Composite keys

Creating a primary key by combining more than one attribute

New cards

Constraints

Used to limit the type of data that can go into a table

Ensures accuracy and reliability of data

Specified when table is created

e.g., primary key, autoincrement, not null, unique

New cards

Data integrity/quality

Accuracy, completeness and quality of data as it’s maintained over time and across formats

e.g., currency, authenticity, relevance, accuracy, outliers

New cards

Entity integrity

Feature of relational DB

Each entity (row) must have a unique and non-null PK

Ensures that data can’t be listed multiple times

New cards

Domain integrity

Enforced by types and constraints

Ensures the accuracy of values of data within a domain (entity)

New cards

Referential integrity

Relationship between tables

Each table in a DB must have a PK which can appear in other tables

Relationship between FK leads to hierarchy of dependencies

Updating or deleting rows can break this

To prevent use ON UPDATE/DELETE CASCADE/SET NULL

New cards

Currency

Accurate, up to date and relevant

New cards

Authenticity

Trustworthy info collected from real life events

New cards

Relevance

Degree to which data provides insight into the problem or purpose being addressed

New cards

Accuracy

Percentage of errors across the total number of records

New cards

Outliers

Data point that differs significantly from other observations that can distort statistical results

New cards

Accuracy vs validity

Both seek to describe the quality of usability of a data set

Accuracy - how well the data corresponds to the real world or true value

Validity - how well data values are consistent, based on defined rules

New cards

Referential integrity

Making sure that the relationships between tables remain valid, even if data is modified or deleted

Foreign keys join tables and establish dependencies - if change or delete row in one table, meaning of rows in other table is destroyed

New cards

Aggregate function

Performs a calculation on a set of values and returns a single value

e.g., COUNT, SUM AVG, MAX, MIN

New cards

Aliases

Give table or column in table a temporary name, making them more readable

Use AS

New cards

Concatenated fields

Created by combining the values of multiple fields using the | | operator, often with additional characters for formatting

New cards

Calculated fiedls

Columns of a table whose values are a function of other columns in the same row

New cards

Open database connectivity (ODBC)

Open standard API used for accessing a database

Allows programs to use SQL requests that access databases without knowing the proprietary interfaces to the databases

Enables a developer to cover and effectively work with many database systems whereby for relational database it enables a developer to make connections with tabular data in ODBC server

New cards

ACID

Ensures reliable and consistent database transactions

Atomicity, consistency, isolation, durability

New cards

Role of ACID

Provide a mechanism to ensure the correctness and consistency of a database in a way such that each transaction is a group of operations that acts as a single unit, produces consistent results, acts in isolation from other operations, and updates that it makes are durably stored

New cards

Atomicity

Entire transaction takes place at once or doesn’t happen at all

New cards

Consistency

Database must be consistent before and after the transaction

New cards

Isolation

Multiple transactions occur independently without interference

New cards

Durability

The changes of a successful transaction occurs even if the system failure occurs

New cards

Normalisation

Process of organising data in a database

Creating tables and establishing relationships between them according to rules designed to protect data and make it more flexible

Eliminates redundancy and inconsistent dependency

New cards

Form of a relation

No repeated attributes

All cells must be atomic (only contain single piece of data)

New cards

1NF

In the form of a relation

Each record must have a primary or composite key

New cards

2NF

In 1NF

Have no partial dependencies - when a non-key attribute is only dependent on part of the composite key

New cards

3NF

In 2NF

Have no transitive dependencies - when a non-key field is dependent on a field other than the primary key

New cards

Data quality

Assesses whether information can serve its purpose in a particular context

New cards

Cleaning outliers

Deleting the outlier values, transforming the data or treating them as missing values

New cards

Ethics in data collection

Entities in DBs are people which contain their personal info

Important to obtained informed consent for collection, collect only necessary info and ensure data is secured

New cards

Privacy concerns

As DBs store personal info, they are attractive targets for cyberattacks

Protecting data privacy is crucial to maintain trust and comply with regs

New cards

Appropriate use of data

DBs enable data analysis and decision making based on store info (can lead to discrimination)

Personal info datasets are often ‘on sold’ without knowledge or consent of people whose info that’s sold

New cards

Reliability of data sources

DBs often aggregate data from various sources

Ensure the reliability of these sources is crucial for data quality and validity of subsequent analyses or decisions

New cards

Acknowledgement of data sources

DBs often contain data complied or curated by others

Acknowledging data sources is important for respecting IP rights, ensuring transparency and enabling reproducibility of research

New cards

Use of data mining

DBs are the raw material for data mining algorithms, which extract patterns and insights from large datasets

Can be used for various purposes (e.g., market research, scientific discovery)

New cards

Keep personal data private practices

Store personal data in encrypted form

Collect only min necessary data

Implement strict access controls based on roles and responsibilities

Define clear policies for how long it’s retained and how to dispose of it

New cards

Backups of organisational data practices

Schedule regular backups

Store it in multiple, separate locations

Regularly test the restoration process to ensure they’re usable and valid

Maintain multiple versions

New cards

Restricting access to data practices

Role-based access control (RBAS): assign permissions based on job roles and responsibilities

Principle of least privilege (POLP): grant users only min necessary access to perform their tasks

Use strong authentication

New cards

Ownership and control of data practices

Policies and procedures to define data ownership, roles and responsibilities and processes for handling data

Compliance with data protection regs

Establish legal contracts that specify data ownership, usage, restrictions and liabilities when dealing with third parties