IB Computer Science - HL Databases

Benefits of normalization

improve data consistency and reduce redundancy
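As a rough illustration of how normalization reduces redundancy, the sketch below splits a flat list of orders, in which the customer's city is repeated on every row, into two tables linked by a key. The table and field names are invented for the example.

```python
# Hypothetical unnormalized data: the customer's city is repeated on every order.
flat_orders = [
    {"order_id": 1, "customer": "Ana", "city": "Lima", "item": "pen"},
    {"order_id": 2, "customer": "Ana", "city": "Lima", "item": "ink"},
    {"order_id": 3, "customer": "Ben", "city": "Oslo", "item": "pad"},
]

# Normalized form: each customer's city is stored exactly once.
customers = {}
orders = []
for row in flat_orders:
    customers[row["customer"]] = {"city": row["city"]}
    orders.append({
        "order_id": row["order_id"],
        "customer": row["customer"],  # acts as a foreign key into customers
        "item": row["item"],
    })

# Updating the city now touches one record, so the orders cannot disagree.
customers["Ana"]["city"] = "Cusco"
cities_for_ana = {
    customers[o["customer"]]["city"] for o in orders if o["customer"] == "Ana"
}
print(cities_for_ana)
```

In the flat version the same update would have to be repeated on every one of Ana's orders, and missing one would leave the data inconsistent.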

Definition of Relational databases

Organises data into one or more tables, where each table represents a collection of related data

Applications for Relational databases

Inventory management, accounting, and e-commerce

Advantages of Relational databases

Flexible, scalable, and efficient; they allow easy data retrieval and manipulation through SQL (Structured Query Language) queries

Definition of an object-oriented database

Databases that store data in the form of objects, which can contain both data and behaviour. They support encapsulation, inheritance, and polymorphism.

Applications for an object-oriented database

datasets with complex relationships between data, such as those found in social networks, medical records, or financial transactions

encapsulation

hiding internal state; the technique of making the states in a class private and providing access to those states via public behaviours (methods)
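A minimal sketch of encapsulation in Python, using a made-up `Account` class. Python enforces privacy only by convention and name mangling, so this approximates the idea rather than giving strict access control.

```python
class Account:
    """Hypothetical class: the balance is private and changed only via methods."""

    def __init__(self, opening_balance=0):
        # The double underscore triggers name mangling, which keeps the
        # attribute out of the class's public interface.
        self.__balance = opening_balance

    def deposit(self, amount):
        # The public method guards the private state against invalid changes.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.__balance += amount

    def balance(self):
        return self.__balance


account = Account(100)
account.deposit(50)
print(account.balance())
```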

inheritance

The mechanism by which one class inherits the features (states and behaviours) of another class
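A short sketch of inheritance, with invented class names: `Student` reuses the state and behaviour defined in `Person` and adds one attribute of its own.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is a person"


class Student(Person):  # Student inherits name and describe() from Person
    def __init__(self, name, school):
        super().__init__(name)  # reuse the parent's initialisation
        self.school = school


student = Student("Ana", "IB World School")
print(student.describe())
```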

polymorphism

A concept that enables a single interface, such as a method call, to work with objects of different classes, with each class responding in its own way. Supports reusability, flexibility, and maintainability of code
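One way to picture polymorphism, using hypothetical `Circle` and `Square` classes: the function below calls `area()` without knowing which class each object belongs to.

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


def total_area(shapes):
    # The same call, shape.area(), works for every class that provides it.
    return sum(shape.area() for shape in shapes)


print(total_area([Square(2), Square(3), Circle(1)]))
```

Adding a new shape class only requires giving it an `area()` method; `total_area` stays unchanged.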

definition of a network database model

Organises data into a graph-like structure, where data is represented as nodes (also called records) connected by links that represent the relationships between them. A record can have more than one parent record

Applications of a network database model

complex data structures that have many-to-many relationships between entities, such as social networks, genealogy, or supply chain management.

Advantages of a network database model

efficiently store and retrieve complex relationships between data, and allow for more flexible and powerful analysis of the data.

Definition of a spatial database model

A specialised type of database model designed to store and manage spatial data, i.e. data that has a location or geographic component

Applications of a spatial database model

geographic information systems (GIS), environmental management, transportation planning, and urban planning

Advantages of a spatial database model

organisations can store and analyse spatial data more efficiently and accurately, and make better-informed decisions based on the geographic relationships between different features

Definition of a multi-dimensional database model

A database model designed to handle data that has multiple dimensions, such as time, geography, and product type. Each dimension represents a different aspect of the data

Applications of a multi-dimensional database model

business intelligence, financial analysis, and forecasting

Advantages of a multi-dimensional database model

Users can quickly and easily analyse the data from different perspectives and identify trends and insights that might not be apparent from a traditional table-based view.
It supports aggregation of data across multiple dimensions, so the same data can be viewed at different levels of detail
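Aggregation across dimensions can be sketched as follows. The sales facts and the `aggregate` helper are invented for illustration; a real multi-dimensional database would precompute and index such totals.

```python
from collections import defaultdict

# Hypothetical sales facts with three dimensions: year, region, product.
sales = [
    {"year": 2023, "region": "North", "product": "pen", "amount": 10},
    {"year": 2023, "region": "South", "product": "pen", "amount": 20},
    {"year": 2024, "region": "North", "product": "ink", "amount": 30},
    {"year": 2024, "region": "North", "product": "pen", "amount": 40},
]


def aggregate(facts, *dimensions):
    """Sum 'amount' grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact["amount"]
    return dict(totals)


by_year = aggregate(sales, "year")                        # coarse view
by_region_product = aggregate(sales, "region", "product")  # finer view
print(by_year)
print(by_region_product)
```

Choosing fewer dimensions gives a summarised view, and choosing more gives a more detailed one, from the same underlying facts.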

Importance of choosing the right database model

With an unsuitable model, applications can suffer from slow performance, data inconsistencies, and difficulty adding new features

Advantages of an object-oriented database model

Flexibility: More flexible than RDBs because they can store complex and hierarchical data structures directly, as objects that have both properties and methods.

Performance: Can perform better than RDBs when working with complex, interrelated data, because related objects are reached by following references rather than by joining tables.

Reduced Mapping Overhead: They eliminate the need for mapping objects to relational structures, which can simplify development and reduce the amount of code needed to handle data.

Disadvantages of object-oriented databases

Complexity: They are more complex than RDBs because they require knowledge of both object-oriented programming and database design.

Limited Tool Support: There are fewer tools and frameworks available for working with them than RDBs, which can make development and maintenance more difficult.

Scalability: they may have scalability issues because they are not as widely used as RDBs and may not be able to handle as many simultaneous users.

Advantages of relational database models

Familiarity: They are widely used and well-understood, making them easier to use and maintain for many developers.

Tool Support: There is a wide range of tools and frameworks available for working with relational databases, which can make development and maintenance easier.

Scalability: they can be highly scalable because they are widely used and have been optimised for performance and scalability.

Disadvantages of relational database models

Limited Flexibility: They are limited in their ability to handle complex and hierarchical data structures.

Performance: They can have performance issues when complex queries must join many tables across large volumes of data.

Mapping Overhead: they require mapping objects to relational structures, which can increase development time and introduce complexity.

Requirements of a data warehouse

1: Subject-oriented - Broken down into subject-specific areas.

2: Integrated - Follows common rules for data conventions.

3: Time-variant - Contains historical data and a means to query data by date.

4: Non-volatile - Data is not changed or deleted once stored in the data warehouse.

5: Summarised - The data should be summarised, ready for easy analysis.

Definition of data warehousing

the process of collecting, storing, and managing large amounts of data from multiple sources for the purpose of creating a centralised repository for decision-making and business intelligence

Applications of a data warehouse

situations where large amounts of data need to be analysed for business intelligence and decision-making purposes
Some examples include:

Business reporting: support business reporting by providing a centralised repository for data from multiple sources.

Data integration: integrate data from disparate sources into a single, unified view of the data.

Data mining: enable organisations to perform data mining, the process of uncovering hidden patterns and relationships in large amounts of data.

Customer behaviour analysis: analyse customer purchasing patterns, preferences and behaviour.

Financial analysis: analyse large amounts of financial data to identify trends and make informed decisions.

Supply chain management: analyse supply chain data to identify areas for improvement and make decisions that increase efficiency and reduce costs.

Marketing analysis: Analyse customer data to understand market trends and target customers more effectively.

definition of time-dependent

Data is only valid and accurate at some point in time or over a time interval

Why a data warehouse is time dependent

because it stores historical data as well as current data

Advantages of data warehousing

Allows for trend analysis and historical reporting.
Can help organisations make informed decisions based on past performance and predict future trends.

Applications of time dependency

Identify trends in sales, such as which products are selling well and which stores are performing best. This information can be used to make informed decisions about inventory management, marketing, and store operations.
Provide a unified and consistent view of the data over time, even as the underlying data sources change.

Methods of real-time processing

Stream processing: This involves capturing data in real-time as it is generated, transforming it on the fly, and loading it directly into the warehouse. This method is suitable for high-velocity data streams, such as log data, social media feeds, and IoT sensor data.
Change data capture (CDC): This involves monitoring source systems for changes and capturing only the changes as they occur, rather than reading the entire source data set. The changes are then transformed and loaded into the warehouse.
Batch processing with near real-time updates: This involves processing data in batch mode and updating the warehouse periodically, but with a minimal delay. This method is suitable for large data sets that can be processed in batch mode, such as sales data or financial data.
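The idea behind change data capture can be sketched by comparing two snapshots of a source table and keeping only the differences. This is a simplification for illustration: production CDC tools usually read the source database's transaction log instead of comparing snapshots, and the function and data below are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key and return only the differences."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes


# Hypothetical stock table at two points in time.
previous = {1: {"qty": 5}, 2: {"qty": 3}, 4: {"qty": 9}}
current = {1: {"qty": 5}, 2: {"qty": 4}, 3: {"qty": 1}}

changes = capture_changes(previous, current)
print(changes)
```

Only the three changed rows would then be transformed and loaded; the unchanged row with key 1 is never re-read by the warehouse load.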

Disadvantages of data warehousing

High Cost: Can be a costly undertaking, requiring significant investments in hardware, software, and personnel. This can be a challenge for smaller organizations or organizations with limited budgets.
Complexity: Can be complex, requiring specialized skills and knowledge to implement and maintain. This can make it challenging for organizations to manage their data warehousing systems, leading to errors and reduced data quality.
Data Latency: Often relies on batch processing to update the warehouse, which can result in data latency. This means that the data in the warehouse may not reflect the most current state of the business, reducing its usefulness for real-time decision making.
Data Integration Challenges: Integrating data from multiple sources can be challenging, particularly if the data has different structures or formats. This can result in errors, data quality issues, and a need for extensive data cleaning and transformation.
Maintenance Requirements: Requires ongoing maintenance, including data backup and recovery, performance tuning, and hardware upgrades. This can be time-consuming and costly, and can take resources away from other critical business tasks.
Data Privacy and Security: Can raise privacy and security concerns, as it requires storing and analyzing large amounts of sensitive data. This can result in the risk of data breaches and the unauthorized access of confidential information.

Extract, Transform, Load

an essential part of data warehousing. The process is used to extract data from various sources, transform it into a format that can be used by the data warehouse, and load it into the data warehouse.

Advantages of ETL

Provides a way to extract data from disparate sources and transform it into a consistent format that can be used by the data warehouse. This process helps ensure that the data is accurate, complete, and consistent, which is essential for making informed decisions based on the data.

Methods used by ETL

Removing Duplicates: Can identify and remove duplicate records from the data, ensuring that each record is unique and accurate.
Standardizing Formats: Can standardize the format of data, such as dates or addresses, to ensure consistency across the data.
Data Validation: Can validate data to ensure that it meets certain criteria, such as data type or range, to ensure accuracy and completeness.
Correcting Errors: Can correct errors in the data, such as misspelled names or incorrect values, to ensure accuracy.
Handling Missing Data: Can handle missing data by either filling in missing values with a default value or dropping records with missing data, depending on the requirements of the data warehouse.
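The cleaning methods above can be sketched as a small transform step. The field names, the two accepted date formats, and the 0 to 120 age range are assumptions made up for this example, not part of any particular ETL tool.

```python
from datetime import datetime

# Hypothetical raw rows extracted from source systems.
raw_rows = [
    {"id": "1", "name": " ana ", "joined": "2024-01-05", "age": "34"},
    {"id": "1", "name": " ana ", "joined": "2024-01-05", "age": "34"},  # duplicate
    {"id": "2", "name": "Ben", "joined": "05/02/2024", "age": ""},      # missing age
    {"id": "3", "name": "Cy", "joined": "2024-03-01", "age": "-7"},     # fails range check
]


def parse_date(text):
    """Standardise dates to ISO format, assuming only these two input formats occur."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text}")


def transform(rows, default_age=None):
    seen, clean = set(), []
    for row in rows:
        if row["id"] in seen:  # removing duplicates
            continue
        seen.add(row["id"])
        # Handling missing data: fall back to a default value.
        age = int(row["age"]) if row["age"] else default_age
        # Data validation: drop rows whose age is outside the assumed range.
        if age is not None and not 0 <= age <= 120:
            continue
        clean.append({
            "id": int(row["id"]),
            "name": row["name"].strip().title(),   # tidying inconsistent text
            "joined": parse_date(row["joined"]),   # standardising formats
            "age": age,
        })
    return clean


warehouse_rows = transform(raw_rows)
print(warehouse_rows)
```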

Definition of data mining

the process of discovering patterns and insights from large datasets

Methods for data mining

Cluster Analysis: Cluster analysis is a technique used to group similar items or records together based on their characteristics. It is used to identify natural groupings in the data and can be used for market segmentation or customer profiling. For example, a retailer may use cluster analysis to group customers based on their purchasing behaviour.
Association: Association is a technique used to discover relationships between variables or items in the data. It is used to find co-occurrences of items or events and can be used for market basket analysis or recommendation engines. For example, a retailer may use association analysis to identify which products are commonly purchased together.
Classification: Classification is a technique used to predict the value of a categorical variable based on other variables in the data. It is used for making predictions and can be used for credit risk analysis or fraud detection. For example, a bank may use classification to predict whether a loan applicant is likely to default on a loan.
Sequential Patterns: Sequential patterns are patterns that occur over time, such as a sequence of events or a pattern of behaviour. Sequential pattern analysis is used to find patterns in temporal data and can be used for predicting future events. For example, a healthcare provider may use sequential pattern analysis to predict the likelihood of a patient developing a certain condition based on their medical history.
Forecasting: Forecasting is a technique used to predict future values of a variable based on its past values. It is used for making predictions and can be used for sales forecasting or demand forecasting. For example, a retailer may use forecasting to predict sales for a particular product based on past sales data.
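As one concrete case, the association technique can be sketched by counting how often pairs of products appear in the same basket. The baskets are invented, and real association rule mining also uses measures such as confidence and lift, which are left out here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

pair_counts = Counter()
for basket in baskets:
    # Sorting gives each pair a single, consistent ordering.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of baskets that contain both items of the pair.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}
best_pair, best_count = pair_counts.most_common(1)[0]
print(best_pair, support[best_pair])
```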

Applications for data mining

Fraud detection in banking and targeted marketing in retail

Advantages of data mining

Organisations can identify risks, opportunities, and trends that may not be immediately apparent from the data.

definition of predictive modelling

a technique used to analyse data and make predictions about future events or trends. It involves using statistical algorithms and machine learning techniques to build models that can predict the outcome of future events based on historical data.

A key technique used in predictive modelling

Decision tree induction, which involves constructing a tree-like model of decisions and their possible consequences
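A heavily simplified sketch of the idea: the function below learns a one-level tree (a "stump") by choosing the threshold that misclassifies the fewest historical records. Full decision tree induction repeats this kind of split on each branch and typically scores splits with measures such as information gain; the data and names here are hypothetical.

```python
def induce_stump(samples):
    """Pick the threshold on one numeric feature that misclassifies the fewest samples.

    samples: list of (value, label) pairs with boolean labels.
    The learned rule predicts True when value >= threshold.
    """
    best_threshold, best_errors = None, len(samples) + 1
    for threshold in sorted({value for value, _ in samples}):
        errors = sum((value >= threshold) != label for value, label in samples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


# Hypothetical loan history: (debt-to-income ratio, defaulted?)
history = [(0.1, False), (0.2, False), (0.35, False), (0.5, True), (0.6, True)]
threshold, errors = induce_stump(history)


def predict(value):
    # The single decision in the tree: is the ratio at or above the threshold?
    return value >= threshold


print(threshold, errors, predict(0.55))
```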

Definition of database segmentation

the process of dividing a database into smaller, more manageable segments or subsets.

advantages of data segmentation

Performance: Large databases can be slow to query and update, especially if they contain millions or billions of records. By segmenting the database into smaller subsets, queries and updates can be performed more quickly, improving performance.
Security: Can be used to limit access to sensitive data. For example, an organisation may segment its database so that only authorised personnel have access to certain records or fields.
Maintenance: Smaller databases are easier to maintain and back up than larger databases. By segmenting the database into smaller subsets, organisations can simplify maintenance tasks, such as backup and recovery.
Scalability: As a database grows in size, it may become necessary to segment it into smaller subsets to accommodate growth. This can also help to distribute the load across multiple servers or clusters, improving scalability.
Compliance: Some industries or regulations require data to be segmented or partitioned. For example, the Payment Card Industry Data Security Standard (PCI DSS) requires credit card data to be protected, which is commonly achieved by storing it in a separate, secure database segment.
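Segmentation can be pictured as splitting records by the value of one field, so that a query touches only the relevant subset. The `segment` helper and the customer records are made up for this sketch.

```python
from collections import defaultdict


def segment(records, key):
    """Split records into smaller segments according to the value of one field."""
    segments = defaultdict(list)
    for record in records:
        segments[record[key]].append(record)
    return dict(segments)


# Hypothetical customer records.
customers = [
    {"id": 1, "region": "EU", "name": "Ana"},
    {"id": 2, "region": "US", "name": "Ben"},
    {"id": 3, "region": "EU", "name": "Cy"},
]

by_region = segment(customers, "region")

# A query about EU customers now scans only the EU segment.
eu_names = [c["name"] for c in by_region["EU"]]
print(eu_names)
```

Each segment could also be stored on a different server or given its own access rules, which is where the scalability and security benefits come from.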

definition of link analysis

A technique used to identify and analyse relationships and patterns between individual records

Advantages of link analysis

can provide insights into customer behaviour, social networks, and other patterns that may be relevant to a particular analysis or investigation.

Applications for link analysis

Identifying Key Relationships: Can be used to identify key relationships between records, such as frequent co-occurrences or shared attributes. This information can be used to understand the nature of the relationships between records and to identify key patterns or trends.
Cluster Analysis: Can be used to group records into clusters or subgroups based on their relationships. This can help to identify patterns or trends within the data set and to understand the structure of the data.
Fraud Detection: Can be used to detect patterns of fraud or other suspicious activity within a data set. By analysing the relationships between individual records, this technique can identify anomalies or unusual patterns that may be indicative of fraudulent behavior.
Customer Analysis: Can be used to analyse the behaviour of customers, such as their purchasing patterns or interactions with other customers. By analysing the relationships between individual records, this technique can provide insights into customer behaviour and preferences.
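A small sketch of link analysis based on shared attributes: accounts that share a phone number are linked, which is one pattern a fraud investigator might look for. The account data is invented for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical account records; a shared phone number links accounts together.
accounts = [
    {"id": "A1", "phone": "555-0101"},
    {"id": "A2", "phone": "555-0101"},
    {"id": "A3", "phone": "555-0199"},
    {"id": "A4", "phone": "555-0101"},
]

# Group account ids by the attribute they might share.
by_phone = defaultdict(list)
for account in accounts:
    by_phone[account["phone"]].append(account["id"])

# Build the links: every pair of accounts with the same phone is connected.
links = defaultdict(set)
for ids in by_phone.values():
    for a, b in combinations(ids, 2):
        links[a].add(b)
        links[b].add(a)

linked_accounts = {node: sorted(neighbours) for node, neighbours in links.items()}
print(linked_accounts)
```

Accounts with no shared attribute, such as A3 here, do not appear in the link structure at all.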

definition of Deviation detection

the process of identifying when the values in a database deviate from expected or normal values

Application of deviation detection

Identifying unusual or unexpected data values that may indicate errors, outliers, or other anomalies in the data.
This helps ensure the accuracy and integrity of data and detect issues that may affect the quality of decisions based on it.
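One simple way to detect deviations is to flag values that lie far from the mean, measured in standard deviations. The limit of 2 and the sales figures are assumptions chosen for the sketch; other definitions of "expected" values are equally valid.

```python
from statistics import mean, stdev


def find_deviations(values, limit=2.0):
    """Return values more than `limit` sample standard deviations from the mean."""
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:  # all values identical, so nothing deviates
        return []
    return [v for v in values if abs(v - centre) / spread > limit]


# Hypothetical daily sales figures with one suspicious entry.
daily_sales = [100, 102, 98, 101, 99, 103, 97, 100, 250]
outliers = find_deviations(daily_sales)
print(outliers)
```

A flagged value is only a candidate for review: it may be a data entry error or a genuine but unusual event.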
