Benefits of normalization
improve data consistency and reduce redundancy
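A minimal sketch of what normalisation does. It assumes a hypothetical flat list of orders in which the customer's details repeat on every row; the field names and the `normalise` helper are illustrative only. Splitting the rows into a customers table and an orders table that refers to customers by id means each fact is stored once:

```python
# Denormalised rows: the customer's name and city are repeated on every order.
orders_flat = [
    {"order_id": 1, "customer_id": 10, "name": "Ada", "city": "Leeds", "item": "Desk"},
    {"order_id": 2, "customer_id": 10, "name": "Ada", "city": "Leeds", "item": "Chair"},
    {"order_id": 3, "customer_id": 11, "name": "Bo", "city": "York", "item": "Lamp"},
]

def normalise(rows):
    """Split flat rows into a customers table and an orders table."""
    customers = {}
    orders = []
    for row in rows:
        # Each customer's details are stored once, keyed by customer_id.
        customers[row["customer_id"]] = {"name": row["name"], "city": row["city"]}
        # Orders keep only a reference (foreign key) to the customer.
        orders.append({"order_id": row["order_id"],
                       "customer_id": row["customer_id"],
                       "item": row["item"]})
    return customers, orders

customers, orders = normalise(orders_flat)
# Changing Ada's city is now one update instead of one per order.
customers[10]["city"] = "Manchester"
print(len(customers), len(orders))  # 2 3
```

Because the city now lives in one place, two orders can no longer disagree about it, which is where the consistency benefit comes from.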
Definition of Relational databases
Organises data into one or more tables, where each table represents a collection of related data
Applications for Relational databases
Inventory management, accounting, and e-commerce
Advantages of Relational databases
flexible, scalable, and efficient, and allow for easy data retrieval and manipulation through the use of SQL (Structured Query Language) queries
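To illustrate retrieval and manipulation through SQL, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The inventory schema (`supplier`, `product`) and its data are hypothetical:

```python
import sqlite3

# In-memory database with two related tables (hypothetical inventory schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    stock INTEGER NOT NULL,
    supplier_id INTEGER REFERENCES supplier(id)
);
INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO product VALUES
    (1, 'Bolt', 500, 1), (2, 'Nut', 20, 1), (3, 'Gear', 5, 2);
""")

# Retrieval: join the tables to list low-stock products with their supplier.
low_stock = conn.execute("""
    SELECT p.name, s.name, p.stock
    FROM product AS p JOIN supplier AS s ON p.supplier_id = s.id
    WHERE p.stock < ?
    ORDER BY p.stock
""", (50,)).fetchall()
print(low_stock)  # [('Gear', 'Globex', 5), ('Nut', 'Acme', 20)]

# Manipulation: a single UPDATE statement restocks every matching row.
conn.execute("UPDATE product SET stock = stock + 100 WHERE stock < ?", (50,))
conn.commit()
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to parameterise queries in `sqlite3`.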
definition of an object-oriented database
databases that store data in the form of objects, which can contain both data and behaviour. They support encapsulation, inheritance and polymorphism
Applications for an object-oriented database
datasets with complex relationships between data, such as those found in social networks, medical records, or financial transactions
encapsulation
hiding internal state: the technique of making the state of a class private and providing access to it only through public behaviours (methods)
inheritance
The mechanism by which one class inherits the features (states and behaviours) of another class
polymorphism
A concept that allows a single interface, such as a method name, to work with objects of different classes. This increases the flexibility, reusability and maintainability of code
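The three concepts can be seen together in a short Python sketch. The `Shape`, `Circle` and `Rectangle` classes are hypothetical examples; note that Python only marks state as internal by convention (a leading underscore) rather than enforcing privacy:

```python
import math

class Shape:
    """Base class: defines the shared interface."""
    def __init__(self, name):
        self._name = name  # leading underscore: internal state by convention

    @property
    def name(self):
        # Encapsulation: read access goes through a public property.
        return self._name

    def area(self):
        raise NotImplementedError

class Circle(Shape):  # Inheritance: Circle reuses Shape's name handling
    def __init__(self, radius):
        super().__init__("circle")
        self._radius = radius

    def area(self):
        return math.pi * self._radius ** 2

class Rectangle(Shape):  # Inheritance again, with a different area()
    def __init__(self, width, height):
        super().__init__("rectangle")
        self._width, self._height = width, height

    def area(self):
        return self._width * self._height

# Polymorphism: the same call, shape.area(), works on objects of different classes.
shapes = [Circle(1.0), Rectangle(2, 3)]
for shape in shapes:
    print(shape.name, round(shape.area(), 2))
# circle 3.14
# rectangle 6
```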
definition of a network database model
Organises data into a graph-like structure, where data is represented as nodes (also called records) and the relationships between them are represented as links
Applications of a network database model
complex data structures that have many-to-many relationships between entities, such as social networks, genealogy, or supply chain management.
Advantages of a network database model
efficiently store and retrieve complex relationships between data, and allow for more flexible and powerful analysis of the data.
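A toy in-memory sketch of the idea: records are nodes, and named links connect them in both directions, so a many-to-many relationship (suppliers and parts in a supply chain) can be navigated from either side. `NetworkStore`, its methods and the link names are hypothetical; real network databases define these relationships in their schema rather than in application code:

```python
from collections import defaultdict

class NetworkStore:
    """Toy network-model store: records are nodes, named links connect them."""

    def __init__(self):
        self.records = {}              # record id -> record data
        self.links = defaultdict(set)  # (record id, link name) -> linked record ids

    def add_record(self, rid, **data):
        self.records[rid] = data

    def link(self, owner, name, member, inverse):
        # Store each relationship in both directions so it can be navigated either way.
        self.links[(owner, name)].add(member)
        self.links[(member, inverse)].add(owner)

    def navigate(self, rid, name):
        return sorted(self.links.get((rid, name), set()))

db = NetworkStore()
for rid in ("S1", "S2"):
    db.add_record(rid, kind="supplier")
for rid in ("P1", "P2"):
    db.add_record(rid, kind="part")

# Many-to-many: S1 supplies two parts, and part P2 has two suppliers.
db.link("S1", "supplies", "P1", inverse="supplied_by")
db.link("S1", "supplies", "P2", inverse="supplied_by")
db.link("S2", "supplies", "P2", inverse="supplied_by")

print(db.navigate("S1", "supplies"))     # ['P1', 'P2']
print(db.navigate("P2", "supplied_by"))  # ['S1', 'S2']
```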
Definition of a spatial database model
a specialized type of database model designed to store and manage spatial data, that is, data that has a location or geographic component
Applications of a spatial database model
geographic information systems (GIS), environmental management, transportation planning, and urban planning
Advantages of a spatial database model
organisations can store and analyse spatial data more efficiently and accurately, and make better-informed decisions based on the geographic relationships between different features
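A typical spatial query is "find every feature within a given distance of a point". The sketch below does this with a brute-force scan and the haversine great-circle formula; the station names and coordinates are hypothetical. Real spatial databases typically use spatial indexes (such as R-trees) so they do not have to compare against every feature:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical feature table: name -> (latitude, longitude)
stations = {
    "Central": (51.5074, -0.1278),
    "North":   (51.5560, -0.1790),
    "Airport": (51.4700, -0.4543),
}

def within_radius(features, lat, lon, radius_km):
    """Return names of features within radius_km of (lat, lon), nearest first."""
    hits = [(haversine_km(lat, lon, *pos), name) for name, pos in features.items()]
    return [name for dist, name in sorted(hits) if dist <= radius_km]

print(within_radius(stations, 51.5074, -0.1278, 10))  # ['Central', 'North']
```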
Definition of a multi-dimensional database model
A database model designed to handle data that has multiple dimensions, such as time, geography and product type, where each dimension represents a different aspect of the data
Applications of a multi-dimensional database model
business intelligence, financial analysis, and forecasting
Advantages of a multi-dimensional database model
quickly and easily analyse the data from different perspectives, and identify trends and insights that might not be apparent from a traditional table-based view.
It supports aggregation of data across multiple dimensions, allowing users to analyse data at different levels of detail
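Aggregation across dimensions (often called a roll-up) can be sketched in a few lines. Each hypothetical sales row has three dimensions (year, region, product) and one measure (revenue); `roll_up` is an illustrative helper that sums the measure over whichever dimensions are kept:

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure per sale.
sales = [
    {"year": 2023, "region": "North", "product": "Tea",    "revenue": 100},
    {"year": 2023, "region": "South", "product": "Tea",    "revenue": 80},
    {"year": 2023, "region": "North", "product": "Coffee", "revenue": 120},
    {"year": 2024, "region": "North", "product": "Tea",    "revenue": 130},
    {"year": 2024, "region": "South", "product": "Coffee", "revenue": 90},
]

def roll_up(rows, dims, measure="revenue"):
    """Sum the measure over the chosen dimensions; all other dimensions are aggregated away."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)

print(roll_up(sales, ["year"]))
# {(2023,): 300, (2024,): 220}
print(roll_up(sales, ["region", "product"]))
# {('North', 'Tea'): 230, ('South', 'Tea'): 80, ('North', 'Coffee'): 120, ('South', 'Coffee'): 90}
```

Choosing fewer dimensions gives a coarser summary; choosing more drills down into finer detail.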
importance of choosing the right database model
Applications could suffer from slow performance, data inconsistencies, and difficulty adding new features
Advantages of an object-oriented database model
Flexibility: more flexible than relational databases (RDBs) because they can store complex, hierarchical objects that have both properties and methods.
Performance: can perform better than RDBs for complex queries over large volumes of highly interconnected data, because related objects can be reached directly through references rather than reassembled with joins.
Reduced Mapping Overhead: They eliminate the need for mapping objects to relational structures, which can simplify development and reduce the amount of code needed to handle data.
Disadvantages of object-oriented databases
Complexity: They are more complex than RDBs because they require knowledge of both object-oriented programming and database design.
Limited Tool Support: There are fewer tools and frameworks available for working with them than RDBs, which can make development and maintenance more difficult.
Scalability: they may have scalability issues because they are less widely used than RDBs, have received less optimisation, and may not be able to handle as many simultaneous users.
Advantages of relational database models
Familiarity: They are widely used and well-understood, making them easier to use and maintain for many developers.
Tool Support: There is a wide range of tools and frameworks available for working with relational databases, which can make development and maintenance easier.
Scalability: they can be highly scalable because they are widely used and have been optimised for performance and scalability.
Disadvantages of relational database models
Limited Flexibility: They are limited in their ability to handle complex and hierarchical data structures.
Performance: they can have performance issues when dealing with complex queries (for example, queries that join many tables) and large volumes of data.
Mapping Overhead: they require mapping objects to relational structures, which can increase development time and introduce complexity.
Requirements of a data warehouse
1: Subject-oriented - Organised into subject-specific areas, such as sales or customers.
2: Integrated - Data from different sources follows common conventions for naming, formats and units.
3: Time-variant - Contains historical data and a means to query data by date.
4: Non-volatile - Data is not changed or deleted once stored in the data warehouse; new data is added alongside it.
5: Summarised - The data should be summarised, ready for easy analysis.
Definition of data warehousing
the process of collecting, storing, and managing large amounts of data from multiple sources for the purpose of creating a centralised repository for decision-making and business intelligence
Applications of a data warehouse
situations where large amounts of data need to be analysed for business intelligence and decision-making purposes
Some examples include:
Business reporting: support business reporting by providing a centralised repository for data from multiple sources.
Data integration: integrate data from disparate sources into a single, unified view of the data.
Data mining: Enable organisations to perform data mining, or the process of uncovering hidden patterns and relationships in large amounts of data.
Customer behaviour analysis: analyse customer purchasing patterns, preferences and behaviour.
Financial analysis: analyse large amounts of financial data to identify trends and make informed decisions.
Supply chain management: analyse supply chain data to identify areas for improvement and make decisions that increase efficiency and reduce costs.
Marketing analysis: Analyse customer data to understand market trends and target customers more effectively.
definition of time-dependent
Data is only valid and accurate at some point in time or over a time interval
Why a data warehouse is time-dependent
because it stores historical data as well as current data
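Because old rows are kept rather than overwritten, a warehouse can answer "what was true on date X?". A minimal sketch, assuming a hypothetical price-history table where each row is valid from its `valid_from` date until the next row for the same product starts:

```python
from datetime import date

# Hypothetical price history: new prices are appended, old ones are never overwritten.
price_history = [
    {"product": "Tea", "price": 2.50, "valid_from": date(2023, 1, 1)},
    {"product": "Tea", "price": 2.75, "valid_from": date(2023, 7, 1)},
    {"product": "Tea", "price": 3.00, "valid_from": date(2024, 1, 1)},
]

def price_as_of(history, product, when):
    """Return the price that was valid for `product` on date `when`."""
    valid = [r for r in history if r["product"] == product and r["valid_from"] <= when]
    if not valid:
        return None  # no record existed yet on that date
    # The most recent row that started on or before `when` is the one in force.
    return max(valid, key=lambda r: r["valid_from"])["price"]

print(price_as_of(price_history, "Tea", date(2023, 3, 15)))  # 2.5
print(price_as_of(price_history, "Tea", date(2024, 6, 1)))   # 3.0
```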
Advantages of data warehousing
Allows for trend analysis and historical reporting
can help organisations make informed decisions based on past performance and predict future trends
Applications of time-dependency
Identify trends in sales, such as which products are selling well and which stores are performing best. This information can be used to make informed decisions about inventory management, marketing, and store operations.
can provide a unified and consistent view of the data over time, even as the underlying data sources change.
Methods of real-time processing
Stream processing: This involves capturing data in real-time as it is generated, transforming it on the fly, and loading it directly into the warehouse. This method is suitable for high-velocity data streams, such as log data, social media feeds, and IoT sensor data.
Change data capture (CDC): This involves monitoring source systems for changes and capturing only the changes as they occur, rather than reading the entire source data set. The changes are then transformed and loaded into the warehouse.
Batch processing with near real-time updates: This involves processing data in batch mode and updating the warehouse periodically, but with a minimal delay. This method is suitable for large data sets that can be processed in batch mode, such as sales data or financial data.
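A simplified sketch of change data capture: compare two snapshots of a source table and emit only insert, update and delete events. Production CDC tools often read the source database's transaction log instead of comparing snapshots; the snapshot comparison here is a stand-in chosen for clarity, and the table contents and `change_log` are hypothetical:

```python
def capture_changes(previous, current):
    """Compare two snapshots (primary key -> row) and return only the changes."""
    events = []
    for key, row in current.items():
        if key not in previous:
            events.append(("insert", key, row))
        elif previous[key] != row:
            events.append(("update", key, row))
    for key in previous:
        if key not in current:
            events.append(("delete", key, None))
    return events

# Hypothetical customer table at two points in time.
before = {1: {"name": "Ada", "city": "Leeds"}, 2: {"name": "Bo", "city": "York"}}
after = {1: {"name": "Ada", "city": "Hull"}, 3: {"name": "Cy", "city": "Bath"}}

change_log = []  # the warehouse appends changes rather than overwriting history
change_log.extend(capture_changes(before, after))
for event in change_log:
    print(event)
# ('update', 1, {'name': 'Ada', 'city': 'Hull'})
# ('insert', 3, {'name': 'Cy', 'city': 'Bath'})
# ('delete', 2, None)
```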
Disadvantages of data warehousing
High Cost: Can be a costly undertaking, requiring significant investments in hardware, software, and personnel. This can be a challenge for smaller organizations or organizations with limited budgets.
Complexity: Can be complex, requiring specialized skills and knowledge to implement and maintain. This can make it challenging for organizations to manage their data warehousing systems, leading to errors and reduced data quality.
Data Latency: Often relies on batch processing to update the warehouse, which can result in data latency. This means that the data in the warehouse may not reflect the most current state of the business, reducing its usefulness for real-time decision making.
Data Integration Challenges: Integrating data from multiple sources can be challenging, particularly if the data has different structures or formats. This can result in errors, data quality issues, and a need for extensive data cleaning and transformation.
Maintenance Requirements: Requires ongoing maintenance, including data backup and recovery, performance tuning, and hardware upgrades. This can be time-consuming and costly, and can take resources away from other critical business tasks.
Data Privacy and Security: Can raise privacy and security concerns, as it requires storing and analyzing large amounts of sensitive data. This can result in the risk of data breaches and the unauthorized access of confidential information.
Extract, Transform, Load
an essential part of data warehousing. The process is used to extract data from various sources, transform it into a format that can be used by the data warehouse, and load it into the data warehouse.
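The three ETL stages can be sketched end to end. The two source systems, their field names and the `dim_customer` table are hypothetical; the load step uses an in-memory `sqlite3` database as a stand-in for the warehouse:

```python
import sqlite3

# Extract: two hypothetical sources with different field names and date formats.
crm_rows = [{"CustName": "Ada", "Joined": "2023-01-05"}]   # YYYY-MM-DD
shop_rows = [{"customer": "Bo", "signup": "05/02/2023"}]   # DD/MM/YYYY

def transform(crm, shop):
    """Map both sources onto one schema (name, joined) with ISO 8601 dates."""
    out = [(r["CustName"], r["Joined"]) for r in crm]
    for r in shop:
        day, month, year = r["signup"].split("/")
        out.append((r["customer"], f"{year}-{month}-{day}"))
    return out

def load(rows, conn):
    """Insert the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS dim_customer (name TEXT, joined TEXT)")
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(crm_rows, shop_rows), warehouse)
loaded = warehouse.execute("SELECT * FROM dim_customer ORDER BY name").fetchall()
print(loaded)  # [('Ada', '2023-01-05'), ('Bo', '2023-02-05')]
```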
Advantages of ETL
Data warehouses draw on many disparate sources, and ETL provides a way to extract data from them and transform it into a consistent format that the data warehouse can use. This process helps ensure that the data is accurate, complete, and consistent, which is essential for making informed decisions based on the data.
Methods used by ETL
Removing Duplicates: Can identify and remove duplicate records from the data, ensuring that each record is unique and accurate.
Standardizing Formats: Can standardize the format of data, such as dates or addresses, to ensure consistency across the data.
Data Validation: Can validate data to ensure that it meets certain criteria, such as data type or range, to ensure accuracy and completeness.
Correcting Errors: Can correct errors in the data, such as misspelled names or incorrect values, to ensure accuracy.
Handling Missing Data: Can handle missing data by either filling in missing values with a default value or dropping records with missing data, depending on the requirements of the data warehouse.
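The cleaning methods listed above can be combined into one pass over the data. This is a sketch under stated assumptions: the raw rows are hypothetical, duplicates are detected by `id` alone (real pipelines may compare several fields), and `DATE_FORMATS` and `CORRECTIONS` are made-up lookups standing in for whatever rules a real warehouse would use:

```python
from datetime import datetime

# Hypothetical raw extract with typical quality problems.
raw = [
    {"id": 1, "joined": "2023-01-05", "country": "U.K.", "age": 36},
    {"id": 1, "joined": "2023-01-05", "country": "U.K.", "age": 36},  # duplicate
    {"id": 2, "joined": "05/02/2023", "country": None, "age": 29},    # other date format, missing country
    {"id": 3, "joined": "2023-03-01", "country": "FR", "age": -4},    # invalid age
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed formats used by the sources
CORRECTIONS = {"U.K.": "UK"}             # assumed lookup of known bad values

def standardise_date(text):
    """Convert any recognised date format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unrecognised format

def clean(rows, default_country="Unknown"):
    seen, result = set(), []
    for row in rows:
        if row["id"] in seen:                       # removing duplicates
            continue
        seen.add(row["id"])
        row = dict(row, joined=standardise_date(row["joined"]))  # standardising formats
        if not 0 <= row["age"] <= 130:              # data validation (range check)
            continue
        if row["country"] is None:                  # handling missing data
            row["country"] = default_country
        row["country"] = CORRECTIONS.get(row["country"], row["country"])  # correcting errors
        result.append(row)
    return result

cleaned = clean(raw)
print([(r["id"], r["joined"], r["country"]) for r in cleaned])
# [(1, '2023-01-05', 'UK'), (2, '2023-02-05', 'Unknown')]
```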
Definition of data mining
the process of discovering patterns and insights from large datasets
Methods for data mining
Cluster Analysis: Cluster analysis is a technique used to group similar items or records together based on their characteristics. It is used to identify natural groupings in the data and can be used for market segmentation or customer profiling. For example, a retailer may use cluster analysis to group customers based on their purchasing behaviour.
Association: Association is a technique used to discover relationships between variables or items in the data. It is used to find co-occurrences of items or events and can be used for market basket analysis or recommendation engines. For example, a retailer may use association analysis to identify which products are commonly purchased together.
Classification: Classification is a technique used to predict the value of a categorical variable based on other variables in the data. It is used for making predictions and can be used for credit risk analysis or fraud detection. For example, a bank may use classification to predict whether a loan applicant is likely to default on a loan.
Sequential Patterns: Sequential patterns are patterns that occur over time, such as a sequence of events or a pattern of behaviour. Sequential pattern analysis is used to find patterns in temporal data and can be used to predict future events. For example, a healthcare provider may use sequential pattern analysis to predict the likelihood of a patient developing a certain condition based on their medical history.
Forecasting: Forecasting is a technique used to predict future values of a variable based on its past values. It is used for making predictions and can be used for sales forecasting or demand forecasting. For example, a retailer may use forecasting to predict sales for a particular product based on past sales data.
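As one concrete example from the list, association analysis for market baskets can be sketched by counting how often pairs of items appear together (support) and how often one item is bought given another (confidence). The baskets are hypothetical, and this brute-force pair count is only the core idea; algorithms such as Apriori scale it to larger item sets:

```python
from itertools import combinations
from collections import Counter

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

def pair_support(transactions):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent in basket | antecedent in basket)."""
    with_a = [b for b in transactions if antecedent in b]
    if not with_a:
        return 0.0
    return sum(consequent in b for b in with_a) / len(with_a)

support = pair_support(baskets)
print(support[("bread", "butter")])            # 0.5
print(confidence(baskets, "bread", "butter"))  # 0.6666666666666666
```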
Applications for data mining
Fraud detection in banking and targeted marketing in retail
Advantages of data mining
Organisations can identify risks, opportunities, and trends that may not be immediately apparent from the data.
definition of predictive modelling
a technique used to analyse data and make predictions about future events or trends. It involves using statistical algorithms and machine learning techniques to build models that can predict the outcome of future events based on historical data.
A key technique used in predictive modeling
decision tree induction. Decision tree induction involves constructing a tree-like model of decisions and their possible consequences
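The key step in decision tree induction is choosing which attribute to split on. The sketch below uses information gain (the criterion used by the ID3 algorithm) on a hypothetical set of loan applications; a full induction algorithm would repeat this step recursively on each branch:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting rows on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical loan applications.
loans = [
    {"income": "high", "employed": "yes", "default": "no"},
    {"income": "high", "employed": "no",  "default": "no"},
    {"income": "low",  "employed": "yes", "default": "yes"},
    {"income": "low",  "employed": "no",  "default": "yes"},
]

# Induction picks the attribute with the highest gain as the next split.
best = max(["income", "employed"], key=lambda a: information_gain(loans, a, "default"))
print(best)  # income
```

Here income separates defaulters perfectly (gain of 1 bit), while employment status tells us nothing (gain of 0), so the tree would split on income first.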
Definition of database segmentation
the process of dividing a database into smaller, more manageable segments or subsets.
advantages of data segmentation
Performance: Large databases can be slow to query and update, especially if they contain millions or billions of records. By segmenting the database into smaller subsets, queries and updates can be performed more quickly, improving performance.
Security: Can be used to limit access to sensitive data. For example, an organisation may segment its database so that only authorised personnel have access to certain records or fields.
Maintenance: Smaller databases are easier to maintain and back up than larger databases. By segmenting the database into smaller subsets, organisations can simplify maintenance tasks, such as backup and recovery.
Scalability: As a database grows in size, it may become necessary to segment it into smaller subsets to accommodate growth. This can also help to distribute the load across multiple servers or clusters, improving scalability.
Compliance: Some industries or regulations require data to be segmented or partitioned. For example, organisations that handle credit card data commonly isolate it in a separate, secured segment to meet the Payment Card Industry Data Security Standard (PCI DSS).
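Three common ways to segment data can be sketched on a hypothetical orders table: by range of a key (for example, by year), by hash of a key (to spread rows evenly across servers), and by columns (moving sensitive fields into their own table). The helper names and the masked card values are illustrative only, and the column split is just a simple way to picture isolating card data, not a statement of what PCI DSS requires:

```python
import hashlib

# Hypothetical orders table (card values are placeholders, not real numbers).
orders = [
    {"id": 1, "year": 2021, "card": "XXXX-0001", "total": 40},
    {"id": 2, "year": 2022, "card": "XXXX-0002", "total": 15},
    {"id": 3, "year": 2023, "card": "XXXX-0003", "total": 72},
    {"id": 4, "year": 2024, "card": "XXXX-0004", "total": 9},
]

def partition_by_range(rows, key, boundaries):
    """Range (horizontal) segmentation; boundaries must be sorted ascending."""
    segments = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        # The number of boundaries at or below the key is the segment index.
        idx = sum(row[key] >= b for b in boundaries)
        segments[idx].append(row)
    return segments

def partition_by_hash(rows, key, n_segments):
    """Hash segmentation: spreads rows across n segments, e.g. one per server."""
    segments = [[] for _ in range(n_segments)]
    for row in rows:
        # hashlib gives the same result in every run, unlike Python's built-in hash() for str.
        digest = hashlib.sha256(str(row[key]).encode()).digest()
        segments[int.from_bytes(digest[:4], "big") % n_segments].append(row)
    return segments

def split_columns(rows, sensitive):
    """Vertical segmentation: move sensitive columns into a separate table keyed by id."""
    public = [{k: v for k, v in r.items() if k not in sensitive} for r in rows]
    secure = [{"id": r["id"], **{k: r[k] for k in sensitive}} for r in rows]
    return public, secure

by_year = partition_by_range(orders, "year", [2022, 2024])
print([[r["id"] for r in seg] for seg in by_year])  # [[1], [2, 3], [4]]
public, secure = split_columns(orders, {"card"})
print(public[0])  # {'id': 1, 'year': 2021, 'total': 40}
```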
definition of link analysis
A technique used to identify and analyse relationships and patterns between individual records
Advantages of link analysis
can provide insights into customer behaviour, social networks, and other patterns that may be relevant to a particular analysis or investigation.
Applications for link analysis
Identifying Key Relationships: Can be used to identify key relationships between records, such as frequent co-occurrences or shared attributes. This information can be used to understand the nature of the relationships between records and to identify key patterns or trends.
Cluster Analysis: Can be used to group records into clusters or subgroups based on their relationships. This can help to identify patterns or trends within the data set and to understand the structure of the data.
Fraud Detection: Can be used to detect patterns of fraud or other suspicious activity within a data set. By analysing the relationships between individual records, this technique can identify anomalies or unusual patterns that may be indicative of fraudulent behavior.
Customer Analysis: Can be used to analyse the behaviour of customers, such as their purchasing patterns or interactions with other customers. By analysing the relationships between individual records, this technique can provide insights into customer behaviour and preferences.
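A small sketch of link analysis for fraud detection: records are linked whenever they share a value (here a phone number or address), and groups of directly or indirectly linked records are found with a depth-first search. The account data and helper names are hypothetical:

```python
from collections import defaultdict

# Hypothetical account records.
accounts = [
    {"id": "A1", "phone": "555-0100", "address": "1 High St"},
    {"id": "A2", "phone": "555-0100", "address": "9 Low Rd"},
    {"id": "A3", "phone": "555-0199", "address": "9 Low Rd"},
    {"id": "A4", "phone": "555-0142", "address": "7 Mill Ln"},
]

def build_links(records, attributes):
    """Link two records whenever they share a value for any of the attributes."""
    by_value = defaultdict(list)
    for r in records:
        for attr in attributes:
            by_value[(attr, r[attr])].append(r["id"])
    graph = defaultdict(set)
    for ids in by_value.values():
        for a in ids:
            for b in ids:
                if a != b:
                    graph[a].add(b)
    return graph

def connected_groups(record_ids, graph):
    """Group records that are linked directly or indirectly (depth-first search)."""
    seen, groups = set(), []
    for start in record_ids:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups

graph = build_links(accounts, ["phone", "address"])
print(connected_groups([a["id"] for a in accounts], graph))
# [['A1', 'A2', 'A3'], ['A4']]
```

A1 and A3 share nothing directly, but both connect through A2, which is the kind of indirect relationship that link analysis is meant to surface.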
definition of deviation detection
the process of identifying when the values in a database deviate from expected or normal values
Application of deviation detection
identify unusual or unexpected data values that may indicate errors, outliers, or other anomalies in the data
useful for ensuring the accuracy and integrity of data and for detecting issues that may affect the quality of decisions made based on the data
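One simple way to define "expected or normal values" is statistically: flag any value more than a chosen number of standard deviations from the mean (a z-score test). The sales figures and the threshold of 2 are hypothetical choices; a single large outlier also inflates the standard deviation, so median-based methods are sometimes preferred in practice:

```python
import statistics

def deviations(values, threshold=2.0):
    """Return (index, value, z-score) for values more than `threshold` std devs from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing deviates
    return [(i, v, round((v - mean) / stdev, 2))
            for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]

# Hypothetical daily sales with one suspicious spike.
daily_sales = [102, 98, 101, 99, 100, 97, 103, 180]
print(deviations(daily_sales))  # flags index 7 (value 180)
```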