A comprehensive set of question-and-answer flashcards covering key concepts from the lecture on databases, big data, business intelligence, and knowledge management.
What are the four main problems caused by the traditional file environment?
Data redundancy, data inconsistency, processing (and reporting) inflexibility, and wasted storage resources.
How does data redundancy differ from data inconsistency?
Data redundancy is the presence of duplicate data in multiple files, while data inconsistency means the same attribute has different values in different files.
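Here is a minimal sketch of the distinction, assuming two hypothetical department "files" modeled as Python dicts. Storing the address in both files is redundancy. The two copies disagreeing is inconsistency.

```python
# Two hypothetical "files" kept by different departments; both store customer addresses.
sales_file = {"C001": "12 Oak St", "C002": "5 Elm Rd"}
billing_file = {"C001": "12 Oak St", "C002": "9 Pine Ave"}

# Redundancy: the same attribute (address) is stored in more than one file.
redundant_ids = sorted(sales_file.keys() & billing_file.keys())

# Inconsistency: among the redundant copies, the stored values disagree.
inconsistent_ids = [cid for cid in redundant_ids if sales_file[cid] != billing_file[cid]]

print("Redundant:", redundant_ids)        # Redundant: ['C001', 'C002']
print("Inconsistent:", inconsistent_ids)  # Inconsistent: ['C002']
```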
What is program–data dependence in a traditional file environment?
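The tight coupling between data stored in files and the specific programs that update and maintain them, so that changes to a program require changes to the data it accesses (and vice versa). This makes maintenance difficult.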
Give two weaknesses of the traditional file environment related to data control and access.
Poor security (little control over or knowledge of who accesses the data) and lack of data sharing and availability.
What is a Database Management System (DBMS)?
Software that centralizes data, manages them, and provides controlled access, separating logical from physical views and reducing redundancy.
Name three key benefits a DBMS provides over the traditional file environment.
It reduces data redundancy, minimizes data inconsistency, and uncouples programs from data.
In a relational database table, what is a tuple?
A row that represents a single record or instance of an entity.
What is the purpose of a primary key?
To uniquely identify each record (row) in a table.
Define foreign key.
A primary key from one table that appears in another table to create a relationship and enable look-ups.
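The primary-key and foreign-key cards can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in DBMS. The table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; tables and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (
    supplier_number INTEGER PRIMARY KEY,  -- primary key: uniquely identifies each supplier row
    supplier_name   TEXT NOT NULL
);
CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,  -- primary key of the part table
    part_name       TEXT NOT NULL,
    supplier_number INTEGER REFERENCES supplier(supplier_number)  -- foreign key
);
INSERT INTO supplier VALUES (8259, 'Acme Metals'), (8261, 'Beta Molds');
INSERT INTO part VALUES (137, 'Door latch', 8259), (152, 'Door lock', 8261);
""")

# The foreign key lets us look up each part's supplier in the other table.
rows = conn.execute("""
    SELECT part.part_name, supplier.supplier_name
    FROM part JOIN supplier ON part.supplier_number = supplier.supplier_number
    ORDER BY part.part_number
""").fetchall()
print(rows)  # [('Door latch', 'Acme Metals'), ('Door lock', 'Beta Molds')]
```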
State two characteristics of non-relational (NoSQL) databases.
Use a flexible data model and can handle large volumes of unstructured and structured data across distributed machines.
Give one example of a NoSQL product mentioned in the notes.
Oracle NoSQL Database.
What cloud database services are listed in the lecture notes?
Amazon RDS and Microsoft SQL Azure.
What does the data definition capability of a DBMS do?
Specifies the structure of database content and is used to create tables and define field characteristics.
What is a data dictionary?
A repository (automated or manual) that stores definitions of data elements and their characteristics.
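Data definition and the data dictionary can be shown together in a minimal sketch, again assuming SQLite through Python's sqlite3 module. The employee table is hypothetical. SQLite's `PRAGMA table_info` reads back the stored column definitions, which is only loosely analogous to an automated data dictionary. A full enterprise data dictionary typically records more, such as ownership, usage, and security.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create a table and specify each field's characteristics
# (name, data type, whether a value is required, default value, primary key).
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        last_name   TEXT NOT NULL,
        hire_date   TEXT,
        salary      REAL DEFAULT 0
    )
""")

# SQLite records these definitions in its own catalog; PRAGMA table_info reads them
# back as (cid, name, type, notnull, default value, pk) rows.
columns = conn.execute("PRAGMA table_info(employee)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    flags = [label for flag, label in ((notnull, "NOT NULL"), (pk, "PRIMARY KEY")) if flag]
    print(f"{name}: {col_type} {' '.join(flags)}".rstrip())
```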
Which DBMS language is used to add, change, delete, or retrieve data?
Data Manipulation Language (DML), e.g., SQL.
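The four DML operations named on this card (add, change, delete, retrieve) map onto SQL's INSERT, UPDATE, DELETE, and SELECT. This minimal sketch uses an in-memory SQLite database and a hypothetical product table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT, price REAL)")

# Add: INSERT new rows (values are bound through ? placeholders).
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [("A1", "Stapler", 8.50), ("B2", "Tape", 2.25), ("C3", "Scissors", 6.00)])
# Change: UPDATE an existing row.
conn.execute("UPDATE product SET price = ? WHERE sku = ?", (9.00, "A1"))
# Delete: DELETE a row.
conn.execute("DELETE FROM product WHERE sku = ?", ("B2",))
# Retrieve: SELECT the remaining rows.
result = conn.execute("SELECT sku, price FROM product ORDER BY sku").fetchall()
print(result)  # [('A1', 9.0), ('C3', 6.0)]
```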
Differentiate between conceptual (logical) and physical database design.
Conceptual design creates an abstract model from the business perspective; physical design determines how the database is arranged on storage devices for efficiency.
What is normalization?
The process of creating small, stable, yet flexible and adaptive data structures from complex groups of data. It minimizes redundancy and keeps relationships between tables consistent.
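Here is a minimal sketch of one normalization step, using hypothetical part/supplier records. Supplier name and city depend on the supplier ID rather than on the part number (a transitive dependency). Moving them into their own table, a step toward third normal form, means each supplier is stored once. This illustrates the idea only. It is not a general normalization algorithm.

```python
# Hypothetical unnormalized records: supplier details repeat on every part row.
flat_rows = [
    # (part_no, part_name, supplier_id, supplier_name, supplier_city)
    (137, "Door latch", 8259, "Acme Metals", "Dayton"),
    (145, "Side mirror", 8444, "Beta Molds", "Cleveland"),
    (150, "Door molding", 8263, "Gamma Parts", "Toledo"),
    (152, "Door lock", 8259, "Acme Metals", "Dayton"),
]

# Supplier table: one entry per supplier_id.
suppliers = {}
for _, _, sid, sname, city in flat_rows:
    # setdefault keeps the first copy; a mismatch would reveal an existing inconsistency.
    if suppliers.setdefault(sid, (sname, city)) != (sname, city):
        raise ValueError(f"Inconsistent data for supplier {sid}")

# Part table: only part attributes plus supplier_id as a foreign key.
parts = [(part_no, part_name, sid) for part_no, part_name, sid, _, _ in flat_rows]

print(len(suppliers), "suppliers;", len(parts), "parts")  # 3 suppliers; 4 parts
```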
Why are referential integrity rules important?
They ensure that relationships between tables remain consistent (e.g., no orphan foreign keys).
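A minimal sketch of a referential integrity rule being enforced, assuming SQLite. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection. The tables and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled (outside a transaction).
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE part (
    part_no INTEGER PRIMARY KEY,
    supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id))""")
conn.execute("INSERT INTO supplier VALUES (1, 'Acme Metals')")
conn.execute("INSERT INTO part VALUES (100, 1)")  # allowed: supplier 1 exists

try:
    conn.execute("INSERT INTO part VALUES (101, 99)")  # supplier 99 does not exist
    orphan_rejected = False
except sqlite3.IntegrityError as exc:
    orphan_rejected = True  # the orphan foreign key is refused
    print("Rejected:", exc)

print(orphan_rejected)  # True
```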
What diagram is commonly used by designers to document data models?
Entity-Relationship Diagram (ERD).
Complete the sequence of data sizes: kilobyte, megabyte, gigabyte, ___, petabyte, exabyte.
Terabyte.
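The sequence follows a fixed multiplier, which a short computation can show. In decimal (SI) usage, each unit is 1,000 times the previous one. Binary conventions use 1,024 instead. The notes do not say which convention they assume.

```python
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte", "exabyte"]

# Decimal (SI) sizes grow by 1,000 per step; binary sizes grow by 1,024 per step.
decimal_bytes = {unit: 1000 ** p for p, unit in enumerate(UNITS, start=1)}
binary_bytes = {unit: 1024 ** p for p, unit in enumerate(UNITS, start=1)}

for power, unit in enumerate(UNITS, start=1):
    print(f"{unit}: 10^{3 * power} bytes (decimal), 2^{10 * power} bytes (binary)")
```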
Define big data.
Massive sets of unstructured or semi-structured data whose volume, velocity, and variety are too great for typical DBMS tools to manage.
List three technologies for managing or analyzing big data.
Hadoop, in-memory computing, and analytic platforms (others acceptable: NoSQL, MapReduce, etc.).
What is the primary purpose of a data warehouse?
To store current and historical data from multiple operational systems in a consolidated, standardized form for analysis and reporting.
How does a data mart differ from a data warehouse?
A data mart is a smaller, focused subset of a data warehouse, usually dedicated to a specific business line or user group.
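The warehouse and mart cards can be sketched together, assuming SQLite and two hypothetical operational extracts that use different conventions (cents vs. dollars, upper- vs. lowercase codes). The load step standardizes them into one warehouse table. A view then acts as a data mart for one business line. In practice a data mart is often a separately stored database. The view is a simplification.

```python
import sqlite3

# Hypothetical extracts from two operational systems with different conventions.
web_sales = [("2024-01-05", "ELECTRONICS", 129900), ("2024-01-06", "APPAREL", 4550)]    # cents
store_sales = [("2024-01-05", "electronics", 899.00), ("2023-12-30", "apparel", 25.00)]  # dollars

conn = sqlite3.connect(":memory:")
# Warehouse table: one consolidated, standardized structure for current and historical data.
conn.execute("""CREATE TABLE sales_fact (
    sale_date TEXT, product_line TEXT, amount_usd REAL, source TEXT)""")

# Load with standardization: convert units and normalize codes before inserting.
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, 'web')",
                 [(d, line.lower(), cents / 100) for d, line, cents in web_sales])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, 'store')",
                 [(d, line.lower(), dollars) for d, line, dollars in store_sales])

# Data mart: a smaller subset focused on one business line (modeled here as a view).
conn.execute("""CREATE VIEW electronics_mart AS
    SELECT sale_date, amount_usd, source FROM sales_fact
    WHERE product_line = 'electronics'""")

total = conn.execute("SELECT SUM(amount_usd) FROM electronics_mart").fetchone()[0]
print(total)  # 2198.0
```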
What is Hadoop designed to do?
Enable distributed, parallel processing of big data across inexpensive computers.
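Hadoop's processing follows the MapReduce model. This is a single-process sketch of that model in plain Python. It is not Hadoop's API. In Hadoop, the data chunks would sit on different machines and the map tasks would run in parallel. The input text is hypothetical.

```python
from collections import defaultdict

# Hypothetical input split into chunks, as if stored on different inexpensive machines.
chunks = [
    "big data needs parallel processing",
    "hadoop splits big data across machines",
]

def map_phase(chunk):
    """Map: emit (word, 1) for each word; each chunk could be processed independently."""
    return [(word, 1) for word in chunk.split()]

def shuffle(mapped_lists):
    """Shuffle: group every emitted value by its key."""
    groups = defaultdict(list)
    for pairs in mapped_lists:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into a final count."""
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map(map_phase, chunks)))
print(counts["big"], counts["data"], counts["hadoop"])  # 2 2 1
```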
Explain in-memory computing.
A technique that stores data in a computer's main memory (RAM) to avoid disk delays, requiring optimized hardware.
What analytical technique supports multidimensional data analysis?
Online Analytical Processing (OLAP).
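Multidimensional analysis can be illustrated with a small hypothetical sales "cube" with three dimensions (product, region, quarter). The sketch aggregates along different dimensions and slices on one quarter. Real OLAP tools precompute and index such views at scale. This only shows the idea.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter, units sold).
sales = [
    ("nuts", "east", "Q1", 50), ("nuts", "west", "Q1", 30),
    ("bolts", "east", "Q1", 20), ("nuts", "east", "Q2", 40),
    ("bolts", "west", "Q2", 60),
]

def aggregate(rows, dims):
    """Sum units over the chosen dimensions (0=product, 1=region, 2=quarter)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[3]
    return dict(totals)

by_product_region = aggregate(sales, dims=(0, 1))                     # product x region view
q1_slice = aggregate([r for r in sales if r[2] == "Q1"], dims=(1,))   # slice: Q1 only, by region
by_product = aggregate(sales, dims=(0,))                              # roll up to product totals
print(by_product)  # {('nuts',): 120, ('bolts',): 80}
```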
What is data mining?
The process of finding hidden patterns and relationships in large datasets.
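One common kind of hidden pattern is an association (items that tend to occur together). This minimal sketch counts the support of item pairs in hypothetical shopping baskets. That is the first counting step of association-rule mining, not a complete algorithm such as Apriori. The 0.5 threshold is an arbitrary assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (market baskets).
baskets = [
    {"chips", "soda", "salsa"},
    {"chips", "salsa"},
    {"soda", "bread"},
    {"chips", "salsa", "bread"},
]

# Count how many baskets contain each pair of items.
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

# Support = share of baskets containing the pair; keep pairs at or above the threshold.
min_support = 0.5
frequent = {pair: n / len(baskets) for pair, n in pair_counts.items()
            if n / len(baskets) >= min_support}
print(frequent)  # {('chips', 'salsa'): 0.75}
```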
Name three sub-categories of web mining.
Web content mining, web structure mining, and web usage mining.
Mention two advantages of using the web for database access.
Ease of browser use and minimal changes required to existing databases.
What is an information policy?
A firm's formal rules, procedures, and roles for sharing, managing, and standardizing data.
Differentiate between data administration and database administration.
Data administration sets policies and procedures to manage data; database administration focuses on creating and maintaining the physical database.
What is the purpose of a data quality audit?
To assess the accuracy and completeness of data in an information system.
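A data quality audit can be partly automated. This minimal sketch measures completeness (the share of filled-in values per field) and uses an accuracy proxy (the share of filled values passing a validation rule). The records and regex rules are hypothetical. Checking true accuracy usually requires comparison against a trusted source or the people who own the data.

```python
import re

# Hypothetical customer records pulled from an information system.
records = [
    {"id": 1, "email": "ana@example.com", "zip": "30301"},
    {"id": 2, "email": "",                "zip": "3030"},
    {"id": 3, "email": "li@example",      "zip": None},
    {"id": 4, "email": "bo@example.org",  "zip": "10001"},
]
fields = ["email", "zip"]

# Completeness: share of records where each field is filled in.
completeness = {f: sum(1 for r in records if r[f]) / len(records) for f in fields}

# Accuracy proxy: share of filled-in values that pass an assumed validation rule.
rules = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[a-z]+"),
    "zip": re.compile(r"\d{5}"),
}
validity = {}
for f in fields:
    filled = [r[f] for r in records if r[f]]
    validity[f] = sum(1 for v in filled if rules[f].fullmatch(v)) / len(filled)

print(completeness)  # {'email': 0.75, 'zip': 0.75}
print(validity)
```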
Define data cleansing (data scrubbing).
Software-based detection and correction of incorrect, incomplete, improperly formatted, or redundant data.
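This minimal sketch of data cleansing runs over hypothetical records. It corrects improper formatting (names, phone numbers), removes a redundant duplicate, and flags incomplete records instead of inventing missing values. Real cleansing tools apply many more rules, and the rules here are illustrative assumptions.

```python
# Hypothetical raw records with formatting problems and a duplicate.
raw = [
    {"name": "  ana SILVA ", "phone": "(404) 555-0101"},
    {"name": "Ana Silva",    "phone": "404.555.0101"},
    {"name": "Li Wei",       "phone": "404-555-0199"},
    {"name": "Bo Chen",      "phone": ""},
]

def clean(record):
    """Standardize formatting: collapse spaces, title-case names, keep only phone digits."""
    name = " ".join(record["name"].split()).title()
    digits = "".join(ch for ch in record["phone"] if ch.isdigit())
    return {"name": name, "phone": digits or None}  # missing phone becomes None

cleaned, seen = [], set()
for rec in map(clean, raw):
    key = (rec["name"], rec["phone"])
    if key not in seen:  # drop redundant (duplicate) records
        seen.add(key)
        cleaned.append(rec)

incomplete = [r["name"] for r in cleaned if r["phone"] is None]
print(len(raw), "->", len(cleaned), "records; incomplete:", incomplete)
```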
According to the DIKW hierarchy, what comes directly above information?
Knowledge.
State the four value-adding steps in the knowledge management value chain.
Acquire, store, disseminate, and apply knowledge.
What are the three broad types of Knowledge Management Systems?
Enterprise-wide knowledge management systems, knowledge work systems, and intelligent techniques.
Give two examples of intelligent techniques used to discover or distill knowledge.
Data mining, neural networks, expert systems, fuzzy logic, genetic algorithms (any two).
Why is getting the data model right critical for a business system?
Because an incorrect data model prevents the system from adequately serving business needs.
Complete the quote: "Knowledge comes from learning. ___ comes from living."
Wisdom.
According to Gerald Cohen, what must data do beyond being turned into information?
Impact how the business operates and responds to the changing marketplace.