1/239
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
|---|
No study sessions yet.
What is the key difference between data and information?
Data are raw facts; information is processed data given context and meaning, suitable for decision making.
What is a database?
A shared, integrated computer structure that stores end-user data and metadata to manage, retrieve, and secure information efficiently.
What is metadata?
Data about data that describes data characteristics, structure, and relationships used to integrate and manage end-user data.
What is a DBMS?
A collection of programs that manages the database structure, controls access, enforces rules, and serves as intermediary between users and data.
Name two main roles of a DBMS.
Provides an integrated view of data and enables secure, efficient sharing, improving access, security, and decision-making.
What does "ubiquitous" mean about data today?
That data are abundant, global, and available everywhere.
What does "pervasive" mean about data today?
That data are prevalent, persistent, and hard to avoid.
List the primary components of a database system.
Hardware; software; people; procedures; data.
What is the importance of database design?
Good design facilitates management and accurate reporting; poor design leads to hidden errors and poor decisions.
What is a single-user database?
Supports one user at a time; often implemented as a desktop database on a personal computer.
What is a multiuser database?
Supports multiple concurrent users, from small workgroups to enterprise-wide deployments.
Define centralized, distributed, and cloud databases.
Centralized: data at a single site; Distributed: data across multiple sites; Cloud: provided and managed by cloud services with defined SLAs.
What is an operational database?
Designed to support daily business operations and transaction processing.
What is an analytical database?
Stores historical data and metrics optimized for tactical or strategic decision making.
What is a data warehouse?
A storage system formatted and optimized for decision support and large-scale analytics.
What is OLAP?
Online Analytical Processing: tools and methods for retrieving, processing, and modeling data from data warehouses.
How are databases classified by structure of data?
Structured: formatted for processing; Semistructured: partially processed; Unstructured: raw original state.
What is XML's role in semistructured data?
XML represents data elements in a textual, tagged format to describe structure and content.
Give the basic file terminology: Data, Field, Record, File.
Data: raw facts; Field: a character/group with meaning; Record: connected fields describing an entity; File: collection of related records.
List common problems with file system data processing.
Lengthy development; slow answers; complex admin; poor security; limited sharing; extensive programming.
What is structural dependence vs. independence?
Structural dependence: programs rely on file structure; independence: file structure changes without breaking application access.
What is data dependence vs. data independence?
Data dependence: access changes when storage changes; independence: storage changes do not affect program access.
What is data redundancy and why is it bad?
Unnecessary duplicate storage across locations causing inconsistency, security issues, entry errors, and integrity problems.
Name the three data anomalies caused by redundancy.
Update anomalies; insertion anomalies; deletion anomalies.
How does a DBMS address file system weaknesses?
By centralizing data structures, relationships, access paths, reducing redundancy, anomalies, and dependence while managing access and security.
List key DBMS functions for reliability and integrity.
Multiuser access control, backup/recovery management, data integrity management, performance tuning, and security management.
What is the role of SQL in DBMS?
SQL is the de facto query and data access language allowing users to specify what to do without programming how to do it.
Give three disadvantages of database systems.
Higher costs; management complexity; vendor dependence and frequent upgrades.
What are common database career roles?
Examples: Database Developer, Designer, Administrator, Analyst, Architect, Consultant, Security Officer, Cloud Architect, Data Scientist.
Summary: core takeaways from Chapter 1.
Data are raw facts stored in databases; DBMSs centralize management and reduce redundancy; types of databases vary by users, location, usage, and structure; good design matters.
What is data modeling?
Creating a specific data model for a defined problem domain to represent real-world data structures simply and usefully.
What is a data model?
A simplified representation of complex real-world data structures used to support a specific problem domain.
Why are data models important?
They facilitate communication, provide multiple views of the database, organize data for users, and support good database design.
Define entity in data modeling.
An object (person, place, thing, or event) about which data will be collected and stored.
What is an attribute?
A characteristic or property of an entity.
What is a relationship in a data model?
An association among entities describing how instances of entities relate to each other.
Give the three common relationship types.
One-to-many (1:M), many-to-many (M:N), one-to-one (1:1).
What is a constraint?
A restriction placed on data to ensure integrity and correctness.
What are business rules?
Brief, precise, unambiguous policies or principles that establish entities, relationships, and constraints in a data model.
Name common sources for discovering business rules.
Company managers; policy makers; department managers; documentation; interviews with end users.
How do nouns and verbs in requirements translate into models?
Nouns → entities; verbs → relationships between entities.
What question determines relationship cardinality?
Ask how many instances of B relate to one instance of A and vice versa.
Why are naming conventions important?
Descriptive, familiar names for entities and attributes facilitate communication and self-documentation.
What characterizes the hierarchical model?
Upside-down tree of segments representing 1:M relationships; used for complex manufacturing and large datasets.
What is a key limitation of hierarchical models?
Navigational access; structural changes require changes in all application programs.
What characterizes the network model?
Allows records to have multiple parents; models complex relationships; introduced schema/subschema, DDL, and DML.
What is a relational model?
A model based on relations (tables) composed of tuples (rows) and attributes (columns) for precise data manipulation.
What is an RDBMS?
Relational DBMS that implements relational model functions, hides relational complexity, and manages tables and queries.
How do tables relate in a relational database?
Each table is independent; rows in different tables relate through common attribute values.
What role does the SQL engine play?
Executes queries and processes data requests in relational systems.
What is an ERD?
Entity Relationship Diagram; a graphical representation of entities and relationships in a database.
What is an entity instance?
A single occurrence (row) of an entity in a relational table.
What is the object-oriented data model?
A model where data and relationships are contained in objects that include operations and attributes.
Define class and inheritance in OODM.
Class: collection of similar objects with shared structure/behavior; Inheritance: subclasses inherit attributes and methods from parent classes.
What is UML used for?
Unified Modeling Language: diagrams and symbols to graphically model systems and object relationships.
What is the extended relational data model (ERDM)?
Relational model extended with OO features, extensible data types, and inheritance support.
What problems drive Big Data / NoSQL adoption?
Volume, velocity, and variety of web/sensor data; conventional structures and OLAP struggle with unstructured large-scale data.
What technologies are associated with Big Data?
Hadoop, HDFS, MapReduce, and NoSQL databases.
What are NoSQL databases optimized for?
Distributed architectures, high scalability, availability, fault tolerance, and handling large sparse/unstructured data.
Name trade-offs of many NoSQL systems.
Optimized for performance over strict transaction consistency; often eventual consistency.
List the four levels of data abstraction.
External (end-user views); Conceptual (global view); Internal (specific DB model); Physical (storage/access methods).
What is the external model?
Representation of end users' views; external schema shows specific representations of those views.
What is the conceptual model?
A global, software- and hardware-independent view used for logical design and identifying main data objects.
What is the internal model?
Mapping of the conceptual model to the DBMS constructs; provides logical independence from physical details.
What is the physical model?
Lowest level describing how data are stored on media and access methods; hardware- and software-dependent.
How do data-modeling requirements vary?
They depend on different data views (global vs local) and the chosen level of data abstraction.
Summary: core takeaways from Chapter 2.
Data models abstract real-world structures; entities, attributes, relationships, and business rules drive design; multiple model types exist to meet different needs; abstraction levels separate user views from storage.
What is the logical view of data in the relational model?
Represents data and relationships using relations (tables) to yield logical simplicity and effective database design.
List three advantages of using tables in the relational model.
Tables provide a two-dimensional row/column view; attribute-named columns; independent rows that represent entity occurrences.
Name four necessary characteristics of a relational table.
Rows are tuples; columns are attributes with domains; intersections are single values; table must have a unique identifier (key).
What is a key in a relational table?
One or more attributes that uniquely identify a row and establish relationships among tables to ensure integrity.
Define functional dependency.
When the value of one attribute (or a set) determines the value of another attribute.
What is a determinant and dependent attribute?
Determinant: attribute whose value determines another; Dependent: attribute whose value is determined by the determinant.
What is full functional dependence?
The entire set of attributes in the determinant is required to determine the dependent attribute.
Define composite key and candidate key.
Composite key: key composed of multiple attributes; Candidate key: a minimal superkey with no redundant attributes.
What is a superkey and a primary key?
Superkey: attribute(s) that uniquely identify rows; Primary key: chosen candidate key that must be unique and non-null.
What is a foreign key and referential integrity?
Foreign key: primary key from one table placed in another; Referential integrity requires non-null foreign keys to reference existing primary keys.
What is a secondary key?
A key used only for data retrieval (non-unique indexing purpose).
What is entity integrity?
Requirement that primary key values are unique and none of its attributes are null to guarantee each row's identity.
Give two ways to handle nulls in a table.
Use flags or special codes to indicate missing data; enforce NOT NULL or UNIQUE constraints where appropriate.
What is relational algebra?
Theoretical set of operators for manipulating relations, producing new relations and obeying closure.
What does the SELECT (restrict) operator do?
Produces a horizontal subset of rows satisfying a condition.
What does the PROJECT operator do?
Produces a vertical subset by selecting specific columns (attributes).
What is UNION and when are tables union-compatible?
UNION combines rows from two tables removing duplicates; union-compatible means same column count and compatible domains.
What does INTERSECT return?
Rows that appear in both union-compatible tables.
What does DIFFERENCE (minus) return?
Rows present in the first table but not in the second (requires union-compatibility).
What is the CARTESIAN PRODUCT operator?
Combines every row of one table with every row of another, yielding all possible row pairs.
Explain the DIVIDE operator in relational algebra.
Divides a double-column table by a single-column table, returning values associated with every row in the divisor.
What is a natural join and how does it differ from an equijoin?
Natural join links tables on common attribute names and values; equijoin links using explicit equality comparisons on specified columns.
Define inner join and outer join (left/right).
Inner join returns only matched rows; outer join retains unmatched rows from one side with nulls for missing matches; left/right indicate which table's unmatched rows are kept.
What is a theta join?
A join based on an inequality comparison operator (any condition other than equality).
What does an index provide in a relational database?
An orderly reference structure that speeds row lookup; index keys point to data locations; can be unique or non-unique.
What is a unique index?
An index whose key maps to exactly one data pointer, enforcing uniqueness of the indexed values.
What is the data dictionary and system catalog?
Data dictionary: user/designer descriptions of tables; System catalog: DBMS-managed metadata describing all database objects.
Why avoid homonyms and synonyms in a database design?
Homonyms (same name different attributes) and synonyms (different names same attribute) create ambiguity and confusion.
How is an M:N relationship implemented in a relational model?
By creating a composite (associative) entity that holds primary keys from the related tables, converting M:N into two 1:M relationships.
When might data redundancy be intentionally preserved?
To support crucial reporting needs or to preserve historical accuracy despite general goals to minimize redundancy.
List three common relationship types in relational databases.
One-to-many (1:M), one-to-one (1:1), many-to-many (M:N via associative entity).
Name three integrity constraints enforced by RDBMS.
Entity integrity, referential integrity, and column constraints (NOT NULL, UNIQUE).
What are two practical purposes of Codd's relational rules?
Ensure data is represented in tables and guarantee systematic access, treatment of nulls, metadata management, and integrity stored in catalogs.