Data management foundations

0.0(0)

Studied by 4 people

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Card Sorting

1/138

There's no tags or description

Looks like no tags are added yet.

Study Analytics

Name	Mastery	Learn	Test	Matching	Spaced

No study sessions yet.

139 Terms

New cards

Data

is numeric, textual, visual, or audio information that describes real-world systems

New cards

Scope

The amount of data produced and collected can vary. Ex: A small business might track an inventory of a few thousand items, but a large commerce website might track billions of items.

New cards

Format

Data may be produced as numbers, text, image, audio, or video. Ex: A phone's proximity sensor generates raw numbers, and a satellite captures images

New cards

Access

Some data sources are private while others are made publicly available. Ex: A retail company may use private customer data to discover purchasing behavior patterns, but a government may be required by law to share certain data sets.

New cards

What are the 3 ways in which data can vary?

Scope, access, format

New cards

How was data represented previously vs now?

Data was represented in an analog fashion while now data is mostly digital.

New cards

Visualization

Data relationships that are often represented in a graphical format such as images in animation

New cards

Database (database management systems or DBMS)

is software that reads and writes data in a database. Database systems ensure data is secure, internally consistent, and available at all times. These functions are challenging for large databases with many users, so database systems are complex.

New cards

Query

is a request to retrieve or change data in a database

New cards

Query language

is a specialized programming language, designed specifically for database systems. Query languages read and write data efficiently, and differ significantly from general-purpose languages such as Python, Java, and C++.

New cards

Data application

is software that helps business users interact with database systems. Many databases are complex, and most users are not familiar with query languages. Consequently, direct database access is usually not feasible. Instead, programmers write applications to simplify the user experience and ensure data access is efficient and secure.

New cards

Information management system

is a software application that manages corporate data for a specific business function. An information management system usually includes a database system as well as other components, such as a user interface, business logic, and interfaces to other systems.

New cards

Database administrator

is responsible for securing the database system against unauthorized users. A database administrator enforces procedures for user access and database system availability.

New cards

Database designer

determines the format of each data element and the overall database structure. Database designers must balance several priorities, including storage, response time, and support for rules that govern the data. Since these priorities often conflict, database design is technically challenging.

New cards

Database programmer

develops computer programs that utilize a database. Database programmers write applications that combine database query languages and general-purpose programming languages. Query languages and general-purpose languages have significant differences, so database programming is a specialized challenge.

New cards

Database user

is a consumer of data in a database. Database users request, update, or use stored data to generate reports or information. Database users usually access the database via applications but can also submit queries directly to the database system.

New cards

Performance

When many users and applications simultaneously access large databases, query response time degrades rapidly. Database systems maintain fast response times by structuring data properly on storage media and processing queries efficiently.

New cards

Authorization

Many database users should have limited access to specific tables, columns, or rows of a database. Database systems authorize individual users to access specific data.

New cards

Security

Database systems ensure authorized users only access permissible data. Database systems also protect against hackers by encrypting data and restricting access.

New cards

Rules

Database systems ensure data is consistent with structural and business rules. Ex: When multiple copies of data are stored in different locations, copies must be synchronized as data is updated. Ex: When a course number appears in a student registration record, the course must exist in the course catalog.

New cards

Recovery

Computers, database systems, and individual transactions occasionally fail. Database systems must recover from failures and restore the database to a consistent state without loss of data

New cards

Transaction

transaction is a group of queries that must be either completed or rejected as a whole. Execution of some, but not all, queries results in inconsistent or incorrect data. Ex: A debit-credit transaction transfers funds from one bank account to another. The first query removes $100 from one account and the second query deposits $100 in another account. If the first query succeeds but the second fails, $100 is mysteriously lost. The transaction must process either both queries or neither query.

New cards

What must the data system do when processing transactions?

Ensure transactions are processed completely or not at all, Prevent conflicts between concurrent transactions, Ensure transaction results are never lost

New cards

Architecture

of a database system describes the internal components and the relationships between components

New cards

At a high level, the components of most database systems are similar in what aspects?

Query processor which performs query optimization, storage manager, indexes, transaction manager, log, catalog(database management systems dictionary)

New cards

What does the query processor do?

interprets queries, creates a plan to modify the database or retrieve data, and returns query results to the application. The query processor performs query optimization to ensure the most efficient instructions are executed on the data.

New cards

Storage manager

translates the query processor instructions into low-level file-system commands that modify or retrieve data. Database sizes range from megabytes to many terabytes, so the storage manager uses indexes to quickly locate data.

New cards

Transaction manager

ensures transactions are properly executed. The transaction manager prevents conflicts between concurrent transactions. The transaction manager also restores the database to a consistent state in the event of a transaction or system failure.

New cards

Log

is a file containing a complete record of all inserts, updates, and deletes processed by the database. The transaction manager writes log records before applying changes to the database. In the event of a failure, the transaction manager uses log records to restore the database

New cards

Catalog (data dictionary)

is a directory of tables, columns, indexes, and other database objects. Other components use catalog information to process and execute queries.

New cards

Metadata

is data about the database, such as column names and the number of rows in each table. Database systems store metadata in the catalog and use metadata to process queries

New cards

Relational database

stores data in tables, columns, and rows, similar to a spreadsheet. All data in a column has the same format. All data in a row represents a single object, such as a person, place, product, or activity

New cards

SQL

stands for Structured Query Language and includes statements that read and write data, create and delete tables, and administer the database system.

New cards

Big data

The growth of the internet in the 1990s generated massive volumes of online data, often with poorly structured or missing information

New cards

NoSQL

The newer non-relational systems, for 'not only SQL', and are optimized for big data.

New cards

Open source

software is software that anyone can inspect, copy, and modify with no licensing fee.

New cards

Query

is a command for a database that typically inserts new data, retrieves data, updates data, or deletes data from a database.

New cards

Query language

is a computer programming language for writing database queries.

New cards

CRUD

Create, Read, Update, and Delete data.

New cards

SQL statement

INSERT inserts rows into a table.
SELECT retrieves data from a table.
UPDATE modifies data in a table.
DELETE deletes rows from a table

New cards

SQL crate table

statement creates a new table by specifying the table and column names. Each column is assigned a data type that indicates the format of column values. Data types can be numeric, textual, or complex

New cards

Database design

is a specification of database objects such as tables, columns, data types, and indexes. Database design also refers to the process used to develop the specification.

New cards

For a large complex databases, the process has 3 phases:

analysis, logical design, physical design

New cards

Analysis

phase specifies database requirements without regard to a specific database system. Requirements are represented as entities, relationships, and attributes. An entity is a person, place, activity, or thing. A relationship is a link between entities, and an attribute is a descriptive property of an entity.

New cards

Logical design

phase implements database requirements in a specific database system. For relational database systems, logical design converts entities, relationships, and attributes into tables, keys, and columns. A key is a column used to identify individual rows of a table. Tables, keys, and columns are specified in SQL with CREATE TABLE statements.

New cards

Physical design

New cards

Data independence

Physical design affects query processing speed but never affects the query result. The principle that physical design never affects query results

New cards

Information independence

is occasionally used instead of data independence. These terms are synonymous.

New cards

Application programming interface (API)

is a library of procedures or classes that links a host programming language to a database. The host language calls library procedures, which handle details such as connecting to the database, executing queries, and returning results. Ex: JDBC is a library of Java classes that access relational databases

New cards

MySQL

is a leading relational database system sponsored by Oracle. MySQL is relatively easy to install and use, yet has many advanced capabilities. MySQL runs on all major operating systems, including Linux, Unix, Mac OS, and Windows. For these reasons, MySQL is one of the most popular database systems

New cards

MySQL Community (MYSQL Server)

is a free edition. MySQL Server includes a complete set of database services and tools, and is suitable for non-commercial applications such as education.

New cards

MySQL Enterprise

is a paid edition for managing commercial databases. MySQL Enterprise includes MySQL Server and additional administrative applications

New cards

MySQL Command-Line Client

is a text interface included in the MySQL Server download. The Command-Line Client allows developers to connect to the database server, perform administrative functions, and execute SQL statements

New cards

How do you run the Command Line Client

a user must first open a Command Prompt on Windows or a Terminal on a Mac:

Windows: Click the Start button in the Taskbar, type "cmd", then click Command Prompt.
Mac: Click on the Terminal application, usually found in the Applications > Utilities folder.

New cards

When does MySQL server return an error code?

when an SQL statement is syntactically incorrect or the database cannot execute the statement.

New cards

MySQL Workbench

is installed with MySQL Server and allows developers to execute SQL commands using an editor. When MySQL Workbench is started, the user can connect to MySQL Server running on the local machine or on the network

New cards

What does the Local Instance MySQL80 do?

connects to MySQL Server running on the same computer as MySQL Workbench.

New cards

Administration Tab

shows various administrative options, like checking the server's status, importing/exporting data, and starting/stopping the MySQL server.

New cards

Schemas

tab shows a list of available databases. A database can be expanded to show the database's tables

New cards

Null

is a special value that represents either unknown or inapplicable data. NULL is not the same as zero for numeric data types or blanks for character data types. Ex: A zero bonus indicates an employee can, but has not, earned a bonus. A zero bonus is known and applicable, and should not be represented as NULL.

New cards

Not Null

constraint prevents a column from having a NULL value. Statements that insert NULL, or update a value to NULL, are automatically rejected. NOT NULL follows the column name and data type in a CREATE TABLE statement

New cards

Truth tables

The value of logical expressions containing NULL operands is defined

New cards

Null logic

TRUE AND TRUE is TRUE.
TRUE AND FALSE is FALSE.
TRUE AND NULL is NULL

New cards

Insert

statement adds rows to a table

New cards

What are the two insert clauses?

The INSERT INTO clause names the table and columns where data is to be added. The keyword INTO is optional.
The VALUES clause specifies the column values to be added.

New cards

Default values

The optional DEFAULT keyword and default value follow the column name and data type in a CREATE TABLE statement. The column is assigned the default value, rather than NULL, when omitted from an INSERT statement.

New cards

Update statement

UPDATE statement modifies existing rows in a table. The UPDATE statement uses the SET clause to specify the new column values. An optional WHERE clause specifies which rows are updated. Omitting the WHERE clause results in all rows being updated

New cards

Delete statement

The DELETE statement deletes existing rows in a table. The FROM keyword is followed by the table name whose rows are to be deleted. An optional WHERE clause specifies which rows should be deleted. Omitting the WHERE clause results in all rows in the table being deleted.

New cards

Truncate

The TRUNCATE statement deletes all rows from a table. TRUNCATE is nearly identical to a DELETE statement with no WHERE clause except for minor differences that depend on the database system.

New cards

Merge

statement selects data from one table, called the source, and inserts the data to another table, called the target. If the primary keys of source and target rows match, the target values are updated to the corresponding source values. MySQL does not support the MERGE statement

New cards

Primary key

A primary key is a column, or group of columns, used to identify a row. The primary key is usually the table's first column and appears on the left of table diagrams, but the position is not significant to the database. In this material, a solid circle (●) precedes the primary key in table diagrams. Ex: ID is the primary key of the Employee table below

New cards

Primary keys must include what

Unique. This rule ensures that each value identifies at most one row.
Not NULL. This rule ensures that each value identifies at least one row

New cards

Simple primary key

consists of a single column

New cards

Composite primary key

consists of multiple columns. Composite primary keys are denoted with parentheses. Ex: (ColumnA, ColumnB).

New cards

Composite primary keys must include

Unique. Values of primary key columns, when grouped together, must be unique. No group of values may repeat in multiple rows.
Not NULL. No column of a composite primary key may contain a NULL value.
Minimal. All primary key columns are necessary for uniqueness. When any column is removed, the resulting simple or composite column is no longer unique

New cards

Primary key constraint

in a CREATE TABLE statement names the table's primary key. The PRIMARY KEY constraint ensures that a column or group of columns is always unique and non-null.

New cards

Auto increment column

is a numeric column that is assigned an automatically incrementing value when a new row is inserted

New cards

Auto_increment

keyword defines an auto-increment column. AUTO_INCREMENT follows the column's data type in a CREATE TABLE statement.

New cards

Foreign key

is a column, or group of columns, that refer to a primary key. The data types of the foreign and primary keys must be the same, but the names may be different. In this material, an empty circle (○) precedes foreign keys in table diagrams, and an arrow leads to the referenced primary key

New cards

Referential integrity

requires foreign key values must either be NULL or match some value of the referenced primary key.

New cards

Foreign key constraint

A foreign key constraint is added to a CREATE TABLE statement with the FOREIGN KEYand REFERENCES keywords. When a foreign key constraint is specified, the database rejects insert, update, and delete statements that violate referential integrity.

New cards

Fully null

A fully NULL foreign key is a simple or composite foreign key in which all columns are NULL.\\

New cards

How is referential integrity violated?

A primary key is updated.
A foreign key is updated.
A row containing a primary key is deleted.
A row containing a foreign key is inserted

New cards

When referential integrity is violated, which SQL constraints automatically fix it?

RESTRICT rejects an insert, update, or delete that violates referential integrity.
SET NULL sets invalid foreign keys to NULL.
SET DEFAULT sets invalid foreign keys to the foreign key default value.
CASCADE propagates primary key changes to foreign keys.

New cards

ON Update and on delete

ON UPDATE and ON DELETE determine what happens to the foreign key when the referenced primary key is updated or deleted. When several foreign keys refer to the same primary key, different actions can be specified for each foreign key

New cards

What follows the ON UPDATE & ON DELETE?

ON UPDATE and ON DELETE are followed by either RESTRICT, SET NULL, SET DEFAULT, or CASCADE

New cards

Constraint

is a rule that governs allowable values in a database. Constraints are based on relational and business rules, and implemented with special keywords in a CREATE TABLE statement. The database automatically rejects insert, update, and delete statements that violate a constraint

New cards

Column constraint

appears after the column name and data type in a CREATE TABLE statement. Column constraints govern values in a single column. Ex: NOT NULL is a column constraint.

New cards

Table constraint

appears in a separate clause of a CREATE TABLE statement and governs values in one or more columns. Ex: FOREIGN KEY is a table constraint

New cards

Unique constraint

ensures that values in a column, or group of columns, are unique. When applied to a single column, UNIQUE may appear either in the column declaration or a separate clause. When applied to a group of columns, UNIQUE is a table constraint and must appear in a separate clause

New cards

Check

constraint specifies an expression on one or more columns of a table. The constraint is violated when the expression is FALSE and satisfied when the expression is either TRUE or NULL.

New cards

Constraint

Table constraints may be named using the optional CONSTRAINT keyword, followed by the constraint name and declaration. If no name is provided, the database generates a default name. Constraint names appear in error messages when constraints are violated.

New cards

Equijoin

compares columns of two tables with the = operator. Most joins are equijoins

New cards

Non-equijoin

compares columns with an operator other than =, such as < and >.

New cards

Self-join

joins a table to itself. A self-join can compare any columns of a table, as long as the columns have comparable data types. If a foreign key and the referenced primary key are in the same table, a self-join commonly compares those key columns. In a self-join, aliases are necessary to distinguish left and right tables.

New cards

Cross-join

combines two tables without comparing columns. A cross-join uses a CROSS JOIN clause without an ON clause. As a result, all possible combinations of rows from both tables appear in the result.

New cards

Subquery (nested query, inner query)

is a query within another SQL query. The subquery is typically used in a SELECT statement's WHERE clause to return data to the outer query and restrict the selected results. The subquery is placed inside parentheses ().

New cards

Alias

An alias can also help differentiate the columns. An alias is a temporary name assigned to a column or table.

New cards

Correlated

when the subquery's WHERE clause references a column from the outer query. In a correlated subquery, the rows selected depend on what row is currently being examined by the outer query.

100

New cards

Flattening

Replacing a subquery with an equivalent join.