Flat Files
A file that has no internal hierarchy.
Hashed Files
A file in which each record's storage location is computed by applying a hash function to the record's key, which allows direct access to the record.
Heap File
An unsorted set of records.
Information
Raw data that has been transformed into useful facts.
Punch Card
A card that is perforated and can hold commands or data.
Structured Data
Information with a high degree of organization.
Unstructured Data
Information that has no predefined structure, such as free text.
Binary Relationship
A relationship between two entity types.
Unary Relationship
An association between occurrences of an entity type and other occurrences of the same entity type.
Cardinality
The maximum number of entities that can be involved in a particular relationship.
E-R Model
*E-R = Entity - Relationship
A diagram of entities, together with their attributes and the relationships among them.
Intersection Data
It is data that describes a many-to-many relationship.
Modality
It is the minimum number of entity occurrences that can be involved in a relationship.
One-to-one Binary Relationship
It means that a single occurrence of one entity type can be associated with a single occurrence of the other entity type and vice versa.
Ternary Relationship
Involves three different entity types.
Unique identifier
It is used to uniquely identify each record in a database table.
Attribute
A property, characteristic, or fact that we know about an entity.
"A salesperson works in one office."
What is the name of this relationship?
One-to-one binary relationship
"A salesperson sells to many customers."
What is the name of this relationship?
One-to-many binary relationship
"A salesperson is authorized to sell many products, and a product can be sold by many salespersons."
What is the name of this relationship?
Many-to-many binary relationship
What are the positioning and meaning of Cardinality and Modality on an E-R model?
Cardinality is the outer symbol; represents the maximum.
Modality is the inner symbol; represents the minimum.
"A salesperson works in a minimum of one and a maximum of one office, and an office may be occupied by or assigned to a minimum of zero and a maximum of one salesperson."
"A salesperson may have no customers or many customers."
Describe the E-R model for "Each salesperson is authorized to sell at least one and possibly many products, and each product can be sold by at least one and possibly many salespersons."
"One salesperson backs up another salesperson."
What is the name of this model?
One-to-one unary relationship
"A salesperson manages zero to many other salespersons, and a salesperson is managed by exactly one other salesperson."
What is the name of this model?
One-to-many unary relationship
"A product can either be part of no other products or be part of several other products, and a product can either be composed of no other products or be composed of several other products."
What is the name of this model?
Many-to-many unary relationship
What does 'refer' in Referential Integrity imply?
It means referring to data in one relation of the database based on values in another relation, such as a foreign key value that is expected to match an existing primary key value.
Define the delete rule RESTRICT.
If the delete rule between two relations is RESTRICT and an attempt is made to delete a record on the "one side" of the one-to-many relationship, the system will not allow the delete if there are any matching foreign key values in the relation on the "many side".
Define the delete rule CASCADE.
If the delete rule between two relations is CASCADE and an attempt is made to delete a record on the "one side" of the relationship, not only will the record be deleted but all of the records on the "many side" of the relationship that have a matching foreign key value will also be deleted.
In other words, the delete will "cascade" from one relation to the other.
Define the delete rule SET-TO-NULL.
If the delete rule between the two relations is SET-TO-NULL and an attempt is made to delete a record on the "one side" of the one-to-many relationship, that record will be deleted and the matching foreign key values in the records on the "many side" of the relationship will be set to null.
It resembles the CASCADE delete rule, except that the matching records on the "many side" are kept and only their foreign key values are set to NULL.
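The three delete rules can be tried side by side. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module; the helper run_delete, the two-column tables, and the sample values are made up for illustration. SQLite spells the third rule SET NULL rather than SET-TO-NULL, and it enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

def run_delete(rule):
    """Delete salesperson 137 under the given ON DELETE rule and report the outcome."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    # "One side" of the relationship.
    conn.execute("CREATE TABLE SALESPERSON (SPNUM INTEGER PRIMARY KEY)")
    # "Many side": CUSTOMER.SPNUM is the foreign key.
    conn.execute(
        "CREATE TABLE CUSTOMER ("
        " CUSTNUM INTEGER PRIMARY KEY,"
        " SPNUM INTEGER REFERENCES SALESPERSON(SPNUM) ON DELETE " + rule + ")"
    )
    conn.execute("INSERT INTO SALESPERSON VALUES (137)")
    conn.execute("INSERT INTO CUSTOMER VALUES (1525, 137)")
    try:
        conn.execute("DELETE FROM SALESPERSON WHERE SPNUM = 137")
        blocked = False
    except sqlite3.IntegrityError:
        blocked = True  # the system refused the delete
    salespersons = conn.execute("SELECT COUNT(*) FROM SALESPERSON").fetchone()[0]
    customers = conn.execute("SELECT CUSTNUM, SPNUM FROM CUSTOMER").fetchall()
    conn.close()
    return blocked, salespersons, customers

restrict_result = run_delete("RESTRICT")   # delete refused, both rows remain
cascade_result = run_delete("CASCADE")     # both rows deleted
set_null_result = run_delete("SET NULL")   # customer kept, its SPNUM becomes NULL
```

Each result is a tuple of (was the delete blocked, remaining salesperson count, remaining customer rows), so the three rules can be compared directly.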
Which entity is uniquely identified by concatenating the primary keys of the two entities it connects?
Associative Entity
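As an illustration of an associative entity, the sketch below (SQLite through Python's sqlite3 module) models the many-to-many relationship between salespersons and products. The SALES table layout and all sample values are assumptions made for this example; QUANTITY plays the role of intersection data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SALESPERSON (SPNUM INTEGER PRIMARY KEY, SPNAME TEXT);
CREATE TABLE PRODUCT (PRODNUM INTEGER PRIMARY KEY, PRODNAME TEXT);
-- Associative entity: its key is the concatenation of the two keys it connects.
CREATE TABLE SALES (
    SPNUM INTEGER REFERENCES SALESPERSON(SPNUM),
    PRODNUM INTEGER REFERENCES PRODUCT(PRODNUM),
    QUANTITY INTEGER,              -- intersection data
    PRIMARY KEY (SPNUM, PRODNUM)
);
INSERT INTO SALESPERSON VALUES (137, 'Baker');
INSERT INTO PRODUCT VALUES (19440, 'Hammer');
INSERT INTO SALES VALUES (137, 19440, 473);
""")

# A second row with the same (SPNUM, PRODNUM) pair violates the concatenated key.
try:
    conn.execute("INSERT INTO SALES VALUES (137, 19440, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```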
Which type of entity is also called a dependent entity?
Weak Entity
Candidate Key
When a relation has more than one attribute, or minimal group of attributes, that can uniquely identify the entity, each of them is called a candidate key.
Concurrency Problem
A problem that arises when two or more users try to update the same record simultaneously.
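One common form of the concurrency problem is the lost update. The sketch below simulates it in plain Python with a made-up record and made-up quantities; no real DBMS or locking is involved, it only shows the order of reads and writes that causes the loss.

```python
# Stored record: quantity on hand for one product (made-up value).
stored = {"quantity": 100}

# Both users read the same starting value before either one writes.
user_a_copy = stored["quantity"]
user_b_copy = stored["quantity"]

# User A sells 10 units and writes the result back.
stored["quantity"] = user_a_copy - 10

# User B sells 20 units, still working from the stale copy, and overwrites A's update.
stored["quantity"] = user_b_copy - 20

lost_update_result = stored["quantity"]  # A's change has been lost
correct_result = 100 - 10 - 20           # what one-at-a-time execution would give
```

Concurrency control in a DBMS exists to prevent this kind of interleaving.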
Equijoin
Combines two or more tables based on a column that is common to the tables.
Example: Joining the Client and Salesman tables, which both contain a SalesmanID column, on rows where the SalesmanID values are exactly equal.
Foreign Key
An attribute or group of attributes that serves as the primary key of one relation and also appears in another relation.
Natural Join
Matches each row in a table against each row in another table based on common values found in columns sharing a common name and data type.
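The difference between an equijoin and a natural join shows up in the result columns. This is a rough sketch using SQLite through Python's sqlite3 module; the table layouts and rows are invented for the example. With SELECT *, the equijoin keeps both copies of the common column, while SQLite's NATURAL JOIN keeps only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SALESPERSON (SPNUM INTEGER PRIMARY KEY, SPNAME TEXT);
CREATE TABLE CUSTOMER (CUSTNUM INTEGER PRIMARY KEY, CUSTNAME TEXT, SPNUM INTEGER);
INSERT INTO SALESPERSON VALUES (137, 'Baker'), (186, 'Adams');
INSERT INTO CUSTOMER VALUES (1525, 'Main Street Hardware', 137), (1700, 'XYZ Stores', 186);
""")

# Equijoin: the matching condition on the common column is written explicitly.
equi = conn.execute(
    "SELECT * FROM SALESPERSON, CUSTOMER WHERE SALESPERSON.SPNUM = CUSTOMER.SPNUM"
)
equi_columns = [d[0] for d in equi.description]
equi_rows = equi.fetchall()

# Natural join: columns with the same name are matched automatically.
natural = conn.execute("SELECT * FROM SALESPERSON NATURAL JOIN CUSTOMER")
natural_columns = [d[0] for d in natural.description]
natural_rows = natural.fetchall()
```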
Tuple
Rows/records are referred to as tuples when talking about relations. A tuple serves exactly the same function as a row; it simply has a different name in the context of relations.
What are the five basic principles of The Database Concept?
The creation of a datacentric environment that is a significant company resource, which can be shared inside and outside the company.
The ability to achieve data integration while storing data in a non-redundant fashion.
The ability to store data representing entities involved in multiple relationships without introducing data redundancy.
Managing data control issues such as data security, backup and recovery, and concurrency control.
High degree of data independence.
What are the four major DBMS approaches?
Hierarchical
Network
Relational
Object-oriented
What are four key differences between a RELATION and a FILE?
The columns of a relation can be arranged in any order without affecting the meaning of the data. That is not true of a file.
Similarly, the rows of a relation can be arranged in any order, which is not true of a file.
Every row/column position, sometimes referred to as a "cell", can have only a single value, which is not necessarily true in a file.
No two rows of a relation are identical, which is not necessarily true in a file.
* in the SELECT clause
It indicates that all attributes of the selected rows are to be retrieved.
AND operator
It displays a record only if all of the conditions it joins are true.
AVG() function
It returns the average value of a numeric column.
BETWEEN operator
It allows you to specify a range of numeric values in a search.
DISTINCT operator
It is used to eliminate duplicate rows in a query result.
IN operator
It allows you to specify a list of values, such as character strings, to be included in a search.
JOIN clause
It is used to combine rows from more than one table, based on a common field between them. The match is often expressed with the '=' symbol, either in an ON clause or in the WHERE clause.
LIKE operator
It allows you to specify partial character strings in a "wildcard" sense.
OR operator
It displays a record if either the first condition OR the second condition is true.
ORDER BY clause
It takes the rows in the result of a SQL query and orders them by one or more specified attributes.
SELECT command
Data retrieval in SQL is accomplished with the SELECT command.
Subquery
When one SELECT statement is "nested" within another, it is known as a subquery. It appears as a second SELECT phrase within a set of parentheses.
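Several of the operators above can be seen in action on a small table. This is a minimal sketch using SQLite through Python's sqlite3 module; the CUSTOMER rows and the helper q are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (CUSTNUM INTEGER PRIMARY KEY, CUSTNAME TEXT, HQCITY TEXT);
INSERT INTO CUSTOMER VALUES
    (1000, 'Acme Tools', 'Atlanta'),
    (1500, 'Apex Supply', 'New York'),
    (1700, 'Best Hardware', 'New York'),
    (1900, 'City Stores', 'Chicago');
""")

def q(sql):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]

# BETWEEN: both ends of the range are included.
between = q("SELECT CUSTNUM FROM CUSTOMER WHERE CUSTNUM BETWEEN 1500 AND 1800 ORDER BY CUSTNUM")
# IN: match any value in the list.
in_list = q("SELECT CUSTNUM FROM CUSTOMER WHERE HQCITY IN ('Atlanta', 'Chicago') ORDER BY CUSTNUM")
# LIKE: % stands for any run of characters.
like = q("SELECT CUSTNAME FROM CUSTOMER WHERE CUSTNAME LIKE 'A%' ORDER BY CUSTNAME")
# DISTINCT: duplicate rows are removed from the result.
distinct = q("SELECT DISTINCT HQCITY FROM CUSTOMER ORDER BY HQCITY")
# Subquery: the inner SELECT in parentheses finds the largest customer number.
subquery = q("SELECT CUSTNAME FROM CUSTOMER WHERE CUSTNUM = (SELECT MAX(CUSTNUM) FROM CUSTOMER)")
```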
Common DDL commands:
DROP
ALTER
RENAME
CREATE
TRUNCATE
Common DML commands:
UPDATE
DELETE
INSERT
MERGE
SELECT
Write the basic SQL query command:
SELECT column_name(s) FROM table_name WHERE condition;
Write the SQL query to "Find the commission percentage and year of hire of salesperson 186":
SELECT COMMPERCT, YEARHIRE FROM SALESPERSON WHERE SPNUM=186;
Write the SQL query to "Retrieve the entire record for salesperson 186":
SELECT * FROM SALESPERSON WHERE SPNUM=186;
Write the SQL query to "List the salesperson numbers and salesperson names of those salespersons who have a commission percentage of 10.":
SELECT SPNUM, SPNAME FROM SALESPERSON WHERE COMMPERCT=10;
Write the SQL query to "List the salesperson numbers, salesperson names, and commission percentages of the salespersons whose commission percentage is less than 12.":
SELECT SPNUM, SPNAME, COMMPERCT FROM SALESPERSON WHERE COMMPERCT<12;
Write the SQL query to "List the customer numbers and headquarters cities of all customers that have a customer number of at least 1700":
SELECT CUSTNUM, HQCITY FROM CUSTOMER WHERE CUSTNUM>=1700;
Write the SQL query to "List the customer numbers, customer names, and headquarters cities of the customers that are headquartered in New York and that have a customer number higher than 1500":
SELECT CUSTNUM, CUSTNAME, HQCITY FROM CUSTOMER WHERE HQCITY='New York' AND CUSTNUM>1500;
Write the SQL query to "List the customer numbers, customer names, and headquarters cities of the customers that are headquartered in New York OR that have customer numbers higher than 1500":
SELECT CUSTNUM, CUSTNAME, HQCITY FROM CUSTOMER WHERE HQCITY='New York' OR CUSTNUM>1500;
Write the SQL query to "List the customer numbers, customer names, and headquarters cities of the customers that are headquartered in New York or that satisfy the two conditions of having a customer number higher than 1500 and being headquartered in Atlanta":
SELECT CUSTNUM, CUSTNAME, HQCITY FROM CUSTOMER WHERE HQCITY='New York' OR (CUSTNUM>1500 AND HQCITY='Atlanta');
Write the SQL query to "List the customer records for those customers whose names begin with the letter 'A' ":
SELECT * FROM CUSTOMER WHERE CUSTNAME LIKE 'A%';
Write the SQL query to "Find the customer numbers, customer names, and headquarters cities of those customers with the customer numbers greater than 1000. List the results in alphabetic order by headquarters cities (and have the customer names within the same city alphabetized)":
SELECT CUSTNUM, CUSTNAME, HQCITY FROM CUSTOMER WHERE CUSTNUM>1000 ORDER BY HQCITY, CUSTNAME;
Write the SQL query to "Find the average quantity of units of the different products that Salesperson 137 has sold":
SELECT AVG(QUANTITY) FROM SALES WHERE SPNUM=137;
Write the SQL query to "Find the total quantity of units of all products that Salesperson 137 has sold":
SELECT SUM(QUANTITY) FROM SALES WHERE SPNUM=137;
Write the SQL query to "Find the name of the salesperson responsible for Customer Number 1525":
SELECT SPNAME FROM SALESPERSON, CUSTOMER WHERE SALESPERSON.SPNUM=CUSTOMER.SPNUM AND CUSTNUM=1525;
Write the SQL query to "List the NAMES of the products of which salesperson Adams has sold more than 2000 units":
SELECT PRODNAME FROM SALESPERSON, PRODUCT, SALES WHERE SALESPERSON.SPNUM=SALES.SPNUM AND SALES.PRODNUM=PRODUCT.PRODNUM AND SPNAME='Adams' AND QUANTITY>2000;
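To check queries like the ones above, they can be run against a throwaway database. The sketch below uses SQLite through Python's sqlite3 module; the table layouts are reduced to the columns the queries need, and every row is made-up sample data, not the actual contents of the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SALESPERSON (SPNUM INTEGER PRIMARY KEY, SPNAME TEXT);
CREATE TABLE CUSTOMER (CUSTNUM INTEGER PRIMARY KEY, CUSTNAME TEXT, SPNUM INTEGER);
CREATE TABLE PRODUCT (PRODNUM INTEGER PRIMARY KEY, PRODNAME TEXT);
CREATE TABLE SALES (SPNUM INTEGER, PRODNUM INTEGER, QUANTITY INTEGER);
INSERT INTO SALESPERSON VALUES (137, 'Baker'), (186, 'Adams');
INSERT INTO CUSTOMER VALUES (1525, 'Sample Customer', 137);
INSERT INTO PRODUCT VALUES (1, 'Hammer'), (2, 'Saw');
INSERT INTO SALES VALUES (137, 1, 100), (137, 2, 300), (186, 1, 2500), (186, 2, 1500);
""")

# Aggregate functions over the rows selected by WHERE.
avg_137 = conn.execute("SELECT AVG(QUANTITY) FROM SALES WHERE SPNUM=137").fetchone()[0]
sum_137 = conn.execute("SELECT SUM(QUANTITY) FROM SALES WHERE SPNUM=137").fetchone()[0]

# Two-table join: the salesperson responsible for customer 1525.
responsible = conn.execute(
    "SELECT SPNAME FROM SALESPERSON, CUSTOMER "
    "WHERE SALESPERSON.SPNUM=CUSTOMER.SPNUM AND CUSTNUM=1525"
).fetchall()

# Three-table join: products of which Adams has sold more than 2000 units.
adams_products = conn.execute(
    "SELECT PRODNAME FROM SALESPERSON, PRODUCT, SALES "
    "WHERE SALESPERSON.SPNUM=SALES.SPNUM AND SALES.PRODNUM=PRODUCT.PRODNUM "
    "AND SPNAME='Adams' AND QUANTITY>2000"
).fetchall()
```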
CREATE TABLE command
The command that creates base tables and tells the system what attributes will be in them.
CREATE VIEW command
Specifies the base tables on which the view is to be based and the attributes and rows of the table that are to be included in the view.
DELETE command
Specify which row(s) of a table are to be deleted based on data values within those rows.
DROP TABLE command
Discards an entire table from a database.
DROP VIEW command
Discards views.
Normalization
The process of organizing the fields and tables of a relational database to minimize redundancy (duplication) and dependency.
Second Normal Form
All non-key attributes must be functionally dependent on the entire key of that table.
Third Normal Form
Non-key attributes are not allowed to define other non-key attributes.
What are three important points about Third Normal Form?
It is completely free of redundancy.
All foreign keys appear where needed to logically tie together related tables.
It is the same structure that would have been derived from a properly drawn entity-relationship diagram of the same business environment.
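The idea of functional dependency behind second and third normal form can be checked against sample rows. The function fd_holds below is a hypothetical helper written for this sketch, and the rows are made up. Note that sample data can only show that a dependency is violated or not contradicted; the real dependencies come from the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to exactly one combination of rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Made-up unnormalized rows; the key of this table is (SPNUM, PRODNUM).
rows = [
    {"SPNUM": 137, "PRODNUM": 1, "SPNAME": "Baker", "QUANTITY": 100},
    {"SPNUM": 137, "PRODNUM": 2, "SPNAME": "Baker", "QUANTITY": 300},
    {"SPNUM": 186, "PRODNUM": 1, "SPNAME": "Adams", "QUANTITY": 2500},
]

# SPNAME depends on SPNUM alone, which is only part of the key:
# a second normal form violation, so SPNAME belongs in a SALESPERSON table.
partial_dependency = fd_holds(rows, ["SPNUM"], ["SPNAME"])

# QUANTITY needs the entire key; SPNUM alone does not determine it.
quantity_on_part = fd_holds(rows, ["SPNUM"], ["QUANTITY"])
quantity_on_key = fd_holds(rows, ["SPNUM", "PRODNUM"], ["QUANTITY"])
```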
Write the SQL query to "Add a new salesperson into the SALESPERSON table whose salesperson number is 489, name is Quinlan, commission percentage is 15, year of hire is 2011, and department number is 59.":
INSERT INTO SALESPERSON VALUES ('489','Quinlan',15,'2011','59');
*Hint, this is DML, so remember that INSERT is one of the keywords for DML.
Write the SQL query to "Delete the row for salesperson 186 from the SALESPERSON table.":
DELETE FROM SALESPERSON WHERE SPNUM = '186';
What is the correct syntax of the INSERT command?
INSERT INTO table_name VALUES (value1,value2,value3,...);
What is the correct syntax of the CREATE VIEW command?
CREATE VIEW view_name AS SELECT column_name(s) FROM table_name WHERE condition;
What is called a decomposition process?
Data normalization
In which of the normal forms should every non-key attribute be fully functionally dependent on the entire key of a table?
Second normal form
What is the correct syntax of the CREATE TABLE command?
CREATE TABLE table_name (column_name1 data_type(size), column_name2 data_type(size), ...);
What is the correct syntax of the UPDATE command?
UPDATE table_name SET column1=value1,column2=value2,... WHERE some_column=some_value;
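The data-changing commands above can be exercised together. This is a minimal sketch using SQLite through Python's sqlite3 module; the column name DEPTNUM, the column types, the view name LOWCOMM, and the row for salesperson 186 are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CREATE TABLE: DEPTNUM and the column types are assumptions for this sketch.
conn.execute(
    "CREATE TABLE SALESPERSON ("
    " SPNUM TEXT PRIMARY KEY, SPNAME TEXT, COMMPERCT INTEGER,"
    " YEARHIRE TEXT, DEPTNUM TEXT)"
)

# INSERT: one value per column, in column order.
conn.execute("INSERT INTO SALESPERSON VALUES ('186','Adams',15,'2001','59')")
conn.execute("INSERT INTO SALESPERSON VALUES ('489','Quinlan',15,'2011','59')")

# UPDATE: change one column in the rows matched by WHERE.
conn.execute("UPDATE SALESPERSON SET COMMPERCT=12 WHERE SPNUM='489'")

# CREATE VIEW: a stored query over the base table.
conn.execute(
    "CREATE VIEW LOWCOMM AS SELECT SPNUM, SPNAME FROM SALESPERSON WHERE COMMPERCT<15"
)

# DELETE: remove the rows matched by WHERE.
conn.execute("DELETE FROM SALESPERSON WHERE SPNUM = '186'")

remaining = conn.execute("SELECT SPNUM, COMMPERCT FROM SALESPERSON").fetchall()
view_rows = conn.execute("SELECT * FROM LOWCOMM").fetchall()
```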
Association Rules
Association rules specify a relation between attributes that appears more frequently than expected if the attributes were independent.
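One common way to quantify "more frequently than expected" is with support, confidence, and lift. The sketch below computes them in plain Python for the rule hammer -> nails; the baskets are made-up data, and these three measures are offered as a typical choice rather than the only one.

```python
# Made-up market baskets; each set is one transaction.
baskets = [
    {"hammer", "nails"},
    {"hammer", "nails", "saw"},
    {"hammer", "nails"},
    {"saw"},
    {"saw", "drill"},
    {"drill"},
]

def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)

# Rule under test: hammer -> nails
support_both = support({"hammer", "nails"})
# Confidence: of the baskets with a hammer, the share that also contain nails.
confidence = support_both / support({"hammer"})
# Lift: observed co-occurrence divided by what independence would predict.
lift = support_both / (support({"hammer"}) * support({"nails"}))
```

A lift above 1 means the two items appear together more often than they would if they were independent.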
Business Intelligence
The processes, technologies, and tools needed to turn data into information, information into knowledge, and knowledge into plans that drive profitable business action.
Classification
Classification involves examining the attributes of a particular object and assigning it to a defined class.
Clustering
Clustering is the task of taking a large collection of objects and dividing them into smaller groups of objects that exhibit some similarity.
Affinity Grouping
Affinity Grouping is a process of evaluating relationships or associations between data elements that demonstrate some kind of affinity between objects.
What are the values of Business Intelligence?
Financial value associated with increased profitability.
Productivity value associated with increased throughput.
Trust value (customer, employee, supplier satisfaction) as well as increased confidence in forecasting.
Risk value associated with decreased risk in decision making.
What are the reasons for using the Dimensional Model for Business Intelligence?
Simplicity.
Lack of bias.
Extensibility.
What are the fundamental aspects of a Data Warehouse?
Centralized repository of information.
Organized around relevant subject areas.
Provides platform for queries.
Used for analysis and not transactional processing.
Data is nonvolatile.
Target location for integrating data from multiple sources.
What is the general theme of the ETL process?
Get the data
Map the data to staging area
Validate and clean the data
Apply necessary transformations
Map data to loading model
Move data to repository
Load data to warehouse
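The steps above can be sketched as a tiny pipeline. This is only an illustration in Python with an in-memory SQLite database standing in for the warehouse; the source records, the validation rules, and the SALES_FACT table are all made up.

```python
import sqlite3

# Get the data: made-up source records standing in for an operational system.
source = [
    {"spnum": "137", "quantity": " 100 "},
    {"spnum": "186", "quantity": "2500"},
    {"spnum": "", "quantity": "40"},       # fails validation: missing key
    {"spnum": "204", "quantity": "n/a"},   # fails validation: not a number
]

# Map the data to a staging area (here, simply a working copy).
staging = [dict(record) for record in source]

# Validate and clean the data: keep records with a key and a numeric quantity.
clean = [
    r for r in staging
    if r["spnum"].strip() and r["quantity"].strip().isdigit()
]

# Apply transformations and map to the loading model (typed tuples).
load_rows = [(int(r["spnum"]), int(r["quantity"].strip())) for r in clean]

# Move and load the data into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE SALES_FACT (SPNUM INTEGER, QUANTITY INTEGER)")
warehouse.executemany("INSERT INTO SALES_FACT VALUES (?, ?)", load_rows)
loaded = warehouse.execute(
    "SELECT SPNUM, QUANTITY FROM SALES_FACT ORDER BY SPNUM"
).fetchall()
```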
What is the key factor based on the need for linear scalability?
Performance
What is used for populating summaries or any cube dimensions that can be performed at the staging area (ETL)?
Aggregation
What data mining activity is a process of assigning some continuously valued numeric value to an object?
Estimation