S

Database Development Vocabulary

Week 10 Lecture: Database Development & Course Review

Online Class Instructions:

  • Mute microphone to avoid disruption.
  • Use the chat for questions or raise a virtual hand.
  • Turn off video if internet is poor.
  • Wait for the lecturer to start.

In-Person Class Instructions:

  • If you have any symptoms, even very mild, you MUST NOT stay in class. Go home, self-isolate and seek medical advice immediately.

Acknowledgement of Country

  • The UNSW Business School acknowledges the Bidjigal (Kensington campus) and Gadigal (City campus) as the traditional custodians of the lands where each campus is located.
  • Acknowledgement of Aboriginal and Torres Strait Islander Elders, past and present, and their communities.
  • Recognition of their ongoing leadership and contributions to business, education, and industry.

Lecture Content

  • Chapter 9: Database Design (9-1 to 9-3)
  • Chapter 13: Business Intelligence and Data Warehouses (13-1 to 13-2)
  • Chapter 16: Database Administration and Security (16-4 to 16-5)

Overview

  • Database development
    • Information systems development overview
    • Database development lifecycle (DDLC)
  • Business Intelligence vs Business Analytics
  • Professional roles
    • DBA vs. DA (Data Administrator)
    • DBA tasks and responsibilities
  • Practice questions

Building Information Systems

  • Important issues when building IS
    • The system must solve the right problem
    • The system must be built in the most effective way
    • The system must fit into the existing environment
    • The system must be easy to use (e.g., smartphone)
  • Database Development takes place within the context of an information systems development process
  • Key terminologies
    • Information Systems (IS): capture, transmit, store, retrieve, manipulate, or display information used in one or more business processes
    • System analysis: process of establishing the need for and the scope of an information system
    • Systems development: process of creating an information system is known as
  • Building IS involves below
    • Database development (back-end)
    • Application development (front-end)
    • Administrative and organizational procedures

Database Development Lifecycle (DDLC)

  • Framework for database development to ensure structured approach
  • Contains 6 stages in a sequence
  • Iterative process: each stage could go back to a previous stage if necessary
  • Stages:
    • Database Initial Study
    • Database Design
    • Implementation and Loading
    • Testing and Evaluation
    • Operation
    • Maintenance and Evolution

Phase 1: Database Initial Study

  • Analysing the organization
    • Objectives, Operations, Structures
  • Defining problems and constraints
    • Function of existing systems
    • Input of existing systems
    • Output of existing systems
  • Defining objectives
    • Initial objectives
    • Data sharing and interfaces with other systems
  • Define scope and boundaries
    • DB design for which part of the organization?
    • What hardware will be used?

Phase 2: Database Design (1/2)

  • Data requirement analysis
    • What data is needed?
    • Where does it come from?
    • What are the end-users’ desired views of the data?
    • What is the business logic?
  • ER modelling and normalisation
    • What are the entities, attributes, and relationships for ER diagram?
    • What are the keys?
    • Do we need to normalise?
  • ER Model verification
  • Iterative ER Modelling Process
    • Data requirement analysis
    • Initial ERM
    • Normalisation
    • Attributes Verification
    • Final ER Model

Phase 2: Database Design (2/2)

  • Select resources for Implementation
    • DBMS (software), e.g., Oracle SQL, MySQL, NoSQL
    • Server (hardware)
  • Logical Design
    • Translating the ERD into a model (e.g., relational, document-oriented, graph) that can be implemented in the selected DBMS
  • Physical Design
    • Estimate the data volume and prepare storage space
    • Identify indexes on selected columns in each table, to allow quick searching of data
  • Note: Logical and physical design can be carried out in parallel activities but requires high level of understanding of software and hardware.

Phase 3: Implementation and Loading

  • Core tasks:
    • Creating DB
    • Assigning permission to a Database Administrator (DBA)
    • Creating tables within DB
    • Assigning permission to users
  • Further areas needing attention:
    • Performance
    • Database Security
    • Data privacy
    • Backup and Recovery
    • Data Integrity
    • Company Standards
    • Concurrency Control
  • Database security contains two aspects
    • Authentication: user is authenticated to use the system
    • Authorization: user is allowed to do specific activities on the system
    • Related topics: physical security, password security, access rights, audit trails, data encryption, diskless workstations
  • Data privacy (or information privacy) require us to
    • Conduct cautious collection/dissemination of personally identifiable information (PII), e.g., financial transactions
    • Adhere to public expectation of privacy
    • Address legal/political issues

Phase 4/5/6

  • Phase 4: Testing and Evaluation
    • Testing performance / performance fine-tuning
    • Testing security constraints
    • Testing integrity
    • Testing concurrent access
  • Phase 5: Operation
    • DB (and application) completed and going into production
    • Users and applications start to insert, receive, update and delete data…
    • DBA starts to fine-tune performance, allocate storage space, control access, backup data…
  • Phase 6: Maintenance and Evaluation
    • Preventive maintenance, e.g., backup
    • Corrective maintenance, e.g., recovery
    • Adaptive maintenance (enhancing performance, adding entities, adding attributes, etc.)
    • Access control, statistics, auditing, periodic system-usage summaries

Business Intelligence (BI) and Business Analytics (BA)

  • Good DB development enables Business Intelligence (BI) and Business Analytics (BA)
  • Check out these courses if you are keen in Analytics
    • INFS3873: Predictive Analytics using Machine Learning (mainly Python)
    • INFS3822: Use of Deep Learning in business context (mainly no or low code tools, e.g., SAS)
  • Business Intelligence
    • Comprehensive, cohesive, integrated set of tools and processes
    • Captures, collects, integrates, stores, and analyzes data
    • Generates and presents information to support business decision making
    • Allows a business to transform

AI, Machine Learning, and Deep Learning

  • Artificial Narrow Intelligence
    • A technique which enables machines to replicate human’s intelligent behaviours
  • Machine Learning
    • Subset of AI technique to enable machines to learn from data with statistical methods
    • Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.
  • Deep Learning
    • Subset of ML with multi-layer neural network
  • ANI (or weak AI) is limited to one or a few specific tasks and can do it better than humans, such as estimating a numeric variable
  • Deep Learning (Deep neural network) is helping us getting closer to AGI (or strong AI), which can learn to perform any task humans do and allows machine to have full human cognitive abilities. AGI is not attained yet but predicted to be possible within our lifetime.

Business Intelligence Tools

  • Data visualization
    • Enhances data understanding through advanced visual techniques (e.g., charts, graphs).
    • Sample Vendors: Tableau, MS PowerBI, QlikView
  • Online Analytical Processing (OLAP) tools
    • Enables multidimensional data analysis (e.g., drill-down, roll-up) since data is organized in cubes (instead of tables) with dimensions like time, geography, product, etc.
    • Sample Vendors: IBM/Cognos, BusinessObjects, Oracle, Microsoft
  • Portals
    • Provides a unified, web-based entry point to integrate and distribute data from multiple sources into a single webpage.
    • Sample Vendors: Oracle Portal, Actuate, Microsoft, SAP
  • Data analysis and reporting tools
    • Queries diverse data sources to generate integrated reports.
    • Sample Vendors: Microsoft Reporting Services, MicroStrategy, SAS WebReportStudio
  • Data-mining tools
    • Applies advanced statistical analysis to uncover hidden patterns or opportunities in business data.
    • Sample Vendors: SAP, Teradata, MicroStrategy, MS Analytics Services
  • Data warehouses (DW) serves as the foundation of BI infrastructure, integrating production system data in near real-time into the tools above, e.g., Microsoft, Oracle, IBM/Cognos, Teradata

Professional Roles: DBA vs. DA

  • DATA ADMINISTRATOR (DA)
    • Strategic planning
    • Control and supervision
    • Sets long-term goals
    • Sets policies and standards
    • Broad scope
    • Long term
    • Managerial orientation
    • DBMS-independent
  • DATABASE ADMINISTRATOR (DBA)
    • Executes plans to reach goals
    • Enforces policies, procedures, programming standards
    • Narrow scope
    • Short term (focus on daily operations)
    • Technical orientation
    • DBMS-specific

DBA's Tasks and Responsibilities

  • DBA conducts a list of tasks daily…
    • Installing and upgrading the database server
    • Allocating system storage and planning future storage
    • Modifying the database structure
    • Enrolling users and maintaining system security
    • Ensuring compliance with database vendor license agreement
    • Controlling and monitoring user access
    • Monitoring and optimizing the performance
    • Planning for backup and recovery
    • Maintaining archived data
    • Backing up and restoring databases
    • Contacting database vendor (e.g., for technical support)
    • Generating various reports
  • … while maintaining a high level of ethics is a must
    • Responsibilities to Company:
      • Follow internal standards and regulations.
      • Inform openly about issues, provide complete information, do not create knowledge silo.
      • Ensure up-to-date security, have recovery plan in place.
    • Responsibilities to Externals:
      • Follow external regulations.
      • Protect externals from inappropriate data use.
      • Ensure privacy through authorization and security.
    • Responsibilities to Co-Workers:
      • Be honest and open with co-workers.
      • Protect co-workers from inappropriate data use.
      • Share, teach and help grow the collective knowledge base
    • Responsibilities to Oneself:
      • Stay up to date on industry and technology.
      • Stay up to date on regulations.
      • Learn new techniques, new tools and best practices.

Practice Questions

Question 1

What is a 'foreign key'?
A) A key that uniquely identifies a row in its own table.
B) A key that uniquely identifies a row in another table.
C) A key that is not part of any table.
D) A key that is used for encryption.

Correct Answer: B

Question 2

Which of the following is a characteristic of a NoSQL database?
A) It uses a fixed, predefined schema.
B) It is designed primarily for structured data.
C) It can handle large volumes of unstructured data.
D) It uses SQL as its primary query language.

Correct Answer: C

Question 3

Which of the following statements is FALSE about the diagram?
A) A course is taught by 1 and exactly 1 lecturer.
B) A lecturer teaches up to 3 courses.
C) Entity ‘Course’ is mandatory in this relationship.
D) The cardinality is (1,1) for Lecturer and (0,3) for Course.

Correct Answer: C

Question 4

What does the following Oracle SQL query return?

SELECT AVG(SALARY) FROM EMPLOYEES;

A) Sum of employee salaries
B) Number of employees
C) Average employee salary
D) All employee details

Correct Answer: C

Question 5

We have an incomplete ERD for PERSON and MARRIAGE entities. Which of the following statements is FALSE?
A) A person must be married to somebody.
B) An instance of entity MARRIAGE links two instances of entity PERSON.
C) The ID in PERSON is a primary key
D) SPOUSE_ID in MARRIAGE is a foreign key.

Correct Answer: A

Question 6

What does this SQL query do? Employee table has employeeid as the PK and another column departmentid.

SELECT department_id,
 COUNT(*) as employee_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5;

A) Lists all employees in each department.
B) Returns department id with more than 5 employees, along with their employee count.
C) Updates the employee count for departments exceeding 5 employees.
D) Deletes departments with 5 or fewer employees.

Correct Answer: B

Question 7

Which join returns only rows that have matching values in the common column of both tables?
A) LEFT JOIN
B) RIGHT JOIN
C) INNER JOIN
D) FULL OUTER JOIN

Correct Answer: C

Question 8

Given a table named TRANSACTIONS with columns TRANSACTIONID, CARDNUMBER, AMOUNT, and TRANSACTIONDATE (of DATE type), which query retrieves all transactions made in the last 1 month? A) SELECT * FROM transactions WHERE transactiondate > ADDMONTHS(SYSDATE, -1); B) SELECT * FROM transactions WHERE transactiondate < ADDMONTHS(SYSDATE, -1); C) SELECT * FROM transactions WHERE ADDMONTHS(transactiondate, 1) < SYSDATE; D) SELECT * FROM transactions WHERE transactiondate BETWEEN SYSDATE AND ADD_MONTHS(SYSDATE, 1);

Correct Answer: A

Question 9

Which normal form removes transitive dependencies?
A) 1NF
B) 2NF
C) 3NF
D) BCNF

Correct Answer: C

Question 10

What is the primary role of a Data Warehouse (DW) in Business Intelligence (BI)?
A) To delete outdated business data automatically every 24 hours.
B) To serve as the foundational infrastructure and enable BI tools.
C) To replace all relational databases within an organization.
D) To create visual dashboards without requiring data consolidation.

Correct Answer: B

Question 11

You have two tables, REVENUE and COST, both containing a department_id column. Table REVENUE has a column revenue, while table COST has a column cost. Some departments might not have a row in REVENUE table, because they did not generate revenue. Some departments might not have a row in COST table, because the cost is unknown. What does this SQL query do?

SELECT r.department_id, r.revenue, c.cost, (r.revenue - c.cost) AS profit
FROM REVENUE r
INNER JOIN COST c
ON r.department_id = c.department_id;

A) Lists all departments with their revenue and cost, including those departments with revenue but no cost, or vice versa.
B) Calculates the profit (revenue minus cost) for each department that has both revenue and cost numbers in the two tables
C) Updates the profit column in the REVENUE table by subtracting cost from revenue.
D) Deletes departments where costs exceed revenue.

Correct Answer: B

Question 12

Practice questions: uncorrelated subquery
Table Professors contains: profid (INT, PK), profname (VARCHAR2), age (INT), department (VARCHAR2), hire_date (DATE)
Consider the following SQL query to find the oldest professor in the entire university and the information of this person. Explain the error and how to fix the SQL code.

SELECT prof_name, department, hire_date
FROM Professors
WHERE age = MAX(age);

Answer: The code uses MAX() directly in the WHERE clause, which is invalid. Aggregate functions like MAX() must be in a subquery, or HAVING, or as part of SELECT (when there is ‘group by’). We put MAX(age) in a subquery, (SELECT MAX(age) FROM Professors). This subquery would execute first and give one number, which is the max age among all professors. Then the outer query will run with “WHERE age = one single value”

SELECT prof_name, department, hire_date
FROM Professors
WHERE age = (SELECT MAX(age) FROM Professors);

Question 13

Practice questions: correlated subquery
The Professors table contains: profid (INT, PK), profname (VARCHAR2), age (INT), department (VARCHAR2), hire_date (DATE)
Consider the following Oracle SQL query intended to list professors older than the average age in their department. Explain the error and how to fix the SQL code.

SELECT prof_name, department
FROM professors
WHERE age > (SELECT AVG(age) FROM professors WHERE department = department);

Answer: The subquery WHERE department = department will always compare a department to itself (always true). This causes the subquery to calculate the overall average age across ALL departments, not per-department averages. The query needs correlation between the outer and inner queries.

SELECT p1.prof_name, p1.department
FROM professors p1
WHERE p1.age > (SELECT AVG(p2.age) FROM professors p2 WHERE p2.department = p1.department);

Question 14

How is a correlated subquery executed?

SELECT p1.prof_name, p1.department
FROM professors p1
WHERE p1.age > (SELECT AVG(p2.age) FROM professors p2 WHERE p2.department = p1.department);
  • The database engine starts by reading the first row from table professors (aliased as p1). Assuming the department is “Engineering” for the first professor, the subquery below is executed, to calculate the average age for only “Engineering" professors
SELECT AVG(p2.age) FROM professors p2 WHERE p2.department = 'Engineering’
  • Assume the average is 40, the below condition is executed to compares p1.age with the department’s average
WHERE p1.age > 40
  • If this comparison is true, the first professor is included in the query result. Otherwise we discard the row.
  • Continues until all rows in table professors are processed in the above way
  • Don’t worry. This is more efficient than it appears because modern DBs save the subquery results in temporary cache for repeated use, to improve efficiency.

Recap

  • Database development
    • Information systems development overview
    • Database development lifecycle (DDLC)
  • Business Intelligence vs Business Analytics
  • Professional roles
    • DBA vs. DA (Data Administrator)
    • DBA tasks and responsibilities
  • Practice questions
  • Next lecture: Recap! The End!

Contact

  • Lecturer: Bin Huang, bin.huang@unsw.edu.au
  • Lecture recording will be made available in Moodle course site.