Acknowledgement of Country: UNSW Business School acknowledges the Bidjigal and Gadigal people, the traditional custodians of the lands where each campus is located.
Acknowledges all Aboriginal and Torres Strait Islander Elders, past and present, and their communities.
Recognizes their ongoing leadership and contributions to business, education, and industry.
Completed Database Design Process.
Key aspects:
Data vs. Information.
Data stored in databases.
Database management system (DBMS).
Database design defines database structure.
Entity-Relationship Modeling technique.
Chen’s notation for high-level conceptual models; Crow’s Foot as the design standard.
Entity type and instance; attribute and value.
Relationship (Degree, Connectivity, Cardinality).
Converting Conceptual model to detailed Logical Model (using Crow’s Foot) ready for DB implementation.
Advanced topics:
Relationship strength.
Composite entity.
Relationship degree.
Supertype and Subtype.
Selecting Primary Key.
Convert ER model to a set of tables (relations) in the relational model.
Apply Normalization on the relations to remove any anomalies.
Use Relational Model to implement the database by creating a table for each normalized relation.
Data Definition Language defines the tables.
Data Manipulation Language queries/updates the tables.
Chapter 14. Big Data and NoSQL.
Chapter 16. Database Administration and Security (sections 16-1 to 16-6).
What is Privacy?
What does good data governance look like?
How can we recognize the ethical issues surrounding data collection, storage, and use?
How can we resolve the identified ethical issues using an ethical framework?
Data is not merely a commodity; it is linked to civic rights, personal autonomy, and dignity.
The rights of individuals and organizations to determine access to data about themselves.
Ensuring data confidentiality, security, and protection of personal data and information.
Global privacy policies:
European Union: General Data Protection Regulation (GDPR).
Australia: Privacy Act 1988 (updated regularly).
Australian organizations’ privacy policies are on their websites.
A framework outlining roles, responsibilities, processes, and policies for managing and governing data.
Defines how data will be created, collected, stored, used, shared, protected, and disposed of.
Establishes guidelines and standards for data quality, accuracy, and compliance.
Business Value:
Improved decision-making based on higher quality data.
Increased public trust through improved data management and transparency.
Increased competitiveness through improved customer satisfaction.
Risk Mitigation:
Reduction of risk and costs through better data management for regulatory compliance.
More robust consideration of ethical and privacy issues.
Efficiency:
Increased data sharing through improved trust and standardization.
Reduction in costs by improving resource and process efficiencies.
Reduction in duplication and waste created by information silos.
Reduction in time spent by employees in finding, acquiring, and processing data.
Ethics: moral principles that control or influence a person’s behavior.
Data privacy focuses on protecting personal private data and information.
Data ethics is relevant to all data use, regardless of privacy protection.
Principles of Data Ethics:
Ownership
Transparency
Privacy
Intention
Outcomes
Individual factors influence the recognition of ethical issues, ethical judgment, and ethical intent.
Situational factors also play a role in ethical behavior.
A person might believe they have done nothing wrong, but others may view it differently.
Example: Emotional contagion study by Cornell University and Facebook.
The experiment manipulated the extent to which people (N = 689,003) were exposed to emotional expressions in their News Feed.
Posts rated as containing positive or negative emotional content were withheld from the News Feed, depending on the experimental condition.
Experimental evidence of massive-scale emotional contagion through social networks (Kramer et al.).
Systemic:
Human Rights
Social and Environmental Sustainability
Professional ethics
Organizational:
Corporate Responsibility
Codes of Conduct
Risk Management and Trust
Individual:
Personal values
Virtuous conduct
Assess:
Framing: Identify ethical issues, assumptions, biases, and stakeholders.
Values:
Personal and professional values and responsibilities.
Frameworks:
Relevant ethical frameworks (e.g., deontology, consequentialism).
Deontological: Is it the right thing to do? Is it violating anybody’s rights?
Consequential: Is anyone harmed - who and how?
Care ethics: Will it create good relationships?
Virtue ethics: Do I feel good about this action?
Assess the situation: What are the facts?
Assumptions & worldviews: How might they be challenged?
Principles, Duties and Care needs
Options, Outcomes and Consequences
Character factors: What virtues are relevant?
Comprehensive Assessment: What would be an ethical decision?
Justify your Decision: How to explain to those favoring a less ethical pathway?
Grant: give user access privileges to a database.
Revoke: take back permissions from the user.
Deny: explicitly prevents a user from receiving a particular permission (not implemented in Oracle SQL).
Principle of information security to minimize the risk of unauthorized disclosure.
Access to sensitive information is restricted to only those requiring it to perform specific tasks.
Example: patient medical records available only to doctors and nurses directly involved in their care.
The GRANT statement is used to give users access privileges to a database:
GRANT <privilege list> on <table name> to <user list> [WITH GRANT OPTION]
<user list>: a user-id, or public (the privilege is granted to all valid users in the database).
[WITH GRANT OPTION]: allows a user who is granted a privilege to pass the privilege on to other users.
The grantor of the privilege must already hold the privilege on the specified table, e.g., the database administrator.
GRANT UPDATE on emp_view to liud22;
GRANT SELECT on emp_view to public;
Select: allows read access to a relation (table).
Insert: allows insert access to a relation.
Update: allows update access to a relation; can be restricted to a specific column using update(column-name).
Delete: allows deleting data from a relation.
All: short form for all allowable privileges.
Execute: applies only to procedures or functions (this is for PL/SQL - Procedural Language for SQL).
The REVOKE statement is used to revoke authorization, i.e., take back permissions from the user:
REVOKE <privilege list> on <table name> from <user list>
Example: REVOKE SELECT on emp from U1, U2, U3;
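The GRANT and REVOKE rules above can be sketched as a toy in-memory model. This is only an illustration of the rules, not how a DBMS implements them: the class `PrivilegeTable`, its method names, and the users `dba`, `U1`, and `U2` are hypothetical, and cascading revokes are deliberately not modelled.

```python
class PrivilegeTable:
    """Toy model: who holds which privilege on which table."""

    def __init__(self, owners):
        self.owners = dict(owners)  # table name -> owning user
        self.grants = {}            # (user, table, privilege) -> may re-grant?

    def holds(self, user, privilege, table):
        # Owners hold every privilege; others need a grant to them or to public.
        if self.owners.get(table) == user:
            return True
        return any((u, table, privilege) in self.grants for u in (user, "public"))

    def may_grant(self, user, privilege, table):
        # The grantor must own the table or hold the privilege WITH GRANT OPTION.
        if self.owners.get(table) == user:
            return True
        return self.grants.get((user, table, privilege), False)

    def grant(self, grantor, privilege, table, users, with_grant_option=False):
        if not self.may_grant(grantor, privilege, table):
            raise PermissionError(f"{grantor} cannot grant {privilege} on {table}")
        for user in users:
            key = (user, table, privilege)
            # Keep an existing grant option if the user already had one.
            self.grants[key] = self.grants.get(key, False) or with_grant_option

    def revoke(self, privilege, table, users):
        for user in users:
            self.grants.pop((user, table, privilege), None)


acl = PrivilegeTable({"emp": "dba"})
acl.grant("dba", "SELECT", "emp", ["U1"], with_grant_option=True)
acl.grant("U1", "SELECT", "emp", ["U2"])  # allowed: U1 holds the grant option
acl.revoke("SELECT", "emp", ["U2"])       # U2 loses read access again
```

Because U2 was granted SELECT without the grant option, an attempt by U2 to pass the privilege on raises `PermissionError` in this model, which mirrors the rule that the grantor must already hold a grantable privilege.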
Large and complex sets of raw data (difficult or impossible to capture in ER models).
A set of data analysis and predictive analytics techniques, e.g., data mining techniques on raw data (instead of organizing data upfront into neat structures) to make sense of the data.
Much larger set of data sources (e.g., Internet search/browsing, mobile devices).
Much lower costs to store data (e.g., the cost of hard disk drives has fallen substantially).
Growing interest in identifying patterns for business purposes (in all kinds of data).
Scaling out instead of scaling up.
Relational Model (structure/schema on write):
Data's structure (tables, columns, data types, relationships) is defined before the data is written.
Big Data Model (structure/schema on read):
Data's structure is not necessarily defined when data is written.
Instead, the structure is applied when the data is read or queried.
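The contrast between schema on write and schema on read can be sketched with Python's standard library. This is a minimal illustration under assumptions: the `reading` table, the sensor records, and the function `read_temperatures` are invented for the example, and SQLite stands in for a relational DBMS.

```python
import json
import sqlite3

# Schema on write: the structure is declared before any row is stored,
# and rows that do not fit it are rejected at write time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reading (sensor_id TEXT NOT NULL, temperature REAL NOT NULL)"
)
conn.execute("INSERT INTO reading VALUES (?, ?)", ("s1", 21.5))

try:
    # No temperature supplied, so the NOT NULL constraint fails.
    conn.execute("INSERT INTO reading (sensor_id) VALUES (?)", ("s2",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Schema on read: raw records are stored as they arrive, whatever their shape.
raw_lines = [
    '{"sensor_id": "s1", "temperature": 21.5}',
    '{"sensor_id": "s2", "humidity": 40}',
    '{"sensor_id": "s3", "temperature": "19.0", "unit": "C"}',
]


def read_temperatures(lines):
    """Apply a structure only now, while reading: keep records with a temperature."""
    result = {}
    for line in lines:
        record = json.loads(line)
        if "temperature" in record:
            result[record["sensor_id"]] = float(record["temperature"])
    return result


temperatures = read_temperatures(raw_lines)
```

In the first half the database refuses the incomplete row; in the second half every record is accepted, and the query decides which fields matter and how to interpret them.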
Volume: Quantity of data to be stored (e.g., 100 GB to 100 TB).
Scaling up vs. scaling out.
Velocity: Speed at which data is entering the system.
Stream processing focuses on input processing and requires analysis of the data stream as it enters the system (in real time).
Feedback loop processing refers to the analysis of data to produce actionable results.
Variety: Variations in the structure of the data to be stored.
Unstructured data: data whose wide variations do not fit into a predefined data model.
E.g., maps, images, emails, texts, tweets, videos.
Need to quickly analyze massive amounts of data for instant results.
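Stream processing, mentioned under Velocity, can be sketched as a computation that updates its result each time a new value enters the system, instead of storing everything first and analysing it later. The function `rolling_mean`, the window size, and the sample readings are assumptions made for this illustration.

```python
from collections import deque


def rolling_mean(stream, window=3):
    """Yield the mean of the most recent `window` values as each value arrives."""
    recent = deque(maxlen=window)  # older values drop out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


# The iterator stands in for a live data source such as a sensor feed.
readings = iter([10, 20, 30, 40])
means = list(rolling_mean(readings, window=3))
```

Only the last few values are kept in memory, which is what makes this style workable when data arrives faster than it can be stored and queried in full.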
Any data that is clearly defined and can be stored, accessed, and processed in a fixed format can be described as structured data.
A good example is data stored in a table in a normalised database. You can easily search and retrieve the data from a table using SQL tools.
Unstructured data is simply data that is not structured, i.e., anything that cannot be described as structured data.
Examples include free text, videos, and images.
The ability to analyse social media such as Facebook, Twitter, and WeChat, as well as images, is among the key drivers behind the growth of Big Data.
E.g., chat messages during a live YouTube stream.
Semi-structured data sits between structured data and unstructured data.
E.g., the markup language XML, Electronic Data Interchange (EDI), and the open standard JSON (JavaScript Object Notation).
An XML document can have varying tags inside each conversation. It is not entirely unstructured, but also not strictly relational or rigidly defined.
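A small sketch of reading semi-structured XML with Python's standard library follows. The tag names (`conversation`, `sender`, `text`, `attachment`, `location`) and the sample values are hypothetical, chosen only to show conversations that carry different tags.

```python
import xml.etree.ElementTree as ET

# Two conversations with different child tags: there is some structure,
# but no fixed set of columns shared by every record.
document = """
<conversations>
  <conversation id="1">
    <sender>Ana</sender>
    <text>Hello</text>
  </conversation>
  <conversation id="2">
    <sender>Ben</sender>
    <attachment>photo.jpg</attachment>
    <location>Sydney</location>
  </conversation>
</conversations>
""".strip()

root = ET.fromstring(document)

records = []
for conversation in root.findall("conversation"):
    record = {"id": conversation.get("id")}
    for child in conversation:
        # Whatever tags happen to be present become the record's fields.
        record[child.tag] = child.text
    records.append(record)

# The union of fields shows why one rigid table definition fits poorly.
all_fields = sorted({field for record in records for field in record})
```

Each record ends up with its own set of fields, so forcing them into a single relational table would leave many columns empty or require a redesign whenever a new tag appears.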
Big Data has been redefined as involving any of the 3 Vs (but not necessarily all).
The current view of Big Data covers all portions of the three overlapping circles.
The impact of Big Data could be described as the next major revolution since the Agricultural Revolution and the Industrial Revolution. We can call it the Digital Revolution or the Big Data Revolution.
Today, large corporations, particularly large Chinese companies, already use Big Data, Artificial Intelligence, and Machine Learning extensively to drive their business strategies and gain competitiveness.
This award-winning documentary explains how Big Data has changed the way we work, shop, socialize, and live, covering both the benefits of Big Data and the rise of negative issues associated with it. Big Data is collected, stored, and used across a wide range of products and services.
Digitizing Ourselves (17:36 to 23:55)
Collection of personal data through devices like Apple Watch, Samsung Watch, and Fitbit
Pattern recognition algorithms are changing our society
Personal data can influence behaviour, e.g., tracking calories burned
Building a Global Brain (23:55 to 25:55) / Creating Intelligence System (25:55 to 28:50)
Data collected from various sources becomes part of a "Big Data" system
Data can be used to create a more proactive and responsive city, e.g., bus rerouting based on demand
City can be viewed as an intelligent entity responding to people’s needs
Targeting You (38:23 to 41:05)
Target's ability to identify pregnant customers is an example of Big Data's use in marketing
This practice is common in hotels, airlines, and gambling, where loyalty programs are used
Loyalty programs, originally for customer relationships, are now used to create customer profiles using Big Data
Search engines like Google generate revenue through advertising based on search data, with companies advertising online to reach customers who search for related terms
Data collectors holding our private data might not fully reveal their intentions.
One of the criticisms of Facebook is that it has been collecting data without fully revealing its intentions or how it would use the data once collected. It can build a profile of you as an individual.
The National Security Agency (NSA) has also been collecting data for many years.
Shift to First-Party Data
Requires stronger direct relationships with customers and offering value in exchange for their data.
This can limit the scope of big data to only the data that the company can collect itself.
Reduced Granularity of Tracking
Third-party cookies’ decline means a loss of granular, cross-site data, which is crucial for building detailed user profiles.
It is now much harder to target ads precisely and measure campaign effectiveness.
Contextual Advertising
Placing ads based on the content of the website or app being viewed, rather than the user's browsing history
Content-related data include keywords, topics and page categories
Changes in Data Analysis
New technologies are required to analyse less granular data
Growth in machine learning and AI to find patterns in less well-defined data
Recap:
Data Governance and Ethics
Database Access Control
Big Data basics
Next lecture:
Big Data Advanced