Week 8 Lecture Vocabulary

Ethics, Database Access Control & Big Data Basics

Introduction

  • Acknowledgement of Country: UNSW Business School acknowledges the Bidjigal and Gadigal people, the traditional custodians of the lands where each campus is located.

  • Acknowledges all Aboriginal and Torres Strait Islander Elders, past and present, and their communities.

  • Recognizes their ongoing leadership and contributions to business, education, and industry.

Review of Database Concepts

  • Completed Database Design Process.

  • Key aspects:

    • Data vs. Information.

    • Data stored in databases.

    • Database management system (DBMS).

    • Database design defines database structure.

Conceptual Model

  • Entity-Relationship Modeling technique.

  • Chen’s notation for high-level conceptual models; Crow’s Foot as the design standard.

  • Entity type and instance; attribute and value.

  • Relationship (Degree, Connectivity, Cardinality).

Logical Model

  • Converting Conceptual model to detailed Logical Model (using Crow’s Foot) ready for DB implementation.

  • Advanced topics:

    • Relationship strength.

    • Composite entity.

    • Relationship degree.

    • Supertype and Subtype.

    • Selecting Primary Key.

Relational Model

  • Convert ER model to a set of tables (relations) in the relational model.

  • Apply Normalization on the relations to remove any anomalies.

SQL Data Definition & Manipulation Language

  • Use Relational Model to implement the database by creating a table for each normalized relation.

  • Data Definition Language defines the tables.

  • Data Manipulation Language queries/updates the tables.

Readings

  • Chapter 14: Big Data and NoSQL.

  • Chapter 16: Database Administration and Security (sections 16-1 to 16-6).

Overview of Ethics and Data Governance

  • What is Privacy?

  • What does good data governance look like?

  • How can we recognize the ethical issues surrounding data collection, storage, and use?

  • How can we resolve the identified ethical issues using an ethical framework?

Data as Civic Rights

  • Data is not merely a commodity; it is linked to civic rights, personal autonomy, and dignity.

Data Privacy

  • The rights of individuals and organizations to determine access to data about themselves.

  • Ensuring data confidentiality, security, and protection of personal data and information.

  • Global privacy policies:

    • European Union: General Data Protection Regulation (GDPR).

    • Australia: Privacy Act 1988 (updated regularly).

    • Australian organizations’ privacy policies are on their websites.

Data Governance Model

  • A framework outlining roles, responsibilities, processes, and policies for managing and governing data.

  • Defines how data will be created, collected, stored, used, shared, protected, and disposed of.

  • Establishes guidelines and standards for data quality, accuracy, and compliance.

Benefits of Good Data Governance

  • Business Value:

    • Improved decision-making based on higher quality data.

    • Increased public trust through improved data management and transparency.

    • Increased competitiveness through improved customer satisfaction.

  • Risk Mitigation:

    • Reduction of risk and costs through better data management for regulatory compliance.

    • More robust consideration of ethical and privacy issues.

  • Efficiency:

    • Increased data sharing through improved trust and standardization.

    • Reduction in costs by improving resource and process efficiencies.

    • Reduction in duplication and waste created by information silos.

    • Reduction in time spent by employees in finding, acquiring, and processing data.

Ethics and Data Ethics

  • Ethics: moral principles that control or influence a person’s behavior.

  • Data privacy focuses on protecting personal private data and information.

  • Data ethics is relevant to all data use, regardless of privacy protection.

  • Principles of Data Ethics:

    • Ownership

    • Transparency

    • Privacy

    • Intention

    • Outcomes

Situational Factors Influencing Ethical Judgments

  • Individual factors influence the recognition of ethical issues, ethical judgment, and ethical intent.

  • Situational factors also play a role in ethical behavior.

Ethics Application Process

  • A person might believe they have done nothing wrong, but others may view it differently.

  • Example: Emotional contagion study by Cornell University and Facebook.

    • The experiment manipulated the extent to which people (N = 689,003) were exposed to emotional expressions in their News Feed.

    • In one condition, a share of the posts rated as positive was withheld from the News Feed; in the other, a share of the posts rated as negative was withheld.

  • Source: Kramer et al., "Experimental evidence of massive-scale emotional contagion through social networks."

Levels of Ethics

  • Systemic:

    • Human Rights

    • Social and Environmental Sustainability

    • Professional ethics

  • Organizational:

    • Corporate Responsibility

    • Codes of Conduct

    • Risk Management and Trust

  • Individual:

    • Personal values

    • Virtuous conduct

Professional Ethics in a Business Context

  • Assess:

    • Framing: Identify ethical issues, assumptions, biases, and stakeholders.

  • Values:

    • Personal and professional values and responsibilities.

  • Frameworks:

    • Relevant ethical frameworks (e.g., deontology, consequentialism).

Ethical Prompt Questions

  • Deontological: Is it the right thing to do? Is it violating anybody’s rights?

  • Consequential: Is anyone harmed - who and how?

  • Care ethics: Will it create good relationships?

  • Virtue ethics: Do I feel good about this action?

Ethical Decision-Making: the 7 Step Process

  1. Assess the situation: What are the facts?

  2. Assumptions & worldviews: How might they be challenged?

  3. Principles, Duties and Care needs

  4. Options, Outcomes and Consequences

  5. Character factors: What virtues are relevant?

  6. Comprehensive Assessment: What would be an ethical decision?

  7. Justify your Decision: How to explain to those favoring a less ethical pathway?

Data Control Language (DCL)

  • Grant: give user access privileges to a database.

  • Revoke: take back permissions from the user.

  • Deny: explicitly prevents a user from receiving a particular permission (not implemented in Oracle SQL).

Need-to-Know Basis

  • Principle of information security to minimize the risk of unauthorized disclosure.

  • Access to sensitive information is restricted to only those requiring it to perform specific tasks.

  • Example: patient medical records available only to doctors and nurses directly involved in their care.

DCL Commands: Grant

  • The grant statement is used to give user access privileges to a database

  • grant <privilege list> on <table name> to <user list> [WITH GRANT OPTION]

  • <user list>:

    • a user-id

    • public: the privilege is granted to all valid users in the database

  • [WITH GRANT OPTION]:

    • allows a user who is granted a privilege to pass the privilege on to other users

  • The grantor must already hold the privilege on the specified table (e.g., the table owner or the database administrator).

  • Example: GRANT UPDATE ON emp_view TO liud22;

  • Example: GRANT SELECT ON emp_view TO public;

Common Privileges in DCL

  • Select: allows read access to a relation (table).

  • Insert: allows insert access to a relation.

  • Update: allows update access to a relation; can be limited to specific columns using update(column-name).

  • Delete: allows deleting data from a relation.

  • All: short form for all allowable privileges.

  • Execute: applies only to procedures and functions (relevant to PL/SQL, Oracle's Procedural Language for SQL).

DCL Commands: Revoke

  • Used to revoke authorization, i.e., take back permissions from the user

  • REVOKE <privilege list> on <table name> from <user list>

  • Example: REVOKE SELECT ON emp FROM U1, U2, U3;
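
The GRANT and REVOKE rules above can be illustrated with a small toy model. This is only a sketch of the semantics described in these notes, not how any real DBMS stores privileges: the `ToyCatalog` class, the `dba` account, and the method names are all hypothetical, and cascading effects of REVOKE are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical in-memory privilege table (illustration only)."""
    # Maps (user, table, privilege) -> True if the user may pass the privilege on.
    grants: dict = field(default_factory=dict)

    def grant(self, grantor, privilege, table, user, with_grant_option=False):
        # The grantor must hold the privilege together with the right to pass it on.
        if not self.grants.get((grantor, table, privilege), False):
            raise PermissionError(f"{grantor} cannot grant {privilege} on {table}")
        self.grants[(user, table, privilege)] = with_grant_option

    def revoke(self, privilege, table, user):
        # Take the privilege back; revoking a privilege that is not held does nothing.
        self.grants.pop((user, table, privilege), None)

    def can(self, user, privilege, table):
        return (user, table, privilege) in self.grants


catalog = ToyCatalog()
# Assumption: the administrator "dba" starts out holding SELECT on emp_view.
catalog.grants[("dba", "emp_view", "SELECT")] = True

# GRANT SELECT ON emp_view TO liud22 WITH GRANT OPTION
catalog.grant("dba", "SELECT", "emp_view", "liud22", with_grant_option=True)
# liud22 may now pass the privilege on to U1
catalog.grant("liud22", "SELECT", "emp_view", "U1")
# REVOKE SELECT ON emp_view FROM U1
catalog.revoke("SELECT", "emp_view", "U1")

print(catalog.can("liud22", "SELECT", "emp_view"))  # True
print(catalog.can("U1", "SELECT", "emp_view"))      # False
```

In this sketch a user who received a privilege without the grant option can use it but cannot grant it further; the attempt raises `PermissionError`.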

Big Data

  • Large and complex sets of raw data (difficult or impossible to capture in ER models).

  • Also refers to the set of data analysis and predictive analytics techniques (e.g., data mining) applied to raw data to make sense of it, instead of organizing the data upfront into neat structures.

  • Much larger set of data sources (e.g., Internet search/browsing, mobile devices).

  • Much lower cost of storing data (e.g., the cost of hard disk drives has fallen substantially).

  • Growing interest in identifying patterns for business purposes (in all kinds of data).

  • Scaling out (adding more servers) instead of scaling up (upgrading a single server).
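
One common way to scale out is to spread records across several machines by hashing a key. The sketch below is a minimal illustration under stated assumptions: the `node_for` and `distribute` helpers and the `user-N` keys are hypothetical, and real systems typically use more elaborate partitioning schemes.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    """Pick a node by hashing the key; a stable hash keeps placement repeatable."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def distribute(keys, num_nodes):
    """Place every key on exactly one of `num_nodes` nodes."""
    nodes = [[] for _ in range(num_nodes)]
    for key in keys:
        nodes[node_for(key, num_nodes)].append(key)
    return nodes


keys = [f"user-{i}" for i in range(1000)]
three_nodes = distribute(keys, 3)
six_nodes = distribute(keys, 6)

# Adding nodes lowers the number of keys each node has to hold.
print([len(n) for n in three_nodes])
print([len(n) for n in six_nodes])
```

Scaling up would instead keep all 1000 keys on one machine and give that machine more disk, memory, or CPU.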

Relational Model vs. Big Data Model

  • Relational Model (structure/schema on write):

    • The data's structure (tables, columns, data types, relationships) is defined before the data is written.

  • Big Data Model (structure/schema on read):

    • The data's structure is not necessarily defined when the data is written.

    • Instead, the structure is applied when the data is read or queried.
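
The contrast between the two models can be sketched in a few lines. This is an illustration under assumptions, not a Big Data platform: SQLite stands in for a relational DBMS, a list of JSON strings stands in for raw stored data, and the `emp` table, the `mood` field, and `read_names` are made up for the example.

```python
import json
import sqlite3

# Schema on write: the table structure is declared before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO emp (emp_id, name) VALUES (?, ?)", (1, "Ana"))
try:
    # A row that does not match the declared structure is rejected at write time.
    conn.execute(
        "INSERT INTO emp (emp_id, name, mood) VALUES (?, ?, ?)", (2, "Bo", "happy")
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Schema on read: raw records are stored as-is, whatever fields they contain.
raw_lines = [
    '{"emp_id": 1, "name": "Ana"}',
    '{"emp_id": 2, "name": "Bo", "mood": "happy"}',
]


def read_names(lines):
    """Apply a structure (here: only the 'name' field) at the moment of reading."""
    return [json.loads(line).get("name") for line in lines]


print(rejected)               # True
print(read_names(raw_lines))  # ['Ana', 'Bo']
```

The extra `mood` field is an error for the relational table but is simply ignored, or picked up later by a different reader, in the schema-on-read case.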

3Vs of Big Data

  • Volume: Quantity of data to be stored (e.g., 100 GB to 100 TB).

    • Scaling up vs. scaling out.

  • Velocity: Speed at which data is entering the system.

    • Stream processing focuses on input processing and requires analysis of data stream as it enters the system (real-time).

    • Feedback loop processing refers to the analysis of data to produce actionable results.

  • Variety: Variations in the structure of the data to be stored.

    • Unstructured data: data whose wide variations do not fit into a predefined data model.

    • E.g., maps, images, emails, texts, tweets, videos.
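
Stream processing, mentioned under Velocity above, analyses each item as it arrives instead of waiting for the full data set. A minimal sketch, assuming a hypothetical stream of numeric sensor readings and a made-up `rolling_mean` helper:

```python
from collections import deque


def rolling_mean(stream, window=3):
    """Yield the mean of the last `window` readings as each new reading arrives."""
    recent = deque(maxlen=window)  # older readings fall out automatically
    for reading in stream:
        recent.append(reading)
        yield sum(recent) / len(recent)


readings = [10, 20, 30, 40]
means = list(rolling_mean(readings))
print(means)  # [10.0, 15.0, 20.0, 30.0]
```

Because the function is a generator, it holds only the last few readings in memory, which is what makes this style workable for data that never stops arriving.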

Feedback Loop Processing

  • Needs to analyze massive amounts of data quickly to deliver near-instant results.

Structured Data

  • Any clearly defined data that can be stored, accessed, and processed in a fixed format can be described as structured data.

  • A good example is data stored in a table in a normalised database. You can easily search and retrieve the data from a table using SQL tools.

Unstructured Data

  • Simply put, data that is not structured, i.e., anything that cannot be described as structured data.

  • Examples include free text, videos, images

  • The ability to analyse social media (such as Facebook, Twitter, and WeChat) and images is among the key drivers behind the growth of Big Data.

  • E.g., chat messages during a live YouTube stream

Semi-Structured Data

  • Semi-Structured data is in between Structured Data and Unstructured Data

  • E.g., XML (Extensible Markup Language), Electronic Data Interchange (EDI), and JSON (JavaScript Object Notation), an open standard.

  • Example: an XML document with varying tags inside each conversation. It is not entirely unstructured, but it is also not strictly relational or rigidly defined.
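
The same idea can be shown with JSON. In the sketch below the chat records and their field names are hypothetical: every record carries some structure (named fields), but the set of fields varies from record to record, so the data does not fit one fixed table layout.

```python
import json

# Hypothetical chat records: every record has "user"; the other fields vary.
records = json.loads("""
[
  {"user": "ana", "text": "hello"},
  {"user": "bo", "text": "hi", "emoji": "wave"},
  {"user": "cy", "image": {"url": "cat.png", "width": 640}}
]
""")


def field_names(items):
    """Collect every field name that appears in at least one record."""
    names = set()
    for item in items:
        names.update(item.keys())
    return sorted(names)


print(field_names(records))  # ['emoji', 'image', 'text', 'user']

# A reader must cope with fields that are missing from some records.
texts = [r.get("text", "<no text>") for r in records]
print(texts)  # ['hello', 'hi', '<no text>']
```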

Current View of Big Data

  • Big Data has been redefined as involving any of the 3Vs (but not necessarily all).

  • The current view of Big Data covers all portions of the three overlapping circles (Volume, Velocity, Variety).

The Human Face of Big Data

  • The impact of Big Data could be described as the next major revolution after the Agricultural Revolution and the Industrial Revolution. We can call it the Digital Revolution or the Big Data Revolution.

  • Large corporations, particularly large Chinese companies, already use Big Data, Artificial Intelligence, and Machine Learning extensively to drive their business strategies and gain a competitive advantage.

  • This award-winning documentary explains how Big Data has changed the way we work, shop, socialize, and live, covering both its benefits and the negative issues that have emerged. Big Data is collected, stored, and used across a wide range of products and services.

Summary of key points in the video

  • Digitizing Ourselves (17:36 to 23:55)

    • Collection of personal data through devices like Apple Watch, Samsung Watch, and Fitbit

    • Pattern recognition algorithms are changing our society

    • Personal data can influence behaviour, e.g., tracking calories burned

  • Building a Global Brain (23:55 to 25:55) / Creating Intelligence System (25:55 to 28:50)

    • Data collected from various sources becomes part of a "Big Data" system

    • Data can be used to create a more proactive and responsive city, e.g., bus rerouting based on demand

    • City can be viewed as an intelligent entity responding to people’s needs

  • Targeting You (38:23 to 41:05)

    • Example of Big Data's use in marketing: the retailer Target can identify pregnant customers.

    • This practice is common in hotels, airlines, and gambling, where loyalty programs are used

    • Loyalty programs, originally for customer relationships, are now used to create customer profiles using Big Data

    • Search engines like Google generate revenue through advertising based on search data, with companies advertising online to reach customers who search for related terms

The Dark Side

  • Data collectors holding our private data might not fully reveal their intentions.

  • One criticism of Facebook is that it has been collecting data without fully revealing its intentions or how it will use the data once collected. It can build a profile of you as an individual.

  • The National Security Agency (NSA) has also been collecting data for many years.

The move away from third-party cookies

  • Shift to First-Party Data

    • Requires stronger direct relationships with customers and offering value in exchange for their data.

    • This can limit the scope of big data to only the data that the company can collect itself.

  • Reduced Granularity of Tracking

    • Third-party cookies’ decline means a loss of granular, cross-site data, which is crucial for building detailed user profiles.

    • Much harder now to precisely target ads and measure campaign effectiveness.

  • Contextual Advertising

    • Placing ads based on the content of the website or app being viewed, rather than the user's browsing history

    • Content-related data include keywords, topics and page categories

  • Changes in Data Analysis

    • New technologies are required to analyse less granular data

    • Growth in machine learning and AI to find patterns in less well-defined data.
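
Contextual advertising, as described above, can be sketched as simple keyword matching against the page being viewed. Everything in this example is hypothetical (the `ADS` inventory, the keywords, and the `pick_ad` function); the point is only that the decision uses page content and no browsing history.

```python
import re

# Hypothetical ad inventory: each ad lists the keywords it wants to appear next to.
ADS = {
    "running-shoes": {"marathon", "running", "training"},
    "cookware": {"recipe", "baking", "oven"},
}


def pick_ad(page_text, ads=ADS):
    """Return the ad whose keywords overlap most with the page, or None."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    best_ad, best_score = None, 0
    for ad, keywords in ads.items():
        score = len(words & keywords)
        if score > best_score:
            best_ad, best_score = ad, score
    return best_ad


print(pick_ad("Training plan for your first marathon"))  # running-shoes
print(pick_ad("Quarterly earnings report"))              # None
```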

Recap and next lecture

  • Data Governance and Ethics

  • Database Access Control

  • Big Data basics

  • Next lecture: Big Data Advanced