
Chapter Six: Data and Business Intelligence - Detailed Notes

Section 6.1: Data, Information, and Databases

Data Quality

Data is ubiquitous within an organization, and employees must be capable of accessing and analyzing it across various levels, formats, and granularities to inform decision-making. Effective data collection, compilation, sorting, and analysis provide invaluable insights into an organization's performance.

Levels, Formats, and Granularities of Data

  • Levels: Can be individual, departmental, or enterprise-wide.
  • Formats: Include documents, presentations, spreadsheets, and databases.
  • Granularities: Vary from detailed (fine) to summary to aggregate (coarse).

For example, reports can be generated for each salesperson, product, and part (detailed), for all sales personnel, products, and parts (summary), or across departments, organizations, and companies (aggregate).
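The same records can be rolled up from fine to coarse granularity. Below is a minimal Python sketch; the record fields (`department`, `salesperson`, `product`, `amount`), the figures, and the choice of which grouping field stands for each level are all hypothetical.

```python
from collections import defaultdict

# Hypothetical detailed sales records: one row per individual sale.
sales = [
    {"department": "East", "salesperson": "Ana", "product": "Widget", "amount": 120.0},
    {"department": "East", "salesperson": "Ben", "product": "Widget", "amount": 80.0},
    {"department": "West", "salesperson": "Cy", "product": "Gadget", "amount": 200.0},
    {"department": "West", "salesperson": "Cy", "product": "Widget", "amount": 50.0},
]

def roll_up(records, key):
    """Sum sale amounts grouped by one field (a coarser view of the same data)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)

detailed = roll_up(sales, "salesperson")   # one figure per salesperson
summary = roll_up(sales, "product")        # each product, across all sales personnel
aggregate = roll_up(sales, "department")   # compared across departments
company_total = sum(r["amount"] for r in sales)

print(detailed)       # {'Ana': 120.0, 'Ben': 80.0, 'Cy': 250.0}
print(summary)        # {'Widget': 250.0, 'Gadget': 200.0}
print(aggregate)      # {'East': 200.0, 'West': 250.0}
print(company_total)  # 450.0
```

Nothing is lost at the detailed level; each coarser view is computed from it, which is why detailed data is usually kept even when most reports use summaries.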

Data Types: Transactional and Analytical

Data's value is determined by its type, timeliness, quality, and governance.

  • Transactional Data: Data within a single business process or unit of work, primarily used to support daily operational tasks.
  • Analytical Data: Encompasses all organizational data, used to support managerial analysis tasks.

Quantitative, Categorical, and Other Data Types

  • Quantitative Data: Includes integers, decimals, or floating-point numbers that represent quantities, measurements, or calculations. Examples include employee salaries, stock prices, and customer ages.
  • Categorical Data: Textual data made up of alphanumeric characters, used to store information such as names, addresses, descriptions, or comments.
  • Dates and Times: Databases store data related to dates and times, including timestamps, calendar dates, durations, or intervals.
  • Boolean Data: Represents binary values, typically "true" or "false," and is used for logical conditions or flags.
  • Images and Multimedia: Databases can also store binary data representing multimedia content, such as images, audio files, or videos.
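As an illustration, one hypothetical employee record can hold every type listed above. This sketch uses Python types as stand-ins for database column types; the field names are invented, and the matching SQL types (for example DECIMAL, VARCHAR, TIMESTAMP, BOOLEAN, BLOB) vary by database system.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from decimal import Decimal

@dataclass
class EmployeeRecord:
    # Quantitative: numbers representing quantities or measurements.
    age: int
    salary: Decimal            # Decimal avoids binary rounding errors for money
    # Categorical/textual: alphanumeric strings.
    name: str
    address: str
    # Dates and times: calendar dates, timestamps, and durations.
    hire_date: date
    last_login: datetime
    notice_period: timedelta
    # Boolean flag for a logical condition.
    is_active: bool
    # Binary multimedia content, such as the raw bytes of a photo.
    photo: bytes

emp = EmployeeRecord(
    age=34,
    salary=Decimal("72500.00"),
    name="Dana Lee",
    address="12 Main St, Springfield, IL 62701",
    hire_date=date(2020, 3, 1),
    last_login=datetime(2024, 5, 6, 9, 30),
    notice_period=timedelta(days=14),
    is_active=True,
    photo=b"\x89PNG\r\n\x1a\n",  # placeholder: only the PNG file signature
)
print(emp.notice_period.days)  # 14
```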

Data Timeliness

Timeliness depends on the situation. Real-time data is immediate, up-to-date data, which a real-time system provides in response to requests.

Data Quality

Business decisions are only as reliable as the quality of the data they're based on, and using technology to make bad decisions faster only compounds the damage. Data inconsistency occurs when the same data element has different values in different places. Data integrity issues arise when a system produces incorrect, inconsistent, or duplicate data.
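Data inconsistency can be detected by comparing the same data elements held in two places. A minimal sketch, assuming two hypothetical departmental copies of customer contact data keyed by a shared customer ID:

```python
# Hypothetical copies of the same customer data held by two departments.
billing = {
    "C001": {"email": "ana@example.com", "phone": "555-0100"},
    "C002": {"email": "ben@example.com", "phone": "555-0101"},
}
marketing = {
    "C001": {"email": "ana@example.com", "phone": "555-0100"},
    "C002": {"email": "ben.smith@example.com", "phone": "555-0101"},
}

def find_inconsistencies(source_a, source_b):
    """List (customer_id, field, value_a, value_b) wherever the same element differs."""
    issues = []
    for cust_id in sorted(source_a.keys() & source_b.keys()):
        a, b = source_a[cust_id], source_b[cust_id]
        for field in sorted(a.keys() & b.keys()):
            if a[field] != b[field]:
                issues.append((cust_id, field, a[field], b[field]))
    return issues

print(find_inconsistencies(billing, marketing))
# [('C002', 'email', 'ben@example.com', 'ben.smith@example.com')]
```

The check only reports the mismatch; deciding which value is correct still requires a business rule or a data steward.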

Characteristics of High-Quality Data

  • Accurate: Is the data correct? For example, is a name spelled correctly?
  • Complete: Is any value missing from the data? For example, does an address include street, city, state, and zip code?
  • Consistent: Is aggregated data in agreement with detailed data?
  • Timely: Is the data current with respect to business needs? Is data updated weekly, daily, or hourly?
  • Unique: Is each transaction and event represented only once in the data? Are there any duplicate customers?
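Several of these characteristics can be checked automatically. The sketch below covers completeness, uniqueness, timeliness, and consistency; accuracy is left out because it needs trusted reference data. The record layout, the duplicate-matching rule, and the three-day freshness window are hypothetical choices.

```python
from datetime import datetime, timedelta

REQUIRED_ADDRESS_FIELDS = ("street", "city", "state", "zip")  # hypothetical schema

customers = [
    {"id": 1, "name": "Ana Ruiz", "street": "1 Elm St", "city": "Austin",
     "state": "TX", "zip": "78701", "updated": datetime(2024, 5, 1)},
    {"id": 2, "name": "Ben Cole", "street": "9 Oak Ave", "city": "Denver",
     "state": "CO", "zip": "", "updated": datetime(2024, 5, 6)},
    {"id": 3, "name": "ana ruiz", "street": "1 Elm St", "city": "Austin",
     "state": "TX", "zip": "78701", "updated": datetime(2024, 5, 6)},
]

def incomplete(records):
    """Complete: flag records missing any required address field."""
    return [r["id"] for r in records if any(not r.get(f) for f in REQUIRED_ADDRESS_FIELDS)]

def duplicates(records):
    """Unique: flag records whose normalized name and street were already seen."""
    seen, dups = set(), []
    for r in records:
        key = (r["name"].strip().lower(), r["street"].strip().lower())
        if key in seen:
            dups.append(r["id"])
        seen.add(key)
    return dups

def stale(records, now, max_age=timedelta(days=3)):
    """Timely: flag records not updated within the required window."""
    return [r["id"] for r in records if now - r["updated"] > max_age]

def consistent(detail_amounts, reported_total):
    """Consistent: the aggregated figure agrees with the detailed figures."""
    return sum(detail_amounts) == reported_total

now = datetime(2024, 5, 7)
print(incomplete(customers))            # [2]
print(duplicates(customers))            # [3]
print(stale(customers, now))            # [1]
print(consistent([100, 250, 75], 425))  # True
```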

Examples of Low-Quality Data

Examples include missing names, incomplete addresses, potentially incorrect data, probable duplicate data, inaccurate email addresses, and incomplete phone numbers.

Costs of Using Low-Quality Data

Low-quality data often results from customers entering inaccurate data to protect their privacy, differing entry standards and formats, operator errors, and third-party data inconsistencies.

Potential Business Effects of Low-Quality Data

These include the inability to accurately track customers, to identify valuable customers or selling opportunities, and to build solid customer relationships, as well as marketing to nonexistent customers and difficulty tracking revenue.

Benefits of Good Data

High-quality data improves the chances of making a good decision, positively impacting an organization's bottom line. A data steward ensures data policies and procedures are implemented across the organization.

Increase Data Integrity (Quality)

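Data integrity is a measure of data quality. Integrity constraints are rules that help ensure data quality. They include relational integrity constraints, which enforce basic rules about the data (for example, an order cannot reference a nonexistent customer), and business-critical integrity constraints, which enforce rules specific to how the business operates.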
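Both kinds of constraint can be declared in a relational database so that bad rows are rejected at entry. A minimal sketch using Python's built-in `sqlite3` module: the foreign key stands in for a relational integrity constraint, and the 15-day return window is a hypothetical business-critical rule expressed as a CHECK constraint. Table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id            INTEGER PRIMARY KEY,
    -- Relational integrity: every order must point at an existing customer.
    customer_id         INTEGER NOT NULL REFERENCES customer(customer_id),
    -- Hypothetical business rule: returns accepted only within 15 days of delivery
    -- (NULL means the item was not returned).
    returned_after_days INTEGER CHECK (returned_after_days IS NULL OR returned_after_days <= 15)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ana Ruiz')")

def try_insert(order):
    """Insert an order and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", order)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert((10, 1, None)))   # accepted
print(try_insert((11, 99, None)))  # rejected: FOREIGN KEY constraint failed
print(try_insert((12, 1, 20)))     # rejected by the CHECK constraint
```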

Section 6.2: Business Intelligence

Learning Outcomes

  • Identify the advantages of using business intelligence to support managerial decision making.
  • Describe the roles and purposes of data warehouses and data marts in an organization.
  • Explain blockchain and its advantages over a centralized relational database.

Business Intelligence

Organizational data is often difficult to access and includes both structured data (databases) and unstructured data (voice mail, phone calls, text messages, and video clips).

Data Analysis Cycle

  • Ask the right questions.
  • Identify data sources.
  • Collect data.
  • Analyze data.
  • Visualize.
  • Craft your data story.
  • Communicate, influence, and persuade.

The Problem: Data Rich, Information Poor

Many organizations are data rich but information poor, struggling to turn business data into business intelligence.

The Solution: Data Aggregation

Improving the quality of business decisions directly impacts costs and revenue. Business Intelligence (BI) enables business users to receive reliable, consistent, understandable, and easily manipulated data for analysis.

BI Can Answer Tough Questions

For example:

  • Why are sales below target? Because we sold less in the Western region.
  • Why did we sell less in the West? Because sales of product X dropped.
  • Why did X sales drop? Because customer complaints increased.
  • Why did customer complaints increase? Because late deliveries went up 60 percent.

Data Mining Analysis Techniques

  • Collaborative filtering: Used in recommendation systems to provide personalized recommendations based on user similarity.
  • Recommendation engine: Analyzes customer purchases and website actions to recommend complementary products.
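User-based collaborative filtering can be sketched in a few lines: find users whose purchases overlap with the target user's, then suggest what they bought that the target has not. This sketch assumes hypothetical purchase data and uses cosine similarity over purchase sets, which is one common similarity measure among several.

```python
import math

# Hypothetical purchase history: user -> set of products bought.
purchases = {
    "ana": {"tent", "stove", "lantern"},
    "ben": {"tent", "stove", "sleeping bag"},
    "cy": {"novel", "lamp"},
}

def similarity(a, b):
    """Cosine similarity between two users' purchase sets (treated as 0/1 vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend(user, history, top_n=3):
    """Score unseen items by the summed similarity of the users who bought them."""
    scores = {}
    for other, items in history.items():
        if other == user:
            continue
        sim = similarity(history[user], items)
        if sim == 0:
            continue
        for item in items - history[user]:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

print(recommend("ana", purchases))  # ['sleeping bag']
print(recommend("cy", purchases))   # []
```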

Further Techniques Include

  • Estimation Analysis: Determines values for an unknown continuous variable's behavior or estimated future value.
  • Affinity Grouping Analysis: Reveals the relationship between variables along with the nature and frequency of the relationships.
  • Cluster Analysis: Divides an information set into mutually exclusive groups where members of each group are as close as possible to one another and the different groups are as far apart as possible.
  • Classification Analysis: Organizes data into categories or groups for its most effective and efficient use.
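Affinity grouping is often illustrated with market basket analysis. In the sketch below, "support" (how often two items appear together) captures the frequency of a relationship and "confidence" (how often buying one item goes with buying the other) captures its strength. These are standard association-rule measures chosen here for illustration; the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]

def pair_support(transactions):
    """Fraction of baskets in which each pair of items appears together."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}

def confidence(transactions, a, b):
    """Of the baskets containing item a, the fraction that also contain item b."""
    with_a = [t for t in transactions if a in t]
    return sum(b in t for t in with_a) / len(with_a) if with_a else 0.0

support = pair_support(baskets)
print(support[("bread", "milk")])           # 0.5
print(confidence(baskets, "milk", "bread"))  # 1.0
```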

Data Warehouse

Data warehouses extend the transformation of data into information. In the 1990s, executives focused on overall business functions rather than day-to-day operations. The data warehouse provided decision-making support without disrupting daily operations.

Definition

A logical collection of data, gathered from many different operational databases, that supports business analysis activities and decision-making tasks. Its primary purpose is to aggregate data throughout an organization into a single repository for decision-making purposes.

Reasons Business Analysis Is Difficult from Operational Systems

  • Inconsistent Data Definitions: Each department had its own method for recording data, leading to mismatches when information was shared.
  • Lack of Data Standards: Cross-functional analysis was difficult due to differences in granularities, formats, and levels.
  • Poor Data Quality: Data was often incorrect or incomplete, making it unreliable for decision-making.
  • Inadequate Data Usefulness: Collected data was not always useful for intended purposes.
  • Ineffective Direct Data Access: Users had to wait for MIS professionals to code SQL queries.

Data Aggregation

Collection of data from various sources for the purpose of data processing.

Extraction, Transformation, and Loading (ETL)

A process that extracts data from internal and external databases, transforms the data using a common set of enterprise definitions, and loads the data into a data warehouse.
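The three ETL steps can be sketched end to end. In this minimal Python example the two source systems, their formats, and the `fact_sales` table are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Extract: hypothetical rows from two operational systems with different conventions.
sales_system = [("C1", "ANA RUIZ", "2024-05-01", "120.50")]
web_store = [{"cust": "c2", "full_name": "ben cole", "date": "05/03/2024", "total_cents": 8000}]

def transform():
    """Map both sources onto one common (enterprise) definition of a sale."""
    rows = []
    for cust, name, iso_date, amount in sales_system:
        rows.append((cust.upper(), name.title(), iso_date, float(amount)))
    for rec in web_store:
        month, day, year = rec["date"].split("/")  # MM/DD/YYYY -> YYYY-MM-DD
        rows.append((rec["cust"].upper(), rec["full_name"].title(),
                     f"{year}-{month}-{day}", rec["total_cents"] / 100))
    return rows

def load(rows):
    """Load the conformed rows into the warehouse table."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE fact_sales "
                      "(customer_id TEXT, customer_name TEXT, sale_date TEXT, amount REAL)")
    warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)
    warehouse.commit()
    return warehouse

dw = load(transform())
print(dw.execute("SELECT * FROM fact_sales ORDER BY sale_date").fetchall())
# [('C1', 'Ana Ruiz', '2024-05-01', 120.5), ('C2', 'Ben Cole', '2024-05-03', 80.0)]
```

Most of the effort sits in the transform step, because that is where conflicting definitions and formats are reconciled.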

Data Mart

Contains a subset of data warehouse data.

Data Analysis

Data Cube

The common term for the representation of multidimensional data.
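A data cube can be pictured as cells addressed by one value per dimension. This sketch assumes three hypothetical dimensions (product, region, quarter) and shows how fixing some dimensions while summing over the rest yields different views of the same data.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, units_sold).
facts = [
    ("Widget", "East", "Q1", 10),
    ("Widget", "West", "Q1", 4),
    ("Gadget", "East", "Q2", 7),
    ("Widget", "East", "Q2", 6),
]

# Build the cube: each cell is keyed by one value per dimension.
cube = defaultdict(int)
for product, region, quarter, units in facts:
    cube[(product, region, quarter)] += units

def slice_total(cube, product=None, region=None, quarter=None):
    """Sum cells matching the fixed dimensions; None means 'all values' on that axis."""
    wanted = (product, region, quarter)
    return sum(v for key, v in cube.items()
               if all(w is None or w == k for w, k in zip(wanted, key)))

print(slice_total(cube, product="Widget"))             # 20
print(slice_total(cube, region="East", quarter="Q2"))  # 13
print(slice_total(cube))                               # 27
```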

Data Lake

A storage repository that holds a vast amount of raw data in its original format until the business needs it.

Data Cleansing or Scrubbing

Organizations must maintain high-quality data in the data warehouse.

Dirty Data

Erroneous or flawed data.

Data Cleansing or Scrubbing

A process that weeds out and fixes or discards inconsistent, incorrect, or incomplete data.

Dirty Data Problems

Include non-integrated data, inaccurate data, duplicate data, misleading data, violations of business rules, non-formatted data, and incorrect data.

Data Cleansing Example

Addresses contact data inconsistencies across billing, customer service, marketing, and sales departments.

Standardizing Customer Names

Customer names drawn from different source systems, such as Sales, Customer Service, and Billing, are standardized into one consistent format.
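A name-standardization step might look like the sketch below. The source data and the cleansing rules (dropping titles, reordering "Last, First", expanding "Corp") are hypothetical; a real project would apply the organization's agreed standards.

```python
# Hypothetical versions of the same customers held by three source systems.
source_names = {
    "Sales": ["Mr. John A. Smith", "ACME CORP."],
    "Customer Service": ["smith, john a", "Acme Corporation"],
    "Billing": ["JOHN  A SMITH", "Acme Corp"],
}

# Hypothetical cleansing rules.
TITLES = {"mr", "mrs", "ms", "dr"}
SUFFIXES = {"corp": "Corporation", "corporation": "Corporation", "inc": "Incorporated"}

def standardize(name):
    """Return one canonical spelling for a customer name."""
    name = name.strip()
    if "," in name:  # "Last, First Middle" -> "First Middle Last"
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    words = name.replace(".", "").lower().split()  # drop periods, collapse spaces
    words = [w for w in words if w not in TITLES]
    return " ".join(SUFFIXES.get(w, w.capitalize()) for w in words)

cleansed = {standardize(n) for names in source_names.values() for n in names}
print(sorted(cleansed))  # ['Acme Corporation', 'John A Smith']
```

Six differently spelled entries collapse into two customers, which is what makes cross-department reporting on those customers possible.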

Examples of Data Cleansing

Address missing records or attributes, redundant records, missing keys or other required data, erroneous relationships or references, and inaccurate or incomplete data.

Cost of Accurate and Complete Data

  • Completeness 100%/Accuracy 100%: Perfect data.
  • Completeness 100%/Low Accuracy: Complete but with known errors; pricey.
  • Low Completeness/Accuracy 100%: Very incomplete but accurate; may be a prototype only.

Data Visualization

Data artists use infographics to display patterns, relationships, and trends in a visual format.

Data Visualization Definition

Describes technologies that allow users to “see” or visualize data to transform data into a business perspective.

Data Visualization Tools

Move beyond Excel graphs and charts into sophisticated analysis techniques such as pie charts, controls, instruments, maps, time-series graphs, and more.

Business Intelligence Dashboards

Track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls allowing users to manipulate data for analysis.

Blockchain: Distributed Computing

Distributed Computing Definition

Processes and manages algorithms across many machines in a computing environment.

Ledger

Records classified and summarized transactional data.

Blockchain

A type of distributed ledger, consisting of blocks of data that maintain a permanent and tamper-proof record of transactional data.

Proof-of-Work

A requirement to define an expensive computer calculation, also called mining, that needs to be performed in order to create a new group of trustless transactions (blocks) on the distributed ledger or blockchain.

Proof-of-work has two primary goals:
  • To verify the legitimacy of a transaction and avoid so-called double-spending.
  • To create new units of digital currency by rewarding miners for performing that verification work.
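The "expensive calculation" can be illustrated by searching for a number (a nonce) that makes a block's hash start with a required number of zeros. This is a simplified sketch: real systems such as Bitcoin hash a block header and compare it against a numeric target, and the miner reward is not modeled. The block text and difficulty value are hypothetical.

```python
import hashlib

def mine(block_data: str, difficulty: int = 4):
    """Find a nonce so SHA-256(data + nonce) starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

def verify(block_data: str, nonce: int, difficulty: int = 4) -> bool:
    """Checking the work takes a single hash, far cheaper than finding it."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

nonce, digest = mine("Alice pays Bob 5")
print(nonce, digest)
print(verify("Alice pays Bob 5", nonce))  # True
```

The asymmetry is the point: finding the nonce takes many attempts on average, while anyone can confirm it with one hash.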

Centralized vs. Decentralized Ledgers

Dropbox exemplifies a centralized ledger, while blockchain represents a decentralized ledger.

Blockchain Structure

A blockchain is formed by linking together blocks, which are data structures containing a hash, the previous block's hash, and data.

  • Genesis Block: The first block created in the blockchain.
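  • Hash: A function that converts an input of letters and numbers into an output of a fixed length. Cryptographic hash functions are one-way, so the original input cannot practically be recovered from the output.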
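The block structure above (data, hash, previous hash, starting from a genesis block) can be sketched with SHA-256 from Python's `hashlib`. The `Block` class and helper functions are hypothetical names for illustration; the sketch shows why changing a recorded transaction is detectable.

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass
class Block:
    index: int
    data: str
    previous_hash: str
    block_hash: str = ""

    def compute_hash(self) -> str:
        """SHA-256 over the block's contents, including the previous block's hash."""
        payload = json.dumps([self.index, self.data, self.previous_hash])
        return hashlib.sha256(payload.encode()).hexdigest()

def build_chain(records):
    """Create a genesis block, then link one block per record."""
    genesis = Block(0, "genesis", previous_hash="0" * 64)
    genesis.block_hash = genesis.compute_hash()
    chain = [genesis]
    for i, record in enumerate(records, start=1):
        block = Block(i, record, previous_hash=chain[-1].block_hash)
        block.block_hash = block.compute_hash()
        chain.append(block)
    return chain

def is_valid(chain):
    """Each stored hash must match its contents and link to the previous block."""
    if chain[0].block_hash != chain[0].compute_hash():
        return False
    for prev, block in zip(chain, chain[1:]):
        if block.previous_hash != prev.block_hash or block.block_hash != block.compute_hash():
            return False
    return True

chain = build_chain(["Alice pays Bob 5", "Bob pays Carol 2"])
print(is_valid(chain))              # True
chain[1].data = "Alice pays Bob 500"  # tamper with a recorded transaction
print(is_valid(chain))              # False
print(len(chain[0].block_hash))     # 64
```

Because each block stores its predecessor's hash, rewriting one block would require recomputing every block after it.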

Proof-of-Stake

A way to validate transactions and achieve a distributed consensus.
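A core idea in proof-of-stake is that the participant chosen to validate the next block is picked with probability proportional to the amount of currency it has staked, rather than by computing power. This sketch models only that selection step with hypothetical validators; real protocols add further rules, such as penalties for validators that misbehave.

```python
import random

# Hypothetical validators and the amount of currency each has staked.
stakes = {"val_a": 50, "val_b": 30, "val_c": 20}

def choose_validator(stakes, rng):
    """Pick the next block validator with probability proportional to stake."""
    validators = list(stakes)
    return rng.choices(validators, weights=[stakes[v] for v in validators], k=1)[0]

rng = random.Random(42)  # seeded only so the demo is repeatable
picks = [choose_validator(stakes, rng) for _ in range(10_000)]
share = {v: picks.count(v) / len(picks) for v in stakes}
print(share)  # shares should land close to 0.5, 0.3, and 0.2
```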

Blockchain Advantages

  • Immutability
  • Digital Trust
  • Internet of Things Integration