FAE CORE Topic 3.2: Data Analytics and Emerging Technologies Study Guide
Emerging Technologies in Big Data and Data Analytics
Data Creation and Big Data Definition
Data is a byproduct of human interaction, from spoken words to device-based communication.
Big Data: Extremely large data sets analyzed computationally to reveal patterns, trends, and associations, specifically regarding human behavior and interactions.
Data Analytics: The science of examining raw data to draw conclusions about information. It is utilized by industries to make informed decisions and verify or disprove business models.
Distinction Between Big Data and Data Analytics
The terms are often used erroneously as synonyms.
Big Data refers to the scale and ownership of extremely large data sets.
Data Analytics refers to the process of transforming and analyzing these sets to produce usable information.
Ownership of data alone does not generate value; analytics are required to improve decision-making and outcomes.
The Dimensions of Data (the Four Vs)
Volume: The scale and mass quantity of data. What is considered high today (measured in petabytes, exabytes, or zettabytes) may be considered low tomorrow. Ireland's electricity consumption for data centers is currently a strategic issue.
Variety: Managing the complexity of structured, semi-structured, and unstructured data from internal and external sources (e.g., text, tweets, sensor data, video, facial recognition).
Velocity: The speed of data creation, processing, and analysis. It relates to the real-time nature of data (e.g., fraud detection, multi-channel marketing) and impacts latency (the lag between creation and access).
Veracity: The reliability and uncertainty of data. Some data is inherently unpredictable due to human sentiment, GPS sensor bounce in urban areas, weather, or economic factors.
Managing Uncertainty and Veracity
Data cleansing cannot correct all uncertainty, but the data still holds value.
Large datasets offer some protection: a small number of outliers or errors does not significantly affect insights in the way it might in auditing.
Data Fusion: Combining multiple less-reliable sources to create an accurate data point (e.g., social comments appended to geospatial data).
Advanced Mathematics: Using robust optimization and fuzzy logic to embrace uncertainty.
Example: Energy companies must forecast production from wind or solar despite weather uncertainty; in February, Ireland's renewable generation hit new demand highs with the help of such anticipatory models.
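The notes do not say how data fusion is carried out. One common approach is inverse-variance weighting, sketched below on the assumption that the sources are independent, unbiased, and have known error variances. The readings and variances are hypothetical values chosen for illustration.

```python
def fuse_estimates(estimates):
    """Fuse independent noisy estimates by inverse-variance weighting.

    `estimates` is a list of (value, variance) pairs.
    Returns (fused_value, fused_variance).
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    # Less reliable sources (larger variance) receive smaller weights.
    weights = [1.0 / variance for _, variance in estimates]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, estimates)) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance


# Hypothetical readings of the same position (metres along a street).
gps_reading = (105.0, 25.0)    # GPS affected by sensor bounce
wifi_reading = (98.0, 100.0)   # an even noisier second source

fused, fused_var = fuse_estimates([gps_reading, wifi_reading])
print(round(fused, 2), round(fused_var, 2))
```

Under these assumptions the fused variance is lower than that of either source alone, which is the sense in which several less-reliable sources can yield a more accurate data point.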
Emerging Trends and Driving Factors
Computing Power and Cloud Computing: Exponential growth in storage and power allows processing of entire datasets. Cloud computing (private or public) allows resource pooling, enabling businesses to access computing on a flexible, as-needed basis.
Software Advances: Programs like Apache Hadoop manage large datasets by splitting processing across many computers. Progress has been made in visualization tools and handling unstructured data (video/text).
New Sources of Data:
Internet clickstreams (searches, transactions).
Social media (status updates, likes, photos).
Mobile technology (location data).
Open Data (public sector, transport, financial data).
Internet of Things (IoT): Sensors in assets like cars, machines, and clothes.
Infrastructure for Knowledge Creation: Digital infrastructure enables crowdsourcing and open-source collaboration. Patterns can be found by data specialists (pattern spotting) or domain specialists (subject matter understanding).
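The idea of splitting processing across many computers, noted under Software Advances above, can be sketched as a map step on each chunk followed by a reduce step that merges the partial results. This is not Hadoop's API: threads stand in for separate machines, and the clickstream lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map step: count words within one chunk of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks roughly equal pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def word_count(lines, n_workers=3):
    chunks = split_into_chunks(lines, n_workers)
    # Each worker processes its own chunk independently.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Reduce step: merge the partial counts into one result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical clickstream log lines.
log_lines = ["search bread", "search milk", "buy bread", "buy milk", "search bread"]
totals = word_count(log_lines)
print(totals.most_common())
```

Because each chunk is counted independently, adding more workers does not change the result, only how the work is divided.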
Artificial Intelligence (AI) and Machine Learning
AI: A subfield of computer science focused on tasks that are easy for humans but hard for computers (planning, recognition, translation).
Machine Learning: A subset of AI where systems figure out "correct" actions from world information without explicit programming. It relies on parameters automatically learned from data.
Robotic Process Automation (RPA): Automating recurring business tasks to free up human time for high-value activity.
Blockchain: A sequential, potentially decentralized way to structure and organize data, offering new opportunities for extraction beyond cryptocurrency.
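The sequential structure described for blockchain can be illustrated with a minimal hash chain, in which each record stores the hash of the record before it. This sketch covers only the data structure; it assumes a single writer and leaves out decentralization, consensus, and digital signatures. The ledger entries are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Check that every block links to its predecessor and is unaltered."""
    prev_hash = GENESIS_HASH
    for position, block in enumerate(chain):
        if block["index"] != position or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, "invoice 1 issued")
append_block(ledger, "invoice 1 paid")
chain_ok_before = is_valid(ledger)

ledger[0]["data"] = "invoice 1 cancelled"  # tamper with an earlier record
chain_ok_after = is_valid(ledger)
print(chain_ok_before, chain_ok_after)
```

Changing an earlier record invalidates the chain because its stored hash no longer matches its contents, which is why the sequence is hard to alter unnoticed.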
Understanding Data: Ackoff D.I.K.W. Model
Data Quality Characteristics
Error-free.
Available at the right time.
Available at the right place.
Available to appropriate persons.
The D.I.K.W. Hierarchy (Ackoff Model)
Data: Simple, unstructured facts, figures, letters, or numbers (e.g., random numbers on a page).
Information: Data processed or structured in a meaningful way (e.g., sorting numbers into a list).
Knowledge: Observations or findings revealed through interrogation of information. It requires expert opinion, skills, and experience.
Wisdom: Deep understanding or insight derived from prolonged exposure to knowledge; used to achieve good long-term outcomes.
Practical Application: Turning Data into Information in Excel
Context: Importing an unstructured CSV (Comma-Separated Values) file where all the data resides in a single column.
Process: Use the "Text to Columns" function in Excel. Choose a delimiter (such as a comma or space) as the data separator to distribute the data into distinct columns.
Formatting: Centering titles, using bold text, and applying borders helps make information digestible. Adding filters, rank orders, or SUM/AVERAGE formulas provides insights.
Example: Grocery sales (Data) $\rightarrow$ Knowing when items are bought (Information) $\rightarrow$ Knowing bread and milk are bought together (Knowledge) $\rightarrow$ Strategic discounts for Saturday morning shoppers to move stock before expiry (Wisdom).
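For comparison, the same delimiter-splitting step can be sketched outside Excel. The example below uses Python's standard csv module on a few hypothetical grocery rows, then applies a filter and SUM/AVERAGE-style calculations; the column names and figures are assumptions made for illustration.

```python
import csv
import io
from statistics import mean

# Each string is one cell of the single imported column (hypothetical data).
raw_column = [
    "item,day,quantity",
    "bread,Saturday,12",
    "milk,Saturday,9",
    "bread,Monday,4",
]

# Split on the comma delimiter, using the first line as column headers.
reader = csv.DictReader(io.StringIO("\n".join(raw_column)))
rows = [{**row, "quantity": int(row["quantity"])} for row in reader]

# Equivalent of SUM and AVERAGE over the quantity column.
total_quantity = sum(row["quantity"] for row in rows)
average_quantity = mean(row["quantity"] for row in rows)

# Equivalent of applying a filter on the day column.
saturday_rows = [row for row in rows if row["day"] == "Saturday"]

print(total_quantity, round(average_quantity, 2), len(saturday_rows))
```

Once the single column has been split into named fields, the raw data becomes information that can be filtered, ranked, and summarized.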
Analytic Insights for Strategic Advantage
Evolution of Strategy and Data
Pre-internet: Firms used POS data, distribution sales, and market research from companies like Nielsen.
Today: Digital footprints allow companies to move from "guessing" to responding to customer feedback.
Lifetime Value of a Customer: A customer-centric approach. Example: Market revolt in car insurance pricing where new customers were rewarded more than existing ones, leading to negative switching costs.
Gartner's Analytic Ascendancy Model
Descriptive Analytics: Describes what happened (e.g., historical accounts).
Diagnostic Analytics: Seeks reasons for variance; explains why it happened (e.g., using KPIs to investigate account variances).
Predictive Analytics: Anticipates future outcomes; asks what will happen (e.g., budgeting and forecasting).
Prescriptive Analytics: Recommends actions to produce specific outcomes; asks how to make it happen (e.g., investment appraisal, what-if modeling).
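The four levels can be illustrated on one small data set. The sketch below uses hypothetical monthly sales and budget figures, a least-squares trend line as the predictive step, and a simple what-if comparison as the prescriptive step; the candidate actions, uplifts, and costs are all assumed values, not figures from the syllabus.

```python
from statistics import mean

monthly_sales = [100.0, 110.0, 120.0, 130.0]    # hypothetical actuals
monthly_budget = [100.0, 105.0, 125.0, 120.0]   # hypothetical budget

# Descriptive: what happened?
average_sales = mean(monthly_sales)

# Diagnostic: why? Find the month with the largest variance against budget.
variances = [actual - budget for actual, budget in zip(monthly_sales, monthly_budget)]
largest_variance_month = max(range(len(variances)), key=lambda i: abs(variances[i])) + 1


# Predictive: what will happen? Fit a least-squares trend line to months 1..n.
def fit_line(ys):
    xs = list(range(1, len(ys) + 1))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(monthly_sales)
forecast_month_5 = slope * 5 + intercept

# Prescriptive: how do we make it happen? Pick the cheapest what-if
# option whose assumed uplift reaches the target.
target = 150.0
options = {  # name: (assumed sales uplift, assumed cost)
    "no action": (0.0, 0.0),
    "email campaign": (6.0, 2.0),
    "weekend discount": (12.0, 5.0),
}
feasible = {name: cost for name, (uplift, cost) in options.items()
            if forecast_month_5 + uplift >= target}
recommended = min(feasible, key=feasible.get) if feasible else None

print(average_sales, largest_variance_month, forecast_month_5, recommended)
```

Each step builds on the previous one: the forecast depends on the described history, and the recommendation depends on the forecast.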
Principles of Data Analytics and Planning
Foundational Principles (People, Processes, Technology)
Business Needs: Data must align with strategic goals. A "Data Champion" (often a Chief Information Officer or CIO) represents the investment case.
Processes: Analyzing how data is gathered and whether its sources (internal/external) are effective.
Technology: Developing a scalable architecture. Decisions include on-premises vs. cloud-based warehouses, filling data gaps via purchase or estimation, and reporting formats (read-only PDF vs. interactive).
People & Democratization: Creating a flatter organization where employees are trained to interrogate data. Analyst alignment (Business Unit vs. IT) must be considered.
Communication (Storytelling): Using visualization to spot trends and outliers efficiently.
Data Governance: Ensures correct ownership, security, and quality.
Data Dictionary: A living document defining all end-user measures and dimensions to create a common taxonomy.
The Planning Process for Data Analysis
Step 1: Set the Objective: Define the business problem and create a hypothesis.
Step 2: Collecting Data:
First-party: Directly collected from customers (CRM, transaction tracking).
Second-party: First-party data of other organizations (purchased or obtained from partners).
Third-party: Aggregated big data from many sources (e.g., Gartner reports, government portals).
Step 3: Cleaning the Data: Data preparation tasks include removing duplicates/outliers, structure management, and filling gaps. A good analyst spends much of their time cleaning.
Step 4: Analyzing the Data: Applying Descriptive, Diagnostic, Predictive, or Prescriptive techniques.
Step 5: Report Findings: Presenting insights via interactive dashboards for stakeholder digestion.
Step 6: Accepting Feedback: An iterative loop where stakeholder questions may redefine the original objective.
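The cleaning step in the process above (removing duplicates and outliers, filling gaps) can be sketched as follows. The records are hypothetical, and the specific rules used (keep the first copy of a duplicate, flag outliers by median absolute deviation with a threshold of 5, fill gaps with the median) are illustrative assumptions rather than a prescribed method.

```python
from statistics import median

raw = [
    {"id": 1, "amount": 20.0},
    {"id": 2, "amount": 22.0},
    {"id": 2, "amount": 22.0},    # duplicate record
    {"id": 3, "amount": None},    # gap
    {"id": 4, "amount": 19.0},
    {"id": 5, "amount": 21.0},
    {"id": 6, "amount": 500.0},   # likely data-entry error
]

# 1. Remove duplicates by id, keeping the first occurrence.
seen, deduped = set(), []
for record in raw:
    if record["id"] not in seen:
        seen.add(record["id"])
        deduped.append(dict(record))

# 2. Remove outliers: values far from the median, measured in units of
#    the median absolute deviation (MAD). The threshold is an assumption.
known = [r["amount"] for r in deduped if r["amount"] is not None]
centre = median(known)
mad = median(abs(value - centre) for value in known)
THRESHOLD = 5.0


def is_outlier(value):
    return mad > 0 and abs(value - centre) > THRESHOLD * mad


kept = [r for r in deduped if r["amount"] is None or not is_outlier(r["amount"])]

# 3. Fill remaining gaps with the median of the retained values.
fill_value = median(r["amount"] for r in kept if r["amount"] is not None)
cleaned = [{**r, "amount": fill_value if r["amount"] is None else r["amount"]}
           for r in kept]

print(cleaned)
```

The order matters here: outliers are removed before gaps are filled so that the extreme value does not distort the fill value.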
The Analytics Lifecycle
Operational Steps:
Address the Question: Define the specific innovation or knowledge needed.
Prepare Data: Combine text, numbers, and images. This phase often consumes a large share of the available time.
Explore and Model: Use visualization and machine-learning algorithms to find patterns and predict outcomes.
Act on New Information: Adjust based on findings.
Evaluate Results: Check models against expected outcomes.
Present Findings: Refresh models periodically as factors like inflation change purchasing decisions.
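The evaluate-and-refresh part of the lifecycle can be sketched with a deliberately simple model. In the example below the "model" is just the average of past observations, the spending figures are hypothetical, and the tolerance that triggers a refresh is an assumed value.

```python
from statistics import mean


def fit_mean_model(history):
    """A deliberately simple model: predict the average of past values."""
    return mean(history)


def mean_absolute_error(prediction, actuals):
    """Average size of the gap between the prediction and what occurred."""
    return mean(abs(actual - prediction) for actual in actuals)


history = [50.0, 52.0, 48.0, 50.0]   # hypothetical weekly spend, before a price rise
new_actuals = [58.0, 60.0, 62.0]     # hypothetical weekly spend, after inflation

model = fit_mean_model(history)

# Evaluate results: check the model against observed outcomes.
error = mean_absolute_error(model, new_actuals)

# Refresh the model when the error exceeds an assumed tolerance.
TOLERANCE = 5.0
needs_refresh = error > TOLERANCE
refreshed_model = fit_mean_model(new_actuals) if needs_refresh else model

print(model, error, needs_refresh, refreshed_model)
```

A model fitted before a shift in purchasing behaviour keeps predicting the old level, so checking it against new outcomes is what signals that a refresh is due.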
Information Systems and Decision Support
Historical Evolution of Systems
Transaction Processing Systems (TPS): First large-scale electronic processing of day-to-day activities (deposits, ATMs).
Management Information Systems (MIS): Generated reports from historical data (cost trends, inventory).
Decision Support Systems (DSS): Provided internal and external reports; enabled PC-based democratization of info.
Executive Information Systems (EIS): Allowed department-level analysis of overall performance.
Enterprise Resource Planning (ERP): Integrated knowledge management and artificial intelligence for credit assessment or logistics.
Cloud Computing: Cross-functional integration where one input (receipt note) populates inventory, payables, and ordering.
ETL (Extract, Transform, Load)
Combines data from multiple sources into a single consistent store (Cloud Data Warehouse) for analysis.
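A minimal sketch of the three ETL stages follows. It uses an in-memory SQLite database as a stand-in for a cloud data warehouse, and the two sources (a CRM export and a list of sales records), their field names, and their values are hypothetical.

```python
import csv
import io
import sqlite3

# Extract: read from two hypothetical sources in different formats.
crm_csv = "customer_id,name\n1,Aoife\n2,Liam\n"
sales_records = [
    {"customer": "1", "amount_eur": "19.99"},
    {"customer": "2", "amount_eur": "5.00"},
    {"customer": "1", "amount_eur": "10.01"},
]
customers = list(csv.DictReader(io.StringIO(crm_csv)))

# Transform: enforce consistent types, names, and units (amounts in cents).
customer_rows = [(int(c["customer_id"]), c["name"].strip()) for c in customers]
sale_rows = [(int(s["customer"]), round(float(s["amount_eur"]) * 100))
             for s in sales_records]

# Load: write both sources into a single consistent store.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
warehouse.execute("CREATE TABLE sales (customer_id INTEGER, amount_cents INTEGER)")
warehouse.executemany("INSERT INTO customers VALUES (?, ?)", customer_rows)
warehouse.executemany("INSERT INTO sales VALUES (?, ?)", sale_rows)
warehouse.commit()

# Analysis can now join data that originally lived in separate systems.
totals = warehouse.execute(
    "SELECT c.name, SUM(s.amount_cents) "
    "FROM customers c JOIN sales s ON s.customer_id = c.customer_id "
    "GROUP BY c.customer_id, c.name ORDER BY c.name"
).fetchall()
print(totals)
```

The transform stage is where inconsistencies between sources (here, a differently named customer key and amounts stored as text) are resolved before loading.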
Cybersecurity risks
Reflected by events like the attack on Ireland's HSE (Health Service Executive). Systems must be robust to protect against unauthorized access/disruption.
Data Protection and Privacy
Privacy vs. Security
Privacy: Control over personal information collection and use. It is a consumer right to safeguard info.
Security: Defending information assets via technology against unauthorized access (Confidentiality, Integrity, Availability).
It is possible to have good security with poor privacy, but good privacy requires good security.
GDPR (General Data Protection Regulation)
Controllers: Organizations that collect data and decide its use.
Processors: Organizations that process data on behalf of controllers.
Subjects: The users being tracked.
Compliance Requirements:
Obtaining clear consent (no complex language).
Timely Breach Notification.
Right to Data Access (free electronic copy).
Right to be Forgotten (deletion after purpose is realized).
Data Portability (user can reuse data in different environments).
Privacy by Design (proper protocols from the start).
Data Protection Officers (DPO) for large organizations.
Anonymization Failures: Stripping personally identifiable info is hard. Netflix released anonymized movie rating data for a competition; researchers matched it with other public data to re-identify individuals.
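The re-identification risk can be illustrated with a toy linkage example. The sketch below joins a hypothetical "anonymized" ratings table to a hypothetical public review list on shared attributes (film, date, rating); all names and records are invented, and exact matching is a much simplified stand-in for the techniques researchers actually use.

```python
# "Anonymized" release: names replaced by opaque user codes (hypothetical data).
anonymized_ratings = [
    {"user": "u1", "film": "Film A", "date": "2005-03-01", "stars": 5},
    {"user": "u1", "film": "Film B", "date": "2005-03-04", "stars": 2},
    {"user": "u2", "film": "Film A", "date": "2005-06-10", "stars": 3},
]

# Separate public source where reviewers post under their own names (hypothetical data).
public_reviews = [
    {"name": "Jane Doe", "film": "Film A", "date": "2005-03-01", "stars": 5},
    {"name": "Jane Doe", "film": "Film B", "date": "2005-03-04", "stars": 2},
]


def link(anonymized, public, min_matches=2):
    """Link user codes to names when enough records agree on all attributes."""
    match_counts = {}
    for a in anonymized:
        for p in public:
            if (a["film"], a["date"], a["stars"]) == (p["film"], p["date"], p["stars"]):
                key = (a["user"], p["name"])
                match_counts[key] = match_counts.get(key, 0) + 1
    return {user: name for (user, name), count in match_counts.items()
            if count >= min_matches}


reidentified = link(anonymized_ratings, public_reviews)
print(reidentified)
```

No name appears in the released data, yet the combination of film, date, and rating acts as a fingerprint once a second data set is available.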
Problems Associated with Data Analysis
Volume Overload: Risk managers sifting through thousands of sets may focus on the easiest metrics rather than high-value ones.
Visualization Hurdles: Manual graph construction is time-consuming.
Disjointed Sources: Incomplete analysis due to data residing in separate systems.
Inaccessibility: Data must be available off-site and to stakeholders (solved via self-serve dashboards).
Poor Quality: Manual entry errors and asymmetrical data (outdated info in one system vs another).
Skill Shortage: Lack of analytical competency in hiring; can be mitigated by easy-to-use systems.
Security and Scaling: Hackers target big data stores; scaling analysis becomes complex as an organization grows, favoring cloud transitions.
Questions & Discussion
FAE Exam Question - CORE August Question
Prompt: Explain the analytics lifecycle and identify new ways to analyze existing data fields for commercial advantage in the context of KD.
Key Insight: Success required linking the model to specific case facts rather than just reciting the theory.
FAE ICD - Wayit Company Question
Prompt: Describe the steps in a suitable data analysis for local authority information requirements.
Key Insight: Discussed the ability of testing centers to provide data for self-serve dashboards and unexpected insights.
FAE Question ICD - Akorn Construction Question
Prompt: Assess the validity of data findings and other data analytics opportunities.
Key Insight: The analysis starts at the Descriptive and Diagnostic levels; students are encouraged to connect findings to the Strategic Management syllabus (2.3).