Ways Process Mining is Valuable
Identifying operational inefficiencies; violations of policies and standard operating procedures (SOPs); circumvention of internal controls (manual, automated, IT general controls); and good candidates for automation (standardize, optimize, automate)
Process Mining
technique designed to discover, monitor and improve real processes by extracting readily available knowledge from the event logs of information systems; Intersection of Business Process Management and Data Mining
3 Needs for Process Mining
Case ID, Activity Name, Timestamp
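These three fields are all an event log needs. A minimal sketch, using a hypothetical order-to-ship log and a helper `traces`, shows how grouping by Case ID and sorting by Timestamp recovers the path each case followed:

```python
from collections import defaultdict

# Hypothetical event log: each row carries the three required fields.
event_log = [
    ("C1", "Create Order",  "2024-01-02T09:00"),
    ("C1", "Approve Order", "2024-01-02T10:30"),
    ("C2", "Create Order",  "2024-01-02T09:15"),
    ("C1", "Ship Order",    "2024-01-03T08:00"),
    ("C2", "Ship Order",    "2024-01-03T11:00"),  # note: approval skipped
]

def traces(log):
    """Group events by Case ID and order them by Timestamp,
    yielding the activity sequence (trace) each case followed."""
    by_case = defaultdict(list)
    for case_id, activity, ts in log:
        by_case[case_id].append((ts, activity))
    return {c: [a for _, a in sorted(evts)] for c, evts in by_case.items()}

print(traces(event_log))
```

Traces like C2's (which skips the approval step) are exactly what discovery and conformance checking surface.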
3 types of PM
Discovery, Conformance, Enhancement
Discovery
uses event log data to create a process model without outside influence.
Conformance
checking confirms whether the intended process model is reflected in practice
Enhancement/Extensions
additional information is used to improve an existing process model.
PM vs BPM
PM is more data driven and brings a quantitative approach
Business Process Management (BPM) vs BPA
BPM is the broader concept that includes business process automation (BPA): BPM involves managing the entire process, while BPA focuses on automating tasks within those processes.
Challenges of PM
Data Quality (data might be distributed over multiple sources) and Concept Drift (processes change as they are analyzed)
Advantages of PM
1. Enhanced transparency
2. Simplified process analysis and enhanced efficiency
3. Data-driven decision making
4. Process optimization
5. Customer-centric process view
6. Process standardization
7. Better customer experience
Limitations of Process Mining
1. Data quality and availability
2. Inability to capture tasks
3. Integration hurdles
4. Concept drift
5. Complexity in large organizations
6. Potential resistance to change
Case Centric
cases (or process instances) are the core element of analysis; a case represents a single instance of a process, such as a customer order, an insurance claim, or a purchase order (single case, event log, analysis). The main interest is understanding the path individual instances take through a system.
Object Centric
multiple interconnected objects in a single process analysis; recognizes that real-world processes often involve multiple interacting objects (multiple-object focus, flexible data handling, holistic view). Example: complex logistics, where objects are tracked simultaneously across different process flows.
Intelligent Automation
adapts and refines automated processes based on feedback from process mining, using RPA and AI; the integration of AI and automation technologies to automate specific tasks or processes. Uses NLP, ML, and cognitive computing; involves automating repetitive, rule-based tasks to improve efficiency, accuracy, and productivity.
Why are manual journal entries (MJEs) the primary way management overrides controls
1. Discretionary Nature
2. Lack of Traceability
3. Opportunity to Bypass Controls
4. Potential for Collusion
5. Complexity and Volume
6. Targeting Specific Accounts
7. Timing
How can PM help with MJEs
1. Data Extraction and Preparation
2. Automated Process Discovery
3. Conformance Checking
4. Risk Assessment and Control Testing
5. Anomaly Detection
6. Historical Analysis and Trend Detection
7. Continuous Monitoring and Improvement
Identify Fraudulent Earnings Management
1. Pattern Recognition and Anomaly Detection
2. Threshold and Parameter Monitoring
3. Sequence Analysis and Conformance Checking
4. Cluster Analysis
5. Link Analysis
6. Trend Analysis Over time
7. Integration with other data sources
Digital Transformation
integration of digital technologies into all aspects of an organization; leveraging digital tools and tech to improve processes, enhance customer experiences, optimize operations, and drive innovation; adopting cloud computing, implementing data analytics, utilizing AI and ML, and employing Internet of Things devices. A holistic, strategy-driven change.
Key Characteristics of IA
task-specific automation, cognitive capabilities, limited scope
IA Components and Characteristics
1. AI
2. RPA
3. Cognitive automation
4. Process orchestration
5. Decision automation
Steps to Design and Implement IA
1. Identify Process to Automate
2. Analyze and Map Current Process
3. Select Right Automation Tools
4. Design the Automation Workflow
5. Develop and Test the Automation
6. Implement and Monitor
RPA
software robots that mimic human actions to perform tasks
AI
Machines that can perform tasks requiring human intelligence, like decision making and problem solving
ML
A subset of AI that allows machines to learn from data and improve over time
NLP
AI that understands and interprets human language
Optical Character Recognition (OCR)
tech that converts different types of docs into editable and searchable data
Key Factors Enabling Implementation of RPA
Clear business objectives, Process assessment and selection, Stakeholders buy-in, Robust IT infrastructure, Data availability and quality, RPA governance and strategy, Scalability and flexibility, Change management, Security and compliance, Continuous improvement, ROI analysis
Flow of Automation
Process Optimization → BPA → RPA → IA → Agentic AI → Agentic Automation
Business Process Automation
- End-to-end processes
- Embedded into systems
- High complexity; requires process redesign, integration, workflow
- Workflow engines, APIs, orchestration platforms
- Months to years; involves IT
- Strategic; affects organizational design and roles
___ : Do what I tell you ; ____: Understand what I want, figure out how to do it
RPA, Agentic
RPA
task specific
interacts with applications
low complexity
software bots
only weeks to implement
tactical
Agentic Automation
- Autonomous agents w/ reasoning, planning, and memory
- UiPath Autopilot, OpenAI
- Contextual reasoning via LLMs or SLMs
- Adaptive; learns from new data and context
- Engages through APIs, natural language, and orchestration layers
- Self-corrects or escalates intelligently
- Dynamic, goal-driven tasks
Business Risks of RPA
1. Implementation Challenges
2. Technical Issues
3. Security Concerns
4. Process Complexity
5. Lack of Flexibility and Adaptability
6. Dependency and Single Points of Failure
7. Workforce Concerns
8. Governance and Compliance
RPA Specific Risks
1. Cyber Risk
2. Unauthorized Change
3. Software Licensing
4. Access Risks
5. Incident
6. Security Breach Detection
RPA Risks and Control Framework
1. Incident Management & Business Risk
2. Regulatory Compliance
3. Identity and Access Management
4. Secured Business Process
5. License Compliance
6. Data Leakage and Privacy
7. Cyber Security
Stages of RPA Implementation
1. Opportunity Identification & Strategy
2. Solution Design
3. Configure and Test
4. Deploy
5. Operate and Maintain
6. Retirement
Agentic AI
AI systems that can reason, plan, and act autonomously toward a goal using LLMs, memory, and contextual awareness; they think and operate like digital colleagues: execute tasks without continuous user prompting, use LLMs, retain context across sessions, invoke APIs, RPA, and databases, and can work with humans or other agents
Intelligent Automation Characteristics
- Tasks & process efficiency
- RPA + AI/ML
- Structured & semi-structured data
- Rule-based triggers
- Reactive (predefined rules)
- Within bot scripts
Agentic Characteristics
- Cognitive reasoning & autonomy
- LLMs + tools + memory
- Unstructured & contextual
- Conversational or autonomous
- Proactive (reasoning & planning)
- Within agent logic
Agentic Automation Characteristics
- Orchestrated, autonomous execution
- Agentic AI + RPA + orchestration engine
- All data types
- Human-in-the-loop orchestration
- Coordinated, goal-seeking
- Enterprise orchestration layer
Major Enabling Technologies for Digital Transformation
Cloud Computing, AI, Internet of Things, Big Data Analytics, RPA , Blockchain, Augmented Reality and Virtual Reality, Cyber Solutions, Edge Computing, Mobile Technologies
Flow of AI
AI → ML → Neural Networks → Deep Learning → Generative AI
Weak AI
AI trained and focused to perform specific tasks
Strong AI
Made of artificial general intelligence and artificial super intelligence; a theoretical form of AI where a machine can equal human intelligence
ML vs Deep Learning
both use neural networks, but of different types
deep: uses deep neural networks with many hidden layers, can ingest unstructured data, and requires little human intervention to learn
How Machine Learning Works
Decision Process: based on input data, produces an estimate about a pattern
Error Function: evaluates the prediction of the model
Model Optimization Process: if the model can fit better to the data points in the training set, then weights are adjusted to reduce the discrepancy between the known example and the model estimate. The algorithm will repeat this iterative “evaluate and optimize” process, updating weights autonomously until a threshold of accuracy has been met.
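The evaluate-and-optimize loop above can be sketched as a toy gradient-descent fit of a single weight (illustrative data and learning rate, not any particular library's training loop):

```python
# Toy "evaluate and optimize" loop: fit y = w * x by gradient descent.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relationship: w = 2.0

def mse(w):
    """Error function: mean squared error of the current estimate."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

w, lr = 0.0, 0.05
for _ in range(200):                        # iterative optimization
    # Decision process: w * x is the model's estimate for each input.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad                          # adjust weight to reduce the error

print(round(w, 3), round(mse(w), 8))
```

Each pass evaluates the error function, then nudges the weight in the direction that reduces it, exactly the cycle the card describes.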
Deep ML
can use labeled or unstructured data, eliminates some human intervention
Classical or Non-Deep ML
dependent on human intervention to learn
Neural Networks or Artificial Neural Networks
-comprised of node layers, containing an input layer, one or more hidden layers, and an output layer. Each node connects to another and has an associated weight and threshold. If the output of any node is above threshold value, the node is activated and the data is sent to the next layer. “Deep” refers to the number of layers in a neural network
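A minimal sketch of the threshold-activation idea, with made-up weights and thresholds (real networks use differentiable activations and learned weights):

```python
def neuron(inputs, weights, threshold):
    """A node fires (outputs 1) only when its weighted sum exceeds
    the threshold; otherwise it passes nothing to the next layer."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# Two-layer sketch: a hidden layer of two nodes feeding one output node.
x = [1.0, 0.5]
hidden = [neuron(x, [0.6, 0.4], 0.5),   # 0.6 + 0.2 = 0.8 > 0.5 -> fires
          neuron(x, [0.1, 0.1], 0.5)]   # 0.15 <= 0.5 -> stays silent
output = neuron(hidden, [1.0, 1.0], 0.5)
print(hidden, output)
```

"Deep" would just mean stacking more of these hidden layers between input and output.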
3 Primary Categories of Machine Learning Methods
Supervised
Unsupervised
Semi-Supervised
Supervised ML
use of labeled datasets to train algorithms to classify data or predict outcomes accurately. As input data is fed into the model, the model adjusts its weights until it has been fitted appropriately. This occurs as part of the cross validation process to ensure that the model avoids overfitting or underfitting. Helps organizations solve a variety of real-world problems at scale
Reinforcement ML
is similar to supervised learning, but the algorithm isn't trained on sample data; instead, it learns through trial and error
Unsupervised Machine Learning
uses machine learning algorithms to analyze and cluster unlabeled datasets (into subsets called clusters). These algorithms discover hidden patterns or data groupings without the need for human intervention. Discovers similarities and differences in information, making it ideal for exploratory data analysis
Semi-Supervised Learning
a middle ground between the two; uses a smaller labeled data set to guide classification and feature extraction from a larger, unlabeled data set. Can solve the problem of not having enough labeled data for a supervised learning algorithm
Common Machine Learning Algorithms
Neural Networks: simulate the way the human brain works, with a huge number of linked processing nodes
Linear Regression: used to predict numerical values, based on a linear relationship between different values
Logistic Regression: makes predictions for categorical response variables (supervised learning)
Clustering: identify patterns in data so that it can be grouped (unsupervised)
Decision trees: used for prediction numerical values and classifying data into categories, uses branching sequence
Random forest: predicts a value or category by combining the results from a number of decision trees
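As an illustration of the clustering entry above, a minimal 1-D k-means sketch (toy data and naive initialization; real work would use a library such as scikit-learn):

```python
def kmeans_1d(points, k=2, iters=10):
    """Minimal 1-D k-means: assign each point to its nearest center,
    then move each center to the mean of its assigned cluster."""
    centers = points[:k]                     # naive init: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Two obvious groups around 1 and 10 emerge without any labels.
print(kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.5, 9.5]))
```

The same assign-then-update loop, generalized to vectors, is what library implementations run.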
Capabilities of AI and Machine Learning
- Predictive analytics - predict trends and behavioral patterns by discovering cause-and-effect relationships in data.
- Recommendation engines - companies use data analysis to recommend products that someone might be interested in
- Speech recognition and natural language understanding - enables a computer system to identify words in spoken language, and natural language understanding recognizes meaning in written or spoken language.
- Image and video processing - recognize faces, objects, and actions in images and videos, and implement functionalities
- Sentiment analysis - identify and categorize positive, neutral, and negative attitudes that are expressed in text
5 Pillars: Risks of AI
1. Explainability: Transparent and intelligible, meaning it can clearly explain, in non-technical terms, how and why it arrived at a particular conclusion, especially when its recommendations have significant implications for individuals.
2. Fairness: Promote equitable treatment for all individuals and groups. Actively guarding against bias in the algorithm and training data, and ensuring an inclusive design and development process.
3. Robustness: AI-powered systems must be secure and resilient. They must be actively defended against malicious actions like adversarial attacks or data poisoning, and effectively handle abnormal conditions without causing unintentional harm
4. Transparency: Users must be able to see and evaluate how the service works, including its strengths and limitations. This involves clear disclosure about what data is collected, how it is used and stored, who has access to it, and the data and training used to create the AI model
5. Privacy: Prioritize and safeguard consumers' data rights. This requires full disclosure, collecting only the minimum necessary data for an explicit purpose, and providing users with the ability to choose and control how their personal data is collected and used.
Data Mining
Focuses on structured data; utilizes traditional statistical and ML techniques; involves cleaning, transforming, and normalizing structured data; discovers patterns within datasets; decision trees, k-means clustering, support vector machines
Text Mining
analyzes unstructured data, generally in natural language that lacks predefined structure; leverages NLP to understand and extract data; text-specific preprocessing like tokenization, removing stop words, and transforming text into numbers; extracts meaningful information from text; NLP models
Similarities: analytical process, purpose, use of ML, use of feature engineering, handling large volumes of data
Techniques of Text Mining/Analysis
1. Text Classification: Categorizing text documents into predefined classes or categories based on their content.
2. Sentiment Analysis: Determining the sentiment or subjective tone of a piece of text, such as positive, negative, or neutral
3. Named Entity Recognition (NER): Identifying and extracting named entities from text, such as person names, organization names, locations, dates, or other predefined categories. NER is often used in information extraction and knowledge graph construction.
4. Topic Modeling: Discovering latent topics or themes present in a collection of text documents
5. Text Clustering: Grouping similar documents together based on their content.
6. Text Summarization: Generating concise summaries of longer documents or texts through extractive techniques
7. Text Extraction and Info Retrieval: Extracting specific information from text documents, focuses on relevant documents
8. Text Similarity and Document Comparison: Measuring the similarity between text or comparing entire documents
9. Text Co-Occurrence Analysis: Analyzing patterns of word co-occurrence in a text corpus to gain insights into relationships between words, concepts, or entities
5 Most Common Text Mining Sentiment Approaches
1. Lexicon-Based Approaches: Lexicon-based methods utilize sentiment lexicons or dictionaries that contain words or phrases associated with sentiment and their corresponding polarity (positive, negative, or neutral)
2. Machine Learning-Based Approaches: Machine learning techniques involve training models on labeled data to automatically predict sentiment.
3. Rule-Based Approaches: Rule-based techniques involve creating a set of predefined rules or patterns to identify sentiment in text
4. Aspect-Based Sentiment Analysis: Aspect-based sentiment analysis focuses on identifying sentiment towards specific aspects or entities mentioned in the text. It involves extracting aspects or features of interest and then determining sentiment associated with each aspect separately.
5. Deep Learning Approaches: learn hierarchical representations of text and capture contextual information effectively
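The lexicon-based approach (item 1) can be sketched with a tiny hypothetical lexicon; real systems use curated sentiment dictionaries with much richer scoring:

```python
# Tiny illustrative lexicon (hypothetical scores, not a real resource).
LEXICON = {"great": 1, "good": 1, "happy": 1,
           "bad": -1, "poor": -1, "terrible": -1}

def polarity(text):
    """Lexicon-based scoring: sum per-word sentiment scores, then map
    the total to positive / negative / neutral."""
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("great product good support"))   # positive
print(polarity("terrible and poor packaging"))  # negative
```

Words missing from the lexicon simply score zero, which is why lexicon coverage drives accuracy for this approach.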
Polarity Analysis (Sentiment polarity classification)
- Involves determining the sentiment or subjective polarity expressed in a piece of text, typically classified as positive, negative, or neutral
- Goal is to automatically classify text documents, sentences, or phrases into these sentiment categories
- Provides classification and can provide insights into the overall opinion, attitude, or emotional tone conveyed by the text
- Plays a crucial role in various applications, such as analyzing customer feedback, social media sentiment monitoring, brand reputation management, market research, and more
Data Preparation
The text data is preprocessed to remove noise and irrelevant information
Training Data Labeling:
A labeled dataset is prepared, where each text instance is associated with its corresponding sentiment polarity label (positive, negative, or neutral)
Model Training
Machine learning algorithms or other classification techniques are employed to train a sentiment polarity classification model. The model learns patterns and relationships between the extracted features and their associated sentiment labels from the training data.
Testing and Evaluation
The trained model is evaluated using a separate set of test data to measure its performance in classifying sentiment polarity accurately.
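The labeling, training, and evaluation steps above can be sketched end-to-end with a toy word-count classifier (hypothetical data; a real pipeline would use proper features and cross-validation):

```python
from collections import Counter

# Data preparation + labeling: each text carries its polarity label.
train = [("love this great phone", "positive"),
         ("great battery happy buy", "positive"),
         ("terrible screen bad phone", "negative"),
         ("bad support poor value", "negative")]
test = [("great phone happy", "positive"),
        ("poor battery terrible", "negative")]

# Model training: count how often each word appears under each label.
counts = {"positive": Counter(), "negative": Counter()}
for text, label in train:
    counts[label].update(text.split())

def predict(text):
    """Score each label by summed training counts; pick the higher."""
    scores = {lbl: sum(c[w] for w in text.split()) for lbl, c in counts.items()}
    return max(scores, key=scores.get)

# Testing and evaluation: accuracy on held-out examples.
accuracy = sum(predict(t) == y for t, y in test) / len(test)
print(accuracy)
```

The held-out test set is what keeps the accuracy estimate honest; evaluating on the training data would overstate performance.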
Emotional Rating Analysis (Emotion Analysis/Emotion Detection)
- Subfield of sentiment analysis that focuses on identifying and categorizing emotions expressed in text data. While sentiment analysis typically categorizes text into positive, negative, or neutral sentiment, emotion analysis goes a step further by capturing the specific emotions conveyed in the text.
- Emotional rating analysis aims to recognize and assign emotional labels to text based on the underlying emotional content. Emotions can
include happiness, sadness, anger, fear, surprise, disgust, and more
Data Processing
text data is preprocessed to remove noise, handle punctuation, convert to lowercase, and tokenize the text into individual words or phrases
Emotion Lexicons or Datasets
Contain words or phrases associated with specific emotions
Feature Extraction
Relevant features are extracted from the text to represent its emotional content. These features can include words, n-grams, or other linguistic properties associated with emotions.
Emotion Classification Model
learns patterns and relationships between the extracted features and their associated emotions from labeled emotion datasets
Emotion Prediction
Once the model is trained and evaluated, it can be used to predict the emotions conveyed in new or unseen text data. The model applies the learned classification rules to assign emotional labels to the input text
Use Cases for Text Mining & Analytics & Techniques for Substantive Audit Testing
1. Expenses Classification and Categorization: Topic Modeling
2. Contractual Compliance: Rule-Based Extraction
3. Loan Covenant Monitoring: Named Entity Recognition (NER)
4. Tax Disclosure Analysis: Sentiment Analysis
5. Inventory Valuation: Keyword Extraction
6. Legal Proceedings and Contingencies: Text Classification
7. Revenue Recognition Confirmation: Text Pattern Matching
8. Related Party Transactions: Network Analysis
9. Non-GAAP Financial Measures: NLP-Based Reconciliation
10. Management’s Discussion and Analysis: Topic Analysis
Text Mining & Analytics for Auditing MJEs
1. Risk Assessment: Text Classification
2. Anomaly Detection: Outlier Detection
3. Narrative Analysis: Text Mining for Descriptions
4. Trend Analysis: Time-Series Analysis
5. Segregation of Duties: Network Analysis
6. Materiality Assessment: Content Analysis
7. Matching with Supporting Documentation: Document Matching
8. Consistency Check: Semantic Analysis
9. Entity Relationship Analysis: Relationship Extraction
10. Audit Trail Verification: Text Pattern Matching
Collocation
words commonly appearing near each other
Concordance
the instances and contexts of a given word or set of words
N-Grams
common two, three, word phrases
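A minimal sketch of n-gram extraction; the same sliding window underlies collocation counts:

```python
def ngrams(text, n):
    """Return every run of n consecutive words in the text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "internal audit found internal audit issues"
print(ngrams(sentence, 2))
# Repeated bigrams like "internal audit" are candidate collocations.
```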
5 Tools/Techniques for Text Analytics
1. Sentiment Analysis: analyzing the opinion or tone of what people are saying about a particular topic or issue
a. Polarity, categorization, emotional rating analysis, Affective Norms for English Words (ranks words in terms of pleasure, arousal, dominance), WordNet (synonyms/ antonyms)
2. Topic Modeling: Identifying dominant themes in a vast array of documents and for dealing with a large corpus of text
a. Latent Dirichlet Allocation (words clustered into topics) and Probabilistic Latent Semantic Indexing (models co-occurrence data using probability)
3. Term Frequency-Inverse Document Frequency Analysis: examines how frequently a word appears in a document and its importance relative to a whole set of documents
4. Named Entity Recognition – examines text to identify nouns by analyzing the words surrounding them
5. Event Extraction – a step further than NER but more complicated; looks at nouns in context but also the relationship between the nouns and the kinds of inferences that can be made from incidents referred to in the text
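Item 3's TF-IDF score can be computed directly; a minimal sketch over a toy three-document corpus, using one common idf variant, log(N/df):

```python
import math

# Toy corpus: each document is a list of tokens.
docs = [["audit", "risk", "audit"],
        ["risk", "control"],
        ["control", "test", "audit"]]

def tf_idf(term, doc, corpus):
    """tf = relative frequency within the document;
    idf = log(N / number of documents containing the term)."""
    tf = doc.count(term) / len(doc)
    df = sum(term in d for d in corpus)
    return tf * math.log(len(corpus) / df)

print(round(tf_idf("audit", docs[0], docs), 4))  # frequent here, but common overall
print(round(tf_idf("test", docs[2], docs), 4))   # rare term earns a higher idf
```

Libraries differ in smoothing and normalization details, but this tf × idf product is the core of all of them.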
The 7 Practice Areas
1. Search and Information Retrieval (IR) - Storage and retrieval of text documents, including search engines and keyword search
2. Document Clustering – Grouping and categorizing terms, snippets, paragraphs or docs using data mining clustering methods
3. Document Classification – Grouping and categorizing snippets, paragraphs, or docs using data mining classification methods based on models trained on labeled examples
4. Web Mining – Data and text mining on the internet with a specific focus on the scale and interconnectedness of the web
5. Information Extraction – Identification and extraction of relevant facts and relationships from unstructured text; the process of making structured data from unstructured and semi-structured text
6. NLP – low-level language processing and understanding tasks; often used synonymously w/ computational linguistics
7. Concept Extraction – Grouping of words and phrases into semantically similar groups
Generative AI
category of AI techniques that involve generating new and original content; works by learning patterns and structures from a given dataset during a training phase. Models learn to capture the underlying distribution of the training data, enabling them to generate new samples that resemble the original data
How does Generative AI Work?
1. Generator: neural network that takes random input and transforms it into synthetic data
2. Discriminator: neural network that learns to distinguish between real and synthetic data
3. Adversarial Training: provides feedback to the generator, indicating how well the discriminator can distinguish between real and fake samples
4. Iterative Process: improves its abilities to generate realistic samples, while the discriminator gets better at distinguishing real from synthetic samples
- Lacks true understanding and consciousness; it mimics patterns from the training data rather than having an inherent understanding of the content it generates
Prompt Engineering
refers to the practice of carefully crafting and optimizing prompts to guide generative AI models in generating more accurate, coherent, and contextually relevant responses
Components of Prompt Engineering
- Initial Query: The primary question or instruction that outlines what is sought from the model.
- Contextual Information: Additional information or constraints that shape the kind of output that should be generated.
- Explicit Instructions: Additional guidelines to inform the model on how the answer should be structured or formatted.
Techniques of Prompt Engineering
- Prompt Variations: Changing the phrasing, tone, or complexity of a prompt to get different results.
- Prompt Concatenation: Adding multiple queries in a single prompt to guide the model in generating a multi-faceted answer
- Prompt Fine-tuning: Using trial and error to iteratively improve the prompt for more accurate responses
Retrieval-Augmented Generation:
AI framework for retrieving facts from an external knowledge base to ground large language models on the most accurate, up-to-date information and to give users insight into the LLM's generative process
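A minimal sketch of the retrieval half of RAG, using naive word-overlap ranking over a made-up knowledge base (the facts are placeholders; real systems rank with vector embeddings and send the built prompt to an LLM):

```python
# Hypothetical knowledge base of retrievable passages.
knowledge_base = [
    "Fiscal 2024 revenue was $4.2B, up 6% year over year.",
    "The audit committee met five times in fiscal 2024.",
    "Office recycling program expanded to all sites.",
]

def retrieve(query, docs, k=2):
    """Rank passages by shared words with the query; keep the top k."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query):
    """Splice the retrieved passages into the prompt as grounding context."""
    context = "\n".join(retrieve(query, knowledge_base))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("what was fiscal 2024 revenue"))
```

Because the model is told to answer only from the retrieved context, the retrieval step is what grounds the generation in current facts.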
Chain of Thought
mirrors human reasoning, facilitating systematic problem solving through a coherent series of logical deductions
Reason and Act
method designed to enhance interaction with AI models, particularly in tasks where reasoning and subsequent actions are required
Directional Stimulus Prompting
guides generative AI models toward producing specific types of responses or focusing on certain aspects of a problem by providing a direct or indirect stimulus
RAG Equation
RAG = Fact Retrieval + Reasoning Synthesis; COT/ReAct/DSP = Reasoning Strategy + Prompt Structure
Limitations and Risks of RAG
Retrieval Quality | Garbage in, garbage out – irrelevant or low-quality chunks mislead reasoning |
Context Length | Current context windows are finite; long documents must be chunked carefully |
Data Governance | Confidential data must be isolated; retrieval systems must log access and sources |
Latency | Retrieval adds processing overhead |
Chain of Thought (Core Idea, Strength, Weakness, RAG Contrast)
Prompts the model to show step by step reasoning | Improves reasoning accuracy, transparency of logic | Still limited to model’s internal knowledge; can hallucinate facts | Can supply factual grounding for each reasoning step |
ReAct (Core Idea, Strength, Weakness, RAG Contrast)
Alternates reasoning and action | Mimics agentic behavior | Can loop inefficiently | Retrieval can serve as the action step |
Directional Stimulus Prompting (Core Idea, Strength, Weakness, RAG Contrast)
Steers model outputs toward desired attributes | Useful for tone, professional skepticism | Doesn’t add new knowledge | Adds content validity |
Few-Shot/In Context Learning ( Core Idea, Strength, Weakness, RAG Contrast)
Embeds examples in the prompt to teach task behavior | Reduces fine-tuning need | Limited by prompt size | Scales beyond context |
Blockchain
decentralized and distributed digital ledger technology that enables the secure and transparent recording of transactions across multiple computers or nodes
- Consists of a chain of blocks, where each block contains a list of transactions; transactions are bundled together and added to the chain in chronological order
Blockchain vs Cryptocurrency
Blockchain is technology infrastructure that enables decentralized and transparent record-keeping, while cryptocurrencies are digital assets that utilize blockchain technology to enable secure and decentralized transactions. Blockchain has broader applications beyond cryptocurrencies, while cryptocurrencies specifically focus on financial transactions and digital currencies.
Distributed Network
Blockchain operates on a network of computers (nodes) that are connected to each other. Each node maintains a copy of the entire blockchain, ensuring redundancy and decentralized control.
Transactions
When a transaction is initiated, it needs to be verified and validated by the network.
Verification
Nodes on the network verify the transaction by checking its validity, ensuring that the sender has sufficient funds or authority to make the transaction, and confirming that the transaction adheres to the predefined rules of the blockchain network.
Block Creation
Validated transactions are grouped together into a block. Each block typically contains a reference to the previous block, forming a chain of blocks, hence the name "blockchain." The blocks are linked using cryptographic hashes, which are unique identifiers generated based on the data within the block.
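The hash-linking described above can be sketched with SHA-256 (simplified: real blocks also carry timestamps, nonces, and Merkle roots):

```python
import hashlib
import json

def make_block(transactions, prev_hash):
    """Each block stores its transactions plus the previous block's hash;
    the block's own identifier is the hash of that combined content."""
    content = json.dumps({"tx": transactions, "prev": prev_hash})
    return {"tx": transactions, "prev": prev_hash,
            "hash": hashlib.sha256(content.encode()).hexdigest()}

genesis = make_block(["A pays B 5"], prev_hash="0" * 64)
block2 = make_block(["B pays C 2"], prev_hash=genesis["hash"])

# Tampering with an earlier block breaks the link to every later one:
tampered = dict(genesis, tx=["A pays B 500"])
recomputed = hashlib.sha256(
    json.dumps({"tx": tampered["tx"], "prev": tampered["prev"]}).encode()
).hexdigest()
print(recomputed == block2["prev"])  # False: the chain exposes the edit
```

Because every block embeds its predecessor's hash, altering any historical transaction changes that block's hash and invalidates the entire chain after it.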