DENDRAL
A heuristic program that explained empirical data, especially mass spectrometry results. Used many IF–THEN rules to infer molecular structures. Showed that an expert system is only as good as its rules and motivated knowledge engineering.
Knowledge engineering
The process of extracting knowledge from human experts (often via interviews), formalising it (rules/representations), and encoding it into a computer system so the program can use it. Hard because experts often have implicit/unconscious reasoning.
Early expert systems (MYCIN example)
Focused on observable evidence in a domain (e.g., symptoms, lab tests). Used rules to identify causes (like bacteria), recommend treatments, and provide explanations for recommendations (e.g., why this antibiotic/dose).
MYCIN
Rule-based medical expert system advising physicians on antimicrobial therapy selection. Its tasks: (1) identify the bacteria responsible for an infection, (2) recommend treatments, (3) explain its decisions to users (important for trust).
Certainty factors (MYCIN uncertainty)
MYCIN gave a confidence with predictions but did NOT use Bayesian uncertainty. It used ad hoc certainty factors: CF>0 = evidence for, CF<0 = evidence against. CFs were combined using specific combination rules.
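The card does not spell out the combination rules. As a minimal sketch, the function below uses the combination formula most commonly cited for MYCIN-style certainty factors; the name `combine_cf` and the example values are assumptions made for illustration.

```python
def combine_cf(cf1: float, cf2: float) -> float:
    """Combine two certainty factors in [-1, 1] that bear on the same hypothesis."""
    if cf1 >= 0 and cf2 >= 0:
        # Two pieces of supporting evidence reinforce each other.
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        # Two pieces of opposing evidence reinforce each other.
        return cf1 + cf2 * (1 + cf1)
    # Mixed evidence partially cancels out.
    # Undefined (division by zero) when one CF is 1 and the other is -1.
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))


print(combine_cf(0.6, 0.5))    # about 0.8: stronger than either rule alone
print(combine_cf(0.6, -0.4))   # about 0.33: conflicting evidence weakens belief
```

Note that the result never leaves [-1, 1], and that none of this is derived from probability theory, which is why the card calls the scheme ad hoc.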
Expert system definition
A computer system that emulates the decision-making ability of a human expert in a specialised domain. Uses knowledge + inference to solve problems difficult enough to require human expertise (often it can be asked to do so in a very specific way).
Thinking rationally (in this lecture)
Interpretation of intelligence where the system draws logically/rationally correct conclusions from encoded knowledge. Expert systems are an implementation of “thinking rationally”.
Expert systems vs Intelligent Systems
Expert systems “think rationally” but are not embodied, have no real-world embedding, and are not proactive: they usually output conclusions/recommendations rather than acting in an environment. So they overlap with intelligent systems but don’t satisfy embodied/agent-like notions.
Decision Support System (DSS)
System that helps a human make sense of data: visualisation, analytics/statistics, predictions. It supports the human decision-maker rather than replacing them.
Expert systems vs DSS
DSS: helps users decide by providing the right info/analytics. Expert system: combines knowledge/data + reasoning to make a recommendation (action/decision) + may explain why; doesn’t leave the entire decision process to the user.
Expert systems vs Agents
Expert systems are case-input driven + disembodied: collect facts from user, may ask follow-up questions, but don’t interact with environment / influence it sequentially. Agents are proactive, environment-interacting, often sequential decision-makers.
Expert system usage (examples)
Diagnosis + troubleshooting: medical diagnosis, car/mechanics diagnostics (car reports → system suggests fixes), computer diagnostic assistants, crime investigation tools, biological classification, space exploration (e.g., spacecraft health inference).
Why expert systems were popular
They are a proven/provable technology: consistent behaviour for the same inputs + reasoning that can be checked for correctness.
Components of an expert system
User interface (collect query + show advice), inference engine (reasoning), knowledge base (rules + case-specific facts), explanation facility (why this conclusion). Often also includes knowledge acquisition/engineering process.
Interfacing with an expert system
The way the system asks questions + presents rules/conclusions so non-experts can use it (forms, Q&A dialogues, chat-like UI). Good interfaces make rule-based reasoning usable + explanations understandable.
Rule (IF–THEN)
A general piece of domain knowledge expressed as a conditional: IF condition(s) THEN conclusion/action. Example: IF student THEN tax_exempt.
Fact (case-specific)
A statement about the current case/instance used during reasoning. Example: Milena is a student. Facts + rules together drive inference.
Knowledge base + inference engine relationship
Knowledge base stores rules and facts; inference engine applies rules to facts to derive new facts/conclusions and produce advice. Without the inference engine, rules don’t “run”; without the knowledge base, the engine has nothing to reason with.
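A minimal sketch of how an inference engine "runs" rules over facts, using propositional forward chaining. The first rule is the student example from the Rule card; the second rule, the `resident` fact, and the function name `forward_chain` are hypothetical additions for illustration.

```python
def forward_chain(rules, facts):
    """Apply IF-THEN rules to known facts until nothing new can be derived.

    rules: list of (premises, conclusion) pairs, premises being a set of atoms.
    facts: set of atoms known to hold for the current case.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all its premises are already known.
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# Knowledge base: general rules plus case-specific facts.
rules = [
    (frozenset({"student"}), "tax_exempt"),                  # IF student THEN tax_exempt
    (frozenset({"tax_exempt", "resident"}), "no_tax_form"),  # hypothetical second rule
]
facts = {"student", "resident"}

print(sorted(forward_chain(rules, facts)))
```

Without the loop (the engine) the rule list is inert data, and with an empty fact set the loop derives nothing, which mirrors the dependency described on the card.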
Knowledge representation (KR) examples
Languages/structures for representing knowledge: description logics, first-order rules (e.g., wet(X) :- outside(X), raining.), ontologies (support forward/backward chaining), non-monotonic logics. Goal: expressive but maintainable + explainable.
Probabilistic reasoning (why)
Adds uncertainty as part of the model when outcomes are not deterministic. Two origins: (1) lack of knowledge (we aren’t sure), (2) randomness/inherent uncertainty (stochastic effects). Models dependencies like p(A|B) where B makes A more/less likely.
Probabilistic fact vs probabilistic dependency
Probabilistic fact: uncertainty about a single statement (e.g., p(student)=0.6). Probabilistic dependency: one variable depends on another with a conditional probability (e.g., p(A|B)=0.8).
Marginalisation
Compute probability of event(s) by summing over unobserved variables. Example: want P(X=x) even if joint model includes many other variables → sum them out. Hard because sums can be exponential.
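Summing out can be sketched on a small joint table. The three binary variables (rain, sprinkler, wet) and all probabilities below are made-up values for illustration only.

```python
# Full joint distribution over (rain, sprinkler, wet); made-up numbers that sum to 1.
joint = {
    (True, True, True): 0.04,
    (True, True, False): 0.01,
    (True, False, True): 0.12,
    (True, False, False): 0.03,
    (False, True, True): 0.18,
    (False, True, False): 0.02,
    (False, False, True): 0.06,
    (False, False, False): 0.54,
}


def marginal(joint, index, value):
    """P(variable at position `index` == value), summing out all other variables."""
    return sum(p for assignment, p in joint.items() if assignment[index] == value)


print(marginal(joint, 2, True))  # P(wet = True), about 0.40
print(marginal(joint, 0, True))  # P(rain = True), about 0.20
```

With n binary variables the table has 2^n rows, so this direct sum is exactly the exponential cost the card refers to.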
Maximisation (MAP)
Instead of full probabilities, find the most probable value. Example: “most likely temperature tomorrow” = argmax over values. Often used for decisions/classification. Can also be computationally hard.
Bayes’ rule
P(A|B) = (P(B|A)P(A))/P(B). Updates beliefs using evidence. Often beats intuition because it accounts for base rates (priors) + likelihoods correctly.
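A small worked example of why base rates matter, assuming a hypothetical diagnostic test. The prior, sensitivity, and false-positive rate are illustrative values, not figures from the lecture.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(A | B) via Bayes' rule, with P(B) expanded by the law of total probability.

    prior               = P(A)
    likelihood          = P(B | A)
    false_positive_rate = P(B | not A)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)  # P(B)
    return likelihood * prior / evidence


# Rare condition (1%), test detects it 90% of the time, 5% false positives.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 3))  # roughly 0.154, far below the 0.90 intuition suggests
```

The low prior dominates: most positive results come from the large healthy population, so a positive test still leaves the condition unlikely.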
Bayes rule for classification
Given attribute values a1..an, choose the class v that maximises P(v|a1..an). Directly estimating P(a1..an|v) can require too much data.
Naïve Bayes
Assumes attributes are conditionally independent given the class, so P(a1..an|v)=∏P(ai|v). Often predicts the right class, but probability outputs can be exaggerated/unrealistic because independence assumption is usually false.
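The factorisation can be sketched with hand-written probability tables. The classes, attributes, and numbers below are hypothetical; a real classifier would estimate them from data.

```python
from math import prod

# Hypothetical tables: P(class) and P(attribute is True | class).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"has_link": 0.8, "has_greeting": 0.1},
    "ham": {"has_link": 0.2, "has_greeting": 0.7},
}


def naive_bayes_posterior(observed):
    """Return P(class | attributes), assuming attributes are independent given the class."""
    scores = {}
    for cls, prior in priors.items():
        # P(a1..an | v) is approximated by the product of the per-attribute terms.
        terms = (
            likelihoods[cls][attr] if value else 1 - likelihoods[cls][attr]
            for attr, value in observed.items()
        )
        scores[cls] = prior * prod(terms)
    total = sum(scores.values())  # normalise so the posteriors sum to 1
    return {cls: score / total for cls, score in scores.items()}


post = naive_bayes_posterior({"has_link": True, "has_greeting": False})
print(max(post, key=post.get), post)
```

Choosing the class with `max` is the maximisation step; the posterior values themselves tend to be over-confident when the attributes are in fact correlated.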
Graphical models motivation
Between two extremes: (1) full joint distribution (optimal but infeasible), (2) Naïve Bayes (feasible but strong assumptions). Graphical models assume independence only where reasonable, improving realism while staying tractable.
Factor graph idea
Group variables into conditionally independent cliques (factors). Each factor has a potential function over variables in its clique. Need a normalisation constant so the distribution sums to 1.
Potential function (intuition)
A non-negative “compatibility/weight” over assignments to variables in a clique: higher potential means the combination of values fits together better (more probable before normalisation).
Markov network / Markov Random Field (MRF)
Undirected graphical model. Cliques have potential functions; the joint probability is the product of the clique potentials divided by the normalisation constant Z (so it is proportional to the product of potentials). “More probable” configurations correspond to higher combined potential.
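A minimal sketch of a three-variable chain A - B - C with two pairwise potentials. The potential values are made up; they favour neighbouring variables taking equal values.

```python
from itertools import product

# Hypothetical pairwise potentials: agreeing neighbours get weight 3, disagreeing get 1.
phi_ab = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
phi_bc = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}


def unnormalised(a, b, c):
    """Product of the clique potentials for one full assignment."""
    return phi_ab[(a, b)] * phi_bc[(b, c)]


# Normalisation constant: sum of the unnormalised weight over every assignment.
Z = sum(unnormalised(a, b, c) for a, b, c in product([0, 1], repeat=3))


def prob(a, b, c):
    """Joint probability: product of potentials divided by Z."""
    return unnormalised(a, b, c) / Z


print(Z)
print(prob(0, 0, 0), prob(0, 1, 0))
```

Potentials are not probabilities on their own; only after dividing by Z do the eight values sum to 1.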
Bayesian network
Directed graphical model with conditional probability tables (CPTs): each node has P(child|parents). Joint distribution factorises as ∏ P(Xi | parents(Xi)). CPTs are often easier to interpret than undirected potentials.
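The factorisation can be sketched for a tiny network Rain → Wet ← Sprinkler. The structure and all CPT values are illustrative assumptions.

```python
from itertools import product

# Hypothetical CPTs. Root nodes have no parents, so their CPT is just a prior.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: 0.3, False: 0.7}
# P(wet = True | rain, sprinkler), keyed by (rain, sprinkler).
p_wet_true = {
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.80,
    (False, False): 0.05,
}


def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet) as the product of each node's CPT entry."""
    p_wet = p_wet_true[(rain, sprinkler)]
    return p_rain[rain] * p_sprinkler[sprinkler] * (p_wet if wet else 1 - p_wet)


total = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3))
print(joint(True, False, True))  # rain, no sprinkler, wet grass
print(total)                     # close to 1: no separate normalisation constant needed
```

Because every CPT row is already a normalised distribution, the product sums to 1 by construction, unlike the undirected case.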
Bayesian network as a factor graph
CPTs can be viewed as factors: each table provides weights for a factor/potential over the clique (node + parents). Joint probability can be computed by multiplying the factor weights (then normalising if needed).
Inference in graphical models
Inference = compute probabilities of queries given evidence (observations). Exact inference typically requires summing over all assignments of unobserved variables, which is NP-hard in general.
Variable elimination (what it does)
Exact inference method: repeatedly eliminate variables by (1) multiplying all factors containing that variable, (2) substituting observed values where applicable, and (3) summing out the variable if unobserved. Intermediate factors can blow up exponentially.
Variable elimination (Alarm example intuition)
If Alarm is observed true, plug A=T into factors and simplify. If Alarm is unobserved, compute results by summing over A=T and A=F contributions (marginalising over Alarm).
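Both cases can be sketched on a simplified chain Burglary → Alarm → Call. The chain structure, variable names, and CPT values are assumptions for illustration and may differ from the network used in the lecture.

```python
VALUES = (True, False)

# Hypothetical CPTs stored as factors.
p_b = {True: 0.01, False: 0.99}                          # P(B)
p_a_given_b = {                                          # P(A | B), keyed by (a, b)
    (True, True): 0.90, (False, True): 0.10,
    (True, False): 0.05, (False, False): 0.95,
}
p_c_given_a = {                                          # P(C | A), keyed by (c, a)
    (True, True): 0.80, (False, True): 0.20,
    (True, False): 0.10, (False, False): 0.90,
}

# Eliminate B: multiply the factors that mention B, then sum B out.
# The result is a new factor over A alone.
tau_a = {a: sum(p_a_given_b[(a, b)] * p_b[b] for b in VALUES) for a in VALUES}

# Alarm unobserved: eliminate A the same way to get P(C).
p_c = {c: sum(p_c_given_a[(c, a)] * tau_a[a] for a in VALUES) for c in VALUES}

# Alarm observed true: plug A = True into the factor, no summation over A.
p_c_alarm_on = {c: p_c_given_a[(c, True)] for c in VALUES}

print(p_c[True])           # P(Call) with Alarm marginalised out
print(p_c_alarm_on[True])  # P(Call | Alarm = True)
```

Eliminating B first keeps every intermediate factor over a single variable; a poor elimination order on a denser graph is what causes the exponential blow-up.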
Loopy belief propagation
Approximate inference via message passing in graphs with loops. Factors/variables exchange messages about “beliefs” (distributions), multiply incoming messages, and iterate until convergence (not always guaranteed).
Why combine rules with probabilities
IF–THEN rules can be too rigid for real domains (“When it rains I might get wet… but not always”). Adding probabilities allows uncertain rule outcomes and more realistic modelling.
StaRAI (Statistical Relational AI)
Area combining probability + logic/relational structure. Includes formalisms such as ProbLog (distribution semantics) and CP-logic (causal probabilities).
Distribution semantics (core idea)
Two parts: probabilistic predicates and standard logical rules. A probabilistic predicate has form p: q(t1..tn) meaning each grounded atom q(…) is true with probability p. Logical rules derive consequences. This defines a distribution over possible worlds.
Probabilistic predicate vs logical variable
In p: q(t1..tn), q(…) is treated as a random variable (true/false). The ti are logical variables/placeholders that get substituted with concrete objects (grounding) to refer to real entities.
Possible worlds (ProbLog)
Each probabilistic fact is like an independent random choice (included or not). A “possible world” is one particular selection of which probabilistic facts are true; its probability is the product of chosen p’s and (1-p)’s for unchosen facts.
ProbLog query probability
To answer a query (e.g., path(1,4)), ProbLog considers all proofs/worlds where the query holds. Proofs can overlap, so computing P(A∨B) requires handling overlaps (disjoint-sum problem), often solved using decision diagrams or weighted model counting.
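A brute-force sketch of the possible-worlds computation for a query like path(1,4). It is written in plain Python rather than ProbLog syntax, and the graph and edge probabilities are made up. Real ProbLog avoids this enumeration by compiling to decision diagrams or weighted model counting.

```python
from itertools import product

# Hypothetical probabilistic facts: each directed edge exists with the given probability.
edges = {(1, 2): 0.6, (2, 4): 0.5, (1, 3): 0.7, (3, 4): 0.4}


def has_path(present, start, goal):
    """Depth-first search over the edges that are true in one possible world."""
    frontier, seen = [start], {start}
    while frontier:
        node = frontier.pop()
        if node == goal:
            return True
        for u, v in present:
            if u == node and v not in seen:
                seen.add(v)
                frontier.append(v)
    return False


def query_probability(start, goal):
    """Sum the probabilities of all possible worlds in which the query holds."""
    edge_list = list(edges)
    total = 0.0
    for choices in product([True, False], repeat=len(edge_list)):
        weight = 1.0
        present = []
        for edge, chosen in zip(edge_list, choices):
            # World probability: p for each chosen fact, (1 - p) for each unchosen one.
            weight *= edges[edge] if chosen else 1 - edges[edge]
            if chosen:
                present.append(edge)
        if has_path(present, start, goal):
            total += weight
    return total


print(query_probability(1, 4))
```

The two proofs (via node 2 and via node 3) have probabilities 0.30 and 0.28, but the answer is about 0.496 rather than 0.58, because worlds where both proofs hold must be counted only once. That double counting is the disjoint-sum problem.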
CP-logic (causal probabilities)
Instead of specifying full joint probabilities (hard for experts), CP-logic uses causal rules: given conditions, certain outcomes happen with specified probabilities. Causal probabilities are typically easier for humans to estimate.
DeepProbLog (state of the art)
Extends ProbLog by adding “neural predicates”: a neural network outputs a probability distribution over classifications → used as probabilistic input to the logic program. Can learn from examples while keeping the logical structure.
Machine learning in expert systems (two roles)
(1) Keep rule base updated by adapting/adding rules without expert input (e.g., Relational Learning learns first-order-like rules). (2) Fully automate decision model building + replace rule-based reasoning.
Explanation facility importance
Expert systems can often explain logical reasoning steps (which rules fired and why). This matters for trust and regulation (GDPR). Explanations become harder when statistical learning components are included, motivating Explainable AI research.
Why an LLM is not an expert system
LLMs imitate conversation (next-token prediction) rather than performing explicit rule-based inference over a knowledge base. They are not guaranteed to follow provided context/instructions (alignment issue), so they don’t provide the same formal correctness/explainability guarantees as expert systems.
LLMs as interface to expert systems
LLMs could be used as a natural-language interface on top of an expert system: translate user text into structured facts/queries + translate explanations back into friendly language (with caution).
Advantages of expert systems
Consistency (same inputs → same outputs), memory (don’t forget), logic (formal reasoning without sentimental bias), reproducibility (can be copied indefinitely), and stamina (don’t get tired).
Disadvantages of expert systems
Lack of common sense (hard to program), limited creativity (hard to invent new solutions), maintenance burden (knowledge base updates often manual), not inherently adaptable to changing conditions.