Lecture 4 - Expert Systems

Last updated 6:21 PM on 6/14/26

50 Terms

1

DENDRAL

A heuristic program that explained empirical data, especially mass spectrometry results. Used many IF–THEN rules to infer molecular structures. Showed that an expert system is only as good as its rules and motivated knowledge engineering.

2

Knowledge engineering

The process of extracting knowledge from human experts (often via interviews), formalising it (rules/representations), and encoding it into a computer system so the program can use it. Hard because experts often have implicit/unconscious reasoning.

3

Early expert systems (MYCIN example)

Focused on observable evidence in a domain (e.g., symptoms, lab tests). Used rules to identify causes (like bacteria), recommend treatments, and provide explanations for recommendations (e.g., why this antibiotic/dose).

4

MYCIN

Rule-based medical expert system advising physicians on antimicrobial therapy selection. Tasks: (1) identify the bacteria responsible for infections, (2) recommend treatments, (3) explain decisions to users (important for trust).

5

Certainty factors (MYCIN uncertainty)

MYCIN gave a confidence with predictions but did NOT use Bayesian uncertainty. It used ad hoc certainty factors: CF>0 = evidence for, CF<0 = evidence against. CFs were combined using specific combination rules.
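
The card does not list the combination rules. As a minimal sketch, the function below uses the commonly cited MYCIN/EMYCIN-style formula for merging two certainty factors about the same hypothesis; the name `combine_cf` and the assumed range [-1, 1] are illustrative assumptions, not taken from the lecture.

```python
def combine_cf(cf1: float, cf2: float) -> float:
    """Combine two certainty factors (assumed to lie in [-1, 1]) for one hypothesis.

    Assumption: this is the commonly cited EMYCIN-style rule, used here
    only to illustrate what a "combination rule" can look like.
    """
    if cf1 >= 0 and cf2 >= 0:
        # Two pieces of supporting evidence reinforce each other.
        return cf1 + cf2 * (1 - cf1)
    if cf1 <= 0 and cf2 <= 0:
        # Two pieces of opposing evidence reinforce each other.
        return cf1 + cf2 * (1 + cf1)
    # Conflicting evidence partially cancels out.
    # Note: undefined (division by zero) when one CF is +1 and the other -1.
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))


both_for = combine_cf(0.6, 0.4)       # 0.76
both_against = combine_cf(-0.6, -0.4)  # -0.76
conflicting = combine_cf(0.6, -0.4)    # about 0.333
```

Note that the result never leaves [-1, 1] and that none of this is a probability calculation, which matches the "ad hoc, not Bayesian" point above.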

6

Expert system definition

A computer system that emulates the decision-making ability of a human expert in a specialised domain. Uses knowledge + inference to solve problems difficult enough to require human expertise (often only when queried in a very specific, structured way).

7

Thinking rationally (in this lecture)

Interpretation of intelligence where the system draws logically/rationally correct conclusions from encoded knowledge. Expert systems are an implementation of “thinking rationally”.

8

Expert systems vs Intelligent Systems

Expert systems “think rationally” but are not embodied, have no real-world embedding, and are not proactive: they usually output conclusions/recommendations rather than acting in an environment. So they overlap with intelligent systems but don’t satisfy embodied/agent-like notions.

9

Decision Support System (DSS)

System that helps a human make sense of data: visualisation, analytics/statistics, predictions. It supports the human decision-maker rather than replacing them.

10

Expert systems vs DSS

DSS: helps users decide by providing the right info/analytics. Expert system: combines knowledge/data + reasoning to make a recommendation (action/decision) + may explain why; doesn’t leave the entire decision process to the user.

11

Expert systems vs Agents

Expert systems are case-input driven + disembodied: collect facts from user, may ask follow-up questions, but don’t interact with environment / influence it sequentially. Agents are proactive, environment-interacting, often sequential decision-makers.

12

Expert system usage (examples)

Diagnosis + troubleshooting: medical diagnosis, car/mechanics diagnostics (car reports → system suggests fixes), computer diagnostic assistants, crime investigation tools, biological classification, space exploration (e.g., spacecraft health inference).

13

Why expert systems were popular

Expert systems are a proven/provable technology: consistent behaviour for the same inputs + reasoning that can be checked for correctness.

14

Components of an expert system

User interface (collect query + show advice), inference engine (reasoning), knowledge base (rules + case-specific facts), explanation facility (why this conclusion). Often also includes knowledge acquisition/engineering process.

15

Interfacing with an expert system

The way the system asks questions + presents rules/conclusions so non-experts can use it (forms, Q&A dialogues, chat-like UI). Good interfaces make rule-based reasoning usable + explanations understandable.

16

Rule (IF–THEN)

A general piece of domain knowledge expressed as a conditional: IF condition(s) THEN conclusion/action. Example: IF student THEN tax_exempt.

17

Fact (case-specific)

A statement about the current case/instance used during reasoning. Example: Milena is a student. Facts + rules together drive inference.

18

Knowledge base + inference engine relationship

Knowledge base stores rules and facts; inference engine applies rules to facts to derive new facts/conclusions and produce advice. Without the inference engine, rules don’t “run”; without the knowledge base, the engine has nothing to reason with.
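
One way to picture the engine applying rules to facts is a small forward-chaining loop. This is a propositional sketch under assumptions: the function name `forward_chain`, the second rule, and the fact `has_income` are hypothetical; only the rule IF student THEN tax_exempt comes from the cards.

```python
def forward_chain(rules, facts):
    """Fire IF-THEN rules on the known facts until nothing new is derived.

    rules: list of (conditions, conclusion) pairs, conditions being a tuple.
    Returns the final fact set plus a trace of which rule derived what,
    which is the raw material for an explanation facility.
    """
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                trace.append((conclusion, conditions))
                changed = True
    return known, trace


# Knowledge base: general rules + case-specific facts.
rules = [
    (("student",), "tax_exempt"),                  # from the cards
    (("tax_exempt", "has_income"), "no_tax_due"),  # hypothetical extra rule
]
facts = {"student", "has_income"}  # e.g. "Milena is a student"

known, trace = forward_chain(rules, facts)
for conclusion, conditions in trace:
    print(f"{conclusion} because {' and '.join(conditions)}")
```

The trace shows why keeping rules and engine separate is convenient: the same loop works for any rule set, and each derived fact can be explained by the rule that produced it.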

19

Knowledge representation (KR) examples

Languages/structures for representing knowledge: description logics, first-order rules (e.g., wet(X) :- outside(X), raining.), ontologies (support forward/backward chaining), non-monotonic logics. Goal: expressive but maintainable + explainable.

20

Probabilistic reasoning (why)

Adds uncertainty as part of the model when outcomes are not deterministic. Two origins: (1) lack of knowledge (we aren’t sure), (2) randomness/inherent uncertainty (stochastic effects). Models dependencies like p(A|B) where B makes A more/less likely.

21

Probabilistic fact vs probabilistic dependency

Probabilistic fact: uncertainty about a single statement (e.g., p(student)=0.6). Probabilistic dependency: one variable depends on another with a conditional probability (e.g., p(A|B)=0.8).

22

Marginalisation

Compute the probability of event(s) by summing over unobserved variables. Example: want P(X=x) even if the joint model includes many other variables → sum them out. Hard because the number of terms in the sum grows exponentially with the number of variables summed out.

23

Maximisation (MAP)

Instead of full probabilities, find the most probable value. Example: “most likely temperature tomorrow” = argmax over values. Often used for decisions/classification. Can also be computationally hard.
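
Marginalisation and maximisation can be compared on a tiny joint table. The numbers and variable names below are made up for illustration; the sketch sums out the weather to get P(temp) and then takes the argmax.

```python
# Hypothetical joint distribution P(weather, temp); values are assumptions.
joint = {
    ("sun", "warm"): 0.40,
    ("sun", "cold"): 0.10,
    ("rain", "warm"): 0.15,
    ("rain", "cold"): 0.35,
}


def marginal_temp(joint):
    """Marginalisation: sum out the unobserved variable (weather)."""
    result = {}
    for (_weather, temp), p in joint.items():
        result[temp] = result.get(temp, 0.0) + p
    return result


p_temp = marginal_temp(joint)              # warm: 0.55, cold: 0.45
most_likely = max(p_temp, key=p_temp.get)  # maximisation: argmax over values
```

With only one variable to sum out the loop is trivial; with n binary variables to sum out, the table would have 2^n rows, which is where the computational difficulty comes from.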

24

Bayes’ rule

P(A|B) = (P(B|A)P(A))/P(B). Updates beliefs using evidence. Often beats intuition because it accounts for base rates (priors) + likelihoods correctly.
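
A short worked example of the base-rate effect. The test scenario and all numbers are assumptions chosen for illustration; P(B) is expanded with the law of total probability.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) via Bayes' rule.

    prior               = P(A)
    likelihood          = P(B|A)
    false_positive_rate = P(B|not A)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)  # P(B)
    return likelihood * prior / evidence


# Hypothetical disease test: 1% base rate, 90% sensitivity, 5% false positives.
p_disease_given_positive = posterior(prior=0.01, likelihood=0.9,
                                     false_positive_rate=0.05)
print(round(p_disease_given_positive, 3))  # 0.154
```

Intuition tends to answer "about 90%" here; the low prior pulls the posterior down to roughly 15%.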

25

Bayes rule for classification

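Given attribute values a1..an, choose the class v that maximises P(v|a1..an). By Bayes’ rule this is proportional to P(a1..an|v)P(v), but directly estimating P(a1..an|v) can require too much data, since the number of attribute combinations grows exponentially with n.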

26

Naïve Bayes

Assumes attributes are conditionally independent given the class, so P(a1..an|v)=∏P(ai|v). Often predicts the right class, but probability outputs can be exaggerated/unrealistic because independence assumption is usually false.
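
A minimal counting-based sketch of the product formula above. The toy data set, the attribute names, and the helper names `train` and `predict` are hypothetical; there is no smoothing, so an unseen attribute value gives a class probability of zero.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count classes and per-class attribute values."""
    class_counts = Counter(label for _, label in examples)
    attr_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    for attrs, label in examples:
        for i, value in enumerate(attrs):
            attr_counts[(i, label)][value] += 1
    return class_counts, attr_counts


def predict(attrs, class_counts, attr_counts):
    """Return normalised P(v) * prod_i P(a_i | v) for every class v."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(v)
        for i, value in enumerate(attrs):
            score *= attr_counts[(i, label)][value] / n  # P(a_i | v)
        scores[label] = score
    z = sum(scores.values())
    return {label: s / z for label, s in scores.items()} if z > 0 else scores


# Hypothetical training data: (fever?, cough?) -> diagnosis.
examples = [
    (("fever", "cough"), "flu"), (("fever", "cough"), "flu"),
    (("fever", "no_cough"), "flu"), (("no_fever", "cough"), "flu"),
    (("no_fever", "cough"), "cold"), (("no_fever", "no_cough"), "cold"),
    (("fever", "no_cough"), "cold"), (("no_fever", "no_cough"), "cold"),
]
class_counts, attr_counts = train(examples)
result = predict(("fever", "cough"), class_counts, attr_counts)  # flu: 0.9
```

Each attribute on its own favours flu 3:1, and multiplying them as if independent yields 9:1, which illustrates how the output probabilities can become more extreme than the data warrants.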

27

Graphical models motivation

Between two extremes: (1) full joint distribution (optimal but infeasible), (2) Naïve Bayes (feasible but strong assumptions). Graphical models assume independence only where reasonable, improving realism while staying tractable.

28

Factor graph idea

Group variables into conditionally independent cliques (factors). Each factor has a potential function over variables in its clique. Need a normalisation constant so the distribution sums to 1.

29

Potential function (intuition)

A non-negative “compatibility/weight” over assignments to variables in a clique: higher potential means the combination of values fits together better (more probable before normalisation).

30

Markov network / Markov Random Field (MRF)

Undirected graphical model. Cliques have potential functions; joint probability is the product of the clique potentials divided by the normalisation constant Z (i.e., proportional to the product of potentials). “More probable” configurations correspond to higher combined potential.
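
To make the role of Z concrete, here is a brute-force sketch for a three-variable chain A - B - C with two pairwise potentials. The potential values and function names are assumptions for illustration only.

```python
from itertools import product


# Hypothetical potentials: neighbouring variables "prefer" to agree.
def phi_ab(a, b):
    return 3.0 if a == b else 1.0


def phi_bc(b, c):
    return 2.0 if b == c else 1.0


def unnormalised(a, b, c):
    """Product of all clique potentials for one full assignment."""
    return phi_ab(a, b) * phi_bc(b, c)


states = list(product([0, 1], repeat=3))
Z = sum(unnormalised(*s) for s in states)             # normalisation constant
joint = {s: unnormalised(*s) / Z for s in states}     # sums to 1

print(Z)                 # 24.0
print(joint[(0, 0, 0)])  # 0.25
print(joint[(0, 1, 0)])  # 1/24, about 0.0417
```

Computing Z this way enumerates every assignment, so it only works for tiny models; that cost is one reason inference in these models is hard.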

31

Bayesian network

Directed graphical model with conditional probability tables (CPTs): each node has P(child|parents). Joint distribution factorises as ∏ P(Xi | parents(Xi)). CPTs are often easier to interpret than undirected potentials.

32

Bayesian network as a factor graph

CPTs can be viewed as factors: each table provides weights for a factor/potential over the clique (node + parents). Joint probability can be computed by multiplying the factor weights (then normalising if needed).

33

Inference in graphical models

Inference = compute probabilities of queries given evidence (observations). Exact inference typically requires summing over all assignments of unobserved variables, which is NP-hard in general.

34

Variable elimination (what it does)

Exact inference method: repeatedly eliminate variables by (1) multiplying all factors containing that variable, (2) substituting observed values where applicable, and (3) summing out the variable if unobserved. Intermediate factors can blow up exponentially.

35

Variable elimination (Alarm example intuition)

If Alarm is observed true, plug A=T into factors and simplify. If Alarm is unobserved, compute results by summing over A=T and A=F contributions (marginalising over Alarm).
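
The two cases can be sketched on a small chain Burglary → Alarm → Call. The network structure and every CPT value below are assumptions (the cards do not give the lecture's actual Alarm network); the point is only to contrast summing out Alarm with plugging in an observed value.

```python
VALS = (True, False)

# Hypothetical CPTs for the chain B -> A -> C.
p_b = {True: 0.1, False: 0.9}                        # P(B)
p_a_given_b = {(True, True): 0.9, (False, True): 0.1,    # key: (a, b)
               (True, False): 0.2, (False, False): 0.8}
p_c_given_a = {(True, True): 0.8, (False, True): 0.2,    # key: (c, a)
               (True, False): 0.1, (False, False): 0.9}

# Eliminate B: multiply the factors mentioning B, then sum B out.
# The result is a new intermediate factor over A alone.
tau_a = {a: sum(p_b[b] * p_a_given_b[(a, b)] for b in VALS) for a in VALS}

# Alarm unobserved: eliminate A as well by summing over A=T and A=F.
p_c = {c: sum(tau_a[a] * p_c_given_a[(c, a)] for a in VALS) for c in VALS}

# Alarm observed true: plug A=T into every factor, no sum over A,
# then normalise over the query variable C.
unnorm = {c: tau_a[True] * p_c_given_a[(c, True)] for c in VALS}
z = sum(unnorm.values())
p_c_given_alarm = {c: v / z for c, v in unnorm.items()}

print(round(tau_a[True], 3))            # 0.27
print(round(p_c[True], 3))              # 0.289
print(round(p_c_given_alarm[True], 3))  # 0.8
```

In this chain every intermediate factor has a single variable; in densely connected networks the intermediate factors can cover many variables at once, which is the exponential blow-up mentioned above.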

36

Loopy belief propagation

Approximate inference via message passing in graphs with loops. Factors/variables exchange messages about “beliefs” (distributions), multiply incoming messages, and iterate until convergence (not always guaranteed).

37

Why combine rules with probabilities

IF–THEN rules can be too rigid for real domains (“When it rains I might get wet… but not always”). Adding probabilities allows uncertain rule outcomes and more realistic modelling.

38

StaRAI (Statistical Relational AI)

Area combining probability + logic/relational structure. Includes formalisms such as ProbLog (distribution semantics) and CP-logic (causal probabilities).

39

Distribution semantics (core idea)

Two parts: probabilistic predicates and standard logical rules. A probabilistic predicate has form p: q(t1..tn) meaning each grounded atom q(…) is true with probability p. Logical rules derive consequences. This defines a distribution over possible worlds.

40

Probabilistic predicate vs logical variable

In p: q(t1..tn), q(…) is treated as a random variable (true/false). The ti are logical variables/placeholders that get substituted with concrete objects (grounding) to refer to real entities.

41

Possible worlds (ProbLog)

Each probabilistic fact is like an independent random choice (included or not). A “possible world” is one particular selection of which probabilistic facts are true; its probability is the product of chosen p’s and (1-p)’s for unchosen facts.

42

ProbLog query probability

To answer a query (e.g., path(1,4)), ProbLog considers all proofs/worlds where the query holds. Proofs can overlap, so computing P(A∨B) requires handling overlaps (disjoint-sum problem), often solved using decision diagrams or weighted model counting.
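
The possible-worlds view can be reproduced by brute force in plain Python. This is not ProbLog itself and not how ProbLog computes answers internally; the graph and edge probabilities are made up, and the sketch simply enumerates every world and adds up the weight of those where path(1,4) holds.

```python
from itertools import product

# Hypothetical probabilistic facts: directed edge -> probability it exists.
edges = {(1, 2): 0.6, (2, 4): 0.7, (2, 3): 0.5, (3, 4): 0.4}


def has_path(present, start, goal):
    """Logical rule part: is goal reachable from start over present edges?"""
    frontier, seen = [start], {start}
    while frontier:
        node = frontier.pop()
        if node == goal:
            return True
        for u, v in present:
            if u == node and v not in seen:
                seen.add(v)
                frontier.append(v)
    return False


def query_probability(edges, start, goal):
    """Sum the probabilities of all possible worlds where the query holds."""
    items = list(edges.items())
    total = 0.0
    for choices in product([True, False], repeat=len(items)):
        weight = 1.0
        present = []
        for (edge, p), chosen in zip(items, choices):
            weight *= p if chosen else (1 - p)  # product of p's and (1-p)'s
            if chosen:
                present.append(edge)
        if has_path(present, start, goal):
            total += weight
    return total


p_query = query_probability(edges, 1, 4)          # about 0.456
naive_sum = 0.6 * 0.7 + 0.6 * 0.5 * 0.4           # about 0.54, overcounts
```

The two proofs (1-2-4 and 1-2-3-4) share edge (1,2) and can both hold in the same world, so adding their probabilities gives 0.54 instead of the correct 0.456. That gap is the disjoint-sum problem; enumeration avoids it but needs 2^n worlds for n probabilistic facts.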

43

CP-logic (causal probabilities)

Instead of specifying full joint probabilities (hard for experts), CP-logic uses causal rules: given conditions, certain outcomes happen with specified probabilities. Causal probabilities are typically easier for humans to estimate.

44

DeepProbLog (state of the art)

Extends ProbLog by adding “neural predicates”: a neural network outputs a probability distribution over classifications → used as probabilistic input to the logic program. Can learn from examples while keeping the logical structure.

45

Machine learning in expert systems (two roles)

(1) Keep rule base updated by adapting/adding rules without expert input (e.g., Relational Learning learns first-order-like rules). (2) Fully automate decision model building + replace rule-based reasoning.

46

Explanation facility importance

Expert systems can often explain logical reasoning steps (which rules fired and why). This matters for trust and regulation (GDPR). Explanations become harder when statistical learning components are included, motivating Explainable AI research.

47

Why an LLM is not an expert system

LLMs imitate conversation (next-token prediction) rather than performing explicit rule-based inference over a knowledge base. They are not guaranteed to follow provided context/instructions (alignment issue), so they don’t provide the same formal correctness/explainability guarantees as expert systems.

48

LLMs as interface to expert systems

LLMs could be used as a natural-language interface on top of an expert system: translate user text into structured facts/queries + translate explanations back into friendly language (with caution).

49

Advantages of expert systems

Consistency (same inputs → same outputs), memory (don’t forget), logic (formal reasoning without emotional bias), reproducibility (can be copied indefinitely), stamina (don’t get tired).

50

Disadvantages of expert systems

Lack of common sense (hard to program), limited creativity (hard to invent new solutions), maintenance burden (knowledge base updates often manual), not inherently adaptable to changing conditions.