Dynamic Bayesian Networks & Hidden Markov Models

Description and Tags

How Bayesian networks can model sequences or temporal data, and how to build HMMs (generate sequences / calculate useful probabilities).


25 Terms

1

what are dynamic bayesian networks

a probabilistic graphical model that captures systems' temporal dependencies and evolution over time.

  • nodes represent variables (features) at distinct time slices

  • there are directed links between nodes within the same time slice (as in a standard BN) and between time slices (the new, temporal part)

    • e.g. lag between an observation in one feature and an observation in another

  • often assumes 1st order Markov Assumption

  • for a single time slice, the base network captures the conditional dependencies for a given point in time like a normal BN

e.g., 2 time slices, n features, can have auto-regressive links
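
As a purely illustrative sketch (the features A and B below are hypothetical, not from the slides), the structure of a 2-slice DBN can be written down as two sets of directed links:

    # hypothetical 2-slice DBN: nodes are (feature, time slice) pairs
    intra_slice_links = [
        (("A", 0), ("B", 0)),   # within slice 0, as in a normal BN
        (("A", 1), ("B", 1)),   # the same base structure repeated in slice 1
    ]
    inter_slice_links = [
        (("A", 0), ("A", 1)),   # auto-regressive link: A influences its future self
        (("A", 0), ("B", 1)),   # lagged link: A at one slice influences B at the next
    ]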

2

what is an auto-regressive link

a link from the same feature at a previous time slice to the feature at the current time slice

  • the feature is influenced by its previous self

3

what are the ‘rules’ of DBN

  • no cycles

  • can’t have links going back in time

4

what is the 1st Order Markov Assumption

links only need to span adjacent time slices in order to make predictions over time

  • e.g. a link from T → T+1, not a long jump such as T → T+10, which adds complexity

  • links can’t jump over many time slices; they only connect adjacent ones
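
In the notation used in the later Markov chain cards, this is the assumption that p(ht | h1, …, ht-1) = p(ht | ht-1): conditioning on the full history collapses to conditioning on the previous time slice only.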

5

what can we use DBN for

can still enter information and see the impact it has on the rest of the nodes, even across different time points

  • can come up with an explanation for why a feature has a certain value at the current time point - explaining

  • or make predictions on the future trajectory of the features - forecasting

6

what are Markov Chains

one of the simplest state-based models that links a number of possible state transitions over a sequence or time window

  • models 1 variable as it changes over time

  • H is a sequence of data being modelled

7

what do Markov Chains require

requires

  • an initial probability distribution (PD for the states at the initial time slice)

  • a transition probability distribution matrix (determines state changes over time given previous states)

assumes the 1st-order Markov assumption, where only the immediately previous state is needed to calculate the next state: the conditional probability p(ht | ht-1)

  • the current state determines the probability distribution of the next state
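
A minimal sketch in Python (hypothetical numbers, not from the slides) of the two required distributions for a two-state chain, and of sampling a sequence from them:

    import random

    # hypothetical two-state Markov chain (states 1 and 2)
    initial = {1: 0.5, 2: 0.5}                     # initial probability distribution
    transition = {1: {1: 0.9, 2: 0.1},             # transition distribution p(ht | ht-1)
                  2: {1: 0.3, 2: 0.7}}

    def sample_chain(length):
        """Sample a state sequence H of the given length."""
        states = list(initial)
        h = [random.choices(states, weights=[initial[s] for s in states])[0]]
        for _ in range(length - 1):
            prev = h[-1]                           # 1st-order Markov: only the previous state matters
            h.append(random.choices(states, weights=[transition[prev][s] for s in states])[0])
        return h

    print(sample_chain(5))   # e.g. [1, 1, 2, 2, 1]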

8

how can we calculate the probability of a sequence H using markov chains

H = 1, 2, 1, 2

P = 0.5, 0.1, 0.3, 0.1

transitions = initial 1, from 1 to 2, from 2 to 1, from 1 to 2

  • the first value is always the initial state since there’s nothing to transition from, so use the probability from the initial probability distribution

  • use the transition probability distribution values starting from the second value in the sequence
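
Multiplying the factors gives P(H) = 0.5 × 0.1 × 0.3 × 0.1 = 0.0015. A minimal sketch of the same calculation in Python, using only the probabilities given in this card:

    def sequence_probability(h, initial, transition):
        """P(H) = initial probability of the first state x product of transition probabilities."""
        p = initial[h[0]]
        for prev, curr in zip(h, h[1:]):
            p *= transition[prev][curr]
        return p

    # values from this card: P(initial 1) = 0.5, P(1 -> 2) = 0.1, P(2 -> 1) = 0.3
    initial = {1: 0.5}
    transition = {1: {2: 0.1}, 2: {1: 0.3}}
    print(sequence_probability([1, 2, 1, 2], initial, transition))   # ≈ 0.0015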

9

how can we visualise state transition probabilities

create a state transition diagram using the given markov chain and its state transition distribution matrix

  • e.g. a markov chain with three states

10

what is a hidden state

underlying variables that generate the observed data, but they are not directly observable

e.g. in the urn example, the hidden state is which urn we have been given; we don’t know which urn it is, but we can observe what is drawn from it

11

what are observations

the variables that are measured and observed

12

what is a hidden markov model (HMM)

used to model the probabilistic relationship between a sequence of observations and a sequence of hidden states

  • generate sequences of observed states based on hidden states

13

what do HMMs require

  • initial state distribution

  • transition distribution - the probability of transitioning from one hidden state to another, using a Markov Chain H that represents the hidden states that the hidden variable transitions between over time

    • P(Ht | Ht-1)

  • emission distribution - the probability of observing an output given (conditioned upon) a hidden state, where the observed nodes X are linked directly to the hidden node in the same time slice

    • P(Xt | Ht)
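
A minimal sketch of these three components as Python dictionaries (the numbers are hypothetical), in the same P(Ht | Ht-1) / P(Xt | Ht) notation:

    # hypothetical 2-state HMM whose observations are also labelled 1 and 2
    initial = {1: 1.0, 2: 0.0}                 # initial state distribution
    transition = {1: {1: 0.8, 2: 0.2},         # transition distribution P(Ht | Ht-1)
                  2: {1: 0.4, 2: 0.6}}
    emission = {1: {1: 0.5, 2: 0.5},           # emission distribution P(Xt | Ht)
                2: {1: 0.25, 2: 0.75}}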

14

what is the graphical representation of a HMM

the H nodes are hidden, all we see are the observable X nodes, but we know that the X nodes are controlled in some way by the H hidden nodes

  • this structure is fixed and imposed, not learnt from data like a BN structure would be

15

what are the benefits of the hidden state

can capture more complexities in the probabilities

16

what is the probability of the hidden state sequence {1, 2, 2}

= P(H1 = 1) × P(H2 = 2 | H1 = 1) × P(H3 = 2 | H2 = 2) = 1 × 0.2 × 0.6 = 0.12 (the initial probability followed by the two transition probabilities)

17

given this HMM set up, how can a sequence be modelled

1st loop:

  • start with the initial state for H = 1 (since in this model it can’t be 2)

  • then, based on H = 1, look at the emission probability for X (since the probability of each value is 50/50, it is randomly chosen as 1)

2nd loop

  • since H is hidden, whether it transitions or not is governed by its transition distribution, but we don’t change or choose this (imagine the hidden state acting by itself, simulated with a random number generator); we just observe the new state H, e.g. H is 1 again

  • based on the new state H, look at the emission distribution to get the value for X, e.g. a 50% chance it could be in either state 1 or 2 (this can also be simulated by random sampling; we are observing, not choosing)

3rd loop

  • let’s assume H transitions to state 2 (of its own accord)

  • then, looking at the emission distribution for state 2, the probabilities are heavily biased compared to the 50/50 of state 1, so X is more likely to be 2
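
A minimal sketch of this generation loop in Python, reusing the hypothetical parameter dictionaries from the card-13 sketch; the hidden state is sampled first, then the observation is sampled from the emission distribution conditioned on it:

    import random

    # hypothetical HMM parameters (same as the sketch under card 13)
    initial = {1: 1.0, 2: 0.0}
    transition = {1: {1: 0.8, 2: 0.2}, 2: {1: 0.4, 2: 0.6}}
    emission = {1: {1: 0.5, 2: 0.5}, 2: {1: 0.25, 2: 0.75}}

    def sample_hmm(length):
        """Generate a hidden sequence H and an observed sequence X from the HMM."""
        states = list(initial)
        obs_values = list(emission[1])
        H, X = [], []
        for t in range(length):
            dist = initial if t == 0 else transition[H[-1]]   # hidden state evolves on its own
            h = random.choices(states, weights=[dist[s] for s in states])[0]
            x = random.choices(obs_values, weights=[emission[h][o] for o in obs_values])[0]
            H.append(h)                                       # hidden: not seen in practice
            X.append(x)                                       # observed
        return H, X

    print(sample_hmm(3))   # e.g. ([1, 1, 2], [1, 2, 2])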

18

what state sequence is most likely for the hidden state given the observed state

assume that the most likely state sequence always uses the state where the probability of each individual observation is maximised

  • Heads - max is 0.75 in state 2

  • Tails - max is 0.75 in state 3

therefore assume

  • every H in sequence X comes from state 2

  • every T in sequence X comes from state 3

final answer

  • H = 2, 2, 2, 2, 3

19

what is the probability of the observation sequence AND the most likely hidden sequence

  • H = 2, 2, 2, 2, 3

  • X = H, H, H, H, T

multiply everything together

  • 1st loop with state 2 and X = H = 0.3 × 0.75

  • 2nd loop with state 2 and X = H = 0.3 × 0.75

  • 3rd loop with state 2 and X = H = 0.3 × 0.75

  • 4th loop with state 2 and X = H = 0.3 × 0.75

  • 5th loop with state change to 3 and X = T = 0.5 × 0.75 (slides are incorrect)

multiply all 5 values together (worked through in the sketch below)
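
In general, the joint probability multiplies one state term (initial probability for the first step, transition probability afterwards) and one emission term per position: p(X, H) = p(H1) p(X1 | H1) × p(H2 | H1) p(X2 | H2) × … A minimal sketch multiplying the five per-step factors listed in this card:

    import math

    # the five per-step factors from this card (state term x emission term)
    factors = [0.3 * 0.75,   # 1st loop: state 2, X = H
               0.3 * 0.75,   # 2nd loop
               0.3 * 0.75,   # 3rd loop
               0.3 * 0.75,   # 4th loop
               0.5 * 0.75]   # 5th loop: change to state 3, X = T

    print(math.prod(factors))   # ≈ 0.00096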

20

what are some typical algorithms for HMMs

  • forward algorithm

  • most probable explanation using the Viterbi algorithm

  • Learning using the Expectation Maximisation algorithm

21

what is the forward algorithm

Given an observed sequence X and a known model (HMM) M, how do we compute the probability of seeing that sequence from the model, p(X | M)?

  • helps us learn and parameterise the model

  • basically, what’s the most likely model that fits the data
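
A minimal sketch of the forward recursion (assuming the dictionary-style parameters used in the earlier sketches); it sums over all possible hidden state sequences efficiently rather than enumerating them:

    def forward(X, initial, transition, emission):
        """Return p(X | M): the probability of the observation sequence under the HMM."""
        states = list(initial)
        # alpha[h] = p(x1..xt, Ht = h); initialise with the first observation
        alpha = {h: initial[h] * emission[h][X[0]] for h in states}
        for x in X[1:]:
            alpha = {h: sum(alpha[g] * transition[g][h] for g in states) * emission[h][x]
                     for h in states}
        return sum(alpha.values())

    # e.g. with the hypothetical parameters from the earlier HMM sketch:
    # forward([1, 2, 2], initial, transition, emission)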

22

what is the Most Probable Explanation using the Viterbi algorithm

Given the observed sequence and the model, how do we choose an optimal hidden state sequence?

  • how can we predict what the underlying hidden state sequence is given some observations
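
A minimal sketch of the Viterbi recursion (same assumed dictionary-style parameters as the earlier sketches): it replaces the sum in the forward algorithm with a max, and keeps back-pointers so the best hidden sequence can be read off at the end:

    def viterbi(X, initial, transition, emission):
        """Return the most probable hidden state sequence for observations X."""
        states = list(initial)
        # delta[h] = probability of the best path ending in state h; psi stores back-pointers
        delta = {h: initial[h] * emission[h][X[0]] for h in states}
        psi = []
        for x in X[1:]:
            step, new_delta = {}, {}
            for h in states:
                best_prev = max(states, key=lambda g: delta[g] * transition[g][h])
                new_delta[h] = delta[best_prev] * transition[best_prev][h] * emission[h][x]
                step[h] = best_prev
            delta = new_delta
            psi.append(step)
        # backtrack from the most probable final state
        h = max(states, key=lambda s: delta[s])
        path = [h]
        for step in reversed(psi):
            h = step[h]
            path.append(h)
        return list(reversed(path))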

23

what is Learning using the Expectation Maximisation algorithm

How do we adjust the model parameters to maximise the probability of the observed sequence given the model?

  • how can we learn what the hidden parameters are

  • used to do clustering where H is the hidden class variable which becomes the cluster variable

24

what are the applications of HMMs (and thus DBN)

  • aligning sequences of nucleotides (genetic medicine)

    • Finding where sequences are the same

  • text modelling, sequence of words

  • speech recognition

  • monitoring and tracking behaviours, e.g. modelling an elderly person in a house, classifying whether someone is sitting down or cooking based on the dynamics of an accelerometer to distinguish between normal and abnormal behaviour

  • recognition of mouse gestures, such as swiping and tapping

25

what’s the difference between HMM and DBN

HMMs are a specific type of DBN

  • HMMs are temporal models for time-series data where a single hidden state changes through time, influencing the observations

  • DBNs capture how the dependencies among a wider range of variables change through time