By Chris Rowe on April 3, 2021
I have never encountered hidden Markov models in the wild, but I’ve read a bit about them and they are incredibly cool!
The basic idea starts with a Markov process (or chain), which is a stochastic process where future events depend only on the current state of the system. Consider the simple graph below:
This Markov process includes two states, E and A. If the system is in state E, then at the next time step there is a 0.3 probability that the system will remain in state E and a 0.7 probability that the system will transition to state A. If the system is in state A, then at the next time step there is a 0.6 probability that the system will remain in state A and a 0.4 probability that the system will transition to state E. This process can unfold stochastically for as long as we desire, toggling back and forth between states E and A. Markov processes can exist in continuous time, but we are going to stick with discrete time steps.
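To make this concrete, here is a minimal sketch (in Python with numpy, using an arbitrary number of steps and a fixed seed) of simulating this two-state chain:

import numpy as np

# transition probabilities for the chain above; rows are the current state,
# columns are the next state, with state order [E, A]
transition = np.array([[0.3, 0.7],   # from E: stay in E with 0.3, move to A with 0.7
                       [0.4, 0.6]])  # from A: move to E with 0.4, stay in A with 0.6

rng = np.random.default_rng(0)
state = 0                 # start in state E
chain = [state]
for _ in range(20):       # simulate 20 time steps (arbitrary)
    state = rng.choice(2, p=transition[state])
    chain.append(state)
print(chain)              # 0 = E, 1 = A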
Hidden Markov models involve a second process that depends on an underlying Markov process. Imagine there is a Markov process like the one above, but we don’t observe it directly (it’s hidden!). Instead, we observe a second process that depends on the hidden Markov process. This second process is typically modeled as some probability distribution (commonly multinomial or Gaussian, it seems), the parameters of which depend on the state of the hidden process.
So, there’s some underlying stochastic process that informs the distribution of a second stochastic process. Consider the Markov chain with states A and E described above. Let’s say that state A is associated with a continuous random variable \(X_A \sim N(\mu_A, \sigma_A)\) and state E is associated with a different continuous random variable \(X_E \sim N(\mu_E, \sigma_E)\). We would then observe a sequence of random numbers drawn from either \(X_A\) or \(X_E\), depending on the sequence of underlying states A and E. Thus, for any HMM, there are two levels of model parameters: (1) the transition probabilities for the hidden states and (2) the parameters for the observed outcome distributions. Some of the main objectives for HMMs are estimating the sequence of hidden states, estimating the model parameters at both levels, and computing the likelihood of an observed sequence.
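Here is a minimal sketch of how such a two-level process generates data, using the same two-state chain as above and made-up values for the means and standard deviations:

import numpy as np

# hidden-state transition matrix (state order [E, A], as above)
transition = np.array([[0.3, 0.7],
                       [0.4, 0.6]])
# assumed (made-up) emission parameters: X_E ~ N(0, 1), X_A ~ N(5, 1)
mu = [0.0, 5.0]
sigma = [1.0, 1.0]

rng = np.random.default_rng(1)
state, hidden, observed = 0, [], []
for _ in range(1000):
    state = rng.choice(2, p=transition[state])            # hidden Markov step
    hidden.append(state)
    observed.append(rng.normal(mu[state], sigma[state]))  # observed Gaussian draw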
It completely blows my mind that people came up with algorithms that can identify all the model parameters with only the observed outcomes in hand. This may still all sound a little abstract, so let’s check out a simple example.
Imagine you’re playing a simple dice game at a casino. The house rolls a single six-sided die: you win on a one or two, no one wins on a three or four, and the house wins on a five or six. However, you’ve grown suspicious that the house has been cheating. You believe that they are occasionally switching out a fair die for a loaded die that is more likely to land on a five or six. One day, you record the outcomes of every single die roll.
In the context of an HMM, the two dice (fair vs. loaded) are the two hidden states. You assume that the house is transitioning between these two states with some fixed probabilities, but you never directly observe whether the house is using a fair or loaded die. Instead, you observe the outcomes of all the rolls. The outcome of a six-sided die corresponds to a multinomial distribution with six outcomes. A fair die has equal probability of landing on each outcome, but you suspect a loaded die will have higher probabilities of landing on five or six, giving the house an unfair advantage.
Hidden Markov models and their associated algorithms are designed to tackle such a scenario. Given only the outcomes of the die rolls, you want to estimate when the house is using a fair die and when they are using a loaded die, as well as the outcome probabilities associated with each die.
The Python code below generates a sequence of 5000 die rolls, where the use of a fair or loaded die is determined by a Markov process. The house starts with a fair die and transitions between die types with a 5% probability (i.e., the house will roll each type of die an average of 20 times before switching to the other type of die). The fair die has equal probability of landing on each number from one to six, but the loaded die is much more likely to land on a five or six.
import numpy as np

# set parameters
n = 5000                                            # number of die rolls
fair_bias_tr = 0.05                                 # P(fair -> loaded)
fair_probs = [1/6] * 6                              # fair die outcome probabilities
bias_fair_tr = 0.05                                 # P(loaded -> fair)
bias_probs = [0.01, 0.01, 0.01, 0.01, 0.48, 0.48]   # loaded die outcome probabilities

# generate states (i.e., type of die: 0 = fair, 1 = loaded)
states = np.empty(n)
states[0] = 0
for t in range(1, n):
    if states[t-1] == 0:
        if np.random.binomial(1, fair_bias_tr) == 1:
            states[t] = 1
        else:
            states[t] = 0
    else:
        if np.random.binomial(1, bias_fair_tr) == 1:
            states[t] = 0
        else:
            states[t] = 1

# generate outcomes (i.e., result of each die roll, coded 0-5)
def get_outcome(state, fair_probs, bias_probs):
    if state == 0:
        return np.random.choice(range(6), 1, p=fair_probs)[0]
    else:
        return np.random.choice(range(6), 1, p=bias_probs)[0]

outcomes = np.array([get_outcome(x, fair_probs, bias_probs) for x in states])

# organize outcomes for hmmlearn (one observation per row, integer-coded)
outcomes = outcomes.reshape(-1, 1).astype(int)
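Before fitting anything, a couple of quick checks (lines I’m adding here just for illustration) confirm the simulated data look the way we intended:

# share of rolls made with the loaded die, and empirical outcome frequencies
print(states.mean())
print(np.bincount(outcomes.ravel(), minlength=6) / n)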
Now that we have the results of our die rolls, let’s fit a hidden Markov model to estimate (1) when the house is using the fair die versus the loaded die and (2) the outcome probabilities associated with each die.
The fitting algorithm starts from a random initialization and can get stuck in a local optimum, so let’s fit the model 50 times and save the model with the best fit (i.e., the highest likelihood). I’ve found that this is necessary for the algorithm to achieve good results.
We will use the hmmlearn Python package to wield our HMM magic.
from hmmlearn import hmm

# fit the model 50 times and keep the fit with the highest log-likelihood
models = []
scores = []
for i in range(50):
    model = hmm.MultinomialHMM(n_components=2)
    model.fit(outcomes, lengths=np.array([n]))
    models.append(model)
    scores.append(model.score(outcomes))
best_model = models[int(np.argmax(scores))]
Now that we’ve got our best model fit, let’s compare the states (i.e., the type of die) predicted by the model to the true states from the process we generated above. Here is a plot of the cumulative accuracy of the model’s predictions over the sequence of 5000 die rolls. Wow! Our model ends up correctly predicting whether the house is using a fair or loaded die 93.16% of the time. Amazing!
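For reference, here is roughly how those predictions and the overall accuracy can be computed. Note that the fitted model’s state labels (0 vs. 1) won’t necessarily line up with the labels used in the simulation, so the comparison needs to allow for a swap:

# most likely hidden state sequence given the observed rolls (Viterbi decoding)
pred = best_model.predict(outcomes)

# account for possible label switching before computing accuracy
accuracy = max(np.mean(pred == states), np.mean(pred == 1 - states))
print(accuracy)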
Now let’s see how the algorithm did at predicting the outcome probabilities for the fair and loaded dice. The plots below show the true and estimated probabilities for each outcome and die. Nice! These estimates are quite close to the truth. It’s clear that the estimates for rolling a 5 or 6 with the fair die are a bit low, suggesting the model is probably misclassifying some fair 5 and 6 rolls as coming from the loaded die. This seems reasonable given the loaded die is far more likely to land on a 5 or 6. Either way, these results are great!
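Those estimated probabilities come straight from the fitted model; something like the following pulls them out (again, the row order may be flipped relative to fair/loaded):

# estimated outcome probabilities for each hidden state (one row per state)
print(best_model.emissionprob_.round(3))
# estimated transition probabilities between the hidden states
print(best_model.transmat_.round(3))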
I personally think that it is absolutely incredible that people have come up with these algorithms. Not only have they come up with a general approach for estimating these states and outcome probabilities, but they’ve also figured out how to do it lightning fast! It took mere seconds to fit our model to the 5000 die rolls 50 different times, and the results were excellent!
I’m not well-versed in the actual algorithms that are used to fit these models, but I came across a fantastic draft chapter of a textbook by Daniel Jurafsky and James H. Martin here, which includes a lot of great details on how these models actually do what they do. I read through it and the algorithms are somehow both intuitive and mind-boggling at the same time. I definitely recommend taking a look if you are interested.
It’s also worth highlighting that this dice example is pretty much as simple as it gets with HMMs. I am really excited to continue exploring scenarios with alternative outcome distributions and in continuous time. What a time to be acquainted with the field of statistics!