diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 1afdaf7d..b56d367a 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-09-29T19:05:07","documenter_version":"1.7.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-09-30T07:55:59","documenter_version":"1.7.0"}} \ No newline at end of file diff --git a/dev/alternatives/index.html b/dev/alternatives/index.html index bdfa1436..406f221d 100644 --- a/dev/alternatives/index.html +++ b/dev/alternatives/index.html @@ -1,2 +1,2 @@ -Alternatives · HiddenMarkovModels.jl

Competitors

Julia

We compare features among the following Julia packages:

We discard MarkovModels.jl because its focus is GPU computation. There are also more generic packages for probabilistic programming, which are able to perform MCMC or variational inference (eg. Turing.jl) but we leave those aside.

HiddenMarkovModels.jlHMMBase.jlHMMGradients.jl
Algorithms[1]V, FB, BWV, FB, BWFB
Number typesanythingFloat64AbstractFloat
Observation typesanythingnumber or vectoranything
Observation distributionsDensityInterface.jlDistributions.jlmanual
Multiple sequencesyesnoyes
Priors / structurespossiblenopossible
Control dependencyyesnono
Automatic differentiationyesnoyes
Linear algebra speedupyesyesno
Numerical stabilityscaling+scaling+log
Very small probabilities

In all HMM algorithms, we work with probabilities that may become very small as time progresses. There are two main solutions for this problem: scaling and logarithmic computations. This package implements the Viterbi algorithm in log scale, but the other algorithms use scaling to exploit BLAS operations. As was done in HMMBase.jl, we enhance scaling with a division by the highest observation loglikelihood: instead of working with $b_{i,t} = \mathbb{P}(Y_t | X_t = i)$, we use $b_{i,t} / \max_i b_{i,t}$. See Formulas for details.

Python

We compare features among the following Python packages:

hmmlearnpomegranatedynamax
Algorithms[1]V, FB, BW, VIFB, BWFB, V, BW, GD
Number typesNumPy formatsPyTorch formatsJAX formats
Observation typesnumber or vectornumber or vectornumber or vector
Observation distributionshmmlearn cataloguepomegranate cataloguedynamax catalogue
Multiple sequencesyesyesyes
Priors / structuresyesnoyes
Control dependencynonoyes
Automatic differentiationnoyesyes
Linear algebra speedupyesyesyes
Numerical stabilityscaling / logloglog
  • 1V = Viterbi, FB = Forward-Backward, BW = Baum-Welch, VI = Variational Inference, GD = Gradient Descent
+Alternatives · HiddenMarkovModels.jl

Competitors

Julia

We compare features among the following Julia packages:

We discard MarkovModels.jl because its focus is GPU computation. There are also more generic packages for probabilistic programming, which are able to perform MCMC or variational inference (eg. Turing.jl) but we leave those aside.

HiddenMarkovModels.jlHMMBase.jlHMMGradients.jl
Algorithms[1]V, FB, BWV, FB, BWFB
Number typesanythingFloat64AbstractFloat
Observation typesanythingnumber or vectoranything
Observation distributionsDensityInterface.jlDistributions.jlmanual
Multiple sequencesyesnoyes
Priors / structurespossiblenopossible
Control dependencyyesnono
Automatic differentiationyesnoyes
Linear algebra speedupyesyesno
Numerical stabilityscaling+scaling+log
Very small probabilities

In all HMM algorithms, we work with probabilities that may become very small as time progresses. There are two main solutions for this problem: scaling and logarithmic computations. This package implements the Viterbi algorithm in log scale, but the other algorithms use scaling to exploit BLAS operations. As was done in HMMBase.jl, we enhance scaling with a division by the highest observation loglikelihood: instead of working with $b_{i,t} = \mathbb{P}(Y_t | X_t = i)$, we use $b_{i,t} / \max_i b_{i,t}$. See Formulas for details.

Python

We compare features among the following Python packages:

hmmlearnpomegranatedynamax
Algorithms[1]V, FB, BW, VIFB, BWFB, V, BW, GD
Number typesNumPy formatsPyTorch formatsJAX formats
Observation typesnumber or vectornumber or vectornumber or vector
Observation distributionshmmlearn cataloguepomegranate cataloguedynamax catalogue
Multiple sequencesyesyesyes
Priors / structuresyesnoyes
Control dependencynonoyes
Automatic differentiationnoyesyes
Linear algebra speedupyesyesyes
Numerical stabilityscaling / logloglog
  • 1V = Viterbi, FB = Forward-Backward, BW = Baum-Welch, VI = Variational Inference, GD = Gradient Descent
diff --git a/dev/api/index.html b/dev/api/index.html index 67b371e5..bab36e55 100644 --- a/dev/api/index.html +++ b/dev/api/index.html @@ -1,13 +1,13 @@ -API reference · HiddenMarkovModels.jl

API reference

Sequence formatting

Most algorithms below ingest the data with two positional arguments obs_seq (mandatory) and control_seq (optional), and a keyword argument seq_ends (optional).

  • If the data consists of a single sequence, obs_seq and control_seq are the corresponding vectors of observations and controls, and you don't need to provide seq_ends.
  • If the data consists of multiple sequences, obs_seq and control_seq are concatenations of several vectors, whose end indices are given by seq_ends. Starting from separate sequences obs_seqs and control_seqs, you can run the following snippet:
obs_seq = reduce(vcat, obs_seqs)
+API reference · HiddenMarkovModels.jl

API reference

Sequence formatting

Most algorithms below ingest the data with two positional arguments obs_seq (mandatory) and control_seq (optional), and a keyword argument seq_ends (optional).

  • If the data consists of a single sequence, obs_seq and control_seq are the corresponding vectors of observations and controls, and you don't need to provide seq_ends.
  • If the data consists of multiple sequences, obs_seq and control_seq are concatenations of several vectors, whose end indices are given by seq_ends. Starting from separate sequences obs_seqs and control_seqs, you can run the following snippet:
obs_seq = reduce(vcat, obs_seqs)
 control_seq = reduce(vcat, control_seqs)
-seq_ends = cumsum(length.(obs_seqs))

Types

HiddenMarkovModels.HMMType
struct HMM{V<:(AbstractVector), M<:(AbstractMatrix), VD<:(AbstractVector), Vl<:(AbstractVector), Ml<:(AbstractMatrix)} <: AbstractHMM

Basic implementation of an HMM.

Fields

  • init::AbstractVector: initial state probabilities

  • trans::AbstractMatrix: state transition probabilities

  • dists::AbstractVector: observation distributions

  • loginit::AbstractVector: logarithms of initial state probabilities

  • logtrans::AbstractMatrix: logarithms of state transition probabilities

source

Interface

HiddenMarkovModels.obs_distributionsFunction
obs_distributions(hmm)
-obs_distributions(hmm, control)

Return a vector of observation distributions, one for each state of hmm (possibly when control is applied).

These distribution objects should implement

  • Random.rand(rng, dist) for sampling
  • DensityInterface.logdensityof(dist, obs) for inference
  • StatsAPI.fit!(dist, obs_seq, weight_seq) for learning
source

Utils

Base.randFunction
rand([rng,] hmm, T)
-rand([rng,] hmm, control_seq)

Simulate hmm for T time steps, or when the sequence control_seq is applied.

Return a named tuple (; state_seq, obs_seq).

source
Base.eltypeFunction
eltype(hmm, obs, control)

Return a type that can accommodate forward-backward computations for hmm on observations similar to obs.

It is typically a promotion between the element type of the initialization, the element type of the transition matrix, and the type of an observation logdensity evaluated at obs.

source
HiddenMarkovModels.seq_limitsFunction
seq_limits(seq_ends, k)
-

Return a tuple (t1, t2) giving the begin and end indices of subsequence k within a set of sequences ending at seq_ends.

source

Inference

DensityInterface.logdensityofFunction
logdensityof(hmm)

Return the prior loglikelihood associated with the parameters of hmm.

source
logdensityof(hmm, obs_seq; ...)
+seq_ends = cumsum(length.(obs_seqs))

Types

HiddenMarkovModels.HMMType
struct HMM{V<:(AbstractVector), M<:(AbstractMatrix), VD<:(AbstractVector), Vl<:(AbstractVector), Ml<:(AbstractMatrix)} <: AbstractHMM

Basic implementation of an HMM.

Fields

  • init::AbstractVector: initial state probabilities

  • trans::AbstractMatrix: state transition probabilities

  • dists::AbstractVector: observation distributions

  • loginit::AbstractVector: logarithms of initial state probabilities

  • logtrans::AbstractMatrix: logarithms of state transition probabilities

source

Interface

HiddenMarkovModels.obs_distributionsFunction
obs_distributions(hmm)
+obs_distributions(hmm, control)

Return a vector of observation distributions, one for each state of hmm (possibly when control is applied).

These distribution objects should implement

  • Random.rand(rng, dist) for sampling
  • DensityInterface.logdensityof(dist, obs) for inference
  • StatsAPI.fit!(dist, obs_seq, weight_seq) for learning
source

Utils

Base.randFunction
rand([rng,] hmm, T)
+rand([rng,] hmm, control_seq)

Simulate hmm for T time steps, or when the sequence control_seq is applied.

Return a named tuple (; state_seq, obs_seq).

source
Base.eltypeFunction
eltype(hmm, obs, control)

Return a type that can accommodate forward-backward computations for hmm on observations similar to obs.

It is typically a promotion between the element type of the initialization, the element type of the transition matrix, and the type of an observation logdensity evaluated at obs.

source
HiddenMarkovModels.seq_limitsFunction
seq_limits(seq_ends, k)
+

Return a tuple (t1, t2) giving the begin and end indices of subsequence k within a set of sequences ending at seq_ends.

source

Inference

DensityInterface.logdensityofFunction
logdensityof(hmm)

Return the prior loglikelihood associated with the parameters of hmm.

source
logdensityof(hmm, obs_seq; ...)
 logdensityof(hmm, obs_seq, control_seq; seq_ends)
-

Run the forward algorithm to compute the loglikelihood of obs_seq for hmm, integrating over all possible state sequences.

source
HiddenMarkovModels.joint_logdensityofFunction
joint_logdensityof(hmm, obs_seq, state_seq; ...)
+

Run the forward algorithm to compute the loglikelihood of obs_seq for hmm, integrating over all possible state sequences.

source
HiddenMarkovModels.joint_logdensityofFunction
joint_logdensityof(hmm, obs_seq, state_seq; ...)
 joint_logdensityof(
     hmm,
     obs_seq,
@@ -15,13 +15,13 @@
     control_seq;
     seq_ends
 )
-

Run the forward algorithm to compute the the joint loglikelihood of obs_seq and state_seq for hmm.

source
HiddenMarkovModels.forwardFunction
forward(hmm, obs_seq; ...)
+

Run the forward algorithm to compute the the joint loglikelihood of obs_seq and state_seq for hmm.

source
HiddenMarkovModels.forwardFunction
forward(hmm, obs_seq; ...)
 forward(hmm, obs_seq, control_seq; seq_ends)
-

Apply the forward algorithm to infer the current state after sequence obs_seq for hmm.

Return a tuple (storage.α, storage.logL) where storage is of type ForwardStorage.

source
HiddenMarkovModels.viterbiFunction
viterbi(hmm, obs_seq; ...)
+

Apply the forward algorithm to infer the current state after sequence obs_seq for hmm.

Return a tuple (storage.α, storage.logL) where storage is of type ForwardStorage.

source
HiddenMarkovModels.viterbiFunction
viterbi(hmm, obs_seq; ...)
 viterbi(hmm, obs_seq, control_seq; seq_ends)
-

Apply the Viterbi algorithm to infer the most likely state sequence corresponding to obs_seq for hmm.

Return a tuple (storage.q, storage.logL) where storage is of type ViterbiStorage.

source
HiddenMarkovModels.forward_backwardFunction
forward_backward(hmm, obs_seq; ...)
 forward_backward(hmm, obs_seq, control_seq; seq_ends)
-

Apply the forward-backward algorithm to infer the posterior state and transition marginals during sequence obs_seq for hmm.

Return a tuple (storage.γ, storage.logL) where storage is of type ForwardBackwardStorage.

source

Learning

HiddenMarkovModels.baum_welchFunction
baum_welch(hmm_guess, obs_seq; ...)
+

Apply the forward-backward algorithm to infer the posterior state and transition marginals during sequence obs_seq for hmm.

Return a tuple (storage.γ, storage.logL) where storage is of type ForwardBackwardStorage.

source

Learning

HiddenMarkovModels.baum_welchFunction
baum_welch(hmm_guess, obs_seq; ...)
 baum_welch(
     hmm_guess,
     obs_seq,
@@ -31,23 +31,23 @@
     max_iterations,
     loglikelihood_increasing
 )
-

Apply the Baum-Welch algorithm to estimate the parameters of an HMM on obs_seq, starting from hmm_guess.

Return a tuple (hmm_est, loglikelihood_evolution) where hmm_est is the estimated HMM and loglikelihood_evolution is a vector of loglikelihood values, one per iteration of the algorithm.

Keyword arguments

  • atol: minimum loglikelihood increase at an iteration of the algorithm (otherwise the algorithm is deemed to have converged)
  • max_iterations: maximum number of iterations of the algorithm
  • loglikelihood_increasing: whether to throw an error if the loglikelihood decreases
source
StatsAPI.fit!Function
StatsAPI.fit!(
+

Apply the Baum-Welch algorithm to estimate the parameters of an HMM on obs_seq, starting from hmm_guess.

Return a tuple (hmm_est, loglikelihood_evolution) where hmm_est is the estimated HMM and loglikelihood_evolution is a vector of loglikelihood values, one per iteration of the algorithm.

Keyword arguments

  • atol: minimum loglikelihood increase at an iteration of the algorithm (otherwise the algorithm is deemed to have converged)
  • max_iterations: maximum number of iterations of the algorithm
  • loglikelihood_increasing: whether to throw an error if the loglikelihood decreases
source
StatsAPI.fit!Function
StatsAPI.fit!(
     hmm, fb_storage::ForwardBackwardStorage,
     obs_seq, [control_seq]; seq_ends,
-)

Update hmm in-place based on information generated during forward-backward.

This function is allowed to reuse fb_storage as a scratch space, so its contents should not be trusted afterwards.

source

In-place versions

Forward

HiddenMarkovModels.ForwardStorageType
struct ForwardStorage{R}

Fields

Only the fields with a description are part of the public API.

  • α::Matrix: posterior last state marginals α[i] = ℙ(X[T]=i | Y[1:T])

  • logL::Vector: one loglikelihood per observation sequence

  • B::Matrix

  • c::Vector

source

Viterbi

HiddenMarkovModels.ViterbiStorageType
struct ViterbiStorage{R}

Fields

Only the fields with a description are part of the public API.

  • q::Vector{Int64}: most likely state sequence q[t] = argmaxᵢ ℙ(X[t]=i | Y[1:T])

  • logL::Vector: one joint loglikelihood per pair of observation sequence and most likely state sequence

  • logB::Matrix

  • ϕ::Matrix

  • ψ::Matrix{Int64}

source

Forward-backward

HiddenMarkovModels.ForwardBackwardStorageType
struct ForwardBackwardStorage{R, M<:AbstractArray{R, 2}}

Fields

Only the fields with a description are part of the public API.

  • γ::Matrix: posterior state marginals γ[i,t] = ℙ(X[t]=i | Y[1:T])

  • ξ::Vector{M} where {R, M<:AbstractMatrix{R}}: posterior transition marginals ξ[t][i,j] = ℙ(X[t]=i, X[t+1]=j | Y[1:T])

  • logL::Vector: one loglikelihood per observation sequence

  • B::Matrix

  • α::Matrix

  • c::Vector

  • β::Matrix

  • Bβ::Matrix

source
HiddenMarkovModels.initialize_forward_backwardFunction
initialize_forward_backward(
+)

Update hmm in-place based on information generated during forward-backward.

This function is allowed to reuse fb_storage as a scratch space, so its contents should not be trusted afterwards.

source

In-place versions

Forward

HiddenMarkovModels.ForwardStorageType
struct ForwardStorage{R}

Fields

Only the fields with a description are part of the public API.

  • α::Matrix: posterior last state marginals α[i] = ℙ(X[T]=i | Y[1:T])

  • logL::Vector: one loglikelihood per observation sequence

  • B::Matrix

  • c::Vector

source

Viterbi

HiddenMarkovModels.ViterbiStorageType
struct ViterbiStorage{R}

Fields

Only the fields with a description are part of the public API.

  • q::Vector{Int64}: most likely state sequence q[t] = argmaxᵢ ℙ(X[t]=i | Y[1:T])

  • logL::Vector: one joint loglikelihood per pair of observation sequence and most likely state sequence

  • logB::Matrix

  • ϕ::Matrix

  • ψ::Matrix{Int64}

source

Forward-backward

HiddenMarkovModels.ForwardBackwardStorageType
struct ForwardBackwardStorage{R, M<:AbstractArray{R, 2}}

Fields

Only the fields with a description are part of the public API.

  • γ::Matrix: posterior state marginals γ[i,t] = ℙ(X[t]=i | Y[1:T])

  • ξ::Vector{M} where {R, M<:AbstractMatrix{R}}: posterior transition marginals ξ[t][i,j] = ℙ(X[t]=i, X[t+1]=j | Y[1:T])

  • logL::Vector: one loglikelihood per observation sequence

  • B::Matrix

  • α::Matrix

  • c::Vector

  • β::Matrix

  • Bβ::Matrix

source

Baum-Welch

Baum-Welch

Miscellaneous

HiddenMarkovModels.fit_in_sequence!Function
fit_in_sequence!(dists, i, x, w)
-

Modify the i-th element of dists by fitting it to an observation sequence x with associated weight sequence w.

Default behavior:

fit!(dists[i], x, w)

Override for Distributions.jl (in the package extension)

dists[i] = fit(eltype(dists), turn_into_vector(x), w)
source

Internals

HiddenMarkovModels.LightDiagNormalType
struct LightDiagNormal{T1, T2, T3, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}, V3<:AbstractArray{T3, 1}}

An HMMs-compatible implementation of a multivariate normal distribution with diagonal covariance, enabling allocation-free in-place estimation.

This is not part of the public API and is expected to change.

Fields

  • μ::AbstractVector: means

  • σ::AbstractVector: standard deviations

  • logσ::AbstractVector: log standard deviations

source
HiddenMarkovModels.LightCategoricalType
struct LightCategorical{T1, T2, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}}

An HMMs-compatible implementation of a discrete categorical distribution, enabling allocation-free in-place estimation.

This is not part of the public API and is expected to change.

Fields

  • p::AbstractVector: class probabilities

  • logp::AbstractVector: log class probabilities

source
HiddenMarkovModels.log_transition_matrixFunction
log_transition_matrix(hmm)
-log_transition_matrix(hmm, control)

Return the matrix of state transition log-probabilities for hmm (possibly when control is applied).

Falls back on transition_matrix.

source
HiddenMarkovModels.argmaxplus_transmul!Function
argmaxplus_transmul!(y, ind, A, x)

Perform the in-place multiplication transpose(A) * x in the sense of max-plus algebra, store the result in y, and store the index of the maximum for each component of y in ind.

source

Index

+
source

Miscellaneous

HiddenMarkovModels.valid_hmmFunction
valid_hmm(hmm)

Perform some checks to rule out obvious inconsistencies with an AbstractHMM object.

source
HiddenMarkovModels.rand_prob_vecFunction
rand_prob_vec([rng, ::Type{R},] N)

Generate a random probability distribution of size N with normalized uniform entries.

source
HiddenMarkovModels.rand_trans_matFunction
rand_trans_mat([rng, ::Type{R},] N)

Generate a random transition matrix of size (N, N) with normalized uniform entries.

source
HiddenMarkovModels.fit_in_sequence!Function
fit_in_sequence!(dists, i, x, w)
+

Modify the i-th element of dists by fitting it to an observation sequence x with associated weight sequence w.

Default behavior:

fit!(dists[i], x, w)

Override for Distributions.jl (in the package extension)

dists[i] = fit(eltype(dists), turn_into_vector(x), w)
source

Internals

HiddenMarkovModels.LightDiagNormalType
struct LightDiagNormal{T1, T2, T3, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}, V3<:AbstractArray{T3, 1}}

An HMMs-compatible implementation of a multivariate normal distribution with diagonal covariance, enabling allocation-free in-place estimation.

This is not part of the public API and is expected to change.

Fields

  • μ::AbstractVector: means

  • σ::AbstractVector: standard deviations

  • logσ::AbstractVector: log standard deviations

source
HiddenMarkovModels.LightCategoricalType
struct LightCategorical{T1, T2, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}}

An HMMs-compatible implementation of a discrete categorical distribution, enabling allocation-free in-place estimation.

This is not part of the public API and is expected to change.

Fields

  • p::AbstractVector: class probabilities

  • logp::AbstractVector: log class probabilities

source
HiddenMarkovModels.log_initializationFunction
log_initialization(hmm)

Return the vector of initial state log-probabilities for hmm.

Falls back on initialization.

source
HiddenMarkovModels.log_transition_matrixFunction
log_transition_matrix(hmm)
+log_transition_matrix(hmm, control)

Return the matrix of state transition log-probabilities for hmm (possibly when control is applied).

Falls back on transition_matrix.

source
HiddenMarkovModels.mul_rows_cols!Function
mul_rows_cols!(B, l, A, r)

Perform the in-place operation B .= l .* A .* transpose(r).

source
HiddenMarkovModels.argmaxplus_transmul!Function
argmaxplus_transmul!(y, ind, A, x)

Perform the in-place multiplication transpose(A) * x in the sense of max-plus algebra, store the result in y, and store the index of the maximum for each component of y in ind.

source

Index

diff --git a/dev/debugging/index.html b/dev/debugging/index.html index 37476a2b..cc31939f 100644 --- a/dev/debugging/index.html +++ b/dev/debugging/index.html @@ -1,2 +1,2 @@ -Debugging · HiddenMarkovModels.jl

Debugging

Numerical underflow

The most frequent error you will encounter is an underflow during inference, caused by some values being infinite or NaN. This can happen for a variety of reasons, so here are a few leads worth investigating:

  • Increase the duration of the sequence / the number of sequences to get more data
  • Add a prior to your transition matrix / observation distributions to avoid degenerate behavior (like zero variance in a Gaussian or zero probability in a Bernoulli)
  • Reduce the number of states to make every one of them useful
  • Pick a better initialization to start closer to the supposed ground truth
  • Use numerically stable number types (such as LogarithmicNumbers.jl) in strategic places, but beware: these numbers don't play nicely with Distributions.jl, so you may have to roll out your own Custom distributions.

Method errors

This might be caused by:

  • forgetting to define methods for your custom type
  • omitting control_seq or seq_ends in some places.

Check the API reference.

Performance

If your algorithms are too slow, you can leverage the existing Interfaces to improve the components of your model separately (first observation distributions, then fitting). The usual advice always applies:

  • Use BenchmarkTools.jl to establish a baseline
  • Use profiling to see where you spend most of your time
  • Use JET.jl to track down type instabilities
  • Use AllocCheck.jl to reduce allocations
+Debugging · HiddenMarkovModels.jl

Debugging

Numerical underflow

The most frequent error you will encounter is an underflow during inference, caused by some values being infinite or NaN. This can happen for a variety of reasons, so here are a few leads worth investigating:

  • Increase the duration of the sequence / the number of sequences to get more data
  • Add a prior to your transition matrix / observation distributions to avoid degenerate behavior (like zero variance in a Gaussian or zero probability in a Bernoulli)
  • Reduce the number of states to make every one of them useful
  • Pick a better initialization to start closer to the supposed ground truth
  • Use numerically stable number types (such as LogarithmicNumbers.jl) in strategic places, but beware: these numbers don't play nicely with Distributions.jl, so you may have to roll out your own Custom distributions.

Method errors

This might be caused by:

  • forgetting to define methods for your custom type
  • omitting control_seq or seq_ends in some places.

Check the API reference.

Performance

If your algorithms are too slow, you can leverage the existing Interfaces to improve the components of your model separately (first observation distributions, then fitting). The usual advice always applies:

  • Use BenchmarkTools.jl to establish a baseline
  • Use profiling to see where you spend most of your time
  • Use JET.jl to track down type instabilities
  • Use AllocCheck.jl to reduce allocations
diff --git a/dev/examples/autodiff/index.html b/dev/examples/autodiff/index.html index edb28e67..67af528c 100644 --- a/dev/examples/autodiff/index.html +++ b/dev/examples/autodiff/index.html @@ -76,4 +76,4 @@ Enzyme.Duplicated(obs_seq, ∇obs_enzyme), Enzyme.Duplicated(control_seq, ∇control_enzyme), Enzyme.Const(seq_ends), -)
((nothing, nothing, nothing, nothing),)

Once again we can check the results.

∇parameters_enzyme ≈ ∇parameters_forwarddiff
true
∇obs_enzyme ≈ ∇obs_forwarddiff
true
∇control_enzyme ≈ ∇control_forwarddiff
true

For increased efficiency, we could provide temporary storage to Enzyme.jl in order to avoid allocations. This requires going one level deeper and leveraging the in-place HiddenMarkovModels.forward! function.

Gradient methods

Once we have gradients of the loglikelihood, it is a natural idea to perform gradient descent in order to fit the parameters of a custom HMM. However, there are two caveats we must keep in mind.

First, computing a gradient essentially requires running the forward-backward algorithm, which means it is expensive. Given the output of forward-backward, if there is a way to perform a more accurate parameter update (like going straight to the maximum likelihood value), it is probably worth it. That is what we show in the other tutorials with the reimplementation of the fit! method.

Second, HMM parameters live in a constrained space, which calls for a projected gradient descent. Most notably, the transition matrix must be stochastic, and the orthogonal projection onto this set (the Birkhoff polytope) is not easy to obtain.

Still, first order optimization can be relevant when we lack explicit formulas for maximum likelihood.


This page was generated using Literate.jl.

+)
((nothing, nothing, nothing, nothing),)

Once again we can check the results.

∇parameters_enzyme ≈ ∇parameters_forwarddiff
true
∇obs_enzyme ≈ ∇obs_forwarddiff
true
∇control_enzyme ≈ ∇control_forwarddiff
true

For increased efficiency, we could provide temporary storage to Enzyme.jl in order to avoid allocations. This requires going one level deeper and leveraging the in-place HiddenMarkovModels.forward! function.

Gradient methods

Once we have gradients of the loglikelihood, it is a natural idea to perform gradient descent in order to fit the parameters of a custom HMM. However, there are two caveats we must keep in mind.

First, computing a gradient essentially requires running the forward-backward algorithm, which means it is expensive. Given the output of forward-backward, if there is a way to perform a more accurate parameter update (like going straight to the maximum likelihood value), it is probably worth it. That is what we show in the other tutorials with the reimplementation of the fit! method.

Second, HMM parameters live in a constrained space, which calls for a projected gradient descent. Most notably, the transition matrix must be stochastic, and the orthogonal projection onto this set (the Birkhoff polytope) is not easy to obtain.

Still, first order optimization can be relevant when we lack explicit formulas for maximum likelihood.


This page was generated using Literate.jl.

diff --git a/dev/examples/basics/index.html b/dev/examples/basics/index.html index ca751f41..7972736f 100644 --- a/dev/examples/basics/index.html +++ b/dev/examples/basics/index.html @@ -47,22 +47,22 @@ [-0.256414, -0.563875] [-0.5, -0.8] [0.652218, 0.841332] [0.5, 0.8]

On the other hand, the initialization is concentrated on one state. This effect can be mitigated by learning from several independent sequences.

hcat(initialization(hmm_est), initialization(hmm))
2×2 Matrix{Float64}:
  1.91765e-19  0.6
- 1.0          0.4

Since HMMs are not identifiable up to a permutation of the states, there is no guarantee that state $i$ in the true model will correspond to state $i$ in the estimated model. This is important to keep in mind when testing new models.

Multiple sequences

In many applications, we have access to various observation sequences of different lengths.

nb_seqs = 300
+ 1.0          0.4

Since HMMs are not identifiable up to a permutation of the states, there is no guarantee that state $i$ in the true model will correspond to state $i$ in the estimated model. This is important to keep in mind when testing new models.

Multiple sequences

In many applications, we have access to various observation sequences of different lengths.

nb_seqs = 1000
 long_obs_seqs = [last(rand(rng, hmm, rand(rng, 100:200))) for k in 1:nb_seqs];
 typeof(long_obs_seqs)
Vector{Vector{Vector{Float64}}} (alias for Array{Array{Array{Float64, 1}, 1}, 1})

Every algorithm in the package accepts multiple sequences in a concatenated form. The user must also specify where each sequence ends in the concatenated vector, by passing seq_ends as a keyword argument. Otherwise, the input will be treated as a unique observation sequence, which is mathematically incorrect.

long_obs_seq_concat = reduce(vcat, long_obs_seqs)
 typeof(long_obs_seq_concat)
Vector{Vector{Float64}} (alias for Array{Array{Float64, 1}, 1})
seq_ends = cumsum(length.(long_obs_seqs))
-seq_ends'
1×300 adjoint(::Vector{Int64}) with eltype Int64:
- 174  279  388  539  737  894  1007  …  44408  44557  44737  44855  45032

The outputs of inference algorithms are then concatenated, and the associated loglikelihoods are split by sequence (in a vector of size length(seq_ends)).

best_state_seq_concat, best_joint_loglikelihood_concat = viterbi(
+seq_ends'
1×1000 adjoint(::Vector{Int64}) with eltype Int64:
+ 174  279  388  539  737  894  1007  1161  …  148712  148826  148945  149138

The outputs of inference algorithms are then concatenated, and the associated loglikelihoods are split by sequence (in a vector of size length(seq_ends)).

best_state_seq_concat, best_joint_loglikelihood_concat = viterbi(
     hmm, long_obs_seq_concat; seq_ends
 );
length(best_joint_loglikelihood_concat) == length(seq_ends)
true
length(best_state_seq_concat) == last(seq_ends)
true

The function seq_limits returns the begin and end of a given sequence in the concatenated vector. It can be used to untangle the results.

start2, stop2 = seq_limits(seq_ends, 2)
(175, 279)
best_state_seq_concat[start2:stop2] == first(viterbi(hmm, long_obs_seqs[2]))
true

While inference algorithms can also be run separately on each sequence without changing the results, considering multiple sequences together is nontrivial for Baum-Welch. That is why the package takes care of it automatically.

hmm_est_concat, _ = baum_welch(hmm_guess, long_obs_seq_concat; seq_ends);

Our estimate should be a little better.

cat(transition_matrix(hmm_est_concat), transition_matrix(hmm); dims=3)
2×2×2 Array{Float64, 3}:
 [:, :, 1] =
- 0.69677   0.30323
- 0.207515  0.792485
+ 0.700421  0.299579
+ 0.203473  0.796527
 
 [:, :, 2] =
  0.7  0.3
  0.2  0.8
map(mean, hcat(obs_distributions(hmm_est_concat), obs_distributions(hmm)))
2×2 Matrix{Vector{Float64}}:
- [-0.503771, -0.796337]  [-0.5, -0.8]
- [0.509728, 0.805274]    [0.5, 0.8]
hcat(initialization(hmm_est_concat), initialization(hmm))
2×2 Matrix{Float64}:
- 0.629082  0.6
- 0.370918  0.4

This page was generated using Literate.jl.

+ [-0.502073, -0.800874] [-0.5, -0.8] + [0.504019, 0.801399] [0.5, 0.8]
hcat(initialization(hmm_est_concat), initialization(hmm))
2×2 Matrix{Float64}:
+ 0.605645  0.6
+ 0.394355  0.4

This page was generated using Literate.jl.

diff --git a/dev/examples/controlled/index.html b/dev/examples/controlled/index.html index 717d3816..2a4857af 100644 --- a/dev/examples/controlled/index.html +++ b/dev/examples/controlled/index.html @@ -23,12 +23,12 @@ init = [0.6, 0.4] trans = [0.7 0.3; 0.2 0.8] dist_coeffs = [-ones(d), ones(d)] -hmm = ControlledGaussianHMM(init, trans, dist_coeffs);

Simulation requires a vector of controls, each being a vector itself with the right dimension.

Let us build several sequences of variable lengths.

control_seqs = [[randn(rng, d) for t in 1:rand(100:200)] for k in 1:100];
+hmm = ControlledGaussianHMM(init, trans, dist_coeffs);

Simulation requires a vector of controls, each being a vector itself with the right dimension.

Let us build several sequences of variable lengths.

control_seqs = [[randn(rng, d) for t in 1:rand(100:200)] for k in 1:1000];
 obs_seqs = [rand(rng, hmm, control_seq).obs_seq for control_seq in control_seqs];
 
 obs_seq = reduce(vcat, obs_seqs)
 control_seq = reduce(vcat, control_seqs)
-seq_ends = cumsum(length.(obs_seqs));

Inference

Not much changes from the case with simple time dependency.

best_state_seq, _ = viterbi(hmm, obs_seq, control_seq; seq_ends)
([1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 2, 2, 2, 2, 2, 1, 1, 1], [-325.2871341699577, -332.8057035589162, -248.5996195568691, -342.9666537461842, -259.1758561946936, -214.04522193132487, -308.6602037627958, -271.43786903918016, -367.72003608277623, -177.1592792245783  …  -259.3465743162636, -312.3081679454204, -268.35205212408965, -199.49489414360087, -218.83759193338452, -352.88904208532443, -265.65310329329327, -377.4644282958292, -196.45824540422007, -322.7919684610303])

Learning

Once more, we override the fit! function. The state-related parameters are estimated in the standard way. Meanwhile, the observation coefficients are given by the formula for weighted least squares.

function StatsAPI.fit!(
+seq_ends = cumsum(length.(obs_seqs));

Inference

Not much changes from the case with simple time dependency.

best_state_seq, _ = viterbi(hmm, obs_seq, control_seq; seq_ends)
([2, 2, 2, 1, 1, 1, 1, 1, 1, 1  …  2, 2, 2, 2, 2, 2, 2, 2, 2, 1], [-333.4053753749967, -251.61698004758227, -244.97370098116392, -237.95343845270963, -292.4298553979756, -280.56505173804567, -192.47350618728163, -334.5791209809055, -343.7306431529746, -348.9221866538449  …  -278.6641266866282, -276.6126863114584, -218.01465609362174, -242.7881303996009, -200.07273377268848, -331.6577857855788, -335.3561870459757, -329.63098083189624, -265.741882051697, -335.74246260629064])

Learning

Once more, we override the fit! function. The state-related parameters are estimated in the standard way. Meanwhile, the observation coefficients are given by the formula for weighted least squares.

function StatsAPI.fit!(
     hmm::ControlledGaussianHMM{T},
     fb_storage::HMMs.ForwardBackwardStorage,
     obs_seq::AbstractVector,
@@ -60,17 +60,17 @@
 trans_guess = [0.6 0.4; 0.3 0.7]
 dist_coeffs_guess = [-1.1 * ones(d), 1.1 * ones(d)]
 hmm_guess = ControlledGaussianHMM(init_guess, trans_guess, dist_coeffs_guess);
hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq, control_seq; seq_ends)
-first(loglikelihood_evolution), last(loglikelihood_evolution)
(-26714.91395039397, -26355.41981246224)

How did we perform?

cat(hmm_est.trans, hmm.trans; dims=3)
2×2×2 Array{Float64, 3}:
+first(loglikelihood_evolution), last(loglikelihood_evolution)
(-266918.8683417796, -263225.09995692014)

How did we perform?

cat(hmm_est.trans, hmm.trans; dims=3)
2×2×2 Array{Float64, 3}:
 [:, :, 1] =
- 0.69564   0.30436
- 0.199238  0.800762
+ 0.699211  0.300789
+ 0.200608  0.799392
 
 [:, :, 2] =
  0.7  0.3
  0.2  0.8
hcat(hmm_est.dist_coeffs[1], hmm.dist_coeffs[1])
3×2 Matrix{Float64}:
- -1.00644   -1.0
- -0.998458  -1.0
- -1.00271   -1.0
hcat(hmm_est.dist_coeffs[2], hmm.dist_coeffs[2])
3×2 Matrix{Float64}:
- 1.0153    1.0
- 0.995067  1.0
- 0.982179  1.0

This page was generated using Literate.jl.

+ -0.99658 -1.0 + -1.00033 -1.0 + -0.996751 -1.0
hcat(hmm_est.dist_coeffs[2], hmm.dist_coeffs[2])
3×2 Matrix{Float64}:
+ 0.996756  1.0
+ 0.992554  1.0
+ 1.00189   1.0

This page was generated using Literate.jl.

diff --git a/dev/examples/interfaces/index.html b/dev/examples/interfaces/index.html index 02bfca15..0ed4779d 100644 --- a/dev/examples/interfaces/index.html +++ b/dev/examples/interfaces/index.html @@ -87,4 +87,4 @@ [:, :, 2] = 0.594827 0.405173 - 0.464871 0.535129
std(vec(transition_matrix(hmm_est))) < std(vec(transition_matrix(hmm)))
true

This page was generated using Literate.jl.

+ 0.464871 0.535129
std(vec(transition_matrix(hmm_est))) < std(vec(transition_matrix(hmm)))
true

This page was generated using Literate.jl.

diff --git a/dev/examples/temporal/index.html b/dev/examples/temporal/index.html index a83cf749..12fbeed6 100644 --- a/dev/examples/temporal/index.html +++ b/dev/examples/temporal/index.html @@ -97,4 +97,4 @@ -1.00183 -1.0 -2.01198 -2.0
map(mean, hcat(obs_distributions(hmm_est, 2), obs_distributions(hmm, 2)))
2×2 Matrix{Float64}:
  0.972284  1.0
- 2.01909   2.0

This page was generated using Literate.jl.

+ 2.01909 2.0

This page was generated using Literate.jl.

diff --git a/dev/examples/types/index.html b/dev/examples/types/index.html index b44ad7d9..31d5b953 100644 --- a/dev/examples/types/index.html +++ b/dev/examples/types/index.html @@ -38,4 +38,4 @@ first(loglikelihood_evolution), last(loglikelihood_evolution)
(-1649.3075577938478, -1635.2699650704797)

The estimated model has kept the same sparsity pattern as the guess.

transition_matrix(hmm_est)
3×3 SparseArrays.SparseMatrixCSC{Float64, Int64} with 6 stored entries:
  0.718863  0.281137   ⋅ 
   ⋅        0.659587  0.340413
- 0.287745   ⋅        0.712255

Another useful array type is StaticArrays.jl, which reduces allocations for small state spaces.


This page was generated using Literate.jl.

+ 0.287745 ⋅ 0.712255

Another useful array type is StaticArrays.jl, which reduces allocations for small state spaces.


This page was generated using Literate.jl.

diff --git a/dev/formulas/index.html b/dev/formulas/index.html index 3b6e2b8a..3772469b 100644 --- a/dev/formulas/index.html +++ b/dev/formulas/index.html @@ -56,4 +56,4 @@ \frac{\partial \log \mathcal{L}}{\partial a_{i,j}} &= \sum_{t=1}^{T-1} \bar{\alpha}_{i,t} \frac{b_{j,t+1}}{m_{t+1}} \bar{\beta}_{j,t+1} \\ \frac{\partial \log \mathcal{L}}{\partial \log b_{j,1}} &= \pi_j \frac{b_{j,1}}{m_1} \bar{\beta}_{j,1} = \frac{\bar{\alpha}_{j,1} \bar{\beta}_{j,1}}{c_1} = \gamma_{j,1} \\ \frac{\partial \log \mathcal{L}}{\partial \log b_{j,t}} &= \sum_{i=1}^N \bar{\alpha}_{i,t-1} a_{i,j,t-1} \frac{b_{j,t}}{m_t} \bar{\beta}_{j,t} = \frac{\bar{\alpha}_{j,t} \bar{\beta}_{j,t}}{c_t} = \gamma_{j,t} -\end{align*}\]

Bibliography

+\end{align*}\]

Bibliography

diff --git a/dev/index.html b/dev/index.html index ab3b4f9e..82cd0405 100644 --- a/dev/index.html +++ b/dev/index.html @@ -3,4 +3,4 @@ init = [0.6, 0.4] trans = [0.7 0.3; 0.2 0.8] dists = [Normal(-1.0), Normal(1.0)] -hmm = HMM(init, trans, dists)

Take a look at the documentation to know what to do next!

Some background

Hidden Markov Models (HMMs) are a widely used modeling framework in signal processing, bioinformatics and plenty of other fields. They explain an observation sequence $(Y_t)$ by assuming the existence of a latent Markovian state sequence $(X_t)$ whose current value determines the distribution of observations. In some scenarios, the state and the observation sequence are also allowed to depend on a known control sequence $(U_t)$. Each of the problems below has an efficient solution algorithm, available here:

ProblemGoalAlgorithm
EvaluationLikelihood of the observation sequenceForward
FilteringLast state marginalsForward
SmoothingAll state marginalsForward-backward
DecodingMost likely state sequenceViterbi
LearningMaximum likelihood parameterBaum-Welch

Take a look at this tutorial to know more about the math:

A tutorial on hidden Markov models and selected applications in speech recognition, Rabiner (1989)

Main features

This package is generic. Observations can be arbitrary Julia objects, not just scalars or arrays. Number types are not restricted to floating point, which enables automatic differentiation. Time-dependent or controlled HMMs are supported out of the box.

This package is fast. All the inference functions have allocation-free versions, which leverage efficient linear algebra subroutines. We will include extensive benchmarks against Julia and Python competitors.

This package is reliable. It gives the same results as the previous reference package up to numerical accuracy. The test suite incorporates quality checks as well as type stability and allocation analysis.

Contributing

If you spot a bug or want to ask about a new feature, please open an issue on the GitHub repository. Once the issue receives positive feedback, feel free to try and fix it with a pull request that follows the BlueStyle guidelines.

Acknowledgements

A big thank you to Maxime Mouchet and Jacob Schreiber, the respective lead devs of alternative packages HMMBase.jl and pomegranate, for their help and advice. Logo by Clément Mantoux based on a portrait of Andrey Markov.

+hmm = HMM(init, trans, dists)

Take a look at the documentation to know what to do next!

Some background

Hidden Markov Models (HMMs) are a widely used modeling framework in signal processing, bioinformatics and plenty of other fields. They explain an observation sequence $(Y_t)$ by assuming the existence of a latent Markovian state sequence $(X_t)$ whose current value determines the distribution of observations. In some scenarios, the state and the observation sequence are also allowed to depend on a known control sequence $(U_t)$. Each of the problems below has an efficient solution algorithm, available here:

ProblemGoalAlgorithm
EvaluationLikelihood of the observation sequenceForward
FilteringLast state marginalsForward
SmoothingAll state marginalsForward-backward
DecodingMost likely state sequenceViterbi
LearningMaximum likelihood parameterBaum-Welch

Take a look at this tutorial to know more about the math:

A tutorial on hidden Markov models and selected applications in speech recognition, Rabiner (1989)

Main features

This package is generic. Observations can be arbitrary Julia objects, not just scalars or arrays. Number types are not restricted to floating point, which enables automatic differentiation. Time-dependent or controlled HMMs are supported out of the box.

This package is fast. All the inference functions have allocation-free versions, which leverage efficient linear algebra subroutines. We will include extensive benchmarks against Julia and Python competitors.

This package is reliable. It gives the same results as the previous reference package up to numerical accuracy. The test suite incorporates quality checks as well as type stability and allocation analysis.

Contributing

If you spot a bug or want to ask about a new feature, please open an issue on the GitHub repository. Once the issue receives positive feedback, feel free to try and fix it with a pull request that follows the BlueStyle guidelines.

Acknowledgements

A big thank you to Maxime Mouchet and Jacob Schreiber, the respective lead devs of alternative packages HMMBase.jl and pomegranate, for their help and advice. Logo by Clément Mantoux based on a portrait of Andrey Markov.

diff --git a/dev/search_index.js b/dev/search_index.js index 18ca9834..3765ef4d 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"EditURL = \"../../../examples/controlled.jl\"","category":"page"},{"location":"examples/controlled/#Control-dependency","page":"Control dependency","title":"Control dependency","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Here, we give a example of controlled HMM (also called input-output HMM), in the special case of Markov switching regression.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"using Distributions\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing LinearAlgebra\nusing Random\nusing StableRNGs\nusing StatsAPI","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"rng = StableRNG(63);\nnothing #hide","category":"page"},{"location":"examples/controlled/#Model","page":"Control dependency","title":"Model","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"A Markov switching regression is like a classical regression, except that the weights depend on the unobserved state of an HMM. We can represent it with the following subtype of AbstractHMM (see Custom HMM structures), which has one vector of coefficients beta_i per state.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"struct ControlledGaussianHMM{T} <: AbstractHMM\n init::Vector{T}\n trans::Matrix{T}\n dist_coeffs::Vector{Vector{T}}\nend","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"In state i with a vector of controls u, our observation is given by the linear model y sim mathcalN(beta_i^top u 1). Controls must be provided to both transition_matrix and obs_distributions even if they are only used by one.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"function HMMs.initialization(hmm::ControlledGaussianHMM)\n return hmm.init\nend\n\nfunction HMMs.transition_matrix(hmm::ControlledGaussianHMM, control::AbstractVector)\n return hmm.trans\nend\n\nfunction HMMs.obs_distributions(hmm::ControlledGaussianHMM, control::AbstractVector)\n return [Normal(dot(hmm.dist_coeffs[i], control), 1.0) for i in 1:length(hmm)]\nend","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"In this case, the transition matrix does not depend on the control.","category":"page"},{"location":"examples/controlled/#Simulation","page":"Control dependency","title":"Simulation","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"d = 3\ninit = [0.6, 0.4]\ntrans = [0.7 0.3; 0.2 0.8]\ndist_coeffs = [-ones(d), ones(d)]\nhmm = ControlledGaussianHMM(init, trans, dist_coeffs);\nnothing #hide","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Simulation requires a vector of controls, each being a vector itself with the right dimension.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Let us build several sequences of variable lengths.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"control_seqs = [[randn(rng, d) for t in 1:rand(100:200)] for k in 1:100];\nobs_seqs = [rand(rng, hmm, control_seq).obs_seq for control_seq in control_seqs];\n\nobs_seq = reduce(vcat, obs_seqs)\ncontrol_seq = reduce(vcat, control_seqs)\nseq_ends = cumsum(length.(obs_seqs));\nnothing #hide","category":"page"},{"location":"examples/controlled/#Inference","page":"Control dependency","title":"Inference","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Not much changes from the case with simple time dependency.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"best_state_seq, _ = viterbi(hmm, obs_seq, control_seq; seq_ends)","category":"page"},{"location":"examples/controlled/#Learning","page":"Control dependency","title":"Learning","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Once more, we override the fit! function. The state-related parameters are estimated in the standard way. Meanwhile, the observation coefficients are given by the formula for weighted least squares.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"function StatsAPI.fit!(\n hmm::ControlledGaussianHMM{T},\n fb_storage::HMMs.ForwardBackwardStorage,\n obs_seq::AbstractVector,\n control_seq::AbstractVector;\n seq_ends,\n) where {T}\n (; γ, ξ) = fb_storage\n N = length(hmm)\n\n hmm.init .= 0\n hmm.trans .= 0\n for k in eachindex(seq_ends)\n t1, t2 = HMMs.seq_limits(seq_ends, k)\n hmm.init .+= γ[:, t1]\n hmm.trans .+= sum(ξ[t1:t2])\n end\n hmm.init ./= sum(hmm.init)\n for row in eachrow(hmm.trans)\n row ./= sum(row)\n end\n\n U = reduce(hcat, control_seq)'\n y = obs_seq\n for i in 1:N\n W = sqrt.(Diagonal(γ[i, :]))\n hmm.dist_coeffs[i] = (W * U) \\ (W * y)\n end\nend","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Now we put it to the test.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"init_guess = [0.5, 0.5]\ntrans_guess = [0.6 0.4; 0.3 0.7]\ndist_coeffs_guess = [-1.1 * ones(d), 1.1 * ones(d)]\nhmm_guess = ControlledGaussianHMM(init_guess, trans_guess, dist_coeffs_guess);\nnothing #hide","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq, control_seq; seq_ends)\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"How did we perform?","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"cat(hmm_est.trans, hmm.trans; dims=3)","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"hcat(hmm_est.dist_coeffs[1], hmm.dist_coeffs[1])","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"hcat(hmm_est.dist_coeffs[2], hmm.dist_coeffs[2])","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"This page was generated using Literate.jl.","category":"page"},{"location":"debugging/#Debugging","page":"Debugging","title":"Debugging","text":"","category":"section"},{"location":"debugging/#Numerical-underflow","page":"Debugging","title":"Numerical underflow","text":"","category":"section"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"The most frequent error you will encounter is an underflow during inference, caused by some values being infinite or NaN. This can happen for a variety of reasons, so here are a few leads worth investigating:","category":"page"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"Increase the duration of the sequence / the number of sequences to get more data\nAdd a prior to your transition matrix / observation distributions to avoid degenerate behavior (like zero variance in a Gaussian or zero probability in a Bernoulli)\nReduce the number of states to make every one of them useful\nPick a better initialization to start closer to the supposed ground truth\nUse numerically stable number types (such as LogarithmicNumbers.jl) in strategic places, but beware: these numbers don't play nicely with Distributions.jl, so you may have to roll out your own Custom distributions.","category":"page"},{"location":"debugging/#Method-errors","page":"Debugging","title":"Method errors","text":"","category":"section"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"This might be caused by: ","category":"page"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"forgetting to define methods for your custom type\nomitting control_seq or seq_ends in some places.","category":"page"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"Check the API reference.","category":"page"},{"location":"debugging/#Performance","page":"Debugging","title":"Performance","text":"","category":"section"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"If your algorithms are too slow, you can leverage the existing Interfaces to improve the components of your model separately (first observation distributions, then fitting). The usual advice always applies:","category":"page"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"Use BenchmarkTools.jl to establish a baseline\nUse profiling to see where you spend most of your time\nUse JET.jl to track down type instabilities\nUse AllocCheck.jl to reduce allocations","category":"page"},{"location":"api/","page":"API reference","title":"API reference","text":"CollapsedDocStrings = true","category":"page"},{"location":"api/#API-reference","page":"API reference","title":"API reference","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels","category":"page"},{"location":"api/#HiddenMarkovModels","page":"API reference","title":"HiddenMarkovModels","text":"HiddenMarkovModels\n\nA Julia package for HMM modeling, simulation, inference and learning.\n\nExports\n\nAbstractHMM\nHMM\nbaum_welch\nfit!\nforward\nforward_backward\ninitialization\njoint_logdensityof\nlogdensityof\nobs_distributions\nseq_limits\ntransition_matrix\nviterbi\n\n\n\n\n\n","category":"module"},{"location":"api/#Sequence-formatting","page":"API reference","title":"Sequence formatting","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"Most algorithms below ingest the data with two positional arguments obs_seq (mandatory) and control_seq (optional), and a keyword argument seq_ends (optional).","category":"page"},{"location":"api/","page":"API reference","title":"API reference","text":"If the data consists of a single sequence, obs_seq and control_seq are the corresponding vectors of observations and controls, and you don't need to provide seq_ends.\nIf the data consists of multiple sequences, obs_seq and control_seq are concatenations of several vectors, whose end indices are given by seq_ends. Starting from separate sequences obs_seqs and control_seqs, you can run the following snippet:","category":"page"},{"location":"api/","page":"API reference","title":"API reference","text":"obs_seq = reduce(vcat, obs_seqs)\ncontrol_seq = reduce(vcat, control_seqs)\nseq_ends = cumsum(length.(obs_seqs))","category":"page"},{"location":"api/#Types","page":"API reference","title":"Types","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"AbstractHMM\nHMM","category":"page"},{"location":"api/#HiddenMarkovModels.AbstractHMM","page":"API reference","title":"HiddenMarkovModels.AbstractHMM","text":"AbstractHMM\n\nAbstract supertype for an HMM amenable to simulation, inference and learning.\n\nInterface\n\nTo create your own subtype of AbstractHMM, you need to implement the following methods:\n\ninitialization\ntransition_matrix\nobs_distributions\nfit! (for learning)\n\nApplicable functions\n\nAny AbstractHMM which satisfies the interface can be given to the following functions:\n\nrand\nlogdensityof\nforward\nviterbi\nforward_backward\nbaum_welch (if [fit!](@ref) is implemented)\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.HMM","page":"API reference","title":"HiddenMarkovModels.HMM","text":"struct HMM{V<:(AbstractVector), M<:(AbstractMatrix), VD<:(AbstractVector), Vl<:(AbstractVector), Ml<:(AbstractMatrix)} <: AbstractHMM\n\nBasic implementation of an HMM.\n\nFields\n\ninit::AbstractVector: initial state probabilities\ntrans::AbstractMatrix: state transition probabilities\ndists::AbstractVector: observation distributions\nloginit::AbstractVector: logarithms of initial state probabilities\nlogtrans::AbstractMatrix: logarithms of state transition probabilities\n\n\n\n\n\n","category":"type"},{"location":"api/#Interface","page":"API reference","title":"Interface","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"initialization\ntransition_matrix\nobs_distributions","category":"page"},{"location":"api/#HiddenMarkovModels.initialization","page":"API reference","title":"HiddenMarkovModels.initialization","text":"initialization(hmm)\n\nReturn the vector of initial state probabilities for hmm.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.transition_matrix","page":"API reference","title":"HiddenMarkovModels.transition_matrix","text":"transition_matrix(hmm)\ntransition_matrix(hmm, control)\n\nReturn the matrix of state transition probabilities for hmm (possibly when control is applied).\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.obs_distributions","page":"API reference","title":"HiddenMarkovModels.obs_distributions","text":"obs_distributions(hmm)\nobs_distributions(hmm, control)\n\nReturn a vector of observation distributions, one for each state of hmm (possibly when control is applied).\n\nThese distribution objects should implement\n\nRandom.rand(rng, dist) for sampling\nDensityInterface.logdensityof(dist, obs) for inference\nStatsAPI.fit!(dist, obs_seq, weight_seq) for learning\n\n\n\n\n\n","category":"function"},{"location":"api/#Utils","page":"API reference","title":"Utils","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"length\nrand\neltype\nseq_limits","category":"page"},{"location":"api/#Base.length","page":"API reference","title":"Base.length","text":"length(hmm)\n\nReturn the number of states of hmm.\n\n\n\n\n\n","category":"function"},{"location":"api/#Base.rand","page":"API reference","title":"Base.rand","text":"rand([rng,] hmm, T)\nrand([rng,] hmm, control_seq)\n\nSimulate hmm for T time steps, or when the sequence control_seq is applied.\n\nReturn a named tuple (; state_seq, obs_seq).\n\n\n\n\n\n","category":"function"},{"location":"api/#Base.eltype","page":"API reference","title":"Base.eltype","text":"eltype(hmm, obs, control)\n\nReturn a type that can accommodate forward-backward computations for hmm on observations similar to obs.\n\nIt is typically a promotion between the element type of the initialization, the element type of the transition matrix, and the type of an observation logdensity evaluated at obs.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.seq_limits","page":"API reference","title":"HiddenMarkovModels.seq_limits","text":"seq_limits(seq_ends, k)\n\n\nReturn a tuple (t1, t2) giving the begin and end indices of subsequence k within a set of sequences ending at seq_ends.\n\n\n\n\n\n","category":"function"},{"location":"api/#Inference","page":"API reference","title":"Inference","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"logdensityof\njoint_logdensityof\nforward\nviterbi\nforward_backward","category":"page"},{"location":"api/#DensityInterface.logdensityof","page":"API reference","title":"DensityInterface.logdensityof","text":"logdensityof(hmm)\n\nReturn the prior loglikelihood associated with the parameters of hmm.\n\n\n\n\n\nlogdensityof(hmm, obs_seq; ...)\nlogdensityof(hmm, obs_seq, control_seq; seq_ends)\n\n\nRun the forward algorithm to compute the loglikelihood of obs_seq for hmm, integrating over all possible state sequences.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.joint_logdensityof","page":"API reference","title":"HiddenMarkovModels.joint_logdensityof","text":"joint_logdensityof(hmm, obs_seq, state_seq; ...)\njoint_logdensityof(\n hmm,\n obs_seq,\n state_seq,\n control_seq;\n seq_ends\n)\n\n\nRun the forward algorithm to compute the the joint loglikelihood of obs_seq and state_seq for hmm.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward","page":"API reference","title":"HiddenMarkovModels.forward","text":"forward(hmm, obs_seq; ...)\nforward(hmm, obs_seq, control_seq; seq_ends)\n\n\nApply the forward algorithm to infer the current state after sequence obs_seq for hmm.\n\nReturn a tuple (storage.α, storage.logL) where storage is of type ForwardStorage.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.viterbi","page":"API reference","title":"HiddenMarkovModels.viterbi","text":"viterbi(hmm, obs_seq; ...)\nviterbi(hmm, obs_seq, control_seq; seq_ends)\n\n\nApply the Viterbi algorithm to infer the most likely state sequence corresponding to obs_seq for hmm.\n\nReturn a tuple (storage.q, storage.logL) where storage is of type ViterbiStorage.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward_backward","page":"API reference","title":"HiddenMarkovModels.forward_backward","text":"forward_backward(hmm, obs_seq; ...)\nforward_backward(hmm, obs_seq, control_seq; seq_ends)\n\n\nApply the forward-backward algorithm to infer the posterior state and transition marginals during sequence obs_seq for hmm.\n\nReturn a tuple (storage.γ, storage.logL) where storage is of type ForwardBackwardStorage.\n\n\n\n\n\n","category":"function"},{"location":"api/#Learning","page":"API reference","title":"Learning","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"baum_welch\nfit!","category":"page"},{"location":"api/#HiddenMarkovModels.baum_welch","page":"API reference","title":"HiddenMarkovModels.baum_welch","text":"baum_welch(hmm_guess, obs_seq; ...)\nbaum_welch(\n hmm_guess,\n obs_seq,\n control_seq;\n seq_ends,\n atol,\n max_iterations,\n loglikelihood_increasing\n)\n\n\nApply the Baum-Welch algorithm to estimate the parameters of an HMM on obs_seq, starting from hmm_guess.\n\nReturn a tuple (hmm_est, loglikelihood_evolution) where hmm_est is the estimated HMM and loglikelihood_evolution is a vector of loglikelihood values, one per iteration of the algorithm.\n\nKeyword arguments\n\natol: minimum loglikelihood increase at an iteration of the algorithm (otherwise the algorithm is deemed to have converged)\nmax_iterations: maximum number of iterations of the algorithm\nloglikelihood_increasing: whether to throw an error if the loglikelihood decreases\n\n\n\n\n\n","category":"function"},{"location":"api/#StatsAPI.fit!","page":"API reference","title":"StatsAPI.fit!","text":"StatsAPI.fit!(\n hmm, fb_storage::ForwardBackwardStorage,\n obs_seq, [control_seq]; seq_ends,\n)\n\nUpdate hmm in-place based on information generated during forward-backward.\n\nThis function is allowed to reuse fb_storage as a scratch space, so its contents should not be trusted afterwards.\n\n\n\n\n\n","category":"function"},{"location":"api/#In-place-versions","page":"API reference","title":"In-place versions","text":"","category":"section"},{"location":"api/#Forward","page":"API reference","title":"Forward","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.ForwardStorage\nHiddenMarkovModels.initialize_forward\nHiddenMarkovModels.forward!","category":"page"},{"location":"api/#HiddenMarkovModels.ForwardStorage","page":"API reference","title":"HiddenMarkovModels.ForwardStorage","text":"struct ForwardStorage{R}\n\nFields\n\nOnly the fields with a description are part of the public API.\n\nα::Matrix: posterior last state marginals α[i] = ℙ(X[T]=i | Y[1:T])\nlogL::Vector: one loglikelihood per observation sequence\nB::Matrix\nc::Vector\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.initialize_forward","page":"API reference","title":"HiddenMarkovModels.initialize_forward","text":"initialize_forward(hmm, obs_seq, control_seq; seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward!","page":"API reference","title":"HiddenMarkovModels.forward!","text":"forward!(storage, hmm, obs_seq, control_seq, t1, t2)\n\n\n\n\n\n\nforward!(storage, hmm, obs_seq, control_seq; seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#Viterbi","page":"API reference","title":"Viterbi","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.ViterbiStorage\nHiddenMarkovModels.initialize_viterbi\nHiddenMarkovModels.viterbi!","category":"page"},{"location":"api/#HiddenMarkovModels.ViterbiStorage","page":"API reference","title":"HiddenMarkovModels.ViterbiStorage","text":"struct ViterbiStorage{R}\n\nFields\n\nOnly the fields with a description are part of the public API.\n\nq::Vector{Int64}: most likely state sequence q[t] = argmaxᵢ ℙ(X[t]=i | Y[1:T])\nlogL::Vector: one joint loglikelihood per pair of observation sequence and most likely state sequence\nlogB::Matrix\nϕ::Matrix\nψ::Matrix{Int64}\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.initialize_viterbi","page":"API reference","title":"HiddenMarkovModels.initialize_viterbi","text":"initialize_viterbi(hmm, obs_seq, control_seq; seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.viterbi!","page":"API reference","title":"HiddenMarkovModels.viterbi!","text":"viterbi!(storage, hmm, obs_seq, control_seq, t1, t2)\n\n\n\n\n\n\nviterbi!(storage, hmm, obs_seq, control_seq; seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#Forward-backward","page":"API reference","title":"Forward-backward","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.ForwardBackwardStorage\nHiddenMarkovModels.initialize_forward_backward\nHiddenMarkovModels.forward_backward!","category":"page"},{"location":"api/#HiddenMarkovModels.ForwardBackwardStorage","page":"API reference","title":"HiddenMarkovModels.ForwardBackwardStorage","text":"struct ForwardBackwardStorage{R, M<:AbstractArray{R, 2}}\n\nFields\n\nOnly the fields with a description are part of the public API.\n\nγ::Matrix: posterior state marginals γ[i,t] = ℙ(X[t]=i | Y[1:T])\nξ::Vector{M} where {R, M<:AbstractMatrix{R}}: posterior transition marginals ξ[t][i,j] = ℙ(X[t]=i, X[t+1]=j | Y[1:T])\nlogL::Vector: one loglikelihood per observation sequence\nB::Matrix\nα::Matrix\nc::Vector\nβ::Matrix\nBβ::Matrix\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.initialize_forward_backward","page":"API reference","title":"HiddenMarkovModels.initialize_forward_backward","text":"initialize_forward_backward(\n hmm,\n obs_seq,\n control_seq;\n seq_ends,\n transition_marginals\n)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward_backward!","page":"API reference","title":"HiddenMarkovModels.forward_backward!","text":"forward_backward!(\n storage,\n hmm,\n obs_seq,\n control_seq,\n t1,\n t2;\n transition_marginals\n)\n\n\n\n\n\n\nforward_backward!(\n storage,\n hmm,\n obs_seq,\n control_seq;\n seq_ends,\n transition_marginals\n)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#Baum-Welch","page":"API reference","title":"Baum-Welch","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.baum_welch!","category":"page"},{"location":"api/#HiddenMarkovModels.baum_welch!","page":"API reference","title":"HiddenMarkovModels.baum_welch!","text":"baum_welch!(\n fb_storage,\n logL_evolution,\n hmm,\n obs_seq,\n control_seq;\n seq_ends,\n atol,\n max_iterations,\n loglikelihood_increasing\n)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#Miscellaneous","page":"API reference","title":"Miscellaneous","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.valid_hmm\nHiddenMarkovModels.rand_prob_vec\nHiddenMarkovModels.rand_trans_mat\nHiddenMarkovModels.fit_in_sequence!","category":"page"},{"location":"api/#HiddenMarkovModels.valid_hmm","page":"API reference","title":"HiddenMarkovModels.valid_hmm","text":"valid_hmm(hmm)\n\nPerform some checks to rule out obvious inconsistencies with an AbstractHMM object.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.rand_prob_vec","page":"API reference","title":"HiddenMarkovModels.rand_prob_vec","text":"rand_prob_vec([rng, ::Type{R},] N)\n\nGenerate a random probability distribution of size N with normalized uniform entries.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.rand_trans_mat","page":"API reference","title":"HiddenMarkovModels.rand_trans_mat","text":"rand_trans_mat([rng, ::Type{R},] N)\n\nGenerate a random transition matrix of size (N, N) with normalized uniform entries.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.fit_in_sequence!","page":"API reference","title":"HiddenMarkovModels.fit_in_sequence!","text":"fit_in_sequence!(dists, i, x, w)\n\n\nModify the i-th element of dists by fitting it to an observation sequence x with associated weight sequence w.\n\nDefault behavior:\n\nfit!(dists[i], x, w)\n\nOverride for Distributions.jl (in the package extension)\n\ndists[i] = fit(eltype(dists), turn_into_vector(x), w)\n\n\n\n\n\n","category":"function"},{"location":"api/#Internals","page":"API reference","title":"Internals","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.LightDiagNormal\nHiddenMarkovModels.LightCategorical\nHiddenMarkovModels.log_initialization\nHiddenMarkovModels.log_transition_matrix\nHiddenMarkovModels.mul_rows_cols!\nHiddenMarkovModels.argmaxplus_transmul!","category":"page"},{"location":"api/#HiddenMarkovModels.LightDiagNormal","page":"API reference","title":"HiddenMarkovModels.LightDiagNormal","text":"struct LightDiagNormal{T1, T2, T3, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}, V3<:AbstractArray{T3, 1}}\n\nAn HMMs-compatible implementation of a multivariate normal distribution with diagonal covariance, enabling allocation-free in-place estimation.\n\nThis is not part of the public API and is expected to change.\n\nFields\n\nμ::AbstractVector: means\nσ::AbstractVector: standard deviations\nlogσ::AbstractVector: log standard deviations\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.LightCategorical","page":"API reference","title":"HiddenMarkovModels.LightCategorical","text":"struct LightCategorical{T1, T2, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}}\n\nAn HMMs-compatible implementation of a discrete categorical distribution, enabling allocation-free in-place estimation.\n\nThis is not part of the public API and is expected to change.\n\nFields\n\np::AbstractVector: class probabilities\nlogp::AbstractVector: log class probabilities\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.log_initialization","page":"API reference","title":"HiddenMarkovModels.log_initialization","text":"log_initialization(hmm)\n\nReturn the vector of initial state log-probabilities for hmm.\n\nFalls back on initialization.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.log_transition_matrix","page":"API reference","title":"HiddenMarkovModels.log_transition_matrix","text":"log_transition_matrix(hmm)\nlog_transition_matrix(hmm, control)\n\nReturn the matrix of state transition log-probabilities for hmm (possibly when control is applied).\n\nFalls back on transition_matrix.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.mul_rows_cols!","page":"API reference","title":"HiddenMarkovModels.mul_rows_cols!","text":"mul_rows_cols!(B, l, A, r)\n\nPerform the in-place operation B .= l .* A .* transpose(r).\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.argmaxplus_transmul!","page":"API reference","title":"HiddenMarkovModels.argmaxplus_transmul!","text":"argmaxplus_transmul!(y, ind, A, x)\n\nPerform the in-place multiplication transpose(A) * x in the sense of max-plus algebra, store the result in y, and store the index of the maximum for each component of y in ind.\n\n\n\n\n\n","category":"function"},{"location":"api/#Index","page":"API reference","title":"Index","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"EditURL = \"../../../examples/temporal.jl\"","category":"page"},{"location":"examples/temporal/#Time-dependency","page":"Time dependency","title":"Time dependency","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Here, we demonstrate what to do transition and observation laws depend on the current time. This time-dependent HMM is implemented as a particular case of controlled HMM.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"using Distributions\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing Random\nusing StableRNGs\nusing StatsAPI","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"rng = StableRNG(63);\nnothing #hide","category":"page"},{"location":"examples/temporal/#Model","page":"Time dependency","title":"Model","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"We focus on the particular case of a periodic HMM with period L. It has only one initialization vector, but L transition matrices and L vectors of observation distributions. As in Custom HMM structures, we need to subtype AbstractHMM.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"struct PeriodicHMM{T<:Number,D,L} <: AbstractHMM\n init::Vector{T}\n trans_per::NTuple{L,Matrix{T}}\n dists_per::NTuple{L,Vector{D}}\nend","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"The interface definition is almost the same as in the homogeneous case, but we give the control variable (here the time) as an additional argument to transition_matrix and obs_distributions.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"period(::PeriodicHMM{T,D,L}) where {T,D,L} = L\n\nfunction HMMs.initialization(hmm::PeriodicHMM)\n return hmm.init\nend\n\nfunction HMMs.transition_matrix(hmm::PeriodicHMM, t::Integer)\n l = (t - 1) % period(hmm) + 1\n return hmm.trans_per[l]\nend\n\nfunction HMMs.obs_distributions(hmm::PeriodicHMM, t::Integer)\n l = (t - 1) % period(hmm) + 1\n return hmm.dists_per[l]\nend","category":"page"},{"location":"examples/temporal/#Simulation","page":"Time dependency","title":"Simulation","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"init = [0.6, 0.4]\ntrans_per = ([0.7 0.3; 0.2 0.8], [0.3 0.7; 0.8 0.2])\ndists_per = ([Normal(-1.0), Normal(-2.0)], [Normal(+1.0), Normal(+2.0)])\nhmm = PeriodicHMM(init, trans_per, dists_per);\nnothing #hide","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Since the behavior of the model depends on control variables, we need to pass these to the simulation routine (instead of just the number of time steps T).","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"control_seq = 1:10\nstate_seq, obs_seq = rand(rng, hmm, control_seq);\nnothing #hide","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"The observations mostly alternate between positive and negative values, which is coherent with negative observation means at odd times and positive observation means at even times.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"obs_seq'","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"We now generate several sequences of variable lengths, for inference and learning tasks.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"control_seqs = [1:rand(rng, 100:200) for k in 1:1000]\nobs_seqs = [rand(rng, hmm, control_seqs[k]).obs_seq for k in eachindex(control_seqs)];\n\nobs_seq = reduce(vcat, obs_seqs)\ncontrol_seq = reduce(vcat, control_seqs)\nseq_ends = cumsum(length.(obs_seqs));\nnothing #hide","category":"page"},{"location":"examples/temporal/#Inference","page":"Time dependency","title":"Inference","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"All three inference algorithms work in the same way, except that we need to provide the control sequence as the last positional argument.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"best_state_seq, _ = viterbi(hmm, obs_seq, control_seq; seq_ends)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"For Viterbi, unsurprisingly, the most likely state sequence aligns with the sign of the observations.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"vcat(obs_seq', best_state_seq')","category":"page"},{"location":"examples/temporal/#Learning","page":"Time dependency","title":"Learning","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"When estimating parameters for a custom subtype of AbstractHMM, we have to override the fitting procedure after forward-backward, with an additional control_seq positional argument. The key is to split the observations according to which periodic parameter they belong to.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"function StatsAPI.fit!(\n hmm::PeriodicHMM{T},\n fb_storage::HMMs.ForwardBackwardStorage,\n obs_seq::AbstractVector,\n control_seq::AbstractVector;\n seq_ends,\n) where {T}\n (; γ, ξ) = fb_storage\n L, N = period(hmm), length(hmm)\n\n hmm.init .= zero(T)\n for l in 1:L\n hmm.trans_per[l] .= zero(T)\n end\n for k in eachindex(seq_ends)\n t1, t2 = HMMs.seq_limits(seq_ends, k)\n hmm.init .+= γ[:, t1]\n for l in 1:L\n hmm.trans_per[l] .+= sum(ξ[(t1 + l - 1):L:t2])\n end\n end\n hmm.init ./= sum(hmm.init)\n for l in 1:L, row in eachrow(hmm.trans_per[l])\n row ./= sum(row)\n end\n\n for l in 1:L\n times_l = Int[]\n for k in eachindex(seq_ends)\n t1, t2 = HMMs.seq_limits(seq_ends, k)\n append!(times_l, (t1 + l - 1):L:t2)\n end\n for i in 1:N\n HMMs.fit_in_sequence!(hmm.dists_per[l], i, obs_seq[times_l], γ[i, times_l])\n end\n end\n\n for l in 1:L\n @assert HMMs.valid_hmm(hmm, l)\n end\n return nothing\nend","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Now let's test our procedure with a reasonable guess.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"init_guess = [0.7, 0.3]\ntrans_per_guess = ([0.6 0.4; 0.3 0.7], [0.4 0.6; 0.7 0.3])\ndists_per_guess = ([Normal(-1.1), Normal(-2.1)], [Normal(+1.1), Normal(+2.1)])\nhmm_guess = PeriodicHMM(init_guess, trans_per_guess, dists_per_guess);\nnothing #hide","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Naturally, Baum-Welch also requires knowing control_seq.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq, control_seq; seq_ends);\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Did we do well?","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"cat(transition_matrix(hmm_est, 1), transition_matrix(hmm, 1); dims=3)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"cat(transition_matrix(hmm_est, 2), transition_matrix(hmm, 2); dims=3)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"map(mean, hcat(obs_distributions(hmm_est, 1), obs_distributions(hmm, 1)))","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"map(mean, hcat(obs_distributions(hmm_est, 2), obs_distributions(hmm, 2)))","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"This page was generated using Literate.jl.","category":"page"},{"location":"alternatives/#Competitors","page":"Alternatives","title":"Competitors","text":"","category":"section"},{"location":"alternatives/#Julia","page":"Alternatives","title":"Julia","text":"","category":"section"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"We compare features among the following Julia packages:","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"HiddenMarkovModels.jl\nHMMBase.jl\nHMMGradients.jl","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"We discard MarkovModels.jl because its focus is GPU computation. There are also more generic packages for probabilistic programming, which are able to perform MCMC or variational inference (eg. Turing.jl) but we leave those aside.","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":" HiddenMarkovModels.jl HMMBase.jl HMMGradients.jl\nAlgorithms[1] V, FB, BW V, FB, BW FB\nNumber types anything Float64 AbstractFloat\nObservation types anything number or vector anything\nObservation distributions DensityInterface.jl Distributions.jl manual\nMultiple sequences yes no yes\nPriors / structures possible no possible\nControl dependency yes no no\nAutomatic differentiation yes no yes\nLinear algebra speedup yes yes no\nNumerical stability scaling+ scaling+ log","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"info: Very small probabilities\nIn all HMM algorithms, we work with probabilities that may become very small as time progresses. There are two main solutions for this problem: scaling and logarithmic computations. This package implements the Viterbi algorithm in log scale, but the other algorithms use scaling to exploit BLAS operations. As was done in HMMBase.jl, we enhance scaling with a division by the highest observation loglikelihood: instead of working with b_it = mathbbP(Y_t X_t = i), we use b_it max_i b_it. See Formulas for details.","category":"page"},{"location":"alternatives/#Python","page":"Alternatives","title":"Python","text":"","category":"section"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"We compare features among the following Python packages:","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"hmmlearn (based on NumPy)\npomegranate (based on PyTorch)\ndynamax (based on JAX)","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":" hmmlearn pomegranate dynamax\nAlgorithms[1] V, FB, BW, VI FB, BW FB, V, BW, GD\nNumber types NumPy formats PyTorch formats JAX formats\nObservation types number or vector number or vector number or vector\nObservation distributions hmmlearn catalogue pomegranate catalogue dynamax catalogue\nMultiple sequences yes yes yes\nPriors / structures yes no yes\nControl dependency no no yes\nAutomatic differentiation no yes yes\nLinear algebra speedup yes yes yes\nNumerical stability scaling / log log log","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"[1]: V = Viterbi, FB = Forward-Backward, BW = Baum-Welch, VI = Variational Inference, GD = Gradient Descent","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"EditURL = \"../../../examples/autodiff.jl\"","category":"page"},{"location":"examples/autodiff/#Autodiff","page":"Autodiff","title":"Autodiff","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Here we show how to compute gradients of the observation sequence loglikelihood with respect to various inputs.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"using ComponentArrays\nusing DensityInterface\nusing Distributions\nusing Enzyme: Enzyme\nusing ForwardDiff: ForwardDiff\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing LinearAlgebra\nusing Random: Random, AbstractRNG\nusing StableRNGs\nusing StatsAPI\nusing Zygote: Zygote","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"rng = StableRNG(63);\nnothing #hide","category":"page"},{"location":"examples/autodiff/#Diffusion-HMM","page":"Autodiff","title":"Diffusion HMM","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"To play around with automatic differentiation, we define a simple controlled HMM.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"struct DiffusionHMM{V1<:AbstractVector,M2<:AbstractMatrix,V3<:AbstractVector} <: AbstractHMM\n init::V1\n trans::M2\n means::V3\nend","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Both its transition matrix and its vector of observation means result from a convex combination between the corresponding field and a base value (aka diffusion). The coefficient lambda of this convex combination is given as a control.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"HMMs.initialization(hmm::DiffusionHMM) = hmm.init\n\nfunction HMMs.transition_matrix(hmm::DiffusionHMM, λ::Number)\n N = length(hmm)\n return (1 - λ) * hmm.trans + λ * ones(N, N) / N\nend\n\nfunction HMMs.obs_distributions(hmm::DiffusionHMM, λ::Number)\n return [Normal((1 - λ) * hmm.means[i] + λ * 0) for i in 1:length(hmm)]\nend","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"We now construct an instance of this object and draw samples from it.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"init = [0.6, 0.4]\ntrans = [0.7 0.3; 0.3 0.7]\nmeans = [-1.0, 1.0]\nhmm = DiffusionHMM(init, trans, means);\nnothing #hide","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"It is essential that the controls are taken between 0 and 1.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"control_seqs = [rand(rng, 3), rand(rng, 5)];\nobs_seqs = [rand(rng, hmm, control_seqs[k]).obs_seq for k in 1:2];\n\ncontrol_seq = reduce(vcat, control_seqs)\nobs_seq = reduce(vcat, obs_seqs)\nseq_ends = cumsum(length.(obs_seqs));\nnothing #hide","category":"page"},{"location":"examples/autodiff/#What-to-differentiate?","page":"Autodiff","title":"What to differentiate?","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"The key function we are interested in is the loglikelihood of the observation sequence. We can differentiate it with respect to","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"the model itself (hmm), or more precisely its parameters\nthe observation sequence (obs_seq)\nthe control sequence (control_seq).\nbut not with respect to the sequence limits (seq_ends), which are discrete.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"logdensityof(hmm, obs_seq, control_seq; seq_ends)","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"To ensure compatibility with backends that only accept a single input, we wrap all parameters inside a ComponentVector from ComponentArrays.jl, and define a new function to differentiate.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"parameters = ComponentVector(; init, trans, means)\n\nfunction f(parameters::ComponentVector, obs_seq, control_seq; seq_ends)\n new_hmm = DiffusionHMM(parameters.init, parameters.trans, parameters.means)\n return logdensityof(new_hmm, obs_seq, control_seq; seq_ends)\nend;\n\nf(parameters, obs_seq, control_seq; seq_ends)","category":"page"},{"location":"examples/autodiff/#Forward-mode","page":"Autodiff","title":"Forward mode","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Since all of our code is type-generic, it is amenable to forward-mode automatic differentiation with ForwardDiff.jl.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Because ForwardDiff.jl only accepts a single input, we must compute derivatives one at a time.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇parameters_forwarddiff = ForwardDiff.gradient(\n _parameters -> f(_parameters, obs_seq, control_seq; seq_ends), parameters\n)","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇obs_forwarddiff = ForwardDiff.gradient(\n _obs_seq -> f(parameters, _obs_seq, control_seq; seq_ends), obs_seq\n)","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇control_forwarddiff = ForwardDiff.gradient(\n _control_seq -> f(parameters, obs_seq, _control_seq; seq_ends), control_seq\n)","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"These values will serve as ground truth when we compare with reverse mode.","category":"page"},{"location":"examples/autodiff/#Reverse-mode-with-Zygote.jl","page":"Autodiff","title":"Reverse mode with Zygote.jl","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"In the presence of many parameters, reverse mode automatic differentiation of the loglikelihood will be much more efficient. The package includes a handwritten chain rule for logdensityof, which means backends like Zygote.jl can be used out of the box. Using it, we can compute all derivatives at once.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇all_zygote = Zygote.gradient(\n (_a, _b, _c) -> f(_a, _b, _c; seq_ends), parameters, obs_seq, control_seq\n);\n\n∇parameters_zygote, ∇obs_zygote, ∇control_zygote = ∇all_zygote;\nnothing #hide","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"We can check the results to validate our chain rule.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇parameters_zygote ≈ ∇parameters_forwarddiff","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇obs_zygote ≈ ∇obs_forwarddiff","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇control_zygote ≈ ∇control_forwarddiff","category":"page"},{"location":"examples/autodiff/#Reverse-mode-with-Enzyme.jl","page":"Autodiff","title":"Reverse mode with Enzyme.jl","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"The more efficient Enzyme.jl also works natively as long as there are no type instabilities, which is why we avoid the closure and the keyword arguments with f_aux:","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"function f_aux(parameters, obs_seq, control_seq, seq_ends)\n return f(parameters, obs_seq, control_seq; seq_ends)\nend","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Enzyme.jl requires preallocated storage for the gradients, which we happily provide.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇parameters_enzyme = Enzyme.make_zero(parameters)\n∇obs_enzyme = Enzyme.make_zero(obs_seq)\n∇control_enzyme = Enzyme.make_zero(control_seq);\nnothing #hide","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"The syntax is a bit more complex, see the Enzyme.jl docs for details.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Enzyme.autodiff(\n Enzyme.Reverse,\n f_aux,\n Enzyme.Active,\n Enzyme.Duplicated(parameters, ∇parameters_enzyme),\n Enzyme.Duplicated(obs_seq, ∇obs_enzyme),\n Enzyme.Duplicated(control_seq, ∇control_enzyme),\n Enzyme.Const(seq_ends),\n)","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Once again we can check the results.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇parameters_enzyme ≈ ∇parameters_forwarddiff","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇obs_enzyme ≈ ∇obs_forwarddiff","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇control_enzyme ≈ ∇control_forwarddiff","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"For increased efficiency, we could provide temporary storage to Enzyme.jl in order to avoid allocations. This requires going one level deeper and leveraging the in-place HiddenMarkovModels.forward! function.","category":"page"},{"location":"examples/autodiff/#Gradient-methods","page":"Autodiff","title":"Gradient methods","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Once we have gradients of the loglikelihood, it is a natural idea to perform gradient descent in order to fit the parameters of a custom HMM. However, there are two caveats we must keep in mind.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"First, computing a gradient essentially requires running the forward-backward algorithm, which means it is expensive. Given the output of forward-backward, if there is a way to perform a more accurate parameter update (like going straight to the maximum likelihood value), it is probably worth it. That is what we show in the other tutorials with the reimplementation of the fit! method.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Second, HMM parameters live in a constrained space, which calls for a projected gradient descent. Most notably, the transition matrix must be stochastic, and the orthogonal projection onto this set (the Birkhoff polytope) is not easy to obtain.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Still, first order optimization can be relevant when we lack explicit formulas for maximum likelihood.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"EditURL = \"../../../examples/types.jl\"","category":"page"},{"location":"examples/types/#Types","page":"Types","title":"Types","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"Here we explain why playing with different number and array types can be useful in an HMM.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"using Distributions\nusing HiddenMarkovModels\nusing LinearAlgebra\nusing LogarithmicNumbers\nusing Measurements\nusing Random\nusing SparseArrays\nusing StableRNGs","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"rng = StableRNG(63);\nnothing #hide","category":"page"},{"location":"examples/types/#General-principle","page":"Types","title":"General principle","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"The whole package is agnostic with respect to types, it performs the right promotions automatically. Therefore, the types we get in the output only depend only on the types present in the input HMM and the observation sequences.","category":"page"},{"location":"examples/types/#Weird-number-types","page":"Types","title":"Weird number types","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"A wide variety of number types can be plugged into HMM parameters to enhance precision or change inference behavior. Some examples are:","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"BigFloat for arbitrary precision\nLogarithmicNumbers.jl to increase numerical stability\nMeasurements.jl to propagate uncertainties\nForwardDiff.jl for dual numbers in automatic differentiation","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"To give an example, let us first generate some data from a vanilla HMM.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"init = [0.6, 0.4]\ntrans = [0.7 0.3; 0.2 0.8]\ndists = [Normal(-1.0), Normal(1.0)]\nhmm = HMM(init, trans, dists)\nstate_seq, obs_seq = rand(rng, hmm, 100);\nnothing #hide","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"Now we construct a new HMM with some uncertainty on the observation means, using Measurements.jl. Note that uncertainty on the transition parameters would throw an error because the matrix has to be stochastic.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"dists_guess = [Normal(-1.0 ± 0.1), Normal(1.0 ± 0.2)]\nhmm_uncertain = HMM(init, trans, dists_guess)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"Every quantity we compute with this new HMM will have propagated uncertainties around it.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"logdensityof(hmm, obs_seq)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"logdensityof(hmm_uncertain, obs_seq)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"We can check that the interval is centered around the true value.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"Measurements.value(logdensityof(hmm_uncertain, obs_seq)) ≈ logdensityof(hmm, obs_seq)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"warning: Number types in Baum-Welch\nFor now, the Baum-Welch algorithm will generally fail with custom number types due to promotion. The reason is that if some parameters have type T1 and some T2, the forward-backward algorithm will compute quantities of type T = promote_type(T1, T2). These quantities may not be suited to the existing containers inside an HMM, and since updates happen in-place for performance, we cannot create a new one. Suggestions are welcome to fix this issue.","category":"page"},{"location":"examples/types/#Sparse-matrices","page":"Types","title":"Sparse matrices","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"Sparse matrices are very useful for large models, because it means the memory and computational requirements will scale as the number of possible transitions. In general, this number is much smaller than the square of the number of states.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"We can easily construct an HMM with a sparse transition matrix, where some transitions are structurally forbidden.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"trans = sparse([\n 0.7 0.3 0\n 0 0.7 0.3\n 0.3 0 0.7\n])","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"init = [0.2, 0.6, 0.2]\ndists = [Normal(1.0), Normal(2.0), Normal(3.0)]\nhmm = HMM(init, trans, dists)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"When we simulate it, the transitions outside of the nonzero coefficients simply cannot happen.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"state_seq, obs_seq = rand(rng, hmm, 1000)\nstate_transitions = collect(zip(state_seq[1:(end - 1)], state_seq[2:end]));\nnothing #hide","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"For a possible transition:","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"count(isequal((2, 2)), state_transitions)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"For an impossible transition:","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"count(isequal((2, 1)), state_transitions)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"Now we apply Baum-Welch from a guess with the right sparsity pattern.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"init_guess = [0.3, 0.4, 0.3]\ntrans_guess = sparse([\n 0.6 0.4 0\n 0 0.6 0.4\n 0.4 0 0.6\n])\ndists_guess = [Normal(1.1), Normal(2.1), Normal(3.1)]\nhmm_guess = HMM(init_guess, trans_guess, dists_guess);\nnothing #hide","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq);\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"The estimated model has kept the same sparsity pattern as the guess.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"transition_matrix(hmm_est)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"Another useful array type is StaticArrays.jl, which reduces allocations for small state spaces.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"EditURL = \"../../../examples/interfaces.jl\"","category":"page"},{"location":"examples/interfaces/#Interfaces","page":"Interfaces","title":"Interfaces","text":"","category":"section"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Here we discuss how to extend the observation distributions or model fitting to satisfy specific needs.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"using DensityInterface\nusing Distributions\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing LinearAlgebra\nusing Random: Random, AbstractRNG\nusing StableRNGs\nusing StatsAPI","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"rng = StableRNG(63);\nnothing #hide","category":"page"},{"location":"examples/interfaces/#Custom-distributions","page":"Interfaces","title":"Custom distributions","text":"","category":"section"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"In an HMM object, the observation distributions do not need to come from Distributions.jl. They only need to implement three methods:","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Random.rand(rng, dist) for sampling\nDensityInterface.logdensityof(dist, obs) for inference\nStatsAPI.fit!(dist, obs_seq, weight_seq) for learning","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"In addition, the observations can be arbitrary Julia types. So let's construct a distribution that generates stuff.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"struct Stuff{T}\n quantity::T\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The associated distribution will only be a wrapper for a normal distribution on the quantity.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"mutable struct StuffDist{T}\n quantity_mean::T\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Simulation is fairly easy.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function Random.rand(rng::AbstractRNG, dist::StuffDist)\n quantity = dist.quantity_mean + randn(rng)\n return Stuff(quantity)\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"It is important to declare to DensityInterface.jl that the custom distribution has a density, thanks to the following trait. The logdensity itself can be computed up to an additive constant without issue.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"DensityInterface.DensityKind(::StuffDist) = HasDensity()\n\nfunction DensityInterface.logdensityof(dist::StuffDist, obs::Stuff)\n return -abs2(obs.quantity - dist.quantity_mean) / 2\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Finally, the fitting procedure must happen in place, and take a sequence of weighted samples.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function StatsAPI.fit!(\n dist::StuffDist, obs_seq::AbstractVector{<:Stuff}, weight_seq::AbstractVector{<:Real}\n)\n dist.quantity_mean =\n sum(weight_seq[k] * obs_seq[k].quantity for k in eachindex(obs_seq, weight_seq)) /\n sum(weight_seq)\n return nothing\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Let's put it to the test.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"init = [0.6, 0.4]\ntrans = [0.7 0.3; 0.2 0.8]\ndists = [StuffDist(-1.0), StuffDist(+1.0)]\nhmm = HMM(init, trans, dists);\nnothing #hide","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"When we sample an observation sequence, we get a vector of Stuff.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"state_seq, obs_seq = rand(rng, hmm, 100)\neltype(obs_seq)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"And we can pass these observations to all of our inference algorithms.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"viterbi(hmm, obs_seq)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"If we implement fit!, Baum-Welch also works seamlessly.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"init_guess = [0.5, 0.5]\ntrans_guess = [0.6 0.4; 0.3 0.7]\ndists_guess = [StuffDist(-1.1), StuffDist(+1.1)]\nhmm_guess = HMM(init_guess, trans_guess, dists_guess);\nnothing #hide","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm, obs_seq)\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"obs_distributions(hmm_est)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"transition_matrix(hmm_est)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"If you want more sophisticated examples, check out HiddenMarkovModels.LightDiagNormal and HiddenMarkovModels.LightCategorical, which are designed to be fast and allocation-free.","category":"page"},{"location":"examples/interfaces/#Custom-HMM-structures","page":"Interfaces","title":"Custom HMM structures","text":"","category":"section"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"In some scenarios, the vanilla Baum-Welch algorithm is not exactly what we want. For instance, we might have a prior on the parameters of our model, which we want to apply during the fitting step of the iterative procedure. Then we need to create a new type that satisfies the AbstractHMM interface.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Let's make a simpler version of the built-in HMM, with a prior saying that each transition has already been observed a certain number of times. Such a prior can be very useful to regularize estimation and avoid numerical instabilities. It amounts to drawing every row of the transition matrix from a Dirichlet distribution, where each Dirichlet parameter is one plus the number of times the corresponding transition has been observed.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"struct PriorHMM{T,D} <: AbstractHMM\n init::Vector{T}\n trans::Matrix{T}\n dists::Vector{D}\n trans_prior_count::Int\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The basic requirements for AbstractHMM are the following three functions: initialization, transition_matrix and obs_distributions.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"HiddenMarkovModels.initialization(hmm::PriorHMM) = hmm.init\nHiddenMarkovModels.transition_matrix(hmm::PriorHMM) = hmm.trans\nHiddenMarkovModels.obs_distributions(hmm::PriorHMM) = hmm.dists","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"It is also possible to override logdensityof(hmm) and specify a prior loglikelihood for the model itself. If we forget to implement this, the loglikelihood computed in Baum-Welch will be missing a term, and thus it might decrease.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function DensityInterface.logdensityof(hmm::PriorHMM)\n prior = Dirichlet(fill(hmm.trans_prior_count + 1, length(hmm)))\n return sum(logdensityof(prior, row) for row in eachrow(transition_matrix(hmm)))\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Finally, we must redefine the specific method of fit! that is used during Baum-Welch re-estimation. This function takes as inputs:","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"the hmm itself\na fb_storage of type HiddenMarkovModels.ForwardBackwardStorage containing the results of the forward-backward algorithm.\nthe same inputs as baum_welch for multiple sequences","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The goal is to modify hmm in-place, updating parameters with their maximum likelihood estimates given current inference results. We will make use of the fields fb_storage.γ and fb_storage.ξ, which contain the state and transition marginals γ[i, t] and ξ[t][i, j] at each time step.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function StatsAPI.fit!(\n hmm::PriorHMM,\n fb_storage::HiddenMarkovModels.ForwardBackwardStorage,\n obs_seq::AbstractVector;\n seq_ends,\n)\n # initialize to defaults without observations\n hmm.init .= 0\n hmm.trans .= hmm.trans_prior_count # our prior comes into play, otherwise 0\n # iterate over observation sequences\n for k in eachindex(seq_ends)\n # get sequence endpoints\n t1, t2 = seq_limits(seq_ends, k)\n # add estimated number of initializations in each state\n hmm.init .+= fb_storage.γ[:, t1]\n # add estimated number of transitions between each pair of states\n hmm.trans .+= sum(fb_storage.ξ[t1:t2])\n end\n # normalize\n hmm.init ./= sum(hmm.init)\n hmm.trans ./= sum(hmm.trans; dims=2)\n\n for i in 1:length(hmm)\n # weigh each sample by the marginal probability of being in state i\n weight_seq = fb_storage.γ[i, :]\n # fit observation distribution i using those weights\n fit!(hmm.dists[i], obs_seq, weight_seq)\n end\n\n # perform a few checks on the model\n @assert HMMs.valid_hmm(hmm)\n return nothing\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"warning: When distributions don't comply\nNote that some distributions, such as those from Distributions.jl:do not support in-place fitting\nexpect different input formats, e.g. matrices instead of a vector of vectorsThe function HiddenMarkovModels.fit_in_sequence! is a replacement for fit!, designed to handle Distributions.jl without committing type piracy. Check out its source code, and overload it for your other distributions too if they do not support in-place fitting.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Now let's see that everything works, even with our custom distribution from before.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"trans_prior_count = 10\nprior_hmm_guess = PriorHMM(init_guess, trans_guess, dists_guess, trans_prior_count);\nnothing #hide","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"prior_hmm_est, prior_logl_evolution = baum_welch(prior_hmm_guess, obs_seq)\nfirst(prior_logl_evolution), last(prior_logl_evolution)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"As we can see, the transition matrix for our Bayesian version is slightly more spread out, although this effect would nearly disappear with enough data.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"cat(transition_matrix(hmm_est), transition_matrix(prior_hmm_est); dims=3)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"std(vec(transition_matrix(hmm_est))) < std(vec(transition_matrix(hmm)))","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"EditURL = \"../../../examples/basics.jl\"","category":"page"},{"location":"examples/basics/#Basics","page":"Basics","title":"Basics","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Here we show how to use the essential ingredients of the package.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"using Distributions\nusing HiddenMarkovModels\nusing LinearAlgebra\nusing Random\nusing StableRNGs","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"rng = StableRNG(63);\nnothing #hide","category":"page"},{"location":"examples/basics/#Model","page":"Basics","title":"Model","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The package provides a versatile HMM type with three main attributes:","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"a vector init of state initialization probabilities\na matrix trans of state transition probabilities\na vector dists of observation distributions, one for each state","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Any scalar- or vector-valued distribution from Distributions.jl can be used for the last part, as well as Custom distributions.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"init = [0.6, 0.4]\ntrans = [0.7 0.3; 0.2 0.8]\ndists = [MvNormal([-0.5, -0.8], I), MvNormal([0.5, 0.8], I)]\nhmm = HMM(init, trans, dists)","category":"page"},{"location":"examples/basics/#Simulation","page":"Basics","title":"Simulation","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"You can simulate a pair of state and observation sequences with rand by specifying how long you want them to be.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"T = 20\nstate_seq, obs_seq = rand(rng, hmm, T);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The state sequence is a vector of integers.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"state_seq[1:3]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The observation sequence is a vector whose elements have whatever type an observation distribution returns when sampled. Here we chose a multivariate normal distribution, so we get vectors at each time step.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"warning: Difference from HMMBase.jl\nIn the case of multivariate observations, HMMBase.jl works with matrices, whereas HiddenMarkovModels.jl works with vectors of vectors. This allows us to accept more generic observations than just numbers or vectors inside the sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"obs_seq[1:3]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"In practical applications, the state sequence is not known, which is why we need inference algorithms to gather information about it.","category":"page"},{"location":"examples/basics/#Inference","page":"Basics","title":"Inference","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The Viterbi algorithm (viterbi) returns:","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"the most likely state sequence hatX_1T = undersetX_1TmathrmargmaxmathbbP(X_1T vert Y_1T),\nthe joint loglikelihood mathbbP(hatX_1T Y_1T) (in a vector of size 1).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"best_state_seq, best_joint_loglikelihood = viterbi(hmm, obs_seq);\nonly(best_joint_loglikelihood)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"As we can see, the most likely state sequence is very close to the true state sequence, but not necessarily equal.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"(state_seq .== best_state_seq)'","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The forward algorithm (forward) returns:","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"a matrix of filtered state marginals alphai t = mathbbP(X_t = i Y_1t),\nthe loglikelihood mathbbP(Y_1T) of the observation sequence (in a vector of size 1).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"filtered_state_marginals, obs_seq_loglikelihood_f = forward(hmm, obs_seq);\nonly(obs_seq_loglikelihood_f)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"At each time t, these filtered marginals take only the observations up to time t into account. This is particularly useful to infer the marginal distribution of the last state.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"filtered_state_marginals[:, T]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The forward-backward algorithm (forward_backward) returns:","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"a matrix of smoothed state marginals gammai t = mathbbP(X_t = i Y_1T),\nthe loglikelihood mathbbP(Y_1T) of the observation sequence (in a vector of size 1).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"smoothed_state_marginals, obs_seq_loglikelihood_fb = forward_backward(hmm, obs_seq);\nonly(obs_seq_loglikelihood_fb)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"At each time t, it takes all observations up to time T into account. This is particularly useful during learning. Note that forward and forward-backward only coincide at the last time step.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"filtered_state_marginals[:, T - 1] ≈ smoothed_state_marginals[:, T - 1]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"filtered_state_marginals[:, T] ≈ smoothed_state_marginals[:, T]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Finally, we provide a thin wrapper (logdensityof) around the forward algorithm for observation sequence loglikelihoods mathbbP(Y_1T).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"logdensityof(hmm, obs_seq)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Another function (joint_logdensityof) can compute joint loglikelihoods mathbbP(X_1T Y_1T) which take the states into account.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"joint_logdensityof(hmm, obs_seq, state_seq)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"For instance, we can check that the output of Viterbi is at least as likely as the true state sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"joint_logdensityof(hmm, obs_seq, best_state_seq)","category":"page"},{"location":"examples/basics/#Learning","page":"Basics","title":"Learning","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The Baum-Welch algorithm (baum_welch) is a variant of Expectation-Maximization, designed specifically to estimate HMM parameters. Since it is a local optimization procedure, it requires a starting point that is close enough to the true model.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"init_guess = [0.5, 0.5]\ntrans_guess = [0.6 0.4; 0.3 0.7]\ndists_guess = [MvNormal([-0.4, -0.7], I), MvNormal([0.4, 0.7], I)]\nhmm_guess = HMM(init_guess, trans_guess, dists_guess);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Let's estimate parameters based on a slightly longer sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"_, long_obs_seq = rand(rng, hmm, 200)\nhmm_est, loglikelihood_evolution = baum_welch(hmm_guess, long_obs_seq);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"An essential guarantee of this algorithm is that the loglikelihood of the observation sequence keeps increasing as the model improves.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"first(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"We can check that the transition matrix estimate has improved.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"cat(transition_matrix(hmm_est), transition_matrix(hmm); dims=3)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"And so have the estimates for the observation distributions.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"map(mean, hcat(obs_distributions(hmm_est), obs_distributions(hmm)))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"On the other hand, the initialization is concentrated on one state. This effect can be mitigated by learning from several independent sequences.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"hcat(initialization(hmm_est), initialization(hmm))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Since HMMs are not identifiable up to a permutation of the states, there is no guarantee that state i in the true model will correspond to state i in the estimated model. This is important to keep in mind when testing new models.","category":"page"},{"location":"examples/basics/#Multiple-sequences","page":"Basics","title":"Multiple sequences","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"In many applications, we have access to various observation sequences of different lengths.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"nb_seqs = 300\nlong_obs_seqs = [last(rand(rng, hmm, rand(rng, 100:200))) for k in 1:nb_seqs];\ntypeof(long_obs_seqs)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Every algorithm in the package accepts multiple sequences in a concatenated form. The user must also specify where each sequence ends in the concatenated vector, by passing seq_ends as a keyword argument. Otherwise, the input will be treated as a unique observation sequence, which is mathematically incorrect.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"long_obs_seq_concat = reduce(vcat, long_obs_seqs)\ntypeof(long_obs_seq_concat)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"seq_ends = cumsum(length.(long_obs_seqs))\nseq_ends'","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The outputs of inference algorithms are then concatenated, and the associated loglikelihoods are split by sequence (in a vector of size length(seq_ends)).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"best_state_seq_concat, best_joint_loglikelihood_concat = viterbi(\n hmm, long_obs_seq_concat; seq_ends\n);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"length(best_joint_loglikelihood_concat) == length(seq_ends)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"length(best_state_seq_concat) == last(seq_ends)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The function seq_limits returns the begin and end of a given sequence in the concatenated vector. It can be used to untangle the results.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"start2, stop2 = seq_limits(seq_ends, 2)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"best_state_seq_concat[start2:stop2] == first(viterbi(hmm, long_obs_seqs[2]))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"While inference algorithms can also be run separately on each sequence without changing the results, considering multiple sequences together is nontrivial for Baum-Welch. That is why the package takes care of it automatically.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"hmm_est_concat, _ = baum_welch(hmm_guess, long_obs_seq_concat; seq_ends);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Our estimate should be a little better.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"cat(transition_matrix(hmm_est_concat), transition_matrix(hmm); dims=3)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"map(mean, hcat(obs_distributions(hmm_est_concat), obs_distributions(hmm)))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"hcat(initialization(hmm_est_concat), initialization(hmm))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"This page was generated using Literate.jl.","category":"page"},{"location":"formulas/#Formulas","page":"Formulas","title":"Formulas","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Suppose we are given observations Y_1 Y_T, with hidden states X_1 X_T. Following (Rabiner, 1989), we use the following notations:","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"let pi in mathbbR^N be the initial state distribution pi_i = mathbbP(X_1 = i)\nlet A_t in mathbbR^N times N be the transition matrix a_ijt = mathbbP(X_t+1=j X_t = i)\nlet B in mathbbR^N times T be the matrix of statewise observation likelihoods b_it = mathbbP(Y_t X_t = i)","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The conditioning on the known controls U_1T is implicit throughout.","category":"page"},{"location":"formulas/#Vanilla-forward-backward","page":"Formulas","title":"Vanilla forward-backward","text":"","category":"section"},{"location":"formulas/#Recursion","page":"Formulas","title":"Recursion","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The forward and backward variables are defined by","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_it = mathbbP(Y_1t X_t=i) \nbeta_it = mathbbP(Y_t+1T X_t=i)\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"They are initialized with","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_i1 = pi_i b_i1 \nbeta_iT = 1\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"and satisfy the dynamic programming equations","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_jt+1 = left(sum_i=1^N alpha_it a_ijtright) b_jt+1 \nbeta_it = sum_j=1^N a_ijt b_jt+1 beta_jt+1\nendalign*","category":"page"},{"location":"formulas/#Likelihood","page":"Formulas","title":"Likelihood","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The likelihood of the whole sequence of observations is given by","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"mathcalL = mathbbP(Y_1T) = sum_i=1^N alpha_iT","category":"page"},{"location":"formulas/#Marginals","page":"Formulas","title":"Marginals","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"We notice that","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_it beta_it = mathbbP(Y_1T X_t=i) \nalpha_it a_ijt b_jt+1 beta_jt+1 = mathbbP(Y_1T X_t=i X_t+1=j)\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Thus we deduce the one-state and two-state marginals","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\ngamma_it = mathbbP(X_t=i Y_1T) = frac1mathcalL alpha_it beta_it \nxi_ijt = mathbbP(X_t=i X_t+1=j Y_1T) = frac1mathcalL alpha_it a_ijt b_jt+1 beta_jt+1\nendalign*","category":"page"},{"location":"formulas/#Derivatives","page":"Formulas","title":"Derivatives","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"According to (Qin et al., 2000), derivatives of the likelihood can be obtained as follows:","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial pi_i = beta_i1 b_i1 \nfracpartial mathcalLpartial a_ij = sum_t=1^T-1 alpha_it b_jt+1 beta_jt+1 \nfracpartial mathcalLpartial b_j1 = pi_j beta_j1 \nfracpartial mathcalLpartial b_jt = left(sum_i=1^N alpha_it-1 a_ijt-1right) beta_jt \nendalign*","category":"page"},{"location":"formulas/#Scaled-forward-backward","page":"Formulas","title":"Scaled forward-backward","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"In this package, we use a slightly different version of the algorithm, including both the traditional scaling of (Rabiner, 1989) and a normalization of B using m_t = max_i b_it.","category":"page"},{"location":"formulas/#Recursion-2","page":"Formulas","title":"Recursion","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The variables are initialized with","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nhatalpha_i1 = pi_i fracb_i1m_1 c_1 = frac1sum_i hatalpha_i1 baralpha_i1 = c_1 hatalpha_i1 \nhatbeta_iT = 1 barbeta_1T = c_T hatbeta_1T\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"and satisfy the dynamic programming equations","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nhatalpha_jt+1 = left(sum_i=1^N baralpha_it a_ijtright) fracb_jt+1m_t+1 c_t+1 = frac1sum_j hatalpha_jt+1 baralpha_jt+1 = c_t+1 hatalpha_jt+1 \nhatbeta_it = sum_j=1^N a_ijt fracb_jt+1m_t+1 barbeta_jt+1 barbeta_jt = c_t hatbeta_jt\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"In terms of the original variables, we find","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nbaralpha_it = alpha_it left(prod_s=1^t fracc_sm_sright) \nbarbeta_it = beta_it left(c_t prod_s=t+1^T fracc_sm_sright)\nendalign*","category":"page"},{"location":"formulas/#Likelihood-2","page":"Formulas","title":"Likelihood","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Since we normalized baralpha at each time step,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"1 = sum_i=1^N baralpha_iT = left(sum_i=1^N alpha_iTright) left(prod_s=1^T fracc_sm_sright) ","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"which means","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"mathcalL = sum_i=1^N alpha_iT = prod_s=1^T fracm_sc_s","category":"page"},{"location":"formulas/#Marginals-2","page":"Formulas","title":"Marginals","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"We can now express the marginals using scaled variables:","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\ngamma_it = frac1mathcalL alpha_it beta_it = frac1mathcalL left(baralpha_it prod_s=1^t fracm_sc_sright) left(barbeta_it frac1c_t prod_s=t+1^T fracm_sc_sright) \n= frac1mathcalL fracbaralpha_it barbeta_itc_t left(prod_s=1^T fracm_sc_sright) = fracbaralpha_it barbeta_itc_t\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nxi_ijt = frac1mathcalL alpha_it a_ij b_jt+1 beta_jt+1 \n= frac1mathcalL left(baralpha_it prod_s=1^t fracm_sc_sright) a_ijt b_jt+1 left(barbeta_jt+1 frac1c_t+1 prod_s=t+2^T fracm_sc_sright) \n= frac1mathcalL baralpha_it a_ijt fracb_jt+1m_t+1 barbeta_jt+1 left(prod_s=1^T fracm_sc_sright) \n= baralpha_it a_ijt fracb_jt+1m_t+1 barbeta_jt+1\nendalign*","category":"page"},{"location":"formulas/#Derivatives-2","page":"Formulas","title":"Derivatives","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"And we also need to adapt the derivatives. For the initial distribution,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial pi_i = beta_i1 b_i1 = left(barbeta_i1 frac1c_1 prod_s=2^T fracm_sc_s right) b_i1 \n= left(prod_s=1^T fracm_sc_sright) barbeta_i1 fracb_i1m_1 = mathcalL barbeta_i1 fracb_i1m_1 \nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"For the transition matrix,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial a_ij = sum_t=1^T-1 alpha_it b_jt+1 beta_jt+1 \n= sum_t=1^T-1 left(baralpha_it prod_s=1^t fracm_sc_s right) b_jt+1 left(barbeta_jt+1 frac1c_t+1 prod_s=t+2^T fracm_sc_s right) \n= sum_t=1^T-1 baralpha_it fracb_jt+1m_t+1 barbeta_jt+1 left(prod_s=1^T fracm_sc_s right) \n= mathcalL sum_t=1^T-1 baralpha_it fracb_jt+1m_t+1 barbeta_jt+1 \nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"And for the statewise observation likelihoods,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial b_j1 = pi_j beta_j1 = pi_j barbeta_j1 frac1c_1 prod_s=2^T fracm_sc_s = mathcalL pi_j barbeta_j1 frac1m_1\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial b_jt = left(sum_i=1^N alpha_it-1 a_ijt-1right) beta_jt \n= sum_i=1^N left(baralpha_it-1 prod_s=1^t-1 fracm_sc_sright) a_ijt-1 left(barbeta_jt frac1c_t prod_s=t+1^T fracm_sc_s right) \n= sum_i=1^N baralpha_it-1 a_ijt-1 barbeta_jt frac1m_t left(prod_s=1^T fracm_sc_sright) \n= mathcalL sum_i=1^N baralpha_it-1 a_ijt-1 barbeta_jt frac1m_t \nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Finally, we note that","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"fracpartial log mathcalLpartial log b_jt = fracpartial log mathcalLpartial b_jt b_jt","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"To sum up,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial log mathcalLpartial pi_i = fracb_i1m_1 barbeta_i1 \nfracpartial log mathcalLpartial a_ij = sum_t=1^T-1 baralpha_it fracb_jt+1m_t+1 barbeta_jt+1 \nfracpartial log mathcalLpartial log b_j1 = pi_j fracb_j1m_1 barbeta_j1 = fracbaralpha_j1 barbeta_j1c_1 = gamma_j1 \nfracpartial log mathcalLpartial log b_jt = sum_i=1^N baralpha_it-1 a_ijt-1 fracb_jtm_t barbeta_jt = fracbaralpha_jt barbeta_jtc_t = gamma_jt\nendalign*","category":"page"},{"location":"formulas/#Bibliography","page":"Formulas","title":"Bibliography","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Qin, F.; Auerbach, A. and Sachs, F. (2000). A Direct Optimization Approach to Hidden Markov Modeling for Single Channel Kinetics. Biophysical Journal 79, 1915–1927. Accessed on Sep 10, 2023.\n\n\n\nRabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 257–286. Accessed on Sep 10, 2023.\n\n\n\n","category":"page"},{"location":"","page":"Home","title":"Home","text":"EditURL = \"https://github.com/gdalle/HiddenMarkovModels.jl/blob/main/README.md\"","category":"page"},{"location":"#HiddenMarkovModels.jl","page":"Home","title":"HiddenMarkovModels.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"(Image: Stable) (Image: Dev) (Image: Build Status) (Image: Coverage)","category":"page"},{"location":"","page":"Home","title":"Home","text":"(Image: Code Style: Blue) (Image: Aqua QA) (Image: JET)","category":"page"},{"location":"","page":"Home","title":"Home","text":"(Image: DOI) (Image: DOI)","category":"page"},{"location":"","page":"Home","title":"Home","text":"A Julia package for simulation, inference and learning of Hidden Markov Models.","category":"page"},{"location":"#Getting-started","page":"Home","title":"Getting started","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package can be installed using Julia's package manager:","category":"page"},{"location":"","page":"Home","title":"Home","text":"pkg> add HiddenMarkovModels","category":"page"},{"location":"","page":"Home","title":"Home","text":"Then, you can create your first model as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Distributions, HiddenMarkovModels\ninit = [0.6, 0.4]\ntrans = [0.7 0.3; 0.2 0.8]\ndists = [Normal(-1.0), Normal(1.0)]\nhmm = HMM(init, trans, dists)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Take a look at the documentation to know what to do next!","category":"page"},{"location":"#Some-background","page":"Home","title":"Some background","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Hidden Markov Models (HMMs) are a widely used modeling framework in signal processing, bioinformatics and plenty of other fields. They explain an observation sequence (Y_t) by assuming the existence of a latent Markovian state sequence (X_t) whose current value determines the distribution of observations. In some scenarios, the state and the observation sequence are also allowed to depend on a known control sequence (U_t). Each of the problems below has an efficient solution algorithm, available here:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Problem Goal Algorithm\nEvaluation Likelihood of the observation sequence Forward\nFiltering Last state marginals Forward\nSmoothing All state marginals Forward-backward\nDecoding Most likely state sequence Viterbi\nLearning Maximum likelihood parameter Baum-Welch","category":"page"},{"location":"","page":"Home","title":"Home","text":"Take a look at this tutorial to know more about the math:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A tutorial on hidden Markov models and selected applications in speech recognition, Rabiner (1989)","category":"page"},{"location":"#Main-features","page":"Home","title":"Main features","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package is generic. Observations can be arbitrary Julia objects, not just scalars or arrays. Number types are not restricted to floating point, which enables automatic differentiation. Time-dependent or controlled HMMs are supported out of the box.","category":"page"},{"location":"","page":"Home","title":"Home","text":"This package is fast. All the inference functions have allocation-free versions, which leverage efficient linear algebra subroutines. We will include extensive benchmarks against Julia and Python competitors.","category":"page"},{"location":"","page":"Home","title":"Home","text":"This package is reliable. It gives the same results as the previous reference package up to numerical accuracy. The test suite incorporates quality checks as well as type stability and allocation analysis.","category":"page"},{"location":"#Contributing","page":"Home","title":"Contributing","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"If you spot a bug or want to ask about a new feature, please open an issue on the GitHub repository. Once the issue receives positive feedback, feel free to try and fix it with a pull request that follows the BlueStyle guidelines.","category":"page"},{"location":"#Acknowledgements","page":"Home","title":"Acknowledgements","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"A big thank you to Maxime Mouchet and Jacob Schreiber, the respective lead devs of alternative packages HMMBase.jl and pomegranate, for their help and advice. Logo by Clément Mantoux based on a portrait of Andrey Markov.","category":"page"}] +[{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"EditURL = \"../../../examples/controlled.jl\"","category":"page"},{"location":"examples/controlled/#Control-dependency","page":"Control dependency","title":"Control dependency","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Here, we give a example of controlled HMM (also called input-output HMM), in the special case of Markov switching regression.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"using Distributions\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing LinearAlgebra\nusing Random\nusing StableRNGs\nusing StatsAPI","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"rng = StableRNG(63);\nnothing #hide","category":"page"},{"location":"examples/controlled/#Model","page":"Control dependency","title":"Model","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"A Markov switching regression is like a classical regression, except that the weights depend on the unobserved state of an HMM. We can represent it with the following subtype of AbstractHMM (see Custom HMM structures), which has one vector of coefficients beta_i per state.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"struct ControlledGaussianHMM{T} <: AbstractHMM\n init::Vector{T}\n trans::Matrix{T}\n dist_coeffs::Vector{Vector{T}}\nend","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"In state i with a vector of controls u, our observation is given by the linear model y sim mathcalN(beta_i^top u 1). Controls must be provided to both transition_matrix and obs_distributions even if they are only used by one.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"function HMMs.initialization(hmm::ControlledGaussianHMM)\n return hmm.init\nend\n\nfunction HMMs.transition_matrix(hmm::ControlledGaussianHMM, control::AbstractVector)\n return hmm.trans\nend\n\nfunction HMMs.obs_distributions(hmm::ControlledGaussianHMM, control::AbstractVector)\n return [Normal(dot(hmm.dist_coeffs[i], control), 1.0) for i in 1:length(hmm)]\nend","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"In this case, the transition matrix does not depend on the control.","category":"page"},{"location":"examples/controlled/#Simulation","page":"Control dependency","title":"Simulation","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"d = 3\ninit = [0.6, 0.4]\ntrans = [0.7 0.3; 0.2 0.8]\ndist_coeffs = [-ones(d), ones(d)]\nhmm = ControlledGaussianHMM(init, trans, dist_coeffs);\nnothing #hide","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Simulation requires a vector of controls, each being a vector itself with the right dimension.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Let us build several sequences of variable lengths.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"control_seqs = [[randn(rng, d) for t in 1:rand(100:200)] for k in 1:1000];\nobs_seqs = [rand(rng, hmm, control_seq).obs_seq for control_seq in control_seqs];\n\nobs_seq = reduce(vcat, obs_seqs)\ncontrol_seq = reduce(vcat, control_seqs)\nseq_ends = cumsum(length.(obs_seqs));\nnothing #hide","category":"page"},{"location":"examples/controlled/#Inference","page":"Control dependency","title":"Inference","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Not much changes from the case with simple time dependency.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"best_state_seq, _ = viterbi(hmm, obs_seq, control_seq; seq_ends)","category":"page"},{"location":"examples/controlled/#Learning","page":"Control dependency","title":"Learning","text":"","category":"section"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Once more, we override the fit! function. The state-related parameters are estimated in the standard way. Meanwhile, the observation coefficients are given by the formula for weighted least squares.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"function StatsAPI.fit!(\n hmm::ControlledGaussianHMM{T},\n fb_storage::HMMs.ForwardBackwardStorage,\n obs_seq::AbstractVector,\n control_seq::AbstractVector;\n seq_ends,\n) where {T}\n (; γ, ξ) = fb_storage\n N = length(hmm)\n\n hmm.init .= 0\n hmm.trans .= 0\n for k in eachindex(seq_ends)\n t1, t2 = HMMs.seq_limits(seq_ends, k)\n hmm.init .+= γ[:, t1]\n hmm.trans .+= sum(ξ[t1:t2])\n end\n hmm.init ./= sum(hmm.init)\n for row in eachrow(hmm.trans)\n row ./= sum(row)\n end\n\n U = reduce(hcat, control_seq)'\n y = obs_seq\n for i in 1:N\n W = sqrt.(Diagonal(γ[i, :]))\n hmm.dist_coeffs[i] = (W * U) \\ (W * y)\n end\nend","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"Now we put it to the test.","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"init_guess = [0.5, 0.5]\ntrans_guess = [0.6 0.4; 0.3 0.7]\ndist_coeffs_guess = [-1.1 * ones(d), 1.1 * ones(d)]\nhmm_guess = ControlledGaussianHMM(init_guess, trans_guess, dist_coeffs_guess);\nnothing #hide","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq, control_seq; seq_ends)\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"How did we perform?","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"cat(hmm_est.trans, hmm.trans; dims=3)","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"hcat(hmm_est.dist_coeffs[1], hmm.dist_coeffs[1])","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"hcat(hmm_est.dist_coeffs[2], hmm.dist_coeffs[2])","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"","category":"page"},{"location":"examples/controlled/","page":"Control dependency","title":"Control dependency","text":"This page was generated using Literate.jl.","category":"page"},{"location":"debugging/#Debugging","page":"Debugging","title":"Debugging","text":"","category":"section"},{"location":"debugging/#Numerical-underflow","page":"Debugging","title":"Numerical underflow","text":"","category":"section"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"The most frequent error you will encounter is an underflow during inference, caused by some values being infinite or NaN. This can happen for a variety of reasons, so here are a few leads worth investigating:","category":"page"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"Increase the duration of the sequence / the number of sequences to get more data\nAdd a prior to your transition matrix / observation distributions to avoid degenerate behavior (like zero variance in a Gaussian or zero probability in a Bernoulli)\nReduce the number of states to make every one of them useful\nPick a better initialization to start closer to the supposed ground truth\nUse numerically stable number types (such as LogarithmicNumbers.jl) in strategic places, but beware: these numbers don't play nicely with Distributions.jl, so you may have to roll out your own Custom distributions.","category":"page"},{"location":"debugging/#Method-errors","page":"Debugging","title":"Method errors","text":"","category":"section"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"This might be caused by: ","category":"page"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"forgetting to define methods for your custom type\nomitting control_seq or seq_ends in some places.","category":"page"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"Check the API reference.","category":"page"},{"location":"debugging/#Performance","page":"Debugging","title":"Performance","text":"","category":"section"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"If your algorithms are too slow, you can leverage the existing Interfaces to improve the components of your model separately (first observation distributions, then fitting). The usual advice always applies:","category":"page"},{"location":"debugging/","page":"Debugging","title":"Debugging","text":"Use BenchmarkTools.jl to establish a baseline\nUse profiling to see where you spend most of your time\nUse JET.jl to track down type instabilities\nUse AllocCheck.jl to reduce allocations","category":"page"},{"location":"api/","page":"API reference","title":"API reference","text":"CollapsedDocStrings = true","category":"page"},{"location":"api/#API-reference","page":"API reference","title":"API reference","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels","category":"page"},{"location":"api/#HiddenMarkovModels","page":"API reference","title":"HiddenMarkovModels","text":"HiddenMarkovModels\n\nA Julia package for HMM modeling, simulation, inference and learning.\n\nExports\n\nAbstractHMM\nHMM\nbaum_welch\nfit!\nforward\nforward_backward\ninitialization\njoint_logdensityof\nlogdensityof\nobs_distributions\nseq_limits\ntransition_matrix\nviterbi\n\n\n\n\n\n","category":"module"},{"location":"api/#Sequence-formatting","page":"API reference","title":"Sequence formatting","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"Most algorithms below ingest the data with two positional arguments obs_seq (mandatory) and control_seq (optional), and a keyword argument seq_ends (optional).","category":"page"},{"location":"api/","page":"API reference","title":"API reference","text":"If the data consists of a single sequence, obs_seq and control_seq are the corresponding vectors of observations and controls, and you don't need to provide seq_ends.\nIf the data consists of multiple sequences, obs_seq and control_seq are concatenations of several vectors, whose end indices are given by seq_ends. Starting from separate sequences obs_seqs and control_seqs, you can run the following snippet:","category":"page"},{"location":"api/","page":"API reference","title":"API reference","text":"obs_seq = reduce(vcat, obs_seqs)\ncontrol_seq = reduce(vcat, control_seqs)\nseq_ends = cumsum(length.(obs_seqs))","category":"page"},{"location":"api/#Types","page":"API reference","title":"Types","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"AbstractHMM\nHMM","category":"page"},{"location":"api/#HiddenMarkovModels.AbstractHMM","page":"API reference","title":"HiddenMarkovModels.AbstractHMM","text":"AbstractHMM\n\nAbstract supertype for an HMM amenable to simulation, inference and learning.\n\nInterface\n\nTo create your own subtype of AbstractHMM, you need to implement the following methods:\n\ninitialization\ntransition_matrix\nobs_distributions\nfit! (for learning)\n\nApplicable functions\n\nAny AbstractHMM which satisfies the interface can be given to the following functions:\n\nrand\nlogdensityof\nforward\nviterbi\nforward_backward\nbaum_welch (if [fit!](@ref) is implemented)\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.HMM","page":"API reference","title":"HiddenMarkovModels.HMM","text":"struct HMM{V<:(AbstractVector), M<:(AbstractMatrix), VD<:(AbstractVector), Vl<:(AbstractVector), Ml<:(AbstractMatrix)} <: AbstractHMM\n\nBasic implementation of an HMM.\n\nFields\n\ninit::AbstractVector: initial state probabilities\ntrans::AbstractMatrix: state transition probabilities\ndists::AbstractVector: observation distributions\nloginit::AbstractVector: logarithms of initial state probabilities\nlogtrans::AbstractMatrix: logarithms of state transition probabilities\n\n\n\n\n\n","category":"type"},{"location":"api/#Interface","page":"API reference","title":"Interface","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"initialization\ntransition_matrix\nobs_distributions","category":"page"},{"location":"api/#HiddenMarkovModels.initialization","page":"API reference","title":"HiddenMarkovModels.initialization","text":"initialization(hmm)\n\nReturn the vector of initial state probabilities for hmm.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.transition_matrix","page":"API reference","title":"HiddenMarkovModels.transition_matrix","text":"transition_matrix(hmm)\ntransition_matrix(hmm, control)\n\nReturn the matrix of state transition probabilities for hmm (possibly when control is applied).\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.obs_distributions","page":"API reference","title":"HiddenMarkovModels.obs_distributions","text":"obs_distributions(hmm)\nobs_distributions(hmm, control)\n\nReturn a vector of observation distributions, one for each state of hmm (possibly when control is applied).\n\nThese distribution objects should implement\n\nRandom.rand(rng, dist) for sampling\nDensityInterface.logdensityof(dist, obs) for inference\nStatsAPI.fit!(dist, obs_seq, weight_seq) for learning\n\n\n\n\n\n","category":"function"},{"location":"api/#Utils","page":"API reference","title":"Utils","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"length\nrand\neltype\nseq_limits","category":"page"},{"location":"api/#Base.length","page":"API reference","title":"Base.length","text":"length(hmm)\n\nReturn the number of states of hmm.\n\n\n\n\n\n","category":"function"},{"location":"api/#Base.rand","page":"API reference","title":"Base.rand","text":"rand([rng,] hmm, T)\nrand([rng,] hmm, control_seq)\n\nSimulate hmm for T time steps, or when the sequence control_seq is applied.\n\nReturn a named tuple (; state_seq, obs_seq).\n\n\n\n\n\n","category":"function"},{"location":"api/#Base.eltype","page":"API reference","title":"Base.eltype","text":"eltype(hmm, obs, control)\n\nReturn a type that can accommodate forward-backward computations for hmm on observations similar to obs.\n\nIt is typically a promotion between the element type of the initialization, the element type of the transition matrix, and the type of an observation logdensity evaluated at obs.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.seq_limits","page":"API reference","title":"HiddenMarkovModels.seq_limits","text":"seq_limits(seq_ends, k)\n\n\nReturn a tuple (t1, t2) giving the begin and end indices of subsequence k within a set of sequences ending at seq_ends.\n\n\n\n\n\n","category":"function"},{"location":"api/#Inference","page":"API reference","title":"Inference","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"logdensityof\njoint_logdensityof\nforward\nviterbi\nforward_backward","category":"page"},{"location":"api/#DensityInterface.logdensityof","page":"API reference","title":"DensityInterface.logdensityof","text":"logdensityof(hmm)\n\nReturn the prior loglikelihood associated with the parameters of hmm.\n\n\n\n\n\nlogdensityof(hmm, obs_seq; ...)\nlogdensityof(hmm, obs_seq, control_seq; seq_ends)\n\n\nRun the forward algorithm to compute the loglikelihood of obs_seq for hmm, integrating over all possible state sequences.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.joint_logdensityof","page":"API reference","title":"HiddenMarkovModels.joint_logdensityof","text":"joint_logdensityof(hmm, obs_seq, state_seq; ...)\njoint_logdensityof(\n hmm,\n obs_seq,\n state_seq,\n control_seq;\n seq_ends\n)\n\n\nRun the forward algorithm to compute the the joint loglikelihood of obs_seq and state_seq for hmm.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward","page":"API reference","title":"HiddenMarkovModels.forward","text":"forward(hmm, obs_seq; ...)\nforward(hmm, obs_seq, control_seq; seq_ends)\n\n\nApply the forward algorithm to infer the current state after sequence obs_seq for hmm.\n\nReturn a tuple (storage.α, storage.logL) where storage is of type ForwardStorage.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.viterbi","page":"API reference","title":"HiddenMarkovModels.viterbi","text":"viterbi(hmm, obs_seq; ...)\nviterbi(hmm, obs_seq, control_seq; seq_ends)\n\n\nApply the Viterbi algorithm to infer the most likely state sequence corresponding to obs_seq for hmm.\n\nReturn a tuple (storage.q, storage.logL) where storage is of type ViterbiStorage.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward_backward","page":"API reference","title":"HiddenMarkovModels.forward_backward","text":"forward_backward(hmm, obs_seq; ...)\nforward_backward(hmm, obs_seq, control_seq; seq_ends)\n\n\nApply the forward-backward algorithm to infer the posterior state and transition marginals during sequence obs_seq for hmm.\n\nReturn a tuple (storage.γ, storage.logL) where storage is of type ForwardBackwardStorage.\n\n\n\n\n\n","category":"function"},{"location":"api/#Learning","page":"API reference","title":"Learning","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"baum_welch\nfit!","category":"page"},{"location":"api/#HiddenMarkovModels.baum_welch","page":"API reference","title":"HiddenMarkovModels.baum_welch","text":"baum_welch(hmm_guess, obs_seq; ...)\nbaum_welch(\n hmm_guess,\n obs_seq,\n control_seq;\n seq_ends,\n atol,\n max_iterations,\n loglikelihood_increasing\n)\n\n\nApply the Baum-Welch algorithm to estimate the parameters of an HMM on obs_seq, starting from hmm_guess.\n\nReturn a tuple (hmm_est, loglikelihood_evolution) where hmm_est is the estimated HMM and loglikelihood_evolution is a vector of loglikelihood values, one per iteration of the algorithm.\n\nKeyword arguments\n\natol: minimum loglikelihood increase at an iteration of the algorithm (otherwise the algorithm is deemed to have converged)\nmax_iterations: maximum number of iterations of the algorithm\nloglikelihood_increasing: whether to throw an error if the loglikelihood decreases\n\n\n\n\n\n","category":"function"},{"location":"api/#StatsAPI.fit!","page":"API reference","title":"StatsAPI.fit!","text":"StatsAPI.fit!(\n hmm, fb_storage::ForwardBackwardStorage,\n obs_seq, [control_seq]; seq_ends,\n)\n\nUpdate hmm in-place based on information generated during forward-backward.\n\nThis function is allowed to reuse fb_storage as a scratch space, so its contents should not be trusted afterwards.\n\n\n\n\n\n","category":"function"},{"location":"api/#In-place-versions","page":"API reference","title":"In-place versions","text":"","category":"section"},{"location":"api/#Forward","page":"API reference","title":"Forward","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.ForwardStorage\nHiddenMarkovModels.initialize_forward\nHiddenMarkovModels.forward!","category":"page"},{"location":"api/#HiddenMarkovModels.ForwardStorage","page":"API reference","title":"HiddenMarkovModels.ForwardStorage","text":"struct ForwardStorage{R}\n\nFields\n\nOnly the fields with a description are part of the public API.\n\nα::Matrix: posterior last state marginals α[i] = ℙ(X[T]=i | Y[1:T])\nlogL::Vector: one loglikelihood per observation sequence\nB::Matrix\nc::Vector\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.initialize_forward","page":"API reference","title":"HiddenMarkovModels.initialize_forward","text":"initialize_forward(hmm, obs_seq, control_seq; seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward!","page":"API reference","title":"HiddenMarkovModels.forward!","text":"forward!(storage, hmm, obs_seq, control_seq, t1, t2)\n\n\n\n\n\n\nforward!(storage, hmm, obs_seq, control_seq; seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#Viterbi","page":"API reference","title":"Viterbi","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.ViterbiStorage\nHiddenMarkovModels.initialize_viterbi\nHiddenMarkovModels.viterbi!","category":"page"},{"location":"api/#HiddenMarkovModels.ViterbiStorage","page":"API reference","title":"HiddenMarkovModels.ViterbiStorage","text":"struct ViterbiStorage{R}\n\nFields\n\nOnly the fields with a description are part of the public API.\n\nq::Vector{Int64}: most likely state sequence q[t] = argmaxᵢ ℙ(X[t]=i | Y[1:T])\nlogL::Vector: one joint loglikelihood per pair of observation sequence and most likely state sequence\nlogB::Matrix\nϕ::Matrix\nψ::Matrix{Int64}\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.initialize_viterbi","page":"API reference","title":"HiddenMarkovModels.initialize_viterbi","text":"initialize_viterbi(hmm, obs_seq, control_seq; seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.viterbi!","page":"API reference","title":"HiddenMarkovModels.viterbi!","text":"viterbi!(storage, hmm, obs_seq, control_seq, t1, t2)\n\n\n\n\n\n\nviterbi!(storage, hmm, obs_seq, control_seq; seq_ends)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#Forward-backward","page":"API reference","title":"Forward-backward","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.ForwardBackwardStorage\nHiddenMarkovModels.initialize_forward_backward\nHiddenMarkovModels.forward_backward!","category":"page"},{"location":"api/#HiddenMarkovModels.ForwardBackwardStorage","page":"API reference","title":"HiddenMarkovModels.ForwardBackwardStorage","text":"struct ForwardBackwardStorage{R, M<:AbstractArray{R, 2}}\n\nFields\n\nOnly the fields with a description are part of the public API.\n\nγ::Matrix: posterior state marginals γ[i,t] = ℙ(X[t]=i | Y[1:T])\nξ::Vector{M} where {R, M<:AbstractMatrix{R}}: posterior transition marginals ξ[t][i,j] = ℙ(X[t]=i, X[t+1]=j | Y[1:T])\nlogL::Vector: one loglikelihood per observation sequence\nB::Matrix\nα::Matrix\nc::Vector\nβ::Matrix\nBβ::Matrix\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.initialize_forward_backward","page":"API reference","title":"HiddenMarkovModels.initialize_forward_backward","text":"initialize_forward_backward(\n hmm,\n obs_seq,\n control_seq;\n seq_ends,\n transition_marginals\n)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.forward_backward!","page":"API reference","title":"HiddenMarkovModels.forward_backward!","text":"forward_backward!(\n storage,\n hmm,\n obs_seq,\n control_seq,\n t1,\n t2;\n transition_marginals\n)\n\n\n\n\n\n\nforward_backward!(\n storage,\n hmm,\n obs_seq,\n control_seq;\n seq_ends,\n transition_marginals\n)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#Baum-Welch","page":"API reference","title":"Baum-Welch","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.baum_welch!","category":"page"},{"location":"api/#HiddenMarkovModels.baum_welch!","page":"API reference","title":"HiddenMarkovModels.baum_welch!","text":"baum_welch!(\n fb_storage,\n logL_evolution,\n hmm,\n obs_seq,\n control_seq;\n seq_ends,\n atol,\n max_iterations,\n loglikelihood_increasing\n)\n\n\n\n\n\n\n","category":"function"},{"location":"api/#Miscellaneous","page":"API reference","title":"Miscellaneous","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.valid_hmm\nHiddenMarkovModels.rand_prob_vec\nHiddenMarkovModels.rand_trans_mat\nHiddenMarkovModels.fit_in_sequence!","category":"page"},{"location":"api/#HiddenMarkovModels.valid_hmm","page":"API reference","title":"HiddenMarkovModels.valid_hmm","text":"valid_hmm(hmm)\n\nPerform some checks to rule out obvious inconsistencies with an AbstractHMM object.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.rand_prob_vec","page":"API reference","title":"HiddenMarkovModels.rand_prob_vec","text":"rand_prob_vec([rng, ::Type{R},] N)\n\nGenerate a random probability distribution of size N with normalized uniform entries.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.rand_trans_mat","page":"API reference","title":"HiddenMarkovModels.rand_trans_mat","text":"rand_trans_mat([rng, ::Type{R},] N)\n\nGenerate a random transition matrix of size (N, N) with normalized uniform entries.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.fit_in_sequence!","page":"API reference","title":"HiddenMarkovModels.fit_in_sequence!","text":"fit_in_sequence!(dists, i, x, w)\n\n\nModify the i-th element of dists by fitting it to an observation sequence x with associated weight sequence w.\n\nDefault behavior:\n\nfit!(dists[i], x, w)\n\nOverride for Distributions.jl (in the package extension)\n\ndists[i] = fit(eltype(dists), turn_into_vector(x), w)\n\n\n\n\n\n","category":"function"},{"location":"api/#Internals","page":"API reference","title":"Internals","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"HiddenMarkovModels.LightDiagNormal\nHiddenMarkovModels.LightCategorical\nHiddenMarkovModels.log_initialization\nHiddenMarkovModels.log_transition_matrix\nHiddenMarkovModels.mul_rows_cols!\nHiddenMarkovModels.argmaxplus_transmul!","category":"page"},{"location":"api/#HiddenMarkovModels.LightDiagNormal","page":"API reference","title":"HiddenMarkovModels.LightDiagNormal","text":"struct LightDiagNormal{T1, T2, T3, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}, V3<:AbstractArray{T3, 1}}\n\nAn HMMs-compatible implementation of a multivariate normal distribution with diagonal covariance, enabling allocation-free in-place estimation.\n\nThis is not part of the public API and is expected to change.\n\nFields\n\nμ::AbstractVector: means\nσ::AbstractVector: standard deviations\nlogσ::AbstractVector: log standard deviations\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.LightCategorical","page":"API reference","title":"HiddenMarkovModels.LightCategorical","text":"struct LightCategorical{T1, T2, V1<:AbstractArray{T1, 1}, V2<:AbstractArray{T2, 1}}\n\nAn HMMs-compatible implementation of a discrete categorical distribution, enabling allocation-free in-place estimation.\n\nThis is not part of the public API and is expected to change.\n\nFields\n\np::AbstractVector: class probabilities\nlogp::AbstractVector: log class probabilities\n\n\n\n\n\n","category":"type"},{"location":"api/#HiddenMarkovModels.log_initialization","page":"API reference","title":"HiddenMarkovModels.log_initialization","text":"log_initialization(hmm)\n\nReturn the vector of initial state log-probabilities for hmm.\n\nFalls back on initialization.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.log_transition_matrix","page":"API reference","title":"HiddenMarkovModels.log_transition_matrix","text":"log_transition_matrix(hmm)\nlog_transition_matrix(hmm, control)\n\nReturn the matrix of state transition log-probabilities for hmm (possibly when control is applied).\n\nFalls back on transition_matrix.\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.mul_rows_cols!","page":"API reference","title":"HiddenMarkovModels.mul_rows_cols!","text":"mul_rows_cols!(B, l, A, r)\n\nPerform the in-place operation B .= l .* A .* transpose(r).\n\n\n\n\n\n","category":"function"},{"location":"api/#HiddenMarkovModels.argmaxplus_transmul!","page":"API reference","title":"HiddenMarkovModels.argmaxplus_transmul!","text":"argmaxplus_transmul!(y, ind, A, x)\n\nPerform the in-place multiplication transpose(A) * x in the sense of max-plus algebra, store the result in y, and store the index of the maximum for each component of y in ind.\n\n\n\n\n\n","category":"function"},{"location":"api/#Index","page":"API reference","title":"Index","text":"","category":"section"},{"location":"api/","page":"API reference","title":"API reference","text":"","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"EditURL = \"../../../examples/temporal.jl\"","category":"page"},{"location":"examples/temporal/#Time-dependency","page":"Time dependency","title":"Time dependency","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Here, we demonstrate what to do transition and observation laws depend on the current time. This time-dependent HMM is implemented as a particular case of controlled HMM.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"using Distributions\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing Random\nusing StableRNGs\nusing StatsAPI","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"rng = StableRNG(63);\nnothing #hide","category":"page"},{"location":"examples/temporal/#Model","page":"Time dependency","title":"Model","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"We focus on the particular case of a periodic HMM with period L. It has only one initialization vector, but L transition matrices and L vectors of observation distributions. As in Custom HMM structures, we need to subtype AbstractHMM.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"struct PeriodicHMM{T<:Number,D,L} <: AbstractHMM\n init::Vector{T}\n trans_per::NTuple{L,Matrix{T}}\n dists_per::NTuple{L,Vector{D}}\nend","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"The interface definition is almost the same as in the homogeneous case, but we give the control variable (here the time) as an additional argument to transition_matrix and obs_distributions.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"period(::PeriodicHMM{T,D,L}) where {T,D,L} = L\n\nfunction HMMs.initialization(hmm::PeriodicHMM)\n return hmm.init\nend\n\nfunction HMMs.transition_matrix(hmm::PeriodicHMM, t::Integer)\n l = (t - 1) % period(hmm) + 1\n return hmm.trans_per[l]\nend\n\nfunction HMMs.obs_distributions(hmm::PeriodicHMM, t::Integer)\n l = (t - 1) % period(hmm) + 1\n return hmm.dists_per[l]\nend","category":"page"},{"location":"examples/temporal/#Simulation","page":"Time dependency","title":"Simulation","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"init = [0.6, 0.4]\ntrans_per = ([0.7 0.3; 0.2 0.8], [0.3 0.7; 0.8 0.2])\ndists_per = ([Normal(-1.0), Normal(-2.0)], [Normal(+1.0), Normal(+2.0)])\nhmm = PeriodicHMM(init, trans_per, dists_per);\nnothing #hide","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Since the behavior of the model depends on control variables, we need to pass these to the simulation routine (instead of just the number of time steps T).","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"control_seq = 1:10\nstate_seq, obs_seq = rand(rng, hmm, control_seq);\nnothing #hide","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"The observations mostly alternate between positive and negative values, which is coherent with negative observation means at odd times and positive observation means at even times.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"obs_seq'","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"We now generate several sequences of variable lengths, for inference and learning tasks.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"control_seqs = [1:rand(rng, 100:200) for k in 1:1000]\nobs_seqs = [rand(rng, hmm, control_seqs[k]).obs_seq for k in eachindex(control_seqs)];\n\nobs_seq = reduce(vcat, obs_seqs)\ncontrol_seq = reduce(vcat, control_seqs)\nseq_ends = cumsum(length.(obs_seqs));\nnothing #hide","category":"page"},{"location":"examples/temporal/#Inference","page":"Time dependency","title":"Inference","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"All three inference algorithms work in the same way, except that we need to provide the control sequence as the last positional argument.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"best_state_seq, _ = viterbi(hmm, obs_seq, control_seq; seq_ends)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"For Viterbi, unsurprisingly, the most likely state sequence aligns with the sign of the observations.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"vcat(obs_seq', best_state_seq')","category":"page"},{"location":"examples/temporal/#Learning","page":"Time dependency","title":"Learning","text":"","category":"section"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"When estimating parameters for a custom subtype of AbstractHMM, we have to override the fitting procedure after forward-backward, with an additional control_seq positional argument. The key is to split the observations according to which periodic parameter they belong to.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"function StatsAPI.fit!(\n hmm::PeriodicHMM{T},\n fb_storage::HMMs.ForwardBackwardStorage,\n obs_seq::AbstractVector,\n control_seq::AbstractVector;\n seq_ends,\n) where {T}\n (; γ, ξ) = fb_storage\n L, N = period(hmm), length(hmm)\n\n hmm.init .= zero(T)\n for l in 1:L\n hmm.trans_per[l] .= zero(T)\n end\n for k in eachindex(seq_ends)\n t1, t2 = HMMs.seq_limits(seq_ends, k)\n hmm.init .+= γ[:, t1]\n for l in 1:L\n hmm.trans_per[l] .+= sum(ξ[(t1 + l - 1):L:t2])\n end\n end\n hmm.init ./= sum(hmm.init)\n for l in 1:L, row in eachrow(hmm.trans_per[l])\n row ./= sum(row)\n end\n\n for l in 1:L\n times_l = Int[]\n for k in eachindex(seq_ends)\n t1, t2 = HMMs.seq_limits(seq_ends, k)\n append!(times_l, (t1 + l - 1):L:t2)\n end\n for i in 1:N\n HMMs.fit_in_sequence!(hmm.dists_per[l], i, obs_seq[times_l], γ[i, times_l])\n end\n end\n\n for l in 1:L\n @assert HMMs.valid_hmm(hmm, l)\n end\n return nothing\nend","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Now let's test our procedure with a reasonable guess.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"init_guess = [0.7, 0.3]\ntrans_per_guess = ([0.6 0.4; 0.3 0.7], [0.4 0.6; 0.7 0.3])\ndists_per_guess = ([Normal(-1.1), Normal(-2.1)], [Normal(+1.1), Normal(+2.1)])\nhmm_guess = PeriodicHMM(init_guess, trans_per_guess, dists_per_guess);\nnothing #hide","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Naturally, Baum-Welch also requires knowing control_seq.","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq, control_seq; seq_ends);\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"Did we do well?","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"cat(transition_matrix(hmm_est, 1), transition_matrix(hmm, 1); dims=3)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"cat(transition_matrix(hmm_est, 2), transition_matrix(hmm, 2); dims=3)","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"map(mean, hcat(obs_distributions(hmm_est, 1), obs_distributions(hmm, 1)))","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"map(mean, hcat(obs_distributions(hmm_est, 2), obs_distributions(hmm, 2)))","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"","category":"page"},{"location":"examples/temporal/","page":"Time dependency","title":"Time dependency","text":"This page was generated using Literate.jl.","category":"page"},{"location":"alternatives/#Competitors","page":"Alternatives","title":"Competitors","text":"","category":"section"},{"location":"alternatives/#Julia","page":"Alternatives","title":"Julia","text":"","category":"section"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"We compare features among the following Julia packages:","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"HiddenMarkovModels.jl\nHMMBase.jl\nHMMGradients.jl","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"We discard MarkovModels.jl because its focus is GPU computation. There are also more generic packages for probabilistic programming, which are able to perform MCMC or variational inference (eg. Turing.jl) but we leave those aside.","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":" HiddenMarkovModels.jl HMMBase.jl HMMGradients.jl\nAlgorithms[1] V, FB, BW V, FB, BW FB\nNumber types anything Float64 AbstractFloat\nObservation types anything number or vector anything\nObservation distributions DensityInterface.jl Distributions.jl manual\nMultiple sequences yes no yes\nPriors / structures possible no possible\nControl dependency yes no no\nAutomatic differentiation yes no yes\nLinear algebra speedup yes yes no\nNumerical stability scaling+ scaling+ log","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"info: Very small probabilities\nIn all HMM algorithms, we work with probabilities that may become very small as time progresses. There are two main solutions for this problem: scaling and logarithmic computations. This package implements the Viterbi algorithm in log scale, but the other algorithms use scaling to exploit BLAS operations. As was done in HMMBase.jl, we enhance scaling with a division by the highest observation loglikelihood: instead of working with b_it = mathbbP(Y_t X_t = i), we use b_it max_i b_it. See Formulas for details.","category":"page"},{"location":"alternatives/#Python","page":"Alternatives","title":"Python","text":"","category":"section"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"We compare features among the following Python packages:","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"hmmlearn (based on NumPy)\npomegranate (based on PyTorch)\ndynamax (based on JAX)","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":" hmmlearn pomegranate dynamax\nAlgorithms[1] V, FB, BW, VI FB, BW FB, V, BW, GD\nNumber types NumPy formats PyTorch formats JAX formats\nObservation types number or vector number or vector number or vector\nObservation distributions hmmlearn catalogue pomegranate catalogue dynamax catalogue\nMultiple sequences yes yes yes\nPriors / structures yes no yes\nControl dependency no no yes\nAutomatic differentiation no yes yes\nLinear algebra speedup yes yes yes\nNumerical stability scaling / log log log","category":"page"},{"location":"alternatives/","page":"Alternatives","title":"Alternatives","text":"[1]: V = Viterbi, FB = Forward-Backward, BW = Baum-Welch, VI = Variational Inference, GD = Gradient Descent","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"EditURL = \"../../../examples/autodiff.jl\"","category":"page"},{"location":"examples/autodiff/#Autodiff","page":"Autodiff","title":"Autodiff","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Here we show how to compute gradients of the observation sequence loglikelihood with respect to various inputs.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"using ComponentArrays\nusing DensityInterface\nusing Distributions\nusing Enzyme: Enzyme\nusing ForwardDiff: ForwardDiff\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing LinearAlgebra\nusing Random: Random, AbstractRNG\nusing StableRNGs\nusing StatsAPI\nusing Zygote: Zygote","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"rng = StableRNG(63);\nnothing #hide","category":"page"},{"location":"examples/autodiff/#Diffusion-HMM","page":"Autodiff","title":"Diffusion HMM","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"To play around with automatic differentiation, we define a simple controlled HMM.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"struct DiffusionHMM{V1<:AbstractVector,M2<:AbstractMatrix,V3<:AbstractVector} <: AbstractHMM\n init::V1\n trans::M2\n means::V3\nend","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Both its transition matrix and its vector of observation means result from a convex combination between the corresponding field and a base value (aka diffusion). The coefficient lambda of this convex combination is given as a control.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"HMMs.initialization(hmm::DiffusionHMM) = hmm.init\n\nfunction HMMs.transition_matrix(hmm::DiffusionHMM, λ::Number)\n N = length(hmm)\n return (1 - λ) * hmm.trans + λ * ones(N, N) / N\nend\n\nfunction HMMs.obs_distributions(hmm::DiffusionHMM, λ::Number)\n return [Normal((1 - λ) * hmm.means[i] + λ * 0) for i in 1:length(hmm)]\nend","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"We now construct an instance of this object and draw samples from it.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"init = [0.6, 0.4]\ntrans = [0.7 0.3; 0.3 0.7]\nmeans = [-1.0, 1.0]\nhmm = DiffusionHMM(init, trans, means);\nnothing #hide","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"It is essential that the controls are taken between 0 and 1.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"control_seqs = [rand(rng, 3), rand(rng, 5)];\nobs_seqs = [rand(rng, hmm, control_seqs[k]).obs_seq for k in 1:2];\n\ncontrol_seq = reduce(vcat, control_seqs)\nobs_seq = reduce(vcat, obs_seqs)\nseq_ends = cumsum(length.(obs_seqs));\nnothing #hide","category":"page"},{"location":"examples/autodiff/#What-to-differentiate?","page":"Autodiff","title":"What to differentiate?","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"The key function we are interested in is the loglikelihood of the observation sequence. We can differentiate it with respect to","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"the model itself (hmm), or more precisely its parameters\nthe observation sequence (obs_seq)\nthe control sequence (control_seq).\nbut not with respect to the sequence limits (seq_ends), which are discrete.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"logdensityof(hmm, obs_seq, control_seq; seq_ends)","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"To ensure compatibility with backends that only accept a single input, we wrap all parameters inside a ComponentVector from ComponentArrays.jl, and define a new function to differentiate.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"parameters = ComponentVector(; init, trans, means)\n\nfunction f(parameters::ComponentVector, obs_seq, control_seq; seq_ends)\n new_hmm = DiffusionHMM(parameters.init, parameters.trans, parameters.means)\n return logdensityof(new_hmm, obs_seq, control_seq; seq_ends)\nend;\n\nf(parameters, obs_seq, control_seq; seq_ends)","category":"page"},{"location":"examples/autodiff/#Forward-mode","page":"Autodiff","title":"Forward mode","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Since all of our code is type-generic, it is amenable to forward-mode automatic differentiation with ForwardDiff.jl.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Because ForwardDiff.jl only accepts a single input, we must compute derivatives one at a time.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇parameters_forwarddiff = ForwardDiff.gradient(\n _parameters -> f(_parameters, obs_seq, control_seq; seq_ends), parameters\n)","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇obs_forwarddiff = ForwardDiff.gradient(\n _obs_seq -> f(parameters, _obs_seq, control_seq; seq_ends), obs_seq\n)","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇control_forwarddiff = ForwardDiff.gradient(\n _control_seq -> f(parameters, obs_seq, _control_seq; seq_ends), control_seq\n)","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"These values will serve as ground truth when we compare with reverse mode.","category":"page"},{"location":"examples/autodiff/#Reverse-mode-with-Zygote.jl","page":"Autodiff","title":"Reverse mode with Zygote.jl","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"In the presence of many parameters, reverse mode automatic differentiation of the loglikelihood will be much more efficient. The package includes a handwritten chain rule for logdensityof, which means backends like Zygote.jl can be used out of the box. Using it, we can compute all derivatives at once.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇all_zygote = Zygote.gradient(\n (_a, _b, _c) -> f(_a, _b, _c; seq_ends), parameters, obs_seq, control_seq\n);\n\n∇parameters_zygote, ∇obs_zygote, ∇control_zygote = ∇all_zygote;\nnothing #hide","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"We can check the results to validate our chain rule.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇parameters_zygote ≈ ∇parameters_forwarddiff","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇obs_zygote ≈ ∇obs_forwarddiff","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇control_zygote ≈ ∇control_forwarddiff","category":"page"},{"location":"examples/autodiff/#Reverse-mode-with-Enzyme.jl","page":"Autodiff","title":"Reverse mode with Enzyme.jl","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"The more efficient Enzyme.jl also works natively as long as there are no type instabilities, which is why we avoid the closure and the keyword arguments with f_aux:","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"function f_aux(parameters, obs_seq, control_seq, seq_ends)\n return f(parameters, obs_seq, control_seq; seq_ends)\nend","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Enzyme.jl requires preallocated storage for the gradients, which we happily provide.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇parameters_enzyme = Enzyme.make_zero(parameters)\n∇obs_enzyme = Enzyme.make_zero(obs_seq)\n∇control_enzyme = Enzyme.make_zero(control_seq);\nnothing #hide","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"The syntax is a bit more complex, see the Enzyme.jl docs for details.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Enzyme.autodiff(\n Enzyme.Reverse,\n f_aux,\n Enzyme.Active,\n Enzyme.Duplicated(parameters, ∇parameters_enzyme),\n Enzyme.Duplicated(obs_seq, ∇obs_enzyme),\n Enzyme.Duplicated(control_seq, ∇control_enzyme),\n Enzyme.Const(seq_ends),\n)","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Once again we can check the results.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇parameters_enzyme ≈ ∇parameters_forwarddiff","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇obs_enzyme ≈ ∇obs_forwarddiff","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"∇control_enzyme ≈ ∇control_forwarddiff","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"For increased efficiency, we could provide temporary storage to Enzyme.jl in order to avoid allocations. This requires going one level deeper and leveraging the in-place HiddenMarkovModels.forward! function.","category":"page"},{"location":"examples/autodiff/#Gradient-methods","page":"Autodiff","title":"Gradient methods","text":"","category":"section"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Once we have gradients of the loglikelihood, it is a natural idea to perform gradient descent in order to fit the parameters of a custom HMM. However, there are two caveats we must keep in mind.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"First, computing a gradient essentially requires running the forward-backward algorithm, which means it is expensive. Given the output of forward-backward, if there is a way to perform a more accurate parameter update (like going straight to the maximum likelihood value), it is probably worth it. That is what we show in the other tutorials with the reimplementation of the fit! method.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Second, HMM parameters live in a constrained space, which calls for a projected gradient descent. Most notably, the transition matrix must be stochastic, and the orthogonal projection onto this set (the Birkhoff polytope) is not easy to obtain.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"Still, first order optimization can be relevant when we lack explicit formulas for maximum likelihood.","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"","category":"page"},{"location":"examples/autodiff/","page":"Autodiff","title":"Autodiff","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"EditURL = \"../../../examples/types.jl\"","category":"page"},{"location":"examples/types/#Types","page":"Types","title":"Types","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"Here we explain why playing with different number and array types can be useful in an HMM.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"using Distributions\nusing HiddenMarkovModels\nusing LinearAlgebra\nusing LogarithmicNumbers\nusing Measurements\nusing Random\nusing SparseArrays\nusing StableRNGs","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"rng = StableRNG(63);\nnothing #hide","category":"page"},{"location":"examples/types/#General-principle","page":"Types","title":"General principle","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"The whole package is agnostic with respect to types, it performs the right promotions automatically. Therefore, the types we get in the output only depend only on the types present in the input HMM and the observation sequences.","category":"page"},{"location":"examples/types/#Weird-number-types","page":"Types","title":"Weird number types","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"A wide variety of number types can be plugged into HMM parameters to enhance precision or change inference behavior. Some examples are:","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"BigFloat for arbitrary precision\nLogarithmicNumbers.jl to increase numerical stability\nMeasurements.jl to propagate uncertainties\nForwardDiff.jl for dual numbers in automatic differentiation","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"To give an example, let us first generate some data from a vanilla HMM.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"init = [0.6, 0.4]\ntrans = [0.7 0.3; 0.2 0.8]\ndists = [Normal(-1.0), Normal(1.0)]\nhmm = HMM(init, trans, dists)\nstate_seq, obs_seq = rand(rng, hmm, 100);\nnothing #hide","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"Now we construct a new HMM with some uncertainty on the observation means, using Measurements.jl. Note that uncertainty on the transition parameters would throw an error because the matrix has to be stochastic.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"dists_guess = [Normal(-1.0 ± 0.1), Normal(1.0 ± 0.2)]\nhmm_uncertain = HMM(init, trans, dists_guess)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"Every quantity we compute with this new HMM will have propagated uncertainties around it.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"logdensityof(hmm, obs_seq)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"logdensityof(hmm_uncertain, obs_seq)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"We can check that the interval is centered around the true value.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"Measurements.value(logdensityof(hmm_uncertain, obs_seq)) ≈ logdensityof(hmm, obs_seq)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"warning: Number types in Baum-Welch\nFor now, the Baum-Welch algorithm will generally fail with custom number types due to promotion. The reason is that if some parameters have type T1 and some T2, the forward-backward algorithm will compute quantities of type T = promote_type(T1, T2). These quantities may not be suited to the existing containers inside an HMM, and since updates happen in-place for performance, we cannot create a new one. Suggestions are welcome to fix this issue.","category":"page"},{"location":"examples/types/#Sparse-matrices","page":"Types","title":"Sparse matrices","text":"","category":"section"},{"location":"examples/types/","page":"Types","title":"Types","text":"Sparse matrices are very useful for large models, because it means the memory and computational requirements will scale as the number of possible transitions. In general, this number is much smaller than the square of the number of states.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"We can easily construct an HMM with a sparse transition matrix, where some transitions are structurally forbidden.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"trans = sparse([\n 0.7 0.3 0\n 0 0.7 0.3\n 0.3 0 0.7\n])","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"init = [0.2, 0.6, 0.2]\ndists = [Normal(1.0), Normal(2.0), Normal(3.0)]\nhmm = HMM(init, trans, dists)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"When we simulate it, the transitions outside of the nonzero coefficients simply cannot happen.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"state_seq, obs_seq = rand(rng, hmm, 1000)\nstate_transitions = collect(zip(state_seq[1:(end - 1)], state_seq[2:end]));\nnothing #hide","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"For a possible transition:","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"count(isequal((2, 2)), state_transitions)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"For an impossible transition:","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"count(isequal((2, 1)), state_transitions)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"Now we apply Baum-Welch from a guess with the right sparsity pattern.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"init_guess = [0.3, 0.4, 0.3]\ntrans_guess = sparse([\n 0.6 0.4 0\n 0 0.6 0.4\n 0.4 0 0.6\n])\ndists_guess = [Normal(1.1), Normal(2.1), Normal(3.1)]\nhmm_guess = HMM(init_guess, trans_guess, dists_guess);\nnothing #hide","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm_guess, obs_seq);\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"The estimated model has kept the same sparsity pattern as the guess.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"transition_matrix(hmm_est)","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"Another useful array type is StaticArrays.jl, which reduces allocations for small state spaces.","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"","category":"page"},{"location":"examples/types/","page":"Types","title":"Types","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"EditURL = \"../../../examples/interfaces.jl\"","category":"page"},{"location":"examples/interfaces/#Interfaces","page":"Interfaces","title":"Interfaces","text":"","category":"section"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Here we discuss how to extend the observation distributions or model fitting to satisfy specific needs.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"using DensityInterface\nusing Distributions\nusing HiddenMarkovModels\nimport HiddenMarkovModels as HMMs\nusing LinearAlgebra\nusing Random: Random, AbstractRNG\nusing StableRNGs\nusing StatsAPI","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"rng = StableRNG(63);\nnothing #hide","category":"page"},{"location":"examples/interfaces/#Custom-distributions","page":"Interfaces","title":"Custom distributions","text":"","category":"section"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"In an HMM object, the observation distributions do not need to come from Distributions.jl. They only need to implement three methods:","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Random.rand(rng, dist) for sampling\nDensityInterface.logdensityof(dist, obs) for inference\nStatsAPI.fit!(dist, obs_seq, weight_seq) for learning","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"In addition, the observations can be arbitrary Julia types. So let's construct a distribution that generates stuff.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"struct Stuff{T}\n quantity::T\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The associated distribution will only be a wrapper for a normal distribution on the quantity.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"mutable struct StuffDist{T}\n quantity_mean::T\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Simulation is fairly easy.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function Random.rand(rng::AbstractRNG, dist::StuffDist)\n quantity = dist.quantity_mean + randn(rng)\n return Stuff(quantity)\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"It is important to declare to DensityInterface.jl that the custom distribution has a density, thanks to the following trait. The logdensity itself can be computed up to an additive constant without issue.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"DensityInterface.DensityKind(::StuffDist) = HasDensity()\n\nfunction DensityInterface.logdensityof(dist::StuffDist, obs::Stuff)\n return -abs2(obs.quantity - dist.quantity_mean) / 2\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Finally, the fitting procedure must happen in place, and take a sequence of weighted samples.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function StatsAPI.fit!(\n dist::StuffDist, obs_seq::AbstractVector{<:Stuff}, weight_seq::AbstractVector{<:Real}\n)\n dist.quantity_mean =\n sum(weight_seq[k] * obs_seq[k].quantity for k in eachindex(obs_seq, weight_seq)) /\n sum(weight_seq)\n return nothing\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Let's put it to the test.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"init = [0.6, 0.4]\ntrans = [0.7 0.3; 0.2 0.8]\ndists = [StuffDist(-1.0), StuffDist(+1.0)]\nhmm = HMM(init, trans, dists);\nnothing #hide","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"When we sample an observation sequence, we get a vector of Stuff.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"state_seq, obs_seq = rand(rng, hmm, 100)\neltype(obs_seq)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"And we can pass these observations to all of our inference algorithms.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"viterbi(hmm, obs_seq)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"If we implement fit!, Baum-Welch also works seamlessly.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"init_guess = [0.5, 0.5]\ntrans_guess = [0.6 0.4; 0.3 0.7]\ndists_guess = [StuffDist(-1.1), StuffDist(+1.1)]\nhmm_guess = HMM(init_guess, trans_guess, dists_guess);\nnothing #hide","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"hmm_est, loglikelihood_evolution = baum_welch(hmm, obs_seq)\nfirst(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"obs_distributions(hmm_est)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"transition_matrix(hmm_est)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"If you want more sophisticated examples, check out HiddenMarkovModels.LightDiagNormal and HiddenMarkovModels.LightCategorical, which are designed to be fast and allocation-free.","category":"page"},{"location":"examples/interfaces/#Custom-HMM-structures","page":"Interfaces","title":"Custom HMM structures","text":"","category":"section"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"In some scenarios, the vanilla Baum-Welch algorithm is not exactly what we want. For instance, we might have a prior on the parameters of our model, which we want to apply during the fitting step of the iterative procedure. Then we need to create a new type that satisfies the AbstractHMM interface.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Let's make a simpler version of the built-in HMM, with a prior saying that each transition has already been observed a certain number of times. Such a prior can be very useful to regularize estimation and avoid numerical instabilities. It amounts to drawing every row of the transition matrix from a Dirichlet distribution, where each Dirichlet parameter is one plus the number of times the corresponding transition has been observed.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"struct PriorHMM{T,D} <: AbstractHMM\n init::Vector{T}\n trans::Matrix{T}\n dists::Vector{D}\n trans_prior_count::Int\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The basic requirements for AbstractHMM are the following three functions: initialization, transition_matrix and obs_distributions.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"HiddenMarkovModels.initialization(hmm::PriorHMM) = hmm.init\nHiddenMarkovModels.transition_matrix(hmm::PriorHMM) = hmm.trans\nHiddenMarkovModels.obs_distributions(hmm::PriorHMM) = hmm.dists","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"It is also possible to override logdensityof(hmm) and specify a prior loglikelihood for the model itself. If we forget to implement this, the loglikelihood computed in Baum-Welch will be missing a term, and thus it might decrease.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function DensityInterface.logdensityof(hmm::PriorHMM)\n prior = Dirichlet(fill(hmm.trans_prior_count + 1, length(hmm)))\n return sum(logdensityof(prior, row) for row in eachrow(transition_matrix(hmm)))\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Finally, we must redefine the specific method of fit! that is used during Baum-Welch re-estimation. This function takes as inputs:","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"the hmm itself\na fb_storage of type HiddenMarkovModels.ForwardBackwardStorage containing the results of the forward-backward algorithm.\nthe same inputs as baum_welch for multiple sequences","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"The goal is to modify hmm in-place, updating parameters with their maximum likelihood estimates given current inference results. We will make use of the fields fb_storage.γ and fb_storage.ξ, which contain the state and transition marginals γ[i, t] and ξ[t][i, j] at each time step.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"function StatsAPI.fit!(\n hmm::PriorHMM,\n fb_storage::HiddenMarkovModels.ForwardBackwardStorage,\n obs_seq::AbstractVector;\n seq_ends,\n)\n # initialize to defaults without observations\n hmm.init .= 0\n hmm.trans .= hmm.trans_prior_count # our prior comes into play, otherwise 0\n # iterate over observation sequences\n for k in eachindex(seq_ends)\n # get sequence endpoints\n t1, t2 = seq_limits(seq_ends, k)\n # add estimated number of initializations in each state\n hmm.init .+= fb_storage.γ[:, t1]\n # add estimated number of transitions between each pair of states\n hmm.trans .+= sum(fb_storage.ξ[t1:t2])\n end\n # normalize\n hmm.init ./= sum(hmm.init)\n hmm.trans ./= sum(hmm.trans; dims=2)\n\n for i in 1:length(hmm)\n # weigh each sample by the marginal probability of being in state i\n weight_seq = fb_storage.γ[i, :]\n # fit observation distribution i using those weights\n fit!(hmm.dists[i], obs_seq, weight_seq)\n end\n\n # perform a few checks on the model\n @assert HMMs.valid_hmm(hmm)\n return nothing\nend","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"warning: When distributions don't comply\nNote that some distributions, such as those from Distributions.jl:do not support in-place fitting\nexpect different input formats, e.g. matrices instead of a vector of vectorsThe function HiddenMarkovModels.fit_in_sequence! is a replacement for fit!, designed to handle Distributions.jl without committing type piracy. Check out its source code, and overload it for your other distributions too if they do not support in-place fitting.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"Now let's see that everything works, even with our custom distribution from before.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"trans_prior_count = 10\nprior_hmm_guess = PriorHMM(init_guess, trans_guess, dists_guess, trans_prior_count);\nnothing #hide","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"prior_hmm_est, prior_logl_evolution = baum_welch(prior_hmm_guess, obs_seq)\nfirst(prior_logl_evolution), last(prior_logl_evolution)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"As we can see, the transition matrix for our Bayesian version is slightly more spread out, although this effect would nearly disappear with enough data.","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"cat(transition_matrix(hmm_est), transition_matrix(prior_hmm_est); dims=3)","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"std(vec(transition_matrix(hmm_est))) < std(vec(transition_matrix(hmm)))","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"","category":"page"},{"location":"examples/interfaces/","page":"Interfaces","title":"Interfaces","text":"This page was generated using Literate.jl.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"EditURL = \"../../../examples/basics.jl\"","category":"page"},{"location":"examples/basics/#Basics","page":"Basics","title":"Basics","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Here we show how to use the essential ingredients of the package.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"using Distributions\nusing HiddenMarkovModels\nusing LinearAlgebra\nusing Random\nusing StableRNGs","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"rng = StableRNG(63);\nnothing #hide","category":"page"},{"location":"examples/basics/#Model","page":"Basics","title":"Model","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The package provides a versatile HMM type with three main attributes:","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"a vector init of state initialization probabilities\na matrix trans of state transition probabilities\na vector dists of observation distributions, one for each state","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Any scalar- or vector-valued distribution from Distributions.jl can be used for the last part, as well as Custom distributions.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"init = [0.6, 0.4]\ntrans = [0.7 0.3; 0.2 0.8]\ndists = [MvNormal([-0.5, -0.8], I), MvNormal([0.5, 0.8], I)]\nhmm = HMM(init, trans, dists)","category":"page"},{"location":"examples/basics/#Simulation","page":"Basics","title":"Simulation","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"You can simulate a pair of state and observation sequences with rand by specifying how long you want them to be.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"T = 20\nstate_seq, obs_seq = rand(rng, hmm, T);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The state sequence is a vector of integers.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"state_seq[1:3]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The observation sequence is a vector whose elements have whatever type an observation distribution returns when sampled. Here we chose a multivariate normal distribution, so we get vectors at each time step.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"warning: Difference from HMMBase.jl\nIn the case of multivariate observations, HMMBase.jl works with matrices, whereas HiddenMarkovModels.jl works with vectors of vectors. This allows us to accept more generic observations than just numbers or vectors inside the sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"obs_seq[1:3]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"In practical applications, the state sequence is not known, which is why we need inference algorithms to gather information about it.","category":"page"},{"location":"examples/basics/#Inference","page":"Basics","title":"Inference","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The Viterbi algorithm (viterbi) returns:","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"the most likely state sequence hatX_1T = undersetX_1TmathrmargmaxmathbbP(X_1T vert Y_1T),\nthe joint loglikelihood mathbbP(hatX_1T Y_1T) (in a vector of size 1).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"best_state_seq, best_joint_loglikelihood = viterbi(hmm, obs_seq);\nonly(best_joint_loglikelihood)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"As we can see, the most likely state sequence is very close to the true state sequence, but not necessarily equal.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"(state_seq .== best_state_seq)'","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The forward algorithm (forward) returns:","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"a matrix of filtered state marginals alphai t = mathbbP(X_t = i Y_1t),\nthe loglikelihood mathbbP(Y_1T) of the observation sequence (in a vector of size 1).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"filtered_state_marginals, obs_seq_loglikelihood_f = forward(hmm, obs_seq);\nonly(obs_seq_loglikelihood_f)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"At each time t, these filtered marginals take only the observations up to time t into account. This is particularly useful to infer the marginal distribution of the last state.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"filtered_state_marginals[:, T]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The forward-backward algorithm (forward_backward) returns:","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"a matrix of smoothed state marginals gammai t = mathbbP(X_t = i Y_1T),\nthe loglikelihood mathbbP(Y_1T) of the observation sequence (in a vector of size 1).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"smoothed_state_marginals, obs_seq_loglikelihood_fb = forward_backward(hmm, obs_seq);\nonly(obs_seq_loglikelihood_fb)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"At each time t, it takes all observations up to time T into account. This is particularly useful during learning. Note that forward and forward-backward only coincide at the last time step.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"filtered_state_marginals[:, T - 1] ≈ smoothed_state_marginals[:, T - 1]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"filtered_state_marginals[:, T] ≈ smoothed_state_marginals[:, T]","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Finally, we provide a thin wrapper (logdensityof) around the forward algorithm for observation sequence loglikelihoods mathbbP(Y_1T).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"logdensityof(hmm, obs_seq)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Another function (joint_logdensityof) can compute joint loglikelihoods mathbbP(X_1T Y_1T) which take the states into account.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"joint_logdensityof(hmm, obs_seq, state_seq)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"For instance, we can check that the output of Viterbi is at least as likely as the true state sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"joint_logdensityof(hmm, obs_seq, best_state_seq)","category":"page"},{"location":"examples/basics/#Learning","page":"Basics","title":"Learning","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The Baum-Welch algorithm (baum_welch) is a variant of Expectation-Maximization, designed specifically to estimate HMM parameters. Since it is a local optimization procedure, it requires a starting point that is close enough to the true model.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"init_guess = [0.5, 0.5]\ntrans_guess = [0.6 0.4; 0.3 0.7]\ndists_guess = [MvNormal([-0.4, -0.7], I), MvNormal([0.4, 0.7], I)]\nhmm_guess = HMM(init_guess, trans_guess, dists_guess);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Let's estimate parameters based on a slightly longer sequence.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"_, long_obs_seq = rand(rng, hmm, 200)\nhmm_est, loglikelihood_evolution = baum_welch(hmm_guess, long_obs_seq);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"An essential guarantee of this algorithm is that the loglikelihood of the observation sequence keeps increasing as the model improves.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"first(loglikelihood_evolution), last(loglikelihood_evolution)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"We can check that the transition matrix estimate has improved.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"cat(transition_matrix(hmm_est), transition_matrix(hmm); dims=3)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"And so have the estimates for the observation distributions.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"map(mean, hcat(obs_distributions(hmm_est), obs_distributions(hmm)))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"On the other hand, the initialization is concentrated on one state. This effect can be mitigated by learning from several independent sequences.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"hcat(initialization(hmm_est), initialization(hmm))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Since HMMs are not identifiable up to a permutation of the states, there is no guarantee that state i in the true model will correspond to state i in the estimated model. This is important to keep in mind when testing new models.","category":"page"},{"location":"examples/basics/#Multiple-sequences","page":"Basics","title":"Multiple sequences","text":"","category":"section"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"In many applications, we have access to various observation sequences of different lengths.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"nb_seqs = 1000\nlong_obs_seqs = [last(rand(rng, hmm, rand(rng, 100:200))) for k in 1:nb_seqs];\ntypeof(long_obs_seqs)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Every algorithm in the package accepts multiple sequences in a concatenated form. The user must also specify where each sequence ends in the concatenated vector, by passing seq_ends as a keyword argument. Otherwise, the input will be treated as a unique observation sequence, which is mathematically incorrect.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"long_obs_seq_concat = reduce(vcat, long_obs_seqs)\ntypeof(long_obs_seq_concat)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"seq_ends = cumsum(length.(long_obs_seqs))\nseq_ends'","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The outputs of inference algorithms are then concatenated, and the associated loglikelihoods are split by sequence (in a vector of size length(seq_ends)).","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"best_state_seq_concat, best_joint_loglikelihood_concat = viterbi(\n hmm, long_obs_seq_concat; seq_ends\n);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"length(best_joint_loglikelihood_concat) == length(seq_ends)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"length(best_state_seq_concat) == last(seq_ends)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"The function seq_limits returns the begin and end of a given sequence in the concatenated vector. It can be used to untangle the results.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"start2, stop2 = seq_limits(seq_ends, 2)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"best_state_seq_concat[start2:stop2] == first(viterbi(hmm, long_obs_seqs[2]))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"While inference algorithms can also be run separately on each sequence without changing the results, considering multiple sequences together is nontrivial for Baum-Welch. That is why the package takes care of it automatically.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"hmm_est_concat, _ = baum_welch(hmm_guess, long_obs_seq_concat; seq_ends);\nnothing #hide","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"Our estimate should be a little better.","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"cat(transition_matrix(hmm_est_concat), transition_matrix(hmm); dims=3)","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"map(mean, hcat(obs_distributions(hmm_est_concat), obs_distributions(hmm)))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"hcat(initialization(hmm_est_concat), initialization(hmm))","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"","category":"page"},{"location":"examples/basics/","page":"Basics","title":"Basics","text":"This page was generated using Literate.jl.","category":"page"},{"location":"formulas/#Formulas","page":"Formulas","title":"Formulas","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Suppose we are given observations Y_1 Y_T, with hidden states X_1 X_T. Following (Rabiner, 1989), we use the following notations:","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"let pi in mathbbR^N be the initial state distribution pi_i = mathbbP(X_1 = i)\nlet A_t in mathbbR^N times N be the transition matrix a_ijt = mathbbP(X_t+1=j X_t = i)\nlet B in mathbbR^N times T be the matrix of statewise observation likelihoods b_it = mathbbP(Y_t X_t = i)","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The conditioning on the known controls U_1T is implicit throughout.","category":"page"},{"location":"formulas/#Vanilla-forward-backward","page":"Formulas","title":"Vanilla forward-backward","text":"","category":"section"},{"location":"formulas/#Recursion","page":"Formulas","title":"Recursion","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The forward and backward variables are defined by","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_it = mathbbP(Y_1t X_t=i) \nbeta_it = mathbbP(Y_t+1T X_t=i)\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"They are initialized with","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_i1 = pi_i b_i1 \nbeta_iT = 1\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"and satisfy the dynamic programming equations","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_jt+1 = left(sum_i=1^N alpha_it a_ijtright) b_jt+1 \nbeta_it = sum_j=1^N a_ijt b_jt+1 beta_jt+1\nendalign*","category":"page"},{"location":"formulas/#Likelihood","page":"Formulas","title":"Likelihood","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The likelihood of the whole sequence of observations is given by","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"mathcalL = mathbbP(Y_1T) = sum_i=1^N alpha_iT","category":"page"},{"location":"formulas/#Marginals","page":"Formulas","title":"Marginals","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"We notice that","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nalpha_it beta_it = mathbbP(Y_1T X_t=i) \nalpha_it a_ijt b_jt+1 beta_jt+1 = mathbbP(Y_1T X_t=i X_t+1=j)\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Thus we deduce the one-state and two-state marginals","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\ngamma_it = mathbbP(X_t=i Y_1T) = frac1mathcalL alpha_it beta_it \nxi_ijt = mathbbP(X_t=i X_t+1=j Y_1T) = frac1mathcalL alpha_it a_ijt b_jt+1 beta_jt+1\nendalign*","category":"page"},{"location":"formulas/#Derivatives","page":"Formulas","title":"Derivatives","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"According to (Qin et al., 2000), derivatives of the likelihood can be obtained as follows:","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial pi_i = beta_i1 b_i1 \nfracpartial mathcalLpartial a_ij = sum_t=1^T-1 alpha_it b_jt+1 beta_jt+1 \nfracpartial mathcalLpartial b_j1 = pi_j beta_j1 \nfracpartial mathcalLpartial b_jt = left(sum_i=1^N alpha_it-1 a_ijt-1right) beta_jt \nendalign*","category":"page"},{"location":"formulas/#Scaled-forward-backward","page":"Formulas","title":"Scaled forward-backward","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"In this package, we use a slightly different version of the algorithm, including both the traditional scaling of (Rabiner, 1989) and a normalization of B using m_t = max_i b_it.","category":"page"},{"location":"formulas/#Recursion-2","page":"Formulas","title":"Recursion","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"The variables are initialized with","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nhatalpha_i1 = pi_i fracb_i1m_1 c_1 = frac1sum_i hatalpha_i1 baralpha_i1 = c_1 hatalpha_i1 \nhatbeta_iT = 1 barbeta_1T = c_T hatbeta_1T\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"and satisfy the dynamic programming equations","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nhatalpha_jt+1 = left(sum_i=1^N baralpha_it a_ijtright) fracb_jt+1m_t+1 c_t+1 = frac1sum_j hatalpha_jt+1 baralpha_jt+1 = c_t+1 hatalpha_jt+1 \nhatbeta_it = sum_j=1^N a_ijt fracb_jt+1m_t+1 barbeta_jt+1 barbeta_jt = c_t hatbeta_jt\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"In terms of the original variables, we find","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nbaralpha_it = alpha_it left(prod_s=1^t fracc_sm_sright) \nbarbeta_it = beta_it left(c_t prod_s=t+1^T fracc_sm_sright)\nendalign*","category":"page"},{"location":"formulas/#Likelihood-2","page":"Formulas","title":"Likelihood","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Since we normalized baralpha at each time step,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"1 = sum_i=1^N baralpha_iT = left(sum_i=1^N alpha_iTright) left(prod_s=1^T fracc_sm_sright) ","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"which means","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"mathcalL = sum_i=1^N alpha_iT = prod_s=1^T fracm_sc_s","category":"page"},{"location":"formulas/#Marginals-2","page":"Formulas","title":"Marginals","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"We can now express the marginals using scaled variables:","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\ngamma_it = frac1mathcalL alpha_it beta_it = frac1mathcalL left(baralpha_it prod_s=1^t fracm_sc_sright) left(barbeta_it frac1c_t prod_s=t+1^T fracm_sc_sright) \n= frac1mathcalL fracbaralpha_it barbeta_itc_t left(prod_s=1^T fracm_sc_sright) = fracbaralpha_it barbeta_itc_t\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nxi_ijt = frac1mathcalL alpha_it a_ij b_jt+1 beta_jt+1 \n= frac1mathcalL left(baralpha_it prod_s=1^t fracm_sc_sright) a_ijt b_jt+1 left(barbeta_jt+1 frac1c_t+1 prod_s=t+2^T fracm_sc_sright) \n= frac1mathcalL baralpha_it a_ijt fracb_jt+1m_t+1 barbeta_jt+1 left(prod_s=1^T fracm_sc_sright) \n= baralpha_it a_ijt fracb_jt+1m_t+1 barbeta_jt+1\nendalign*","category":"page"},{"location":"formulas/#Derivatives-2","page":"Formulas","title":"Derivatives","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"And we also need to adapt the derivatives. For the initial distribution,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial pi_i = beta_i1 b_i1 = left(barbeta_i1 frac1c_1 prod_s=2^T fracm_sc_s right) b_i1 \n= left(prod_s=1^T fracm_sc_sright) barbeta_i1 fracb_i1m_1 = mathcalL barbeta_i1 fracb_i1m_1 \nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"For the transition matrix,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial a_ij = sum_t=1^T-1 alpha_it b_jt+1 beta_jt+1 \n= sum_t=1^T-1 left(baralpha_it prod_s=1^t fracm_sc_s right) b_jt+1 left(barbeta_jt+1 frac1c_t+1 prod_s=t+2^T fracm_sc_s right) \n= sum_t=1^T-1 baralpha_it fracb_jt+1m_t+1 barbeta_jt+1 left(prod_s=1^T fracm_sc_s right) \n= mathcalL sum_t=1^T-1 baralpha_it fracb_jt+1m_t+1 barbeta_jt+1 \nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"And for the statewise observation likelihoods,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial b_j1 = pi_j beta_j1 = pi_j barbeta_j1 frac1c_1 prod_s=2^T fracm_sc_s = mathcalL pi_j barbeta_j1 frac1m_1\nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial mathcalLpartial b_jt = left(sum_i=1^N alpha_it-1 a_ijt-1right) beta_jt \n= sum_i=1^N left(baralpha_it-1 prod_s=1^t-1 fracm_sc_sright) a_ijt-1 left(barbeta_jt frac1c_t prod_s=t+1^T fracm_sc_s right) \n= sum_i=1^N baralpha_it-1 a_ijt-1 barbeta_jt frac1m_t left(prod_s=1^T fracm_sc_sright) \n= mathcalL sum_i=1^N baralpha_it-1 a_ijt-1 barbeta_jt frac1m_t \nendalign*","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Finally, we note that","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"fracpartial log mathcalLpartial log b_jt = fracpartial log mathcalLpartial b_jt b_jt","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"To sum up,","category":"page"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"beginalign*\nfracpartial log mathcalLpartial pi_i = fracb_i1m_1 barbeta_i1 \nfracpartial log mathcalLpartial a_ij = sum_t=1^T-1 baralpha_it fracb_jt+1m_t+1 barbeta_jt+1 \nfracpartial log mathcalLpartial log b_j1 = pi_j fracb_j1m_1 barbeta_j1 = fracbaralpha_j1 barbeta_j1c_1 = gamma_j1 \nfracpartial log mathcalLpartial log b_jt = sum_i=1^N baralpha_it-1 a_ijt-1 fracb_jtm_t barbeta_jt = fracbaralpha_jt barbeta_jtc_t = gamma_jt\nendalign*","category":"page"},{"location":"formulas/#Bibliography","page":"Formulas","title":"Bibliography","text":"","category":"section"},{"location":"formulas/","page":"Formulas","title":"Formulas","text":"Qin, F.; Auerbach, A. and Sachs, F. (2000). A Direct Optimization Approach to Hidden Markov Modeling for Single Channel Kinetics. Biophysical Journal 79, 1915–1927. Accessed on Sep 10, 2023.\n\n\n\nRabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 257–286. Accessed on Sep 10, 2023.\n\n\n\n","category":"page"},{"location":"","page":"Home","title":"Home","text":"EditURL = \"https://github.com/gdalle/HiddenMarkovModels.jl/blob/main/README.md\"","category":"page"},{"location":"#HiddenMarkovModels.jl","page":"Home","title":"HiddenMarkovModels.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"(Image: Stable) (Image: Dev) (Image: Build Status) (Image: Coverage)","category":"page"},{"location":"","page":"Home","title":"Home","text":"(Image: Code Style: Blue) (Image: Aqua QA) (Image: JET)","category":"page"},{"location":"","page":"Home","title":"Home","text":"(Image: DOI) (Image: DOI)","category":"page"},{"location":"","page":"Home","title":"Home","text":"A Julia package for simulation, inference and learning of Hidden Markov Models.","category":"page"},{"location":"#Getting-started","page":"Home","title":"Getting started","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package can be installed using Julia's package manager:","category":"page"},{"location":"","page":"Home","title":"Home","text":"pkg> add HiddenMarkovModels","category":"page"},{"location":"","page":"Home","title":"Home","text":"Then, you can create your first model as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Distributions, HiddenMarkovModels\ninit = [0.6, 0.4]\ntrans = [0.7 0.3; 0.2 0.8]\ndists = [Normal(-1.0), Normal(1.0)]\nhmm = HMM(init, trans, dists)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Take a look at the documentation to know what to do next!","category":"page"},{"location":"#Some-background","page":"Home","title":"Some background","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Hidden Markov Models (HMMs) are a widely used modeling framework in signal processing, bioinformatics and plenty of other fields. They explain an observation sequence (Y_t) by assuming the existence of a latent Markovian state sequence (X_t) whose current value determines the distribution of observations. In some scenarios, the state and the observation sequence are also allowed to depend on a known control sequence (U_t). Each of the problems below has an efficient solution algorithm, available here:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Problem Goal Algorithm\nEvaluation Likelihood of the observation sequence Forward\nFiltering Last state marginals Forward\nSmoothing All state marginals Forward-backward\nDecoding Most likely state sequence Viterbi\nLearning Maximum likelihood parameter Baum-Welch","category":"page"},{"location":"","page":"Home","title":"Home","text":"Take a look at this tutorial to know more about the math:","category":"page"},{"location":"","page":"Home","title":"Home","text":"A tutorial on hidden Markov models and selected applications in speech recognition, Rabiner (1989)","category":"page"},{"location":"#Main-features","page":"Home","title":"Main features","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package is generic. Observations can be arbitrary Julia objects, not just scalars or arrays. Number types are not restricted to floating point, which enables automatic differentiation. Time-dependent or controlled HMMs are supported out of the box.","category":"page"},{"location":"","page":"Home","title":"Home","text":"This package is fast. All the inference functions have allocation-free versions, which leverage efficient linear algebra subroutines. We will include extensive benchmarks against Julia and Python competitors.","category":"page"},{"location":"","page":"Home","title":"Home","text":"This package is reliable. It gives the same results as the previous reference package up to numerical accuracy. The test suite incorporates quality checks as well as type stability and allocation analysis.","category":"page"},{"location":"#Contributing","page":"Home","title":"Contributing","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"If you spot a bug or want to ask about a new feature, please open an issue on the GitHub repository. Once the issue receives positive feedback, feel free to try and fix it with a pull request that follows the BlueStyle guidelines.","category":"page"},{"location":"#Acknowledgements","page":"Home","title":"Acknowledgements","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"A big thank you to Maxime Mouchet and Jacob Schreiber, the respective lead devs of alternative packages HMMBase.jl and pomegranate, for their help and advice. Logo by Clément Mantoux based on a portrait of Andrey Markov.","category":"page"}] }