Text models for biological sequences


Here we consider three models of random text generation that can be used to represent real biological sequences:

Multivariate Bernoulli/Markov(0)
Markov(k)
HMM

You can also learn about all three models on the page of Lloyd Allison, Faculty of Information Technology, Monash University, Clayton, Australia.


The multivariate Bernoulli, or Markov(0), model is a generative model that assumes no dependencies between the letters of the generated text. It is the particular case of the Markov(k) model with dependence order k equal to zero.

Formally, given an alphabet Α = {α_i} and probabilities p_{α_i} with Σ_i p_{α_i} = 1, the probability of getting letter α at any position j is equal to p_α and depends neither on the position number nor on the letters at preceding or subsequent positions.
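As an illustration of the definition above, here is a minimal Python sketch of a Markov(0) generator. The letter probabilities in PROBS and the function names are hypothetical, chosen only for this example; they are not taken from any real sequence data.

```python
import math
import random

# Hypothetical letter probabilities for the DNA alphabet (illustrative values only).
PROBS = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

def generate_bernoulli(probs, n, rng):
    """Draw n letters independently, each letter a with probability probs[a]."""
    letters = list(probs)
    weights = [probs[a] for a in letters]
    return "".join(rng.choices(letters, weights=weights, k=n))

def log_probability(probs, text):
    """Log-probability of a text: the sum of log p over its letters,
    because the letters are independent of each other."""
    return sum(math.log(probs[a]) for a in text)

rng = random.Random(0)  # fixed seed so the run is repeatable
text = generate_bernoulli(PROBS, 20, rng)
```

Because the letters are independent, the probability of a whole text is simply the product of its letter probabilities, which is why log_probability reduces to a sum.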


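In a Markov model of order k, the letter X_j at position j depends on the letters X_{j-1}, ..., X_{j-k} at the k preceding positions, and the probability of a text X_0 X_1 ... X_{n-1} of length n is
P(X_0, ..., X_{n-1}) = Π_{j=0}^{n-1} P(X_j | X_{j-1}, ..., X_{j-k}).
If X_{j-l} does not exist for some l, i.e. j-l < 0, then X_{j-l} is dropped from the conditional probability. For example, for k=2 and n=4:
P(X_0, X_1, X_2, X_3) = P(X_0) P(X_1 | X_0) P(X_2 | X_1, X_0) P(X_3 | X_2, X_1).
Thus, to specify a Markov(k) model one needs to set all conditional probabilities
P(X_j = α | X_{j-1} = β_1, ..., X_{j-k} = β_k)
for all α, β_1, ..., β_k ∈ Α.
One also needs to set the parameters of the starting distribution, i.e. the probabilities of the letters that have fewer than k predecessors.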
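The rule of dropping non-existent predecessors from the conditional probability can be sketched in Python as follows. This is only an illustration under stated assumptions: the two-letter alphabet {A, B}, the values in COND, and the convention of keying the table by the preceding letters written in text order are all hypothetical choices made for this example.

```python
import random

K = 2  # order of the model

# Hypothetical conditional probabilities over the alphabet {A, B}.
# Each key is a context: the up-to-K preceding letters, in text order.
# Contexts shorter than K play the role of the starting distribution.
COND = {
    "":   {"A": 0.5, "B": 0.5},
    "A":  {"A": 0.7, "B": 0.3},
    "B":  {"A": 0.4, "B": 0.6},
    "AA": {"A": 0.1, "B": 0.9},
    "AB": {"A": 0.5, "B": 0.5},
    "BA": {"A": 0.8, "B": 0.2},
    "BB": {"A": 0.6, "B": 0.4},
}

def text_probability(cond, k, text):
    """Multiply the conditional probabilities of all letters of the text.
    Near the start, the context is truncated to the letters that exist."""
    prob = 1.0
    for j, letter in enumerate(text):
        context = text[max(0, j - k):j]
        prob *= cond[context][letter]
    return prob

def generate(cond, k, n, rng):
    """Generate n letters, each drawn given the (up to) k preceding letters."""
    text = ""
    for _ in range(n):
        context = text[-k:] if k > 0 else ""
        dist = cond[context]
        letters = list(dist)
        text += rng.choices(letters, weights=[dist[a] for a in letters])[0]
    return text

# Four-letter example with k = 2: P(A) * P(B | A) * P(B | AB) * P(A | BB)
p_abba = text_probability(COND, K, "ABBA")
sample = generate(COND, K, 30, random.Random(1))
```

With this table layout, a Markov(k) model over an alphabet of size m needs m^k full-length contexts plus the shorter contexts used at the start of the text.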

The most widely used model is the Markov model of order 1, also known as a time-homogeneous Markov chain. You can read about Markov chains in Wikipedia.

In our case, when the alphabet Α is finite, the transition probability distribution can be represented by a matrix P, called the transition matrix, whose (i, j)'th element is
p_ij = P(X_{t+1} = α_j | X_t = α_i).
P is a stochastic matrix: its entries are non-negative and each row sums to 1. Further, the k-step transition probabilities can be computed as the k'th power of the transition matrix, P^k.
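To make the k-step statement concrete, here is a small Python sketch that raises a transition matrix to a power by repeated multiplication. The 2x2 matrix P is a hypothetical example, and the helper names mat_mul and mat_pow are introduced only for this illustration.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(p, k):
    """k'th power of a square matrix, starting from the identity matrix."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, p)
    return result

# Hypothetical transition matrix over a two-letter alphabet; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

# Entry (i, j) of P2 is the probability of moving from letter i to letter j
# in exactly two steps.
P2 = mat_pow(P, 2)
```

For this matrix the two-step probabilities work out to 0.86 and 0.14 in the first row and 0.70 and 0.30 in the second, so each row of the power is again a probability distribution.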

The stationary distribution π is a (row) vector which satisfies the equation π = πP. In other words, the stationary distribution π is a normalized left eigenvector of the transition matrix associated with the eigenvalue 1.
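One simple way to approximate a vector satisfying this equation is power iteration: start from any distribution and multiply it by the transition matrix repeatedly. The sketch below takes that route instead of an explicit eigenvector computation, and it assumes that the chain is irreducible and aperiodic, so that the iteration converges. The matrix P and the function name are hypothetical.

```python
def stationary_distribution(p, iterations=1000):
    """Approximate the stationary row vector by repeatedly applying the
    transition matrix to the uniform distribution (power iteration)."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        # One step of pi <- pi P, with pi treated as a row vector.
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical transition matrix over a two-letter alphabet; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

pi = stationary_distribution(P)
```

For this matrix, solving the equation by hand gives the stationary distribution (5/6, 1/6), which the iteration approaches quickly because the second eigenvalue of the matrix is 0.4.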


A text can also be considered as generated according to a Hidden Markov Model (HMM). Rather than duplicate the material here, we advise you to read about HMMs in Wikipedia.



Last modified 15 January 2007