Models of representation
of Transcription Factor Binding Sites (TFBS) motifs

There is a wide range of ways to represent TFBS motifs. Some of them are listed here with references to their descriptions; the most widely used ones are described below.

At first I wanted to write about all of these models myself, but a quick web search showed that it would be a waste of time, since everything has already been written up in Wikipedia. So I just give references:

       About motifs in general.
       About binding sites and transcription machinery.
       About Position Frequency Matrices (PFMs).
       About Position Weight Matrices (PWMs, PSSMs, PSWMs).

Brief descriptions are nevertheless given below.


All sites potentially bound by factors could be simply enumerated. The information about binding sites can be determined from SELEX experiments.

This is an example of a list of experimentally confirmed binding sites for the transcription factor (TF) bicoid:

Bicoid motif:
... Use the reference to see all sites!

The list can be used as it is, or converted to a PFM or PWM, which can then generate a wider list of words (see List to PWM).

Position Frequency Matrix (PFM)

A position frequency matrix (PFM) records the position-dependent frequency of each residue or nucleotide. PFMs can be determined experimentally from SELEX experiments or discovered computationally by tools such as MEME, which uses expectation maximization.
An example of a PFM from the TRANSFAC database for the transcription factor AP-1:

This was taken from Wikipedia.

PWM / PSSM / PSWM & threshold

You can read about it in Wikipedia. The text below was copied from there.

A position weight matrix (PWM), also called position-specific weight matrix (PSWM) or position-specific scoring matrix (PSSM), is a commonly used representation of motifs (patterns) in biological sequences.

A PWM is a matrix of score values that gives a weighted match to any given substring of fixed length. It has one row for each symbol of the alphabet, and one column for each position in the pattern. The PWM score is defined as \sum_{j=1}^{N} m_{i(j),j}, where N is the length of the pattern, j is the position in the substring, i(j) is the symbol at position j in the substring, and m_{i,j} is the score in row i, column j of the matrix. In other words, a PWM score is the sum of position-specific scores for each symbol in the substring.

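This is an example of a PWM for bicoid. Here the matrix is written transposed, with one row per position and one column per nucleotide (the column order A, C, G, T is an inference: it reproduces the known bicoid core TAATCC in rows 2-7):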

-0.398 0.422 -0.329 0.128
-0.398 -2.054 -2.054 0.992
1.135 -2.054 -1.400 -2.054
1.164 -2.054 -2.054 -2.054
-2.054 -1.018 -0.728 1.025
-2.054 1.408 -2.054 -2.054
-1.520 1.185 -1.008 -0.702
-0.713 0.422 0.356 -0.260
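
To make the score definition concrete, here is a minimal Python sketch that sums the position-specific scores of a word against the bicoid matrix above. The column order A, C, G, T and the names BICOID_PWM, COLUMN and pwm_score are assumptions made for this illustration, not something taken from the original data source.

```python
# Bicoid PWM from the text: one row per position, one column per nucleotide.
# ASSUMPTION: the columns are ordered A, C, G, T (inferred, not documented).
BICOID_PWM = [
    [-0.398,  0.422, -0.329,  0.128],
    [-0.398, -2.054, -2.054,  0.992],
    [ 1.135, -2.054, -1.400, -2.054],
    [ 1.164, -2.054, -2.054, -2.054],
    [-2.054, -1.018, -0.728,  1.025],
    [-2.054,  1.408, -2.054, -2.054],
    [-1.520,  1.185, -1.008, -0.702],
    [-0.713,  0.422,  0.356, -0.260],
]
COLUMN = {"A": 0, "C": 1, "G": 2, "T": 3}


def pwm_score(word, pwm=BICOID_PWM):
    """Sum the position-specific scores of each symbol in the word."""
    if len(word) != len(pwm):
        raise ValueError("word length must equal the number of PWM positions")
    return sum(pwm[j][COLUMN[base]] for j, base in enumerate(word.upper()))


# Score the word built from the best-scoring nucleotide at every position.
print(round(pwm_score("CTAATCCC"), 3))
```

Under the assumed column order, CTAATCCC is the highest-scoring word for this matrix, so its score is an upper bound for any threshold.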

A position weight matrix (PWM) contains log odds weights for computing a match score. A cutoff is needed to specify whether an input sequence matches the motif or not. PWMs are calculated from PFMs.

Given a PWM and a threshold value one can get a set of words (substrings) scoring above the threshold.
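
One way to obtain such a set of words is plain enumeration, which is feasible for short motifs (4^8 = 65536 candidates for an 8-position matrix). The sketch below assumes the bicoid matrix has its columns in the order A, C, G, T and treats a score equal to the threshold as a match; the function name words_above and the threshold 7.5 are chosen only for illustration.

```python
from itertools import product

ALPHABET = "ACGT"  # ASSUMPTION: column order of the matrix (inferred)
BICOID_PWM = [
    [-0.398,  0.422, -0.329,  0.128],
    [-0.398, -2.054, -2.054,  0.992],
    [ 1.135, -2.054, -1.400, -2.054],
    [ 1.164, -2.054, -2.054, -2.054],
    [-2.054, -1.018, -0.728,  1.025],
    [-2.054,  1.408, -2.054, -2.054],
    [-1.520,  1.185, -1.008, -0.702],
    [-0.713,  0.422,  0.356, -0.260],
]


def words_above(pwm, threshold):
    """Return (word, score) pairs with score >= threshold, best first."""
    hits = []
    # Try every combination of one column index per matrix position.
    for columns in product(range(len(ALPHABET)), repeat=len(pwm)):
        score = sum(pwm[j][i] for j, i in enumerate(columns))
        if score >= threshold:
            hits.append(("".join(ALPHABET[i] for i in columns), score))
    return sorted(hits, key=lambda hit: -hit[1])


# Illustrative threshold, chosen arbitrarily for this example.
for word, score in words_above(BICOID_PWM, 7.5):
    print(word, round(score, 3))
```

Lowering the threshold widens the list quickly, so for long motifs a branch-and-bound search would be preferable to full enumeration.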

List to PWM

A PWM can be obtained from an aligned list of words. First one creates the Position Frequency Matrix (PFM) by simply counting the occurrences of each nucleotide at each position of the alignment. Then one converts the PFM to a PWM.

As the threshold value one can take the minimum of the scores that the constructed PWM assigns to the words in the list.
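
The steps above can be sketched in Python as follows. The conversion formula is not given in this text, so the sketch assumes a common choice: natural-log odds against a uniform background of 0.25, with a pseudocount of 1 per nucleotide to avoid log(0). The example sites are hypothetical, not real bicoid data, and all function names are illustrative.

```python
import math

ALPHABET = "ACGT"


def build_pfm(words):
    """Count each nucleotide at each position of the aligned words."""
    length = len(words[0])
    if any(len(word) != length for word in words):
        raise ValueError("all words must be aligned to the same length")
    pfm = [{base: 0 for base in ALPHABET} for _ in range(length)]
    for word in words:
        for j, base in enumerate(word):
            pfm[j][base] += 1
    return pfm


def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """ASSUMPTION: log-odds with pseudocounts and a uniform background."""
    pwm = []
    for counts in pfm:
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        pwm.append({
            base: math.log((counts[base] + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pwm


def score(pwm, word):
    """Sum of position-specific scores, as in the PWM score definition."""
    return sum(pwm[j][base] for j, base in enumerate(word))


sites = ["TAATCC", "TAATCC", "TAATCT", "GAATCC"]  # hypothetical aligned sites
pfm = build_pfm(sites)
pwm = pfm_to_pwm(pfm)
# Threshold: the lowest score among the words the PWM was built from.
threshold = min(score(pwm, word) for word in sites)
print(pfm[0], round(threshold, 3))
```

With this threshold every word of the original list matches by construction, and other words that score at least as well are admitted too, which is how the PWM generates a wider list.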


A motif can also be represented by an IUPAC consensus. An example of a consensus from the TRANSFAC database for the transcription factor AP-1 is shown above.

The nomenclature of the International Union of Pure and Applied Chemistry (IUPAC) is as follows:

	   A = adenine
	   C = cytosine
	   G = guanine
	   T = thymine
	   U = uracil
	   R = G A (purine)
	   Y = T C (pyrimidine)
	   K = G T (keto)
	   M = A C (amino)
	   S = G C (strong bonds)
	   W = A T (weak bonds)
	   B = G T C (all but A)
	   D = G A T (all but C)
	   H = A C T (all but G)
	   V = G C A (all but T)
	   N = A G C T (any)
Thus the IUPAC consensus of a motif can be likened to a list of words.
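
The correspondence between a consensus and a list of words can be made explicit by expanding every ambiguity code into the nucleotides it stands for. The following is a small sketch using the table above; the consensus TAATCY is a made-up example, and U (uracil) is left out on the assumption that the sites are DNA.

```python
from itertools import product

# IUPAC nucleotide codes from the table above (DNA only, U omitted).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}


def expand_consensus(consensus):
    """Return, sorted, every plain word that the IUPAC consensus matches."""
    choices = [IUPAC[symbol] for symbol in consensus.upper()]
    return sorted("".join(word) for word in product(*choices))


print(expand_consensus("TAATCY"))  # hypothetical consensus, for illustration
```

The number of words is the product of the sizes of the code sets, so a consensus with several N positions expands to a very long list.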

Word & number of mismatches

A motif can be described as a consensus word together with a permitted number of mismatches. For example, for bicoid it could be:

CCTAATCCC and 3 mismatches.

This representation is, however, used rather infrequently.
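
Checking a word against a consensus with mismatches amounts to computing a Hamming distance. The sketch below uses the bicoid example above (CCTAATCCC, 3 mismatches) as defaults; the function names and the scanned sequence are invented for illustration.

```python
def mismatches(word, consensus):
    """Hamming distance: number of positions where the two words differ."""
    if len(word) != len(consensus):
        raise ValueError("word and consensus must have the same length")
    return sum(a != b for a, b in zip(word, consensus))


def matches(word, consensus="CCTAATCCC", max_mismatches=3):
    """True if the word differs from the consensus in at most max_mismatches positions."""
    return mismatches(word, consensus) <= max_mismatches


def find_sites(sequence, consensus="CCTAATCCC", max_mismatches=3):
    """Slide a window over the sequence and keep the matching substrings."""
    k = len(consensus)
    return [
        (start, sequence[start:start + k])
        for start in range(len(sequence) - k + 1)
        if matches(sequence[start:start + k], consensus, max_mismatches)
    ]


# Hypothetical sequence containing one exact copy of the consensus.
print(find_sites("AACCTAATCCCAA", max_mismatches=0))
```

Unlike a PWM, this rule treats all positions and all substitutions as equally important, which may be one reason it is rarely used.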

Last modified 15 January 2007