There is a wide range of ways to represent TFBS motifs. Some of them are listed here with
references to their descriptions; the most widely used are described below.
At first I wanted to write about all of these models, but after a quick web search I
realized that it would be a waste of time, since everything has already been described elsewhere.
So I just give references:
About motifs in general.
About binding sites and
About Position Frequency Matrices (PFMs).
About Position Weight Matrices (PWMs, PSSMs, PSWMs).
Some brief descriptions are nevertheless given below.
All sites potentially bound by a factor can simply be enumerated.
about binding sites can be determined from
an example of a list of experimentally confirmed binding sites for the transcription
factor (TF) bicoid:
The list can be used as is, or converted to a PFM or
PWM, which can then be used to generate a wider
list of words (see List to PWM).
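As a rough illustration of the conversion mentioned above: a list of aligned sites of equal length can be turned into a PFM by counting, at each position, how often each nucleotide occurs. This is only a minimal sketch. The function name `sites_to_pfm` and the example sites are made up for illustration; they are not real bicoid sites.

```python
from collections import Counter

ALPHABET = "ACGT"

def sites_to_pfm(sites):
    """Count nucleotides position by position.

    Returns a list with one dict per position, mapping each
    nucleotide to the number of sites that carry it there.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must be aligned and of equal length")
    pfm = []
    for j in range(length):
        counts = Counter(site[j] for site in sites)
        pfm.append({base: counts.get(base, 0) for base in ALPHABET})
    return pfm

# Hypothetical aligned sites, invented for this example only.
example_sites = ["TAATCC", "TAATCC", "TAAGCT", "GAATCC"]
pfm = sites_to_pfm(example_sites)

for position, column in enumerate(pfm, start=1):
    print(position, column)
```

Each column of the result sums to the number of sites, which is a quick sanity check on the input alignment.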
A position frequency matrix (PFM) records the position-dependent frequency of
each residue or nucleotide. PFMs can be determined experimentally
or discovered computationally
by tools such as MEME.
of a PFM from the
database for the transcription factor
This was taken from
You can read about it in Wikipedia.
The text below was copied from there.
A position weight matrix (PWM), also called position-specific weight
matrix (PSWM) or position-specific scoring matrix (PSSM), is a commonly
used representation of motifs
(patterns) in biological sequences.
A PWM is a matrix of score values that gives a weighted match to any given
substring of fixed length. It has one row for each symbol of the alphabet and one
column for each position in the pattern. The PWM score is defined as
$\sum_{j=1}^{N} m_{i(j),j}$,
where $N$ is the length of the substring, $j$ is a position in the substring,
$i(j)$ is the symbol at position $j$ in the substring, and
$m_{i,j}$ is the score in row $i$, column $j$ of the matrix. In other words, a
PWM score is the sum
of position-specific scores for each symbol in the substring.
This is an example of a PWM for bicoid. It is shown transposed relative to the
definition above: each row is a position and each of the four columns is a symbol.
| -0.398 ||  0.422 || -0.329 ||  0.128
| -0.398 || -2.054 || -2.054 ||  0.992
|  1.135 || -2.054 || -1.400 || -2.054
|  1.164 || -2.054 || -2.054 || -2.054
| -2.054 || -1.018 || -0.728 ||  1.025
| -2.054 ||  1.408 || -2.054 || -2.054
| -1.520 ||  1.185 || -1.008 || -0.702
| -0.713 ||  0.422 ||  0.356 || -0.260
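To make the scoring rule concrete, here is a small sketch that sums the position-specific scores of a word using the bicoid matrix above. The column headers are not given in the text, so the order A, C, G, T is an assumption on my part; the names `pwm_score` and `best_word` are also just illustrative.

```python
# Bicoid PWM copied from the table above: one row per position.
BICOID_PWM = [
    [-0.398,  0.422, -0.329,  0.128],
    [-0.398, -2.054, -2.054,  0.992],
    [ 1.135, -2.054, -1.400, -2.054],
    [ 1.164, -2.054, -2.054, -2.054],
    [-2.054, -1.018, -0.728,  1.025],
    [-2.054,  1.408, -2.054, -2.054],
    [-1.520,  1.185, -1.008, -0.702],
    [-0.713,  0.422,  0.356, -0.260],
]

# ASSUMPTION: the four columns correspond to A, C, G, T in this order.
SYMBOLS = "ACGT"

def pwm_score(word, pwm=BICOID_PWM):
    """Sum of the position-specific scores of each symbol in the word."""
    if len(word) != len(pwm):
        raise ValueError("word length must equal the number of PWM positions")
    return sum(row[SYMBOLS.index(sym)] for row, sym in zip(pwm, word))

def best_word(pwm=BICOID_PWM):
    """Word built from the highest-scoring symbol at every position."""
    return "".join(SYMBOLS[row.index(max(row))] for row in pwm)

top = best_word()
print(top, round(pwm_score(top), 3))
```

Under the assumed column order the highest-scoring word is CTAATCCC, which is contained in the bicoid consensus word CCTAATCCC given at the end of this page.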
A position weight matrix (PWM) contains log-odds weights for computing a
match score. A cutoff is needed to specify whether an input
sequence matches the motif
or not. PWMs are calculated from PFMs.
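The text does not say how the weights are derived, so the following is only a sketch of one common convention: each count is turned into a frequency (with a pseudocount to avoid taking the logarithm of zero) and divided by a background frequency before taking the base-2 logarithm. The uniform background, the pseudocount of 1 and the log base are all assumptions, and the bicoid matrix above was not necessarily computed this way.

```python
import math

ALPHABET = "ACGT"

def pfm_to_pwm(pfm, background=None, pseudocount=1.0):
    """Convert per-position counts into log-odds weights.

    pfm: list of dicts, one per position, mapping nucleotide -> count.
    background: nucleotide -> expected frequency (ASSUMED uniform if None).
    pseudocount: total pseudocount spread according to the background.
    """
    if background is None:
        background = {base: 0.25 for base in ALPHABET}
    pwm = []
    for column in pfm:
        total = sum(column[base] for base in ALPHABET)
        weights = {}
        for base in ALPHABET:
            freq = (column[base] + pseudocount * background[base]) / (total + pseudocount)
            weights[base] = math.log2(freq / background[base])
        pwm.append(weights)
    return pwm

# Hypothetical two-position PFM built from four sites.
example_pfm = [
    {"A": 0, "C": 0, "G": 1, "T": 3},
    {"A": 4, "C": 0, "G": 0, "T": 0},
]
pwm = pfm_to_pwm(example_pfm)
for position, weights in enumerate(pwm, start=1):
    print(position, {base: round(w, 3) for base, w in weights.items()})
```

Nucleotides seen more often than the background expects get positive weights, and those seen less often get negative ones.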
Given a PWM and a threshold value one can get a set of words (substrings) scoring
above the threshold.
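For short motifs the set of words can be obtained by brute force: enumerate every word of the right length, score it, and keep those above the threshold. The sketch below does that with a made-up three-position PWM; the matrix, the threshold and the function name `words_above_threshold` are hypothetical. For longer motifs the number of words grows as 4 to the power of the motif length, so a smarter search would be needed.

```python
from itertools import product

def words_above_threshold(pwm, threshold, alphabet="ACGT"):
    """Return (word, score) pairs scoring strictly above the threshold.

    pwm: list of rows, one per position, with one score per symbol
    in the order given by `alphabet`.
    """
    hits = []
    for letters in product(alphabet, repeat=len(pwm)):
        score = sum(row[alphabet.index(sym)] for row, sym in zip(pwm, letters))
        if score > threshold:
            hits.append(("".join(letters), score))
    # Best-scoring words first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits

# Hypothetical toy PWM (columns A, C, G, T), invented for this example.
toy_pwm = [
    [ 1.0, -1.0, -1.0,  0.5],
    [-2.0,  1.5, -2.0, -2.0],
    [-1.0, -1.0,  1.0,  0.8],
]
hits = words_above_threshold(toy_pwm, threshold=2.5)
for word, score in hits:
    print(word, round(score, 2))
```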
A motif can also be represented by a
consensus sequence. An example of a consensus sequence
from the TRANSFAC
database for the
transcription factor AP-1 is shown above.
The nomenclature of the International Union of Pure and Applied
Chemistry (IUPAC) is as follows:
A = adenine
C = cytosine
G = guanine
T = thymine
U = uracil
R = G A (purine)
Y = T C (pyrimidine)
K = G T (keto)
M = A C (amino)
S = G C (strong bonds)
W = A T (weak bonds)
B = G T C (all but A)
D = G A T (all but C)
H = A C T (all but G)
V = G C A (all but T)
N = A G C T (any)
Thus the IUPAC consensus of a motif can be likened to a list of words.
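The correspondence between a consensus and a list of words can be sketched as follows: each IUPAC letter stands for a set of nucleotides, and the words are all combinations of one nucleotide per position. The consensus `TAAYCN` used here is made up for illustration, and U is left out because the words are DNA.

```python
from itertools import product

# IUPAC nucleotide codes from the list above (DNA only, U omitted).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def expand_consensus(consensus):
    """List every plain DNA word matched by an IUPAC consensus."""
    choices = [IUPAC[symbol] for symbol in consensus.upper()]
    return ["".join(word) for word in product(*choices)]

# Hypothetical consensus, invented for this example.
words = expand_consensus("TAAYCN")
print(len(words), words)
```

The number of words is the product of the set sizes at each position, so a consensus with many N positions expands to a very long list.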
A motif can also be described as a consensus word with a given number of allowed mismatches.
For example, for bicoid it could be:
CCTAATCCC and 3 mismatches.
However, this representation is rarely used.
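Searching with such a representation amounts to sliding a window over the sequence and counting the positions where the window differs from the consensus word (the Hamming distance). Below is a minimal sketch using the bicoid word and mismatch limit from the example above; the scanned sequence and the function names are made up for illustration.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))

def find_matches(sequence, consensus, max_mismatches):
    """Return (start, window, mismatches) for every window within the limit."""
    k = len(consensus)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        mismatches = hamming_distance(window, consensus)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits

# Hypothetical sequence, invented for this example.
sequence = "GGCCTAATCCCTTACGTAAGCCGA"
for start, window, mismatches in find_matches(sequence, "CCTAATCCC", 3):
    print(start, window, mismatches)
```

Only one strand is scanned here; a real search would normally also check the reverse complement.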