Genetics and the Shannon Index
Abstract
Information and uncertainty both describe the process of selecting an object
from a larger set of objects. Before an object is selected, we are
uncertain as to what will appear. Once an object is selected, our information
about the object increases, and our uncertainty decreases.
Shannon studied this process and derived the following formula for $H$, the
entropy, or degree of randomness (uncertainty):
$$H = -\sum_i P_i \log_2 P_i \quad \text{(bits per symbol)}.$$
We can use this formula to determine the amount of information (the
decrease in uncertainty) about a particular system:
$$I = H_{\max} - H.$$
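The two formulas above can be illustrated with a minimal Python sketch. It assumes that $H_{\max}$ is the entropy of a uniform distribution over the same $n$ symbols, that is, $\log_2 n$; the function names and the example distribution are hypothetical and chosen only for illustration.

```python
import math

def shannon_entropy(probabilities):
    """Return H = -sum(p * log2(p)) in bits per symbol.

    Terms with p == 0 are skipped, following the convention 0 * log2(0) = 0.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def information(probabilities):
    """Return I = Hmax - H, assuming Hmax = log2(n) for n possible symbols."""
    h_max = math.log2(len(probabilities))
    return h_max - shannon_entropy(probabilities)

# Hypothetical distribution over four symbols (for example, A, C, G, T).
skewed = [0.5, 0.25, 0.125, 0.125]
print(shannon_entropy(skewed))  # 1.75
print(information(skewed))      # 0.25
```

With four equally likely symbols the entropy would reach its maximum of 2 bits and the information would be 0; the skewed distribution is less uncertain, so it carries 0.25 bits of information per symbol.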
Additionally, we can use this notion to calculate the conditional entropy
(the randomness of $Y$ given $X$):
$$H(Y|X) = -\sum_i \sum_j p(x_i, y_j) \log_2 p(y_j|x_i).$$
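As a rough sketch of how the conditional entropy could be computed, the Python function below takes a joint distribution $p(x_i, y_j)$ and obtains $p(y_j|x_i)$ as $p(x_i, y_j) / p(x_i)$. The function name, the allele labels, and the probabilities are hypothetical; they are not taken from the Borrelia data.

```python
import math
from collections import defaultdict

def conditional_entropy(joint):
    """Return H(Y|X) = -sum_ij p(x_i, y_j) * log2 p(y_j | x_i) in bits.

    `joint` maps (x, y) pairs to joint probabilities p(x, y).
    """
    # Marginal p(x_i), needed because p(y_j | x_i) = p(x_i, y_j) / p(x_i).
    p_x = defaultdict(float)
    for (x, _y), p in joint.items():
        p_x[x] += p
    # Pairs with zero probability contribute nothing to the sum.
    return -sum(
        p * math.log2(p / p_x[x])
        for (x, _y), p in joint.items()
        if p > 0
    )

# Hypothetical joint distribution of alleles at two loci X and Y.
joint = {
    ("x1", "y1"): 0.5,
    ("x2", "y1"): 0.25,
    ("x2", "y2"): 0.25,
}
print(conditional_entropy(joint))  # 0.5
```

In this hypothetical table, $Y$ is fully determined when $X = x_1$ and evenly split when $X = x_2$, so averaging over $X$ leaves 0.5 bits of uncertainty about $Y$.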
We can apply this to the field of genetics and use the conditional entropy to
determine how linked (or associated) certain genes are. For the bacterium that
causes Lyme disease, Borrelia burgdorferi sensu stricto, we determined
that one gene in particular (ospC) is a good predictor of a set of eleven other
genes. That is, the conditional entropy $H(Y|X)$ is roughly equivalent to
$H(Y)$ for all $Y$, given ospC.