Genetics and the Shannon Index

Author: Jessica Larson

Abstract

Information and uncertainty deal with the process of selecting an object from a larger set of objects. Before an object is selected, we are uncertain as to what will appear. Once an object is selected, our information about the object increases and our uncertainty decreases. Shannon studied this process and derived the following formula for the entropy H, the degree of randomness (uncertainty): H = -Σ_i P_i log2 P_i (bits per symbol), where P_i is the probability of the i-th symbol. We can use this formula to determine the amount of information, that is, the decrease in uncertainty, about a particular system: I = Hmax - H. Additionally, we can use this notion to calculate the conditional entropy (the randomness of Y given X): H(Y|X) = -Σ_i Σ_j p(x_i, y_j) log2 p(y_j|x_i). We can apply this to the field of genetics and use the conditional entropy to determine how linked (or associated) certain genes are. In the bacterium that causes Lyme disease, Borrelia burgdorferi sensu stricto, we determined that one gene in particular (ospC) is good at predicting a set of eleven other genes. That is, the conditional entropy, H(Y|X), is roughly equivalent to the H(Y) for all Y, given ospC.
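
The three quantities in the abstract can be estimated from observed frequencies. The following is a minimal Python sketch, not the author's own procedure: the function names, the toy allele labels, and the choice of Hmax = log2(M) for M possible symbols are assumptions made for illustration, and probabilities are taken to be simple relative frequencies in the sample.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Estimate H = -sum_i P_i log2 P_i (bits per symbol) from a sample."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information(symbols, alphabet_size):
    """Estimate I = Hmax - H, assuming Hmax = log2(alphabet_size)."""
    return log2(alphabet_size) - entropy(symbols)


def conditional_entropy(xs, ys):
    """Estimate H(Y|X) = -sum_ij p(x_i, y_j) log2 p(y_j | x_i).

    xs and ys are paired observations (same length, same order).
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    x_counts = Counter(xs)         # counts of each x alone
    return -sum(
        (c / n) * log2(c / x_counts[x])   # p(x, y) * log2 p(y | x)
        for (x, _y), c in joint.items()
    )


# Hypothetical allele labels for two genes across eight isolates
# (made-up data, only to exercise the functions).
gene_x = ["A", "A", "A", "A", "B", "B", "B", "B"]
gene_y = ["1", "1", "1", "2", "2", "2", "2", "2"]

print("H(Y)   =", entropy(gene_y))
print("I(Y)   =", information(gene_y, alphabet_size=2))
print("H(Y|X) =", conditional_entropy(gene_x, gene_y))
```

Estimating probabilities from counts keeps the sketch short, but with few isolates the estimates can be biased, so real data would likely need more care than this.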
