
Endogamous Pedigree Reconstruction Using Identity By Descent


Kelly Finke and Michael Kourakos, advised by Sara Mathieson

As DNA sequencing decreases in cost and increases in popularity, more
and more genetic information for individuals and families becomes
accessible. Even so, it is often impossible to obtain genetic
information for an entire family tree, especially if the tree extends
many generations back in time. Since it is often useful for biologists
to examine large, complete pedigrees -- such as when studying disease
inheritance -- a number of algorithms have been created to fill these
gaps by reconstructing ancestral genotypes from estimates of which
ancestors the genotypes of known, sequenced descendants came from. But many of these
algorithms fail when working with complicated pedigrees, such as
pedigrees that involve inbreeding, cross-generational marriages, or
remarriage. When working with endogamous pedigrees -- pedigrees marked
by a close sharing of DNA due to the practice of marrying within the
same ethnic, cultural, social, religious or tribal group -- all of these
complications are common. Our endogamous pedigree reconstruction
algorithm, therefore, is designed to take these factors into account,
allowing users to work both with typical pedigrees and more complex,
endogamous pedigrees. This algorithm helps biologists gain insights
from these genetically unique populations, which can inform many
studies of rare traits and diseases that are disproportionately
expressed in endogamous populations. Our motivation for this project
comes from our collaboration with UPenn biologists Maja Búcan and Rachel
Kember, who are studying inheritance patterns of bipolar disorder within
an Old Order Amish pedigree. 

Our method centers around segments of DNA that are identical by descent
(IBD segments): identical areas in DNA sequences shared by multiple
individuals due to descent from a common ancestor. When we can find a
common ancestor of all individuals known to share an IBD segment, we can
estimate the likelihood that this ancestor is the source of the
segment. If it is, the segment must be present in the ancestor and in
all of the descendants on the path from the ancestor to the
individuals known to carry the IBD segment. Due to the complications
in our pedigree, our algorithm often has to keep track of many existing
common ancestors and various potential paths down which the IBD segment
could have been passed. Using information based on the specifics of the
family structure and the number of meioses separating each possible
ancestor from all the descendants on various possible paths, we can
probabilistically determine the most likely ancestor “source” and
path of descent. We can tune these probabilities by detecting conflicts
between overlapping IBD segments, and eventually converge on the most
probable sequences for all ancestors in the complete pedigree.
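
To make the source-ranking idea concrete, here is a minimal Python sketch.
It is not the project's implementation. The toy pedigree, the function
names, and the scoring rule are all assumptions made for illustration:
each common ancestor of the IBD carriers is weighted by 0.5 raised to the
total number of meioses separating it from the carriers, and the weights
are then normalized. For simplicity the sketch follows only the shortest
path to each ancestor, whereas an endogamous pedigree can offer several
paths, and it omits the conflict-based tuning step.

```python
from collections import deque

# Hypothetical toy pedigree: child -> (father, mother).
# Founders (A, B, X, Y) do not appear as keys.
PEDIGREE = {
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "X"),
    "F": ("D", "Y"),
    "G": ("E", "F"),  # E and F are first cousins, so G closes a loop
}


def meioses_to_ancestors(person, pedigree):
    """Map each ancestor of `person` to the fewest meioses separating them."""
    dist = {}
    queue = deque([(person, 0)])
    while queue:
        current, d = queue.popleft()
        for parent in pedigree.get(current, ()):
            if parent not in dist:  # breadth-first: first visit is shortest
                dist[parent] = d + 1
                queue.append((parent, d + 1))
    return dist


def rank_sources(carriers, pedigree):
    """Rank common ancestors of `carriers` as candidate IBD sources.

    Assumed scoring rule: weight = 0.5 ** (total meioses to all carriers),
    normalized so the weights over all common ancestors sum to 1.
    """
    per_carrier = [meioses_to_ancestors(c, pedigree) for c in carriers]
    common = set(per_carrier[0]).intersection(*per_carrier[1:])
    weights = {a: 0.5 ** sum(d[a] for d in per_carrier) for a in common}
    total = sum(weights.values())
    ranked = ((a, w / total) for a, w in weights.items())
    # Highest probability first; ties broken alphabetically for stability.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # E and F share a segment: their only common ancestors are A and B.
    print(rank_sources(["E", "F"], PEDIGREE))
```

In this toy case the two grandparents A and B are equally distant from both
carriers, so the sketch cannot separate them; distinguishing such ties is
where additional evidence, such as conflicts between overlapping IBD
segments, would come in.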