Content area

Abstract

Most biological systems are composed of interacting constituents which collectively determine the behavior of the system. Graph theory provides an appropriate language for a quantitative description of such systems. In this thesis we study two examples of physical processes on biological networks - evolving amino acid sequences and self-assembling macromolecules. Both these processes are commonly modeled using appropriately defined networks. However, these models are often computationally intractable using standard algorithms and quantitative investigations rely almost exclusively on heuristics that do not provide guarantees on their accuracy. This thesis presents techniques with reliable error guarantees for computations on both these systems. In most non-trivial instances these methods are orders of magnitude more efficient than existing exact methods and are often comparable in run times to popular heuristics. A common theme behind these methods is the recognition that for most instances the 'hardness' of the computational problem is limited to subgraphs of the full network and that by a process of isolating and solving the problem on these hard regions, we can piece together a complete solution to the original problem. Clearly, the efficiency of any such program depends on the relative size of these computationally intractable islands and how precisely we can identify them. This thesis attempts to answer some of these questions for the following computations For the problem of computing the weighted most parsimonious phylogeny on multistate characters, we present a provably optimal method which relies on extending the standard Buneman pruning from binary character to characters with arbitrary finite number of states. We use an integer linear program to solve the maximum parsimony problem on this pruned subgraph on large data sets in time competitive with popular heuristics. Next, we present a novel optimization based Markov chain to sample phylogenies. We prove that at sufficiently high temperatures (a measure for the roughness of the probability landscape) our method is guaranteed to give well mixed samples from the likelihood distribution in steps that grow polynomially with the number of input taxa and the sequence length. Finally, we present two algorithms for simulating general continuous time Markov models, which include models used in simulating self-assembly in biological systems. These algorithms also rely on identifying intractable subgraphs by a method we call Automated Discovery. We then solve the Markov model exactly on these 'stiff' sub-graphs using complete spectral decomposition. We also propose a method with arbitrary user-defined accuracy which does not rely on a full spectral decomposition. We empirically validate the efficiency of these methods for some popular networks used in the literature.

Details

Title
Algorithms for reliable computing in molecular biology
Author
Misra, Navodit
Year
2010
Publisher
ProQuest Dissertations Publishing
ISBN
978-1-124-26641-1
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
758062053
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.