Statistical methods for network-based analysis of genomic data
by Wei, Zhi, Ph.D., University of Pennsylvania, 2008, 142 pages; AAT 3309523

Abstract (Summary)

After many years of biomedical research, biologists have accumulated much knowledge about genes' collaborative activity. This knowledge is summarized in the form of biological pathways. Knowledge about biological pathways turns out to be useful in genome research, and many computational methods have been proposed to utilize this information in the analysis of high-dimensional data. However, many of these methods utilize pathway information in post hoc ways, and pathways are hardly used in the modeling step. This dissertation studies statistical methods for systematically modeling gene dependency encoded in biological pathways. The first part of this dissertation models pathway group structure. Specifically, Chapter 2 develops a pathway-based gradient descent boosting procedure for nonparametric pathway-based regression (NPR) analysis of genomic data. Such NPR models treat genes in the same pathway as a group and consider multiple pathways simultaneously, while allowing complex interactions among genes within a pathway. Our simulation studies and real-world applications indicate that the NPR models can indeed identify relevant genes and pathways. The second part of this dissertation models pathway graphic structure and develops several Markov random fields (MRF) to model the dependency of gene expression patterns in biological pathways. Specifically, Chapter 3 proposes a hidden MRF (hMRF) model for analysis of non-temporal genomic data. In microarray time course (MTC) data, genes exhibit not only pathway graphic dependency but also temporal dependency. Chapter 4 extends the hMRF model further into a hidden spatial-temporal MRF model to simultaneously consider the graphic and temporal dependencies for analysis of MTC data. Alternatively, for short MTC data with a few time points, Chapter 5 treats observed gene expression data as multivariate vectors and assumes genes share the same expression patterns over time. A Bayesian framework with the hMRF model as the prior is employed. Different multivariate empirical Bayesian models are developed to serve as the emission probabilities for longitudinal and cross-sectional designs. Simulation studies and real-world applications, by utilizing pathway graphic structure information, show that these MRF-based models are quite effective in identifying genes and modified subnetworks with higher sensitivity than common procedures and comparable false discovery rates.
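To make the hidden MRF idea in the abstract concrete, here is a minimal illustrative sketch, not the dissertation's actual procedure. It assumes each gene has a binary hidden state (1 = differentially expressed), a Gaussian emission model for a per-gene score, an Ising-style prior over the pathway graph, and iterated conditional modes (ICM) as the fitting method. The function name `icm_hidden_mrf` and the parameters `mu1`, `gamma`, and `beta` are hypothetical choices made for illustration only.

```python
import math


def normal_logpdf(x, mean, sd=1.0):
    """Log density of a normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def icm_hidden_mrf(scores, edges, mu1=2.0, gamma=-1.0, beta=0.8, max_sweeps=50):
    """Label genes 0/1 by ICM under an assumed Ising-style prior on the graph.

    scores: dict gene -> observed score (assumed N(0,1) if state 0, N(mu1,1) if state 1)
    edges:  iterable of (gene_a, gene_b) pathway edges
    gamma:  assumed baseline log-odds of state 1
    beta:   assumed strength of agreement with neighboring genes
    """
    genes = sorted(scores)
    neighbors = {g: set() for g in genes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Initialize each gene from its own score, ignoring the network.
    state = {
        g: int(normal_logpdf(scores[g], mu1) > normal_logpdf(scores[g], 0.0))
        for g in genes
    }

    for _ in range(max_sweeps):
        changed = False
        for g in genes:
            n1 = sum(state[h] for h in neighbors[g])  # neighbors in state 1
            n0 = len(neighbors[g]) - n1               # neighbors in state 0
            # Conditional log-odds of state 1: emission term plus prior term.
            log_odds = (
                normal_logpdf(scores[g], mu1)
                - normal_logpdf(scores[g], 0.0)
                + gamma
                + beta * (n1 - n0)
            )
            new_state = int(log_odds > 0)
            if new_state != state[g]:
                state[g] = new_state
                changed = True
        if not changed:  # a full sweep with no updates: local optimum reached
            break
    return state


# Toy example: B and D have the same weak score, but B sits between two
# strongly scoring genes on the pathway graph while D is isolated.
toy_scores = {"A": 3.0, "B": 0.8, "C": 2.5, "D": 0.8}
toy_edges = [("A", "B"), ("B", "C")]
result = icm_hidden_mrf(toy_scores, toy_edges)
print(result)
```

In this toy run, gene B is labeled 1 because its neighbors support it, while gene D, with an identical score but no neighbors, stays 0. That is the kind of sensitivity gain from graph structure that the abstract refers to, shown here only under the simplified assumptions above.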

Indexing (document details)

Advisor: Li, Hongzhe
School: University of Pennsylvania
School Location: United States -- Pennsylvania
Keyword(s): Boosting, Gene networks, Regression trees, Microarrays, SNP, Markov random fields, Genomic data
Source: DAI-B 69/04, Oct 2008
Source type: Dissertation
Subjects: Biostatistics, Bioinformatics, Computer science
Publication Number: AAT 3309523
ISBN: 9780549574958
Document URL: http://proquest.umi.com/pqdlink?did=1529486651&Fmt=7&clientId=79356&RQT=309&VName=PQD
ProQuest document ID: 1529486651


 
