The Gupta group uses statistical and computational approaches to find conserved stochastic patterns, or motifs, in genome sequences. They are particularly interested in using these approaches to discover gene regulatory modules and interaction networks involved in specific biological processes. A number of theoretical issues arise from these studies, such as: What are the limits beyond which a pattern cannot be distinguished from the background? How are local and global sequence composition related to the degree of difficulty in finding patterns?

One observation from the genomes of higher organisms such as the mouse or human is that true motifs are not always well conserved, but often occur in clusters, or regulatory modules, close to the transcription start site. In such a scenario, standard motif-searching methods are not effective and often produce a high number of false predictions or missed sites. Dr. Gupta and colleagues have developed a Monte Carlo approach to find the optimal set of pattern classes by introducing a model that assumes an underlying Markov structure for pattern-type occurrences and inter-site distances. Within a Bayesian framework, appropriate choices of priors allow the formulation of a recursive algorithm that evaluates the resulting likelihood function exactly, draws posterior samples, and yields improved parameter estimates.

This framework for the module model has the potential to elucidate gene regulation networks using only genomic sequence information. For example, if the binding sites for a large group of human transcription factors were defined experimentally, one could search for the best subset of motifs in the non-coding regions of their target sequences. Transcription factors that bind the same subset of motifs may be co-regulated and may therefore be functionally related. These regulatory modules should lead to experimentally testable models that will shed light on many interesting biological processes and disease states.
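
To give a feel for what a recursive likelihood evaluation of this kind can look like, here is a minimal sketch in Python. It is not the group's published algorithm. It assumes a deliberately simplified model in which a sequence is a chain of segments, each segment being either a single background letter (state 0) or one motif site described by a position weight matrix (states 1..K), with a first-order Markov chain over segment types. Inter-site distances are modeled only implicitly, through background self-transitions. The function name `forward_likelihood` and all parameter names are hypothetical.

```python
from math import prod


def forward_likelihood(seq, background, motifs, trans, init):
    """Probability of `seq` summed over every segmentation into
    background letters (state 0) and motif sites (states 1..K).

    background: dict mapping base -> probability
    motifs:     list of PWMs; each PWM is a list of dicts base -> probability
    trans[s][t]: probability that a segment of type s is followed by type t
    init[t]:    probability that the first segment has type t
    """
    n = len(seq)
    widths = [1] + [len(pwm) for pwm in motifs]  # state 0 emits one letter
    n_states = len(widths)

    def emit(state, start):
        # Probability that `state` emits the letters beginning at `start`.
        if state == 0:
            return background[seq[start]]
        pwm = motifs[state - 1]
        return prod(col[seq[start + j]] for j, col in enumerate(pwm))

    # fwd[i][t]: probability of seq[:i] where the last segment has type t
    # and ends exactly at position i.
    fwd = [[0.0] * n_states for _ in range(n + 1)]
    for i in range(1, n + 1):
        for t in range(n_states):
            start = i - widths[t]
            if start < 0:
                continue  # this segment type does not fit yet
            if start == 0:
                prior = init[t]
            else:
                prior = sum(fwd[start][s] * trans[s][t] for s in range(n_states))
            fwd[i][t] = prior * emit(t, start)
    return sum(fwd[n])


# Toy example: uniform background and one width-2 motif that matches "AC".
uniform = {b: 0.25 for b in "ACGT"}
motif_ac = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
]
half = [[0.5, 0.5], [0.5, 0.5]]
likelihood = forward_likelihood("AC", uniform, [motif_ac], half, [0.5, 0.5])
```

The recursion visits each position once and sums over segment types, so the cost grows linearly with sequence length instead of with the number of possible site placements. In a full Bayesian treatment, the same table could be traversed backward to sample site locations from their posterior.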

Selected References:
Maki A, Kono H, Gupta M, Asakawa M, Suzuki T, Matsuda M, Fujii H, Rusyn I. (2007) Predictive power of biomarkers of oxidative stress and inflammation in patients with hepatitis C virus-associated hepatocellular carcinoma. Ann Surg Oncol. 14:1182-90.

Giresi PG, Gupta M, Lieb JD. (2006) Regulation of nucleosome stability as a mediator of chromatin function. Curr Opin Genet Dev. 16:171-6.

Gupta M, Liu JS. (2005) De novo cis-regulatory module elicitation for eukaryotic genomes. Proc Natl Acad Sci U S A. 102:7079-84.

Gupta M, Liu JS. (2003) Discovery of conserved sequence patterns using a stochastic dictionary model. J Am Stat Assoc. 98:55-66.

Liu JS, Gupta M, Liu XL, Lawrence CL. (2002) Statistical models for motif discovery. In Case Studies in Bayesian Statistics vol 6, Springer-Verlag, New York.

Contact Information:

Phone: (919) 843-3656