The Wang group designs novel data models and algorithms to address fundamental computational problems in analyzing large sets of experimental data. Ongoing research projects include:

1. Classification and clustering analysis of gene-expression profiles.
The goal of this project is to design and evaluate new metrics for measuring the similarity between gene expression profiles, and to build models that characterize these profiles. The Wang group has studied various similarity functions that can model the expression patterns of co-regulated genes.

2. Discovery of discriminative structural motifs in proteins.
This project aims to build a fully automatic tool for finding conserved structural motifs that are strongly associated with known protein functions. Such motifs may be missed by existing methods that rely on multiple sequence alignment, because the proteins involved can share little sequence similarity, or on geometric superposition, whose computational cost becomes prohibitive for large numbers of proteins. The conserved motifs are then used as features to build computational models that can unambiguously distinguish proteins of different classes, folds and families.

3. Querying and integrating heterogeneous databases.
This project focuses on integrating data and knowledge across different resources and species. The aim is to develop a powerful search engine over heterogeneous biological databases that keeps pace with the rapid growth in database volume and the increasing diversity of data formats and query types. Guided by the characteristics of these data formats, query types and similarity metrics, the group designs novel data structures and algorithms that resolve incompatibilities between datasets from different resources and accelerate query processing on massive databases.
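
To make the idea of a similarity function in project 1 concrete, the sketch below uses Pearson correlation, a standard choice for comparing expression profiles. It is an illustration only, not necessarily one of the group's own metrics. Pearson correlation ignores shifts and positive scalings of a profile, so two genes whose expression rises and falls together score as perfectly similar even when their absolute levels differ, while Euclidean distance treats the same pair as far apart. The function names and toy profiles are hypothetical.

```python
import math
from typing import Sequence


def pearson_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two profiles measured under the same conditions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Center each profile so that a constant offset (shift) has no effect.
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    # Normalizing by both spreads removes the effect of positive scaling.
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0.0:
        return 0.0  # assumed convention: a flat profile carries no pattern
    return num / den


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Euclidean distance, shown for contrast."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical profiles over four conditions.
gene_a = [1.0, 3.0, 2.0, 5.0]
gene_b = [12.0, 16.0, 14.0, 20.0]  # 2 * gene_a + 10: same pattern, shifted and scaled
gene_c = [5.0, 2.0, 3.0, 1.0]      # roughly the opposite pattern

print(pearson_similarity(gene_a, gene_b))  # 1.0
print(round(pearson_similarity(gene_a, gene_c), 6))  # -0.942857
print(euclidean_distance(gene_a, gene_b))  # large, despite the identical pattern
```

A whole-profile metric like this one cannot capture genes that co-vary only under a subset of conditions, which is one reason to study alternative similarity models.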

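Project 2 uses conserved motifs as features for classification. A minimal sketch, assuming each protein has already been reduced to the set of motif identifiers it contains (the motif-mining step is not shown): proteins are compared as binary fingerprints using Jaccard similarity, and a query protein is assigned to the most similar family fingerprint. This nearest-fingerprint rule is a simple stand-in, not the group's published model, and the family names, motif IDs and functions are hypothetical.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard (Tanimoto) similarity between two binary motif fingerprints."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(query: Set[str], fingerprints: Dict[str, Set[str]]) -> Tuple[str, float]:
    """Return the family whose fingerprint is most similar to the query, with its score."""
    scores = {family: jaccard(query, motifs) for family, motifs in fingerprints.items()}
    best = max(scores, key=lambda family: scores[family])
    return best, scores[best]


# Hypothetical fingerprints: motifs assumed to be conserved within each family.
fingerprints = {
    "family_A": {"m1", "m2", "m3"},
    "family_B": {"m4", "m5", "m6"},
}
query = {"m1", "m3", "m7"}  # motifs detected in an unannotated protein

print(classify(query, fingerprints))  # ('family_A', 0.5)
```

Because the features are motif occurrences rather than aligned residues, a comparison of this kind does not require the proteins to share detectable sequence similarity.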

Selected Publications:
Zhang X and Wang W. (2007) An efficient algorithm for mining coherent patterns from heterogeneous microarrays. Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM).

Zhang X, Wang W, and Huan J. (2007) On demand phenotype ranking through subspace clustering. Proceedings of the 7th SIAM International Conference on Data Mining (SDM).

Bandyopadhyay D, Huan J, Liu J, Prins J, Snoeyink J, Wang W, Tropsha A. (2006) Structure-based function inference using protein family-specific fingerprints. Protein Sci. 15(6):1537-43.

Zhang X and Wang W. (2006) Mining coherent patterns from heterogeneous microarray data. Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM), pp. 838-839.

Wang W and Yang J. (2005) Mining Sequential Patterns from Large Data Sets. Advances in Database Systems series, edited by Ahmed Elmagarmid. Kluwer.

Huan J, Bandyopadhyay D, Wang W, Snoeyink J, Prins J, Tropsha A. (2005) Comparing graph representations of protein structure for mining family-specific residue-based packing motifs. J Comput Biol. 12:657-71.


Contact information:

Phone: (919) 962-1744