The Wang group designs novel data models
and algorithms to address fundamental computational issues in analyzing
large sets of experimental data. Ongoing research projects include:
1. Classification and clustering analysis of gene-expression profiles.
The goal of this project is to design and evaluate new metrics for measuring
the proximity between gene expression profiles, and to build models
for characterizing these profiles. The Wang group has studied various
similarity functions that can model the expression pattern of co-regulated
genes.
2. Discovery of discriminative structural motifs in proteins.
This project aims to build a fully automatic tool for finding conserved
structural motifs that have strong associations with known protein
functions. Existing methods may miss these motifs: methods that rely on
multiple sequence alignments fail when the motifs have low sequence
similarity, and methods that rely on geometric superimposition become
computationally prohibitive for large numbers of proteins. The conserved
structural motifs are then used as features to build computational
models that can unambiguously distinguish proteins of different classes,
folds and families.
3. Query and integration of heterogeneous databases.
The focus of this project is integrating data and knowledge from different
resources and different species. The aim is to develop a powerful search
engine for heterogeneous biological databases, in response to the rapid
growth of database volume and increasingly diverse data formats and
query types. Based on the characteristics of the data formats, query
types and similarity metrics, novel data structures and algorithms
are designed to resolve the incompatibility between datasets from different
resources and to accelerate the query processing on massive databases.
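
To make the notion of proximity between gene expression profiles in the first project concrete, here is a minimal sketch that assumes Pearson correlation as the similarity function. This is a common textbook choice and is not necessarily one of the metrics the group studied; the function names and the sample profiles are hypothetical.

```python
import math


def pearson_similarity(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variance terms (unnormalized; the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def correlation_distance(x, y):
    """Turn the similarity into a distance: 0 for identical trends, 2 for opposite."""
    return 1.0 - pearson_similarity(x, y)


# Hypothetical profiles measured under four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # same trend as gene_a at a different scale
gene_c = [4.0, 3.0, 2.0, 1.0]   # opposite trend

print(correlation_distance(gene_a, gene_b))
print(correlation_distance(gene_a, gene_c))
```

Because correlation ignores scale and offset, gene_a and gene_b come out as close neighbors even though their absolute expression levels differ, which is one reason correlation-style measures are often preferred over Euclidean distance for grouping co-regulated genes.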
Selected Publications:
Zhang X and Wang W. (2007) An efficient algorithm for mining coherent patterns from heterogeneous microarrays. Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM).
Zhang X, Wang W, and Huan J. (2007) On demand phenotype ranking through subspace clustering. Proceedings of the 7th SIAM Conference on Data Mining (SDM).
Bandyopadhyay D, Huan J, Liu J, Prins J, Snoeyink J, Wang W, Tropsha A. (2006) Structure-based function inference using protein family-specific fingerprints. Protein Sci. 15(6):1537-43.
Zhang X and Wang W. (2006) Mining coherent patterns from heterogeneous microarray data. Proceedings of the 15th ACM Conference on Information and Knowledge Management (CIKM), pp. 838-839.
Wang W and Yang J. (2005) Mining Sequential Patterns from Large Data Sets, in Series of Advances in Database Systems, edited by Ahmed Elmagarmid, Kluwer.
Huan J, Bandyopadhyay D, Wang W, Snoeyink J, Prins J, Tropsha A. (2005) Comparing graph representations of protein structure for mining family-specific residue-based packing motifs. J Comput Biol 12:657-71.