Dec 13, 2004

Carolina Center for Genome Sciences researcher Todd Vision and his group in the Department of Biology have recently launched an online database for plant comparative genomics named Phytome. Phytome is designed to be a platform for open-ended data exploration in evolutionary and functional genomics. It provides computational analysis of publicly available genome information from 39 different plant species, including almost all of the world’s most valuable crops.

Though whole-genome sequences are available for only a few plants (e.g., Arabidopsis, rice and poplar), expressed sequence tags and genome survey sequences are available for many more, and genetic maps containing gene-based markers are available for an increasing number of species. With this growing volume of “shallow” genomic data from a wide diversity of plants, it has become increasingly important to create tools that go beyond simple gene annotation. For example, new methodologies are required to identify conserved motifs that define particular lineages within gene families from diverse organisms, or to reconstruct genome rearrangement events during evolution such that gene content and gene order can be inferred for sparsely mapped genomes. Application of these methods will require the integration of large datasets from a number of existing public resources as well as computationally intensive analysis of those datasets. “It can be surprisingly difficult to ask some basic comparative genomics questions, even when we know how to answer them,” says Vision.

In 2002, the National Science Foundation’s Plant Genome Research Program awarded Vision a five-year, $1M grant to develop and apply tools for plant “phylogenomics.” As the centerpiece of this grant, Phytome is intended to enable scientists to harness comparative genomics for applications in functional biology, molecular breeding, and evolutionary biology, and to make it easier to apply genomics tools to non-model plant species.

After two years of work by Vision’s development team, including 460 days of computer processing time, the first version of the online database was launched in September 2004. Currently, Phytome contains information on over 730,000 unique protein sequences from over 25,000 multi-member protein families. For any given protein or protein family of interest, researchers can easily find related families or subfamilies, create multiple alignments and phylogenies, find the corresponding genes and proteins in other databases, or search for protein families based on which species are represented (or not) in the protein family.

New features will continue to be added to Phytome over the next three years. Perhaps the most significant will be the ability to compare the genetic maps of multiple species simultaneously and predict the gene content in unsequenced regions of plant genomes. This will be a unique resource for scientists trying to isolate the genes contributing to variation in traits of economic importance in crops, such as yield and disease resistance.

A persistent challenge in genomics is how to capture, analyze and distribute the massive amounts of data that ongoing genome projects are producing at record rates. Useful and accessible tools such as Phytome make it far easier for both basic and applied scientists to incorporate genomics into their own work by making the computational analyses less daunting. According to Vision, “Even if the data are all out there, and even if we have all the software necessary to analyze it, these two elements need to be put together in one user-friendly package before bench scientists can take full advantage of the promise of comparative genomics.”

 