Jan 11, 2005

The National Center for Research Resources, a component of the National Institutes of Health, recently awarded CCGS researcher Morgan Giddings a $1M grant to develop her proteomics software called Genome Fingerprint Scanning (GFS).

Proteomics, the analysis of an organism’s entire protein complement, has evolved naturally from the explosion of genomic information now available for hundreds of organisms. Although massive amounts of DNA sequence are readily attainable, interpreting or annotating genomic information is a significantly greater challenge. In particular, the identification of protein-encoding genes is of major importance to biomedical research since they are at the heart of understanding basic biology and the mechanisms of disease. Large-scale analysis of proteins, however, poses many technical obstacles that researchers are still struggling to overcome. The Giddings group hopes to address some of these challenges by improving and making public their GFS analysis tools.

Currently available proteomics software depends heavily on previously annotated genes deposited into public databases, which are often incomplete or incorrect since genes are usually predicted computationally. Thus, many proteins or peptides identified experimentally are not represented in the databases and cannot be analyzed with existing tools. GFS obviates the need for accurate annotation by linking protein data (in the form of mass spectra) directly to raw, unannotated or even unfinished genome sequence.

GFS grew out of a project that Giddings initiated as a postdoctoral fellow at the University of Utah. The rationale and first application of this method were published in 2003 (Proc Natl Acad Sci USA 100:20-5). GFS performs an in silico translation and proteolytic digestion of an entire genome sequence. Since only raw genome sequence is used, there is no bias in determining where genes are located or what their products might be. A peptide mass list from mass spectrometry data is then matched against the predicted proteolytic digest to look for clusters where a relatively high number of peptide masses match a particular stretch of DNA sequence, suggesting that the region encodes the protein that was analyzed. GFS has already proven useful in identifying novel genes in poorly annotated genomes such as Francisella tularensis and Tetrahymena thermophila.
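The general idea of translating, digesting, and mass-matching can be illustrated with a short Python sketch. This is not the GFS source code or its scoring scheme; it is a simplified illustration that assumes the standard genetic code, a trypsin-like digest that cuts after every K or R (ignoring missed cleavages and the proline exception), monoisotopic masses for unmodified peptides, and a plain count of hits per window in place of statistical scoring. All function names, the mass tolerance, and the window size are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses in daltons (unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame):
    """Translate one reading frame; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def tryptic_peptides(protein):
    """Yield (residue_offset, peptide), cutting after K/R and at stops."""
    start = 0
    for i, aa in enumerate(protein):
        if aa == "*":
            if i > start:
                yield start, protein[start:i]
            start = i + 1
        elif aa in "KR":
            yield start, protein[start:i + 1]
            start = i + 1
    if start < len(protein):
        yield start, protein[start:]


def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def scan(genome, observed, tol=0.5, min_len=4):
    """Return (strand, frame, position, peptide, mass) for every match.

    Position is the nucleotide offset on the strand being read, so
    minus-strand positions refer to the reverse-complemented sequence.
    """
    genome = genome.upper()
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            for offset, pep in tryptic_peptides(translate(seq, frame)):
                if len(pep) < min_len or "X" in pep:
                    continue
                mass = peptide_mass(pep)
                if any(abs(mass - m) <= tol for m in observed):
                    hits.append((strand, frame, frame + 3 * offset,
                                 pep, round(mass, 4)))
    return hits


def cluster_counts(hits, window=1000):
    """Count hits per (strand, window); dense windows are gene candidates."""
    counts = Counter((strand, pos // window) for strand, _, pos, _, _ in hits)
    return counts.most_common()


if __name__ == "__main__":
    toy_genome = "ATGGCTGGTATTTGTAAATAA"  # encodes M-A-G-I-C-K, then stop
    masses = [peptide_mass("MAGICK")]     # stand-in for a measured mass list
    found = scan(toy_genome, masses, tol=0.01)
    print(found)
    print(cluster_counts(found))
```

Because every reading frame on both strands is digested without reference to predicted gene boundaries, the sketch shares the property described above: a match can point to a coding region that no annotation lists. A real tool would also need to handle missed cleavages, modifications, and false-positive control, none of which are modeled here.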

Several improvements to the software, along with the launch of a publicly available web interface, were recently published by Giddings’ group (J Proteome Res 3:1292-5). Now, with a three-year grant from NCRR, Giddings plans to enhance the GFS algorithms for improved speed and reliability and to add several new capabilities. She also plans to create a more comprehensive website that lets users examine genomic regions alongside their matching peptide maps, and to provide complete administrator, developer, and end-user documentation and support. The source code is available to academic and government researchers free of charge. Originally developed using a Unix-based Macintosh operating system, the program will also be available for use on Linux, Windows, and other common computing platforms. “The funding from NCRR will allow us to transform our Genome Fingerprint Scanning program from an experimental, beta-quality tool, into a free and widely available resource that will benefit the global proteomics community,” says Giddings.

For additional information, see:

GFS website
NIH press release
Giddings lab website
