The Semantic Web is in constant flux… Hence our effort to capture and analyze data by adopting a dynamic, extensible Web-based platform that hosts the proposed INTERoperable Inference Medium, which consists of:
· Subscription and Publication Service for Data and Inference Technology Providers
· Middleware enabling data integration across heterogeneous data sources to build shared knowledge-bases.
· Inference Engine & Knowledge-Base Hosting and Management Service for seamless integration of diverse Inference Technologies and Models supporting:
1. Classification (deterministic vs. probabilistic)
· A Query Service operating on the published knowledge-bases, inference technologies, and models, serving its subscribers with data in XML/RDF feeds.
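To make the division of labor among these services more concrete, the following Python sketch shows a minimal publish/subscribe registry whose query step returns results as an RDF/XML feed. It is an illustration under stated assumptions, not the Interim-WEB implementation: the `Registry` class, its methods, the triple-based storage, and the `http://example.org/interim#` namespace are all hypothetical.

```python
# Minimal sketch of a publish/subscribe registry with an RDF/XML query feed.
# Hypothetical throughout: class/method names, triple storage, and namespace
# are illustrative assumptions, not the Interim-WEB API.
from collections import defaultdict
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/interim#"  # hypothetical vocabulary namespace


@dataclass
class Registry:
    # knowledge-base name -> list of (subject, predicate, object) triples
    knowledge_bases: dict = field(default_factory=lambda: defaultdict(list))
    # subscriber name -> the predicate that subscriber follows
    subscriptions: dict = field(default_factory=dict)

    def publish(self, kb: str, subject: str, predicate: str, obj: str) -> None:
        """A data provider adds one statement to a shared knowledge base."""
        self.knowledge_bases[kb].append((subject, predicate, obj))

    def subscribe(self, subscriber: str, predicate: str) -> None:
        """A subscriber registers interest in a single predicate."""
        self.subscriptions[subscriber] = predicate

    def query(self, subscriber: str) -> list:
        """Return every published triple whose predicate the subscriber follows."""
        wanted = self.subscriptions[subscriber]
        return [t for triples in self.knowledge_bases.values()
                for t in triples if t[1] == wanted]

    def feed(self, subscriber: str) -> str:
        """Serialize the subscriber's query result as an RDF/XML document."""
        ET.register_namespace("rdf", RDF_NS)
        ET.register_namespace("ex", EX_NS)
        root = ET.Element(f"{{{RDF_NS}}}RDF")
        for subject, predicate, obj in self.query(subscriber):
            desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                                 {f"{{{RDF_NS}}}about": EX_NS + subject})
            prop = ET.SubElement(desc, f"{{{EX_NS}}}{predicate}")
            prop.set(f"{{{RDF_NS}}}resource", EX_NS + obj)
        return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    reg = Registry()
    reg.publish("genes", "geneA", "encodes", "proteinA")  # hypothetical IDs
    reg.publish("genes", "geneA", "locatedOn", "chr1")
    reg.subscribe("lab1", "encodes")
    print(reg.feed("lab1"))
```

In this example the subscriber `lab1` follows only the `encodes` predicate, so its feed contains a single `rdf:Description` for `geneA`. Storing plain triples keeps the registry agnostic to any particular ontology.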
The proposed platform, Interim-WEB, is currently under development, with a prototype focused on data generated by biological research. Through this prototype we intend to address:
· The need to locate, navigate, analyze, and integrate vast amounts of biological data
· The shortcomings of existing database management systems in capturing all the relationships between genes, proteins, and DNA sequences
· Over-reliance on elaborate models and specialized technologies, built in development cycles that target specific problems, at the cost of interoperability
Models of biological research have traditionally been restricted to the isolated analysis of specific genes or proteins. These restrictions were historically imposed by the limits of technology and/or the lack of the necessary conceptual infrastructure, at a time when the reductionist approach was often the only way to gather meaningful data.
Biological data output has increased dramatically in the last few decades as new techniques have been developed and deployed throughout the research community. This growth has both driven and accompanied a paradigm shift toward the analysis of whole systems, a term that, while still loosely defined, has come to encompass more than a handful of biological entities. The rapid progress and the attendant explosion in data generation come at a price, however: they impose new limitations on our ability to integrate such data. While the paradigm may be shifting toward systems analysis, data are still generated largely independently, mostly in single laboratories, each using its own combination of software and equipment acquired from a heterogeneous pool of vendors and/or developed in-house.
A very simple example will suffice to illustrate the problem. SAGE (Serial Analysis of Gene Expression) is a high-throughput method for measuring the output of large numbers of genes under various conditions by monitoring polyadenylated (poly-A-containing) RNA molecules. SAGE has been widely used to monitor gene output levels in tumors, and the resulting large-scale data are available in the public domain. Unfortunately, none of the freely available data sets are interoperable. Moreover, slight differences in the methodology used to generate the SAGE data make comparisons across data sets harder.
The example above illustrates the magnitude of the problem for a single type of data generated with a single technology. We leave it to the reader to appreciate the difficulties involved when multiple types of data (genomic, proteomic, interaction, structural, biochemical) are generated using a multitude of techniques.
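The SAGE comparison problem can be illustrated with a small Python sketch. It assumes two hypothetical libraries, stored as tag-to-count mappings, in which one protocol produced 10-base tags and the other 17-base tags, and that each shorter tag is a prefix of the corresponding longer one. Harmonizing them takes two steps: truncating tags to a common length (summing counts that collapse together) and scaling counts to tags per million so that differing library sizes do not dominate the comparison. The helper names and data are illustrative only; a real analysis would typically also need to account for sequencing errors and ambiguous tag-to-gene mappings.

```python
# Sketch: harmonizing two SAGE tag-count libraries before comparing them.
# Assumptions (hypothetical data and helper names): each library maps tag
# sequence -> raw count; shorter tags are prefixes of the longer ones.
from collections import Counter


def truncate_tags(library: dict[str, int], length: int) -> Counter:
    """Collapse tags to their first `length` bases, summing counts that merge."""
    merged = Counter()
    for tag, count in library.items():
        merged[tag[:length]] += count
    return merged


def tags_per_million(library: Counter) -> dict[str, float]:
    """Scale raw counts by total library size so libraries of different depth align."""
    total = sum(library.values())
    return {tag: count * 1_000_000 / total for tag, count in library.items()}


def compare(lib_a: dict[str, int],
            lib_b: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Pair normalized abundances for tags present in both harmonized libraries."""
    common_len = min(len(tag) for tag in (*lib_a, *lib_b))
    a = tags_per_million(truncate_tags(lib_a, common_len))
    b = tags_per_million(truncate_tags(lib_b, common_len))
    return {tag: (a[tag], b[tag]) for tag in sorted(a.keys() & b.keys())}


if __name__ == "__main__":
    lib_a = {"GATCGATCGA": 30, "TTTGGGCCCA": 10}   # hypothetical 10-base tags
    lib_b = {"GATCGATCGA" + "A" * 7: 5,            # hypothetical 17-base tags
             "GATCGATCGA" + "C" * 7: 5,
             "A" * 17: 10}
    print(compare(lib_a, lib_b))
```

Running the example prints `{'GATCGATCGA': (750000.0, 500000.0)}`. Note that the two distinct 17-base tags in the second library merge into one 10-base tag once truncated: the finer distinction is lost, which is one concrete way in which methodological differences hinder comparison across sets.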
Technologies and platforms that address similar interoperability issues within the same domain include:
Standards: GAME XML, GFF, FASTA, TIGR XML (gene annotation)
Databases: EMBL, GenBank, RefSeq, Taxonomy, Gene Ontology, MEDLINE, BIND
Analysis Agents: BLAST, MSPcrunch, Genscan, sim4, RepeatMasker
Platforms: Pegasys/Atlas (http://bioinformatics.ubc.ca/topic/pegasys), MOBY-S (http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1104155)
What makes Interim-WEB different?
Interim-WEB does not impose an ontology or a proprietary data-exchange format. Registered data sources, and the analysis agents built on a variety of inference technologies, need not operate within a unified model. Instead, each engine is assigned its own model, together with specific data adapters for the targeted data resources, because it is not realistic to rely on reaching consensus on how to capture all genome information. This distinguishes Interim-WEB from similar approaches such as Pegasys/Atlas or MOBY-S.
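The following Python sketch illustrates the design choice of per-engine models with source-specific adapters instead of a unified schema. Everything in it is hypothetical: `FeatureRecord` stands in for the model designated for one engine, `gff_adapter` and `fasta_adapter` map two of the formats listed earlier onto that model, and `span_engine` is a toy stand-in for an inference engine. Under this arrangement, adding a new source means writing one more adapter for the engines that need it, without changing any other engine or source.

```python
# Sketch of per-engine models fed by source-specific adapters (no shared ontology).
# All names are hypothetical illustrations, not part of Interim-WEB.
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FeatureRecord:
    """The model designated for one hypothetical engine: located features."""
    seq_id: str
    feature_type: str
    start: int
    end: int


def gff_adapter(text: str) -> list[FeatureRecord]:
    """Map tab-separated GFF lines (seqid, type, start, end columns) onto the model."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and ## directives
        cols = line.split("\t")
        records.append(FeatureRecord(cols[0], cols[2], int(cols[3]), int(cols[4])))
    return records


def fasta_adapter(text: str) -> list[FeatureRecord]:
    """Map FASTA entries onto the model; each whole sequence becomes one feature."""
    records, seq_id, length = [], None, 0
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                records.append(FeatureRecord(seq_id, "sequence", 1, length))
            seq_id, length = line[1:].split()[0], 0  # ID = first word of header
        elif line:
            length += len(line)
    if seq_id is not None:
        records.append(FeatureRecord(seq_id, "sequence", 1, length))
    return records


# Adapters are registered per source format; engines never see native formats.
ADAPTERS: dict[str, Callable[[str], list[FeatureRecord]]] = {
    "gff": gff_adapter,
    "fasta": fasta_adapter,
}


def span_engine(records: list[FeatureRecord]) -> dict[str, int]:
    """Toy inference engine that understands only FeatureRecord: total span per type."""
    totals: dict[str, int] = {}
    for r in records:
        totals[r.feature_type] = totals.get(r.feature_type, 0) + (r.end - r.start + 1)
    return totals


if __name__ == "__main__":
    gff = "chr1\tdemo\tgene\t100\t500\t.\t+\t.\tID=geneA\n"
    fasta = ">seq1 example\nACGT\nAC\n"
    records = ADAPTERS["gff"](gff) + ADAPTERS["fasta"](fasta)
    print(span_engine(records))
```

Running the example prints `{'gene': 401, 'sequence': 6}`. The engine depends only on its own record type, so neither source has to agree with the other on a common format.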