GBOL & GBOL Stack

To enable interoperability of genome annotations, we have developed the Genome Biology Ontology Language (GBOL) and its associated stack (the GBOL stack). GBOL is provenance-centered and provides a consistent representation of genome-derived automated predictions, linked to both the dataset-wise and the element-wise provenance of the predicted elements. GBOL is modular in design, extensible, and integrated with existing ontologies. Interoperability of linked data can only be guaranteed by applying tools that continuously validate the linked data as it is generated. The GBOL stack enforces consistency within and between the OWL and ShEx definitions. Genome-wide, large-scale functional analyses can then be carried out using SPARQL queries. Additionally, modules have been developed to serialize the linked data (RDF) and to generate plain-text files, with integrated support for data provenance, that mimic the indentation structure of the GenBank and EMBL formats.
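As a rough illustration of the kind of SPARQL query mentioned above, the following Python sketch builds a query that counts the instances of one ontology class. It is not taken from the GBOL stack: the namespace URI, the `gbol:` prefix, the class name `Gene`, and the helper `count_instances_query` are all assumptions made for this example and should be checked against the published ontology before use.

```python
# Minimal sketch: compose a SPARQL query string for a genome-wide count.
# ASSUMPTION: the namespace URI and the class name used below are
# illustrative only; consult the GBOL ontology for the actual terms.
GBOL_NS = "http://gbol.life/0.1/"  # assumed namespace, not confirmed here


def count_instances_query(class_name: str, ns: str = GBOL_NS) -> str:
    """Build a SPARQL query that counts distinct instances of one class."""
    # Reject anything that is not a plain identifier so that a class name
    # cannot inject extra SPARQL into the query text.
    if not class_name.isidentifier():
        raise ValueError(f"unsafe class name: {class_name!r}")
    return (
        f"PREFIX gbol: <{ns}>\n"
        "SELECT (COUNT(DISTINCT ?feature) AS ?n)\n"
        "WHERE {\n"
        f"  ?feature a gbol:{class_name} .\n"
        "}"
    )


if __name__ == "__main__":
    # Print the query; it can then be pasted into any SPARQL endpoint or
    # passed to an RDF library of your choice.
    print(count_instances_query("Gene"))
```

The resulting string can be submitted to any SPARQL 1.1 endpoint that hosts the annotation data. Keeping the query builder separate from the execution step makes it easy to reuse the same query against different stores.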

The GBOL R and Java APIs have been generated using Empusa. The API/Empusa section provides instructions on (i) how to use Empusa to generate an API and (ii) how to use the generated API. The second part also contains examples of how the API can be used to enforce consistent and correct usage of the ontology.

Cite this article

Dam, J.C.J., Koehorst, J.J., Vik, J.O. et al. The Empusa code generator and its application to GBOL, an extendable ontology for genome annotation. Sci Data 6, 254 (2019) doi:10.1038/s41597-019-0263-7