From Ontology to Metadata: A Crawler for Script-based Workflows

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.




Giuseppe Chiapparino, Benjamin Farnbacher, Nils Hoppe, Radoslav Ralev, Vasiliki Sdralia, Christian Stemmer


The present work introduces HOMER (HPMC tool for Ontology-based Metadata Extraction and Re-use), a metadata crawler written in Python that automatically retrieves relevant research metadata from script-based workflows on HPC systems. The tool offers a flexible approach to metadata collection, as the metadata scheme can be read from an ontology file. With minimal user input, the crawler can be adapted to the user's needs and easily integrated into the workflow to retrieve the relevant metadata. The collected information can then be post-processed automatically: for example, strings may be trimmed with regular expressions, or numerical values may be averaged. Currently, data can be collected from text files and HDF5 files, or hardcoded directly by the user. The tool has a modular design, however, so the supported file types, the instruction-processing routines and the post-processing operations can be extended in a straightforward way.
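
The two post-processing operations named in the abstract, trimming strings with regular expressions and averaging numerical values, can be illustrated with a minimal sketch. This is not HOMER's actual interface: the function names (`extract_values`, `trim`, `average`), the log contents and the metadata keys are all hypothetical and serve only to show the idea on a small text-file example.

```python
import re
from statistics import mean


def extract_values(text, pattern):
    """Return the first capture group of every match of `pattern` in `text`."""
    return re.findall(pattern, text)


def trim(value, pattern):
    """Keep only the part of `value` matching `pattern`; return it unchanged if nothing matches."""
    match = re.search(pattern, value)
    return match.group(0) if match else value


def average(values):
    """Convert the extracted strings to floats and return their arithmetic mean."""
    return mean(float(v) for v in values)


# Hypothetical log file content produced by a script-based workflow.
log_text = """\
solver: demo_solver-v1.2 (build 7)
step 1 residual = 0.50
step 2 residual = 0.25
step 3 residual = 0.75
"""

metadata = {
    # String trimmed by a regular expression: keep only the version tag.
    "solver_version": trim(extract_values(log_text, r"solver:\s*(.+)")[0], r"v\d+\.\d+"),
    # Numerical values collected from several lines and averaged.
    "mean_residual": average(extract_values(log_text, r"residual\s*=\s*([0-9.eE+-]+)")),
    # Value supplied directly by the user rather than read from a file.
    "operator": "hardcoded by the user",
}

print(metadata)
```

In this sketch the metadata keys are fixed in the code. In the approach described in the abstract, the keys would instead come from the metadata scheme read from the ontology file.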


Data Management Software


Metadata extraction, HPMC, Research Data Management, Ontology


Published: 2023-04-27 04:37


Creative Commons Attribution 4.0
