From Ontology to Metadata: A Crawler for Script-based Workflows

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Giuseppe Chiapparino, Benjamin Farnbacher, Nils Hoppe, Radoslav Ralev, Vasiliki Sdralia, Christian Stemmer

Abstract

The present work introduces HOMER (HPMC tool for Ontology-based Metadata Extraction and Re-use), a metadata crawler written in Python that automatically retrieves relevant research metadata from script-based workflows on HPC systems. The tool offers a flexible approach to metadata collection, as the metadata scheme can be read from an ontology file. With minimal user input, the crawler can be adapted to the user's needs and easily integrated into the workflow to retrieve the relevant metadata. The retrieved information can then be post-processed automatically: for example, strings can be trimmed with regular expressions and numerical values can be averaged. Currently, data can be collected from text files and HDF5 files, or hardcoded directly by the user. The tool is designed in a modular way, however, so that the supported file types, the instruction-processing routines and the post-processing operations can be extended in a straightforward manner.
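
The two post-processing operations named in the abstract (trimming a string with a regular expression and averaging numerical values) can be illustrated with a minimal sketch. This is not HOMER's actual interface: the function names `trim_with_regex` and `average`, the sample log text, and the metadata keys are all hypothetical and chosen only for illustration, and only the Python standard library is used.

```python
import re
from statistics import mean


def trim_with_regex(raw: str, pattern: str) -> str:
    """Return the first capture group of `pattern` found in `raw`.

    Falls back to the whitespace-stripped input when nothing matches.
    (Hypothetical helper, not part of HOMER.)
    """
    match = re.search(pattern, raw)
    return match.group(1) if match else raw.strip()


def average(values) -> float:
    """Average an iterable of numbers or numeric strings.

    (Hypothetical helper, not part of HOMER.)
    """
    return mean(float(v) for v in values)


# Invented stand-in for a text file produced by a script-based workflow.
log_text = """solver = navier-stokes v2.1
time step: 0.010
time step: 0.020
time step: 0.030
"""

metadata = {
    # String trimmed by a regular expression: keep only the solver name.
    "solver_name": trim_with_regex(
        log_text.splitlines()[0], r"solver\s*=\s*([\w-]+)"
    ),
    # Numerical values collected from several lines and averaged.
    "mean_time_step": average(re.findall(r"time step:\s*([\d.]+)", log_text)),
    # Value hardcoded directly by the user.
    "operator": "hardcoded by the user",
}

print(metadata)
```

In this sketch the metadata keys are fixed in the code; in the approach described above they would instead come from the metadata scheme read from the ontology file.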

Subjects

Data Management Software

Keywords

Metadata extraction, HPMC, Research Data Management, Ontology

Dates

Published: 2023-04-27 16:37

License

Creative Commons Attribution 4.0
