Towards Improved Findability of Energy Research Software by Introducing a Metadata-based Registry

This is a Preprint and has not been peer reviewed. This is version 2 of this Preprint.

Authors

Stephan Ferenz, Astrid Nieße

Abstract

Research software in the energy domain is becoming increasingly important for the analysis, simulation, and optimization of energy systems and supports design decisions in the required transition of energy systems to tackle the climate crisis.
To make energy research software (ERS) more findable, it should be described with metadata following the FAIR (findable, accessible, interoperable, and reusable) criteria and be registered in a common registry.
To this end, we present a concept for a metadata-based registry for ERS which should enable researchers to easily add new ERS as well as to find new ERS.

Subjects

Data Infrastructure, Data Management Software

Keywords

Interoperability, Digital Libraries, Energy Research, FAIR, Research Software, Metadata, Open Source Software, Software Reusability, Ontology, Semantic Web, Linked Data

Dates

Published: 2023-03-01 11:35

License

Creative Commons Attribution 4.0

Comments

Invited Review Comment #19 Dorothea Iglezakis @ 2023-03-15 13:46

Summary

The article "Towards Improved Findability of Energy Research Software by Introducing a Metadata-based Registry" by Stephan Ferenz and Astrid Nieße provides a very good overview of existing schemata, ontologies and terminologies for research software and for the energy domain. The article is well written and easy to follow, but lacks detail as soon as it comes to the description of the authors' own planned software registry.


Detailed Comments

Chapter 1: Motivation

While there is a sound definition of ERS, the terms software repository (in the sense of a source code repository and in contrast to a data repository) and software registry are not defined, although they are quite central to the article.

The definition and examples of software (l. 7-15) include scripts, libraries and models, but in the next paragraph models and frameworks are presented as the main software outputs of the energy domain.


The need for FAIR software does not necessarily follow from the problem description in lines 16-21.

The problems described are: 

- there is a lot of parallel development, software in the domain is seldom reused

- simulation of energy systems is getting more complex because of the growing number of interrelated components

While FAIR software could help solve the first problem, I cannot see how this applies to the second one. Reference 5 also argues that one of the main problems is the lack of valid models with clearly defined interfaces. For this problem, findable and reusable software could really help. But what is the actual state of availability of ERS? How does the community get to know what software already exists? Is software in this domain mainly developed as open source? Perhaps add one sentence on why FAIR software and the planned software registry could be the solution to these problems. Why is it not enough to develop open source and publish text publications about software in this domain (if this is the case)?

There could also be some more information about the metadata requirements in the energy domain. What information is necessary to find and use software in this domain? Chapters 2 and 3 could then refer to these requirements. Table 1 provides examples of metadata fields but does not relate them to the problems identified.

Chapter 2: Related work

The chapter on related work focuses mainly on metadata standards and terminologies and not on software registries. But since chapter 3 outlines plans not only for a metadata scheme but also for a software registry, it would be worthwhile to look at the services, advantages and disadvantages of existing software registries, repositories or archives such as Software Heritage (https://archive.softwareheritage.org/), Zenodo (https://zenodo.org/search?type=software) with its GitHub integration, the OntoSoft Portal (https://www.ontosoft.org/portal/#list), or subject-specific registries such as swMath (https://swMath.org), bio.tools, CoMSES (https://www.comses.net/codebases/) or machine learning tools (https://mloss.org/software/).

I am not sure whether the whole audience is familiar with the meaning of URIs in metadata (l. 52), namely the unique identification of persons and entities.

The selection of properties used to compare metadata schemas in tables 2 and 3 is not really clear. Are these quite formal criteria really the most important ones for comparing the different approaches? If that's the case, then please argue why these criteria are crucial to solving the problems in chapter 1. What are the advantages and disadvantages of the different terminologies and ontologies, not only in a formal way, but also in terms of content? What is missing, what is applicable to the energy domain and what is not?

How do you define "Support for URIs" in table 2? As there is a unique identifier property for persons in OntoSoft, and OntoSoft is an ontology with a sort of built-in support for URIs, I wondered why OntoSoft is listed as not supporting URIs.

I wouldn't really call CodeMeta a scheme; it is more a vocabulary for describing research software. In fact, it is also an ontology, although the project aims more in the direction of a scheme for describing software. OntoSoft, however, goes in a similar direction.


Chapter 3: Concept

This chapter, as a main contribution of the paper, remains on a very generic conceptual level. There are no real details about the metadata scheme to be developed that go beyond mere formal criteria (which are important, but are not really motivated in the first chapter). A definition of the (also content-based) requirements in chapter 1, followed by an analysis of existing schemes and ontologies in chapter 2, could then lead to somewhat more detailed plans for the metadata scheme. Do you plan to use and/or extend existing schemes and ontologies?

There are also no details about the technical or organisational context of the planned registry. Is this registry planned in the context of NFDI4Energy? Are there any plans for building, announcing and maintaining such a service? Should the registry build on an existing platform?

As far as I understand the concept of the registry, it is a database and searchable index for metadata linking to the corresponding source code repository of a piece of software. Are there any thoughts about also linking to published or archived versions of software in data repositories or archives such as Software Heritage? What will happen if a piece of software is no longer maintained or the corresponding source code repository is deleted?

An additional reference for the idea and implementation of application profiles in section 3.1 could be the metadata profile services developed within NFDI4Ing (S3-1, https://nfdi4ing.de/base-services/s-3/) and the AIMS project (https://www.aims-projekt.de/). 

An additional reference for section 3.2 could be the Hermes workflow of the project of the same name, which extracts metadata from source code repositories (https://docs.software-metadata.pub/en/latest/). Could you add some information on which metadata you expect to be able to extract and which metadata has to come from other sources?

Smaller language issues

- line 19: add comma between "components" and "ERS"

- line 31: our contribution*s* are or our contribution is

- line 158: as *a* first step

- line 160: *cor*responding software registry

- line 162: add comma after "software"

- line 162: write additional software -> extend the software or write additional code 

- line 163: should be publish*ed*

- line 174: constrain*t*s

- line 180: the one*s*

- page 4: In the legend of table 2 there is one ")" too many.