Towards Improved Findability of Energy Research Software by Introducing a Metadata-based Registry

This is a Preprint and has not been peer reviewed. A published version of this Preprint is available on ing.grid. This is version 4 of this Preprint.

Authors

Stephan Ferenz, Astrid Nieße

Abstract

Research software in the energy domain is becoming increasingly important for the analysis, simulation, and optimization of energy systems and supports design decisions in the required transition of energy systems to tackle the climate crisis. To make energy research software (ERS) more findable, it should be described with metadata following the FAIR (findable, accessible, interoperable, and reusable) principles and be registered in a common registry. To this end, we motivate and present a concept for a metadata-based registry for ERS which should enable researchers to easily add new ERS as well as to find new ERS.
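
The two operations named in the abstract, adding ERS and finding ERS through metadata, can be illustrated with a minimal sketch. It is not the authors' design: the class names, the in-memory storage, and the example entry are all hypothetical, and only the field names `name`, `description`, `codeRepository`, and `keywords` are borrowed from CodeMeta terms.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SoftwareRecord:
    """Hypothetical metadata record; field names borrow CodeMeta terms."""
    name: str
    description: str
    codeRepository: str
    keywords: List[str] = field(default_factory=list)


class Registry:
    """Toy in-memory registry: it stores metadata only, not the software."""

    def __init__(self) -> None:
        self._records: Dict[str, SoftwareRecord] = {}

    def add(self, record: SoftwareRecord) -> None:
        # Assumption: the repository URL serves as the unique key.
        self._records[record.codeRepository] = record

    def find(self, term: str) -> List[SoftwareRecord]:
        # Case-insensitive match against name, description, and keywords.
        needle = term.lower()
        return [
            r for r in self._records.values()
            if needle in r.name.lower()
            or needle in r.description.lower()
            or any(needle in k.lower() for k in r.keywords)
        ]


registry = Registry()
registry.add(SoftwareRecord(
    name="example-grid-sim",  # made-up software name
    description="Simulation framework for distribution grids",
    codeRepository="https://example.org/example-grid-sim",  # placeholder URL
    keywords=["simulation", "energy system"],
))
print([r.name for r in registry.find("Simulation")])  # prints ['example-grid-sim']
```

The sketch keeps only metadata and a link to the source code repository, which matches the reviewers' reading of the registry as a searchable index rather than a place where software is stored.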

Comments

Comment #66 Martin Thomas Horsch @ 2023-11-09 03:00

Many thanks to the reviewers again; with this, the article is accepted (again). I will send it to the journal and hope that the small corrections can be incorporated editorially upon publication in the journal.

Invited Review Comment #65 Giacomo Lanza @ 2023-11-06 07:16

I appreciate the improvements and the clarification of the target of the publication. Some minimal corrections on my side:

L. 218: remove "of" before "constraints".
L. 291: the future work which you describe in the following sounds more "putting your concept into practice", which would be a much bigger progress than just "developing and improving" it.
L. 293: add a comma after "Based on these requirements" and after "Afterwards"


Invited Review Comment #63 Dorothea Iglezakis @ 2023-10-25 09:02

My concerns are fully addressed by the authors. Some small remaining (language) issues:


- line 86: "include" instead of "included"

- line 89: " which provide an overview their properties" -> which provides an overview over their properties ?

- line 91: use different metadata schema as foundation -> use different metadata schemata as a foundation

- Table 3:

- in my view repositories like GitLab and GitHub are something different than Zenodo and Software Heritage, the first ones focusing on the management of software code and the second ones focusing on archiving and persistently identifying software artifacts.

- there are two different dashes used in the last two lines of the table to indicate that no schema is used

    

Comment #58 Martin Thomas Horsch @ 2023-09-26 10:39

This is accepted for publication. I was asked to confirm this via a comment here, but the comment does not seem to show up.

Comment #57 Martin Thomas Horsch @ 2023-09-26 10:38

I was asked to confirm via a comment on this platform that this is accepted for publication.

Comment #36 Stephan Ferenz @ 2023-07-24 04:08

Dear Mrs. Iglezakis, dear Mr. Horsch, dear Mr. Lanza,
thank you very much for the constructive and very helpful feedback. We addressed your feedback in our manuscript and discuss the changes below. The original review comments are repeated at each bullet point. Our answer is always marked as “Answer:”. We are looking forward to your response!

## Invited Review Comment #19 by Dorothea Iglezakis:
- While there is a sound definition of ERS, the terms software repository (in the sense of a source code repository and in contrast to a data repository) and software registry are not defined but quite central for the article.
Answer: We added a short definition for software registries and software repositories in lines 24-25.
- The definition and examples of software (l. 7-15) include scripts, libraries and models, but in the next paragraph models and frameworks are presented as main software outputs of the energy domain.
Answer: We improved the definition of ERS to include frameworks as well (line 10).
- The need for FAIR software does not necessarily follow from the problem description in lines 16-21. While FAIR software could help solving the first problem, I cannot see how this applies to the second one. Reference 5 also argues that one of the main problems are missing valid models with clear defined interfaces. For this problem, findable and reusable software could really help.
Answer: We reformulated the motivation with respect to the complexity of future ERS, which requires more reuse of existing ERS (lines 20-21).
- But what is the actual state of availability of ERS? How does the community get to know what software already exists? Is software in this domain mainly developed open source? Perhaps add one sentence, why FAIR software and the planned software registry could be the solution of these problems. Why is it not enough to develop open source and publish text publications about software in this domain (if this is the case).
Answer: Thank you for these important questions. Unfortunately, there is currently no data on the availability of ERS. We think this is an important topic for further research. We added a sentence in lines 28-29 to explain why the planned software registry can be a relevant part of the solution.
- There could also be some more information about the metadata requirements in the energy domain. What information is necessary to find and use software in this domain? Then the chapters 2 and 3 could refer to these requirements. Table 1 provides examples of metadata fields but in no relation to the problems identified.
Answer: We agree that metadata requirements are really important for the whole concept. We are currently working on a requirements analysis based on about 30 interviews with researchers from the energy domain. We want to present these requirements in a further publication in this context. We outline the requirement analysis as further work in the outlook (lines 282 and 283).
- The chapter about related work mainly focuses on metadata standards and terminologies and not on software registries. But as chapter 3 not only outlines the plans for a metadata scheme, but also for a software registry, a look at the services, advantages and disadvantages of existing software registries, repositories or archives like software heritage (https://archive.softwareheritage.org/), zenodo (https://zenodo.org/search?type=software) with its integration with GitHub, the OntoSoft Portal (https://www.ontosoft.org/portal/#list) or subject specific registries like swMath (https://swMath.org), bio.tools, CoMSES (https://www.comses.net/codebases/) or machine learning tools (https://mloss.org/software/) could not hurt.
Answer: We added a general overview on registries and repositories for research software in the new subsection 2.2.
- Not sure, if the whole audience is familiar with the meaning of URIs in metadata (l. 52), namely the unique identification of persons and entities.
Answer: We added a definition of URI in line 55.
- The selection of properties to compare metadata schemas in table 2 and 3 does not get really clear. Are these quite formal criteria really the most important ones to compare the different approaches? If that's the case, then please argue, why these criteria are crucial to solve the problems in chapter 1. What are the advantages and disadvantages of the different terminologies and ontologies, not only in a formal way, but also in terms of content? What is missing, what is applicable to the energy domain and what is not?
Answer: We agree that the criteria remain a bit unclear and were not well explained in the text. Therefore, we added some additional information on the criteria in lines 50-53 and 120-123. These tables should give a general overview on the presented approaches. For reusing and integrating them into a new approach, a deeper look into them will be required.
- How do you define "Support for URIs" in table 2. As there is a unique identifier property for persons in OntoSoft and OntoSoft is an ontology with sort of built-in support of URIs, I wondered, why OntoSoft does not have support for URIs.
Answer: We agree and fixed the table accordingly.
- I wouldn't really call CodeMeta a scheme, more a vocabulary to describe research software. In fact, it is also an ontology, but the project aims more in the direction of a scheme to describe software, but OntoSoft goes in a similar direction.
Answer: We agree that CodeMeta is also a vocabulary. But, it aims in the direction of a scheme and it is usable as a scheme. Also, CodeMeta calls itself a scheme. Therefore, we would call it a metadata scheme.
- This chapter - as a main contribution of the paper - remains on a very generic conceptual level. There are no real details about the metadata scheme to be developed that go beyond mere formal criteria (that are important, but are not really motivated in the first chapter). A definition of the (also content based) requirements in chapter 1 followed by an analysis of existing schemes and ontologies in chapter 2 could now be followed by a bit more detailed plans according to the metadata scheme. Do you plan to use and/or extend existing schemes and ontologies?
Answer: The paper should only outline the concept for the metadata-based registry for energy research software to introduce the topic. We clarified this in the introduction and the conclusion. Existing schemes and ontologies should be used and extended as much as possible to achieve a high interoperability.
- There are also no details about the technical or organisational context of the planned registry. Is this registry planned in the context of NFDI4Energy? Are there any plans about building, announcing and maintaining such a service? Should the registry build on an existing platform?
Answer: Yes, the registry is planned within the context of NFDI4Energy. We added the link to NFDI4Energy within the conclusion and outlook. The plans for announcing and maintaining the service will be developed within NFDI4Energy but are not decided yet. We want to build on existing solutions as far as possible. But, there is no decision yet which existing platform should be used.
- As far as I understand the concept of the registry, it is a database and searchable index for metadata linking to the corresponding source code repository of a software. Are there any thoughts about also linking to published or archived versions of software in data repositories or archives like software heritage? What will happen, if a software is no longer maintained or the corresponding source code repository is deleted?
Answer: Thank you for these really relevant questions which are not answered yet. We included them as open questions in lines 269-271.
- An additional reference for the idea and implementation of application profiles in section 3.1 could be the metadata profile services developed within NFDI4Ing (S3-1, https://nfdi4ing.de/base-services/s-3/) and the AIMS project (https://www.aims-projekt.de/)
Answer: Thank you for pointing out these relevant services. We added them to the section in lines 234-235.
- An additional reference for section 3.2 could be the Hermes workflow of the equally named project that extracts metadata from source code repositories (https://docs.software-metadata.pub/en/latest/). Could you add some information, what metadata you expect to be able to extract and what metadata has to come from other sources?
Answer: Thank you for pointing out this important and relevant project. We added it to the relevant section in lines 250-253. We also added examples of metadata which we expect to extract from the different sources in that section.
- Smaller language issues
Answer: Thank you for this good list of smaller language issues. We fixed them all in our text.

## Comment #23 by Thomas Horsch:
- This article identically reuses a Figure and text from an extended abstract by Stephan Ferenz that was already published as Energy Informatics 2022, 3 (Suppl 2): S49. It is in general acceptable to republish content from conference abstracts as part of an article, but this must at least be made explicit. Please include a reference to that previous work and also state in a footnote stating that a Figure and some text are identically reused.
Answer: We added a footnote to our contribution (line 32) and a citation to Figure 2 on page 2.


## Invited Review Comment #26 by Giacomo Lanza:
- As pointed out by the other reviewer, the constructive part is quite meagre and does not give any hints about how the "three main artifacts" should be built in practice, besides some general principles. The definition of the chosen criteria and their contribution to draft the ERS registry might also be expanded.
Answer: The paper should only outline the concept for the metadata-based registry for energy research software to introduce the topic. We clarified this in the introduction and the conclusion.
- "Related work" is normally used for an outlook. I suggest replacing the title with "Overview of relevant ontologies and schemas" or "State of the art".
Answer: We renamed the section to “State of the art” as you suggested.
- Please highlight the names of the ontologies which are used in the following, to separate them from the rest (e.g. in boldface, or rephrasing).
Answer: We highlighted the names of the ontologies in italic.
- I suggest rephrasing the introduction of SWO: "The Software Ontology (SWO) was developed extending the bioinformatics EDAM ontology ... ".
Answer: We rephrased the paragraph as suggested.
- Please add a table also for Section 2.3, if possible.
Answer: We added a table for Section 2.3 (now Section 2.4, see above).
- Small issues in Section 3.
Answer: Thank you for pointing us to these small issues. We fixed them all.

Best regards,
Stephan Ferenz (corresponding author)

Invited Review Comment #26 Giacomo Lanza @ 2023-05-24 11:14

The article deals with a very topical matter and reads very clearly. The problem is well highlighted and the state-of-the-art search provides a very consistent starting basis. As pointed out by the other reviewer, the constructive part is quite meagre and does not give any hints about how the "three main artifacts" should be built in practice, besides some general principles. The definition of the chosen criteria and their contribution to draft the ERS registry might also be expanded.

Some minor language corrections:

# Section 2

"Related work" is normally used for an outlook. I suggest replacing the title with "Overview of relevant ontologies and schemas" or "State of the art"

Please highlight the names of the ontologies which are used in the following, to separate them from the rest (e.g. in boldface, or rephrasing).
I suggest rephrasing the introduction of SWO: "The Software Ontology (SWO) was developed extending the bioinformatics EDAM ontology ... "

Please add a table also for Section 2.3, if possible.

# Section 3

Line 163: "should be publish" --> "should be published"

Line 173-174: "of" should be removed (three times)

Figure 2: possibly replace "schema" with "scheme" for consistency.


Comment #23 Martin Thomas Horsch @ 2023-04-27 04:13

This article identically reuses a Figure and text from an extended abstract by Stephan Ferenz that was already published as Energy Informatics 2022, 3 (Suppl 2): S49. It is in general acceptable to republish content from conference abstracts as part of an article, but this must at least be made explicit. Please include a reference to that previous work and also state in a footnote stating that a Figure and some text are identically reused.

Invited Review Comment #19 Dorothea Iglezakis @ 2023-03-15 07:46

Summary

The article "Towards Improved Findability of Energy Research Software by Introducing a Metadata-based Registry" of Stephan Ferenz and Astrid Nieße provides a very good overview over existing schemata, ontologies and terminologies for research software and for the energy domain. The article is well written and easy to follow, but lacks details as soon as it comes to the description of the own planned software registry. 


Detailed Comments

Chapter 1: Motivation

While there is a sound definition of ERS, the terms software repository (in the sense of a source code repository and in contrast to a data repository) and software registry are not defined but quite central for the article. 

The definition and examples of software (l. 7-15) include scripts, libraries and models, but in the next paragraph models and frameworks are presented as main software outputs of the energy domain.


The need for FAIR software does not necessarily follow from the problem description in lines 16-21.

The problems described are: 

- there is a lot of parallel development, software in the domain is seldom reused

- simulation of energy system is getting more complex because of the growing number of interrelated components

While FAIR software could help solving the first problem, I cannot see how this applies to the second one. Reference 5 also argues that one of the main problems are missing valid models with clear defined interfaces. For this problem, findable and reusable software could really help. But what is the actual state of availability of ERS? How does the community get to know what software already exists? Is software in this domain mainly developed open source? Perhaps add one sentence, why FAIR software and the planned software registry could be the solution of these problems. Why is it not enough to develop open source and publish text publications about software in this domain (if this is the case)?

There could also be some more information about the metadata requirements in the energy domain. What information is necessary to find and use software in this domain? Then the chapters 2 and 3 could refer to these requirements. Table 1 provides examples of metadata fields but in no relation to the problems identified.

Chapter 2: Related work

The chapter about related work mainly focuses on metadata standards and terminologies and not on software registries. But as chapter 3 not only outlines the plans for a metadata scheme, but also for a software registry, a look at the services, advantages and disadvantages of existing software registries, repositories or archives like software heritage (https://archive.softwareheritage.org/), zenodo (https://zenodo.org/search?type=software) with its integration with GitHub, the OntoSoft Portal (https://www.ontosoft.org/portal/#list) or subject specific registries like swMath (https://swMath.org), bio.tools, CoMSES (https://www.comses.net/codebases/) or machine learning tools (https://mloss.org/software/) could not hurt.   

Not sure, if the whole audience is familiar with the meaning of URIs in metadata (l. 52), namely the unique identification of persons and entities.

The selection of properties to compare metadata schemas in table 2 and 3 does not get really clear. Are these quite formal criteria really the most important ones to compare the different approaches? If that's the case, then please argue, why these criteria are crucial to solve the problems in chapter 1. What are the advantages and disadvantages of the different terminologies and ontologies, not only in a formal way, but also in terms of content? What is missing, what is applicable to the energy domain and what is not?

How do you define "Support for URIs" in table 2. As there is a unique identifier property for persons in OntoSoft and OntoSoft is an ontology with sort of built-in support of URIs, I wondered, why OntoSoft does not have support for URIs.

I wouldn't really call CodeMeta a scheme, more a vocabulary to describe research software. In fact, it is also an ontology, but the project aims more in the direction of a scheme to describe software, but OntoSoft goes in a similar direction.


Chapter 3 Concept

This chapter - as a main contribution of the paper - remains on a very generic conceptual level. There are no real details about the metadata scheme to be developed that go beyond mere formal criteria (that are important, but are not really motivated in the first chapter).  A definition of the (also content based) requirements in chapter 1 followed by an analysis of existing schemes and ontologies in chapter 2 could now be followed by a bit more detailed plans according to the metadata scheme. Do you plan to use and/or extend existing schemes and ontologies?

There are also no details about the technical or organisational context of the planned registry. Is this registry planned in the context of NFDI4Energy? Are there any plans about building, announcing and maintaining such a service? Should the registry build on an existing platform?

As far as I understand the concept of the registry, it is a database and searchable index for metadata linking to the corresponding source code repository of a software. Are there any thoughts about also linking to published or archived versions of software in data repositories or archives like software heritage? What will happen, if a software is no longer maintained or the corresponding source code repository is deleted?

An additional reference for the idea and implementation of application profiles in section 3.1 could be the metadata profile services developed within NFDI4Ing (S3-1, https://nfdi4ing.de/base-services/s-3/) and the AIMS project (https://www.aims-projekt.de/). 

An additional reference for section 3.2 could be the Hermes workflow of the equally named project that extracts metadata from source code repositories (https://docs.software-metadata.pub/en/latest/).  Could you add some information, what metadata you expect to be able to extract and what metadata has to come from other sources?  

Smaller language issues

- line 19: add comma between "components" and "ERS"

- line 31: our contribution*s* are or our contribution is

- line 158: as *a * first step

- line 160: *cor*responding software registry

- line 162, add comma after "software"

- line 162: write additional software -> extend the software or write additional code 

- line 163: should be publish*ed*

- line 174: constrain*t*s

- line 180: the one*s*

- page 4: In the legend of table 2 there is one ")" too many.

Metadata
  • Published: 2023-03-01
  • Last Updated: 2023-11-10
  • License: Creative Commons Attribution 4.0
  • Subjects: Data Infrastructure, Data Management Software
  • Keywords: Interoperability, Digital Libraries, Energy Research, FAIR, Research Software, Metadata, Open Source Software, Software Reusability, Ontology, Semantic Web, Linked Data