Evaluation of tools for describing, reproducing and reusing scientific workflows

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Philipp Diercks, Dennis Gläser, Ontje Lünsdorf, Michael Selzer, Bernd Flemisch, Jörg Unger

Abstract

In the field of computational science and engineering, workflows often entail the
application of various software components, for instance for simulation or for pre- and postprocessing.
Typically, these components have to be combined in arbitrarily complex workflows
to address a specific research question.
In order for peer researchers to understand, reproduce and (re)use the findings
of a scientific publication, several challenges have to be addressed.
For instance, the employed workflow has to be automated, and information on all
software used must be available in order to reproduce the results.
Moreover, the results must be traceable and the workflow documented
and readable to allow for external verification and greater trust.
In this paper, existing workflow management systems (WfMSs) are discussed regarding
their suitability for describing, reproducing and reusing scientific workflows.
To this end, a set of general requirements for WfMSs was deduced from user stories
that we deem relevant in the domain of computational science and engineering.
On the basis of an exemplary workflow implementation,
publicly hosted on GitHub (https://github.com/BAMresearch/NFDI4IngScientificWorkflowRequirements),
a selection of different WfMSs is compared with respect to these
requirements, to support fellow scientists in identifying the WfMS that
best suits their needs.
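
To illustrate what a workflow definition in such a system can look like, the following is a minimal, hypothetical sketch using the Python-based task runner doit (pydoit). It is not taken from the paper or its repository, and all file and script names are placeholders. Each task declares its inputs, its outputs and the command to run, so that the tool can infer the execution order from the file dependencies.

    # dodo.py -- minimal illustrative sketch (hypothetical, not the paper's workflow)

    def task_preprocess():
        """Create a mesh from the raw geometry description."""
        return {
            "file_dep": ["data/geometry.geo"],   # input this step depends on
            "targets": ["results/mesh.msh"],     # output this step produces
            "actions": ["python scripts/preprocess.py data/geometry.geo results/mesh.msh"],
        }

    def task_simulate():
        """Run the simulation on the generated mesh."""
        return {
            "file_dep": ["results/mesh.msh"],
            "targets": ["results/solution.csv"],
            "actions": ["python scripts/simulate.py results/mesh.msh results/solution.csv"],
        }

Running doit in the directory containing this file executes the two steps in dependency order and skips steps whose outputs are already up to date; the WfMSs compared in the paper build on this basic pattern with differing additional features.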

Subjects

Data Management Software

Keywords

FAIR, reproducibility, scientific workflows, tool comparison, workflow management

Dates

Published: 2022-12-06 09:00

Last Updated: 2022-12-06 11:40

License

Creative Commons Attribution 4.0



Comments

Invited Review Comment #14 Anonymous @ 2023-01-31 09:44

Summary:
The authors present a study on workflow management systems (WfMSs) for use in computational science and engineering. Their focus - according to the introduction - is on describing, reproducing and reusing scientific workflows as one aspect of FAIR research software. Based on user stories, they define a set of requirements on workflow systems and compare existing solutions.

Strengths:
- Taking on this important topic. Workflows are common in science and engineering as one part of research software and thus need to contribute to the FAIR aspects.
- Defining requirements and evaluating existing solutions to help researchers in making their workflows FAIR.
- The definition of the requirements on the workflow systems is also based on user stories.
- The authors implemented an example workflow to experimentally test their requirement list on several WfMSs.

Weaknesses:
- Only three user stories are taken into account for the definition of the requirements. Though these three user stories are quite different, it is not clear to me that they cover all varieties. As part of NFDI4Ing, the authors should be able to gather more user stories.
- In the introduction the authors state that they want to "discuss how WfMSs can contribute to the transparency, adaptability and reproducibility of computational research". The authors only evaluated the WfMSs regarding the requirements they extracted from the user stories. The RDA working group FAIR4RS and other groups (some were even cited in the paper) also defined criteria for FAIR software or workflows. Why were these not taken into account to get a full view of the topic?
- The authors selected five WfMSs - why were these selected? Are these the ones most widely used in the computational science and engineering community worldwide?
- The requirements list is a mixture of requirements for FAIRness and for "better user experience" (like a graphical user interface or ease of first use). It would be better to clearly separate them in the evaluation - which of them contribute to "transparency, adaptability and reproducibility"?
- The importance of the individual requirements should be clearly stated. Maybe they should be weighted?

Writing issues:
The paper is easy to read and understandable.
Some links in the references did not work and should be checked (e.g. reference 20).
There is no statement of the contributions of the authors.

Conclusion:
The topic is of importance for FAIR research software and I think the authors are on the right track. Nevertheless, to be of more use to the community, some more work is needed. Thus I would reject the current state of the work but encourage the authors to come back with a refined paper.

Invited Review Comment #13 Anonymous @ 2023-01-20 10:09

This paper presents an excellent overview of the requirements and the practical problems associated with running scientific workflows in typical heterogeneous environments, from the researcher's local machine up to cluster and HPC systems. The authors present a very practical approach to comparing different workflow management systems (WfMSs): a concrete, non-trivial workflow is implemented and executed using different WfMSs, allowing the different systems to be compared using well-chosen criteria. The workflow used for the comparison as well as the different implementations are publicly available, and one can only hope that more workflow management systems used in the scientific community (e.g. Apache Airflow, Elyra, etc.) will be added to this comparison in the future.