Creating application-specific metadata profiles while improving interoperability and consistency of research data for the engineering sciences

This is a Preprint and has not been peer reviewed. This is version 2 of this Preprint.

Authors

Nils Preuß, Matthias Bodenbenner, Benedikt Heinrichs, Jürgen Windeck, Mario Moser, Marc Fuhrmans

Abstract

Due to the heterogeneity of data, methods, experiments, and research questions and the necessity to describe flexible and short-lived setups, no widely used subject-specific metadata schemata or terminologies have been established for the field of engineering (as well as for other disciplines facing similar challenges). Nevertheless, it is highly desirable to realize consistent and machine-actionable documentation of research data via structured metadata. In this article, we introduce a way to create subject-specific, RDF-compliant metadata profiles (in the sense of SHACL shapes) that allow precise and flexible documentation of research processes and data. We introduce a hierarchical inheritance concept for the profiles that we combine with a strategy that uses composition of relatively simple modular profiles to model complex setups. As a result, the individual profiles are highly reusable and can be applied in different contexts, which, in turn, increases the interoperability of the resulting data. We also demonstrate that it is possible to achieve a level of detail that is sufficiently specific for most applications, even when only general terms are available within existing terminologies, avoiding the need to create highly specific terminologies that would only have limited reusability.
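The inheritance and composition concepts described above can be illustrated with a minimal Python sketch. It is a hedged illustration, not the authors' implementation, and it uses no RDF or SHACL tooling: profiles are plain objects, records are dictionaries mapping property names to lists of values, and all profile, property, and value names (`Device`, `Sensor`, `Setup`, `manufacturer`, `measurand`, `hasSensor`, `ExampleCorp`) are hypothetical. It assumes that inherited constraints apply conjunctively, as when a SHACL shape references a parent shape via `sh:node`, so a child profile can add constraints but not remove them; composition is modelled as a property whose values must conform to another profile.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PropertyConstraint:
    """Hypothetical stand-in for a SHACL property shape (sh:path, sh:minCount, sh:maxCount)."""
    path: str
    min_count: int = 0
    max_count: Optional[int] = None
    # Composition: every value must itself conform to another profile,
    # loosely analogous to sh:node on a property shape.
    node: Optional[Profile] = None


@dataclass
class Profile:
    """Hypothetical stand-in for a SHACL node shape used as a metadata profile."""
    name: str
    properties: list[PropertyConstraint] = field(default_factory=list)
    # Inheritance: all constraints of every parent also apply (conjunctively).
    parents: list[Profile] = field(default_factory=list)

    def effective_constraints(self) -> list[PropertyConstraint]:
        """Return the profile's own constraints plus those of all its ancestors."""
        collected: list[PropertyConstraint] = []
        for parent in self.parents:
            collected.extend(parent.effective_constraints())
        collected.extend(self.properties)
        return collected


def conforms(record: dict, profile: Profile) -> bool:
    """Check a record (property name -> list of values) against a profile."""
    for c in profile.effective_constraints():
        values = record.get(c.path, [])
        if len(values) < c.min_count:
            return False
        if c.max_count is not None and len(values) > c.max_count:
            return False
        # Composed profiles are checked recursively on nested records.
        if c.node is not None and not all(
            isinstance(v, dict) and conforms(v, c.node) for v in values
        ):
            return False
    return True


# Usage with hypothetical profiles: Sensor inherits from Device,
# and Setup composes two or more Sensor records.
device = Profile("Device", [PropertyConstraint("manufacturer", min_count=1, max_count=1)])
sensor = Profile("Sensor", [PropertyConstraint("measurand", min_count=1)], parents=[device])
setup = Profile("Setup", [PropertyConstraint("hasSensor", min_count=2, node=sensor)])

temperature_sensor = {"manufacturer": ["ExampleCorp"], "measurand": ["temperature"]}
pressure_sensor = {"manufacturer": ["ExampleCorp"], "measurand": ["pressure"]}

print(conforms({"hasSensor": [temperature_sensor, pressure_sensor]}, setup))  # True
print(conforms({"measurand": ["temperature"]}, sensor))  # False: inherited "manufacturer" missing
```

Because constraints are only added along the inheritance chain, a record that conforms to `Sensor` in this sketch automatically conforms to `Device` as well.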

Comments

Comment #208 Mario Moser @ 2025-03-10 03:37

Dear Achim Streit,
dear reviewers,

thank you for your reviews and your valuable comments and suggestions. As authors, we will carefully go through them and revise our paper accordingly.

Best regards

Comment #202 Achim Streit @ 2025-02-17 03:58

As the responsible topical editor, I would like to thank the two reviewers for their detailed and constructive feedback. After considering their comments, I advise the authors to substantially revise the paper according to the suggestions provided in the reviews. Once this is done, the revised version of the paper should be uploaded and will be sent to the reviewers again. Thank you.

Invited Review Comment #200 Anonymous @ 2025-02-12 07:48

With minor variations, the paper “Creating application-specific metadata profiles while improving interoperability and consistency of research data for the engineering sciences, Preuß, Nils ; Bodenbenner, Matthias ; Heinrichs, Benedikt ; Windeck, Jürgen ; Moser, Mario ; Fuhrmans, Marc (2023)” is already available at 10.26083/tuprints-00024573.

In the paper “Creating application-specific metadata profiles while improving interoperability and consistency of research data for the engineering sciences”, the authors introduce a set of modelling techniques for structuring metadata that will be implemented in a metadata infrastructure platform within the AIMS project. Starting with RDF-compliant metadata profiles, they present implementation details of how inheritance, composition, modularization, and specificity are realized. As an example, they use the description of an experimental setup consisting of two different sensors. The resulting metadata profiles are published as supplementary material.

Scope:

The presented paper fits into the focus of in.grid to enhance data infrastructure, data governance, and data literacy. The manuscript describes examples of structuring metadata for the engineering sciences.

Quality requirements:

Based on the title, I expected a fair evaluation of existing methods for creating metadata profiles and of how they improve the interoperability and consistency of research data, together with recommendations, improvements, and novel findings derived from this analysis. Many methods exist for creating structured metadata.

The section “State of the art” concentrates on common standards for mechanical engineering, a field with broad heterogeneity and many individual, short-lived experimental setups. Some industry-related standards are mentioned, but beyond these, many other professional associations, such as IEEE and VDI, and standardization organizations, such as NIST and DIN, engage with the problems of research data management. Furthermore, other disciplines struggle with similar heterogeneity problems and might also have provided some solutions and ideas.

The authors propose adopting concepts from object orientation: inheritance, composition of modularly designed elements, combination of existing sources, and specificity. They then, straightforwardly, choose a design based on SHACL shapes. An open-minded analysis of the state of the art of other technologies, together with a scientific comparison, would be beneficial. For example, XML also implements inheritance, and JSON supports the combination of various sources.

In chapter 3, the authors start engineering their models. RDF-compliant application profiles have already been intensively investigated and applied. To mention just a few examples: Open Geospatial Consortium (https://docs.ogc.org/per/18-094r1.html), DCAT Application Profile for Data Portals in Europe (https://ec.europa.eu/isa2/solutions/dcat-application-profile-data-portals-europe_en/). What is missing is a justification of why these existing components cannot be adopted and extended.

In chapter 3, the implementation details of creating metadata profiles, including arising conflicts and side conditions, are described in detail. The modelling examples are based on an application scenario with two different sensors. The resulting profiles are convincing and are provided as FAIR supplementary data including validation. These profiles represent the only results provided in the paper.

The authors state that their approach documents research data in a flexible and precise way that is also highly interoperable and machine-actionable. It seems obvious that well-structured metadata improves consistency and interoperability within a system. However, the paper leaves unclear how, and to what extent, interoperability and machine-actionability are achieved and how the results will be applied.

It remains to be seen how the approach presented by the authors compares to other approaches. A systematic comparison of the properties, including some case studies, would significantly enhance the paper. As a starting point for such a systematic comparison, the study by Stian Soiland-Reyes (https://arxiv.org/abs/2306.07436) could help; among its measures, it also uses the recommendations of the EOSC Interoperability Framework (https://data.europa.eu/doi/10.2777/620649).

RWTH Aachen University, Germany, operates the research data management platform Coscine, which also implements RDF application profiles. Since this infrastructure is already in production, it would be very interesting to see how the AIMS approach described in this paper interoperates with the Coscine application profiles at both the technical and the semantic level. A direct comparison of research data collections on both sides, and cross-use of the datasets, would enhance the paper’s contributions.

Application profiles themselves need to follow the FAIR principles; moreover, usability will only be achieved if the profiles are openly available, easily discoverable, and adoptable. The AIMS web service providing a graphical user interface is definitely an important component, but it requires a sustainable and reliable infrastructure for storage, preservation, and curation. This raises concerns that such application profiles will only be applicable within closed systems where technical interoperability is guaranteed.

All in all, I think the contents of the paper need to be clearly placed in the context of the state of the art. The presented work is more the result of an engineering (modelling) effort and needs to be improved significantly by a profound scientific analysis.

Formal requirements:

The formal requirements of in.grid are well met. The paper is well-written and structured. However, it would be advisable to have the paper checked by a native speaker.

Manuscript requirements:

The manuscript consists of 4 chapters:

1 Introduction
This chapter motivates the work, provides a very short summary of the state of the art of research data management in mechanical engineering, and introduces the authors’ own approach based on application profiles built on SHACL shapes.

2 Application Scenario
The application scenario describes a small lab experiment focusing on two sensors as a basis for the metadata modelling in the following chapter.

3 Modelling Approach
In this long chapter (13 pages), the engineering and design process, potential conflicts, and resulting metadata profiles are described in detail.

4 Summary & Outlook
Here, the authors conclude that their solution can also be applied to other disciplines, describe further developments, and promise that the designs will also be implemented.

Acknowledgements, roles & contributions, and references accompany the chapters.

Descriptions and analysis of the theoretical background are missing.

The results provided are the profiles of the examples, but the authors do not mention whether these have been used in practice to document real-world data. A discussion of the extent to which the results improve the interoperability and consistency of research data, as well as a comparison with other solutions, is missing.

Conclusion:

In its current state, I would recommend a substantial revision of the paper.

The contents of the paper cover the title “Creating application-specific metadata profiles while improving interoperability and consistency of research data for the engineering sciences” only partially. The presented work describes the modelling and engineering effort for application profiles based on SHACL shapes systematically and in a straightforward manner.
The analysis of the literature (and the references) does not represent the state of the art and does not consider alternative solutions. The presented results are a few implemented example profiles, but it seems that they have not been tested in practice to prove that interoperability and reusability are improved. Important components of a scientific paper, such as a discussion and a conclusion, are missing. I therefore recommend a substantial revision of the paper and its contents. I would recommend adapting the title, providing a comparative analysis of the state of the art, implementing the solutions, and discussing interoperability and reuse based on the experiences of users and infrastructure providers.

Invited Review Comment #193 Anonymous @ 2024-12-12 13:06

The authors present an approach with SHACL that allows precise and flexible data and metadata representation for research processes and data. This approach looks promising, and the paper would add value to the community if published.

However, I think the paper could benefit from two overall improvements: (1) a more thorough introduction that lists additional related efforts, and (2) example data graphs that validate against the shape graphs.

About (1)
Discussing related efforts in the introduction would improve clarity; currently, the introduction is confusing. In lines 25-27, the paper states that no widely used schemata/terminologies have been established in engineering. Then, in lines 29-35, it states that several incompatible and partially redundant vocabularies and ontologies exist. The paper mentions eCl@ss, OPC UA, and DEXPI, but these are only a subset of the efforts across all engineering disciplines. While the paper’s title broadly mentions engineering science, line 43 mentions mechanical engineering. The statement in lines 25-27 regarding a lack of widely used semantic technologies might be true for one engineering discipline, but it might not be accurate for every engineering discipline, and the list of engineering branches on Wikipedia is quite extensive: https://en.wikipedia.org/wiki/List_of_engineering_branches. The authors might want to avoid making such broad generalizations. Each of the three examples (eCl@ss, OPC UA, and DEXPI) likely coexists with similar or competing approaches. For example, OPC UA provides infrastructure for interoperability between machines and the broader enterprise. Similar approaches (perhaps popular in different disciplines and use cases) include MTConnect, SiLA, MQTT, etc.

If this paper emphasizes mechanical engineering beyond a single example, it should be noted that there is a growing effort to unify the materials and manufacturing design lifecycles. This is especially true within additive manufacturing, where the materials processing history and the manufacturing processing history are inherently coupled. Within this specific sub-discipline, there appear to be multiple efforts to improve FAIR maturity, e.g.:
- https://www.dodmantech.mil/Manufacturing-Collaborations/Joint-Additive-Manufacturing-Working-Group/
- https://profiles.opcfoundation.org/workinggroup/60
- https://doi.org/10.1007/s40192-024-00341-x

For research data more broadly, several efforts within the Research Data Alliance (RDA) may be relevant to this work. I think it would help to discuss the work presented in the paper within the context of related work in the RDA, e.g.:
- https://www.rd-alliance.org/groups/interoperable-descriptions-observable-property-terminology-wg-i-adopt-wg/
- https://i-adopt.github.io/
- https://www.rd-alliance.org/groups/fair-mappings-wg/
- https://www.rd-alliance.org/groups/fair-principles-research-hardware/
- https://www.rd-alliance.org/groups/persistent-identification-instruments-wg/
- https://www.rd-alliance.org/groups/research-data-management-engineering-ig/

In summary, many efforts (with varying specificity/generality) are being made to support improved FAIR maturity within engineering, and it would help if the introduction better explained how this work fits among these efforts.

About (2)
Providing example data for improved impact
Considering likely readers of this paper as a Venn diagram:
- There are many individuals with expertise in materials and manufacturing (for example)
- There are many individuals with expertise in semantic technology
- There are few individuals with expertise in both

People with only materials and manufacturing experience will not easily understand this paper because it lacks concrete examples of actual data collected from the experiment in Figure 1. Furthermore, individuals without expertise in materials and manufacturing will not appreciate the real-world complexity of disciplines like materials and manufacturing, as the example shown in Figure 1 is quite simple.

One good way to broaden the paper's impact is to provide data examples of varying complexity. Just as the SHACL Playground (https://shacl.org/playground/) offers a side-by-side view of the Shapes Graph and a Data Graph, this paper could benefit from data examples that validate against the SHACL shapes discussed in the paper. If space is an issue, these could be shared online (e.g., GitHub, supplementary information) and linked from the paper. This paper presents a simple example in Figure 1 and defines several SHACL profiles to support it. At a minimum, the paper should provide example records that validate against the SHACL profiles and show a reduced set of varying independent and dependent variables for the experiment shown in Figure 1.

Furthermore, a more complex example should be provided online or as supplementary information. The example in Figure 1 appears to have only scalar values for independent and dependent variables (e.g., pressure, temperature, rotation speed, volume flow, etc.). However, in the real world, researchers often deal with much more complex datasets, and it could be very difficult for practitioners to extrapolate from simple examples to more complex datasets (e.g., https://doi.org/10.1038/s41598-022-18096-w).
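A data graph that validates against a shapes graph, as requested in this review, can be sketched in plain Python. This is a hedged, minimal stand-in for a SHACL engine, not pySHACL or the SHACL Playground: it supports only `sh:targetClass` (without subclass inference), `sh:minCount`, `sh:maxCount`, and a simplified `sh:datatype` check that compares only the datatype tag, over an in-memory set of triples. All IRIs (`ex:PressureMeasurement`, `ex:value`, `ex:unit`, `unit:BAR`) are hypothetical and not taken from the authors' profiles.

```python
from dataclasses import dataclass
from typing import Optional

# A data graph as a set of (subject, predicate, object) triples. Typed literals are
# modelled as (lexical form, datatype) tuples; plain strings stand for IRIs.
Triple = tuple[str, str, object]
RDF_TYPE = "rdf:type"


@dataclass(frozen=True)
class PropertyShape:
    path: str
    min_count: int = 0
    max_count: Optional[int] = None
    datatype: Optional[str] = None  # compared with the datatype tag of a typed literal


@dataclass(frozen=True)
class NodeShape:
    target_class: str  # every node typed with this class is a focus node
    properties: tuple[PropertyShape, ...]


def validate(data: set[Triple], shape: NodeShape) -> list[str]:
    """Return human-readable violations; an empty list means the data graph conforms."""
    focus_nodes = sorted(s for s, p, o in data if p == RDF_TYPE and o == shape.target_class)
    report: list[str] = []
    for node in focus_nodes:
        for prop in shape.properties:
            values = [o for s, p, o in data if s == node and p == prop.path]
            if len(values) < prop.min_count:
                report.append(f"{node}: {prop.path} has {len(values)} value(s), minCount {prop.min_count}")
            if prop.max_count is not None and len(values) > prop.max_count:
                report.append(f"{node}: {prop.path} has {len(values)} value(s), maxCount {prop.max_count}")
            if prop.datatype is not None:
                for v in values:
                    if not (isinstance(v, tuple) and v[1] == prop.datatype):
                        report.append(f"{node}: {prop.path} value {v!r} is not {prop.datatype}")
    return report


# Shapes graph and data graph side by side (all IRIs are hypothetical examples).
shape = NodeShape("ex:PressureMeasurement", (
    PropertyShape("ex:value", min_count=1, max_count=1, datatype="xsd:decimal"),
    PropertyShape("ex:unit", min_count=1, max_count=1),
))
data = {
    ("ex:m1", RDF_TYPE, "ex:PressureMeasurement"),
    ("ex:m1", "ex:value", ("1.013", "xsd:decimal")),
    ("ex:m1", "ex:unit", "unit:BAR"),
    ("ex:m2", RDF_TYPE, "ex:PressureMeasurement"),
    ("ex:m2", "ex:value", ("high", "xsd:string")),
}

for violation in validate(data, shape):
    print(violation)
```

Here `ex:m1` conforms, while `ex:m2` yields two violations (a value with the wrong datatype and a missing unit), analogous to the entries of a SHACL validation report.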

Metadata

Published: 2024-10-14
Last Updated: 2024-11-25
License: Creative Commons Attribution 4.0
Subjects: Data Infrastructure, Data Management Software
Keywords: RDF, SHACL, application profiles, metadata, FAIR data, data modeling, mechanical engineering
