Envisioning and proposing Data Mesh for Research Data Management in the Engineering Sciences

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Mario Moser, Tobias Hamann, Anas Abdelrazeq, Robert Schmitt

Abstract

In Research Data Management (RDM), data publishing infrastructures play a crucial role in efficient data provisioning and reuse. Data repositories, whether generic or discipline-specific, serve this purpose. Nevertheless, they focus on technical aspects without including sociological elements; they struggle to cover the heterogeneous nature of research data (formats, sources); and they are typically centralised, which increases the complexity of operation and maintenance. In industrial data management, the Data Mesh concept has been introduced as a decentralised, socio-technical approach. Data is handled as a product for increased usability, ownership is shifted to the respective domain experts, and federated governance achieves standardisation while allowing discipline-specific decisions. Based on a literature review, the distributed characteristics and further requirements of (engineering) research are mapped to the Data Mesh concept. In this envisioning, Data Mesh and its design principles appear appropriate overall as a research data publishing infrastructure. A high-level architecture is presented that leverages existing RDM components. However, as differences in the details become apparent, items for further adaptation of Data Mesh for RDM are pointed out.

Comments

Invited Review Comment #217 Jane Wyngaard @ 2025-08-28 22:41

The whole paper needs an edit by someone whose first language is English. There are many grammatical errors, and many paragraphs repeat what is said elsewhere in buzzword-filled, repetitive sentences.

Chapters 1-4 are particularly bad.

Ch1:
States that the paper concludes that RDM, while brownfield, could and should use an industrial data mesh concept, but fails to clearly articulate what that is.

Ch2:

There's a good section talking through the past with data warehouses, lakes, and ecosystems in real terms, but then the authors return to buzzwords and fail to make clear the distinction between, for instance, data ecosystems and data meshes.

There's an interesting idea being explored, but the context and concept are not described clearly. How exactly a data mesh approach would differ from others, and even how it would practically bring value, is not made clear. All the discussed value and outcomes sound great, but it's not at all clear how they could be achieved. While it's appropriate for this section not to discuss low-level details yet (these finally come in Ch5), the chapter lacks a structure and wording that would indicate to the reader that these details are coming. It fails to hint at the how even at an extremely high level, which would give the reader a preview of where the paper is going and help them start thinking about this idea and these terms in the right frame of reference.

Ch4 is supposed to be the methodology, but it doesn't describe one and simply recaps the previous sections of the paper.

Ch5: Finally expands on the definition and the how of the data mesh idea, putting it in terms that make the concept tractable.

They emphasise that the gain is primarily to increase findability and to allow the domain/source/host/expert closest to the data to define the schema, tooling, metadata, and even analysis and product generation, rather than alternatives such as trying to enforce a global universal standard. The authors go into a helpful level of detail on how this would lead to all the gains originally promised in the introduction.

At various points in the paper, reference is made to how the data mesh concept is not new and has been explored elsewhere, but these other explorations are never described. They should be added to give this paper more depth and value if it is to become the current reference for applying data mesh to research data management, and specifically to engineering research data management.

I would like to see this revised and published. It holds interesting ideas with potential value, but it needs to be restructured for that value to be realised.







Metadata

  • Published: 2025-04-28
  • Last Updated: 2025-04-28
  • License: Creative Commons Attribution 4.0
  • Subjects: Data Infrastructure
  • Keywords: Data Mesh, Research Data Management (RDM), Engineering Data, Engineering Sciences, Decentralised Data Architecture, Data Infrastructure, Data Publishing, Data Reuse