Invited Review Comment #223 Anonymous @ 2025-12-01 21:42
Pragmatic Research Data Management for Heterogenous Sensor Data
The paper presents a well-developed and well-implemented toolchain for research data management. Several years of ongoing research at a research institute for engineering and production processes validate the workflow well. I acknowledge the combined expertise of all authors and the discussions.
Please allow me to list the following remarks, first in general and then by line in the manuscript (LXX).
Title/general:
- The term ‘Pragmatic’ is never used in the paper, but it expresses the community-driven working style of implementing externally defined requirements.
- It is uncommon to write research articles in the "we" style, but doing so emphasizes that the paper conveys personal/institutional experiences and lessons learned.
- The subtitle ‘Bridging the Gap Between Best and Current Practices’ might be misleading, as "best practice" (German: bewährte Vorgehensweise) implies that methods have been implemented and proven over the long term.
- The demonstrator addresses the challenges of heterogeneous data only partially. Time series of temperature values and coordinates are not really heterogeneous data, although the demonstrator does communicate the work.
- Several expressions in the manuscript seem very vague and read in an LLM-like style. I suggest reviewing the text for repetitive and unnecessarily complex expressions.
- The paper has not fully applied the typical structure of a research paper in engineering science. Applying it would mean, e.g., introducing the experiment at the beginning.
Abstract:
- Almost all sentences start with ‘We’ or ‘This paper/article’.
- The abstract identifies incentives for researchers as a strong focus, but it only highlights them and does not address them (cf. the Conclusion: “[…] the realization is oftentimes lacking due to missing incentives […]”).
- Typo: double space.
- All the hard work that has gone into this paper deserves recognition. Even so, the label “Landmark Study” should come from outside the group of authors.
- The abstract would benefit from a revision that tones down claims of success and summarizes the genuinely valuable contributions within the typical abstract structure.
L02: LLM-like style (“crucial to facilitate”). The claim in the first sentence lacks surrounding context.
L14: Missing space
L18: LLM-like style. The incentive for an individual researcher, e.g., building his/her career, is not clear.
L25: It might be beneficial to distinguish between the stakeholders who provide user-friendly tools (emphasizing, of course, the contribution of the authors).
L43: The description of the experiment “Virtual Climatization” is spread over the whole paper. It might be introduced in detail here (experimental setup), with the phase-specific details given in each subsection.
Figure 1: The caption makes claims that are neither visualized in nor derivable from the drawing.
L57: Typo (space). The limits of data reusability need to be addressed, since a fundamental change in the system or setup influences the results that the data express. This raises two questions: how detailed the description of the experiment needs to be, and whether such detail is possible (plans, schematics, parameters) given NDAs and IP.
L67: Is there a distinction between Step and Phase? They seem to be the same in this context.
L70: It is difficult to tell how general or domain-specific the selection of tools and the questionnaire is, because “your research activity” is not explicitly specified.
L74: Did the DFG funding of Virtual Climatization really drive the decision for RDMO and the NFDI4Ing questionnaire? Was there no comparison of requirements?
L75: This section has a strongly LLM-like style and reads as cumbersome and exaggerated.
L96: It would be good to name which mechanisms are helpful for researchers, such as collaboration (L100).
L139: The introductory text for Step 2 needs revision to describe the phase properly. The referenced results (toolchain, etc.) might be cited.
L143: In addition, quality control should reasonably be part of ‘Best Practice’ for Production. Data acquisition needs to meet intrinsic data quality goals, e.g., the hardware must be good enough. Calibration is only addressed in 4.2. Checks might be needed to ensure robustness against cross-sensitivities.
L152: LLM-like expression.
L181: Missing "."
L189: Coscine seems to be a central tool for the paper. It is named here for the first time but has not been introduced so far. As a result, its functionality, which might still be under development and expanding, is unclear.
L215 / Figure 3: Linking the Steps to the hardware/software architecture is a very good idea. Step 1 might be added as well, defining the hardware setup, the metadata, etc. The colors are hard to distinguish. Why is the MQTT Broker in dark grey and not in the color of the Production step?
L240: Before (any kind of) analysis, it is data, not information. This also applies to the following context. Data without analysis seems philosophically meaningless.
Figure 4: For example, the coordinates are floats in the UML diagram and not the introduced SOIL element. This creates a problem: Timestamp is part of both the SOIL element and the Machine Position class. The two concepts may not be combinable to that extent.
L274: How does the Research Data Management Plan reflect that 924 standstill points have been captured? This information is definitely relevant for all steps of the DLC. However, it cannot be foreseen in advance, e.g., to calculate the amount of data. Are the coordinates x, y, z given in meters or in steps of the stepper motor? Does the system/framework answer such questions (automatically)?
L314: The LLM-like style might be reviewed.
It might also be relevant to consider how sustainable a repository is (CO2, etc.) and how it is funded.
L364: Duplicated word: “must must”.
The automatic release of data that has been placed under a restriction notice might be discussed.
L397/L432/L465: Copyright law does not cover data as such, so the question of licensing the data itself does not arise. A database or a structured collection of data, however, can be protected.
L410: Why did you select/propose Coscine when it does not support ‘open access’?
L414: What framework is RWTH Publications based on? Is it open source?
Figure 6: The ing.grid preprint framework (janeway.systems) might be introduced, because its logo appears in the figure.
L468ff.: Data carries no licensing restrictions or conditions. Giving attribution might be good scientific practice, but the law does not require it.
L484f.: Might be reformulated.
L502ff.: It might be interesting to quantify the benefits of RDM in terms of saved person-months (PMs).
L511: The introduction to the Discussion section needs rework.
L522: Why is Coscine doing cross-checks across experiments?
L530ff.: What is meant by “To access this paper’s full potential …”? The sentences need rework.
L535: Here, the Step ‘Access’ is named ‘Publication’. That name might be more intuitive for the whole paper.
L536: There is a slight conflict with the “Planning” section in L78 and Figure 1: “On top of supporting RDM, DMPs directly facilitate the execution of the research.” vs. “It [DLC] focuses primarily on handling data […] rather than guiding an entire project from start to finish.”
L539: The discussion highlights an important point about the RDMO questionnaire of NFDI4Ing.
L551: For an institute, the data steward will play the same role as a librarian or a patent attorney. This works in the initial phase. Scaling the processes up to an international level will hand the whole workflow, including the data request process, to external stakeholders, and it will be highly automated. Unfortunately, but certainly, someone will pay a lot of money to someone else to access (their own) data. Making the process complicated and introducing manual labor might not help in the long term.
L570: It remains open how an Influx time series database can be made publicly available with funding secured for ten years, as most as-a-service infrastructure is billed by traffic and CPU. A few words might be added on how the EU funds Zenodo and on the basis for claiming that the service will remain available for another ten years. Is it “too big to fail”? Zenodo is introducing more and more access limits (cf. the new rate limits for the Search API, 25.11.2025). How does that affect the FAIR principles?
References
- The “References” section contains many spelling mistakes, including capitalization errors. Traditionally, ISBN and ISSN numbers have not been relevant for references.
Thank you very much for this valuable contribution. I would like to see this revised and published. It contains interesting ideas and best practices worth spreading and could be valuable for the research community, but in my view it needs another iteration.