Challenges in publishing research data – a Fraunhofer Case Study

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Andrea Wuchner, Michèle Robrecht, Pierre Kehl, Robert Heinrich Schmitt

Abstract

Sharing of research data is becoming more and more established as part of the scientific process, triggered by corresponding requirements of research funders. A large number of subject-specific and institutional research data repositories were created as publication agents for the research data. Nevertheless, the publication processes are not yet established, and best practices have yet to emerge. The aim of this paper is to work out what challenges currently exist in the publication of research data and how these can be overcome. Answers to the research question are provided by a case study of research data publication with the participation of industrial partners in "Fordatis", the institutional research data repository of the Fraunhofer-Gesellschaft. The publication process is described from the perspective of the researcher, the data curator and the repository operator. In summary, the challenges can be overcome primarily by the division of labor and communication.

Comments

Comment #194 Andrea Rapp @ 2024-12-16 16:51

The article deals with an important topic that has not yet been treated in a comparable way in the literature. However, the reviews provide important and very specific references to materials and literature that need to be incorporated in order for the article to reach its full potential. I therefore recommend a thorough revision based on these references.

Invited Review Comment #189 Anonymous @ 2024-11-19 10:08

The article titled "Challenges in publishing research data – a Fraunhofer Case Study" presents a data publication case study using "Fordatis", the institutional research data repository of the Fraunhofer-Gesellschaft. It provides an in-depth view of the challenges regarding communication and comprehension of the FAIR data principles between the roles of a researcher, a data curator and the repository operator.

The article covers a very important topic and is one of the few case studies on RDM handling from three different points of view (researcher, data curator and repository operator); it is therefore highly relevant for publication. However, throughout the text, very generalized statements on FAIR and recommendations for fulfilling the principles, which are not tailored to engineering sciences, are strongly mixed with more use-case-specific aspects (e.g. confidentiality of part of the data leading to a 3-year embargo). I would like the authors to check and revise this throughout the article so that it focuses more on the actual use case and the data publication challenges described.

What I am also missing is a placement of the use case in the context of NFDI in general and NFDI4ING in particular, as at least one co-author is directly associated with the consortium.

 

In the following, several examples of generalization and recommendations for changes are given:

As a first remark, the article starts with a really broad definition regarding scope and openness: Research data as defined in the article also includes “programms, AV material, databases (…)”, and openness is generalized in a very general way. Here I would strongly recommend narrowing the focus of the article in the introduction as well, as the case study entails factual data. In addition, legal aspects (in Germany) differ strongly between different research artefacts, e.g. software vs. databases vs. factual data. Again, I would recommend focusing on the data type included in the use case and taking a more specific (engineering sciences?!) rather than generic approach.

Second, because the use case also encompasses a research cooperation with industrial partners, more detailed information on the regulations (e.g. is a non-disclosure agreement in place?) would help the reader to understand the preconditions that have to be considered before data processing and publication take place. Currently, there is no information on, e.g., whether only part of the data is approved for publication, the level of metadata detail agreed upon, the licence used for data publication, or other aspects that are relevant to meeting the FAIR criteria. Because the FAIRness of a data publication also strongly depends on the scientific discipline in which the use case is located, I would recommend giving more discipline-specific examples.

Third, please consider revising generic statements such as the one in lines 98-100: “Since the capture in softwares for data management is not yet mature, open and widely used data formats such as TXT or CSV are recommended for the documentation of metadata, if possible”. Such general statements might only be true for the respective use case, because views on data and metadata management are discipline-specific. In disciplines like applied chemistry, for example, metadata documentation usually follows agreed community standards and formats, and is even partly or completely automated (e.g. using ELNs such as Chemotion). I would recommend rephrasing and outlining how metadata capture was carried out in the specific use case.

Regarding the description of data publication at Fraunhofer Fordatis, I recommend giving concrete details on how the FAIR data principles are met, e.g. in the form of a table or similar. The provision of a DOI alone, as stated in lines 190-191, of course greatly aids the F (findability) of the data set, but this is only one part of FAIRness. For example, information on the metadata standards used, the provision of an open? API, the licences offered (which ones?), and the level of semantification/data interoperability and machine-actionability are also significant questions when it comes to evaluating the FAIRness of a data set/repository provider.

From line 359 onwards, licencing is discussed. A statement such as “Since there are few generally applicable recommendations for licensing research data, the possibilities had to be explored and assessed by the data curator.” should also be revised, as there are in fact many recommendations available; some examples are given below:

https://www.openaire.eu/how-do-i-license-my-research-data
https://radar.products.fiz-karlsruhe.de/en/radarfeatures/lizenzen-fuer-forschungsdaten
https://rdm.elixir-belgium.org/data_licences
https://www.springernature.com/de/authors/research-data-policy/selecting-a-license
(only a few examples; many more are available!)

Line 365: Please consider including the M4I metadata standard by NFDI4ING here: https://terminology.tib.eu/ts/ontologies/m4i

Also, please consider revising the statement in line 418, where it is stated that a DOI “helps to ensure that most of the FAIR principles are met.” This is essentially not true, as a DOI only ensures part of the FAIRness of the metadata, but not the fulfillment of FAIR for the data set itself.

 

 

Invited Review Comment #177 Anonymous @ 2024-10-16 16:56

The article investigates the experience of submitting a set of research data to an institutional research data repository from three different perspectives. Findings on this topic are of high relevance for all parties involved in publishing research data, especially for infrastructure providers who want to make their services easy to use and for researchers who want to establish recommendations and processes that avoid common pitfalls.

Formal Requirements:

Authorship

I couldn’t find a classification of the authors’ contributions according to the requirement “A clear statement of contributions is provided, for example using the CRediT – Contributor Roles Taxonomy.”

Language, Units and Abbreviations

I noticed a few things that could be improved:

-       In the description of the dataflow (Fig. 2 and Section 2.4), it is not easy to understand whether this describes actions within DSpace.

-       The meaning of the sentence in line 80f "Finally, if necessary, an export control must be carried out in order to exclude the possibility of dual use" is not clear to me and should be explained in more detail.

-       Line 250: "lays" should likely be replaced with "lies"

 

The remaining formal requirements are met.

 

Quality Requirements:

The novelty and originality of the content presented in each submission is clear.

I am not aware of another article on this topic, and the authors describe their own findings.

 

The work is clearly placed in the state of the art with reference to relevant literature

  1. I miss a presentation of the state of the art in the sense of established best practices and recommendations beyond the status-quo at Fraunhofer.
  2. The experiences described depend very much on local specificities, e.g. the contract with the industrial partner or the (sub-optimal) implementation of a research data repository that requires printing out a paper copy for signing that can be accidentally skipped / missed. This reduces the findings’ value for readers outside of Fraunhofer significantly.
  3. The conclusions do not address which points need to be improved to avoid the negative parts of the experiences described. For example, what the contract with the industry partner should have looked like to allow hassle-free publication of the data.
  4. The article assumes that steps like preparation of the data and collection of metadata are only performed retrospectively during the step "giving access" (Fig. 1). While this is often the case, it is not what is recommended and should be presented as a problem rather than some kind of rule or fact.
  5. The article fails to mention that there are tools available that assist in picking appropriate licenses (e.g. https://choosealicense.com/) that would have helped the data curator and / or scientist.
  6. While the article is based on the FAIR principles, it does not address that the metadata captured in typical institutional repositories is limited to very general attributes like author, license or publication date. Information on the content and provenance of the data (the setup and parameters with which the data was obtained) is at best recorded as a free text entry in a description field or even as a separate "Readme" file within the data set, which is not interoperable or machine-actionable at all and doesn't improve the findability and reusability of the data.

 

The presented work is the result of substantial scholarly effort.

As a case study using a single example, I am uncertain whether the article passes this criterion. This is amplified by the fact that Section 5 is more of a summary and does not include systematic recommendations based on the findings.

 

FAIR Principles: The submission aligns with the FAIR principles and meets domain-relevant community standards of FAIR data. The minimal requirements for adherence to FAIR principles for ing.grid are listed below in the relevant sections.

The article satisfies this requirement.

 

Recommendation

I was torn between rejection and revision. The article requires a more thorough description of the state of the art and needs to present findings that are more general and applicable to a broad audience. While I consider the research question relevant and encourage the authors to pursue the topic further, I believe that the amount of additional work necessary would best be handled as a submission of a new article.

Metadata

  • Published: 2023-08-14
  • Last Updated: 2023-07-14
  • License: Creative Commons Attribution 4.0
  • Subjects: Data Infrastructure
  • Keywords: Research data repositories, Publishing Process, Data Curator, Open Data, FAIR Principles, Task Area Frank