Comments
Invited Review Comment #186 Hamza Oukili @ 2024-11-04 22:22
Thank you for revising the manuscript; the updates have addressed my comments thoroughly. Below are a few final suggestions focused mainly on reference formatting and minor corrections:
1. References: Several references need minor adjustments for consistency and completeness:
- [5]: Add quotation marks around the title and, if possible, a hyperlink.
- [8], [21], [50]: Confirm "visited on" dates are consistent across the manuscript, especially in [8]: L652, [21]: L688, and [50]: L753.
- [12]: Correct "University" spelling and, if the title is in German, provide both the original title and an English translation in brackets. Including a link would also improve accessibility.
- [13]: Standardize capitalization in the title to "an Open Source."
- [25]: Add a hyperlink to enhance accessibility.
- [50]: Ensure this citation is complete.
- [59]: The link appears broken; please update or remove it.
2. Formatting:
- For the keywords section, consider either using "CRC985" consistently or removing "SFB 985" entirely, as this isn’t referenced in the article text.
- L212: In the table caption, adjust "Case 3" to "Case 2".
With these small adjustments, the manuscript will be ready for publication. I recommend acceptance with minor revisions.
Best regards,
Comment #175 Sonja Herres-Pawlis @ 2024-10-08 13:51
Reviewer comments are prepended with "C:", answers with "A:".
Dear Prof. Flemisch, dear Reviewers,
Thank you for your constructive and thorough review.
In addition to the specific responses to the reviews below, some additional typos and spelling errors have been corrected, and some passages have been revised for additional clarity.
Reviewer: Invited Review Comment #127 Hamza Oukili
General assessment:
C: This paper offers an overview of the research data management (RDM) practices and recommendations for CRC985, which focuses on microgels. The topic is relevant for implementing RDM in similar research projects. I recommend a few minor revisions before final publication.
A: Thank you for your kind assessment.
Revisions:
C: Abstract: The abstract is too broad and lacks specificity. The first half should be shortened since it overly focuses on CRC985. It would be better to focus more on the main topic of the paper and what the journal is about. For example, consider elaborating more on the survey and the proposed solution.
A: In combination with the recommendations from the other reviewer, we have removed mention of the CRC and INF project from the abstract and made the first part more general. We have also included more specifics in the second part.
C: The questionnaire referenced in the paper is currently not accessible (the link provided does not work). It would be beneficial to include a discussion of the questionnaire in the paper, perhaps by providing some examples of the questions. Additionally, explaining how the questions were selected would add valuable context.
A: Yes, unfortunately, we had no place to add the reviewer link, although, in hindsight, a comment with the link would probably have been a good option. After acceptance, we published the data, which is now available via the provided DOI.
A: We have provided some detail on the questions and their relevance within the methodology section, in both the original and the revised versions.
C: Figure 4 takes up an entire page. If it doesn’t provide additional important information beyond what is shown in Figure 5, consider moving it to the supplementary material.
A: We believe this figure underlines not only the breadth of methods, but also the variety of subject areas indicated by the individual institutes, and thus enhances the manuscript.
C: Subsection 3.3.1 is too long and text-heavy compared to the other subsections. You can make it easier to read by splitting it into two subsections: one for Data Governance and another for Documentation.
A: Agreed, we did actually struggle here and really appreciate the idea. We have implemented your suggestion.
C: Table 1: In the first row, for 'Data Exchange Format or File Extension,' you might want to specify that these are 'recommended' by FAIRsharing, NFDI4Chem, and the Chemotion Repository. Additionally, listing one file extension per row takes up too much space, causing the table to span four pages with excessive white space. Consider placing multiple extensions on each row to save space.
A: Yes, agreed, that helps, it is more compact now.
C: Table 2: Similarly, for Table 2. It spans 4 pages. You could combine the first and second columns by placing the type of repository after its name.
A: We have implemented your suggestion, which made it a bit more compact.
Small Issues
C: Table 1 (end of page 14): JCAMP-DX is listed in the method column, but it seems like it should be in one of the other two columns. Please verify.
A: This was a formatting problem; fixed.
C: L226: Replace "analogues" with the adjective "analogous."
A: Fixed
C: L273: Change "to chose from" to "to choose from."
A: Fixed
C: L336: "The sample management system was also optimized, leading to more information when storing data." Could you elaborate on how the optimization improved the system?
A: We've added some further information.
C: L450: Replace “as such as” by “as a” in “Microsoft SharePoint serves as such as collaboration platform”.
A: Fixed
C: L454: Use "prior" instead of "a prior."
A: Fixed
C: L466: "The the" appears twice; remove one.
A: Fixed
Reviewer: Anonymous
C: The paper reports on a survey conducted over a long period of time within a Collaborative Research Center (CRC) with respect to collected data formats and methods used for data collection. Based on the results of the survey, the main contribution of this paper is a set of recommendations concerning data exchange formats, data publication, and data archival for large collaborative research projects in the interdisciplinary intersection of chemistry, chemical engineering, physics, and life sciences.
In the following, I summarize both strong points and main opportunities for improvement. I then provide detailed comments (roughly in the order of appearance in the manuscript).
== Strong points ==
S1) The manuscript is well-structured and easy to read. It follows all necessary guidelines.
S2) The paper offers a detailed view of the state of the art in data formats and collection methods. From this, the problems to be surmounted in the context of the project the paper describes (a project INF of a CRC) become evident.
S3) The discussion of selected possible solutions is extensive and clear and may provide some guidance for the future.
A: Thank you very much.
== Opportunities for improvement ==
C: O1) I must admit that I found the information provided in the paper not very compelling as it mainly focuses on survey results on the one hand side and possible solutions to address (quite generic) problems on the other hand. I encourage the authors to better connect the dots by discussing why some solutions make sense to address specific problems encountered in the CRC while others do not (e.g., what are selection criteria? Desiderata? What are the pros and cons of alternatives? …). I believe this would greatly improve the paper’s impact on future and similar initiatives and allow for a better assessment of whether the recommendations apply to another project’s setting.
A: We've added points in the Sections "Data Publication and Archival" as well as "Recommendations for Future CRCs and INF Projects" (renamed from "Possible requirements or Future CRCs and INF Projects") on what worked within this project and what areas INF projects can benefit from focusing on.
C: O2) I would also appreciate reports on test scenarios or sample deployments of the recommendations, discussing the results / experience / lessons learned. The paper sometimes indicates that such cases exist (to be expected for a 12-year funded project) but limits the discussion to generic statements (works well). For a scientific publication, I expect a more detailed discussion to provide some evidence that the recommendations are reasonable.
A: Agreed; unfortunately, we lack reports from users at this point, and the focus here is therefore on recommendations. One of the greatest lessons learned was the difficulty, for an external project, of implementing RDM solutions that must be adopted at the research-group level. We have added information on the acceptance of data publication repositories.
A: We have added practical use cases into the discussion.
== Detailed comments ==
C: D1) Abstract: While I understand that the Journal primarily targets a German audience, I suggest the abstract to be written in a more generally understandable way, i.e., avoiding references to CRC, INF.
A: The abstract has been adjusted accordingly.
C: D2) p2, last paragraph: Some reference to validation use cases or specific statements on how NFDI consortia may benefit would be more convincing and should be discussed in the paper (relates back to O2).
C: D3) p3., l. 55: Please provide a link or reference to the surveys / questionnaires when you first mention them.
A: We have added reference to dataset.
C: D4) Given the long funding period of the CRC, including the INF project, it is somewhat surprising that the issue of FAIR science has only been addressed quite late (survey between 2021 and 2023). Can you provide some context on why this was considered in the final stage of the CRC? What were previous hurdles? What was the focus of INF before? …
A: Added more information on the previous focus of the INF prior to the 3rd funding period in Section 1.
C: D5) Section 3.1: This section does not add much more compared to the introductory text. It mainly reports what has been done. I would be interested in details on how things were implemented and why specific approaches were chosen. Just as an example, when referring to a living document, I wonder how it is managed. Does it keep track of versions (the last sentence suggests so, but it is barely understandable based on poor English, and the link was not working when I tested it)? Can it faithfully associate answers to versions? How many versions or iterations were needed? How do you systematically keep track of verbal exchanges? Which “user group” (researchers) was targeted (PhDs, PIs)?
A: The dataset has been published now. We apologize, we should have shared the reviewer link in the comments.
A: Grammar in this sentence has been fixed.
A: Agreed, this is quite redundant. We have added more details on the acquisition of participants, as well as the versioning within the methodology section. In addition, in Section 3.1, we have added a short discussion which acknowledges possible shortcomings in the results.
A: We have ensured the version is included in the naming convention of each completed questionnaire and noted this at the end of Section 3.1.
C: D6) p.8, l. 173: Can you please elaborate on how the recommendations fit the data lifecycle?
A: We've added points here to clarify this.
C: D7) p.9: The reported survey results give little idea of the status quo with respect to data organization and documentation. The recommendations are mostly generic statements, and they may sound like common sense without providing further context. In general, knowing more about the practical, domain-specific aspects that require support for FAIRness beyond the survey results would give readers a clearer justification of why the recommendations both make sense and are necessary to be “spelled out” explicitly.
A: We've added more context. This was largely in response to uncertainties, often anecdotal, reported when participants were asked about formats, data workflows, and storage.
C: D8) p.10, l. 215ff: Again, the scope of the statement is not clear. To what extent are the more systematic platforms already used within the CRC? What is the status quo in the project? What problems persist for which recommendations can then be developed (on some ground). This also extends to l. 229, where you state: “In cases where ELNs are not quite suited”. I wonder how often this is the case? What characterizes those cases so I can easily identify them and then resort to other (suggested) solutions?
A: We have added which ELNs were used in the CRC.
A: We have added specific areas where ELNs are not necessarily suitable.
C: D9) I am unsure how to read and interpret Table 1. In the majority of methods, there is either no recommendation or there are multiple, where it is unclear how they differ / when which should be used. It seems as if the guideline is useful only for a few select categories in terms of actionable information. I also noted that the last method is recommended (bold font), maybe this is a table formatting problem?
A: There was indeed a formatting error, this has been fixed.
A: The table is largely for reference for researchers, but also for infrastructure providers by indicating gaps in the availability of data exchange formats.
C: D10) I found Section 3.3.3 very generic. Statements like “other systems can provide the necessary solution”, “exchange formats […] could assist in collaborations”, “This study thus also directly contributed to improved data management” are not very informative. Give examples, report on actual use cases from the CRC, say how it improved what part of management, …
A: We have added two representative cases: IR and SRFM. We have hopefully improved upon this with the use case examples.
C: D11) p.16, l. 326ff: This would be the perfect opportunity to share a few more details on specific use cases.
A: We hope the addition of the use cases improves this section.
C: D12) I suggest clearly defining the terms “shared”, “central”, “project”, “local system” for readers to better appreciate the difference between several options.
A: We've rewritten this paragraph for more clarity.
C: D13) p.16: While guidelines are a first important step to support collaboration, what about the technical implementation? Can you elaborate on APIs or aspects of the technical integration?
A: We've added specifics, including an example, as well as some of the resources required.
C: D14) Concerning the results summarized in Table 2, I am wondering how you came up with this final selection. What are the selection criteria? How were candidates selected? …
A: We have added a few words on this. However, we must admit that we did not have clear criteria and greatly relied on matching information from NFDI4Chem.
All authors agreed to these changes. We hope that we now meet your expectations.
On behalf of all authors,
Yours sincerely,
Sonja Herres-Pawlis
Comment #128 Bernd Flemisch @ 2024-08-09 06:43
As the responsible topical editor, I would like to thank the two reviewers for their detailed and constructive feedback. After consideration of the comments, I advise the authors to revise the paper according to the suggestions provided in the reviews. After completion, the revised version of the paper should be uploaded and answers to the reviews should be given in the comments. Thank you.
Invited Review Comment #127 Hamza Oukili @ 2024-08-09 03:43
General assessment:
This paper offers an overview of the research data management (RDM) practices and recommendations for CRC985, which focuses on microgels. The topic is relevant for implementing RDM in similar research projects. I recommend a few minor revisions before final publication.
Revisions:
- Abstract: The abstract is too broad and lacks specificity. The first half should be shortened since it overly focuses on CRC985. It would be better to focus more on the main topic of the paper and what the journal is about. For example, consider elaborating more on the survey and the proposed solution.
- The questionnaire referenced in the paper is currently not accessible (the link provided does not work). It would be beneficial to include a discussion of the questionnaire in the paper, perhaps by providing some examples of the questions. Additionally, explaining how the questions were selected would add valuable context.
- Figure 4 takes up an entire page. If it doesn’t provide additional important information beyond what is shown in Figure 5, consider moving it to the supplementary material.
- Subsection 3.3.1 is too long and text-heavy compared to the other subsections. You can make it easier to read by splitting it into two subsections: one for Data Governance and another for Documentation.
- Table 1: In the first row, for 'Data Exchange Format or File Extension,' you might want to specify that these are 'recommended' by FAIRsharing, NFDI4Chem, and the Chemotion Repository. Additionally, listing one file extension per row takes up too much space, causing the table to span four pages with excessive white space. Consider placing multiple extensions on each row to save space.
- Table 2: Similarly, for Table 2. It spans 4 pages. You could combine the first and second columns by placing the type of repository after its name.
Small issues:
- Table 1 (end of page 14): JCAMP-DX is listed in the method column, but it seems like it should be in one of the other two columns. Please verify.
- L226: Replace "analogues" with the adjective "analogous."
- L273: Change "to chose from" to "to choose from."
- L336: "The sample management system was also optimized, leading to more information when storing data." Could you elaborate on how the optimization improved the system?
- L450: Replace “as such as” by “as a” in “Microsoft SharePoint serves as such as collaboration platform”.
- L454: Use "prior" instead of "a prior."
- L466: "The the" appears twice; remove one.
References:
- [5]: Add quotation marks around the title and include a hyperlink if possible.
- [11]: The DOI link is not working; please update it.
- [12]: Correct the spelling of "University," and consider adding a link to the article. If the title is in German, include the original title and the English translation in brackets.
- [13]: Correct the capitalization in the title to "an Open Source" to match the article.
- [24]: Add a hyperlink to the reference.
- [47]: Complete the citation.
- [56]: The link is no longer valid; please update or remove it.
Invited Review Comment #118 Anonymous @ 2024-07-01 09:26
The paper reports on a survey conducted over a long period of time within a Collaborative Research Center (CRC) with respect to collected data formats and methods used for data collection. Based on the results of the survey, the main contribution of this paper is a set of recommendations concerning data exchange formats, data publication, and data archival for large collaborative research projects in the interdisciplinary intersection of chemistry, chemical engineering, physics, and life sciences.
In the following, I summarize both strong points and main opportunities for improvement. I then provide detailed comments (roughly in the order of appearance in the manuscript).
== Strong points ==
S1) The manuscript is well-structured and easy to read. It follows all necessary guidelines.
S2) The paper offers a detailed view of the state of the art in data formats and collection methods. From this, the problems to be surmounted in the context of the project the paper describes (a project INF of a CRC) become evident.
S3) The discussion of selected possible solutions is extensive and clear and may provide some guidance for the future.
== Opportunities for improvement ==
O1) I must admit that I found the information provided in the paper not very compelling as it mainly focuses on survey results on the one hand side and possible solutions to address (quite generic) problems on the other hand. I encourage the authors to better connect the dots by discussing why some solutions make sense to address specific problems encountered in the CRC while others do not (e.g., what are selection criteria? Desiderata? What are the pros and cons of alternatives? …). I believe this would greatly improve the paper’s impact on future and similar initiatives and allow for a better assessment of whether the recommendations apply to another project’s setting.
O2) I would also appreciate reports on test scenarios or sample deployments of the recommendations, discussing the results / experience / lessons learned. The paper sometimes indicates that such cases exist (to be expected for a 12-year funded project) but limits the discussion to generic statements (works well). For a scientific publication, I expect a more detailed discussion to provide some evidence that the recommendations are reasonable.
== Detailed comments ==
D1) Abstract: While I understand that the Journal primarily targets a German audience, I suggest the abstract to be written in a more generally understandable way, i.e., avoiding references to CRC, INF.
D2) p2, last paragraph: Some reference to validation use cases or specific statements on how NFDI consortia may benefit would be more convincing and should be discussed in the paper (relates back to O2).
D3) p3., l. 55: Please provide a link or reference to the surveys / questionnaires when you first mention them.
D4) Given the long funding period of the CRC, including the INF project, it is somewhat surprising that the issue of FAIR science has only been addressed quite late (survey between 2021 and 2023). Can you provide some context on why this was considered in the final stage of the CRC? What were previous hurdles? What was the focus of INF before? …
D5) Section 3.1: This section does not add much more compared to the introductory text. It mainly reports what has been done. I would be interested in details on how things were implemented and why specific approaches were chosen. Just as an example, when referring to a living document, I wonder how it is managed. Does it keep track of versions (the last sentence suggests so, but it is barely understandable based on poor English, and the link was not working when I tested it)? Can it faithfully associate answers to versions? How many versions or iterations were needed? How do you systematically keep track of verbal exchanges? Which “user group” (researchers) was targeted (PhDs, PIs)?
D6) p.8, l. 173: Can you please elaborate on how the recommendations fit the data lifecycle?
D7) p.9: The reported survey results give little idea of the status quo with respect to data organization and documentation. The recommendations are mostly generic statements, and they may sound like common sense without providing further context. In general, knowing more about the practical, domain-specific aspects that require support for FAIRness beyond the survey results would give readers a clearer justification of why the recommendations both make sense and are necessary to be “spelled out” explicitly.
D8) p.10, l. 215ff: Again, the scope of the statement is not clear. To what extent are the more systematic platforms already used within the CRC? What is the status quo in the project? What problems persist for which recommendations can then be developed (on some ground). This also extends to l. 229, where you state: “In cases where ELNs are not quite suited”. I wonder how often this is the case? What characterizes those cases so I can easily identify them and then resort to other (suggested) solutions?
D9) I am unsure how to read and interpret Table 1. In the majority of methods, there is either no recommendation or there are multiple, where it is unclear how they differ / when which should be used. It seems as if the guideline is useful only for a few select categories in terms of actionable information. I also noted that the last method is recommended (bold font), maybe this is a table formatting problem?
D10) I found Section 3.3.3 very generic. Statements like “other systems can provide the necessary solution”, “exchange formats […] could assist in collaborations”, “This study thus also directly contributed to improved data management” are not very informative. Give examples, report on actual use cases from the CRC, say how it improved what part of management, …
D11) p.16, l. 326ff: This would be the perfect opportunity to share a few more details on specific use cases.
D12) I suggest clearly defining the terms “shared”, “central”, “project”, “local system” for readers to better appreciate the difference between several options.
D13) p.16: While guidelines are a first important step to support collaboration, what about the technical implementation? Can you elaborate on APIs or aspects of the technical integration?
D14) Concerning the results summarized in Table 2, I am wondering how you came up with this final selection. What are the selection criteria? How were candidates selected? …