A survey on the dissemination and usage of research data management and related tools in German engineering sciences

This is a Preprint and has not been peer reviewed. This is version 3 of this Preprint.

Authors

Tobias Hamann M.Sc., Amelie Metzmacher, Patrick Mund, Marcos Alexandre Galdino, Anas Abdelrazeq, Robert Heinrich Schmitt

Abstract

As the amount of collected and analysed data increases, a need for data management arises to ensure its usability. This also applies in research. This challenge can be addressed by Research Data Management (RDM), which places a clear focus on the reusability of data. To understand the status quo of the application of research data management in engineering sciences in Germany, as well as possible challenges and opportunities for improvement, a survey was conducted over the last quarter of 2020. In total, 168 researchers (n=168) from the engineering sciences in Germany provided their views via a questionnaire containing 216 question items. The results give insight into the respondents' knowledge of research data management and its perceived relevance in their daily research activities. For instance, the application of research data management related tasks, data sharing with third parties, the usage of different tools, and the involvement of different file formats were part of the survey. The survey closed with questions regarding RDM specifications, support structures, and reasons that could prevent researchers from adopting sustainable RDM. This paper presents the results of the study, providing an overview of current RDM in engineering and pointing out possible measures and strategies to foster it, namely the integration of guidance and education for research data management. Alongside the paper, we publish the collected data set to enable further analysis and reuse (e.g. for extended statistical analysis).

Comments

Comment #121 Tobias Hamann M.Sc. @ 2024-07-01 11:39

Dear Topical Editor Prof. Dr. Regine Gerike,

thank you very much for overseeing the peer review process for this article. We revised the paper extensively based on the feedback provided by the reviewers. The changes are documented in the form of a change log that follows the structure of the reviewers' comments. We addressed the comments as quickly as possible so as not to extend the project timeline.

We are very thankful for the reviewers' feedback and fully agree with their comments, which improved the article!

Kind regards,
Tobias Hamann

Comment #120 Tobias Hamann M.Sc. @ 2024-07-01 11:32

Response to Review No. 2 (Christine Eisenmann):
Dear Christine Eisenmann,
thank you for your review of our study. Based on your feedback, we comprehensively reworked the article so that it hopefully better lives up to the standards of ing.grid and fits the journal’s requirements. Below, we address all of your suggestions:

1. General comments:
o Added research question explicitly
o Reworked Results section
i. Moved 4.5 to appendix, as the results might be of use for further research on different aspects of RDM, yet do not contribute to the research question
ii. Added disclaimer at the start of section 4 to emphasise the role of hypotheses as findings rather than facts, taking the small sample size into account
iii. No in-depth statistical analysis was presented, as the survey’s sample size is rather small and the survey is considered qualitative
iv. Restructured Results chapter to only contain information to build up hypotheses and hence answer the research question
o Added “exploratory” in the abstract
o Added “qualitative” into the introduction
o Added Limitations section
2. Abstract:
o Added the research question to the abstract
o Added two sentences to the abstract
o Removed one sentence about survey contents from the abstract
o Kept the number of questions in the abstract as this information seemed important to reviewer No. 1.
3. Introduction:
o Added Figure on all Archetypes
o Added reference to new figure
o Added key objective of Archetype Frank in text
o See 1. - Added research question explicitly
o Reworked first paragraph to better build up towards the research question
o Removed paragraph on article structure
4. Related work:
o Removed table 1
o Added better textual description of the search string's components and their compilation in the text
o Added PRISMA Diagram in Appendix
o Added number of duplicates to the text
o Added info on the number of papers excluded in the full text review to the text
o Changed numbers in table 2 to the numbers with duplicates included
o Fixed typo in table 2
o Added number of duplicates excluded and unique records to table 2
o Moved all paragraphs for individual papers to appendix
o Added table for related work
o Emphasised similarities and differences in related works
o Added comparison of common findings among papers
o Added reference to research question to section 2.1 to explain the connection to the SLR search string
5. Methodology:
o Added info on the selection of target group of the survey
o Added comparison of sample size to population in 3.1
o Added disclaimer at the start of section 4 to emphasise the role of hypotheses as findings rather than facts, taking the small sample size into account
o Added comparison of sample size to population in limitations
o Added absolute numbers of total population in 3.1
o Added estimated response rate
o Clarified between questions and question items and added further explanations and a column to Table 4
o Added information that incomplete surveys were not taken into consideration
o Added information on average participation time
6. Results:
o Removed decimal places
o See 1. - Restructured Results chapter to only contain information to build up hypotheses and hence answer the research question
o Changed colour scheme of all diagrams to colourblind friendly colour palette: Paul Tol’s Muted
o Changed figure 5 to be a combination of figures 4 and 5
7. Discussion:
o Added disclaimer at the start of section 4 to emphasise the role of hypotheses as findings rather than facts, taking the small sample size into account
o Changed text in 4.1 from “conclusion” to “hypothesis”
o Changed texts in 4.2 from “conclusion” to “hypothesis”
o Changed texts in 4.4 from “conclusion” to “hypothesis”
o Added further disclaimer right before hypotheses 5-10 in section 4.7
o Reworked structure, wording and texts in Discussion
o Added two subsections to better structure the section
o Added hypotheses reference (number and a shorthand) at the corresponding paragraph
8. Summary and outlook:
o Added backwards linking to research question
o Added answer to research question more explicitly
o Reformulated the findings

Comment #119 Tobias Hamann M.Sc. @ 2024-07-01 11:29

Response to Review No. 1 (Anonymous):
Dear Reviewer,
thank you for your review. It provides constructive feedback to significantly improve the article. We have revised the submission according to your suggestions. In the following, each of your comments is addressed:

1. Unclear classification and consideration of sample size and specific focus
o Added the exact distribution of respondents across the different research institutions.
o Added further explanation on the focus of the survey on mechanical and industrial engineering
o Changed the title accordingly
o Added explanation on the survey dissemination
o Added section Limitations for a clear distinction between content-wise discussion and limitations of the methodology
o Added bias consideration in Limitations
o Added “exploratory” in the abstract
o Added note on small sample size in summary
2. The structure of the questionnaire is not sufficiently explained
o Clarified between question number and question item number
o Added column to Table 4 for further clarification
o Added further explanation on the survey structure
o Added challenges of the survey to the new limitations section
3. Integration of the hypotheses as core statements in the text
o Added explanation on why Hypotheses were formulated
o Added further explanation on the hypotheses drawn from the free-text answers
o Added information on the hypotheses drawn from the free-text answers to the limitations
4. Consideration and discussion of the sample size and specific focus
o Reformulated abstract
o Added comparison of sample size to population in 3.1
o Added disclaimer at the start of section 4 to emphasise the role of hypotheses as findings rather than facts, taking the small sample size into account
o See 1. - Added note on small sample size in summary
o Changed text in 4.1 from “conclusion” to “hypothesis”
o Changed texts in 4.2 from “conclusion” to “hypothesis”
o Changed texts in 4.4 from “conclusion” to “hypothesis”
o Added further disclaimer right before hypotheses 5-10 in section 4.7
o Reworked wording and texts in Discussion
o See 1. - Added bias consideration in Limitations
o Added comparison of sample size to population in limitations
o Added absolute numbers of total population in 3.1
o See 1. - Added explanation on the survey dissemination
o Added comparison of the distribution in Figure 1 with the average distribution in the field under consideration within German engineering sciences
5. Revision of the graphics
o Changed colour scheme of diagrams to colourblind friendly colour palette: Paul Tol’s Muted
o Added N=168 to all diagrams
o Changed figure captions to better describe corresponding question from questionnaire
o Added keys to supplemental data in captions
6. Insights from previous studies
o Added table for related work
o Emphasised similarities and differences in related works
o Added comparison of common findings among papers
7. Explanation of the questionnaire and method
o See 2. - Clarified between questions and question items and added further explanations and a column to Table 4
o Added number of pages to the beginning of section 3.2
o An export of the questionnaire, which also contains its structure, is published on Zenodo (see Engineering_RDM_Survey_Questionnaire.xlsx)
o Added more information on the survey's structure
o Added information that incomplete surveys were not taken into consideration
o Added information on average participation time
o Added description of data processing
o Added information about the “Not specified” option at the beginning of section 4.1
o Added review of the “Not specified” option in the Limitations section
o See 4. - Added disclaimer at the start of section 4 to emphasise the role of hypotheses as findings rather than facts, taking the small sample size into account
o We considered reporting results without interpretation before reporting interpretations, as this was the initial structure of the document. We found that this separation would cause a loss of connection between the raw results and the conclusions drawn. Hence, we chose to change the structure to the current one.
o Added text stating that dropout rates are not considered, as only complete surveys were used to evaluate the results.
8. Further Discussion and evaluation of the results
o Added paragraph in summary on temporal development of the relevance of the findings
o Changed a paragraph in the summary to better outline possible further research
o Reformulated the last paragraph
o Added explanation that the data sample might be too small for statistical analysis

Comment #117 Regine Gerike @ 2024-06-26 21:22

As the Topical Editor overseeing the peer review process for 'A survey on the dissemination and usage of research data management and related tools in German engineering sciences', I would like to extend my thanks to both reviewers for their detailed and constructive feedback. In line with their comments, we advise a major revision of the paper. We request that the authors thoughtfully incorporate this feedback and submit a revised manuscript clearly detailing the changes made. Demonstrating a careful revision underscores our commitment to quality and will significantly contribute to advancing the field.
We recognize that this review and revision process extends the project timeline. However, this emphasizes our preference for thoroughness over speed to maintain the manuscript's integrity and its contribution to the field.

Invited Review Comment #113 Christine Eisenmann @ 2024-06-06 02:32

The authors of the study analyzed the use of research data management (RDM) and related tools by researchers in Germany. A survey was conducted for this purpose. In my view, the topic is very relevant for the research community and thematically appropriate for the journal ing.grid. However, the manuscript has significant weaknesses in various parts. Due to the good fit with the journal I have decided to recommend a revision. However, I ask for a very comprehensive revision in order to meet the requirements of the journal. 

A general comment: so far the manuscript reads more like an activity report. I miss the specific research questions to which the manuscript is dedicated. These research questions should be formulated in the abstract and introduction and underpinned by the literature chapter. The methods section should explain the methods used to answer the questions and the results section should then highlight the results relevant to the research questions in a concise and statistically substantiated way and discuss them in greater depth. It is not advisable to show all possible results of the survey.  This general structure of a journal paper is important to bear in mind when revising.

Below, I have listed some points of critique, questions and ideas to improve the manuscript:

Abstract:

·        Research question and key findings are missing in the abstract. 

·        In my view, the number of questions in the survey is not a key information item.

 

Introduction:

·        Line 36 ff.: The authors refer to the NFDI4Ing archetype "Frank". From the description in the manuscript, I did not understand the function of the archetype and how it differs from other archetypes. If applicable, an explanatory graphic or similar can be included here.

·        The research questions should be derived and explained.

·        Line 52 ff.: The paragraph on the structure of the manuscript is empty of content, as no reference is made to the chapters' content and only these generic headings are mentioned.

 

Chapter related work:

·        It is not clear how Table 1 is to be read. This should be explained in the text.

·        Does Table 2 contain paper duplicates (i.e. papers that were found in several platforms)?

·        Line 72 ff.: The authors write that 23 papers remain after the selection process, but only 6 papers were chosen by the full text review. Why is that?

·        Chapter 2.2. is very long, as a single paragraph was written for each paper. It would be more useful for the reader if the results of the reviews were presented in a table. This would make it easier for the reader to understand which methods are used in the various papers, which authors arrive at comparable results, how the results differ, etc.

Chapter Methodology

·        Why was this target group selected for the survey?

·        How representative is the sample for the target group/population?

·        How high is the response rate in the survey?

·        Line 231: To me, 216 question items seem a lot for a survey. How long did it take the respondents to complete the survey? Please discuss in critical terms whether such a large questionnaire might lead to respondents no longer completing the questions carefully.

Chapter results: 

·        Some results were shown with one decimal place and some without. This should be standardized (my recommendation: without decimal places).

·        I recommend editing the graphics so that they convey the most important findings of the survey. It is often advisable not to answer each question individually, but rather to relate the information. For example, Figure 6 and Figure 7 are more interesting than Figure 4 and Figure 5 or the information from Figures 4 and 5 could be integrated into the more complex graphics.

·        Figure 10 ff.: the chosen colors are difficult to read.

Chapter Discussion

·        Have you drawn “ten conclusions” or “ten hypotheses”? That is a difference.

·        It is difficult to follow the discussion when the “hypotheses/conclusions” have been stated several pages before.

Chapter Summary and outlook

·        Based on my general comments this chapter should be comprehensively revised

Invited Review Comment #110 Anonymous @ 2024-06-03 02:56

Summary: Resubmit with revision

With its investigation of current practice in research data management (RDM) in the engineering sciences, this paper addresses an issue of high relevance that has not been sufficiently covered so far. The paper thus comes with strengths, but there are also shortcomings that need to be addressed before it is ready for publication.

Strengths

  • Given the increasing importance of data management, the study is highly relevant. It addresses a critical need for improved RDM practices in the engineering sciences and is also one of the first studies to use a survey to provide deeper insights into the status quo in Germany.
  • The literature is properly screened and relevant references are reported.
  • The findings provide valuable insights into the barriers to effective RDM, such as the lack of clear guidelines or insufficient support structures. These insights are crucial for developing targeted interventions to enhance RDM adoption.
  • The authors have made the dataset publicly available to promote transparency and enable further research.

Weaknesses

  • Unclear classification and consideration of sample size and specific focus
    • L204ff.: The sample looks very focused on RWTH Aachen and derived from the NFDI4Ing community cluster 41 and the archetype Frank. This means that only one specific part of the engineering sciences is addressed, not the entire engineering landscape in Germany. There is also a risk here that the sample is biased. However, this point is not discussed any further in the paper. The concept for recruitment needs more explanation.
    • With this in mind, the sample size of 168 net respondents also seems too small to make statements for the engineering sciences in Germany as a whole. For me, it has more the character of an initial (important) exploratory assessment and I do wonder whether the exploratory nature of part of the engineering sciences should not be emphasized even more clearly?
  • The structure of the questionnaire is not sufficiently explained
    • L230ff.: Against the background of a total of 216 (!) questions, it is not clear from the description whether and which challenges have emerged from this large number of questions.
  • Integration of the hypotheses as core statements in the text
    • L260ff.: The emphasis on the role of hypotheses in the text seemed somewhat incomprehensible to me and it was difficult to maintain an overview here between the continuous text and the hypotheses.
    • L473ff.: 5 of the 10 hypotheses are derived primarily from the comments. However, these were only stated by 23% of the interviewees but the hypotheses are transferred linguistically to the entire context (the interviewees...). Here, too, the statements are not placed in the context of the sample size.

Major Suggestions

  • Consideration and discussion of the sample size and specific focus
    • The sample size and specific focus should be well considered in all steps, from the description of the methodology to the discussion and conclusion sections. There are partly very strong statements based on such a small (and possibly biased) sample. My suggestion would be to be a bit more modest, e.g. “This study gives first indications”, and it is important to really only list conclusions that can be drawn from the data collected in this study.
    • The concept for recruitment needs more explanation, e.g.: What is the basic population in engineering sciences? (in Germany ?) Or a sub-group (and if yes which sub-group)? How have potential respondents been approached? How big are the different groups in figure 1 in the basic population and how does this compare to your sample?

Further Suggestions

  • Revision of the graphics:
    • It would be good to check for alternative color schemes for the figures that are clearer. Furthermore “n=##” – the number of cases should be provided in the figure.
    • L243ff.: It would be good to add the specific question to each of the figures in order to be precise in what is shown in each figure.
  • It would be better to synthesise the insights from previous studies and to organize the literature section along these main insights, summarizing similarities and also differences in methods and findings from previous studies – and not just report reference by reference.
  • Explanation of the questionnaire and method
    • With 216 questions, the survey is very comprehensive. The structure and functionality of the questionnaire should be better explained. Although the structure of the questionnaire has already been published on Zenodo, it would be good to publish the questionnaire itself. If the questionnaire cannot be published, more information about the structure should be provided.
    • Data processing should also be described better. There seem to be many missing values for some questions. How do you deal with this? Dropout rates should also be looked at and analysed carefully for all parts of the questionnaire (e.g. have there been specific questions where respondents dropped out).
  • A better integration of the hypotheses as core statements in the paper would be helpful, e.g. some explanation of what these numbered sentences mean. The authors might also consider first only reporting results without interpretation (result section) and then reporting interpretations/comparison with previous studies etc. (discussion section).
  • Further Discussion and evaluation of the results
    • The survey on RDM practice was conducted in 2020. It would be good to discuss (e.g. in the outlook for further research) what changes might have occurred since then and which research needs result from these possible dynamics in RDM practices.
    • L599f.: I would be careful with a complete statistical analysis of the data from this sample. My impression is that the descriptive methods fit well to the data.

Metadata
  • Published: 2024-02-28
  • Last Updated: 2024-02-28
  • License: Creative Commons Attribution 4.0
  • Subjects: Data Governance, Data Infrastructure, Data Literacy, Data Management Software
  • Keywords: Research data management, RDM, Survey, Dissemination, Usage of Research Data Management