This is a Preprint and has not been peer reviewed. This is version 3 of this Preprint.
Research data management (RDM) in academic scientific environments is increasingly recognized
as an important part of good scientific practice and as a topic with great potential for saving
time and money. Nevertheless, there is a shortage of appropriate tools that fulfill the
specific requirements of scientific research. We identified where the requirements in science
deviate from those in other fields and proposed a list of features that RDM software should provide
to become a viable option.
Finally, we analyzed the open-source research data management system (RDMS) CaosDB for compatibility
with the proposed features and found that it fulfills the requirements.
Data Management Software
Data Management, Research Data Management, Agile Data Management, Software Tools, FAIR Data, Good Scientific Practice
Published: 2023-03-28 13:12
Invited Review Comment #24 Torsten Bronger @ 2023-04-28 16:28
I fully agree with the premise of the article that data management tools must have a flexible data model and be practical to use. Moreover, I understand that CaosDB provides such a data model and good functionality for searching it, which is an important use case. I am convinced that CaosDB fulfils its requirements and is good software per se.
Still, I don’t really understand which niche CaosDB is aiming at. I see a huge functional overlap with ELNs (electronic lab notebooks). In fact, I consider CaosDB an ELN (without judging its quality as such). However, the authors clearly distinguish between the terms ELN and RDMS. This may make sense, but I don’t see an explanation that sets RDMSs apart from ELNs.
Be that as it may, these days an institute typically deploys an instance of an ELN and manages its data with that ELN. ELNs can provide custom fields per record, can automatically import experimental data via crawlers, and can search the data (without SQL). In my experience, this all works decently for most users.
Again, CaosDB may well be a fine tool for RDM, but as a reader I have difficulty understanding its specific “short-term advantages” (quote from the article). If it really is a different beast from an ELN, a sort of umbrella for other ELNs and data sources, what is the benefit for the researcher?
Minor remarks
Line 189: Please add a note that this ELN integration is in a very early phase of development.
Figure 4: I think some readers would want to know why MariaDB was chosen for the backend rather than MongoDB. The latter is much closer to CaosDB’s memory model after all – just some restrictions would have to be imposed on the application level.
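To illustrate the trade-off behind this remark, the following minimal Python sketch stores one record with free-form properties in two ways: as a single nested document (the layout a document database such as MongoDB uses) and as entity-attribute-value (EAV) rows in a relational table, here using SQLite from the standard library. The table names, field names, and the EAV layout itself are hypothetical assumptions chosen for illustration; they do not describe CaosDB's actual MariaDB schema.

```python
import json
import sqlite3

# Hypothetical record with flexible, per-record properties
# (illustrative only; not CaosDB's actual data format).
record = {
    "id": 1,
    "parent": "Experiment",
    "properties": {"temperature": 293.15, "operator": "A. Smith", "sample": 42},
}

# Document view: the whole record is kept as one nested document.
document = json.dumps(record)
doc_ids = [
    r["id"]
    for r in [json.loads(document)]
    if r["parent"] == "Experiment" and r["properties"].get("temperature", 0) > 290
]

# Relational view: flexible properties are flattened into
# entity-attribute-value rows, one row per property (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, parent TEXT)")
conn.execute("CREATE TABLE properties (entity_id INTEGER, name TEXT, value TEXT)")
conn.execute("INSERT INTO entities VALUES (?, ?)", (record["id"], record["parent"]))
conn.executemany(
    "INSERT INTO properties VALUES (?, ?, ?)",
    [(record["id"], name, str(value)) for name, value in record["properties"].items()],
)

# The same search needs a join and an explicit type cast in the EAV layout.
sql_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT e.id FROM entities e
        JOIN properties p ON p.entity_id = e.id
        WHERE e.parent = 'Experiment' AND p.name = 'temperature'
          AND CAST(p.value AS REAL) > 290
        """
    )
]
conn.close()
print(doc_ids, sql_ids)
```

Both searches find the same record; the difference is that the document layout mirrors the flexible record directly, while the relational layout needs joins and casts that an application layer must hide from users.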
Lines 230–239 look odd to me. They introduce a new issue, whereas the conclusions should wrap up the article. I assume it is meant as an outlook, but then, it should be introduced as such.
Invited Review Comment #21 Marius Politze @ 2023-04-20 14:40
Content:
The article "Agile Research Data Management with Open Source: CaosDB" presents a requirements analysis process for "CaosDB", a research data management database system that links research data and stores research metadata using mostly unstructured records with flexible properties. After a short introduction to the topic of RDM, the authors propose a schematic, exemplary research data life cycle and derive the main requirements for their software in section 2. To cover these requirements, the authors derive in section 3 a set of features that their software should provide. Section 4 discusses architectural and structural decisions for CaosDB and very briefly points out some prominent features. The article closes with a brief conclusion and appendices that reference the source code and compare a SPARQL query with a CaosDB query.
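The data model summarized here, records whose properties are not fixed in advance by a rigid schema, can be pictured with a minimal Python sketch. The class names echo CaosDB's RecordType terminology, but `RecordType`, `Record`, `missing()`, and `find()` are hypothetical stand-ins, not the API of CaosDB's Python client; the check for obligatory properties is an assumed illustration of a loosely enforced schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordType:
    """Hypothetical stand-in for a CaosDB-style RecordType."""
    name: str
    obligatory: set[str] = field(default_factory=set)


@dataclass
class Record:
    """A record with a parent type and an open-ended set of properties."""
    parent: RecordType
    properties: dict[str, Any] = field(default_factory=dict)

    def missing(self) -> set[str]:
        # Obligatory properties of the parent type that this record lacks.
        return self.parent.obligatory - set(self.properties)


def find(records: list[Record], type_name: str, **conditions: Any) -> list[Record]:
    """Return records of the given type whose properties equal all conditions."""
    return [
        r
        for r in records
        if r.parent.name == type_name
        and all(r.properties.get(k) == v for k, v in conditions.items())
    ]


experiment = RecordType("Experiment", obligatory={"date"})
r1 = Record(experiment, {"date": "2023-03-28", "operator": "A. Smith"})
# An ad-hoc property is accepted without changing the type definition.
r2 = Record(experiment, {"operator": "B. Jones", "laser_power_mW": 5})

print(r2.missing())
print([r.properties["operator"] for r in find([r1, r2], "Experiment", operator="B. Jones")])
```

In this sketch the type only flags what is missing instead of rejecting the record, which is one possible reading of "agile" data management; a real system may enforce its schema differently.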
Overall Evaluation:
The article is well written and understandable. However, it gives me the impression of being more a tool presentation than a scientific article. While this is not uncommon in the field of RDM, I still miss at least a critical evaluation with respect to the following:
Relevance:
In my experience, simplified queries and a certain agility in the data models are a non-negligible factor in users' acceptance of RDM metadata storage systems. CaosDB is not the first and certainly not the last system trying to cover this area. The article presents CaosDB as an example of research software engineering, where a product developed by a group of researchers eventually emerges as a tool for a larger, cross-disciplinary community. The article also shows what I see as some of the major challenges: the adoption of standards and of common research data infrastructures for interoperability and for mid- to long-term availability as the tools grow. Technology-wise, other approaches such as document-oriented databases (Elasticsearch/OpenSearch), data migrations for RDMS, or schema- or profile-based (linked data) metadata stores are widely available.
Presentation quality:
Major Improvements:
RecordTypes
Minor Improvements: