Betty’s (Re)Search Engine: A client-based search engine for research software stored in repositories.

This is a Preprint and has not been peer reviewed. A published version of this Preprint is available on ing.grid. This is version 3 of this Preprint.

Authors

Vasiliy Seibert, Andreas Rausch, Stefan Wittek

Abstract

Promoting research without providing the source code that was used to conduct it means greater effort for every researcher down the line. Existing solutions that aim to make research software FAIR [1] fail to provide a complete solution, for they do not sufficiently consider already existing research software stored on platforms like GitHub or organizational GitLab instances. We therefore present Betty’s (Re)Search Engine, a client-based implementation of a cascading search process that first finds research software stored on platforms like GitHub and then links it to corresponding publications or entries in third-party databases. We evaluated 400 random search results from the domain of ecology and found that 345 out of 400 repositories referenced a corresponding publication or an entry in a third-party database, which clearly indicates the potential of the cascading search. Betty’s (Re)Search Engine is live and openly available at this URL: http://nfdi4ing.rz-housing.tu-clausthal.de/
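
A minimal sketch of what such a cascading search can look like, assuming the public GitHub REST API and a DOI mentioned in a repository's README as the link to a publication (the function names and the DOI heuristic are illustrative assumptions, not BRE's actual implementation):

// Illustrative sketch only, not BRE's source code: a two-stage cascade against
// the public GitHub REST API. Stage 1 searches for candidate repositories;
// stage 2 tries to link each hit to a publication via a DOI in its README.

const GITHUB_API = "https://api.github.com";

// Commonly used DOI pattern; real linking logic would need to be more robust.
const DOI_PATTERN = /10\.\d{4,9}\/[-._;()\/:a-z0-9]+/i;

interface RepoLink {
  repository: string; // "owner/name"
  doi: string | null; // DOI found in the README, if any
}

// Stage 1: find candidate research-software repositories for a search term.
async function searchRepositories(query: string, token: string): Promise<string[]> {
  const res = await fetch(
    `${GITHUB_API}/search/repositories?q=${encodeURIComponent(query)}&per_page=10`,
    { headers: { Accept: "application/vnd.github+json", Authorization: `Bearer ${token}` } }
  );
  const data = await res.json();
  return data.items.map((item: { full_name: string }) => item.full_name);
}

// Stage 2: look for a reference to a publication in the repository's README.
async function linkToPublication(fullName: string, token: string): Promise<RepoLink> {
  const res = await fetch(`${GITHUB_API}/repos/${fullName}/readme`, {
    headers: { Accept: "application/vnd.github.raw+json", Authorization: `Bearer ${token}` },
  });
  const readme = res.ok ? await res.text() : "";
  const match = readme.match(DOI_PATTERN);
  return { repository: fullName, doi: match ? match[0] : null };
}

// Cascade: search first, then try to link every hit in parallel.
async function cascadingSearch(query: string, token: string): Promise<RepoLink[]> {
  const repos = await searchRepositories(query, token);
  return Promise.all(repos.map((r) => linkToPublication(r, token)));
}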

Comments

Comment #106 Ulrike Küsters @ 2024-04-03 06:34

As the Topical Editor overseeing the peer review process for "Betty’s (Re)Search Engine: A client-based search engine for research software stored in repositories," I am pleased to announce the acceptance of this paper for publication in our journal. This decision comes after careful consideration of the insightful feedback and constructive criticism provided by our esteemed reviewers, whose expertise has significantly contributed to the refinement of this manuscript.
The review process, while lengthier than anticipated, has served as a valuable learning experience for all involved, and I would like to extend my profound thanks to the authors, Vasiliy Seibert, Andreas Rausch, and Stefan Wittek, for their dedication in addressing the concerns raised during the review process.

Comment #98 Vasiliy Seibert @ 2024-03-04 04:22

Thank you, Mohammadreza Tavakoli, for your insight. I will try to address your concerns. Our paper describes a novel approach that aims to make research software more FAIR. Since it is client-based and the search process is initiated on platforms (like GitHub) where software repositories are stored, we believe that the title "Betty’s (Re)Search Engine: A client-based search engine for research software stored in repositories." describes the contents of this paper just fine. We chose to reiterate the problems described in [2] and [4] to introduce the reader to the challenges we are trying to address. [2] and [4] are therefore merely examples we use to give different perspectives on the same challenge.

Staying consistent with our references, we chose [4] to demonstrate the effectiveness of BRE. As for the "expectations of the reader", this paper delivers exactly what it promises in the abstract. We describe the problem scenario, reference different perspectives on the challenge, explain our approach in great detail, provide an evaluation that gives a realistic insight into what the user can expect, and eventually we discuss, reflect, and conclude that more work needs to be done.

Maybe you could help us understand what your concrete expectations were and where this paper failed to deliver? Conducting an evaluation that is representative of the entire domain of research software stored on GitHub is well beyond the scope of this paper, though it could be addressed in a follow-up paper.

Invited Review Comment #97 Mohammadreza Tavakoli @ 2024-02-28 05:31

Thanks for your response to the reviews. I do not agree with some of the mentioned points, so I will try to clarify the issues in the following parts.


Discussion-1

***

"Is there any specific reason for choosing the area of ecology for the evaluation?" - yes there is. With this evaluation we address the problems discussed in [4]. By evaluating 400 software repositories from the domain of ecology we stay consistent with our references.


If your solution is not general and applicable to other fields (which could be an expectation according to how you have written the paper and its objectives), then you need to revise the title of your paper, as [4] has done: "Low availability of code in ecology: A call for urgent action"

***


Discussion-2

***

"As the research standards in different areas can be totally different, choosing another area might affect the outcome. So, it is highly recommended to check a few other areas and report them in the paper" - yes it might. However conducting an evaluation that is representative for the entire domain of research software stored on GitHub is well beyond the scope of this paper.


Again, I would try to adjust the expectations of readers, because when I read your abstract and introduction, this does not seem beyond the scope of the paper. It seems you are trying to find a general solution, because you describe the target issues in general terms. To summarize, although I agree that this paper can be a good start, some parts of the paper (especially the abstract and introduction) should be more focused on its actual scope.

***


Discussion-3

***

"One more limitation which should be mentioned as the limitation of the paper is that the system is not easy-to-use for individual researchers, because, for example, they need to know about the Github key, create one, and add it to the tool." Using a GitHub authentication token is a prerequisite for using BRE. We do not consider this to be a limitation but rather a requirement that we adequately name and provide references to. If this paper would describe a python package, the prerequisite of knowing how to code in python would not be considered as a major disadvantage to the approach in general.


For me, it depends on the target audience. For example, if you want to resolve an issue in the area of nursing with a Python package (assuming that the nurses have no programming knowledge), then knowing Python programming can be a huge barrier (and not a simple prerequisite). However, I consider this only a limitation that should be mentioned and discussed.

***
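
As a minimal illustration of what this prerequisite amounts to in practice (assuming GitHub's REST search endpoint; the function name and error handling are hypothetical, not BRE's code): the user creates a personal access token once in the GitHub account settings, pastes it into the client, and the client sends it with every API request. Without a token, GitHub applies much stricter rate limits to unauthenticated requests, which is impractical for a search-heavy client.

// Illustrative only: what "adding a GitHub key to the tool" involves.
// The token is created manually in the GitHub account settings and supplied
// by the user; the client sends it in the Authorization header.
async function authenticatedSearch(query: string, token: string) {
  const res = await fetch(
    `https://api.github.com/search/repositories?q=${encodeURIComponent(query)}`,
    {
      headers: {
        Accept: "application/vnd.github+json",
        Authorization: `Bearer ${token}`, // the user-supplied personal access token
      },
    }
  );
  if (res.status === 401) throw new Error("Invalid or expired GitHub token");
  if (res.status === 403) throw new Error("Request forbidden, possibly rate-limited");
  // Remaining quota is reported in the x-ratelimit-remaining response header.
  return res.json();
}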


All in all, I find discussions 1 and 2 critical for publishing the paper, so I would like to know the authors' response to them.

Invited Review Comment #96 Martin Maga @ 2024-02-15 03:25

I have no concerns and recommend the paper for publication. :)

Comment #92 Ulrike Küsters @ 2024-02-08 04:46

I would like to extend my thanks to both reviewers for their detailed and constructive feedback. In line with their insights and following the managing editor's recommendation, we advise a revision of 'Betty’s (Re)Search Engine.' We request that the authors thoughtfully incorporate this feedback and submit a revised manuscript clearly detailing the changes made. Specifically, please consider and discuss the suggestions regarding evaluation scope, in-depth analysis of false negatives, and system usability. Demonstrating a careful revision underscores our commitment to quality and will significantly contribute to advancing the field.

We recognize that this review and revision process extends the project timeline. However, it reflects our preference for thoroughness over speed, in order to maintain the manuscript's integrity and its contribution to the field.

Invited Review Comment #90 Mohammadreza Tavakoli @ 2024-02-01 04:46

Summary

The manuscript is about developing a client-based research engine (called Betty) which aims at linking publications with their research software stored on development platforms like GitHub. The tool has been evaluated on 400 random repositories in the area of ecology.

 

Relevance

The paper is completely relevant to the ing.grid journal, as it facilitates the process of finding available software repositories for publications, which promotes the open-access principle.


Grounding

The related literature has been reviewed well, and readers can understand the aim of the paper in a meaningful way.


Methods and Results

The proposed method makes sense, and the way the authors describe it is easy to follow. However, the logic of the evaluation could be improved. For instance:

- Is there any specific reason for choosing the area of ecology for the evaluation? As research standards in different areas can be totally different, choosing another area might affect the outcome. So, it is highly recommended to check a few other areas and report the results in the paper.

- The false negatives of the approach need attention; however, in my view, the importance of improving them is not addressed properly (accuracy improvement is only mentioned in the paper in general terms). Assume you use this tool and receive something like a “not found” result; you, as a researcher, would probably not search for the software somewhere else, which can be a big issue.

- One more limitation that should be mentioned in the paper is that the system is not easy to use for individual researchers because, for example, they need to know about the GitHub key, create one, and add it to the tool.


Implications

As mentioned in the paper, the results show that working on such research problems is promising and needed for reproducible, open research.


Minor suggestions

  • Maybe adding a summary of the evaluation outcome would be helpful for readers.

  • The first paragraph of section 2 is a repetition.

  • Typo “wich” in the first paragraph of section 2.

  • Typo “Adress” in section 3.

  • “weren’t” in section 4 (better not to use the contracted form).

  • The commas before/after “that” could be removed (e.g., second sentence of section 4).

  • “don’t” in section 6 (better not to use the contracted form).

Invited Review Comment #81 Martin Maga @ 2023-12-21 08:33

In their paper Betty's (Re)Search Engine, the authors Seibert, Rausch and Wittek present an innovative solution to the problem of consistent implementation of the FAIR principles. 

The authors convincingly argue that previous approaches to implementing the FAIR principles do not fully work on closer inspection.

Betty's, the presented client-side solution of a cascading, parallel search process that links software repositories (like GitLab) with external publication databases, is convincing as a concept.

Betty’s accuracy of around 35% can, of course, still be improved. In the current concept phase, however, this is by no means an exclusion criterion, especially given the design decision to let the search engine run entirely in the browser on the client side, which should be emphasized positively.

Interesting follow-up questions would include hybrid methods using rule-based and statistical (AI-based) approaches.

All in all, the presented approach is innovative and convincing.


Metadata
  • Published: 2023-03-09
  • Last Updated: 2024-02-15
  • License: Creative Commons Attribution 4.0
  • Subjects: Data Management Software
  • Keywords: Inggrid, Data, Research Data Management, research software