plotID - a toolkit for connecting research data and visualization

This is a Preprint and has not been peer reviewed. A published version of this Preprint is available on ing.grid. This is version 6 of this Preprint.

Authors

Martin Hock, Hannes Mayr, Manuela Richter, Jan Lemmer, Peter Pelz

Abstract

The largest share of the information published in a paper is contained in visualizations such as 2D or 3D plots. Supporting a generic research workflow, plotID provides tools that can a) create and anchor a reference (ID code, URL, ...) for a figure and b) package the figure together with the data, code and parameters used to create it. The code is provided as a set of tools with small impact that need to be used consciously by the researcher; it does not aim to relieve researchers of their duty to keep their digital working environment organized. The exported packages help to make results reusable and repeatable. The initial implementation was created in Matlab and used internally before the tool was rewritten in the Python programming language for easier distribution and adaptation to diverse environments.
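The two steps named in the abstract, creating an ID and packaging the files that belong to a figure, can be sketched as follows. This is a minimal illustration using only the Python standard library and is not plotID's actual API; the function names `create_id` and `package`, the time-based ID scheme and the folder layout are assumptions made for this example.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path


def create_id(prefix: str = "") -> str:
    """Return an ID built from the current Unix time in hexadecimal.

    Hypothetical scheme for illustration; any unique string would do.
    """
    return f"{prefix}{int(time.time()):X}"


def package(figure_id: str, files: list[Path], dest_root: Path) -> Path:
    """Copy every file that belongs to one figure into a folder named after its ID."""
    target = dest_root / figure_id
    # Fail loudly if the ID was already used, so no package is overwritten.
    target.mkdir(parents=True, exist_ok=False)
    for src in files:
        shutil.copy2(src, target / src.name)
    return target
```

A call such as `package(create_id("demo-"), [Path("plot.py"), Path("data.csv")], Path("export"))` would then produce one folder per figure, named after the ID that is also shown on or stored with the figure.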

Comments

Comment #18 Kevin Logan @ 2023-03-15 03:24

I thank the authors for implementing the corrections recommended by the reviewers. Minor linguistic errors should be corrected during the copy editing step.

I would like to make one suggestion regarding the reference [16], Lambrecht et al. The authors may consider the latest work regarding FAIR principles for research software, which is also an RDA recommendation. The text is publicly available here: https://doi.org/10.15497/RDA00068.

Other than that, I would like to ask the authors to submit their LaTeX project folder in order for it to be processed for publication in the ing.grid journal.

Comment #17 Martin Hock @ 2023-03-09 02:50

I want to thank both reviewers again for their detailed remarks.
In addition to revisiting all of the noted issues concerning spelling, wording and grammar I have:
- expanded the references for FAIR, PyPI, pip and SemVer
- streamlined the formatting of common terms and product names (Open Source, Python,...)
- rearranged figure 3
- updated section 4 to represent the current state of the software
- added explanation about remote data in section 4
- rewritten the first sentences of the conclusion for clarity

Comment #16 Kevin Logan @ 2023-03-02 09:36

I would like to thank both reviewers for their feedback regarding the updated version. Following the reviewers' recommendation, I agree that the submission can be published. I advise the authors to address the minor issues highlighted by the reviewers and upload an updated version for publication.

Invited Review Comment #12 Jane Wyngaard @ 2023-01-18 12:03

All primary comments addressed, minor text and grammar issues highlighted

Recommend for publication

Highlighted issues:
Line 31: "Last sentence needs a rewrite"

Line 38: "stays"? replace with "remains"
Line 45: "programming"? replace with "code" or "software"

Section 3:

Put level of use/popularity of Matlab and Python into global research use context
FAIR must be defined

Section 4.2.2: Add a comment on whether there are any plans to accommodate web-hosted/remote data.

E.g., if the data is at a URL, how would PlotID handle that?

Section 8: 1st two sentences are awkward/could be rewritten

Invited Review Comment #11 Bernd Flemisch @ 2022-12-22 23:59

Thank you for revising the manuscript! It looks like all my comments have been addressed properly.

For me, this is good to go in terms of content. Before the actual publication, please check again for
- linguistic correctness. There are still miswordings such as "The ID on in the published paper" in line 82.
- consistency in capitalization. For example, "python" is sometimes capitalized, sometimes not.
- readability of Figure 3. Currently, one has to zoom in too much. This might be improved by putting the four blocks "tagplot()"... below each other.

Comment #9 Martin Hock @ 2022-11-15 07:54

The authors wish to thank both the reviewers for their detailed reviews, advice and points for improvement.
The comments resulted in many changes to both the descriptor and the repository. A new version of the submission has been uploaded. A few points have been added as issues to the repository, to be implemented in the near future.
The changes include:
- Rewrite of the abstract to be more concise in detailing the posed problem and the steps for a solution.
- Added to the statement of need, to better explain the motivation.
- Comparison with similar software: Added general comparisons about the main functions and goals of plotID after an investigation into a wide variety of software used for research data management.
- We made sure to mention the current limitation to Python and matplotlib or picture files earlier and more plainly. Breaking free of these limitations is a future aim of the project.
- The implementation section was supported with a system architecture diagram.
- We decided against listing all Python dependencies; we will add this to the official documentation.
- The example code was provided with a figure showing the resulting folder and picture including the ID.
- We discussed separate installation instructions, but since only the first, optional step diverges, we emphasized this difference instead. Should more steps diverge, we will create separate instructions.
- Added a 'CONTRIBUTING' file containing instructions to the repository, and added a short section with a reference to the readme.
- Added a section in the readme on how to install optional dependencies, which doubles as instructions for a development environment.
- Added a feature request about not printing the ID to the repository. This should be implemented within the next few weeks.
- Added a feature request to add the ID as metadata to picture files and as an additional text file.
- Started investigations on how to add metadata in a generalized way to Python objects and export them.
- Added more details about the plans to export the Python environment as a requirements file.
- Adjusted many small errors in wording and formatting.

Comment #7 Kevin Logan @ 2022-10-14 01:17

As the responsible topical editor, I would like to thank both reviewers for their detailed and constructive feedback. After consideration of the comments, I advise the authors to revise both the descriptor and the repository according to the suggestions provided in the reviews. After completion, the new version of the descriptor including the revisions may be submitted by the authors for further consideration.

Invited Review Comment #3 Bernd Flemisch @ 2022-10-04 11:53

Overall assessment:

The authors present a Python tool for equipping a plot or, more generally, an image with a unique ID. While the idea is certainly interesting, the manuscript would profit from a clearer motivation and a more careful presentation. I would appreciate it if my comments below were considered before publication.

Detailed comments:

- While I can imagine that IDs for plots can be useful, I would welcome a more elaborate motivation, for example, by means of specific use cases. I would also be very interested in the envisioned interplay of several PIDs for, e.g., paper, data, code and figures.

- Section 3: I would rather have expected a description of the code architecture and its dependencies. I think that there is no need for justifying the usage of Python. Moreover, the license information belongs rather to Section 6.

- Line 101: "everything necessary to recreate the visualization from scratch". I doubt that; usually more will be necessary than just the script. At least the result data, possibly much more. Please provide a more extensive discussion of what should be done in a more general case or point to the (improved) description in Section 5.2.

- Section 5.1.3: It seems to be mandatory that the ID is displayed together with the plot. I could imagine that this is not desired in some cases. Would it also be OK to add the ID to the figure's metadata? Please elaborate on this, maybe include the option to add it to the metadata.

- Section 5.2: It's currently hard to put the individual "pieces" 5.2.1-3 together to get the full picture. Please organize the section in a better way.

- Section 5.2.1: I wonder if parsing the import statements is really enough to achieve reproducibility. For this, the employed versions should also be captured. Please elaborate on this.

- It would be good to add a specific example with a figure that is put in the paper. Could be taken from the repo, but maybe even Figure 1 qualifies?
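To illustrate the Section 5.2.1 comment above about capturing versions alongside parsed import statements, here is a minimal sketch using only the Python standard library. It is not plotID's implementation; the function names `imported_modules` and `pinned_requirements` are hypothetical, and the sketch assumes that the import name equals the installed distribution name, which does not hold for every package.

```python
from __future__ import annotations

import ast
from importlib import metadata


def imported_modules(source: str) -> set[str]:
    """Collect top-level module names from absolute import statements."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            modules.add(node.module.split(".")[0])
    return modules


def pinned_requirements(modules: set[str]) -> list[str]:
    """Return 'name==version' lines for modules with an installed distribution."""
    lines = []
    for name in sorted(modules):
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Standard-library modules, or import names that differ
            # from the distribution name, are skipped in this sketch.
            continue
    return lines
```

Writing the returned lines to a requirements file records which versions were installed when the figure was created. Modules that are skipped would need a separate mapping from import name to distribution name.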

Small issues:

- Abstract, line 2: "and or", please choose one.

- Abstract, l.3f: Something seems to be wrong or at least not easy to understand with a) and b). I think that you mean this: "a) create and anchor a reference (ID code, URL,...) for THE FIGURE and b) package figures, data, code and parameters used to create IT." It still sounds odd to "package figures ... used to create a figure". Maybe one can simply omit the "figures," here.

- Abstract, line 4ff: "as tools" and "does" don't fit together.

- L.13f: A closing parenthesis is missing and the next sentence starts without a dot before. I suggest rather "repositories, see ... [6]. Labelling ...".

- L.28: Delete the "on".

- L.36: "even BE shipped".

- Section 5: If a normal word is the first word of a section heading, it should be capitalized.

- Section 5.1.3: "Type of string" should rather be "Of type string".

Invited Review Comment #2 Jane Wyngaard @ 2022-09-21 07:54

PlotID appears to be a useful tool

Some minor modifications to the descriptor submission and public repository would increase its reuse and impact.

Descriptor:
Abstract should be re-written; it's poorly structured, making it difficult for a reader to quickly understand what is being reported on. Further, highlights of PlotID-specific features should be included.

The statement of need currently also serves to report on related work. This is limited to naming one other software package. This package's functionality and limitations, along with other similar efforts to fill the same need, should be elaborated on in more detail so as to offer a fuller comparison with PlotID.

The fact that this tool is Python specific should be pointed to from the start rather than being left ambiguous through the use of the general term 'scripts'. This is a tool that specifically allows for integrated tagging and version control of Python generated plots for a range of data types such as can be visualised with matplotlib. These facts and other system requirements and features should be highlighted earlier in the description.

A reader would gain more from reviewing the provided example code in section 4 if it were placed after the contents of section 5.

Section 5 would be more readable if a system architecture diagram was provided.

Repository:
* Separation of Unix and Windows installation instructions into separate subsections would be more efficient for a reader
* Contributing: Are pull requests and issues welcome? How would a contributor set up their development environment?

Metadata

Published: 2022-09-05
Last Updated: 2023-03-17
License: Creative Commons Attribution 4.0
Subjects: Data Management Software
Keywords: research data management, visualization, figure, plot, mapping, referencing, reference, ID, organisation