Serghei Mangul edited this page Nov 1, 2018 · 19 revisions

This project contains links to the datasets and the code used in our study.

Data

Archival stability

We downloaded open-access papers via PubMed from 10 systems and computational biology journals. Raw data in XML format is available here. Our approach to extracting software links from the downloaded papers and verifying their archival stability is described in the Methods section of the paper. Links that timed out were verified manually.

Links extracted from the abstracts and the body of the surveyed papers (n=48,393) are available in CSV format here. The CSV file contains the following fields:

  • type: abstract (extracted from Abstract) or body (extracted from the body of the paper)
  • Journal: Name of the journal
  • Year: Year the paper was published
  • Link
  • Code: HTTP status. 0-300: success; 300-400: redirection; 400: broken link; -1: timeout. See more details here
  • Flag.uniqueness: indicates whether the link was present in one paper or was shared across multiple papers.
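
The values of the Code field can be grouped into the categories listed above. The following is a minimal sketch in Python, not the code used in the study. It assumes the ranges are half-open (0 up to but not including 300, and so on) and that any value of 400 or more counts as a broken link; the function name and the sample values are made up for illustration.

```python
from collections import Counter


def classify_status(code: int) -> str:
    """Map a value of the Code field to a coarse category.

    Assumption: ranges are half-open and every code >= 400 is 'broken'.
    """
    if code == -1:
        return "timeout"
    if 0 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirection"
    if code >= 400:
        return "broken"
    return "unknown"  # any other negative value is not described in the field list


# Hypothetical sample values, not taken from the dataset.
sample_codes = [200, 301, 404, -1, 200]
counts = Counter(classify_status(c) for c in sample_codes)
print(counts)
```

The same function could be applied to the Code column after loading the CSV file with any CSV reader.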

Usability

We randomly selected 99 tools across various domains of computational biology. The methodology used to select the tools and the list of domains are presented in the Methods section of the paper.

Information about the usability of the 99 tools is presented in CSV format here. The CSV file contains the following fields:

  • toolID
  • Name of the package manager from which the tool was available, or NA if the tool was not available via a package manager
  • Number of citations per year
  • Number of executed commands during the installation process
  • Number of commands suggested in the manual for installation
  • The proportion of undocumented commands
  • Binary flag to indicate if the tool passed the automatic installation test. Tools that require no manual intervention are considered to pass the automatic installation test.
  • Total installation time
  • Flag to indicate how easy it was to install the software tool. We categorized a tool as ‘easy to install’ if it could be installed in 15 minutes or less; ‘complex installation’ if it required more than 15 minutes but was successfully installed before the two-hour limit; and ‘not installed’ if the tool could not be successfully installed within two hours
  • Binary flag to indicate if the example dataset was provided
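
The three installation categories described above can be expressed as a small function. This is an illustrative sketch rather than the code used in the study; the function name, the use of minutes as the unit, and the separate `installed` argument are assumptions.

```python
def categorize_install(minutes: float, installed: bool) -> str:
    """Return the installation category for one tool.

    Assumptions: time is measured in minutes, and `installed` is True
    only if the installation eventually succeeded.
    """
    two_hours = 120
    if not installed or minutes > two_hours:
        return "not installed"
    if minutes <= 15:
        return "easy to install"
    return "complex installation"


# Hypothetical examples, not taken from the dataset.
print(categorize_install(10, True))
print(categorize_install(45, True))
print(categorize_install(120, False))
```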

Reproducing results with Jupyter notebooks (in progress)

We have shared Jupyter notebooks and the raw data needed to reproduce the results and figures from the manuscript.

How to cite this study

Mangul, Serghei, et al. "A comprehensive analysis of the usability and archival stability of omics computational tools and resources." bioRxiv, doi: https://doi.org/10.1101/452532

Contact

Please do not hesitate to contact us ([email protected], [email protected], [email protected]) if you have any comments, suggestions, or clarification requests regarding the study or if you would like to contribute to this resource.
