Artifact Evaluation

Help others to build upon the technical contributions of your article!

The Artifact Evaluation (AE) process is a service provided by the Programming Journal to help authors of accepted articles (including articles accepted subject to minor revisions) extend the reach of their work so that future researchers can build on and compare with it. Artifacts can take any form (implementations, data, analysis results). The Artifact Evaluation Committee (AEC) will read the article and explore the artifact to give feedback on how well the artifact supports the article.

Submissions to AE are voluntary. Articles evaluated positively in the AE process will receive badges to be included on the first page of the article. By default, we expect artifacts to be available publicly (for details and exceptions see below). We will publish and archive the artifact alongside the article via Zenodo as part of the Zenodo ‹Programming› community. Articles will include references to their artifacts, and artifacts will include references to the corresponding articles.

Two badges are available: Available and Supports Claims (details below).

Timeline and Submission Process

Please consult the journal’s submission timeline for the timeline of the artifact evaluation.

Please consult the artifact submission page for details on how to package and submit an artifact.

Selection Criteria

The artifact is evaluated in relation to the expectations set by the article. For an artifact to be accepted, it must support all the main claims made in the article. Thus, in addition to running the artifact, evaluators will read the article and may modify the provided inputs or otherwise generalize the artifact’s use slightly beyond what the article shows in order to test its limits.

Artifacts should be:

  • consistent with the article,
  • as complete as possible,
  • well documented, and
  • easy to reuse, facilitating further research.

AEC members take the position of future researchers and ask themselves how much the artifact would help them. Please see the details of the outcomes of artifact evaluation (badges) below for further guidance.

Review Process Overview

After authors submit their artifact, there is a short window in which reviewers will work through only the kick-the-tyres instructions and upload preliminary reviews indicating whether they were able to get those 30-or-so minutes of instructions working. At that point, the preliminary reviews will be shared with the authors, who may make modest updates and corrections to resolve any issues the reviewers encountered.

We allow additional rounds of interaction with reviewers in case new issues are discovered after the kick-the-tyres window, in the hope that authors whose artifacts fall just short of satisfying the Supports Claims requirements have further opportunities to make small corrections. After the kick-the-tyres response, reviewers will be able to post author-visible comments with questions for the authors at any time, and authors may respond to those questions or requests. Such interaction is at the reviewers’ initiative; authors are asked not to post except in response to reviewer requests.

Badges (v2.0)

The AEC evaluates each artifact and awards one of the following badges: Available and Supports Claims.

Available

This badge is earned by artifacts that are made publicly available in an archival location. The badge can be earned by an artifact without the Supports Claims badge.

We expect artifacts submitted for artifact evaluation to be publicly available. Authors who want an artifact to be considered for the Supports Claims badge without making it publicly available must justify why the artifact cannot be made available. The reason will be listed on the article’s web page to inform readers of the unavailability.

The badge can also be awarded to artifacts under embargoed access (publicly available after a specific date) or restricted access (available only on request) if the authors can justify why access to the artifact is limited.

There are two routes for the publication of the artifact:

  1. Authors upload a snapshot of the complete artifact to Zenodo, which provides a DOI specific to the artifact. Note that GitHub and similar hosting services are not adequate for receiving this badge (see the FAQ), and that Zenodo provides a way to make subsequent revisions of the artifact available and linked from the specific archived version. Please upload your artifact to the Zenodo ‹Programming› community and send the resulting DOI to the AEC chairs; a scripted sketch of this route follows the list.
  2. Authors can send the complete artifact to the AEC chairs, who will take care of publication via Zenodo.
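
For authors who prefer to script route 1, the sketch below shows one way to upload via Zenodo’s REST deposit API (the web interface works equally well). This is a minimal sketch rather than an official script: it assumes a Zenodo personal access token with the deposit scope, and the archive name, metadata values, and the community identifier “programming” are placeholders to adapt.

  # Minimal sketch (not an official script): depositing an artifact archive
  # via Zenodo's REST deposit API. Requires a personal access token with the
  # "deposit" scope; file name, metadata, and community id are placeholders.
  import requests

  ZENODO = "https://zenodo.org/api"
  TOKEN = "YOUR-ZENODO-ACCESS-TOKEN"          # placeholder
  params = {"access_token": TOKEN}

  # 1. Create an empty deposition.
  resp = requests.post(f"{ZENODO}/deposit/depositions", params=params, json={})
  resp.raise_for_status()
  dep = resp.json()

  # 2. Upload the packaged artifact into the deposition's file bucket.
  with open("artifact.zip", "rb") as fp:      # placeholder archive name
      r = requests.put(f"{dep['links']['bucket']}/artifact.zip",
                       data=fp, params=params)
      r.raise_for_status()

  # 3. Attach metadata; the community identifier "programming" is an
  #    assumption; check the Zenodo ‹Programming› community page.
  metadata = {
      "metadata": {
          "title": "Artifact for: <article title>",
          "upload_type": "software",
          "description": "Artifact accompanying the article <article title>.",
          "creators": [{"name": "Lastname, Firstname"}],
          "communities": [{"identifier": "programming"}],
      }
  }
  r = requests.put(f"{ZENODO}/deposit/depositions/{dep['id']}",
                   params=params, json=metadata)
  r.raise_for_status()

  # 4. Publish to mint the DOI, then send that DOI to the AEC chairs.
  r = requests.post(f"{ZENODO}/deposit/depositions/{dep['id']}/actions/publish",
                    params=params)
  r.raise_for_status()
  print("DOI:", r.json().get("doi"))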

Supports Claims

This is the basic “accepted” outcome for an artifact. An artifact can be awarded a Supports Claims badge if it supports all claims made in the article, possibly excluding some claims if there are very good reasons they cannot be supported. In the ideal case, an artifact with this designation includes all relevant materials (code, dependencies, input data, benchmark scripts, questionnaires, raw data), and its documentation is sufficient for reviewers to reproduce the exact results described in the article. The material should also include all required external resources. For instance, if the artifact is claimed to outperform a related system in some way (in time, accuracy, etc.) and that related system was used to generate new numbers for the article (e.g., an existing tool was run on new benchmarks not considered by the corresponding publication), the artifact should also include a version of that related system and instructions for reproducing the numbers used for comparison. If the related tool crashes on a subset of the inputs, the documentation should simply note this expected behavior.

Deviations from this ideal must be for good reason. A non-exclusive list of justifiable deviations includes:

  • Some materials are subject to licensing or intellectual property restrictions and cannot legally be shared with reviewers (e.g., licensed benchmark suites such as SPEC, private or proprietary code to which a tool was applied, or the full corpus of a literature review). In such cases, all available materials should be included. If all materials from the article fall into this case, alternative data should be supplied: providing a tool with no meaningful inputs to evaluate it on is not sufficient to justify claims that the artifact works.
  • Some of the results are performance data, so the exact numbers depend on the particular hardware. In this case, the artifact should explain how to recognize when experiments on other hardware reproduce the high-level results (e.g., that a certain optimization exhibits a particular trend, or that, when comparing two tools, one outperforms the other in a certain class of cases).
  • In some cases, repeating the evaluation may take a long time; reviewers may not reproduce the full results in such cases.

In some cases, the artifact may require specialized hardware (e.g., a CPU with a particular new feature, a specific class of GPU, or a cluster of GPUs). For such cases, authors should contact the Artifact Evaluation Chairs as soon as possible after their article’s acceptance notification to work out how the artifact can be evaluated. One possible outcome is that the authors pay for a cloud instance with the required hardware, which reviewers can access remotely.

Acknowledgments

The description of this process is based on documents from similar ECOOP and SPLASH AE processes. Thanks for creating and sharing these documents go to Benjamin Greenman, Ana Milanova, Colin Gordon, and Anders Møller.