MSR 2019
Sun 26 - Mon 27 May 2019 Montreal, QC, Canada
co-located with ICSE 2019

Since 2013, the MSR conference has included a Data Showcase. The purpose of the Data Showcase is to provide a forum to share and discuss the important data sets that underpin the work of the Mining Software Repositories community.

The important dates for the Data Showcase are:

  • Abstracts Due: February 1, 2019

  • Papers Due: February 6, 2019

  • Author Notification: March 1, 2019

  • Camera Ready: March 15, 2019

Please see the Call for Data Showcase Papers for all details.

Call for Data Showcase Papers

Data Showcase papers should describe data sets that are curated by their authors and made available for use by others. Ideally, these data sets should be of value to others in the community, should be preprocessed or filtered in some way, and should provide an easy-to-understand schema. Data Showcase papers are expected to include:

  • a description of the data source,
  • a description of the methodology used to gather it (provenance; the tool used to create/generate/gather the data, if such a tool has been used, see below),
  • a description of the storage mechanism, including a schema if applicable,
  • if the data has been used by authors or others, a description of how this was done including references to previously published papers,
  • a description of the originality of the data set (that is, even if the dataset has been used in a published article, its complete description must be unpublished),
  • ideas for future research questions that could be answered using the data set,
  • ideas for further improvements that could be made to the data set, and
  • any limitations and/or challenges in creating or using the data set.

The data set should be made available at the time of submission of the paper for review, but will be considered confidential until publication of the paper. Upon publication of the paper at the latest, the authors should archive the data in a preserved archive that can provide a digital object identifier (DOI), such as zenodo.org, figshare.com, or an institutional preserved archive. In this way the data will become citable; a DOI-based citation of the data set should be included in the camera-ready version. If the size of the data set exceeds the limits imposed by the preserved archives (e.g., 50GB for Zenodo), the authors can store their data on Archive.org and refer to the URL in their camera-ready version.

Data showcase papers are not:

  • empirical studies
  • tool demos
  • data sets that are
    • based on poorly explained or untrustworthy heuristics for data collection, or
    • the result of a trivial application of generic tools.

We expect all data sets to be accompanied by the source code of the tool that was used to create them, along with clear documentation on how to run the tool in order to recreate the data sets. The tool should be open source and accompanied by an appropriate license; the source code should be citable, i.e., refer to a specific release and have a DOI. GitHub provides an easy way to make source code citable. If you cannot provide the source code or the source code clause is not applicable (e.g., because the data set consists of qualitative data), please provide a short explanation of why this is not possible.


Submit your data paper (maximum 4 pages, plus 1 additional page of references) to EasyChair on or before February 6th, 2019 (abstract due February 1st).

Submitted papers will undergo single-blind peer review. We opt for single-blind peer review (as opposed to the double-blind peer review of the main track) because of the requirement above to describe how the data has been used in previous studies, including bibliographic references to those studies. Such references are likely to disclose the authors’ identity.

To make research data sets and research software accessible and citable, we further encourage authors to adhere to the FAIR principles, i.e., data should be Findable, Accessible, Interoperable, and Reusable.

The submission must conform to the IEEE Conference Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type; LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without including the compsoc or compsocconf option).
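
As an illustration of the formatting requirement above, a minimal LaTeX skeleton might look like the following. Only the \documentclass line comes from this call; the title, author block, abstract, and section name are placeholders chosen for the example, and the official IEEE guidelines remain the authoritative reference.

```latex
% Minimal sketch of a submission skeleton.
% Only the \documentclass line is taken from the call; everything else is a placeholder.
\documentclass[10pt,conference]{IEEEtran}  % do not add the compsoc or compsocconf option

\begin{document}

\title{Placeholder Title of the Data Showcase Paper}

% Review is single-blind, so author names are included.
\author{\IEEEauthorblockN{Placeholder Author}
\IEEEauthorblockA{Placeholder Affiliation}}

\maketitle

\begin{abstract}
Placeholder abstract.
\end{abstract}

\section{Data Source}
Placeholder text describing where the data comes from.

\end{document}
```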

Papers submitted for consideration should not have been published elsewhere and should not be under review or submitted for review elsewhere during the period of consideration. The submission must also comply with the ACM plagiarism policy and procedures and with the IEEE Policy on Authorship. To submit, please use the EasyChair link.

Upon notification of acceptance, all authors of accepted papers will be asked to complete an IEEE Copyright form and will receive further instructions for preparing their camera-ready versions. At least one author of each paper is expected to present the results at the MSR conference. All accepted contributions will be published in the conference electronic proceedings.

The official publication date is the date the proceedings are made available in the ACM or IEEE Digital Libraries. This date may be up to two weeks prior to the first day of ICSE 2019. The official publication date affects the deadline for any patent filings related to the published work. The purchase of additional pages in the proceedings is not allowed.

A selection of the best papers will be invited to an EMSE Special Issue.


Nicole Novielli, University of Bari, Italy

Alexander Serebrenik, Eindhoven University of Technology, The Netherlands