Since 2013, the MSR conference has included a Data Showcase. The purpose of the Data Showcase is to provide a forum to share and discuss the important data sets that underpin the work of the Mining Software Repositories community.
The important dates for the Data Showcase are:
- Abstracts Due: February 1, 2019
- Papers Due: February 6, 2019
- Author Notification: March 1, 2019
- Camera Ready: March 15, 2019
Please see the Call for Data Showcase Papers for all details.
Sun 26 May (time zone: Eastern Time, US & Canada)
09:05 - 10:30
09:05 45m Talk | Keynote: We Won! Now What? | MSR 2019 Keynote
09:50 10m | Q&A for Keynote | MSR 2019 Keynote
10:00 30m | Discussion: Ethical MSR | MSR 2019 Keynote
11:00 - 11:45 | Session II: Defect Prediction and Testing (Part 1) | MSR 2019 Technical Papers at Centre-Ville | Chair(s): Patanamon Thongtanunam The University of Melbourne
11:00 15m Full-paper | DeepJIT: An End-To-End Deep Learning Framework for Just-In-Time Defect Prediction | MSR 2019 Technical Papers | Thong Hoang Singapore Management University, Singapore, Hoa Khanh Dam University of Wollongong, Yasutaka Kamei Kyushu University, David Lo Singapore Management University, Naoyasu Ubayashi Kyushu University
11:16 15m Full-paper | Lessons learned from using a deep tree-based model for software defect prediction in practice | MSR 2019 Technical Papers | Hoa Khanh Dam University of Wollongong, Trang Pham Deakin University, Shien Wee Ng University of Wollongong, Truyen Tran, John Grundy Monash University, Aditya Ghose, Taeksu Kim, Chul-Joo Kim
11:32 6m Short-paper | Empirical study in using version histories for change risk classification | MSR 2019 Technical Papers
11:39 6m Short-paper | Snoring: a Noise in Defect Prediction Datasets | MSR 2019 Technical Papers | Aalok Ahluwalia, Davide Falessi California Polytechnic State University, Massimiliano Di Penta University of Sannio
11:00 - 11:45 | Session I: Representations for Mining (Part 1) | MSR 2019 Technical Papers / MSR 2019 Data Showcase at Place du Canada | Chair(s): Chanchal K. Roy University of Saskatchewan
11:00 15m Full-paper | SCOR: Source Code Retrieval With Semantics and Order | MSR 2019 Technical Papers
11:16 6m Short-paper | PathMiner: A Library for Mining of Path-Based Representations of Code | MSR 2019 Technical Papers | Vladimir Kovalenko TU Delft, Egor Bogomolov Higher School of Economics, JetBrains Research, Timofey Bryksin, Alberto Bacchelli University of Zurich
11:23 15m Full-paper | Import2vec: learning embeddings for software libraries | MSR 2019 Technical Papers
11:39 6m Talk | Semantic Source Code Models Using Identifier Embeddings | MSR 2019 Data Showcase | Vasiliki Efstathiou Athens University of Economics and Business, Diomidis Spinellis Athens University of Economics and Business
11:55 - 12:30 | Session III: Representations for Mining (Part 2) | MSR 2019 Technical Papers / MSR 2019 Data Showcase at Place du Canada | Chair(s): Nicole Novielli University of Bari
11:55 15m Full-paper | Exploring Word Embedding Techniques to Improve Sentiment Analysis of Software Engineering Texts | MSR 2019 Technical Papers
12:10 6m Talk | Cleaning StackOverflow for Machine Translation | MSR 2019 Data Showcase | Musfiqur Rahman Concordia University, Montreal, Canada, Peter Rigby Concordia University, Montreal, Canada, Dharani Palani Concordia University, Tien N. Nguyen University of Texas at Dallas
12:16 15m Full-paper | Predicting Good Configurations for GitHub and Stack Overflow Topic Models | MSR 2019 Technical Papers
13:50 - 14:35 | Discussion: Data vs. Theory-driven Research | MSR 2019 Paper Presentations at Place du Canada | Chair(s): Michael W. Godfrey University of Waterloo, Canada, Andy Zaidman TU Delft
14:45 - 15:30 | Session VI: Energy and Economics | MSR 2019 Data Showcase / MSR 2019 Technical Papers at Centre-Ville | Chair(s): Maleknaz Nayebi Polytechnique Montréal
14:45 15m Full-paper | Recommending Energy-Efficient Java Collections | MSR 2019 Technical Papers | Wellington de Oliveira Júnior, Renato Santos, Fernando Castor Federal University of Pernambuco (UFPE), José Benito Fernandes De Araújo Neto, Gustavo Pinto UFPA
15:01 6m Talk | GreenHub Farmer: Real-world data for Android Energy Mining | MSR 2019 Data Showcase | Rui Pereira HASLab/INESC TEC & Universidade do Minho & Universidade da Beira Interior, Marco Couto HASLab/INESC TEC & Universidade do Minho, João Paulo Fernandes Release/LISP, CISUC, Bruno Cabral, Hugo Matalonga University of Minho, Simão Melo de Sousa, Fernando Castor Federal University of Pernambuco (UFPE)
15:08 6m Talk | GreenSource: a large-scale collection of Android code, tests and energy metrics | MSR 2019 Data Showcase | Rui Rua HASLab/INESC TEC & Universidade do Minho, Marco Couto HASLab/INESC TEC & Universidade do Minho, João Saraiva University of Minho, Portugal
15:15 6m Short-paper | Striking Gold in Software Repositories? An Econometric Study of Cryptocurrencies on GitHub | MSR 2019 Technical Papers | Asher Trockman University of Evansville, Rijnard van Tonder Carnegie Mellon University, Bogdan Vasilescu Carnegie Mellon University
15:22 6m Talk | Panel Data of Cryptocurrency Development Activity on GitHub | MSR 2019 Data Showcase | Rijnard van Tonder Carnegie Mellon University, Asher Trockman University of Evansville, Claire Le Goues Carnegie Mellon University
Mon 27 May (time zone: Eastern Time, US & Canada)
08:45 - 09:30 | Session II: Automatic Summarization | MSR 2019 Technical Papers at Centre-Ville | Chair(s): Xin Xia Monash University
08:45 15m Full-paper | Generating Commit Messages from Diffs using Pointer-generator Network | MSR 2019 Technical Papers | Qin Liu, Zihe Liu School of Software Engineering, Tongji University, Shanghai, China, Hongming Zhu, Hongfei Fan, Bowen Du, Yu Qian
09:00 15m Full-paper | Automatically Generating Documentation for Lambda Expressions in Java | MSR 2019 Technical Papers | Anwar Alqaimi, Patanamon Thongtanunam The University of Melbourne, Christoph Treude The University of Adelaide
09:15 15m Full-paper | Extracting API Tips from Developer Question and Answer Websites | MSR 2019 Technical Papers
08:45 - 09:30 | Session I: APIs & Dependencies (Part 1) | MSR 2019 Technical Papers at Place du Canada | Chair(s): Philipp Leitner Chalmers University of Technology & University of Gothenburg
08:45 15m Full-paper | Investigating Next-Steps in Static API-Misuse Detection | MSR 2019 Technical Papers | Sven Amann CQSE GmbH, Hoan Nguyen Iowa State University, Sarah Nadi University of Alberta, Tien N. Nguyen University of Texas at Dallas, Mira Mezini TU Darmstadt, Germany
09:00 15m Full-paper | Identifying Experts in Software Libraries and Frameworks among GitHub Users | MSR 2019 Technical Papers | João Eduardo Montandon Universidade Federal de Minas Gerais (UFMG), Luciana L. Silva, Marco Tulio Valente Federal University of Minas Gerais, Brazil
09:15 15m Full-paper | Data-Driven Solutions to Detect API Compatibility Issues in Android: An Empirical Study | MSR 2019 Technical Papers | Simone Scalabrino University of Molise, Gabriele Bavota Università della Svizzera italiana (USI), Mario Linares-Vasquez Universidad de los Andes, Michele Lanza Universita della Svizzera italiana (USI), Rocco Oliveto University of Molise
09:40 - 10:30 | Session IV: Security | MSR 2019 Data Showcase / MSR 2019 Technical Papers at Centre-Ville | Chair(s): Sarah Nadi University of Alberta
09:40 15m Full-paper | Automated Software Vulnerability Assessment with Concept Drift | MSR 2019 Technical Papers
09:55 6m Talk | A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software | MSR 2019 Data Showcase
10:01 15m Full-paper | Negative Results on Mining Crypto-API Usage Rules in Android Apps | MSR 2019 Technical Papers | Jun Gao University of Luxembourg, SnT, Pingfan Kong Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, Li Li Monash University, Australia, Tegawendé F. Bissyandé SnT, University of Luxembourg, Jacques Klein University of Luxembourg, SnT
10:16 6m Talk | A Dataset of Parametric Cryptographic Misuses | MSR 2019 Data Showcase | Anna-Katharina Wickert TU Darmstadt, Germany, Michael Reif TU Darmstadt, Germany, Michael Eichberg TU Darmstadt, Germany, Anam Dodhy, Mira Mezini TU Darmstadt, Germany
10:22 6m Talk | RmvDroid: Towards A Reliable Android Malware Dataset with App Metadata | MSR 2019 Data Showcase | Haoyu Wang Beijing University of Posts and Telecommunications, China, Junjun Si, Hao Li, Yao Guo Peking University
09:40 - 10:30 | Session III: APIs & Dependencies (Part 2) | MSR 2019 Data Showcase / MSR 2019 Technical Papers at Place du Canada | Chair(s): Georgios Gousios TU Delft
09:40 6m Talk | The Maven Dependency Graph: a Temporal Graph-based Representation of Maven Central | MSR 2019 Data Showcase | Amine Benelallam, Nicolas Harrand, César Soto-Valero KTH Royal Institute of Technology, Benoit Baudry KTH Royal Institute of Technology, Sweden, Olivier Barais
09:46 15m Full-paper | The Emergence of Software Diversity in Maven Central | MSR 2019 Technical Papers | César Soto-Valero KTH Royal Institute of Technology, Amine Benelallam, Nicolas Harrand, Olivier Barais, Benoit Baudry KTH Royal Institute of Technology, Sweden
10:01 15m Full-paper | Dependency Versioning in the Wild | MSR 2019 Technical Papers | Jens Dietrich Victoria University of Wellington, David J. Pearce Victoria University of Wellington, New Zealand, Jacob Stringer, Amjed Tahir Massey University, Kelly Blincoe University of Auckland
10:16 15m Full-paper | Splitting APIs: An Exploratory Study of Software Unbundling | MSR 2019 Technical Papers
11:00 - 11:45 | Session VI: Software Quality (Part 1) | MSR 2019 Technical Papers at Centre-Ville | Chair(s): Fabio Palomba University of Zurich
11:00 15m Full-paper | The Rise of Android Code Smells: Who Is to Blame? | MSR 2019 Technical Papers | Sarra Habchi University of Lille, Romain Rouvoy University Lille 1 and INRIA, Naouel Moha University of Montreal
11:15 15m Full-paper | Assessing Diffusion and Perception of Test Smells in Scala Projects | MSR 2019 Technical Papers | Jonas De Bleser Software Languages Lab, Vrije Universiteit Brussel, Dario Di Nucci Vrije Universiteit Brussel, Coen De Roover Vrije Universiteit Brussel
11:30 15m Full-paper | style-analyzer: fixing code style inconsistencies with interpretable unsupervised algorithms | MSR 2019 Technical Papers | Vadim Markovtsev source{d}, Hugo Mougard source{d}, Waren Long source{d}, Egor Bulychev, Konstantin Slavnov
11:00 - 11:45 | Session V: Collaboration & Communication (Part 1) | MSR 2019 Technical Papers at Place du Canada | Chair(s): Peter Rigby Concordia University, Montreal, Canada
11:00 15m Full-paper | An Empirical Study of Multiple Names and Email Addresses in OSS Version Control Repositories | MSR 2019 Technical Papers | Jiaxin Zhu Institute of Software at Chinese Academy of Sciences, China, Jun Wei Institute of Software, Chinese Academy of Sciences, China
11:15 15m Full-paper | Characterizing the Roles of Contributors in Open-source Scientific Software Projects | MSR 2019 Technical Papers | Reed Milewicz Sandia National Laboratories, Gustavo Pinto UFPA, Paige Rodeghero University of Notre Dame
11:30 15m Full-paper | git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories | MSR 2019 Technical Papers
11:55 - 12:30 | Session VIII: Software Quality (Part 2) | MSR 2019 Technical Papers / MSR 2019 Data Showcase at Centre-Ville | Chair(s): Yasutaka Kamei Kyushu University
11:55 15m Full-paper | A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks | MSR 2019 Technical Papers | João Felipe Pimentel, Leonardo Murta Universidade Federal Fluminense (UFF), Vanessa Braganholo, Juliana Freire
12:10 15m Full-paper | Cross-language clone detection by learning over abstract syntax trees | MSR 2019 Technical Papers
12:25 6m Talk | SeSaMe: A Data Set of Semantically Similar Java Methods | MSR 2019 Data Showcase | Marius Kamp, Patrick Kreutzer, Michael Philippsen Friedrich-Alexander University Erlangen-Nürnberg (FAU)
11:55 - 12:30 | Session VII: Collaboration & Communication (Part 2) | MSR 2019 Technical Papers at Place du Canada | Chair(s): Kelly Blincoe University of Auckland
11:55 15m Full-paper | Can Issues Reported at Stack Overflow Questions be Reproduced? An Exploratory Study | MSR 2019 Technical Papers | Saikat Mondal University of Saskatchewan, Masud Rahman University of Saskatchewan, Chanchal K. Roy University of Saskatchewan
12:10 15m Full-paper | Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools | MSR 2019 Technical Papers | Preetha Chatterjee University of Delaware, USA, Kostadin Damevski Virginia Commonwealth University, Lori Pollock University of Delaware, USA, Vinay Augustine, Nicholas A. Kraft ABB Corporate Research
12:25 6m Short-paper | Impacts of Daylight Saving Time on Software Development | MSR 2019 Technical Papers | Junichi Hayashi Osaka University, Yoshiki Higo Osaka University, Shinsuke Matsumoto Osaka University, Shinji Kusumoto Osaka University
13:50 - 14:35 | Discussion: SE for AI for SE | MSR 2019 Paper Presentations at Place du Canada | Chair(s): Neil Ernst University of Victoria, Tim Menzies North Carolina State University
Call for Data Showcase Papers
Data Showcase papers should describe data sets that are curated by their authors and made available for use by others. Ideally, these data sets should be of value to others in the community, should be preprocessed or filtered in some way, and should provide an easy-to-understand schema. Data Showcase papers are expected to include:
- a description of the data source,
- a description of the methodology used to gather it (provenance; the tool used to create/generate/gather the data, if such a tool has been used, see below),
- a description of the storage mechanism, including a schema if applicable,
- if the data has been used by authors or others, a description of how this was done including references to previously published papers,
- a description of the originality of the data set (that is, even if the dataset has been used in a published article, its complete description must be unpublished),
- ideas for future research questions that could be answered using the data set,
- ideas for further improvements that could be made to the data set, and
- any limitations and/or challenges in creating or using the data set.
The data set should be made available at the time of submission of the paper for review, but will be considered confidential until publication of the paper. At the latest upon publication of the paper, the authors should archive the data in a preserved archive that can provide a digital object identifier (DOI), such as zenodo.org, figshare.com, or an institutional preserved archive. In this way the data will become citable; a DOI-based citation of the data set should be included in the camera-ready version. If the size of the data set exceeds the limits imposed by the preserved archives (e.g., 50 GB for Zenodo), the authors can store their data on Archive.org and refer to the URL in their camera-ready version.
Data Showcase papers are not:
- empirical studies,
- tool demos, or
- data sets that are
  - based on poorly explained or untrustworthy heuristics for data collection, or
  - the result of a trivial application of generic tools.
We expect all datasets to be accompanied by the source code of the tool that was used to create them, along with clear documentation on how to run the tool in order to recreate the datasets. The tool should be open source, accompanied by an appropriate license; the source code should be citable, i.e., refer to a specific release and have a DOI. GitHub provides an easy way to make source code citable. If you cannot provide the source code or the source code clause is not applicable (e.g., because the dataset consists of qualitative data), please provide a short explanation of why this is not possible.
Submission
Submit your data paper (maximum 4 pages, plus 1 additional page of references) to EasyChair on or before February 6th, 2019 (abstract due February 1st).
Submitted papers will undergo single-blind peer review. We opt for single-blind peer review (as opposed to the double-blind peer review of the main track) because of the requirement above to describe how the data has been used in previous studies, including bibliographic references to those studies. Such references are likely to disclose the authors' identity.
To make research data sets and research software accessible and citable, we further encourage authors to attend to the FAIR rules, i.e. data should be: Findable, Accessible, Interoperable, and Reusable.
The submission must conform to the IEEE Conference Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type; LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without including the compsoc or compsocconf option).
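As a rough illustration of the formatting requirement above, a minimal LaTeX skeleton might look like the following. It assumes the standard IEEEtran class is installed; the title, author, affiliation, and section name are placeholders rather than anything prescribed by the call.

```latex
% Minimal sketch, assuming the standard IEEEtran class is available.
% All names and text below are placeholders, not required content.
\documentclass[10pt,conference]{IEEEtran}  % no compsoc or compsocconf option

\begin{document}

\title{Placeholder Data Set Title}

% Review is single-blind, so author names stay in the submission.
\author{\IEEEauthorblockN{Placeholder Author}
\IEEEauthorblockA{Placeholder Affiliation}}

\maketitle

\begin{abstract}
Placeholder abstract.
\end{abstract}

\section{Data Source}
Placeholder text.

\end{document}
```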
Papers submitted for consideration should not have been published elsewhere and should not be under review or submitted for review elsewhere during the period of consideration. The submission must also comply with the ACM plagiarism policy and procedures and with the IEEE Policy on Authorship. To submit, please use the EasyChair link.
Upon notification of acceptance, all authors of accepted papers will be asked to complete an IEEE Copyright form and will receive further instructions for preparing their camera ready versions. At least one author of each paper is expected to present the results at the MSR conference. All accepted contributions will be published in the conference electronic proceedings.
The official publication date is the date the proceedings are made available in the ACM or IEEE Digital Libraries. This date may be up to two weeks prior to the first day of ICSE 2019. The official publication date affects the deadline for any patent filings related to the published work. Purchase of additional pages in the proceedings is not allowed.
A selection of the best papers will be invited to an EMSE Special Issue.
Organization
Nicole Novielli, University of Bari, Italy
Alexander Serebrenik, Eindhoven University of Technology, The Netherlands