The International Conference on Mining Software Repositories (MSR) has hosted a mining challenge since 2006. With this challenge, we call upon everyone interested to apply their tools to a common dataset. The challenge invites researchers and practitioners to bravely put their mining tools and approaches to the test on a dare.
The important dates for the Mining Challenge are:
- Abstracts due: February 1, 2019 (AOE)
- Papers due: February 6, 2019 (AOE)
- Author notification: March 1, 2019 (AOE)
- Camera-ready: March 15, 2019 (AOE)
Please see the Call for Mining Challenge Papers for all details.
Sun 26 May (times in Eastern Time, US & Canada)
09:05 - 10:30
09:05 | 45m Talk | Keynote: We Won! Now What? | MSR 2019 Keynote
09:50 | 10m | Q&A for Keynote | MSR 2019 Keynote
10:00 | 30m | Discussion: Ethical MSR | MSR 2019 Keynote
11:00 - 11:45 | Session II: Defect Prediction and Testing (Part 1) | MSR 2019 Technical Papers | Centre-Ville | Chair(s): Patanamon Thongtanunam (The University of Melbourne)
11:00 | 15m Full-paper | DeepJIT: An End-To-End Deep Learning Framework for Just-In-Time Defect Prediction | MSR 2019 Technical Papers | Thong Hoang (Singapore Management University, Singapore), Hoa Khanh Dam (University of Wollongong), Yasutaka Kamei (Kyushu University), David Lo (Singapore Management University), Naoyasu Ubayashi (Kyushu University)
11:16 | 15m Full-paper | Lessons learned from using a deep tree-based model for software defect prediction in practice | MSR 2019 Technical Papers | Hoa Khanh Dam (University of Wollongong), Trang Pham (Deakin University), Shien Wee Ng (University of Wollongong), Truyen Tran, John Grundy (Monash University), Aditya Ghose, Taeksu Kim, Chul-Joo Kim
11:32 | 6m Short-paper | Empirical study in using version histories for change risk classification | MSR 2019 Technical Papers
11:39 | 6m Short-paper | Snoring: a Noise in Defect Prediction Datasets | MSR 2019 Technical Papers | Aalok Ahluwalia, Davide Falessi (California Polytechnic State University), Massimiliano Di Penta (University of Sannio)
11:00 - 11:45 | Session I: Representations for Mining (Part 1) | MSR 2019 Technical Papers / MSR 2019 Data Showcase | Place du Canada | Chair(s): Chanchal K. Roy (University of Saskatchewan)
11:00 | 15m Full-paper | SCOR: Source Code Retrieval With Semantics and Order | MSR 2019 Technical Papers
11:16 | 6m Short-paper | PathMiner: A Library for Mining of Path-Based Representations of Code | MSR 2019 Technical Papers | Vladimir Kovalenko (TU Delft), Egor Bogomolov (Higher School of Economics, JetBrains Research), Timofey Bryksin, Alberto Bacchelli (University of Zurich)
11:23 | 15m Full-paper | Import2vec: learning embeddings for software libraries | MSR 2019 Technical Papers
11:39 | 6m Talk | Semantic Source Code Models Using Identifier Embeddings | MSR 2019 Data Showcase | Vasiliki Efstathiou (Athens University of Economics and Business), Diomidis Spinellis (Athens University of Economics and Business)
11:55 - 12:30 | Session III: Representations for Mining (Part 2) | MSR 2019 Technical Papers / MSR 2019 Data Showcase | Place du Canada | Chair(s): Nicole Novielli (University of Bari)
11:55 | 15m Full-paper | Exploring Word Embedding Techniques to Improve Sentiment Analysis of Software Engineering Texts | MSR 2019 Technical Papers
12:10 | 6m Talk | Cleaning StackOverflow for Machine Translation | MSR 2019 Data Showcase | Musfiqur Rahman (Concordia University, Montreal, Canada), Peter Rigby (Concordia University, Montreal, Canada), Dharani Palani (Concordia University), Tien N. Nguyen (University of Texas at Dallas)
12:16 | 15m Full-paper | Predicting Good Configurations for GitHub and Stack Overflow Topic Models | MSR 2019 Technical Papers
13:50 - 14:35 | Discussion: Data vs. Theory-driven Research | MSR 2019 Paper Presentations | Place du Canada | Chair(s): Andy Zaidman (TU Delft), Michael W. Godfrey (University of Waterloo, Canada)
14:45 - 15:30 | Session VI: Energy and Economics | MSR 2019 Data Showcase / MSR 2019 Technical Papers | Centre-Ville | Chair(s): Maleknaz Nayebi (Polytechnique Montréal)
14:45 | 15m Full-paper | Recommending Energy-Efficient Java Collections | MSR 2019 Technical Papers | Wellington de Oliveira Júnior, Renato Santos, Fernando Castor (Federal University of Pernambuco (UFPE)), José Benito Fernandes De Araújo Neto, Gustavo Pinto (UFPA)
15:01 | 6m Talk | GreenHub Farmer: Real-world data for Android Energy Mining | MSR 2019 Data Showcase | Rui Pereira (HASLab/INESC TEC & Universidade do Minho & Universidade da Beira Interior), Marco Couto (HASLab/INESC TEC & Universidade do Minho), João Paulo Fernandes (Release/LISP, CISUC), Bruno Cabral, Hugo Matalonga (University of Minho), Simão Melo de Sousa, Fernando Castor (Federal University of Pernambuco (UFPE))
15:08 | 6m Talk | GreenSource: a large-scale collection of Android code, tests and energy metrics | MSR 2019 Data Showcase | Rui Rua (HASLab/INESC TEC & Universidade do Minho), Marco Couto (HASLab/INESC TEC & Universidade do Minho), João Saraiva (University of Minho, Portugal)
15:15 | 6m Short-paper | Striking Gold in Software Repositories? An Econometric Study of Cryptocurrencies on GitHub | MSR 2019 Technical Papers | Asher Trockman (University of Evansville), Rijnard van Tonder (Carnegie Mellon University), Bogdan Vasilescu (Carnegie Mellon University)
15:22 | 6m Talk | Panel Data of Cryptocurrency Development Activity on GitHub | MSR 2019 Data Showcase | Rijnard van Tonder (Carnegie Mellon University), Asher Trockman (University of Evansville), Claire Le Goues (Carnegie Mellon University)
Mon 27 May (times in Eastern Time, US & Canada)
08:45 - 09:30 | Session II: Automatic Summarization | MSR 2019 Technical Papers | Centre-Ville | Chair(s): Xin Xia (Monash University)
08:45 | 15m Full-paper | Generating Commit Messages from Diffs using Pointer-generator Network | MSR 2019 Technical Papers | Qin Liu, Zihe Liu (School of Software Engineering, Tongji University, Shanghai, China), Hongming Zhu, Hongfei Fan, Bowen Du, Yu Qian
09:00 | 15m Full-paper | Automatically Generating Documentation for Lambda Expressions in Java | MSR 2019 Technical Papers | Anwar Alqaimi, Patanamon Thongtanunam (The University of Melbourne), Christoph Treude (The University of Adelaide)
09:15 | 15m Full-paper | Extracting API Tips from Developer Question and Answer Websites | MSR 2019 Technical Papers
08:45 - 09:30 | Session I: APIs & Dependencies (Part 1) | MSR 2019 Technical Papers | Place du Canada | Chair(s): Philipp Leitner (Chalmers University of Technology & University of Gothenburg)
08:45 | 15m Full-paper | Investigating Next-Steps in Static API-Misuse Detection | MSR 2019 Technical Papers | Sven Amann (CQSE GmbH), Hoan Nguyen (Iowa State University), Sarah Nadi (University of Alberta), Tien N. Nguyen (University of Texas at Dallas), Mira Mezini (TU Darmstadt, Germany)
09:00 | 15m Full-paper | Identifying Experts in Software Libraries and Frameworks among GitHub Users | MSR 2019 Technical Papers | João Eduardo Montandon (Universidade Federal de Minas Gerais (UFMG)), Luciana L. Silva, Marco Tulio Valente (Federal University of Minas Gerais, Brazil)
09:15 | 15m Full-paper | Data-Driven Solutions to Detect API Compatibility Issues in Android: An Empirical Study | MSR 2019 Technical Papers | Simone Scalabrino (University of Molise), Gabriele Bavota (Università della Svizzera italiana (USI)), Mario Linares-Vasquez (Universidad de los Andes), Michele Lanza (Università della Svizzera italiana (USI)), Rocco Oliveto (University of Molise)
09:40 - 10:30 | Session IV: Security | MSR 2019 Data Showcase / MSR 2019 Technical Papers | Centre-Ville | Chair(s): Sarah Nadi (University of Alberta)
09:40 | 15m Full-paper | Automated Software Vulnerability Assessment with Concept Drift | MSR 2019 Technical Papers
09:55 | 6m Talk | A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software | MSR 2019 Data Showcase
10:01 | 15m Full-paper | Negative Results on Mining Crypto-API Usage Rules in Android Apps | MSR 2019 Technical Papers | Jun Gao (University of Luxembourg, SnT), Pingfan Kong (Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg), Li Li (Monash University, Australia), Tegawendé F. Bissyandé (SnT, University of Luxembourg), Jacques Klein (University of Luxembourg, SnT)
10:16 | 6m Talk | A Dataset of Parametric Cryptographic Misuses | MSR 2019 Data Showcase | Anna-Katharina Wickert (TU Darmstadt, Germany), Michael Reif (TU Darmstadt, Germany), Michael Eichberg (TU Darmstadt, Germany), Anam Dodhy, Mira Mezini (TU Darmstadt, Germany)
10:22 | 6m Talk | RmvDroid: Towards A Reliable Android Malware Dataset with App Metadata | MSR 2019 Data Showcase | Haoyu Wang (Beijing University of Posts and Telecommunications, China), Junjun Si, Hao Li, Yao Guo (Peking University)
11:00 - 11:45 | Session VI: Software Quality (Part 1) | MSR 2019 Technical Papers | Centre-Ville | Chair(s): Fabio Palomba (University of Zurich)
11:00 | 15m Full-paper | The Rise of Android Code Smells: Who Is to Blame? | MSR 2019 Technical Papers | Sarra Habchi (University of Lille), Romain Rouvoy (University Lille 1 and INRIA), Naouel Moha (University of Montreal)
11:15 | 15m Full-paper | Assessing Diffusion and Perception of Test Smells in Scala Projects | MSR 2019 Technical Papers | Jonas De Bleser (Software Languages Lab, Vrije Universiteit Brussel), Dario Di Nucci (Vrije Universiteit Brussel), Coen De Roover (Vrije Universiteit Brussel)
11:30 | 15m Full-paper | style-analyzer: fixing code style inconsistencies with interpretable unsupervised algorithms | MSR 2019 Technical Papers | Vadim Markovtsev (source{d}), Hugo Mougard (source{d}), Waren Long (source{d}), Egor Bulychev, Konstantin Slavnov
11:00 - 11:45 | Session V: Collaboration & Communication (Part 1) | MSR 2019 Technical Papers | Place du Canada | Chair(s): Peter Rigby (Concordia University, Montreal, Canada)
11:00 | 15m Full-paper | An Empirical Study of Multiple Names and Email Addresses in OSS Version Control Repositories | MSR 2019 Technical Papers | Jiaxin Zhu (Institute of Software at Chinese Academy of Sciences, China), Jun Wei (Institute of Software, Chinese Academy of Sciences, China)
11:15 | 15m Full-paper | Characterizing the Roles of Contributors in Open-source Scientific Software Projects | MSR 2019 Technical Papers | Reed Milewicz (Sandia National Laboratories), Gustavo Pinto (UFPA), Paige Rodeghero (University of Notre Dame)
11:30 | 15m Full-paper | git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories | MSR 2019 Technical Papers
11:55 - 12:30 | Session VIII: Software Quality (Part 2) | MSR 2019 Technical Papers / MSR 2019 Data Showcase | Centre-Ville | Chair(s): Yasutaka Kamei (Kyushu University)
11:55 | 15m Full-paper | A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks | MSR 2019 Technical Papers | João Felipe Pimentel, Leonardo Murta (Universidade Federal Fluminense (UFF)), Vanessa Braganholo, Juliana Freire
12:10 | 15m Full-paper | Cross-language clone detection by learning over abstract syntax trees | MSR 2019 Technical Papers
12:25 | 6m Talk | SeSaMe: A Data Set of Semantically Similar Java Methods | MSR 2019 Data Showcase | Marius Kamp, Patrick Kreutzer, Michael Philippsen (Friedrich-Alexander University Erlangen-Nürnberg (FAU))
11:55 - 12:30 | Session VII: Collaboration & Communication (Part 2) | MSR 2019 Technical Papers | Place du Canada | Chair(s): Kelly Blincoe (University of Auckland)
11:55 | 15m Full-paper | Can Issues Reported at Stack Overflow Questions be Reproduced? An Exploratory Study | MSR 2019 Technical Papers | Saikat Mondal (University of Saskatchewan), Masud Rahman (University of Saskatchewan), Chanchal K. Roy (University of Saskatchewan)
12:10 | 15m Full-paper | Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools | MSR 2019 Technical Papers | Preetha Chatterjee (University of Delaware, USA), Kostadin Damevski (Virginia Commonwealth University), Lori Pollock (University of Delaware, USA), Vinay Augustine, Nicholas A. Kraft (ABB Corporate Research)
12:25 | 6m Short-paper | Impacts of Daylight Saving Time on Software Development | MSR 2019 Technical Papers | Junichi Hayashi (Osaka University), Yoshiki Higo (Osaka University), Shinsuke Matsumoto (Osaka University), Shinji Kusumoto (Osaka University)
13:50 - 14:35 | Discussion: SE for AI for SE | MSR 2019 Paper Presentations | Place du Canada | Chair(s): Neil Ernst (University of Victoria), Tim Menzies (North Carolina State University)
Call for Mining Challenge Papers
This year, the challenge is about mining SOTorrent, a dataset providing the version history of Stack Overflow posts at the level of whole posts and individual text and code blocks. Moreover, the dataset connects Stack Overflow posts to other platforms by aggregating URLs from text blocks and comments, and by collecting references from GitHub files to Stack Overflow posts. Analyses can be based on SOTorrent alone or expanded to also include data from other resources such as GHTorrent. The overall goal is to study the origin, evolution, and usage of Stack Overflow code snippets. Questions that are, to the best of our knowledge, not sufficiently answered yet include:
- How are code snippets on Stack Overflow maintained?
- How many clones of code snippets exist inside Stack Overflow?
- How can we detect buggy versions of Stack Overflow code snippets and find them in GitHub projects?
- How frequently are code snippets copied from external sources into Stack Overflow and then co-evolve there?
- How do snippets copied from Stack Overflow to GitHub co-evolve?
- Does the evolution of Stack Overflow code snippets follow patterns?
- Do these patterns differ between programming languages?
- Are the licenses of external sources compatible with Stack Overflow’s license (CC BY-SA 3.0)?
- How many code blocks on Stack Overflow do not contain source code (and are only used for markup)?
- Can we reliably predict bug-fixing edits to code on Stack Overflow?
- Can we reliably predict popularity of Stack Overflow code snippets on GitHub?
These are just some of the questions that could be answered using SOTorrent. We encourage challenge participants to adapt the above questions or formulate their own research questions about the origin, evolution, and usage of content on Stack Overflow.
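As an illustration of how one of the questions above might be approached (how many clones of code snippets exist inside Stack Overflow), the following is a minimal sketch of exact-clone detection by hashing whitespace-normalized code. The `(post_id, content)` input shape, the function names, and the sample data are assumptions made for this example; they are not the SOTorrent schema, and a real study would likely need a more robust notion of similarity.

```python
import hashlib
from collections import defaultdict


def normalize(code: str) -> str:
    """Collapse all whitespace so formatting-only differences are ignored."""
    return " ".join(code.split())


def find_exact_clones(blocks):
    """Group code blocks whose normalized content is identical.

    `blocks` is an iterable of (post_id, content) pairs. This shape is a
    hypothetical simplification, not the actual SOTorrent layout.
    """
    groups = defaultdict(list)
    for post_id, content in blocks:
        digest = hashlib.sha256(normalize(content).encode("utf-8")).hexdigest()
        groups[digest].append(post_id)
    # Keep only groups that span more than one distinct post.
    return [sorted(set(ids)) for ids in groups.values() if len(set(ids)) > 1]


# Hypothetical sample data: posts 101 and 202 differ only in indentation.
sample_blocks = [
    (101, "for i in range(10):\n    print(i)"),
    (202, "for i in range(10):\n\tprint(i)"),
    (303, "print('hello')"),
]
clone_groups = find_exact_clones(sample_blocks)
print(clone_groups)
```

Hashing normalized text only finds exact duplicates up to whitespace; near-duplicates with renamed identifiers or small edits would require token-based or tree-based clone detection instead.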
How to Participate in the Challenge
First, familiarize yourself with the SOTorrent dataset:
- Read our MSR 2018 paper about SOTorrent and the preprint of our mining challenge proposal, which contains exemplary queries.
- Study the project page of SOTorrent, which includes the most recent database layout and links to the online and download versions of the dataset.
- Create a new issue here in case you have problems with the dataset or want to suggest ideas for improvements.
Then, use the dataset to answer your research questions, report your findings in a four-page challenge paper (see information below), submit your abstract before February 1, 2019, and your final paper before February 6, 2019. If your paper is accepted, present your results at MSR 2019 in Montreal, Canada!
Submission
A challenge paper should describe the results of your work by providing an introduction to the problem you address and why it is worth studying, the version of the dataset you used, the approach and tools you used, your results and their implications, and conclusions. Make sure your report highlights the contributions and the importance of your work. See also our open science policy regarding the publication of software and additional data you used for the challenge.
Challenge papers must not exceed 4 pages plus 1 additional page only with references and must conform to the MSR 2019 format and submission guidelines. Each submission will be reviewed by at least three members of the program committee. Submissions should follow the IEEE Conference Proceedings Formatting Guidelines, with title in 24pt font and full text in 10pt type. LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without including the compsoc or compsocconf option.
IMPORTANT: The mining challenge track of MSR 2019 follows the double-blind submission model. Submissions should not reveal the identity of the authors in any way. This means that authors should:
- leave out author names and affiliations from the body and metadata of the submitted PDF
- ensure that any citations to related work by themselves are written in the third person, for example “the prior work of XYZ” as opposed to “our prior work [2]”
- not refer to their personal, lab or university website; similarly, care should be taken with personal accounts on GitHub, Bitbucket, Google Drive, etc.
- not upload unblinded versions of their paper on archival websites during bidding/reviewing; however, uploading unblinded versions prior to submission is allowed and sometimes unavoidable (e.g., a thesis)
Authors with further questions on double-blind reviewing are encouraged to contact the Mining Challenge Chairs via email.
Papers must be submitted electronically through EasyChair, should not have been published elsewhere, and should not be under review or submitted for review elsewhere for the duration of consideration. ACM plagiarism policy and procedures shall be followed for cases of double submission. The submission must also comply with the IEEE Policy on Authorship.
Upon notification of acceptance, all authors of accepted papers will be asked to complete a copyright form and will receive further instructions for preparing their camera ready versions. At least one author of each accepted paper is expected to register and present the results at MSR 2019 in Montreal, Canada. All accepted contributions will be published in the electronic conference proceedings.
The official publication date is the date the proceedings are made available in the ACM or IEEE Digital Libraries. This date may be up to two weeks prior to the first day of ICSE 2019. The official publication date affects the deadline for any patent filings related to the published work. Purchasing additional pages in the proceedings is not allowed.
If you use the SOTorrent dataset, please cite our challenge proposal:
@inproceedings{msr2019challenge,
title={SOTorrent: Studying the Origin, Evolution, and Usage of Stack Overflow Code Snippets},
author={Baltes, Sebastian and Treude, Christoph and Diehl, Stephan},
year={2019},
booktitle={Proceedings of the 16th International Conference on Mining Software Repositories (MSR 2019)},
preprint={http://empirical-software.engineering/assets/pdf/msr19-sotorrent.pdf}
}
Open Science Policy
Openness in science is key to fostering progress via transparency, reproducibility and replicability. Our steering principle is that all research output should be accessible to the public and that empirical studies should be reproducible. In particular, we actively support the adoption of open data and open source principles. To increase reproducibility and replicability, we encourage all contributing authors to disclose:
- the source code of the software they used to retrieve and analyze the data
- the (anonymized and curated) empirical data they retrieved in addition to the SOTorrent dataset
- a document with instructions for other researchers describing how to reproduce or replicate the results
Already upon submission, authors can privately share their anonymized data and software on preserved archives such as Zenodo or Figshare (tutorial available here). Zenodo accepts up to 50GB per dataset (more upon request). There is no need to use Dropbox or Google Drive. After acceptance, data and software should be made public so that they receive a DOI and become citable. Zenodo and Figshare accounts can easily be linked with GitHub repositories to automatically archive software releases. In the unlikely case that authors need to upload terabytes of data, Archive.org may be used.
We encourage authors to self-archive pre- and postprints of their papers in open, preserved repositories such as arXiv.org. This is legal and allowed by all major publishers including ACM and IEEE and it lets anybody in the world reach your paper. Note that you are usually not allowed to self-archive the PDF of the published article (that is, the publisher proof or the Digital Library version).
Please note that the success of the open science initiative depends on the willingness (and possibilities) of authors to disclose their data and that all submissions will undergo the same review process independent of whether or not they disclose their analysis code or data. We encourage authors who cannot disclose industrial or otherwise non-public data, for instance due to non-disclosure agreements, to provide an explicit (short) statement in the paper.
Best Mining Challenge Paper Award
As mentioned above, all submissions will undergo the same review process independent of whether or not they disclose their analysis code or data. However, only accepted papers for which code and data are available on preserved archives, as described in the open science policy, will be considered by the program committee for the best mining challenge paper award.
Best Student Presentation Award
As in previous years, there will be a public vote during the conference to select the best mining challenge presentation. This award often goes to authors of compelling work who present an engaging story to the audience. To increase student involvement, starting with MSR 2019, only students can compete for this award.
Organization
Sebastian Baltes, University of Trier, Germany
Christoph Treude, The University of Adelaide, Australia
Stephan Diehl, University of Trier, Germany