MSR 2019
Sun 26 - Mon 27 May 2019 Montreal, QC, Canada
co-located with ICSE 2019
Mon 27 May 2019 12:10 - 12:25 at Centre-Ville - Session VIII: Software Quality (part 2) Chair(s): Yasutaka Kamei

Clone detection across programs written in the same programming language has been studied extensively in the literature. In contrast, the task of detecting clones across multiple programming languages has received far less attention, and approaches based on comparison cannot be applied directly. In this paper, we present a clone detection method based on semi-supervised machine learning, designed to detect clones across programming languages with similar syntax. Our method uses an unsupervised learning approach to learn token-level vector representations and an LSTM-based neural network to predict whether two code fragments are clones. To train our network, we present a cross-language code clone dataset containing around 45,000 code fragments written in Java and Python, which is, to the best of our knowledge, the first of its kind. We evaluate our approach on the dataset we created and show that our method gives promising results when detecting similarities between code fragments written in Java and Python.
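As a rough illustration of the kind of token-level input the abstract refers to, the sketch below turns a Python fragment into a sequence of AST node type names using the standard-library `ast` module. This is not the authors' pipeline: the learned vector representations and the LSTM-based classifier are not shown, only the Python side is handled, and the helper names `ast_node_tokens` and `jaccard` are hypothetical. The set-overlap score is a naive stand-in used only to make the token sequences comparable.

```python
import ast
from typing import List


def ast_node_tokens(source: str) -> List[str]:
    """Return the AST node type names of `source`, in the order ast.walk yields them."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def jaccard(a: List[str], b: List[str]) -> float:
    """Naive set overlap between two token lists (a stand-in, not a learned model)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Two fragments that compute the same value in different ways.
loop_sum = "total = 0\nfor x in xs:\n    total += x\n"
builtin_sum = "total = sum(xs)\n"

tokens_a = ast_node_tokens(loop_sum)
tokens_b = ast_node_tokens(builtin_sum)

print(tokens_a)
print(tokens_b)
print(jaccard(tokens_a, tokens_b))
```

In a learning-based setting, token sequences like these would be mapped to vectors and passed to a trained model rather than compared by set overlap. A Java fragment would need a separate parser, which is outside this sketch.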

Mon 27 May
Times are displayed in time zone: Eastern Time (US & Canada)

11:55 - 12:30: Session VIII: Software Quality (part 2), MSR 2019 Paper Presentations / MSR 2019 Technical Papers / MSR 2019 Data Showcase at Centre-Ville
Chair(s): Yasutaka Kamei (Kyushu University)
11:55 - 12:10
A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks
MSR 2019 Technical Papers
12:10 - 12:25
Cross-language clone detection by learning over abstract syntax trees
MSR 2019 Technical Papers
Daniel Perez (Imperial College London), Shigeru Chiba (University of Tokyo, Japan)
12:25 - 12:31
SeSaMe: A Data Set of Semantically Similar Java Methods
MSR 2019 Data Showcase
Marius Kamp, Patrick Kreutzer, Michael Philippsen (Friedrich-Alexander University Erlangen-Nürnberg (FAU))