Cross-language clone detection by learning over abstract syntax trees (MSR 2019 - Technical Papers)

Who

Daniel Perez, Shigeru Chiba

Track

MSR 2019 MSR Technical Papers

When

Mon 27 May 2019 12:10 - 12:25 at Centre-Ville - Session VIII: Software Quality (part 2) Chair(s): Yasutaka Kamei

Abstract

Clone detection across programs written in the same programming language has been studied extensively in the literature. In contrast, the task of detecting clones across multiple programming languages has received far less attention, and comparison-based approaches cannot be applied directly. In this paper, we present a clone detection method based on semi-supervised machine learning, designed to detect clones across programming languages with similar syntax. Our method uses an unsupervised learning approach to learn token-level vector representations and an LSTM-based neural network to predict whether two code fragments are clones. To train our network, we present a cross-language code clone dataset containing around 45,000 code fragments written in Java and Python, which is, to the best of our knowledge, the first of its kind. We evaluate our approach on this dataset and show that our method gives promising results when detecting similarities between code fragments written in Java and Python.
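
As a rough illustration of what "token-level" input over abstract syntax trees might look like, the sketch below turns a Python fragment into a sequence of AST node-type names and maps each name to an integer id, which is the kind of sequence an embedding layer and an LSTM could consume. It is only an assumption-laden sketch, not the authors' pipeline: this listing does not say how the trees are serialized, the Java side is omitted, and the function names (`node_type_sequence`, `build_vocabulary`, `encode`) are hypothetical. It uses only Python's standard `ast` module.

```python
import ast


def node_type_sequence(source):
    """Parse Python source and return AST node-type names in pre-order.

    Hypothetical helper: the serialization order is an assumption,
    not something stated in the abstract.
    """
    tree = ast.parse(source)
    sequence = []

    def visit(node):
        # Record the node's type name, then descend into its children.
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence


def build_vocabulary(sequences):
    """Assign a distinct integer id to each node-type name, in first-seen order."""
    vocab = {}
    for seq in sequences:
        for name in seq:
            vocab.setdefault(name, len(vocab))
    return vocab


def encode(sequence, vocab):
    """Map a sequence of node-type names to their integer ids."""
    return [vocab[name] for name in sequence]


if __name__ == "__main__":
    fragment = "def add(a, b):\n    return a + b\n"
    names = node_type_sequence(fragment)
    vocab = build_vocabulary([names])
    print(names)
    print(encode(names, vocab))
```

In a cross-language setting, a comparable sequence would have to be produced for Java fragments with a Java parser, and the two vocabularies would have to be related in some way; how that is done is left to the paper itself.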

Link to Preprint

https://static.perez.sh/research/2019/cross-language-clone-detection/clone-detection-msr19.pdf

Daniel Perez

Imperial College London

United Kingdom

Shigeru Chiba

University of Tokyo

Japan