MSR 2019
Sun 26 - Mon 27 May 2019 Montreal, QC, Canada
co-located with ICSE 2019

Word embeddings produced by the word2vec algorithm provide a strong mechanism for discovering relationships between words based on the degree to which they are contextually related to one another. By themselves, algorithms like word2vec do not give us a mechanism to impose ordering constraints on the embedded word representations. Our main goal in this paper is to exploit the semantic word vectors obtained from word2vec in a way that allows ordering constraints to be invoked on them when comparing a sequence of words in a query with a sequence of words in a file for source code retrieval. These ordering constraints employ the logic of Markov Random Fields (MRF), a framework used previously to enhance the precision of source-code retrieval engines based on the Bag-of-Words (BoW) assumption. The work we present here demonstrates that by combining word2vec with the power of MRF, it is possible to achieve improvements between 6% and 30% in retrieval accuracy over the best results that can be obtained with the more traditional applications of MRF to representations based on term and term-term frequencies. The performance improvement was 30% for the Java AspectJ repository using only the titles of the bug reports provided by iBUGS, and 6% for the Eclipse repository using titles as well as descriptions of the bug reports provided by BUGLinks.
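To make the idea in the abstract concrete, here is a minimal sketch of how embedding similarity and an ordering constraint can be combined when scoring a file against a query. It is not the SCOR model from the paper. The toy vectors, the function names (`unordered_score`, `ordered_score`, `combined_score`), the `window` size, and the `weight` parameter are all assumptions made for illustration. The ordered part only loosely mirrors the sequential-dependence intuition behind MRF-based retrieval: adjacent query terms should find similar terms in the file in the same order and close together.

```python
import math

# Toy 2-d vectors invented for illustration; real word2vec vectors would be
# learned from a corpus and have far more dimensions.
EMB = {
    "open":   (1.0, 0.0),
    "launch": (0.9, 0.1),
    "file":   (0.0, 1.0),
    "editor": (0.1, 0.9),
    "close":  (-1.0, 0.0),
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sim(a, b):
    """Word similarity clipped to [0, 1]; unknown words match only themselves."""
    if a == b:
        return 1.0
    if a not in EMB or b not in EMB:
        return 0.0
    return max(0.0, cosine(EMB[a], EMB[b]))

def unordered_score(query, doc):
    """Order-free part: each query term takes its best match anywhere in doc."""
    if not query or not doc:
        return 0.0
    return sum(max(sim(q, d) for d in doc) for q in query) / len(query)

def ordered_score(query, doc, window=2):
    """Order-aware part: each adjacent query pair (q1, q2) looks for doc terms
    (d1, d2) with d2 at most `window` positions after d1, scored by the
    product of the two similarities."""
    if len(query) < 2 or len(doc) < 2:
        return 0.0
    total = 0.0
    for q1, q2 in zip(query, query[1:]):
        best = 0.0
        for i, d1 in enumerate(doc):
            for d2 in doc[i + 1 : i + 1 + window]:
                best = max(best, sim(q1, d1) * sim(q2, d2))
        total += best
    return total / (len(query) - 1)

def combined_score(query, doc, weight=0.5, window=2):
    """Linear mix of the two parts; `weight` is a hypothetical tuning knob."""
    return ((1 - weight) * unordered_score(query, doc)
            + weight * ordered_score(query, doc, window))

query = ["open", "file"]
doc_same_order = ["launch", "editor"]
doc_reversed = ["editor", "launch"]
```

In this toy setup, `doc_same_order` and `doc_reversed` contain the same words, so the order-free part cannot tell them apart. Only the order-aware part rewards the file whose terms follow the query's order, which is the kind of distinction the abstract attributes to adding ordering constraints on top of embeddings.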

Sun 26 May

Displayed time zone: Eastern Time (US & Canada)

11:00 - 11:45
Session I: Representations for Mining (Part 1), MSR 2019 Technical Papers / MSR 2019 Data Showcase at Place du Canada
Chair(s): Chanchal K. Roy University of Saskatchewan
11:00
15m
Full-paper
SCOR: Source Code Retrieval With Semantics and Order
MSR 2019 Technical Papers
Pre-print Media Attached
11:16
6m
Short-paper
PathMiner: A Library for Mining of Path-Based Representations of Code
MSR 2019 Technical Papers
Vladimir Kovalenko TU Delft, Egor Bogomolov Higher School of Economics, JetBrains Research, Timofey Bryksin, Alberto Bacchelli University of Zurich
DOI Pre-print Media Attached
11:23
15m
Full-paper
Import2vec: learning embeddings for software libraries
MSR 2019 Technical Papers
Bart Theeten Nokia Bell Labs, Belgium, Frederik Vandeputte, Tom Van Cutsem Nokia Bell Labs
Pre-print
11:39
6m
Talk
Semantic Source Code Models Using Identifier Embeddings
MSR 2019 Data Showcase
Vasiliki Efstathiou Athens University of Economics and Business, Diomidis Spinellis Athens University of Economics and Business
Pre-print