SCOR: Source Code Retrieval With Semantics and Order
Word embeddings produced by the word2vec algorithm provide a strong mechanism for discovering relationships between words based on the degree to which they are contextually related to one another. In and of themselves, algorithms like word2vec do not provide a mechanism for imposing ordering constraints on the embedded word representations. Our main goal in this paper is to exploit the semantic word vectors obtained from word2vec in a way that allows ordering constraints to be invoked on them when comparing a sequence of words in a query with a sequence of words in a file for source code retrieval. These ordering constraints employ the logic of Markov Random Fields (MRF), a framework used previously to enhance the precision of source-code retrieval engines based on the Bag-of-Words (BoW) assumption. The work we present here demonstrates that by combining word2vec with the power of MRF, it is possible to achieve improvements between 6% and 30% in retrieval accuracy over the best results that can be obtained with the more traditional applications of MRF to representations based on term and term-term frequencies. The performance improvement was 30% for the Java AspectJ repository using only the titles of the bug reports provided by iBUGS, and 6% for the Eclipse repository using titles as well as descriptions of the bug reports provided by BUGLinks.
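To make the idea of combining embedding similarity with ordering constraints concrete, the following is a minimal illustrative sketch. It is not the SCOR model described in the paper. It assumes toy two-dimensional vectors invented for the example (not trained word2vec embeddings), and the function names, the adjacent-pair matching rule, and the `order_weight` parameter are all hypothetical. It only shows how an order-insensitive semantic score and an order-sensitive score over adjacent word pairs might be blended.

```python
import math

# Toy 2-D vectors invented for illustration; real use would load trained
# word2vec embeddings instead (assumption, not from the paper).
TOY_VECTORS = {
    "file": (1.0, 0.0),
    "document": (0.9, 0.1),
    "open": (0.0, 1.0),
    "load": (0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sim(w1, w2, vectors):
    """Semantic similarity in [0, 1]; falls back to exact match for unknown words."""
    if w1 not in vectors or w2 not in vectors:
        return 1.0 if w1 == w2 else 0.0
    # Clamp negatives so products of two dissimilar pairs cannot look like a match.
    return max(0.0, cosine(vectors[w1], vectors[w2]))


def unordered_score(query, doc, vectors):
    """Average, over query words, of the best semantic match anywhere in doc."""
    if not query or not doc:
        return 0.0
    return sum(max(sim(q, d, vectors) for d in doc) for q in query) / len(query)


def ordered_score(query, doc, vectors):
    """Average, over adjacent query pairs, of the best match among adjacent doc pairs."""
    q_pairs = list(zip(query, query[1:]))
    d_pairs = list(zip(doc, doc[1:]))
    if not q_pairs or not d_pairs:
        return 0.0
    total = 0.0
    for q1, q2 in q_pairs:
        # Both words must match, in the same left-to-right order.
        total += max(sim(q1, d1, vectors) * sim(q2, d2, vectors) for d1, d2 in d_pairs)
    return total / len(q_pairs)


def combined_score(query, doc, vectors, order_weight=0.5):
    """Blend the two scores; order_weight is a hypothetical tuning knob."""
    return ((1.0 - order_weight) * unordered_score(query, doc, vectors)
            + order_weight * ordered_score(query, doc, vectors))


query = ["open", "file"]
doc_same_order = ["load", "document"]
doc_reversed = ["document", "load"]
print(combined_score(query, doc_same_order, TOY_VECTORS))
print(combined_score(query, doc_reversed, TOY_VECTORS))
```

In this toy setting the two documents contain the same words, so the order-insensitive score cannot separate them; only the order-sensitive term ranks the document whose words follow the query's order higher.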
Sun 26 May (times shown in Eastern Time, US & Canada)
11:00 - 11:45 | Session I: Representations for Mining (Part 1) | MSR 2019 Technical Papers / MSR 2019 Data Showcase at Place du Canada | Chair(s): Chanchal K. Roy, University of Saskatchewan
11:00 | 15m | Full-paper | SCOR: Source Code Retrieval With Semantics and Order | MSR 2019 Technical Papers
11:16 | 6m | Short-paper | PathMiner: A Library for Mining of Path-Based Representations of Code | MSR 2019 Technical Papers | Vladimir Kovalenko TU Delft, Egor Bogomolov Higher School of Economics, JetBrains Research, Timofey Bryksin, Alberto Bacchelli University of Zurich
11:23 | 15m | Full-paper | Import2vec: learning embeddings for software libraries | MSR 2019 Technical Papers
11:39 | 6m | Talk | Semantic Source Code Models Using Identifier Embeddings | MSR 2019 Data Showcase | Vasiliki Efstathiou Athens University of Economics and Business, Diomidis Spinellis Athens University of Economics and Business