{"key": "panthaplackel2020deep", "year": "2020", "title":"Deep Just-In-Time Inconsistency Detection Between Comments and Source Code", "abstract": "<p>Natural language comments convey key aspects of source code such as implementation, usage, and pre- and post-conditions. Failure to update comments accordingly when the corresponding code is modified introduces inconsistencies, which is known to lead to confusion and software bugs. In this paper, we aim to detect whether a comment becomes inconsistent as a result of changes to the corresponding body of code, in order to catch potential inconsistencies just-in-time, i.e., before they are committed to a version control system. To achieve this, we develop a deep-learning approach that learns to correlate a comment with code changes. By evaluating on a large corpus of comment/code pairs spanning various comment types, we show that our model outperforms multiple baselines by significant margins. For extrinsic evaluation, we show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system which can both detect and resolve inconsistent comments based on code changes.</p>\n", "tags": ["edit","bimodal","documentation"] },
{"key": "panthaplackel2020learning", "year": "2020", "title":"Learning to Update Natural Language Comments Based on Code Changes", "abstract": "<p>We formulate the novel task of automatically updating an existing natural language comment based on changes in the body of code it accompanies. We propose an approach that learns to correlate changes across two distinct language representations, to generate a sequence of edits that are applied to the existing comment to reflect the source code modifications. We train and evaluate our model using a dataset that we collected from commit histories of open-source software projects, with each example consisting of a concurrent update to a method and its corresponding comment. We compare our approach against multiple baselines using both automatic metrics and human evaluation. Results reflect the challenge of this task and that our model outperforms baselines with respect to making edits.</p>\n", "tags": ["bimodal","edit","documentation"] },
{"key": "panthaplackel2021learning", "year": "2021", "title":"Learning to Describe Solutions for Bug Reports Based on Developer Discussions", "abstract": "<p>When a software bug is reported, developers engage in a discussion to collaboratively resolve it. While the solution is likely formulated within the discussion, it is often buried in a large amount of text, making it difficult to comprehend, which delays its implementation. To expedite bug resolution, we propose generating a concise natural language description of the solution by synthesizing relevant content within the discussion, which encompasses both natural language and source code. Furthermore, to support generating an informative description during an ongoing discussion, we propose a secondary task of determining when sufficient context about the solution emerges in real-time. We construct a dataset for these tasks with a novel technique for obtaining noisy supervision from repository changes linked to bug reports. We establish baselines for generating solution descriptions, and develop a classifier which makes a prediction following each new utterance on whether or not the necessary context for performing generation is available. Through automated and human evaluation, we find these tasks to form an ideal testbed for complex reasoning in long, bimodal dialogue context.</p>\n", "tags": ["summarization","documentation"] },
{"key": "panthaplackel2022using", "year": "2022", "title":"Using Developer Discussions to Guide Fixing Bugs in Software", "abstract": "<p>Automatically fixing software bugs is a challenging task. While recent work showed that natural language context is useful in guiding bug-fixing models, the approach required prompting developers to provide this context, which was simulated through commit messages written after the bug-fixing code changes were made. We instead propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for any additional information from developers. For this, we augment standard bug-fixing datasets with bug report discussions. Using these newly compiled datasets, we demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.</p>\n", "tags": ["Transformer","repair"] },
{"key": "parvez2018building", "year": "2018", "title":"Building Language Models for Text with Named Entities", "abstract": "<p>Text in many domains involves a significant amount of named entities. Predicting the entity names is often challenging for a language model as they appear less frequently in the training corpus. In this paper, we propose a novel and effective approach to building a discriminative language model which can learn the entity names by leveraging their entity type information. We also introduce two benchmark datasets based on recipes and Java programming codes, on which we evaluate the proposed model. Experimental results show that our model achieves 52.2% better perplexity in recipe generation and 22.06% on code generation than the state-of-the-art language models.</p>\n", "tags": ["language model"] },
{"key": "parvez2021retrieval", "year": "2021", "title":"Retrieval Augmented Code Generation and Summarization", "abstract": "<p>Software developers write a lot of source code and documentation during software development. Intrinsically, developers often recall parts of source code or code summaries that they had written in the past while implementing software or documenting them. To mimic developers’ code or summary generation behavior, we propose a retrieval augmented framework, REDCODER, that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models. REDCODER has a couple of unique features. First, it extends the state-of-the-art dense retrieval technique to search for relevant code or summaries. Second, it can work with retrieval databases that include unimodal (only code or natural language description) or bimodal instances (code-description pairs). We conduct experiments and extensive analysis on two benchmark datasets of code generation and summarization in Java and Python, and the promising results endorse the effectiveness of our proposed retrieval augmented framework.</p>\n", "tags": ["Transformer","summarization","code generation"] },
{"key": "pashakhanloo2022codetrek", "year": "2022", "title":"CodeTrek: Flexible Modeling of Code using an Extensible Relational Representation", "abstract": "<p>Designing a suitable representation for code-reasoning tasks is challenging in aspects such as the kinds of program information to model, how to combine them, and how much context to consider. We propose CodeTrek, a deep learning approach that addresses these challenges by representing codebases as databases that conform to rich relational schemas. The relational representation not only allows CodeTrek to uniformly represent diverse kinds of program information, but also to leverage program-analysis queries to derive new semantic relations, which can be readily incorporated without further architectural engineering. CodeTrek embeds this relational representation using a set of walks that can traverse different relations in an unconstrained fashion, and incorporates all relevant attributes along the way. We evaluate CodeTrek on four diverse and challenging Python tasks: variable misuse, exception prediction, unused definition, and variable shadowing. CodeTrek achieves an accuracy of 91%, 63%, 98%, and 94% on these tasks respectively, and outperforms state-of-the-art neural models by 2-19% points.</p>\n", "tags": ["representation","variable misuse"] },