
Commit 7552cf3

deploy: c1d2092
1 parent a83b578 commit 7552cf3

24 files changed: +4246 −3977 lines

paper-abstracts.json

Lines changed: 1 addition & 0 deletions
@@ -74,6 +74,7 @@
{"key": "chen2019mining", "year": "2019", "title":"Mining Likely Analogical APIs across Third-Party Libraries via Large-Scale Unsupervised API Semantics Embedding", "abstract": "<p>Establishing API mappings between third-party libraries is a prerequisite step for library migration tasks. Manually establishing API mappings is tedious due to the large number of APIs to be examined. Having an automatic technique to create a database of likely API mappings can significantly ease the task. Unfortunately, existing techniques either adopt supervised learning mechanism that requires already-ported or functionality similar applications across major programming languages or platforms, which are difficult to come by for an arbitrary pair of third-party libraries, or cannot deal with lexical gap in the API descriptions of different libraries. To overcome these limitations, we present an unsupervised deep learning based approach to embed both API usage semantics and API description (name and document) semantics into vector space for inferring likely analogical API mappings between libraries. Based on deep learning models trained using tens of millions of API call sequences, method names and comments of 2.8 millions of methods from 135,127 GitHub projects, our approach significantly outperforms other deep learning or traditional information retrieval (IR) methods for inferring likely analogical APIs. We implement a proof-of-concept website which can recommend analogical APIs for 583,501 APIs of 111 pairs of analogical Java libraries with diverse functionalities. This scale of third-party analogical-API database has never been achieved before.</p>\n", "tags": ["API","representation"] },
{"key": "chen2019sequencer", "year": "2019", "title":"SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair", "abstract": "<p>This paper presents a novel end-to-end approach to program repair based on sequence-to-sequence learning. We devise, implement, and evaluate a system, called SequenceR, for fixing bugs based on sequence-to-sequence learning on source code. This approach uses the copy mechanism to overcome the unlimited vocabulary problem that occurs with big code. Our system is data-driven; we train it on 35,578 commits, carefully curated from open-source repositories. We evaluate it on 4,711 independent real bug fixes, as well on the Defects4J benchmark used in program repair research. SequenceR is able to perfectly predict the fixed line for 950/4711 testing samples. It captures a wide range of repair operators without any domain-specific top-down design.</p>\n", "tags": ["repair","code generation"] },
{"key": "chen2021evaluating", "year": "2021", "title":"Evaluating Large Language Models Trained on Code", "abstract": "<p>We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.</p>\n", "tags": ["language model","synthesis"] },
+{"key": "chen2021plur", "year": "2021", "title":"PLUR: A Unifying, Graph-Based View of Program Learning, Understanding, and Repair", "abstract": "<p>Machine learning for understanding and editing source code has recently attracted significant interest, with many developments in new models, new code representations, and new tasks. This proliferation can appear disparate and disconnected, making each approach seemingly unique and incompatible, thus obscuring the core machine learning challenges and contributions. In this work, we demonstrate that the landscape can be significantly simplified by taking a general approach of mapping a graph to a sequence of tokens and pointers. Our main result is to show that 16 recently published tasks of different shapes can be cast in this form, based on which a single model architecture achieves near or above state-of-the-art results on nearly all tasks, outperforming custom models like code2seq and alternative generic models like Transformers. This unification further enables multi-task learning and a series of cross-cutting experiments about the importance of different modeling choices for code understanding and repair tasks. The full framework, called PLUR, is easily extensible to more tasks, and will be open-sourced (https://github.com/google-research/plur).</p>\n", "tags": ["repair"] },
{"key": "chen2022codet", "year": "2022", "title":"CodeT: Code Generation with Generated Tests", "abstract": "<p>Given a programming problem, pre-trained language models such as Codex have demonstrated the ability to generate multiple different code solutions via sampling. However, selecting a correct or best solution from those samples still remains a challenge. While an easy way to verify the correctness of a code solution is through executing test cases, producing high-quality test cases is prohibitively expensive. In this paper, we explore the use of pre-trained language models to automatically generate test cases, calling our method CodeT: Code generation with generated Tests. CodeT executes the code solutions using the generated test cases, and then chooses the best solution based on a dual execution agreement with both the generated test cases and other generated solutions. We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks. Extensive experimental results demonstrate CodeT can achieve significant, consistent, and surprising improvements over previous methods. For example, CodeT improves the pass@1 on HumanEval to 65.8%, an increase of absolute 18.8% on the code-davinci-002 model, and an absolute 20+% improvement over previous state-of-the-art results.</p>\n", "tags": ["synthesis"] },
{"key": "chibotaru2019scalable", "year": "2019", "title":"Scalable Taint Specification Inference with Big Code", "abstract": "<p>We present a new scalable, semi-supervised method for inferring\ntaint analysis specifications by learning from a large dataset of programs.\nTaint specifications capture the role of library APIs (source, sink, sanitizer)\nand are a critical ingredient of any taint analyzer that aims to detect\nsecurity violations based on information flow.</p>\n\n<p>The core idea of our method\nis to formulate the taint specification learning problem as a linear\noptimization task over a large set of information flow constraints.\nThe resulting constraint system can then be efficiently solved with\nstate-of-the-art solvers. Thanks to its scalability, our method can infer\nmany new and interesting taint specifications by simultaneously learning from\na large dataset of programs (e.g., as found on GitHub), while requiring \nfew manual annotations.</p>\n\n<p>We implemented our method in an end-to-end system,\ncalled Seldon, targeting Python, a language where static specification\ninference is particularly hard due to lack of typing information.\nWe show that Seldon is practically effective: it learned almost 7,000 API\nroles from over 210,000 candidate APIs with very little supervision\n(less than 300 annotations) and with high estimated precision (67%).\nFurther, using the learned specifications, our taint analyzer flagged more than\n20,000 violations in open source projects, 97% of which were\nundetectable without the inferred specifications.</p>\n", "tags": ["defect","program analysis"] },
{"key": "chirkova2020empirical", "year": "2020", "title":"Empirical Study of Transformers for Source Code", "abstract": "<p>Initially developed for natural language processing (NLP), Transformers are now widely used for source code processing, due to the format similarity between source code and text. In contrast to natural language, source code is strictly structured, i. e. follows the syntax of the programming language. Several recent works develop Transformer modifications for capturing syntactic information in source code. The drawback of these works is that they do not compare to each other and all consider different tasks. In this work, we conduct a thorough empirical study of the capabilities of Transformers to utilize syntactic information in different tasks. We consider three tasks (code completion, function naming and bug fixing) and re-implement different syntax-capturing modifications in a unified framework. We show that Transformers are able to make meaningful predictions based purely on syntactic information and underline the best practices of taking the syntactic information into account for improving the performance of the model.</p>\n", "tags": ["Transformer"] },
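The chen2021evaluating abstract reports results from repeated sampling (e.g., solving 70.2% of problems with 100 samples per problem). Those figures are computed with the unbiased pass@k estimator described in the Codex paper, which can be sketched in a few lines:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations, is correct, given
    that c of the n generations pass the unit tests."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must contain at least one correct solution.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 2 generations of which 1 is correct, a single draw succeeds half the time: `pass_at_k(2, 1, 1)` is 0.5.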
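The chen2022codet abstract describes selecting a solution by "dual execution agreement" between generated solutions and generated tests. A minimal sketch of that idea, not the paper's implementation: cluster solutions by the set of tests they pass and score each cluster by its size times the number of tests passed. The inline `exec`-based `run` helper is a hypothetical stand-in for the sandboxed execution a real system would need:

```python
from collections import defaultdict

def select_solution(solutions, tests):
    """Pick a candidate solution via a CodeT-style agreement score:
    solutions agreeing on the same passing test set reinforce each
    other, and passing more tests raises the score."""
    def run(solution, test):
        # Stand-in for sandboxed execution of one test against one solution.
        try:
            env = {}
            exec(solution, env)
            exec(test, env)
            return True
        except Exception:
            return False

    # Group solutions by the exact set of generated tests they pass.
    clusters = defaultdict(list)
    for s in solutions:
        passed = frozenset(t for t in tests if run(s, t))
        clusters[passed].append(s)

    # Score = (number of agreeing solutions) * (number of tests passed).
    best_tests, best_group = max(
        clusters.items(), key=lambda kv: len(kv[1]) * len(kv[0])
    )
    return best_group[0]
```

With two candidate `add` implementations and tests `assert add(1, 2) == 3` and `assert add(2, 2) == 4`, the correct `a + b` variant passes both tests and wins over `a * b`, which only passes the second.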
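The hunks above suggest paper-abstracts.json holds objects with `key`, `year`, `title`, `abstract`, and `tags` fields. Assuming the file parses as a JSON array (the trailing commas here are diff context, not the full file), filtering entries by tag is a one-liner; the inline sample data below mirrors two entries from this commit:

```python
# Sample entries mirroring the structure shown in the diff above.
entries = [
    {"key": "chen2021plur", "year": "2021", "tags": ["repair"]},
    {"key": "chen2022codet", "year": "2022", "tags": ["synthesis"]},
]

def by_tag(entries, tag):
    """Return the keys of all entries carrying the given tag."""
    return [e["key"] for e in entries if tag in e.get("tags", [])]
```

For the full file, the same function applies after `json.load(open("paper-abstracts.json"))`.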
