
Commit 13d17c6 (parent 2c744bf)

deploy: a65946a

19 files changed: +4128 −3845 lines

paper-abstracts.json (1 addition, 0 deletions)

@@ -199,6 +199,7 @@
 {"key": "koc2017learning", "year": "2017", "title":"Learning a Classifier for False Positive Error Reports Emitted by Static Code Analysis Tools", "abstract": "<p>The large scale and high complexity of modern software systems\nmake perfectly precise static code analysis (SCA) infeasible. Therefore SCA tools often over-approximate, so not to miss any real\nproblems. This, however, comes at the expense of raising false\nalarms, which, in practice, reduces the usability of these tools.</p>\n\n<p>To partially address this problem, we propose a novel learning\nprocess whose goal is to discover program structures that cause\na given SCA tool to emit false error reports, and then to use this\ninformation to predict whether a new error report is likely to be a\nfalse positive as well. To do this, we first preprocess code to isolate\nthe locations that are related to the error report. Then, we apply\nmachine learning techniques to the preprocessed code to discover\ncorrelations and to learn a classifier.</p>\n\n<p>We evaluated this approach in an initial case study of a widely-used SCA tool for Java. Our results showed that for our dataset\nwe could accurately classify a large majority of false positive error\nreports. Moreover, we identified some common coding patterns that\nled to false positive errors. We believe that SCA developers may be\nable to redesign their methods to address these patterns and reduce\nfalse positive error reports.</p>\n", "tags": ["static analysis"] },
 {"key": "kocetkov2022stack", "year": "2022", "title":"The Stack: 3TB of permissively licensed source code", "abstract": "<p>Large Language Models (LLMs) play an ever-increasing role in the field of\nArtificial Intelligence (AI)–not only for natural language processing but also\nfor code understanding and generation. To stimulate open and responsible\nresearch on LLMs for code, we introduce The Stack, a 3.1 TB dataset\nconsisting of permissively licensed source code in 30 programming languages.\nWe describe how we collect the full dataset, construct a permissively licensed\nsubset, and present promising results on text2code benchmarks by training 350M-parameter decoders on different Python subsets. We find that\n(1) near-deduplicating the data significantly boosts performance across all\nexperiments, and (2) it is possible to match previously reported HumanEval\nand MBPP performance using only permissively licensed data. We make the\ndataset available at https://hf.co/BigCode and give developers the possi-\nbility to have their code removed from the dataset by following the instruc-\ntions at https://www.bigcode-project.org/docs/about/the-stack/.</p>\n", "tags": ["dataset"] },
 {"key": "korbak2021energy", "year": "2021", "title":"Energy-Based Models for Code Generation under Compilability Constraints", "abstract": "<p>Neural language models can be successfully trained on source code, leading to applications such as code completion. However, their versatile autoregressive self-supervision objective overlooks important global sequence-level features that are present in the data such as syntactic correctness or compilability. In this work, we pose the problem of learning to generate compilable code as constraint satisfaction. We define an Energy-Based Model (EBM) representing a pre-trained generative model with an imposed constraint of generating only compilable sequences. We then use the KL-Adaptive Distributional Policy Gradient algorithm (Khalifa et al., 2021) to train a generative model approximating the EBM. We conduct experiments showing that our proposed approach is able to improve compilability rates without sacrificing diversity and complexity of the generated samples.</p>\n", "tags": ["code generation"] },
+{"key": "kovalchuk2022human", "year": "2022", "title":"Human perceiving behavior modeling in evaluation of code generation models", "abstract": "<p>Within this study, we evaluated a series of code generation models based on CodeGen and GPTNeo to compare the metric-based performance and human evaluation. For a deeper analysis of human perceiving within the evaluation procedure we’ve implemented a 5-level Likert scale assessment of the model output using a perceiving model based on the Theory of Planned Behavior (TPB). Through such analysis, we showed an extension of model assessment as well as a deeper understanding of the quality and applicability of generated code for practical question answering. The approach was evaluated with several model settings in order to assess diversity in quality and style of answer. With the TPB-based model, we showed a different level of perceiving the model result, namely personal understanding, agreement level, and readiness to use the particular code. With such analysis, we investigate a series of issues in code generation as natural language generation (NLG) problems observed in a practical context of programming question-answering with code.</p>\n", "tags": ["code generation","evaluation","human evaluation"] },
 {"key": "kovalenko2019pathminer", "year": "2019", "title":"PathMiner : A Library for Mining of Path-Based Representations of Code", "abstract": "<p>One recent, significant advance in modeling source code for machine learning algorithms has been the introduction of path-based representation – an approach consisting in representing a snippet of code as a collection of paths from its syntax tree. Such representation efficiently captures the structure of code, which, in turn, carries its semantics and other information.\nBuilding the path-based representation involves parsing the code and extracting the paths from its syntax tree; these steps build up to a substantial technical job. With no common reusable toolkit existing for this task, the burden of mining diverts the focus of researchers from the essential work and hinders newcomers in the field of machine learning on code.</p>\n\n<p>In this paper, we present PathMiner – an open-source library for mining path-based representations of code. PathMiner is fast, flexible, well-tested, and easily extensible to support input code in any common programming language. Preprint [https://doi.org/10.5281/zenodo.2595271]; released tool [https://doi.org/10.5281/zenodo.2595257].</p>\n", "tags": ["representation","grammar"] },
 {"key": "kremenek2007factor", "year": "2007", "title":"A Factor Graph Model for Software Bug Finding", "abstract": "<p>Automatic tools for finding software errors require\nknowledge of the rules a program must obey, or\n“specifications,” before they can identify bugs. We\npresent a method that combines factor graphs and\nstatic program analysis to automatically infer specifications directly from programs. We illustrate the\napproach on inferring functions in C programs that\nallocate and release resources, and evaluate the approach on three codebases: SDL, OpenSSH, and\nthe OS kernel for Mac OS X (XNU). The inferred\nspecifications are highly accurate and with them we\nhave discovered numerous bugs.</p>\n\n", "tags": ["program analysis"] },
 {"key": "kulal2019spoc", "year": "2019", "title":"SPoC: Search-based Pseudocode to Code", "abstract": "<p>We consider the task of mapping pseudocode to long programs that are functionally correct. Given test cases as a mechanism to validate programs, we search over the space of possible translations of the pseudocode to find a program that passes the validation. However, without proper credit assignment to localize the sources of program failures, it is difficult to guide search toward more promising programs. We propose to perform credit assignment based on signals from compilation errors, which constitute 88.7% of program failures. Concretely, we treat the translation of each pseudocode line as a discrete portion of the program, and whenever a synthesized program fails to compile, an error localization method tries to identify the portion of the program responsible for the failure. We then focus search over alternative translations of the pseudocode for those portions. For evaluation, we collected the SPoC dataset (Search-based Pseudocode to Code) containing 18,356 programs with human-authored pseudocode and test cases. Under a budget of 100 program compilations, performing search improves the synthesis success rate over using the top-one translation of the pseudocode from 25.6% to 44.7%.</p>\n", "tags": ["bimodal","synthesis"] },
