|
50 | 50 | {"key": "bieber2022static", "year": "2022", "title":"Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions", "abstract": "<p>The execution behavior of a program often depends on external resources, such as program inputs or file contents, and so cannot be run in isolation. Nevertheless, software developers benefit from fast iteration loops where automated tools identify errors as early as possible, even before programs can be compiled and run. This presents an interesting machine learning challenge: can we predict runtime errors in a “static” setting, where program execution is not possible? Here, we introduce a real-world dataset and task for predicting runtime errors, which we show is difficult for generic models like Transformers. We approach this task by developing an interpreter-inspired architecture with an inductive bias towards mimicking program executions, which models exception handling and “learns to execute” descriptions of the contents of external resources. Surprisingly, we show that the model can also predict the location of the error, despite being trained only on labels indicating the presence/absence and kind of error. In total, we present a practical and difficult-yet-approachable challenge problem related to learning program execution and we demonstrate promising new capabilities of interpreter-inspired machine learning models for code.</p>\n", "tags": ["dataset","defect"] }, |
51 | 51 | {"key": "bielik2016phog", "year": "2016", "title":"PHOG: Probabilistic Model for Code", "abstract": "<p>We introduce a new generative model for code called probabilistic higher order grammar (PHOG). PHOG generalizes probabilistic context free grammars (PCFGs) by allowing conditioning of a production rule beyond the parent non-terminal, thus capturing rich contexts relevant to programs. Even though PHOG is more powerful than a PCFG, it can be learned from data just as efficiently. We trained a PHOG model on a large JavaScript code corpus and show that it is more precise than existing models, while similarly fast. As a result, PHOG can immediately benefit existing programming tools based on probabilistic models of code.</p>\n", "tags": ["grammar","code generation","language model"] }, |
52 | 52 | {"key": "bielik2020adversarial", "year": "2020", "title":"Adversarial Robustness for Code", "abstract": "<p>We propose a novel technique which addresses the challenge of learning accurate and robust models of code in a principled way. Our method consists of three key components: (i) learning to abstain from making a prediction if uncertain, (ii) adversarial training, and (iii) representation refinement which learns the program parts relevant for the prediction and abstracts the rest. These components are used to iteratively train multiple models, each of which learns a suitable program representation necessary to make robust predictions on a different subset of the dataset. We instantiated our approach to the task of type inference for dynamically typed languages and demonstrate its effectiveness by learning a model that achieves 88% accuracy and 84% robustness. Further, our evaluation shows that using the combination of all three components is key to obtaining accurate and robust models.</p>\n", "tags": ["adversarial","types"] }, |
| 53 | +{"key": "bouzenia2023tracefixer", "year": "2023", "title":"TraceFixer: Execution Trace-Driven Program Repair", "abstract": "<p>When debugging unintended program behavior, developers can often identify the point in the execution where the actual behavior diverges from the desired behavior. For example, a variable may get assigned a wrong value, which then negatively influences the remaining computation. Once a developer identifies such a divergence, how to fix the code so that it provides the desired behavior? This paper presents TraceFixer, a technique for predicting how to edit source code so that it does not diverge from the expected behavior anymore. The key idea is to train a neural program repair model that not only learns from source code edits but also exploits excerpts of runtime traces. The input to the model is a partial execution trace of the incorrect code, which can be obtained automatically through code instrumentation, and the correct state that the program should reach at the divergence point, which the user provides, e.g., in an interactive debugger. Our approach fundamentally differs from current program repair techniques, which share a similar goal but exploit neither execution traces nor information about the desired program state. We evaluate TraceFixer on single-line mistakes in Python code. After training the model on hundreds of thousands of code edits created by a neural model that mimics real-world bugs, we find that exploiting execution traces improves the bug-fixing ability by 13% to 20% (depending on the dataset, within the top-10 predictions) compared to a baseline that learns from source code edits only. Applying TraceFixer to 20 real-world Python bugs shows that the approach successfully fixes 10 of them.</p>\n", "tags": ["Transformer","repair","dynamic"] }, |
53 | 54 | {"key": "brauckmann2020compiler", "year": "2020", "title":"Compiler-based graph representations for deep learning models of code", "abstract": "<p>In natural language processing, novel methods in deep learning, like recurrent neural networks (RNNs) on sequences of words, have been very successful. These methods have also been used recently for tasks in compiler optimization, like heterogeneous mapping of OpenCL kernels or predicting thread coarsening factors for optimal execution times. In contrast to natural languages, programming languages usually have a well-defined structure. This structure is what enables compilers to reason about programs on the foundations of graphs, such as abstract syntax trees (ASTs) or control-data flow graphs (CDFGs).\nIn this paper, we argue that we should use these graph structures instead of word sequences for learning compiler optimization tasks. To this end we apply recently proposed graph neural networks (GNNs) for learning predictive compiler tasks on two representations based on ASTs and CDFGs. Experimental results show how these representations improve upon the accuracy of the state-of-the-art in the task of heterogeneous OpenCL mapping, while providing orders of magnitude faster inference times, which are crucial for compiler optimizations. When testing on benchmark suites not included for training, our graph-based methods significantly outperform the state-of-the art by 12 percentage points in terms of accuracy, and are the only ones to perform better than a random mapping. When testing on the task of predicting thread coarsening factors, we expose current limitations of deep learning in compilers. We show how all of the deep learning approaches proposed so far, including our graph-based models, fail to produce an overall speedup with their predictions.</p>\n", "tags": ["representation","compilation","optimization","GNN"] }, |
54 | 55 | {"key": "brauckmann2020compy", "year": "2020", "title":"ComPy-Learn: A toolbox for exploring machine learning representations for compilers", "abstract": "<p>Deep Learning methods have not only shown to improve software performance in compiler heuristics, but also e.g. to improve security in vulnerability prediction or to boost developer productivity in software engineering tools. A key to the success of such methods across these use cases is the expressiveness of the representation used to abstract from the program code. Recent work has shown that different such representations have unique advantages in terms of performance. However, determining the best-performing one for a given task is often not obvious and requires empirical evaluation.\nTherefore, we present ComPy-Learn, a toolbox for conveniently defining, extracting, and exploring representations of program code. With syntax-level language information from the Clang compiler frontend and low-level information from the LLVM compiler backend, the tool supports the construction of linear and graph representations and enables an efficient search for the best-performing representation and model for tasks on program code.</p>\n", "tags": ["representation","compilation","optimization","GNN"] }, |
55 | 56 | {"key": "briem2020offside", "year": "2020", "title":"OffSide: Learning to Identify Mistakes in Boundary Conditions", "abstract": "<p>Mistakes in boundary conditions are the cause of many bugs in software.\nThese mistakes happen when, e.g., developers make use of <code class=\"language-plaintext highlighter-rouge\"><</code> or <code class=\"language-plaintext highlighter-rouge\">></code> in cases\nwhere they should have used <code class=\"language-plaintext highlighter-rouge\"><=</code> or <code class=\"language-plaintext highlighter-rouge\">>=</code>. Mistakes in boundary conditions\nare often hard to find and manually detecting them might be very time-consuming\nfor developers. While researchers have been proposing techniques to cope with\nmistakes in the boundaries for a long time, the automated detection of such bugs still\nremains a challenge. We conjecture that, for a tool to be able to precisely identify mistakes\nin boundary conditions, it should be able to capture the overall context of the source code\nunder analysis. In this work, we propose a deep learning model that learn mistakes in boundary\nconditions and, later, is able to identifythem in unseen code snippets. We train and test a\nmodel on over 1.5 million code snippets, with and without mistakes in different boundary conditions.\nOur model shows an accuracy from 55% up to 87%. The model is also able to detect 24 out of 41\nreal-world bugs;however, with a high false positive rate. The existing state-of-the-practice linter\ntools are not able to detect any of the bugs. We hope this paper can pave the road towards deep\nlearning models that will be able to support developers in detecting mistakes in boundary conditions.</p>\n", "tags": ["defect"] }, |
|