|
58 | 58 | {"key": "bielik2020adversarial", "year": "2020", "title":"Adversarial Robustness for Code", "abstract": "<p>We propose a novel technique which addresses the challenge of learning accurate and robust models of code in a principled way. Our method consists of three key components: (i) learning to abstain from making a prediction if uncertain, (ii) adversarial training, and (iii) representation refinement which learns the program parts relevant for the prediction and abstracts the rest. These components are used to iteratively train multiple models, each of which learns a suitable program representation necessary to make robust predictions on a different subset of the dataset. We instantiated our approach to the task of type inference for dynamically typed languages and demonstrate its effectiveness by learning a model that achieves 88% accuracy and 84% robustness. Further, our evaluation shows that using the combination of all three components is key to obtaining accurate and robust models.</p>\n", "tags": ["adversarial","types"] }, |
59 | 59 | {"key": "bouzenia2023tracefixer", "year": "2023", "title":"TraceFixer: Execution Trace-Driven Program Repair", "abstract": "<p>When debugging unintended program behavior, developers can often identify the point in the execution where the actual behavior diverges from the desired behavior. For example, a variable may get assigned a wrong value, which then negatively influences the remaining computation. Once a developer identifies such a divergence, how to fix the code so that it provides the desired behavior? This paper presents TraceFixer, a technique for predicting how to edit source code so that it does not diverge from the expected behavior anymore. The key idea is to train a neural program repair model that not only learns from source code edits but also exploits excerpts of runtime traces. The input to the model is a partial execution trace of the incorrect code, which can be obtained automatically through code instrumentation, and the correct state that the program should reach at the divergence point, which the user provides, e.g., in an interactive debugger. Our approach fundamentally differs from current program repair techniques, which share a similar goal but exploit neither execution traces nor information about the desired program state. We evaluate TraceFixer on single-line mistakes in Python code. After training the model on hundreds of thousands of code edits created by a neural model that mimics real-world bugs, we find that exploiting execution traces improves the bug-fixing ability by 13% to 20% (depending on the dataset, within the top-10 predictions) compared to a baseline that learns from source code edits only. Applying TraceFixer to 20 real-world Python bugs shows that the approach successfully fixes 10 of them.</p>\n", "tags": ["Transformer","repair","dynamic"] }, |
60 | 60 | {"key": "bouzenia2024repairagent", "year": "2024", "title":"RepairAgent: An Autonomous, LLM-Based Agent for Program Repair", "abstract": "<p>Automated program repair has emerged as a powerful technique to mitigate the impact of software bugs on system reliability and user experience. This paper introduces RepairAgent, the first work to address the program repair challenge through an autonomous agent based on a large language model (LLM). Unlike existing deep learning-based approaches, which prompt a model with a fixed prompt or in a fixed feedback loop, our work treats the LLM as an agent capable of autonomously planning and executing actions to fix bugs by invoking suitable tools. RepairAgent freely interleaves gathering information about the bug, gathering repair ingredients, and validating fixes, while deciding which tools to invoke based on the gathered information and feedback from previous fix attempts. Key contributions that enable RepairAgent include a set of tools that are useful for program repair, a dynamically updated prompt format that allows the LLM to interact with these tools, and a finite state machine that guides the agent in invoking the tools. Our evaluation on the popular Defects4J dataset demonstrates RepairAgent’s effectiveness in autonomously repairing 164 bugs, including 39 bugs not fixed by prior techniques. Interacting with the LLM imposes an average cost of 270,000 tokens per bug, which, under the current pricing of OpenAI’s GPT-3.5 model, translates to 14 cents of USD per bug. To the best of our knowledge, this work is the first to present an autonomous, LLM-based agent for program repair, paving the way for future agent-based techniques in software engineering.</p>\n", "tags": ["repair"] }, |
| 61 | +{"key": "brach2024can", "year": "2024", "title":"Can Large Language Model Detect Plagiarism in Source Code?", "abstract": "<p>The issue of code plagiarism represents a significant challenge in the academic environment. This study examines the potential of large language models (LLMs) in improving the detection of code plagiarism. The performance of several LLMs, including GPT-4o, GPT3.5 Turbo, LLaMA 3, and CodeLlama, is evaluated in comparison to conventional tools, such as JPlag, across a range of levels of code plagiarism. The findings of our study illustrate that state-of-the-art LLMs are able to outperform traditional methods, particularly in the detection of sophisticated forms of plagiarism. GPT-4o exhibited the highest overall accuracy (78.70%) and an F1 score of 86.97%. It is important to note that open-source models, such as LLaMA 3 (accuracy 71.53%, F1 score 82.75%), demonstrated the ability to detect the most complex forms of plagiarism with the same accuracy as GPT-4o. While these results demonstrate the promising potential of LLMs in code similarity analysis, it is also evident that higher false positive rates may be an inherent limitation, emphasizing the need for human oversight. This study contributes valuable insights into the application of AI in maintaining code integrity and academic honesty, paving the way for more effective, interpretable, and fair plagiarism detection systems in software development education and practice.</p>\n", "tags": ["code similarity","large language models","LLM","plagiarism detection","natural language processing"] }, |
61 | 62 | {"key": "brauckmann2020compiler", "year": "2020", "title":"Compiler-based graph representations for deep learning models of code", "abstract": "<p>In natural language processing, novel methods in deep learning, like recurrent neural networks (RNNs) on sequences of words, have been very successful. These methods have also been used recently for tasks in compiler optimization, like heterogeneous mapping of OpenCL kernels or predicting thread coarsening factors for optimal execution times. In contrast to natural languages, programming languages usually have a well-defined structure. This structure is what enables compilers to reason about programs on the foundations of graphs, such as abstract syntax trees (ASTs) or control-data flow graphs (CDFGs).\nIn this paper, we argue that we should use these graph structures instead of word sequences for learning compiler optimization tasks. To this end we apply recently proposed graph neural networks (GNNs) for learning predictive compiler tasks on two representations based on ASTs and CDFGs. Experimental results show how these representations improve upon the accuracy of the state-of-the-art in the task of heterogeneous OpenCL mapping, while providing orders of magnitude faster inference times, which are crucial for compiler optimizations. When testing on benchmark suites not included for training, our graph-based methods significantly outperform the state-of-the art by 12 percentage points in terms of accuracy, and are the only ones to perform better than a random mapping. When testing on the task of predicting thread coarsening factors, we expose current limitations of deep learning in compilers. We show how all of the deep learning approaches proposed so far, including our graph-based models, fail to produce an overall speedup with their predictions.</p>\n", "tags": ["representation","compilation","optimization","GNN"] }, |
62 | 63 | {"key": "brauckmann2020compy", "year": "2020", "title":"ComPy-Learn: A toolbox for exploring machine learning representations for compilers", "abstract": "<p>Deep Learning methods have not only shown to improve software performance in compiler heuristics, but also e.g. to improve security in vulnerability prediction or to boost developer productivity in software engineering tools. A key to the success of such methods across these use cases is the expressiveness of the representation used to abstract from the program code. Recent work has shown that different such representations have unique advantages in terms of performance. However, determining the best-performing one for a given task is often not obvious and requires empirical evaluation.\nTherefore, we present ComPy-Learn, a toolbox for conveniently defining, extracting, and exploring representations of program code. With syntax-level language information from the Clang compiler frontend and low-level information from the LLVM compiler backend, the tool supports the construction of linear and graph representations and enables an efficient search for the best-performing representation and model for tasks on program code.</p>\n", "tags": ["representation","compilation","optimization","GNN"] }, |
63 | 64 | {"key": "briem2020offside", "year": "2020", "title":"OffSide: Learning to Identify Mistakes in Boundary Conditions", "abstract": "<p>Mistakes in boundary conditions are the cause of many bugs in software.\nThese mistakes happen when, e.g., developers make use of <code class=\"language-plaintext highlighter-rouge\"><</code> or <code class=\"language-plaintext highlighter-rouge\">></code> in cases\nwhere they should have used <code class=\"language-plaintext highlighter-rouge\"><=</code> or <code class=\"language-plaintext highlighter-rouge\">>=</code>. Mistakes in boundary conditions\nare often hard to find and manually detecting them might be very time-consuming\nfor developers. While researchers have been proposing techniques to cope with\nmistakes in the boundaries for a long time, the automated detection of such bugs still\nremains a challenge. We conjecture that, for a tool to be able to precisely identify mistakes\nin boundary conditions, it should be able to capture the overall context of the source code\nunder analysis. In this work, we propose a deep learning model that learn mistakes in boundary\nconditions and, later, is able to identifythem in unseen code snippets. We train and test a\nmodel on over 1.5 million code snippets, with and without mistakes in different boundary conditions.\nOur model shows an accuracy from 55% up to 87%. The model is also able to detect 24 out of 41\nreal-world bugs;however, with a high false positive rate. The existing state-of-the-practice linter\ntools are not able to detect any of the bugs. We hope this paper can pave the road towards deep\nlearning models that will be able to support developers in detecting mistakes in boundary conditions.</p>\n", "tags": ["defect"] }, |
|