|
476 | 476 | {"key": "xu2022systematic", "year": "2022", "title":"A Systematic Evaluation of Large Language Models of Code", "abstract": "<p>Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not publicly available, leaving many questions about their model and data design decisions. We aim to fill in some of these blanks through a systematic evaluation of the largest existing models: Codex, GPT-J, GPT-Neo, GPT-NeoX-20B, and CodeParrot, across various programming languages. Although Codex itself is not open-source, we find that existing open-source models do achieve close results in some programming languages, although targeted mainly for natural language modeling. We further identify an important missing piece in the form of a large open-source model trained exclusively on a multi-lingual corpus of code. We release a new model, PolyCoder, with 2.7B parameters based on the GPT-2 architecture, which was trained on 249GB of code across 12 programming languages on a single machine. In the C programming language, PolyCoder outperforms all models including Codex. Our trained models are open-source and publicly available at this https URL, which enables future research and application in this area.</p>\n", "tags": ["Transformer","language model"] }, |
477 | 477 | {"key": "yadavally2023partial", "year": "2023", "title":"(Partial) Program Dependence Learning", "abstract": "<p>Code fragments from developer forums often migrate to applications due to the code reuse practice. Owing to the incomplete nature of such programs, analyzing them to early determine the presence of potential vulnerabilities is challenging. In this work, we introduce NeuralPDA, a neural network-based program dependence analysis tool for both complete and partial programs. Our tool efficiently incorporates intra-statement and inter-statement contextual features into statement representations, thereby modeling program dependence analysis as a statement-pair dependence decoding task. In the empirical evaluation, we report that NeuralPDA predicts the CFG and PDG edges in complete Java and C/C++ code with combined F-scores of 94.29% and 92.46%, respectively. The F-score values for partial Java and C/C++ code range from 94.29%–97.17% and 92.46%–96.01%, respectively. We also test the usefulness of the PDGs predicted by NEURALPDA (i.e., PDG<em>) on the downstream task of method-level vulnerability detection. We discover that the performance of the vulnerability detection tool utilizing PDG</em> is only 1.1% less than that utilizing the PDGs generated by a program analysis tool. We also report the detection of 14 real-world vulnerable code snippets from StackOverflow by a machine learning-based vulnerability detection tool that employs the PDGs predicted by NeuralPDA for these code snippets.</p>\n", "tags": ["large language models","program analysis","static analysis","tool"] }, |
478 | 478 | {"key": "yadavally2024learning", "year": "2024", "title":"A Learning-Based Approach to Static Program Slicing", "abstract": "<p>Traditional program slicing techniques are crucial for early bug detection and manual/automated debugging of online code snippets. Nevertheless, their inability to handle incomplete code hinders their real-world applicability in such scenarios. To overcome these challenges, we present NS-Slicer, a novel learning-based approach that predicts static program slices for both complete and partial code. Our tool leverages a pre-trained language model to exploit its understanding of fine-grained variable-statement dependencies within source code. With this knowledge, given a variable at a specific location and a statement in a code snippet, NS-Slicer determines whether the statement belongs to the backward slice or forward slice, respectively. We conducted a series of experiments to evaluate NS-Slicer’s performance. On complete code, it predicts the backward and forward slices with an F1-score of 97.41% and 95.82%, respectively, while achieving an overall F1-score of 96.77%. Notably, in 85.20% of the cases, the static program slices predicted by NS-Slicer exactly match entire slices from the oracle. For partial programs, it achieved an F1-score of 96.77%–97.49% for backward slicing, 92.14%–95.40% for forward slicing, and an overall F1-score of 94.66%–96.62%. Furthermore, we demonstrate NS-Slicer’s utility in vulnerability detection (VD), integrating its predicted slices into an automated VD tool. In this setup, the tool detected vulnerabilities in Java code with a high F1-score of 73.38%. We also include the analyses studying NS-Slicer’s promising performance and limitations, providing insights into its understanding of intrinsic code properties such as variable aliasing, leading to better slicing.</p>\n", "tags": ["large language models","program analysis","static","tool"] }, |
| 479 | +{"key": "yadavally2024predictive", "year": "2024", "title":"Predictive Program Slicing via Execution Knowledge-Guided Dynamic Dependence Learning", "abstract": "<p>Program slicing, the process of extracting program statements that influence values at a designated location (known as the slicing criterion), is helpful in both manual and automated debugging. However, such slicing techniques prove ineffective in scenarios where executing specific inputs is prohibitively expensive, or even impossible, as with partial code. In this paper, we introduce ND-Slicer, a predictive slicing methodology that caters to specific executions based on a particular input, overcoming the need for actual execution. We enable such a process by leveraging execution-aware pre-training to learn the dynamic program dependencies, including both dynamic data and control dependencies between variables in the slicing criterion and the remaining program statements. Such knowledge forms the cornerstone for constructing a predictive backward slice. Our empirical evaluation revealed a high accuracy in predicting program slices, achieving an exact-match accuracy of 81.3% and a ROUGE-LCS F1-score of 95.4% on Python programs. As an extrinsic evaluation, we illustrate ND-Slicer’s usefulness in crash detection, with it locating faults with an accuracy of 63.9%. Furthermore, we include an in-depth qualitative evaluation, assessing ND-Slicer’s understanding of branched structures such as if-else blocks and loops, as well as the control flow in inter-procedural calls.</p>\n", "tags": ["large language models","program analysis","dynamic","tool"] }, |
479 | 480 | {"key": "yadid2016extracting", "year": "2016", "title":"Extracting Code from Programming Tutorial Videos", "abstract": "<p>The number of programming tutorial videos on the web\nincreases daily. Video hosting sites such as YouTube host\nmillions of video lectures, with many programming tutorials for various languages and platforms. These videos contain a wealth of valuable information, including code that\nmay be of interest. However, two main challenges have so\nfar prevented the effective indexing of programming tutorial\nvideos: (i) code in tutorials is typically written on-the-fly,\nwith only parts of the code visible in each frame, and (ii) optical character recognition (OCR) is not precise enough to\nproduce quality results from videos.</p>\n\n<p>We present a novel approach for extracting code from\nvideos that is based on: (i) consolidating code across frames,\nand (ii) statistical language models for applying corrections\nat different levels, allowing us to make corrections by choosing the most likely token, combination of tokens that form a\nlikely line structure, and combination of lines that lead to\na likely code fragment in a particular language. We implemented our approach in a tool called ACE , and used it to extract code from 40 Android video tutorials on YouTube . Our\nevaluation shows that ACE extracts code with high accuracy,\nenabling deep indexing of video tutorials.</p>\n", "tags": ["information extraction"] }, |
480 | 481 | {"key": "yan2020are", "year": "2020", "title":"Are the Code Snippets What We Are Searching for? A Benchmark and an Empirical Study on Code Search with Natural-Language Queries", "abstract": "<p>Code search methods, especially those that allow programmers to raise queries in a natural language, plays an important role in software development. It helps to improve programmers’ productivity by returning sample code snippets from the Internet and/or source-code repositories for their natural-language queries. Meanwhile, there are many code search methods in the literature that support natural-language queries. Difficulties exist in recognizing the strengths and weaknesses of each method and choosing the right one for different usage scenarios, because (1) the implementations of those methods and the datasets for evaluating them are usually not publicly available, and (2) some methods leverage different training datasets or auxiliary data sources and thus their effectiveness cannot be fairly measured and may be negatively affected in practical uses. To build a common ground for measuring code search methods, this paper builds CosBench, a dataset that consists of 1000 projects, 52 code-independent natural-language queries with ground truths, and a set of scripts for calculating four metrics on code research results. We have evaluated four IR (Information Retrieval)-based and two DL (Deep Learning)-based code search methods on CosBench. The empirical evaluation results clearly show the usefulness of the CosBench dataset and various strengths of each code search method. We found that DL-based methods are more suitable for queries on reusing code, and IR-based ones for queries on resolving bugs and learning API uses.</p>\n", "tags": ["search"] }, |
481 | 482 | {"key": "yang2017language", "year": "2017", "title":"A Language Model for Statements of Software Code", "abstract": "<p>Building language models for source code enables a large set of improvements on traditional software engineering tasks. One promising application is automatic code completion. State-of-the-art techniques capture code regularities at token level with lexical information. Such language models are more suitable for predicting short token sequences, but become less effective with respect to long statement level predictions. In this paper, we have proposed PCC to optimize the token level based language modeling. Specifically, PCC introduced an intermediate representation (IR) for source code, which puts tokens into groups using lexeme and variable relative order. In this way, PCC is able to handle long token sequences, i.e., group sequences, to suggest a complete statement with the precise synthesizer. Further more, PCC employed a fuzzy matching technique which combined genetic and longest common sub-sequence algorithms to make the prediction more accurate. We have implemented a code completion plugin for Eclipse and evaluated it on open-source Java projects. The results have demonstrated the potential of PCC in generating precise long statement level predictions. In 30%-60% of the cases, it can correctly suggest the complete statement with only six candidates, and 40%-90% of the cases with ten candidates.</p>\n", "tags": ["language model"] }, |
|
0 commit comments