|
304 | 304 | {"key": "parvez2018building", "year": "2018", "title":"Building Language Models for Text with Named Entities", "abstract": "<p>Text in many domains involves a significant amount of named entities. Predicting the entity names is often challenging\nfor a language model as they appear less\nfrequent on the training corpus. In this\npaper, we propose a novel and effective\napproach to building a discriminative language model which can learn the entity\nnames by leveraging their entity type information. We also introduce two benchmark datasets based on recipes and Java\nprogramming codes, on which we evaluate the proposed model. Experimental results show that our model achieves 52.2%\nbetter perplexity in recipe generation and\n22.06% on code generation than the state-of-the-art language models.</p>\n", "tags": ["language model"] }, |
305 | 305 | {"key": "parvez2021retrieval", "year": "2021", "title":"Retrieval Augmented Code Generation and Summarization", "abstract": "<p>Software developers write a lot of source code and documentation during software development. Intrinsically, developers often recall parts of source code or code summaries that they had written in the past while implementing software or documenting them. To mimic developers’ code or summary generation behavior, we propose a retrieval augmented framework, REDCODER, that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models. REDCODER has a couple of uniqueness. First, it extends the state-of-the-art dense retrieval technique to search for relevant code or summaries. Second, it can work with retrieval databases that include unimodal (only code or natural language description) or bimodal instances (code-description pairs). We conduct experiments and extensive analysis on two benchmark datasets of code generation and summarization in Java and Python, and the promising results endorse the effectiveness of our proposed retrieval augmented framework.</p>\n", "tags": ["Transformer","summarization","code generation"] }, |
306 | 306 | {"key": "pashakhanloo2022codetrek", "year": "2022", "title":"CodeTrek: Flexible Modeling of Code using an Extensible Relational Representation", "abstract": "<p>Designing a suitable representation for code-reasoning tasks is challenging in aspects such as the kinds of program information to model, how to combine them, and how much context to consider. We propose CodeTrek, a deep learning approach that addresses these challenges by representing codebases as databases that conform to rich relational schemas. The relational representation not only allows CodeTrek to uniformly represent diverse kinds of program information, but also to leverage program-analysis queries to derive new semantic relations, which can be readily incorporated without further architectural engineering. CodeTrek embeds this relational representation using a set of walks that can traverse different relations in an unconstrained fashion, and incorporates all relevant attributes along the way. We evaluate CodeTrek on four diverse and challenging Python tasks: variable misuse, exception prediction, unused definition, and variable shadowing.\nCodeTrek achieves an accuracy of 91%, 63%, 98%, and 94% on these tasks respectively, and outperforms state-of-the-art neural models by 2-19% points.</p>\n", "tags": ["representation","variable misuse"] }, |
| 307 | +{"key": "patil2022exploring", "year": "2022", "title":"Exploring Dimensions of Generalizability and Few-shot Transfer for Text-to-SQL Semantic Parsing", "abstract": "<p>Existing work on generalization in Text-to-SQL semantic parsing has been restricted to a zero-shot cross-domain setting. In this paper, we introduce Spider-Gen: a Text-to-SQL benchmark to develop a paradigm of transfer learning across distinct dimensions of generalization in Text-to-SQL semantic parsing. The Spider-Gen benchmark focuses on few-shot adaption for Cross-domain, Lexical, and Structural generalization of Text-to-SQL models. Through our experiments with the Spider-Gen dataset, we show that Seq2Seq language models struggle to generalize against change in data distribution, lexical changes in database schema, and changes in SQL query complexity. Our experiments also reveal that performing few-shot fine-tuning helps Text-to-SQL models to generalize across these changes. However, such few-shot adaptation comes with a negative effect on the knowledge learnt during training. Hence, we also explore Parameter-efficient Fine-tuning methods to overcome the limitations of Seq2Seq Text-to-SQL models. We release the Spider-Gen dataset publicly to facilitate further research in generalization and transfer learning across various dimensions in Text-to-SQL semantic parsing.</p>\n", "tags": ["dataset","evaluation","Transformer","benchmark","generalizability"] }, |
307 | 308 | {"key": "patra2016learning", "year": "2016", "title":"Learning to Fuzz: Application-Independent Fuzz Testing with Probabilistic, Generative Models of Input Data", "abstract": "<p>Fuzzing is a popular technique to create test inputs for software that processes structured data. It has been successfully\napplied in various domains, ranging from compilers and interpreters over program analyses to rendering engines, image manipulation tools, and word processors. Existing fuzz\ntesting techniques are tailored for a particular purpose and\nrely on a carefully crafted model of the data to be generated.\nThis paper presents TreeFuzz, a generic approach for generating structured data without an a priori known model. The\nkey idea is to exploit a given corpus of example data to au-\ntomatically infer probabilistic, generative models that create\nnew data with properties similar to the corpus. To support a\nwide range of different properties, TreeFuzz is designed as a\nframework with an extensible set of techniques to infer generative models. We apply the idea to JavaScript programs\nand HTML documents and show that the approach generates mostly valid data for both of them: 96.3% of the generated JavaScript programs are syntactically valid and there are\nonly 2.06 validation errors per kilobyte of generated HTML.\nThe performance of both learning and generation scales linearly w.r.t. the size of the corpus. Using TreeFuzz-generated\nJavaScript programs for differential testing of JavaScript engines exposes various inconsistencies among browsers, including browser bugs and unimplemented language features.</p>\n", "tags": ["fuzzing"] }, |
308 | 309 | {"key": "patra2021semantic", "year": "2021", "title":"A Semantic Bug Seeding: A Learning-Based Approach for Creating Realistic Bugs", "abstract": "<p>When working on techniques to address the wide-spread problem\nof software bugs, one often faces the need for a large number of\nrealistic bugs in real-world programs. Such bugs can either help\nevaluate an approach, e.g., in form of a bug benchmark or a suite\nof program mutations, or even help build the technique, e.g., in\nlearning-based bug detection. Because gathering a large number ofreal bugs is difficult,\na common approach is to rely on automatically\nseeded bugs. Prior work seeds bugs based on syntactic transformation patterns,\nwhich often results in unrealistic bugs and typically \ncannot introduce new, application-specific code tokens. This paper\npresents SemSeed, a technique for automatically seeding bugs in\na semantics-aware way. The key idea is to imitate how a given\nreal-world bug would look like in other programs by semantically\nadapting the bug pattern to the local context. To reason about the\nsemantics of pieces of code, our approach builds on learned token embeddings\nthat encode the semantic similarities of identifiers and literals. Our\nevaluation with real-world JavaScript softwares\nhows that the approach effectively reproduces real bugs and clearly\noutperforms a semantics-unaware approach. The seeded bugs are\nuseful as training data for learning-based bug detection, where\nthey significantly improve the bug detection ability. Moreover, we\nshow that SemSeed-created bugs complement existing mutation\ntesting operators, and that our approach is efficient enough to seed\nhundreds of thousands of bugs within an hour.</p>\n", "tags": ["repair","edit"] }, |
309 | 310 | {"key": "pearce2021empirical", "year": "2021", "title":"An Empirical Cybersecurity Evaluation of GitHub Copilot's Code Contributions", "abstract": "<p>There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described `AI pair programmer’, GitHub Copilot, a language model trained over open-source GitHub code. However, code often contains bugs - and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilot’s code contributions. In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk CWEs (e.g. those from MITRE’s “Top 25” list). We explore Copilot’s performance on three distinct code generation axes – examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, producing 1,692 programs. Of these, we found approximately 40% to be vulnerable.</p>\n", "tags": ["Transformer","language model"] }, |
|