{"key": "saini2018oreo", "year": "2018", "title":"Oreo: detection of clones in the twilight zone", "abstract": "<p>Source code clones are categorized into four types of increasing difficulty of detection, ranging from purely textual (Type-1) to purely semantic (Type-4). Most clone detectors reported in the literature work well up to Type-3, which accounts for syntactic differences. In between Type-3 and Type-4, however, there lies a spectrum of clones that, although still exhibiting some syntactic similarities, are extremely hard to detect – the Twilight Zone. Most clone detectors reported in the literature fail to operate in this zone. We present Oreo, a novel approach to source code clone detection that not only detects Type-1 to Type-3 clones accurately, but is also capable of detecting harder-to-detect clones in the Twilight Zone. Oreo is built using a combination of machine learning, information retrieval, and software metrics. We evaluate the recall of Oreo on BigCloneBench, and perform manual evaluation for precision. Oreo has both high recall and precision. More importantly, it pushes the boundary in detection of clones with moderate to weak syntactic similarity in a scalable manner.</p>\n", "tags": ["clone"] },
{"key": "santos2018syntax", "year": "2018", "title":"Syntax and Sensibility: Using language models to detect and correct syntax errors", "abstract": "<p>Syntax errors are made by novice and experienced programmers alike; however, novice programmers lack the years of experience that help them quickly resolve these frustrating errors. Standard LR parsers are of little help, typically resolving syntax errors and their precise location poorly. We propose a methodology that locates where syntax errors occur, and suggests possible changes to the token stream that can fix the error identified. This methodology finds syntax errors by using language models trained on correct source code to find tokens that seem out of place. Fixes are synthesized by consulting the language models to determine what tokens are more likely at the estimated error location. We compare <em>n</em>-gram and LSTM (long short-term memory) language models for this task, each trained on a large corpus of Java code collected from GitHub. Unlike prior work, our methodology does not assume that the problem source code comes from the same domain as the training data. We evaluated against a repository of real student mistakes. Our tools are able to find a syntactically-valid fix within their top-2 suggestions, often producing the exact fix that the student used to resolve the error. The results show that this tool and methodology can locate and suggest corrections for syntax errors. Our methodology is of practical use to all programmers, but will be especially useful to novices frustrated with incomprehensible syntax errors.</p>\n", "tags": ["repair","language model"] },
{"key": "saraiva2015products", "year": "2015", "title":"Products, Developers, and Milestones: How Should I Build My N-Gram Language Model", "abstract": "<p>Recent work has shown that although programming languages enable source code to be rich and complex, most code tends to be repetitive and predictable. The use of natural language processing (NLP) techniques applied to source code such as n-gram language models show great promise in areas such as code completion, aiding impaired developers, and code search. In this paper, we address three questions related to different methods of constructing language models in an industrial context. Specifically, we ask: (1) Do application specific, but smaller language models perform better than language models across applications? (2) Are developer specific language models effective and do they differ depending on what parts of the codebase a developer is working in? (3) Finally, do language models change over time, i.e., does a language model from early development change later on in development? The answers to these questions enable techniques that make use of programming language models in development to choose the model training corpus more effectively.</p>\n\n<p>We evaluate these questions by building 28 language models across developers, time periods, and applications within Microsoft Office and present the results in this paper. We find that developer and application specific language models perform better than models from the entire codebase, but that temporality has little to no effect on language model performance.</p>\n", "tags": ["language model"] },
{"key": "sarkar2022what", "year": "2022", "title":"What is it like to program with artificial intelligence?", "abstract": "<p>Large language models, such as OpenAI’s Codex and DeepMind’s AlphaCode, can generate code to solve a variety of problems expressed in natural language. This technology has already been commercialised in at least one widely-used programming editor extension: GitHub Copilot.</p>\n\n<p>In this paper, we explore how programming with large language models (LLM-assisted programming) is similar to, and differs from, prior conceptualisations of programmer assistance. We draw upon publicly available experience reports of LLM-assisted programming, as well as prior usability and design studies. We find that while LLM-assisted programming shares some properties of compilation, pair programming, and programming via search and reuse, there are fundamental differences both in the technical possibilities as well as the practical experience. Thus, LLM-assisted programming ought to be viewed as a new way of programming with its own distinct properties and challenges.</p>\n\n<p>Finally, we draw upon observations from a user study in which non-expert end user programmers use LLM-assisted tools for solving data tasks in spreadsheets. We discuss the issues that might arise, and open research challenges, in applying large language models to end-user programming, particularly with users who have little or no programming expertise.</p>\n", "tags": ["human evaluation","review"] },
{"key": "schrouff2019inferring", "year": "2019", "title":"Inferring Javascript types using Graph Neural Networks", "abstract": "<p>The recent use of ‘Big Code’ with state-of-the-art deep learning methods offers promising avenues to ease program source code writing and correction. As a first step towards automatic code repair, we implemented a graph neural network model that predicts token types for Javascript programs. The predictions achieve an accuracy above 90%, which improves on previous similar work.</p>\n", "tags": ["GNN","types","program analysis"] },
{"key": "schuster2021you", "year": "2021", "title":"You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion", "abstract": "<p>Code autocompletion is an integral feature of modern code editors and IDEs. The latest generation of autocompleters uses neural language models, trained on public open-source code repositories, to suggest likely (not just statically feasible) completions given the current context.</p>\n\n<p>We demonstrate that neural code autocompleters are vulnerable to poisoning attacks. By adding a few specially-crafted files to the autocompleter’s training corpus (data poisoning), or else by directly fine-tuning the autocompleter on these files (model poisoning), the attacker can influence its suggestions for attacker-chosen contexts. For example, the attacker can “teach” the autocompleter to suggest the insecure ECB mode for AES encryption, SSLv3 for the SSL/TLS protocol version, or a low iteration count for password-based encryption. Moreover, we show that these attacks can be targeted: an autocompleter poisoned by a targeted attack is much more likely to suggest the insecure completion for files from a specific repo or specific developer.</p>\n\n<p>We quantify the efficacy of targeted and untargeted data- and model-poisoning attacks against state-of-the-art autocompleters based on Pythia and GPT-2. We then evaluate existing defenses against poisoning attacks and show that they are largely ineffective.</p>\n", "tags": ["autocomplete","adversarial"] },
{"key": "sharma2015nirmal", "year": "2015", "title":"NIRMAL: Automatic Identification of Software Relevant Tweets Leveraging Language Model", "abstract": "<p>Twitter is one of the most widely used social media platforms today. It enables users to share and view short 140-character messages called “tweets”. About 284 million active users generate close to 500 million tweets per day. Such rapid generation of user generated content in large magnitudes results in the problem of information overload. Users who are interested in information related to a particular domain have limited means to filter out irrelevant tweets and tend to get lost in the huge amount of data they encounter. A recent study by Singer et al. found that software developers use Twitter to stay aware of industry trends, to learn from others, and to network with other developers. However, Singer et al. also reported that developers often find Twitter streams to contain too much noise, which is a barrier to the adoption of Twitter. In this paper, to help developers cope with noise, we propose a novel approach named NIRMAL, which automatically identifies software relevant tweets from a collection or stream of tweets. Our approach is based on language modeling, which learns a statistical model based on a training corpus (i.e., set of documents). We make use of a subset of posts from StackOverflow, a programming question and answer site, as a training corpus to learn a language model. A corpus of tweets was then used to test the effectiveness of the trained language model. The tweets were sorted based on the rank the model assigned to each of the individual tweets. The top 200 tweets were then manually analyzed to verify whether they are software related or not, and then an accuracy score was calculated. The results show that decent accuracy scores can be achieved by various variants of NIRMAL, which indicates that NIRMAL can effectively identify software related tweets from a huge corpus of tweets.</p>\n", "tags": ["information extraction"] },