feat(extraction): add Common Lisp support via hand-rolled s-expressio…#562
Open
cyclistmass wants to merge 1 commit into
…n parser
Adds Common Lisp / Emacs Lisp (.lisp, .lsp, .cl, .asd, .el) as a custom
extractor: a dedicated tokenizer plus a recursive-descent s-expression parser,
with no tree-sitter grammar and no source preprocessing. The atom reader
consumes characters up to the next whitespace or delimiter, so `^`, `{}[]`,
backslash escapes, package-qualified symbols, reader conditionals,
`#'`/`#\`/`#x` dispatch, nested `#|...|#` comments, format-directive strings,
and mid-symbol `#` all parse correctly with zero special-casing.
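
A minimal sketch of the reader strategy described above, not the code in this PR. The name `read_all` and the nested-list output shape are hypothetical, and `|multi escape|` symbols are not handled. It illustrates why an atom reader that runs to the next whitespace or delimiter needs no per-construct cases for `#\(`, `#'bar`, `pkg::sym`, or `a#b`:

```python
def is_delim(c):
    # Characters that end an atom: whitespace, parens, string quote, comment start.
    return c.isspace() or c in '()";'

def read_all(src):
    """Parse src into a list of top-level forms (nested lists of str atoms)."""
    pos, n = 0, len(src)

    def skip_ws_and_comments():
        nonlocal pos
        while pos < n:
            if src[pos].isspace():
                pos += 1
            elif src[pos] == ';':                      # line comment
                while pos < n and src[pos] != '\n':
                    pos += 1
            elif src.startswith('#|', pos):            # nested block comment
                depth = 0
                while pos < n:
                    if src.startswith('#|', pos):
                        depth, pos = depth + 1, pos + 2
                    elif src.startswith('|#', pos):
                        depth, pos = depth - 1, pos + 2
                        if depth == 0:
                            break
                    else:
                        pos += 1
            else:
                return

    def read_form():
        nonlocal pos
        skip_ws_and_comments()
        if pos >= n:
            raise SyntaxError("unexpected end of input")
        c = src[pos]
        if c == '(':
            pos += 1
            items = []
            while True:
                skip_ws_and_comments()
                if pos >= n:
                    raise SyntaxError("unclosed list")
                if src[pos] == ')':
                    pos += 1
                    return items
                items.append(read_form())
        if c == ')':
            raise SyntaxError("unexpected ')'")
        start = pos
        if c == '"':                                   # string, kept verbatim
            pos += 1
            while pos < n and src[pos] != '"':
                pos += 2 if src[pos] == '\\' else 1
            if pos >= n:
                raise SyntaxError("unterminated string")
            pos += 1
            return src[start:pos]
        # Atom: consume to the next delimiter; a backslash escapes one character.
        while pos < n and not is_delim(src[pos]):
            pos += 2 if src[pos] == '\\' else 1
        pos = min(pos, n)
        return src[start:pos]

    forms = []
    while True:
        skip_ws_and_comments()
        if pos >= n:
            return forms
        forms.append(read_form())
```

In this sketch a leading quote is simply part of the atom (`'foo`) or a one-character atom in front of a list; a real extractor may prefer to normalize those.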
Extracts: defun/defmacro/defgeneric (functions); defmethod (method with a CLOS
receiver-typed qualified name, :before/:after/:around qualifier
disambiguation, and a contains-edge from its class); defclass/define-condition
(class, slot fields, :accessor/:reader/:writer functions, and extends edges);
defstruct; deftype; defvar/defparameter/defconstant; defpackage (namespace,
:use/:import-from imports, and :export exports); (require ...) imports; and
context-aware call edges. Call-edge extraction suppresses binding-form names
(let/do/dolist/multiple-value-bind/...), declaration specifiers, and cond/case
literal keys, and resolves funcall/apply and CCL's (! vinsn) indirection to
the real target. Top-level def* DSL macros (def-x86-opcode, define-arm-vinsn,
defcommand, deftest, ...) surface their defined symbol.
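
To make the mapping above concrete, here is a hedged sketch of how a parsed top-level form could be classified. It is an illustration only: `classify`, the kind labels, and the `class.name:qualifier` naming format are assumptions, not the PR's actual node schema. Forms are nested Python lists of string atoms.

```python
# Hypothetical kind labels; the PR's real node kinds may differ.
KINDS = {
    "defun": "function", "defmacro": "function", "defgeneric": "function",
    "defclass": "class", "define-condition": "class",
    "defstruct": "struct", "deftype": "type",
    "defvar": "variable", "defparameter": "variable", "defconstant": "variable",
    "defpackage": "namespace",
}

def definition_name(spec):
    """Return the defined symbol from the name position, or None."""
    if isinstance(spec, str):
        return spec
    # defstruct allows (name option...); take the leading symbol.
    # A (setf foo) function name would need its own case.
    if isinstance(spec, list) and spec and isinstance(spec[0], str):
        return spec[0]
    return None

def classify(form):
    """Map a top-level form to (kind, name), or None if it defines nothing."""
    if not (isinstance(form, list) and len(form) >= 2 and isinstance(form[0], str)):
        return None
    head = form[0].lower()
    name = definition_name(form[1])
    if name is None:
        return None
    if head == "defmethod":
        rest = form[2:]
        qualifiers = []
        # Atoms before the lambda list are method qualifiers (:before, ...).
        while rest and isinstance(rest[0], str):
            qualifiers.append(rest[0])
            rest = rest[1:]
        receiver = None
        if rest and rest[0]:
            first = rest[0][0]
            # A specialized first parameter looks like (var class-name).
            if isinstance(first, list) and len(first) == 2 and isinstance(first[1], str):
                receiver = first[1]
        qualified = f"{receiver}.{name}" if receiver else name
        return ("method", qualified + "".join(qualifiers))
    if head in KINDS:
        return (KINDS[head], name)
    if head.startswith("def"):
        # Crude stand-in for the def* DSL rule (deftest, defcommand, ...).
        return ("definition", name)
    return None
```

For example, `(defmethod draw :before ((s shape) ctx) ...)` would come out as `("method", "shape.draw:before")` under this assumed naming scheme.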
Validated on the Clozure ANSI regression suite (ccl-tests, 867 files / 170k
lines): zero parse errors, 24,751 nodes, 99.97% of top-level deftest forms
extracted. Cross-checked against CCL's own tools/vinsn-xref.py on the ARM64
port: identical caller results (e.g. save-values -> its 3 call sites).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>