
feat(extraction): add Common Lisp support via hand-rolled s-expressio… #562

Open
cyclistmass wants to merge 1 commit into colbymchenry:main from cyclistmass:feat/lisp-sexp-extractor

@cyclistmass

…n parser

Adds Common Lisp / Emacs Lisp (`.lisp`, `.lsp`, `.cl`, `.asd`, `.el`) support as a custom extractor: a dedicated tokenizer plus a recursive-descent s-expression parser, with no tree-sitter grammar and no source preprocessing. The atom reader consumes characters up to the next whitespace or delimiter, so `^`, `{}[]`, backslash escapes, package-qualified symbols, reader conditionals, `#'`/`#\`/`#x` dispatch, nested `#|...|#` comments, format-directive strings, and mid-symbol `#` all parse correctly with zero special-casing.
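
To illustrate the atom-reader idea above, here is a minimal TypeScript sketch (TypeScript is an assumption about the host codebase). It is not the PR's implementation: `readAll` and `Sexp` are hypothetical names, strings are kept verbatim, and only `'` is expanded into a `(quote ...)` form. What it shows is that treating only parentheses, `"`, `;` and whitespace as atom terminators, and letting a backslash escape the next character, makes inputs like `#'h`, `#\(`, `pkg::sym`, and `foo#bar` come out as single atoms without dedicated cases.

```typescript
// Hypothetical sketch of an atom-first s-expression reader (not the PR's code).
type Sexp = string | Sexp[];

// Characters that terminate an atom. ', #, ^, {, }, [, ] are deliberately absent,
// so #'foo, #x1F, pkg::sym, foo#bar and [x] each read as one atom.
const DELIMS = new Set(["(", ")", '"', ";"]);
const isSpace = (c: string): boolean =>
  c === " " || c === "\t" || c === "\n" || c === "\r" || c === "\f";

function readAll(src: string): Sexp[] {
  let i = 0;
  const at = (k: number): string => src.charAt(k); // "" past the end

  // Skip whitespace, ; line comments and nested #| ... |# block comments.
  function skip(): void {
    while (i < src.length) {
      const c = at(i);
      if (isSpace(c)) {
        i++;
      } else if (c === ";") {
        while (i < src.length && at(i) !== "\n") i++;
      } else if (c === "#" && at(i + 1) === "|") {
        let depth = 1;
        i += 2;
        while (i < src.length && depth > 0) {
          if (at(i) === "#" && at(i + 1) === "|") { depth++; i += 2; }
          else if (at(i) === "|" && at(i + 1) === "#") { depth--; i += 2; }
          else i++;
        }
      } else {
        return;
      }
    }
  }

  // String literal kept verbatim (quotes included); a backslash escapes the
  // next character, so \" and format directives never end the string early.
  function readString(): string {
    const start = i++;
    while (i < src.length && at(i) !== '"') i += at(i) === "\\" ? 2 : 1;
    i++; // closing quote (or past the end if unterminated)
    return src.slice(start, i);
  }

  // Atom: everything up to whitespace or a delimiter. The backslash rule is
  // what makes #\( and #\; single atoms with no special case.
  function readAtom(): string {
    const start = i;
    while (i < src.length && !isSpace(at(i)) && !DELIMS.has(at(i))) {
      i += at(i) === "\\" ? 2 : 1;
    }
    return src.slice(start, i);
  }

  function readForm(): Sexp {
    skip();
    const c = at(i);
    if (c === "(") {
      i++;
      const items: Sexp[] = [];
      for (;;) {
        skip();
        if (i >= src.length) throw new Error("unbalanced '(': reached end of input");
        if (at(i) === ")") { i++; return items; }
        items.push(readForm());
      }
    }
    if (c === ")") throw new Error(`unexpected ')' at offset ${i}`);
    if (c === '"') return readString();
    if (c === "'") { i++; return ["quote", readForm()]; } // 'x => (quote x)
    return readAtom(); // other reader macros (` , #') are read as (part of) an atom
  }

  const forms: Sexp[] = [];
  for (skip(); i < src.length; skip()) forms.push(readForm());
  return forms;
}
```

For example, `readAll("(defun f (x) (g #'h))")` yields `[["defun", "f", ["x"], ["g", "#'h"]]]`. Keeping `'` out of the delimiter set is what lets `#'h` stay whole; the trade-off is that a backquote or comma in front of `(` becomes a standalone atom, a simplification a real extractor may need to refine.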

Extracts:

- `defun`/`defmacro`/`defgeneric` as functions
- `defmethod` as a method, with a CLOS receiver-typed qualified name, `:before`/`:after`/`:around` qualifier disambiguation, and a contains edge from its class
- `defclass`/`define-condition` as a class, plus slot fields, `:accessor`/`:reader`/`:writer` functions, and extends edges
- `defstruct`, `deftype`, and `defvar`/`defparameter`/`defconstant`
- `defpackage` as a namespace, plus `:use`/`:import-from` imports and `:export` exports
- `(require ...)` as imports
- context-aware call edges that suppress binding-form names (`let`/`do`/`dolist`/`multiple-value-bind`/...), declaration specifiers, and `cond`/`case` literal keys, and that resolve `funcall`/`apply`/CCL's `(! vinsn)` indirection to the real target

Top-level `def*` DSL macros (`def-x86-opcode`, `define-arm-vinsn`, `defcommand`, `deftest`, ...) also surface the symbol they define.
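
The context-aware call edges listed above can be sketched roughly as follows, again in TypeScript under assumed names (`collectCalls`, `indirectTarget`, and the form tables are hypothetical). The walk operates on reader-style output (atoms as strings, lists as arrays) and is meant to be applied to definition bodies. The tables are an illustrative subset rather than the PR's actual lists, and reading `(! name ...)` as "emit the vinsn named by the first argument" is an assumption about CCL's backend convention.

```typescript
// Hypothetical sketch of context-aware call collection (not the PR's code).
type Sexp = string | Sexp[];

// Illustrative subset of special forms (assumed, not the PR's actual tables).
const LET_LIKE = new Set(["let", "let*", "do", "do*"]); // ((var init [step]) ...)
const LOOP_VAR = new Set(["dolist", "dotimes"]); // (var form [result])
const SKIP_FIRST = new Set(["multiple-value-bind", "destructuring-bind", "lambda"]); // first arg only binds names
const CASE_LIKE = new Set(["case", "ecase", "ccase", "typecase", "etypecase"]);
const OPAQUE = new Set(["quote", "declare"]); // quoted data, declaration specifiers

// #'foo, 'foo, (quote foo) or (function foo) -> "foo"; anything else -> null.
function indirectTarget(x: Sexp | undefined): string | null {
  if (typeof x === "string") {
    if (x.startsWith("#'")) return x.slice(2).toLowerCase();
    if (x.startsWith("'")) return x.slice(1).toLowerCase();
    return null;
  }
  if (!Array.isArray(x) || x.length !== 2) return null;
  const [h, target] = x;
  const isQuote = h === "quote" || h === "function";
  return isQuote && typeof target === "string" ? target.toLowerCase() : null;
}

// Appends the name of every call site found in `form` to `calls`, in order.
function collectCalls(form: Sexp, calls: string[] = []): string[] {
  if (!Array.isArray(form) || form.length === 0) return calls; // atoms are never calls
  const walk = (xs: Sexp[]): void => {
    for (const x of xs) collectCalls(x, calls);
  };
  const [head, ...args] = form;
  if (typeof head !== "string") {
    walk(form); // e.g. ((lambda (x) ...) 1) or do's ((end-test) result) clause
    return calls;
  }
  const op = head.toLowerCase();
  const first: Sexp | undefined = args[0];

  if (OPAQUE.has(op)) return calls; // nothing inside (quote ...) / (declare ...) is a call
  if (LET_LIKE.has(op)) {
    if (Array.isArray(first)) {
      for (const b of first) if (Array.isArray(b)) walk(b.slice(1)); // init/step forms only
    }
    walk(args.slice(1));
  } else if (LOOP_VAR.has(op)) {
    if (Array.isArray(first)) walk(first.slice(1)); // skip the loop variable
    walk(args.slice(1));
  } else if (SKIP_FIRST.has(op)) {
    walk(args.slice(1)); // bound names are never calls
  } else if (op === "cond") {
    for (const clause of args) if (Array.isArray(clause)) walk(clause); // a bare T test is not a call
  } else if (CASE_LIKE.has(op)) {
    walk(args.slice(0, 1)); // the key form
    for (const clause of args.slice(1)) if (Array.isArray(clause)) walk(clause.slice(1)); // skip literal keys
  } else if (op === "funcall" || op === "apply") {
    const target = indirectTarget(first);
    if (target !== null) calls.push(target); // resolve to the real callee
    else walk(args.slice(0, 1));
    walk(args.slice(1));
  } else if (op === "!") {
    // Assumed CCL convention: (! vinsn-name args...) emits the named vinsn.
    if (typeof first === "string") calls.push(first.toLowerCase());
    walk(args.slice(1));
  } else {
    calls.push(op);
    walk(args);
  }
  return calls;
}
```

In this sketch every other list head is recorded as a call, including special operators such as `if`; a fuller version would filter those as well.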

Validated on the Clozure ANSI regression suite (`ccl-tests`, 867 files / 170k lines): zero parse errors, 24,751 nodes, and 99.97% of top-level `deftest` forms extracted. Cross-checked against CCL's own `tools/vinsn-xref.py` on the ARM64 port, with identical caller results (e.g. `save-values` -> its 3 call sites).
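
Both the node counts and the `deftest` coverage depend on how top-level forms are dispatched on their head symbol. A hypothetical TypeScript sketch of that dispatch follows; the kind names, the `receiver.name:qualifier` naming scheme for methods, and the rule that any unrecognized `def*`/`define-*` head is a DSL definer are assumptions for illustration, not the PR's actual conventions.

```typescript
// Hypothetical sketch of top-level definition classification (not the PR's code).
type Sexp = string | Sexp[];

// Assumed node kinds; the PR's real kind names may differ.
type Kind = "function" | "method" | "class" | "struct" | "type" | "variable" | "namespace" | "definition";

const KIND_BY_HEAD = new Map<string, Kind>([
  ["defun", "function"], ["defmacro", "function"], ["defgeneric", "function"],
  ["defmethod", "method"],
  ["defclass", "class"], ["define-condition", "class"],
  ["defstruct", "struct"], ["deftype", "type"],
  ["defvar", "variable"], ["defparameter", "variable"], ["defconstant", "variable"],
  ["defpackage", "namespace"],
]);

interface Definition { kind: Kind; name: string }

// A name is a symbol or, as in (defstruct (point ...) ...), a list headed by one.
function firstAtom(x: Sexp | undefined): string | null {
  if (typeof x === "string") return x.toLowerCase();
  if (Array.isArray(x)) {
    const h = x[0];
    return typeof h === "string" ? h.toLowerCase() : null;
  }
  return null;
}

function classify(form: Sexp): Definition | null {
  if (!Array.isArray(form)) return null;
  const [rawHead, rawName, ...rest] = form;
  if (typeof rawHead !== "string") return null;
  const head = rawHead.toLowerCase();
  // Known def forms first; any other def*/define-* head is treated as a DSL definer (assumption).
  const kind: Kind | undefined = KIND_BY_HEAD.get(head) ?? (head.startsWith("def") ? "definition" : undefined);
  const name = firstAtom(rawName);
  if (kind === undefined || name === null) return null;
  if (kind !== "method") return { kind, name };

  // defmethod: keyword atoms between the name and the lambda list are qualifiers.
  const qualifiers: string[] = [];
  let lambdaList: Sexp[] = [];
  for (const x of rest) {
    if (Array.isArray(x)) { lambdaList = x; break; }
    if (x.startsWith(":")) qualifiers.push(x.toLowerCase());
  }
  // The first required parameter, when written (var class), names the receiver class.
  const firstParam = lambdaList[0];
  const receiver = Array.isArray(firstParam) ? firstAtom(firstParam[1]) : null;
  const qualified = receiver !== null ? `${receiver}.${name}` : name;
  return { kind, name: qualified + qualifiers.join("") };
}
```

Under this scheme, `(defmethod area :around ((s circle)) ...)` becomes a method named `circle.area:around`, so an `:around` method and the primary method on the same class get distinct names.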

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>