Add Ripper :on_sp events for Prism.lex_compat and Prism::Translation::Ripper by eregon · Pull Request #3859 · ruby/prism

eregon · 2026-01-18T16:46:29Z

Ref: #3838
On top of #3858 should be merged first.

eregon · 2026-01-18T16:47:09Z

Failures that were added to excludes:

==============================================================================================================================
Failure: test_dos_endings.txt_lex(Prism::RipperTest)
test/prism/ruby/ripper_test.rb:148:in 'block in Prism::RipperTest#assert_ripper_lex'
<internal:numeric>:257:in 'Integer#times'
test/prism/ruby/ripper_test.rb:127:in 'Prism::RipperTest#assert_ripper_lex'
test/prism/ruby/ripper_test.rb:84:in 'block (2 levels) in <class:RipperTest>'
<[[1, 9], :on_sp, "\\\r\n", nil]> expected but was
<[[1, 9], :on_sp, "\\\r\n" + "     ", nil]>

diff:
? [[1, 9], :on_sp, "\\\r\n" + "     ", nil]
==============================================================================================================================
F
==============================================================================================================================
Failure: test_seattlerb/str_lit_concat_bad_encodings.txt_lex(Prism::RipperTest)
test/prism/ruby/ripper_test.rb:148:in 'block in Prism::RipperTest#assert_ripper_lex'
<internal:numeric>:257:in 'Integer#times'
test/prism/ruby/ripper_test.rb:127:in 'Prism::RipperTest#assert_ripper_lex'
test/prism/ruby/ripper_test.rb:84:in 'block (2 levels) in <class:RipperTest>'
<[[1, 62], :on_sp, " ", nil]> expected but was
<[[1, 62], :on_sp, " \\\n" + "        ", nil]>

diff:
? [[1, 62], :on_sp, " \\\n" + "        ", nil]
==============================================================================================================================
F
==============================================================================================================================
Failure: test_seattlerb/utf8_bom.txt_lex(Prism::RipperTest)
test/prism/ruby/ripper_test.rb:148:in 'block in Prism::RipperTest#assert_ripper_lex'
<internal:numeric>:257:in 'Integer#times'
test/prism/ruby/ripper_test.rb:127:in 'Prism::RipperTest#assert_ripper_lex'
test/prism/ruby/ripper_test.rb:84:in 'block (2 levels) in <class:RipperTest>'
<[[2, 0], :on_ident, "p", CMDARG]> expected but was
<[[1, 23], :on_sp, "-w\n", BEG]>

diff:
? [[   2, 0], :on_ide     nt, "p", CMDARG]
?   1,  3         sp, "-w\         BE     
?   +++ ???         ??? -----   ?????     
==============================================================================================================================
F
==============================================================================================================================
Failure: test_unparser/corpus/semantic/dstr.txt_lex(Prism::RipperTest)
test/prism/ruby/ripper_test.rb:148:in 'block in Prism::RipperTest#assert_ripper_lex'
<internal:numeric>:257:in 'Integer#times'
test/prism/ruby/ripper_test.rb:127:in 'Prism::RipperTest#assert_ripper_lex'
test/prism/ruby/ripper_test.rb:84:in 'block (2 levels) in <class:RipperTest>'
<[[119, 3], :on_sp, " ", nil]> expected but was
<[[119, 3], :on_sp, " \\\n", nil]>

diff:
? [[119, 3], :on_sp, "     ", nil]
?                      \\\n       
?                     ?       
==============================================================================================================================
Finished in 1.197953795 seconds.
------------------------------------------------------------------------------------------------------------------------------
1892 tests, 36583 assertions, 4 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
99.7886% passed
------------------------------------------------------------------------------------------------------------------------------

eregon · 2026-01-18T16:51:31Z

:on_sp events also have state on Ripper and it's not always the last token state, example:

$ ruby -rripper -e 'pp Ripper.lex("def m(a) nil end")'  
[[[1, 0], :on_kw, "def", FNAME],
 [[1, 3], :on_sp, " ", FNAME],
 [[1, 4], :on_ident, "m", ENDFN],
 [[1, 5], :on_lparen, "(", BEG|LABEL],
 [[1, 6], :on_ident, "a", ARG],
 [[1, 7], :on_rparen, ")", ENDFN],
 [[1, 8], :on_sp, " ", BEG], <=
 [[1, 9], :on_kw, "nil", END],
 [[1, 12], :on_sp, " ", END],
 [[1, 13], :on_kw, "end", END]]

And more complex cases like whitequark/space_args_block.txt:

EXPECTED:
[[[1, 0], :on_kw, "undef", FNAME|FITEM],
 [[1, 5], :on_sp, " ", FNAME|FITEM], # OK
 [[1, 6], :on_ident, "foo", END],
 [[1, 9], :on_comma, ",", BEG|LABEL],
 [[1, 10], :on_sp, " ", FNAME|FITEM], # what?
 [[1, 11], :on_symbeg, ":", FNAME],
 [[1, 12], :on_ident, "bar", ENDFN],
 [[1, 15], :on_comma, ",", BEG|LABEL],
 [[1, 16], :on_sp, " ", FNAME|FITEM], # what?
 [[1, 17], :on_symbeg, ":\"", FNAME],
 [[1, 19], :on_tstring_content, "foo", FNAME],
 [[1, 22], :on_embexpr_beg, "\#{", FNAME],
 [[1, 24], :on_int, "1", END],
 [[1, 25], :on_embexpr_end, "}", END],
 [[1, 26], :on_tstring_end, "\"", END],
 [[1, 27], :on_nl, "\n", BEG]]

Would there be any way to get the correct state with Prism?

Earlopain · 2026-01-19T07:47:15Z

I would not bother with comparing state for this. I'm not particularly motivated to understand what is happening in detail here.

I doubt anyone is relying on this information being correct.

eregon · 2026-01-19T10:15:53Z

Yeah it's what I'm doing in test/prism/ruby/ripper_test.rb, I'm ignoring the state of :on_sp events for now.
Maybe we should always give them some "0/initial" state or so? Or even nil or one less Array element?

Slightly related, is Translation::Ripper::Lexer::State.new(Translation::Ripper::EXPR_BEG) the correct way to create Ripper states? It seems wasteful to create new instances every time when there are only a few possible states and they are frozen.

Earlopain · 2026-01-19T12:12:54Z

Maybe we should always give them some "0/initial" state or so? Or even nil or one less Array element?

Just putting in the state from the previous token seems fine. When it lines up, great. If not, oh well. Doesn't really matter in what way it's wrong. nil doesn't sound great for compatibility.

Slightly related, is Translation::Ripper::Lexer::State.new(Translation::Ripper::EXPR_BEG) the correct way to create Ripper states?

I believe that is what ripper itself does. I haven't benchmarked this in detail yet but there are definitly some hotspots. Much time is spend sorting tokens by location for example.

If changing this makes a noticable difference, I wouldn't be opposed to it. Here's what I use to quickly check:

require "ripper"
require "prism"
require "benchmark/ips"

codes = Dir["**/*.rb"].map { File.read(it) }

Benchmark.ips do |x|
  x.report("prism") { codes.each { Prism::Translation::Ripper.lex(it) } }
  x.report("ripper") { codes.each { Ripper.lex(it) } }

  x.compare!
end

I was also a bit concerned about the performance of this implementation but it is not so bad. Compared to ripper it goes from 2.3x times slower to 2.6x times slower which seems fine.

lib/prism/lex_compat.rb

Earlopain

Still a draft but this is basically good to me.

lib/prism/lex_compat.rb

eregon · 2026-01-19T20:03:03Z

should rebase on top of Optimize ripper translator #3862 and use the cache

lib/prism/lex_compat.rb

eregon · 2026-01-19T21:50:51Z

Taking notes, next failure is

# top-100-gems/graphql-2.0.21/lib/generators/graphql/install/mutation_root_generator.rb
  end
end #<trailing space here

- [[34, 0], :on_kw, "end", END],
- [[34, 3], :on_sp, " ", END]]
+ [[34, 0], :on_kw, "end", END]]

UPDATE: fixed now

eregon · 2026-01-19T23:52:30Z

The BOM (byte order mark) edge cases are madness but I think I got them correct now 😅

Earlopain

I'm somewhat surprised that this seems to handle all the cases. Nice work.

Earlopain · 2026-01-20T07:50:18Z

rbi/prism/parse_result.rbi


-  sig { params(byte_offset: Integer, length: Integer).returns(String) }
-  def slice(byte_offset, length); end
+  def slice(...); end


I really don't know if this is ok. It typechecks but that's about it about my knowledge. I hate fighting with these tools. I'm just going to revert this and update the two places that pass a range.

I read some Sorbet docs and asked ChatGPT and basically Sorbet seems to not really support ....
I tried just leaving sig { returns(String) } but it was not happy about it.
Maybe overloads like this would work:

sig { overload.params(a: Integer, b: Integer).void } sig { overload.params(r: T::Range[Integer]).void } def slice(*args); end

But it feels complicated and duplicating the type of byteslice, hence just def slice(...); end felt simpler.
It'd be nice if we could say "this has the exact same type as String#byteslice".

Earlopain · 2026-01-20T07:51:03Z

test/prism/fixtures/bom_leading_space.txt

@@ -0,0 +1 @@
+ p (42)


Commited the two snapshots as well. Basically they get created when when running bundle exec rake

Earlopain · 2026-01-20T07:52:58Z

rakelib/typecheck.rake

      --ignore=rakelib/
      --ignore=Rakefile
      --ignore=top-100-gems/
+      #{Dir.glob("*.rb").map { |f| "--ignore=/#{f}" }.join("\n")}


This is nice, thanks

…:Ripper * Handle line continuations. * Handle space at the end of file in LexCompat. Co-authored-by: Earlopain <14981592+Earlopain@users.noreply.github.com>

Earlopain · 2026-01-20T11:19:20Z

This causes test failures in syntax_suggest because it worked around missing on_sp in an incompatible way. I'll handle it ruby/ruby#15914

It used to not emit this token type, but now it does. So when a newer version of prism is present, we can fall back to the same code that ripper uses. Ref: * ruby/ruby#15914 * ruby/prism#3859

eregon · 2026-01-20T11:59:35Z

@Earlopain Oh, sorry about that, I didn't expect that would break.
Thanks a lot for making PRs to fix it!

It used to not emit this token type, but now it does. So when a newer version of prism is present, we can fall back to the same code that ripper uses. Ref: * ruby/ruby#15914 * ruby/prism#3859

It used to not emit this token type, but now it does. So when a newer version of prism is present, we can fall back to the same code that ripper uses. Ref: * ruby#15914 * ruby/prism#3859 ruby/syntax_suggest@42a3b8f6cb

It used to not emit this token type, but now it does. So when a newer version of prism is present, we can fall back to the same code that ripper uses. Ref: * #15914 * ruby/prism#3859 ruby/syntax_suggest@42a3b8f6cb

eregon force-pushed the ripper_on_sp branch from 0a7adbc to af3da37 Compare January 19, 2026 10:16

eregon mentioned this pull request Jan 19, 2026

Prism::Source.for(code) returns an incomplete Prism::Source object #3861

Closed

eregon force-pushed the ripper_on_sp branch from af3da37 to 97fa55f Compare January 19, 2026 12:44

Earlopain mentioned this pull request Jan 19, 2026

Optimize ripper translator #3862

Merged

Earlopain reviewed Jan 19, 2026

View reviewed changes

lib/prism/lex_compat.rb Outdated Show resolved Hide resolved

Earlopain reviewed Jan 19, 2026

View reviewed changes

lib/prism/lex_compat.rb Outdated Show resolved Hide resolved

lib/prism/lex_compat.rb Outdated Show resolved Hide resolved

lib/prism/lex_compat.rb Outdated Show resolved Hide resolved

eregon force-pushed the ripper_on_sp branch 2 times, most recently from e214296 to 68a45bc Compare January 19, 2026 20:01

eregon force-pushed the ripper_on_sp branch from 68a45bc to 9184563 Compare January 19, 2026 20:03

eregon marked this pull request as ready for review January 19, 2026 20:04

Earlopain reviewed Jan 19, 2026

View reviewed changes

lib/prism/lex_compat.rb Show resolved Hide resolved

eregon force-pushed the ripper_on_sp branch 3 times, most recently from f9acebf to b7336e8 Compare January 19, 2026 21:49

eregon force-pushed the ripper_on_sp branch 7 times, most recently from 2541426 to c29d8a9 Compare January 19, 2026 23:49

Ignore *.rb at the root for rake typecheck:sorbet

df7b4e5

eregon force-pushed the ripper_on_sp branch from c29d8a9 to 09be2ee Compare January 19, 2026 23:57

Earlopain force-pushed the ripper_on_sp branch from 09be2ee to 3062051 Compare January 20, 2026 07:53

Earlopain approved these changes Jan 20, 2026

View reviewed changes

Add Ripper :on_sp events for Prism.lex_compat and Prism::Translation:…

32bd13e

…:Ripper * Handle line continuations. * Handle space at the end of file in LexCompat. Co-authored-by: Earlopain <14981592+Earlopain@users.noreply.github.com>

Earlopain force-pushed the ripper_on_sp branch from 3062051 to 32bd13e Compare January 20, 2026 08:00

Earlopain merged commit 38c64e5 into ruby:main Jan 20, 2026
66 checks passed

Earlopain mentioned this pull request Jan 20, 2026

Handle on_sp when using prism ruby/syntax_suggest#241

Closed

schneems mentioned this pull request Jan 20, 2026

Test multiple prism versions in CI + Fix Prism head ruby/syntax_suggest#243

Merged

eregon linked an issue Jan 21, 2026 that may be closed by this pull request

Ripper translation layer relies on upstream Ripper and is not fully compatible #3838

Closed

Conversation

eregon commented Jan 18, 2026

Uh oh!

eregon commented Jan 18, 2026

Uh oh!

eregon commented Jan 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Earlopain commented Jan 19, 2026

Uh oh!

eregon commented Jan 19, 2026

Uh oh!

Earlopain commented Jan 19, 2026

Uh oh!

Uh oh!

Earlopain left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

eregon commented Jan 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

eregon commented Jan 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eregon commented Jan 19, 2026

Uh oh!

Earlopain left a comment

Choose a reason for hiding this comment

Uh oh!

Earlopain Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

eregon Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

Earlopain Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

Earlopain Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Earlopain commented Jan 20, 2026

Uh oh!

eregon commented Jan 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

eregon commented Jan 18, 2026 •

edited

Loading

eregon commented Jan 19, 2026 •

edited

Loading

eregon commented Jan 19, 2026 •

edited

Loading