Skip to content

Conversation

@eregon
Copy link
Member

@eregon eregon commented Jan 18, 2026

Ref: #3838
On top of #3858 should be merged first.

@eregon
Copy link
Member Author

eregon commented Jan 18, 2026

Failures that were added to excludes:

==============================================================================================================================
Failure: test_dos_endings.txt_lex(Prism::RipperTest)
test/prism/ruby/ripper_test.rb:148:in 'block in Prism::RipperTest#assert_ripper_lex'
<internal:numeric>:257:in 'Integer#times'
test/prism/ruby/ripper_test.rb:127:in 'Prism::RipperTest#assert_ripper_lex'
test/prism/ruby/ripper_test.rb:84:in 'block (2 levels) in <class:RipperTest>'
<[[1, 9], :on_sp, "\\\r\n", nil]> expected but was
<[[1, 9], :on_sp, "\\\r\n" + "     ", nil]>

diff:
? [[1, 9], :on_sp, "\\\r\n" + "     ", nil]
==============================================================================================================================
F
==============================================================================================================================
Failure: test_seattlerb/str_lit_concat_bad_encodings.txt_lex(Prism::RipperTest)
test/prism/ruby/ripper_test.rb:148:in 'block in Prism::RipperTest#assert_ripper_lex'
<internal:numeric>:257:in 'Integer#times'
test/prism/ruby/ripper_test.rb:127:in 'Prism::RipperTest#assert_ripper_lex'
test/prism/ruby/ripper_test.rb:84:in 'block (2 levels) in <class:RipperTest>'
<[[1, 62], :on_sp, " ", nil]> expected but was
<[[1, 62], :on_sp, " \\\n" + "        ", nil]>

diff:
? [[1, 62], :on_sp, " \\\n" + "        ", nil]
==============================================================================================================================
F
==============================================================================================================================
Failure: test_seattlerb/utf8_bom.txt_lex(Prism::RipperTest)
test/prism/ruby/ripper_test.rb:148:in 'block in Prism::RipperTest#assert_ripper_lex'
<internal:numeric>:257:in 'Integer#times'
test/prism/ruby/ripper_test.rb:127:in 'Prism::RipperTest#assert_ripper_lex'
test/prism/ruby/ripper_test.rb:84:in 'block (2 levels) in <class:RipperTest>'
<[[2, 0], :on_ident, "p", CMDARG]> expected but was
<[[1, 23], :on_sp, "-w\n", BEG]>

diff:
? [[   2, 0], :on_ide     nt, "p", CMDARG]
?   1,  3         sp, "-w\         BE     
?   +++ ???         ??? -----   ?????     
==============================================================================================================================
F
==============================================================================================================================
Failure: test_unparser/corpus/semantic/dstr.txt_lex(Prism::RipperTest)
test/prism/ruby/ripper_test.rb:148:in 'block in Prism::RipperTest#assert_ripper_lex'
<internal:numeric>:257:in 'Integer#times'
test/prism/ruby/ripper_test.rb:127:in 'Prism::RipperTest#assert_ripper_lex'
test/prism/ruby/ripper_test.rb:84:in 'block (2 levels) in <class:RipperTest>'
<[[119, 3], :on_sp, " ", nil]> expected but was
<[[119, 3], :on_sp, " \\\n", nil]>

diff:
? [[119, 3], :on_sp, "     ", nil]
?                      \\\n       
?                     ?       
==============================================================================================================================
Finished in 1.197953795 seconds.
------------------------------------------------------------------------------------------------------------------------------
1892 tests, 36583 assertions, 4 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
99.7886% passed
------------------------------------------------------------------------------------------------------------------------------

@eregon
Copy link
Member Author

eregon commented Jan 18, 2026

:on_sp events also have state on Ripper and it's not always the last token state, example:

$ ruby -rripper -e 'pp Ripper.lex("def m(a) nil end")'  
[[[1, 0], :on_kw, "def", FNAME],
 [[1, 3], :on_sp, " ", FNAME],
 [[1, 4], :on_ident, "m", ENDFN],
 [[1, 5], :on_lparen, "(", BEG|LABEL],
 [[1, 6], :on_ident, "a", ARG],
 [[1, 7], :on_rparen, ")", ENDFN],
 [[1, 8], :on_sp, " ", BEG], <=
 [[1, 9], :on_kw, "nil", END],
 [[1, 12], :on_sp, " ", END],
 [[1, 13], :on_kw, "end", END]]

And more complex cases like whitequark/space_args_block.txt:

EXPECTED:
[[[1, 0], :on_kw, "undef", FNAME|FITEM],
 [[1, 5], :on_sp, " ", FNAME|FITEM], # OK
 [[1, 6], :on_ident, "foo", END],
 [[1, 9], :on_comma, ",", BEG|LABEL],
 [[1, 10], :on_sp, " ", FNAME|FITEM], # what?
 [[1, 11], :on_symbeg, ":", FNAME],
 [[1, 12], :on_ident, "bar", ENDFN],
 [[1, 15], :on_comma, ",", BEG|LABEL],
 [[1, 16], :on_sp, " ", FNAME|FITEM], # what?
 [[1, 17], :on_symbeg, ":\"", FNAME],
 [[1, 19], :on_tstring_content, "foo", FNAME],
 [[1, 22], :on_embexpr_beg, "\#{", FNAME],
 [[1, 24], :on_int, "1", END],
 [[1, 25], :on_embexpr_end, "}", END],
 [[1, 26], :on_tstring_end, "\"", END],
 [[1, 27], :on_nl, "\n", BEG]]

Would there be any way to get the correct state with Prism?

@Earlopain
Copy link
Collaborator

I would not bother with comparing state for this. I'm not particularly motivated to understand what is happening in detail here.

I doubt anyone is relying on this information being correct.

@eregon
Copy link
Member Author

eregon commented Jan 19, 2026

Yeah it's what I'm doing in test/prism/ruby/ripper_test.rb, I'm ignoring the state of :on_sp events for now.
Maybe we should always give them some "0/initial" state or so? Or even nil or one less Array element?

Slightly related, is Translation::Ripper::Lexer::State.new(Translation::Ripper::EXPR_BEG) the correct way to create Ripper states? It seems wasteful to create new instances every time when there are only a few possible states and they are frozen.

@Earlopain
Copy link
Collaborator

Maybe we should always give them some "0/initial" state or so? Or even nil or one less Array element?

Just putting in the state from the previous token seems fine. When it lines up, great. If not, oh well. Doesn't really matter in what way it's wrong. nil doesn't sound great for compatibility.

Slightly related, is Translation::Ripper::Lexer::State.new(Translation::Ripper::EXPR_BEG) the correct way to create Ripper states?

I believe that is what ripper itself does. I haven't benchmarked this in detail yet but there are definitly some hotspots. Much time is spend sorting tokens by location for example.

If changing this makes a noticable difference, I wouldn't be opposed to it. Here's what I use to quickly check:

require "ripper"
require "prism"
require "benchmark/ips"

codes = Dir["**/*.rb"].map { File.read(it) }

Benchmark.ips do |x|
  x.report("prism") { codes.each { Prism::Translation::Ripper.lex(it) } }
  x.report("ripper") { codes.each { Ripper.lex(it) } }

  x.compare!
end

I was also a bit concerned about the performance of this implementation but it is not so bad. Compared to ripper it goes from 2.3x times slower to 2.6x times slower which seems fine.

Copy link
Collaborator

@Earlopain Earlopain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still a draft but this is basically good to me.

@eregon eregon force-pushed the ripper_on_sp branch 2 times, most recently from e214296 to 68a45bc Compare January 19, 2026 20:01
@eregon
Copy link
Member Author

eregon commented Jan 19, 2026

@eregon eregon marked this pull request as ready for review January 19, 2026 20:04
@eregon eregon force-pushed the ripper_on_sp branch 3 times, most recently from f9acebf to b7336e8 Compare January 19, 2026 21:49
@eregon
Copy link
Member Author

eregon commented Jan 19, 2026

Taking notes, next failure is

# top-100-gems/graphql-2.0.21/lib/generators/graphql/install/mutation_root_generator.rb
  end
end #<trailing space here
- [[34, 0], :on_kw, "end", END],
- [[34, 3], :on_sp, " ", END]]
+ [[34, 0], :on_kw, "end", END]]

UPDATE: fixed now

@eregon eregon force-pushed the ripper_on_sp branch 7 times, most recently from 2541426 to c29d8a9 Compare January 19, 2026 23:49
@eregon
Copy link
Member Author

eregon commented Jan 19, 2026

The BOM (byte order mark) edge cases are madness but I think I got them correct now 😅

Copy link
Collaborator

@Earlopain Earlopain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat surprised that this seems to handle all the cases. Nice work.


sig { params(byte_offset: Integer, length: Integer).returns(String) }
def slice(byte_offset, length); end
def slice(...); end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't know if this is ok. It typechecks but that's about it about my knowledge. I hate fighting with these tools. I'm just going to revert this and update the two places that pass a range.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read some Sorbet docs and asked ChatGPT and basically Sorbet seems to not really support ....
I tried just leaving sig { returns(String) } but it was not happy about it.
Maybe overloads like this would work:

sig { overload.params(a: Integer, b: Integer).void }
sig { overload.params(r: T::Range[Integer]).void }
def slice(*args); end

But it feels complicated and duplicating the type of byteslice, hence just def slice(...); end felt simpler.
It'd be nice if we could say "this has the exact same type as String#byteslice".

@@ -0,0 +1 @@
 p (42)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commited the two snapshots as well. Basically they get created when when running bundle exec rake

--ignore=rakelib/
--ignore=Rakefile
--ignore=top-100-gems/
#{Dir.glob("*.rb").map { |f| "--ignore=/#{f}" }.join("\n")}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice, thanks

…:Ripper

* Handle line continuations.
* Handle space at the end of file in LexCompat.

Co-authored-by: Earlopain <14981592+Earlopain@users.noreply.github.com>
@Earlopain Earlopain merged commit 38c64e5 into ruby:main Jan 20, 2026
66 checks passed
@Earlopain
Copy link
Collaborator

This causes test failures in syntax_suggest because it worked around missing on_sp in an incompatible way. I'll handle it ruby/ruby#15914

Earlopain added a commit to Earlopain/syntax_suggest that referenced this pull request Jan 20, 2026
It used to not emit this token type, but now it does.
So when a newer version of prism is present, we can fall back
to the same code that ripper uses.

Ref:
* ruby/ruby#15914
* ruby/prism#3859
Earlopain added a commit to Earlopain/syntax_suggest that referenced this pull request Jan 20, 2026
It used to not emit this token type, but now it does.
So when a newer version of prism is present, we can fall back
to the same code that ripper uses.

Ref:
* ruby/ruby#15914
* ruby/prism#3859
Earlopain added a commit to Earlopain/syntax_suggest that referenced this pull request Jan 20, 2026
It used to not emit this token type, but now it does.
So when a newer version of prism is present, we can fall back
to the same code that ripper uses.

Ref:
* ruby/ruby#15914
* ruby/prism#3859
@eregon
Copy link
Member Author

eregon commented Jan 20, 2026

@Earlopain Oh, sorry about that, I didn't expect that would break.
Thanks a lot for making PRs to fix it!

Earlopain added a commit to Earlopain/syntax_suggest that referenced this pull request Jan 20, 2026
It used to not emit this token type, but now it does.
So when a newer version of prism is present, we can fall back
to the same code that ripper uses.

Ref:
* ruby/ruby#15914
* ruby/prism#3859
Earlopain added a commit to Earlopain/syntax_suggest that referenced this pull request Jan 20, 2026
It used to not emit this token type, but now it does.
So when a newer version of prism is present, we can fall back
to the same code that ripper uses.

Ref:
* ruby/ruby#15914
* ruby/prism#3859
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants