feat: make min_word_length parameter configurable (#120)#153
Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
|
@cardmagic can you take a look, please? |
cardmagic
left a comment
There was a problem hiding this comment.
Thanks for this PR! The feature looks good overall. I've left a few inline comments - one critical bug to fix before merging.
Greptile SummaryThis PR adds a configurable
Confidence Score: 3/5The overall change is straightforward but carries one copy-paste mistake in TFIDF's JSON deserialization that corrupts The TFIDF
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant Classifier
participant Config
participant WordHash as String#word_hash
User->>Classifier: "Classifier.configure { |c| c.min_word_length = 5 }"
Classifier->>Config: "config.min_word_length = 5"
User->>Classifier: Bayes.new('Spam', min_word_length: 4)
Classifier->>Classifier: "@min_word_length = 4"
User->>Classifier: train('Spam', 'some text')
Classifier->>WordHash: "text.word_hash(@min_word_length)"
WordHash-->>Classifier: filtered word hash
User->>Classifier: classifier.to_json
Classifier-->>User: JSON with min_word_length: 4
User->>Classifier: Bayes.from_json(json)
Classifier->>Config: "data['min_word_length'] || config.min_word_length"
Classifier-->>User: restored instance
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
lib/classifier/tfidf.rb:238-239
Wrong ivar name in `from_json_data`. Line 238 sets `@storage` to `nil`, then line 239 immediately overwrites `@storage` with the deserialized integer value. `@min_word_length` is never restored, so a round-tripped TFIDF instance silently falls back to the global default instead of its saved setting, and `@storage` is left as an integer instead of `nil`.
```suggestion
instance.instance_variable_set(:@storage, nil)
instance.instance_variable_set(:@min_word_length, data['min_word_length'] || Classifier.config.min_word_length)
```
### Issue 2 of 2
test/config/config_test.rb:5-11
`test_configure` mutates the memoized `Classifier.config` singleton (sets `min_word_length` to 1) and never resets it. Any test that runs after this one in the same process and relies on `Classifier.config.min_word_length` defaulting to 3 — for example, Bayes or TFIDF tests that omit an explicit `min_word_length` — will see 1 instead of 3, causing spurious failures. Add a `teardown` that resets `Classifier.instance_variable_set(:@config, nil)` (or sets it back to 3) to isolate the mutation.
Reviews (1): Last reviewed commit: "feat: make `min_word_length` parameter c..." | Re-trigger Greptile |
| instance.instance_variable_set(:@storage, nil) | ||
| instance.instance_variable_set(:@storage, data['min_word_length'] || Classifier.config.min_word_length) |
There was a problem hiding this comment.
Wrong ivar name in
from_json_data. Line 238 sets @storage to nil, then line 239 immediately overwrites @storage with the deserialized integer value. @min_word_length is never restored, so a round-tripped TFIDF instance silently falls back to the global default instead of its saved setting, and @storage is left as an integer instead of nil.
| instance.instance_variable_set(:@storage, nil) | |
| instance.instance_variable_set(:@storage, data['min_word_length'] || Classifier.config.min_word_length) | |
| instance.instance_variable_set(:@storage, nil) | |
| instance.instance_variable_set(:@min_word_length, data['min_word_length'] || Classifier.config.min_word_length) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/classifier/tfidf.rb
Line: 238-239
Comment:
Wrong ivar name in `from_json_data`. Line 238 sets `@storage` to `nil`, then line 239 immediately overwrites `@storage` with the deserialized integer value. `@min_word_length` is never restored, so a round-tripped TFIDF instance silently falls back to the global default instead of its saved setting, and `@storage` is left as an integer instead of `nil`.
```suggestion
instance.instance_variable_set(:@storage, nil)
instance.instance_variable_set(:@min_word_length, data['min_word_length'] || Classifier.config.min_word_length)
```
How can I resolve this? If you propose a fix, please make it concise.| def test_configure | ||
| Classifier.configure do |config| | ||
| config.min_word_length = 1 | ||
| end | ||
|
|
||
| assert_equal(1, Classifier.config.min_word_length) | ||
| end |
There was a problem hiding this comment.
test_configure mutates the memoized Classifier.config singleton (sets min_word_length to 1) and never resets it. Any test that runs after this one in the same process and relies on Classifier.config.min_word_length defaulting to 3 — for example, Bayes or TFIDF tests that omit an explicit min_word_length — will see 1 instead of 3, causing spurious failures. Add a teardown that resets Classifier.instance_variable_set(:@config, nil) (or sets it back to 3) to isolate the mutation.
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/config/config_test.rb
Line: 5-11
Comment:
`test_configure` mutates the memoized `Classifier.config` singleton (sets `min_word_length` to 1) and never resets it. Any test that runs after this one in the same process and relies on `Classifier.config.min_word_length` defaulting to 3 — for example, Bayes or TFIDF tests that omit an explicit `min_word_length` — will see 1 instead of 3, causing spurious failures. Add a `teardown` that resets `Classifier.instance_variable_set(:@config, nil)` (or sets it back to 3) to isolate the mutation.
How can I resolve this? If you propose a fix, please make it concise.Add missing instance variable declarations for Steep type checker: - @min_word_length in Bayes, TFIDF, LogisticRegression, LSI - @config and method signatures in Config module These annotations allow the type checker to properly verify the new min_word_length parameter added in this PR.
Steep Type Check FixThe CI is failing because the 1.
|
Closes #120