Narrowing type hints, update docstring#1344

Closed
bact wants to merge 5 commits into PyThaiNLP:dev from bact:fix-type-hints

Conversation

@bact
Member

@bact bact commented Mar 19, 2026

  • Passed code styles and structures
  • Passed code linting checks and unit tests

Signed-off-by: Arthit Suriyawongkul <arthit@gmail.com>
@bact bact added this to the 5.3.2 milestone Mar 19, 2026
@bact bact added the documentation improve documentation and test cases label Mar 19, 2026
@bact bact added this to PyThaiNLP Mar 19, 2026
@bact bact added the refactoring a technical improvement which does not add any new features or change existing features. label Mar 19, 2026
@bact bact self-assigned this Mar 19, 2026
@bact bact moved this to In progress in PyThaiNLP Mar 19, 2026
@bact bact modified the milestones: 5.3.2, 6.0 Mar 19, 2026
@bact bact marked this pull request as draft March 19, 2026 16:13
@bact bact modified the milestones: 6.0, 5.3.3 Mar 19, 2026
@bact bact requested a review from Copilot March 20, 2026 11:46
Contributor

Copilot AI left a comment

Pull request overview

This PR focuses on tightening type hints and docstrings across PyThaiNLP modules, largely by replacing # type: ignore with explicit typing constructs (cast, NDArray[...], narrowed generics) and adjusting a few global model singletons to cache-based patterns.

Changes:

  • Replace many # type: ignore return annotations with cast(...) and more precise types (including numpy.typing.NDArray).
  • Refactor some global model singletons into keyed caches (e.g., by device/model) to avoid cross-call state issues.
  • Update docstrings and tooling configuration (notably pyproject.toml mypy/tox formatting changes).
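
As a rough illustration of the first change, the sketch below replaces a silenced return with `cast(...)`. The helper `untyped_tokenize` is a hypothetical stand-in for a third-party call that returns `Any`; it is not taken from the PR.

```python
from typing import Any, cast


def untyped_tokenize(text: str) -> Any:
    """Hypothetical stand-in for a third-party call typed as Any."""
    return text.split()


def tokenize(text: str) -> list[str]:
    # Before: `return untyped_tokenize(text)  # type: ignore`
    # cast() does nothing at runtime; it only records, for the type
    # checker, what we believe the value is.
    return cast(list[str], untyped_tokenize(text))
```

Because `cast` performs no runtime check, it documents an assumption about the third-party return value rather than enforcing it.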

Reviewed changes

Copilot reviewed 57 out of 57 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pythainlp/wsd/core.py | Refactor WSD model handling to a device-keyed cache; tighten typing and return conversions. |
| pythainlp/word_vector/core.py | Replace ignores with casts; clarify sentence_vectorizer return type and dtype. |
| pythainlp/wangchanberta/core.py | Replace ignore with cast for tokenizer output typing. |
| pythainlp/util/thai.py | Reformat constant map; narrow return types for Thai character analysis helpers. |
| pythainlp/util/pronounce.py | Formatting-only readability change in comprehension. |
| pythainlp/util/numtoword.py | Broaden bahttext parameter type and update docstring typing. |
| pythainlp/util/keywords.py | Narrow rank() return type and clarify docstring contract. |
| pythainlp/util/collate.py | Type collate() input as Iterable[str] and align docstring types. |
| pythainlp/util/abbreviation.py | Replace return ignore with cast for optional-score tuples. |
| pythainlp/ulmfit/core.py | Tighten rule/token types, NDArray returns, and float32 conversion behavior. |
| pythainlp/transliterate/wunsen.py | Replace ignore with cast for transliteration output. |
| pythainlp/transliterate/w2p.py | Narrow internal NDArray typing; add docstrings for numeric helpers. |
| pythainlp/transliterate/umt5_thaig2p.py | Replace ignore with cast and structured pipeline output typing. |
| pythainlp/transliterate/tltk.py | Replace ignores with casts for third-party outputs; minor refactor. |
| pythainlp/transliterate/thaig2p_v2.py | Replace ignore with cast and structured pipeline output typing. |
| pythainlp/transliterate/thaig2p.py | Add type: ignore[misc] on torch modules and narrow NDArray generics. |
| pythainlp/transliterate/thai2rom_onnx.py | Improve IO encoding, NDArray typing, and fix array end-token comparison. |
| pythainlp/transliterate/thai2rom.py | Add type: ignore[misc] on torch module classes. |
| pythainlp/transliterate/lookup.py | Replace return ignores with casts for typed optionals/strings. |
| pythainlp/transliterate/ipa.py | Replace ignores with casts for epitran outputs. |
| pythainlp/translate/word2word_translate.py | Replace ignore with cast for optional list return. |
| pythainlp/translate/tokenization_small100.py | Replace ignores with casts; narrow state/vocab typing. |
| pythainlp/tokenize/newmm.py | Add type parameters to defaultdict graph used in BFS. |
| pythainlp/tokenize/han_solo.py | Narrow featurizer return types to list[Any] payloads. |
| pythainlp/tokenize/core.py | Replace return ignore with cast for paragraph tokenization typing. |
| pythainlp/tag/wangchanberta_onnx.py | Tighten NDArray typing, providers defaulting, and SentencePiece API usage. |
| pythainlp/tag/tltk.py | Replace ignore with cast for POS tagging output typing. |
| pythainlp/tag/thainer.py | Add explicit feature dict typing for NER feature extraction. |
| pythainlp/tag/thai_nner.py | Narrow entity dict typing and expand docstrings/exception chaining. |
| pythainlp/tag/crfchunk.py | Add a blank line for module formatting consistency. |
| pythainlp/tag/chunk.py | Add a blank line for module formatting consistency. |
| pythainlp/tag/_tag_perceptron.py | Simplify saved JSON payload typing to Any. |
| pythainlp/summarize/keybert.py | Add NDArray typing, float32 conversions, and exception chaining. |
| pythainlp/summarize/freq.py | Narrow ranking/frequency typing and adjust ranking implementation. |
| pythainlp/spell/words_spelling_correction.py | Replace import checks with import_module, add docstrings, and add cache. |
| pythainlp/spell/wanchanberta_thai_grammarly.py | Add type: ignore[misc] on torch module class. |
| pythainlp/spell/tltk.py | Replace ignore with cast for spell candidate outputs. |
| pythainlp/spell/phunspell.py | Replace ignore with cast for correction output typing. |
| pythainlp/soundex/sound.py | Replace ignore with cast for panphon output typing. |
| pythainlp/soundex/complete_soundex.py | Narrow internal helper signatures and return tuple typing. |
| pythainlp/phayathaibert/core.py | Tighten callable typing and replace ignore with cast for tokenizer output. |
| pythainlp/parse/ud_goeswith.py | Rename intermediate variables and reformat vectorized operations for clarity. |
| pythainlp/generate/wangchanglm.py | Narrow regex pattern typing and replace ignore with cast for decode. |
| pythainlp/el/core.py | Tighten return types for entity linking results and fix docstring param name. |
| pythainlp/el/_multiel.py | Tighten EL output typing and replace ignore with cast. |
| pythainlp/corpus/wordnet.py | Replace ignores with casts and align custom_lemmas return behavior. |
| pythainlp/corpus/core.py | Replace ignores with casts and adjust minor formatting. |
| pythainlp/corpus/common.py | Reformat complex condition and dict construction for readability. |
| pythainlp/coref/core.py | Replace singleton with cache keyed by (model_name, device); tighten return typing. |
| pythainlp/chunk/crfchunk.py | Replace ignore with cast for tagger outputs. |
| pythainlp/chunk/__init__.py | Add a blank line for module formatting consistency. |
| pythainlp/benchmarks/word_tokenization.py | Tighten stats typing, improve exception chaining, and type NDArray helpers. |
| pythainlp/augment/wordnet.py | Narrow WordNet augmentation internal list typings. |
| pythainlp/augment/word2vec/core.py | Minor formatting for long call readability. |
| pythainlp/augment/word2vec/bpemb_wv.py | Replace ignore with cast for BPEmb tokenizer output. |
| pythainlp/ancient/currency.py | Narrow return type and fix minor comment formatting. |
| pyproject.toml | Reformat extras/tox entries; enable mypy strictness; expand ignore-missing-imports modules. |

Comment on lines +33 to 42
+def bahttext(number: Optional[float]) -> str:
     """Converts a number to Thai text and adds
     a suffix "บาท" (Baht).
     The precision will be fixed at two decimal places (0.00)
     to fit "สตางค์" (Satang) unit.
     This function works similarly to the `BAHTTEXT` function in Microsoft Excel.

-    :param float number: number to be converted into Thai Baht currency format
+    :param Optional[float] number: number to be converted into Thai Baht
+        currency format
     :return: text representing the amount of money in the format
Copilot AI Mar 20, 2026

The bahttext() signature now accepts Optional[float], but when number is None the function returns an empty string. That’s a behavioral change from the previous contract and isn’t documented in the docstring; consider either rejecting None (raise) or explicitly documenting the None -> "" behavior to avoid surprising callers.
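
The two options in this comment might look roughly like the sketch below. The functions `to_baht_strict` and `to_baht_lenient` are hypothetical, and the `"... baht"` formatting is a placeholder rather than the real Thai-text conversion.

```python
from typing import Optional


def to_baht_strict(number: float) -> str:
    """Option 1: keep the narrow type and reject None explicitly."""
    if number is None:
        raise TypeError("number must be a float, not None")
    return f"{number:.2f} baht"  # placeholder, not real Thai text


def to_baht_lenient(number: Optional[float]) -> str:
    """Option 2: accept None and say so in the docstring.

    :param Optional[float] number: amount to format; None yields ""
    :return: formatted amount, or an empty string if number is None
    """
    if number is None:
        return ""
    return f"{number:.2f} baht"  # placeholder, not real Thai text
```

Either choice makes the contract visible to callers: the first fails loudly, the second makes the empty-string result part of the documented behavior.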

Comment on lines +441 to 443
+strict_optional = true

 warn_no_return = true
Copilot AI Mar 20, 2026

Enabling global mypy strict = true can introduce many new type-checking failures across the project and CI (the repo runs mypy pythainlp in PRs). If this PR isn’t also updating the codebase to be strict-clean, consider scoping strictness via per-module overrides or enabling the stricter flags incrementally.
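
A per-module override of the kind suggested here could be sketched as the following `pyproject.toml` fragment. The module pattern and the specific flags are illustrative choices, not taken from the PR.

```toml
[tool.mypy]
# Global settings stay moderate so existing modules keep passing.
strict_optional = true
warn_no_return = true

[[tool.mypy.overrides]]
# Illustrative: tighten only the packages already known to be clean.
module = ["pythainlp.util.*"]
disallow_untyped_defs = true
disallow_any_generics = true
```

Further packages can be added to the `module` list as they are cleaned up, so strictness grows incrementally without failing CI for the whole codebase at once.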

Comment on lines 34 to 38
     def __compute_frequencies(
         self, word_tokenized_sents: list[list[str]]
-    ) -> defaultdict:
+    ) -> defaultdict[str, float]:
         word_freqs: defaultdict[str, float] = defaultdict(int)
         for sent in word_tokenized_sents:
Copilot AI Mar 20, 2026

__compute_frequencies() declares defaultdict[str, float] but initializes it with defaultdict(int), which produces int defaults and can break type-checking (especially with mypy strict). Consider using a float default factory (e.g., defaultdict(float)) or otherwise ensuring the declared value type matches the factory.
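
A minimal sketch of the suggested fix, written as a standalone function rather than the PR's method. The normalisation by the highest count is an assumption added here to show why float values arise; it is not confirmed by the diff shown above.

```python
from collections import defaultdict


def compute_frequencies(
    word_tokenized_sents: list[list[str]],
) -> defaultdict[str, float]:
    # float() returns 0.0, so the default for a missing key is a float,
    # matching the declared value type.
    word_freqs: defaultdict[str, float] = defaultdict(float)
    for sent in word_tokenized_sents:
        for word in sent:
            word_freqs[word] += 1

    # Assumed step: scale counts by the highest count.
    max_freq = max(word_freqs.values(), default=1.0)
    for word in word_freqs:
        word_freqs[word] /= max_freq
    return word_freqs
```

With `defaultdict(float)`, looking up a word that never occurred yields `0.0` instead of `0`, so every value in the mapping has the annotated type.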

Labels

documentation: improve documentation and test cases
refactoring: a technical improvement which does not add any new features or change existing features.

Projects

Status: In progress

2 participants