Narrowing type hints, update docstring by bact · Pull Request #1344 · PyThaiNLP/pythainlp

bact · 2026-03-19T16:08:18Z

Passed code styles and structures
Passed code linting checks and unit tests

Signed-off-by: Arthit Suriyawongkul <arthit@gmail.com>

Copilot

Pull request overview

This PR focuses on tightening type hints and docstrings across PyThaiNLP modules, largely by replacing `# type: ignore` with explicit typing constructs (`cast`, `NDArray[...]`, narrowed generics) and adjusting a few global model singletons to cache-based patterns.
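To make the `# type: ignore` to `cast(...)` swap concrete, here is a minimal sketch. The function `third_party_tokenize` is a hypothetical stand-in for an untyped third-party call and does not come from the PR.

```python
from typing import Any, cast


def third_party_tokenize(text: str) -> Any:
    """Hypothetical stand-in for an untyped third-party function."""
    return text.split()


def tokenize_with_ignore(text: str) -> list[str]:
    # Before: the type checker is silenced for this whole line.
    return third_party_tokenize(text)  # type: ignore


def tokenize_with_cast(text: str) -> list[str]:
    # After: the expected type is stated explicitly. cast() does nothing at
    # runtime; it only tells the type checker what to assume.
    return cast(list[str], third_party_tokenize(text))
```

Because `cast` performs no runtime validation, a cast to the wrong type can still hide a real error, so each cast is worth checking against what the wrapped call actually returns.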

Changes:

Replace many `# type: ignore` return annotations with `cast(...)` and more precise types (including `numpy.typing.NDArray`).
Refactor some global model singletons into keyed caches (e.g., by device/model) to avoid cross-call state issues.
Update docstrings and tooling configuration (notably `pyproject.toml` mypy/tox formatting changes).

Reviewed changes

Copilot reviewed 57 out of 57 changed files in this pull request and generated 4 comments.

File	Description
pythainlp/wsd/core.py	Refactor WSD model handling to a device-keyed cache; tighten typing and return conversions.
pythainlp/word_vector/core.py	Replace ignores with casts; clarify `sentence_vectorizer` return type and dtype.
pythainlp/wangchanberta/core.py	Replace ignore with cast for tokenizer output typing.
pythainlp/util/thai.py	Reformat constant map; narrow return types for Thai character analysis helpers.
pythainlp/util/pronounce.py	Formatting-only readability change in comprehension.
pythainlp/util/numtoword.py	Broaden `bahttext` parameter type and update docstring typing.
pythainlp/util/keywords.py	Narrow `rank()` return type and clarify docstring contract.
pythainlp/util/collate.py	Type `collate()` input as `Iterable[str]` and align docstring types.
pythainlp/util/abbreviation.py	Replace return ignore with cast for optional-score tuples.
pythainlp/ulmfit/core.py	Tighten rule/token types, NDArray returns, and float32 conversion behavior.
pythainlp/transliterate/wunsen.py	Replace ignore with cast for transliteration output.
pythainlp/transliterate/w2p.py	Narrow internal NDArray typing; add docstrings for numeric helpers.
pythainlp/transliterate/umt5_thaig2p.py	Replace ignore with cast and structured pipeline output typing.
pythainlp/transliterate/tltk.py	Replace ignores with casts for third-party outputs; minor refactor.
pythainlp/transliterate/thaig2p_v2.py	Replace ignore with cast and structured pipeline output typing.
pythainlp/transliterate/thaig2p.py	Add `type: ignore[misc]` on torch modules and narrow NDArray generics.
pythainlp/transliterate/thai2rom_onnx.py	Improve IO encoding, NDArray typing, and fix array end-token comparison.
pythainlp/transliterate/thai2rom.py	Add `type: ignore[misc]` on torch module classes.
pythainlp/transliterate/lookup.py	Replace return ignores with casts for typed optionals/strings.
pythainlp/transliterate/ipa.py	Replace ignores with casts for epitran outputs.
pythainlp/translate/word2word_translate.py	Replace ignore with cast for optional list return.
pythainlp/translate/tokenization_small100.py	Replace ignores with casts; narrow state/vocab typing.
pythainlp/tokenize/newmm.py	Add type parameters to `defaultdict` graph used in BFS.
pythainlp/tokenize/han_solo.py	Narrow featurizer return types to `list[Any]` payloads.
pythainlp/tokenize/core.py	Replace return ignore with cast for paragraph tokenization typing.
pythainlp/tag/wangchanberta_onnx.py	Tighten NDArray typing, providers defaulting, and SentencePiece API usage.
pythainlp/tag/tltk.py	Replace ignore with cast for POS tagging output typing.
pythainlp/tag/thainer.py	Add explicit feature dict typing for NER feature extraction.
pythainlp/tag/thai_nner.py	Narrow entity dict typing and expand docstrings/exception chaining.
pythainlp/tag/crfchunk.py	Add a blank line for module formatting consistency.
pythainlp/tag/chunk.py	Add a blank line for module formatting consistency.
pythainlp/tag/_tag_perceptron.py	Simplify saved JSON payload typing to `Any`.
pythainlp/summarize/keybert.py	Add NDArray typing, float32 conversions, and exception chaining.
pythainlp/summarize/freq.py	Narrow ranking/frequency typing and adjust ranking implementation.
pythainlp/spell/words_spelling_correction.py	Replace import checks with `import_module`, add docstrings, and add cache.
pythainlp/spell/wanchanberta_thai_grammarly.py	Add `type: ignore[misc]` on torch module class.
pythainlp/spell/tltk.py	Replace ignore with cast for spell candidate outputs.
pythainlp/spell/phunspell.py	Replace ignore with cast for correction output typing.
pythainlp/soundex/sound.py	Replace ignore with cast for panphon output typing.
pythainlp/soundex/complete_soundex.py	Narrow internal helper signatures and return tuple typing.
pythainlp/phayathaibert/core.py	Tighten callable typing and replace ignore with cast for tokenizer output.
pythainlp/parse/ud_goeswith.py	Rename intermediate variables and reformat vectorized operations for clarity.
pythainlp/generate/wangchanglm.py	Narrow regex pattern typing and replace ignore with cast for decode.
pythainlp/el/core.py	Tighten return types for entity linking results and fix docstring param name.
pythainlp/el/_multiel.py	Tighten EL output typing and replace ignore with cast.
pythainlp/corpus/wordnet.py	Replace ignores with casts and align `custom_lemmas` return behavior.
pythainlp/corpus/core.py	Replace ignores with casts and adjust minor formatting.
pythainlp/corpus/common.py	Reformat complex condition and dict construction for readability.
pythainlp/coref/core.py	Replace singleton with cache keyed by (model_name, device); tighten return typing.
pythainlp/chunk/crfchunk.py	Replace ignore with cast for tagger outputs.
pythainlp/chunk/__init__.py	Add a blank line for module formatting consistency.
pythainlp/benchmarks/word_tokenization.py	Tighten stats typing, improve exception chaining, and type NDArray helpers.
pythainlp/augment/wordnet.py	Narrow WordNet augmentation internal list typings.
pythainlp/augment/word2vec/core.py	Minor formatting for long call readability.
pythainlp/augment/word2vec/bpemb_wv.py	Replace ignore with cast for BPEmb tokenizer output.
pythainlp/ancient/currency.py	Narrow return type and fix minor comment formatting.
pyproject.toml	Reformat extras/tox entries; enable mypy strictness; expand ignore-missing-imports modules.

Copilot · 2026-03-20T11:50:31Z

pythainlp/util/numtoword.py

+def bahttext(number: Optional[float]) -> str:
    """Converts a number to Thai text and adds
    a suffix "บาท" (Baht).
    The precision will be fixed at two decimal places (0.00)
    to fit "สตางค์" (Satang) unit.
    This function works similarly to the `BAHTTEXT` function in Microsoft Excel.

-    :param float number: number to be converted into Thai Baht currency format
+    :param Optional[float] number: number to be converted into Thai Baht
+        currency format
    :return: text representing the amount of money in the format


The `bahttext()` signature now accepts `Optional[float]`, but when `number` is `None` the function returns an empty string. That’s a behavioral change from the previous contract and isn’t documented in the docstring; consider either rejecting `None` (raise) or explicitly documenting the `None -> ""` behavior to avoid surprising callers.
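The two options in this comment can be sketched as follows. This is an illustration under assumptions, not the library's code: `_to_text` is a hypothetical placeholder for the real number-to-Thai-text conversion, and `convert_strict` / `convert_lenient` are made-up names.

```python
from typing import Optional


def _to_text(number: float) -> str:
    """Hypothetical placeholder for the real conversion logic."""
    return f"{number:.2f}"


def convert_strict(number: float) -> str:
    """Option 1: keep the narrow signature and reject None explicitly."""
    if number is None:
        raise TypeError("number must be a float, not None")
    return _to_text(number)


def convert_lenient(number: Optional[float]) -> str:
    """Option 2: accept None and document what happens.

    :param Optional[float] number: amount to convert
    :return: converted text, or an empty string if number is None
    """
    if number is None:
        return ""
    return _to_text(number)
```

Either choice gives callers a contract they can rely on; the difference is whether `None` is treated as a programming error or as a valid "no value" input.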

Copilot · 2026-03-20T11:50:32Z

pyproject.toml

+strict_optional = true
+
 warn_no_return = true


Enabling global mypy `strict = true` can introduce many new type-checking failures across the project and CI (the repo runs `mypy pythainlp` in PRs). If this PR isn’t also updating the codebase to be strict-clean, consider scoping strictness via per-module overrides or enabling the stricter flags incrementally.

pythainlp/tag/thai_nner.py

Copilot · 2026-03-20T11:50:32Z

pythainlp/summarize/freq.py

    def __compute_frequencies(
        self, word_tokenized_sents: list[list[str]]
-    ) -> defaultdict:
+    ) -> defaultdict[str, float]:
        word_freqs: defaultdict[str, float] = defaultdict(int)
        for sent in word_tokenized_sents:


`__compute_frequencies()` declares `defaultdict[str, float]` but initializes it with `defaultdict(int)`, which produces `int` defaults and can break type-checking (especially with mypy strict). Consider using a float default factory (e.g., `defaultdict(float)`) or otherwise ensuring the declared value type matches the factory.
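A minimal sketch of the suggested fix, written as a standalone function rather than the class method in the diff. The counting loop body is an assumption, since the hunk above only shows the first line of the loop.

```python
from collections import defaultdict


def compute_frequencies(
    word_tokenized_sents: list[list[str]],
) -> defaultdict[str, float]:
    # float() returns 0.0, so missing keys default to a float and the
    # factory agrees with the declared value type.
    word_freqs: defaultdict[str, float] = defaultdict(float)
    for sent in word_tokenized_sents:
        for word in sent:
            word_freqs[word] += 1
    return word_freqs
```

With `defaultdict(float)`, every stored value is a `float` from the first increment onward, so later arithmetic such as normalization does not mix `int` and `float` entries.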

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

sonarqubecloud · 2026-03-21T23:00:24Z

Quality Gate passed

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
1.3% Duplication on New Code

Narrowing type hints, update docstring

5e4be64

Signed-off-by: Arthit Suriyawongkul <arthit@gmail.com>

bact added this to the 5.3.2 milestone Mar 19, 2026

bact added the documentation improve documentation and test cases label Mar 19, 2026

bact added this to PyThaiNLP Mar 19, 2026

bact added the refactoring a technical improvement which does not add any new features or change existing features. label Mar 19, 2026

bact self-assigned this Mar 19, 2026

bact moved this to In progress in PyThaiNLP Mar 19, 2026

bact modified the milestones: 5.3.2, 6.0 Mar 19, 2026

bact marked this pull request as draft March 19, 2026 16:13

bact modified the milestones: 6.0, 5.3.3 Mar 19, 2026

bact requested a review from Copilot March 20, 2026 11:46

Copilot started reviewing on behalf of bact March 20, 2026 11:46

Merge branch 'dev' into fix-type-hints

1528a8d

Copilot AI reviewed Mar 20, 2026

bact and others added 3 commits March 20, 2026 11:57

Update pythainlp/tag/thai_nner.py

9e5bb06

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Fix frequency computation and type annotations

ee0a8d6

Merge branch 'dev' into fix-type-hints

8c11b79

Copilot AI mentioned this pull request Mar 21, 2026

Fix runtime AttributeError in wordnet type casts, improve type safety and API contracts #1354

Merged

2 tasks

bact closed this Mar 24, 2026

bact deleted the fix-type-hints branch March 24, 2026 00:52

bact wants to merge 5 commits into PyThaiNLP:dev from bact:fix-type-hints
