Korean TN fixes: cardinal, decimal, fraction, date #374
bbae0312 wants to merge 6 commits into NVIDIA:ko_tn_staging_v1
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
optional_sign = pynini.closure(pynutil.insert('negative: "true" ') + pynini.cross("-", ""), 0, 1)
final_graph = optional_sign + pynutil.insert('integer: "') + graph_num + pynutil.insert('"')
# Delete group separators when they appear between digits (e.g., "1,234" -> "1234")
delete_sep_between_digits = pynini.cdrewrite(
Checking: is there any occurrence of European numbering in Korean text?
It does show up sometimes, but not very often. I agree it may be better to drop it and keep the Korean cardinal grammar simpler.
Okay, we can assume canonical numbering following the US standard.
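For readers following the thread, here is a rough sketch of the behavior discussed above, assuming US-style numbering where a comma is the only group separator. It uses plain `re` from the standard library rather than `pynini.cdrewrite`, so it is not the PR's grammar. The names `delete_group_separators` and `tag_cardinal` are hypothetical, and the sketch leaves the digits as digits instead of passing them through `graph_num`.

```python
import re

# Assumption: US-style grouping, so "," is the only group separator.
# A comma is deleted only when it sits directly between two ASCII digits.
_SEP_BETWEEN_DIGITS = re.compile(r"(?<=[0-9]),(?=[0-9])")


def delete_group_separators(text: str) -> str:
    """Remove commas that appear between digits, e.g. "1,234" -> "1234"."""
    return _SEP_BETWEEN_DIGITS.sub("", text)


def tag_cardinal(token: str) -> str:
    """Build a tagger-style string (hypothetical helper, not the PR's code).

    Mirrors the shape of the snippet above: an optional negative field
    followed by an integer field. The digits are kept as-is here; the real
    grammar would transduce them through graph_num.
    """
    sign = ""
    if token.startswith("-"):
        sign = 'negative: "true" '
        token = token[1:]
    digits = delete_group_separators(token)
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError(f"not a cardinal: {token!r}")
    return f'{sign}integer: "{digits}"'


if __name__ == "__main__":
    print(tag_cardinal("-1,234"))
    print(tag_cardinal("42"))
```

Because the pattern uses lookarounds, a comma followed by a space (as in ordinary prose) is left alone, which is the same intent as restricting the rewrite to the context between digits.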
nemo_text_processing/text_normalization/ko/taggers/electronic.py
nemo_text_processing/text_normalization/ko/taggers/telephone.py
nemo_text_processing/text_normalization/ko/taggers/tokenize_and_classify.py
@bbae0312 Can you confirm that the tests pass (Sparrowhawk and unit)?
What does this PR do?
Add fixes and improvements for Korean TN: cardinal, decimal, ordinal, fraction, date, and post-processing.
Before your PR is "Ready for review"
Pre checks:
- `git commit -s` to sign.
- `pytest` or (if your machine does not have GPU) `pytest --cpu` from the root folder (given you marked your test cases accordingly `@pytest.mark.run_only_on('CPU')`).
- `bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...`
- `pytest` and Sparrowhawk here.
- `__init__.py` for every folder and subfolder, including `data` folder which has .TSV files?
- `Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.` to all newly added Python files?
- `Copyright 2015 and onwards Google, Inc.`. See an example here.
- (`try import: ... except: ...`) if not already done.

PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.