Add Normalization Form and Language to ParatextSettingsParser#313
Add Normalization Form and Language to ParatextSettingsParser#313TaperChipmunk32 wants to merge 4 commits into
Conversation
Enkidu93
left a comment
There was a problem hiding this comment.
@Enkidu93 reviewed 2 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on TaperChipmunk32).
machine/corpora/paratext_project_settings.py line 25 at r1 (raw file):
language_code: Optional[str] translation_type: str normalization_form: str
It might be good to test these in the settings parser tests somehow - especially with how many properties there now are.
machine/corpora/paratext_project_settings_parser_base.py line 102 at r1 (raw file):
parent_guid = translation_info_setting_parts[2] if translation_info_setting_parts[2] != "" else None normalization_form: str = settings_tree.getroot().findtext("NormalizationForm", "Off")
Are you confident this should be the default? Looking here https://paratext.org/2019/04/02/announcement-of-feature-to-control-unicode-normalization/, it looks like 'Off' is what it is for older projects migrated over to Paratext 8, but that NFC is the new default. I wonder what the true default behavior is if this setting doesn't exist 🤔. Alternatively, if it is truly missing, maybe None is better? What do you think?
|
Previously, Enkidu93 (Eli C. Lowry) wrote…
When Normalization is |
Enkidu93
left a comment
There was a problem hiding this comment.
@Enkidu93 made 1 comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on ddaspit and TaperChipmunk32).
machine/corpora/paratext_project_settings_parser_base.py line 102 at r1 (raw file):
Previously, TaperChipmunk32 (Matthew Beech) wrote…
When Normalization is
Off, Paratext does not include theNormalizationFormin the Settings file, as far as I can tell, which is why the default I chose isOff. I think Off is never actually in the Settings file, so when a new Paratext project is made with the default,NFC, it is still included in the Settings file.
OK, hmm, so if Off is never a NormalizationForm from the Settings.xml perspective, maybe it shouldn't be the default here - i.e., maybe the thing to do here is to just set it to None (like we do for some of the other settings) and then on your end in the script you're making, interpret that None as 'Off' 🤷♂️. @ddaspit, any thoughts?
ddaspit
left a comment
There was a problem hiding this comment.
@ddaspit made 1 comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on Enkidu93 and TaperChipmunk32).
machine/corpora/paratext_project_settings_parser_base.py line 102 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
OK, hmm, so if
Offis never aNormalizationFormfrom the Settings.xml perspective, maybe it shouldn't be the default here - i.e., maybe the thing to do here is to just set it toNone(like we do for some of the other settings) and then on your end in the script you're making, interpret thatNoneas'Off'🤷♂️. @ddaspit, any thoughts?
The default value is actually Undefined. If NormalizationForm is set to Off in Paratext, then the value will be Undefined. If it is not set in the settings file, then the value will be Undefined.
|
@Enkidu93 Should I make the default |
Enkidu93
left a comment
There was a problem hiding this comment.
Whatever Damien thinks: Sounds like 'Undefined' is the way to go.
If you rebase, you'll see some new tests that cover the other settings properties. If you could just add the normalization form there as well, that would great! Thank you!
@Enkidu93 made 1 comment and resolved 1 discussion.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on TaperChipmunk32).
3617b02 to
0b725b5
Compare
TaperChipmunk32
left a comment
There was a problem hiding this comment.
@TaperChipmunk32 made 2 comments.
Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion (waiting on ddaspit and Enkidu93).
machine/corpora/paratext_project_settings.py line 25 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
It might be good to test these in the settings parser tests somehow - especially with how many properties there now are.
Done.
machine/corpora/paratext_project_settings_parser_base.py line 102 at r1 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
The default value is actually
Undefined. IfNormalizationFormis set toOffin Paratext, then the value will beUndefined. If it is not set in the settings file, then the value will beUndefined.
I have pushed a commit that sets the default to "Undefined".
Enkidu93
left a comment
There was a problem hiding this comment.
@Enkidu93 reviewed 5 files and all commit messages, made 1 comment, and resolved 1 discussion.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on TaperChipmunk32).
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #313 +/- ##
=======================================
Coverage 90.39% 90.39%
=======================================
Files 355 355
Lines 22843 22850 +7
=======================================
+ Hits 20648 20655 +7
Misses 2195 2195 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
This PR add
normalization_formandlanguagetoParatextProjectSettings, from theNormalizationFormandLanguagefields in a project'sSettings.xmlfile. These are both wanted for some improved Onboarding information collection.This change is