-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
sort: add locale-aware month parsing using ICU #9722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3b395de to
0754860
Compare
|
Going to have to wait and rebase until this PR is in: #9720 |
|
i will have a look once the jobs are green but the approach looks perfect! |
0754860 to
0d9897b
Compare
0d9897b to
0635270
Compare
f6801e0 to
efae246
Compare
|
Had to revert the string lossy change and add a comment, the string lossy modifies the non utf characters |
efae246 to
626314f
Compare
|
GNU testsuite comparison: |
|
I think another one of the utilities is lossy converting the non utf, investigating now |
|
Oh the CI is missing the Japan locale |
|
It passed! it was the Japanese locale missing causing the failure |
|
GNU testsuite comparison: |
|
Look really good ! Thank you for taking over the month sorting :D One thing we should be mindful of is the fact that GNU's month sorting exclusively matches month names against the abbreviated month name (see this discussion I had with ICU guys), so it's not bothering with trimming. Now, this may be a limitation of GNU, and we might not want to stick with it |
|
Sounds like the ICU data abbreviations don't always match the GNU abbreviations, the one specifically that would fail this is the french april. I don't see how we could do this without the non-exclusive matching |
|
@RenjiSann is there a different pathway we should take for this then? Was wondering if you had any thoughts on what the next steps should be? |
|
GNU testsuite comparison: |
|
I assumed that would be a known incompatibility, since I can't come with a better solution |
This PR is to address the lack of locale information when using the -M option in sort since all of the months were hardcoded to the english months. This PR introduces a new dependency icu_datetime to be able to get all of the month locale information.
Originally when trying to fix the associated GNU test it wasn't working because nl was removing non-utf8 characters. Now that the nl issue was addressed this should fix that GNU test.
When implementing this, choices were made that were maybe not the most efficient, but more readable and I think thats the better choice for this type of solution because there will only be a maximum of 12 comparisons when implementing this type of sorting.