Description
Bug report
Bug description:
Per UAX #44, "entries for a code point may be omitted in a data file if the code point has a default value for the property in question."
Additionally, certain default values can be "complex", meaning that the correct default depends on the code point range or on other conditions. For Bidi_Class, these defaults are defined per code point range in DerivedBidiClass.txt.
As an illustrative example, consider two code points, U+0378 and U+0590. The first is an unassigned code point in the Greek and Coptic block, and the second is an unassigned code point in the Hebrew block. Per DerivedBidiClass.txt, the first should be assigned Left_To_Right (L) and the second Right_To_Left (R); however, unicodedata.bidirectional returns the empty string in both cases.
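To make the range-based defaults concrete, here is a minimal sketch of how `@missing` lines could be turned into a default lookup. The `SAMPLE` text is a hand-written two-line excerpt in the style of DerivedBidiClass.txt, not the full file, and the helper names `parse_missing` and `default_bidi_class` are hypothetical. It assumes that later `@missing` lines take precedence over earlier ones for overlapping ranges.

```python
import re

# Hand-written excerpt in the style of the "@missing" lines in
# DerivedBidiClass.txt (assumption: the real file lists many more ranges).
SAMPLE = """\
# @missing: 0000..10FFFF; Left_To_Right
# @missing: 0590..05FF; Right_To_Left
"""

# Matches lines of the form "# @missing: XXXX..YYYY; Value".
MISSING_RE = re.compile(
    r"^#\s*@missing:\s*([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6})\s*;\s*(\w+)"
)


def parse_missing(text):
    """Return (first, last, value) tuples in file order."""
    ranges = []
    for line in text.splitlines():
        m = MISSING_RE.match(line)
        if m:
            ranges.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return ranges


def default_bidi_class(cp, ranges):
    """Return the default value for cp, or None if no range covers it."""
    value = None
    for first, last, name in ranges:
        if first <= cp <= last:
            value = name  # assumption: later lines override earlier ones
    return value


RANGES = parse_missing(SAMPLE)
print(default_bidi_class(0x0378, RANGES))
print(default_bidi_class(0x0590, RANGES))
```

The lookup yields long property value names (`Left_To_Right`, `Right_To_Left`), whereas unicodedata.bidirectional reports the short aliases (`L`, `R`), so a real fix would also need that mapping.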
Although I have not investigated thoroughly, it appears that other complex default values are also not handled correctly: for instance, UAX #11 indicates that the East_Asian_Width property of U+2FFFE should be 'W', but unicodedata.east_asian_width returns 'N'.
This issue was first encountered in Python 3.13.7, and I have verified that it is still present in 3.15, as of 8801c6d.
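For checking other builds, a small diagnostic along these lines may help. It is only a sketch: the `observe` helper and the `ROWS` name are made up for illustration, and it asserts nothing about the property values, since those depend on the interpreter and on whether the defaults have been fixed. It prints the general category as well, to confirm that the code points are unassigned (`Cn`).

```python
import sys
import unicodedata

# Code points mentioned in this report.
CODE_POINTS = [0x0378, 0x0590, 0x2FFFE]


def observe(cp):
    """Collect the properties that unicodedata reports for one code point."""
    ch = chr(cp)
    return {
        "code_point": f"U+{cp:04X}",
        "category": unicodedata.category(ch),
        "bidirectional": unicodedata.bidirectional(ch),
        "east_asian_width": unicodedata.east_asian_width(ch),
    }


ROWS = [observe(cp) for cp in CODE_POINTS]

print("Python", sys.version.split()[0], "Unicode", unicodedata.unidata_version)
for row in ROWS:
    # repr() makes an empty-string result visible in the output.
    print(
        row["code_point"],
        row["category"],
        repr(row["bidirectional"]),
        repr(row["east_asian_width"]),
    )
```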
import unicodedata
# both assertions should pass
assert unicodedata.bidirectional(chr(0x0378)) == 'L'
assert unicodedata.bidirectional(chr(0x0590)) == 'R'
CPython versions tested on:
CPython main branch
Operating systems tested on:
macOS