forked from pinard/Recode
Open
Description
In the recode 3.7.14 manual:
The Recode library is able to combine 'UCS-2' some sequences of codes
into single code characters, to represent a few diacriticized
characters, ligatures or diphtongs which have been included to ease
mapping with other existing charsets. It is also able to explode such
single code characters into the corresponding sequence of codes. The
request syntax for triggering such operations is rudimentary and
temporary. The 'combined-UCS-2' pseudo character set is a special form
of 'UCS-2' in which known combinings have been replaced by the simpler
code. Using 'combined-UCS-2' instead of 'UCS-2' in an _after_ position
of a request forces a combining step, while using 'combined-UCS-2'
instead of 'UCS-2' in a _before_ position of a request forces an
exploding step. For the time being, one has to resort to advanced
request syntax to achieve other effects. For example:
recode u8..co,u2..u8 < INPUT > OUTPUT
copies an 'UTF-8' INPUT over OUTPUT, still to be in 'UTF-8', yet merging
combining characters into single codes whenever possible.
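As a rough analogy for the combining and exploding steps described in the quoted passage, here is a minimal Python sketch. It uses Unicode normalization (NFC/NFD) from the standard `unicodedata` module, which is an assumption made for illustration only: recode uses its own combining tables, and its results need not match NFC/NFD.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two codes for one character.
decomposed = "e\u0301"

# "Combining" step (analogy): merge the sequence into a single code, U+00E9.
combined = unicodedata.normalize("NFC", decomposed)

# "Exploding" step (analogy): split the single code back into a sequence.
exploded = unicodedata.normalize("NFD", combined)

print([hex(ord(c)) for c in combined])
print([hex(ord(c)) for c in exploded])
```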
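However, UCS-2 can only represent the Basic Multilingual Plane (code points up to U+FFFF), so nowadays not all characters can be represented in it; UCS-4 should be used instead. It would therefore be nice to have a combined-UCS-4 pseudo character set as well.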
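To illustrate the limitation, here is a small Python sketch (not recode itself) showing that a character outside the Basic Multilingual Plane does not fit in one 16-bit code, while it fits in one 32-bit code. The sample character U+1F600 is an arbitrary choice for illustration.

```python
# A code point above U+FFFF, which a single UCS-2 code cannot hold.
ch = "\U0001F600"
code_point = ord(ch)

# In UTF-16 the character needs two 16-bit units (a surrogate pair).
utf16 = ch.encode("utf-16-be")
units16 = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

# In UTF-32 (equivalent to UCS-4 for this purpose) it is one 32-bit unit.
utf32 = ch.encode("utf-32-be")
units32 = [int.from_bytes(utf32[i:i + 4], "big") for i in range(0, len(utf32), 4)]

print(hex(code_point), [hex(u) for u in units16], [hex(u) for u in units32])
```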
rrthomas