
should support combined-UCS-4, replacing combined-UCS-2 #63

@vinc17fr

In the recode 3.7.14 manual:

   The Recode library is able to combine 'UCS-2' some sequences of codes
into single code characters, to represent a few diacriticized
characters, ligatures or diphtongs which have been included to ease
mapping with other existing charsets.  It is also able to explode such
single code characters into the corresponding sequence of codes.  The
request syntax for triggering such operations is rudimentary and
temporary.  The 'combined-UCS-2' pseudo character set is a special form
of 'UCS-2' in which known combinings have been replaced by the simpler
code.  Using 'combined-UCS-2' instead of 'UCS-2' in an _after_ position
of a request forces a combining step, while using 'combined-UCS-2'
instead of 'UCS-2' in a _before_ position of a request forces an
exploding step.  For the time being, one has to resort to advanced
request syntax to achieve other effects.  For example:

     recode u8..co,u2..u8 < INPUT > OUTPUT

copies an 'UTF-8' INPUT over OUTPUT, still to be in 'UTF-8', yet merging
combining characters into single codes whenever possible.

However, UCS-2 can only represent the Basic Multilingual Plane (U+0000 to U+FFFF), so many characters in current Unicode cannot be expressed in it. UCS-4 should be used instead, so it would be nice to have combined-UCS-4.
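
To illustrate the limitation, here is a minimal Python sketch. It does not use recode: Unicode NFC normalization from the standard `unicodedata` module stands in for the combining step (recode's own combining table may differ), and `fits_in_ucs2` and `combine` are hypothetical helper names chosen for this example. It shows that text containing a character outside the BMP can still be combined, but the result cannot be carried through a 16-bit UCS-2 intermediate, while a 32-bit encoding holds it without trouble.

```python
import unicodedata


def fits_in_ucs2(text: str) -> bool:
    """Return True if every code point lies in the BMP (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def combine(text: str) -> str:
    """Stand-in for a combining step: NFC merges base + combining marks."""
    return unicodedata.normalize("NFC", text)


# 'e' followed by COMBINING ACUTE ACCENT (both inside the BMP).
bmp_only = "e\u0301"
# Same sequence followed by U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP).
with_astral = "e\u0301\U0001D11E"

print(combine(bmp_only) == "\u00e9")                 # True
print(fits_in_ucs2(combine(bmp_only)))               # True
print(fits_in_ucs2(combine(with_astral)))            # False
print(len(combine(with_astral).encode("utf-32-be"))) # 8 (two code points, 4 bytes each)
```

The second string is the kind of input for which a request such as `u8..co,u2..u8` has no lossless UCS-2 intermediate, which is the case combined-UCS-4 would cover.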
