Support for the ZOS_UNIX surface for EBCDIC encodings #49

@bhaible

For the end-of-line handling, the only documented surfaces so far are CR and CR-LF. (Doc node "Representation for end of lines")

The Unicode Standard https://www.unicode.org/versions/Unicode15.0.0/ch05.pdf explains (section 5.8 "Newline Guidelines") that for EBCDIC encodings there are two end-of-line mapping conventions in use (see table 5-1).

This is the summary; more details are in the thread that starts at https://lists.gnu.org/archive/html/bug-gnu-libiconv/2023-04/msg00002.html .

GNU libiconv now makes use of the concept and syntax of a recode "surface":

  • When an encoding such as IBM-1047 is specified (AFAIU, that's the default encoding for many people on z/OS), the newline 0x15 maps to U+0085.
  • When an encoding is specified as IBM-1047/ZOS_UNIX, the newline 0x15 maps to U+000A, and 0x25 maps to U+0085, as shown in table 5-1.

I would suggest that recode support the same surface, ZOS_UNIX, with the same name and the same semantics (swap 0x15 and 0x25).
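To make the proposed semantics concrete, here is a minimal sketch of the byte swap in Python. It is only an illustration of "swap 0x15 and 0x25" on raw EBCDIC bytes, not the way libiconv or recode implements it; the names `ZOS_UNIX_SWAP` and `apply_zos_unix_surface` are hypothetical and chosen just for this example.

```python
# Illustration only: the ZOS_UNIX surface modelled as a byte-level swap
# applied to EBCDIC data. All names below are hypothetical.

EBCDIC_0X15 = 0x15  # maps to U+0085 without the surface (e.g. plain IBM-1047)
EBCDIC_0X25 = 0x25  # maps to U+000A without the surface


def _build_swap_table() -> bytes:
    """Return a 256-entry identity table with 0x15 and 0x25 exchanged."""
    table = bytearray(range(256))
    table[EBCDIC_0X15], table[EBCDIC_0X25] = EBCDIC_0X25, EBCDIC_0X15
    return bytes(table)


ZOS_UNIX_SWAP = _build_swap_table()


def apply_zos_unix_surface(data: bytes) -> bytes:
    """Swap bytes 0x15 and 0x25; every other byte is left unchanged.

    The swap is its own inverse, so the same function can be used to add
    or to remove the surface.
    """
    return data.translate(ZOS_UNIX_SWAP)
```

Under this model, decoding with IBM-1047/ZOS_UNIX would be equivalent to applying the swap first and then decoding with plain IBM-1047, and encoding would be the reverse order.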

To understand how this works in practice, with GNU libiconv, see this unit test:
https://git.savannah.gnu.org/gitweb/?p=libiconv.git;a=blob;f=tests/check-ebcdic;h=62dfd61437d008af1f3f47ae69baeba692e01792;hb=19b6af5e5efe306bc1b2da87ba054b7391360ca2
