Description
For the end-of-line handling, the only documented surfaces so far are CR and CR-LF. (Doc node "Representation for end of lines")
The Unicode Standard (https://www.unicode.org/versions/Unicode15.0.0/ch05.pdf), in section 5.8 "Newline Guidelines", explains that for EBCDIC encodings there are two end-of-line mapping conventions in use (see Table 5-1):
- The newline function is represented by EBCDIC byte 0x15.
  - For "most EBCDIC environments" (this apparently includes IBM i, z/VM, z/VSE, see https://lists.gnu.org/archive/html/bug-gnu-libiconv/2023-04/msg00002.html), the newline function maps to Unicode U+0085.
  - For the "z/OS Unix System Services", on the other hand, the newline function maps to Unicode U+000A.
This is only a summary; more details are in the thread that starts at https://lists.gnu.org/archive/html/bug-gnu-libiconv/2023-04/msg00002.html .
GNU libiconv now makes use of the concept and syntax of a recode "surface":
- When an encoding such as IBM-1047 is specified (AFAIU, that's the default encoding for many people on z/OS), the newline 0x15 maps to U+0085.
- When an encoding is specified as IBM-1047/ZOS_UNIX, the newline 0x15 maps to U+000A, and 0x25 maps to U+0085, as shown in Table 5-1.
I would suggest that recode support the same surface, ZOS_UNIX, with the same name and the same semantics (swap 0x15 and 0x25).
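To make the proposed semantics concrete, here is a minimal Python sketch of the swap. It is only an illustration, not recode or libiconv code: the names `apply_zos_unix_surface`, `decode_newline_byte` and `DEFAULT_MAP` are hypothetical, and the default mapping of 0x25 to U+000A is inferred from the "swap" semantics rather than stated explicitly above.

```python
# Hypothetical illustration of the ZOS_UNIX surface; not recode or libiconv code.

EBCDIC_0X15 = 0x15  # the EBCDIC newline function byte
EBCDIC_0X25 = 0x25  # the byte it is exchanged with under ZOS_UNIX

# 256-entry translation table: identity everywhere except 0x15 <-> 0x25.
_SWAP_TABLE = bytes(
    EBCDIC_0X25 if b == EBCDIC_0X15 else EBCDIC_0X15 if b == EBCDIC_0X25 else b
    for b in range(256)
)


def apply_zos_unix_surface(data: bytes) -> bytes:
    """Swap bytes 0x15 and 0x25. The operation is its own inverse."""
    return data.translate(_SWAP_TABLE)


# Assumed mapping of the two bytes when no surface is given
# (0x15 -> U+0085 is from the text; 0x25 -> U+000A is inferred from the swap).
DEFAULT_MAP = {0x15: "\u0085", 0x25: "\u000a"}


def decode_newline_byte(byte: int, zos_unix: bool = False) -> str:
    """Return the Unicode character for byte 0x15 or 0x25 under either convention."""
    if zos_unix:
        byte = apply_zos_unix_surface(bytes([byte]))[0]
    return DEFAULT_MAP[byte]
```

Because the surface is a plain byte swap, it can be applied before decoding (or after encoding) with the unmodified IBM-1047 table, and applying it twice restores the original bytes.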
To understand how this works in practice, with GNU libiconv, see this unit test:
https://git.savannah.gnu.org/gitweb/?p=libiconv.git;a=blob;f=tests/check-ebcdic;h=62dfd61437d008af1f3f47ae69baeba692e01792;hb=19b6af5e5efe306bc1b2da87ba054b7391360ca2