Rascal layout does not match current Unicode whitespace specification #2649

@toinehartman

Describe the bug

The Rascal grammar rule for layout lists many different white-space characters, with the aim of following the Unicode specification.

lexical LAYOUT
  = Comment
  // all the white space chars defined in Unicode 6.0
  | [\u0009-\u000D \u0020 \u0085 \u00A0 \u1680 \u180E \u2000-\u200A \u2028 \u2029 \u202F \u205F \u3000]
  ;

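When this rule was written, it followed Unicode 6.0. Since Unicode 6.3, however, the 'Mongolian vowel separator' (\u180E) is no longer considered white space: its general category changed from Zs (space separator) to Cf (format), and it no longer has the White_Space property.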

We should consider removing this from the grammar to better align with Unicode.
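
A minimal sketch of what the adjusted rule could look like, assuming the only change is dropping \u180E from the character class. The comment wording is just a suggestion, and `Comment` is the existing nonterminal from the Rascal grammar, not redefined here:

```rascal
lexical LAYOUT
  = Comment
  // white-space characters per the Unicode White_Space property (6.3 and later);
  // \u180E (Mongolian vowel separator) is dropped, the rest is unchanged
  | [\u0009-\u000D \u0020 \u0085 \u00A0 \u1680 \u2000-\u200A \u2028 \u2029 \u202F \u205F \u3000]
  ;
```

With this change, a module that contains \u180E between tokens would no longer parse, which is the downside described below.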

Downside: if anyone felt the urgent need to use this character as layout in their Rascal module, they will get (hard-to-debug) parse errors.

Desktop (please complete the following information):

  • Context: Eclipse plugin, command-line REPL
  • Rascal Version: 0.41.3-RC8

Additional context
https://unicode-explorer.com/c/180E
