Update the ABNF rules for single-line comments #32
Specify that greedy mode should be used. Resolve cases where single-line comments end with EOF, LS or PS.
It's not part of the spec, but all the practical implementations of ABNF-based parsers I've seen make this assumption. It's likely the only sane approach. Hence, this is unnecessary in the README.md for the grammar:
Regarding the simplicity of having no […] This lost semantic only surfaces in the parser, which will fail the parse because PS and LS can't be the start of the next token, while LF and CR can. Aside from simplicity, the upside of your approach is that, in terms of tokenisation, the line ending gets tokenised as regular whitespace, just like the other whitespace in the document, instead of having a separate ad hoc tokenisation for that specific whitespace at the end of a single-line comment. All things considered, simplicity is probably the best approach here.
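To make the trade-off above concrete, here is a minimal Python sketch. It assumes the comment body simply stops before LF, CR, LS, or PS, and that whatever follows is tokenised by the ordinary rules, with JSON whitespace limited to space, tab, LF, and CR. The names `scan_single_line_comment` and `classify_follower` are hypothetical and not taken from any JSONC implementation.

```python
# Assumption: the comment body ends just before any of these characters.
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}
# JSON insignificant whitespace: space, tab, LF, CR (LS and PS are not included).
JSON_WHITESPACE = {" ", "\t", "\n", "\r"}


def scan_single_line_comment(text: str, start: int = 0) -> int:
    """Return the index just past the body of a '//' comment beginning at start."""
    if not text.startswith("//", start):
        raise ValueError("not a single-line comment")
    end = start + 2
    # Consume as many body characters as possible (greedy).
    while end < len(text) and text[end] not in LINE_TERMINATORS:
        end += 1
    return end


def classify_follower(text: str, start: int = 0) -> str:
    """Describe how the character right after the comment would be tokenised."""
    end = scan_single_line_comment(text, start)
    if end == len(text):
        return "eof"
    if text[end] in JSON_WHITESPACE:
        return "whitespace"
    # LS or PS: excluded from the comment, but not a valid token start either.
    return "error"
```

Under these assumptions, a comment followed by LF or CR leaves an ordinary whitespace token, a comment at the end of input needs no terminator, and a comment followed by LS or PS is only rejected at the next token.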
@chirsz-ever, would you like this to be merged in the initial release (v0.1) or in the future v1.0-rc1 release? If you want it to be in the initial release, the exclusion for PS and LS can't be there, since pretty much no implementation of JSONC supports this exclusion. We can only make that change in v1.
Do you have any plans or roadmap for versions of the JSONC specification? I haven't found any descriptions of possible different versions (v0.1, v1.0-rc1, etc.) on the website or in the repository. You are welcome to directly modify the content of this PR or push a version unrelated to LS and PS to the main branch.

My suggestion is to add trailing commas and reject LS and PS characters in single-line comments all at once in a final version. The reality is that most popular parsers support both comments and trailing commas, while LS and PS in single-line comments are extremely rare edge cases that almost never occur in practice. Therefore, my suggested approach should not actually cause many problems.

Differences between versions can be confusing, which I think is why Douglas Crockford decided not to version JSON and never change the syntax. Perhaps we should also adhere to the same principle.
The advantages of this modification are:
The corresponding problem is that it requires greedy matching, which is not specified in the ABNF specification (RFC 5234).
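As a rough illustration of why the matching mode matters, the sketch below uses Python regular expressions as a stand-in for the ABNF repetition. This is an assumption made for demonstration only: the patterns, the `match_comment` helper, and the excluded character set are hypothetical and are not the grammar in this PR.

```python
import re

# Assumption: the comment body excludes LF, CR, LS and PS.
BODY = r"[^\n\r\u2028\u2029]"

GREEDY = re.compile(rf"//{BODY}*")   # repetition takes as many characters as possible
LAZY = re.compile(rf"//{BODY}*?")    # repetition takes as few characters as possible


def match_comment(pattern: re.Pattern, text: str):
    """Return the text matched at the start of the input, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


print(match_comment(GREEDY, "// note\n1"))  # // note
print(match_comment(LAZY, "// note\n1"))    # //
```

Both results are valid matches of the same rule. Only the greedy one consumes the whole comment, which is why a rule without an explicit terminator relies on that mode being stated.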
The JSON specification does not require greedy matching; ECMA-262 explicitly provides a description equivalent to greedy matching [1]:
The `grammar/railroad-diagram.html` after running `npm run railroad` appears to have font differences compared to the current version on jsonc.org, so I have not submitted the corresponding changes.

Footnotes
1. https://262.ecma-international.org/16.0/index.html#sec-ecmascript-language-lexical-grammar