Losing focus of the original UBJSON goal #33

@nebkat

UBJSON was originally proposed to solve a problem with existing binary JSON formats:

Attempts to make using JSON faster through binary specifications like BSON, BJSON or Smile exist, but have been rejected from mass-adoption for two reasons:

  1. Custom (Binary-Only) Data Types: Inclusion of custom data types that have no ancillary in the original JSON spec, leaving room for incompatibilities to exist as different implementations of the spec handle the binary-only data types differently.
  2. Complexity: Some specifications provide higher performance or smaller representations at the cost of a much more complex specification, making implementations more difficult which can slow or block adoption. One of the key reasons JSON became as popular as it did was because of its ease of use.

BSON, for example, defines types for binary data, regular expressions, JavaScript code blocks and other constructs that have no equivalent data type in JSON. BJSON defines a binary data type as well, again leaving the door wide open to interpretation that can potentially lead to incompatibilities between two implementations of the spec. Smile, while the closest, defines more complex data constructs and generation/parsing rules in the name of absolute space efficiency. These are not shortcomings, just trade-offs the different specs made in order to serve specific use cases.

The existing binary JSON specifications all define incompatibilities or complexities that undo the singular tenet that made JSON so successful: simplicity.

JSON’s simplicity made it accessible to anyone, made implementations in every language available and made explaining it to anyone consuming your data immediate.

Any successful binary JSON specification must carry these properties forward for it to be genuinely helpful to the community at large.

As an evolution of UBJSON, BJData originally fixed some basic quality-of-life issues, namely UBJSON's lack of unsigned integer types and its use of big-endian byte order.
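
To make those two fixes concrete, here is a minimal TypeScript sketch. It assumes the `I` (int16) marker shared by UBJSON and BJData, UBJSON's big-endian versus BJData's little-endian payloads, and BJData's `u` (uint16) marker. The helper names are hypothetical; this is an illustration, not either project's reference encoder.

```typescript
// Hypothetical helper: encode an int16 as marker byte + 2-byte payload.
// Assumes the `I` marker (0x49); UBJSON writes the payload big-endian,
// BJData writes it little-endian.
function encodeInt16(value: number, littleEndian: boolean): Uint8Array {
  const out = new Uint8Array(3);
  out[0] = "I".charCodeAt(0);
  // DataView writes the payload at offset 1 in the requested byte order.
  new DataView(out.buffer).setInt16(1, value, littleEndian);
  return out;
}

// Hypothetical helper: BJData's `u` (uint16) marker lets 40000 fit in
// two payload bytes, little-endian.
function encodeUint16Bjdata(value: number): Uint8Array {
  const out = new Uint8Array(3);
  out[0] = "u".charCodeAt(0);
  new DataView(out.buffer).setUint16(1, value, true);
  return out;
}

// Render bytes as space-separated lowercase hex for comparison.
const toHex = (bytes: Uint8Array): string =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

console.log(toHex(encodeInt16(0x1234, false))); // UBJSON-style: 49 12 34
console.log(toHex(encodeInt16(0x1234, true)));  // BJData-style: 49 34 12
console.log(toHex(encodeUint16Bjdata(40000)));  // BJData uint16: 75 40 9c
```

UBJSON has no 16-bit unsigned type, so a value like 40000 (above the int16 maximum of 32767) would have to be widened to a 32-bit integer there.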


I believe the recent addition of extension types unfortunately brings us back to where the other binary specifications were in terms of complexity, and risks hindering further adoption of BJData as a generic binary JSON format.

It could be argued that other types such as char, half, and even the strongly-typed byte arrays I proposed had already deviated from the original JSON spec, but crucially these types all have an obvious, implicit representation in regular JSON: a string, a number, and an array of integers respectively. This is not the case for extension types, where the context is completely lost if we turn them into plain numbers or strings.
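
A minimal TypeScript sketch of that asymmetry, assuming a hypothetical decoded-value model (the `kind` names and the `typeId` field are illustrative, not taken from the BJData spec): char, half and byte arrays each map onto an ordinary JSON value, whereas an extension value can only be flattened to its raw bytes, dropping what those bytes mean.

```typescript
// Hypothetical decoded-value model; names are illustrative only.
type Decoded =
  | { kind: "char"; value: string }       // single character
  | { kind: "half"; value: number }       // float16, already widened to a JS number
  | { kind: "bytes"; value: Uint8Array }  // strongly-typed byte array
  | { kind: "extension"; typeId: number; payload: Uint8Array }; // e.g. a timestamp or UUID

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function toJson(v: Decoded): Json {
  switch (v.kind) {
    case "char":
      return v.value;             // char -> one-character string
    case "half":
      return v.value;             // half -> ordinary JSON number
    case "bytes":
      return Array.from(v.value); // byte array -> array of integers
    case "extension":
      // No natural JSON equivalent: only the raw bytes survive,
      // and typeId (what the bytes mean) has nowhere to go.
      return Array.from(v.payload);
  }
}

// Two extension values with different (hypothetical) type ids but the same
// payload bytes become identical JSON.
const a = toJson({ kind: "extension", typeId: 1, payload: new Uint8Array([1, 2]) });
const b = toJson({ kind: "extension", typeId: 2, payload: new Uint8Array([1, 2]) });
console.log(JSON.stringify(a) === JSON.stringify(b)); // true
```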

In hindsight, I have wondered whether it would have been better to stick with `U`-based binary data: while the byte type ended up solving a problem in some languages, it introduced further ambiguity in others, e.g. JavaScript, which has a `Uint8Array` but no corresponding `ByteArray`. Extension types are likely to suffer the same fate: some languages will have first-class support for timestamps, dates and UUIDs, while others will have to use proprietary types that make full integration more complicated.
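
A small TypeScript illustration of the JavaScript ambiguity, assuming a hypothetical decoder that maps both a `U`-typed array and a byte array to `Uint8Array` (the obvious target in that language): once decoded, the two are indistinguishable, so a re-encoder has to guess which one to write back.

```typescript
// Assumed output of a hypothetical decoder for two different wire types.
const fromUint8Array = new Uint8Array([1, 2, 3]); // decoded from a `U`-typed array
const fromByteArray = new Uint8Array([1, 2, 3]);  // decoded from a byte array

// Same constructor, same length, same contents: nothing records the origin.
const sameShape =
  fromUint8Array.constructor === fromByteArray.constructor &&
  fromUint8Array.length === fromByteArray.length &&
  fromUint8Array.every((b, i) => b === fromByteArray[i]);
console.log(sameShape); // true

// Typed arrays also don't serialize to JSON arrays by default.
console.log(JSON.stringify(fromByteArray));             // {"0":1,"1":2,"2":3}
console.log(JSON.stringify(Array.from(fromByteArray))); // [1,2,3]
```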


Right now BJData is the closest thing we have to a one-to-one binary JSON representation, and I really don't want to lose that. Other than simply removing extension types from the spec, one option could be to branch BJData into two formats (which is ever so slightly better than introducing a new format...).

One would be a basic format that would stay true to the original goal, perhaps removing features such as no-op markers, N-dimensional arrays and any other features deemed to add unnecessary complexity.

The other would be a feature-rich superset that could still parse the basic format but also introduce types that are not directly compatible with JSON.

This way those willing to commit to using "BJData++" exclusively could benefit from advanced features (though even there I would urge caution!), while BJData would remain a solid choice if you only care about JSON.
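
A minimal sketch of the superset relationship between the two formats, in TypeScript. The profile names and value kinds are hypothetical; which features end up in each profile would be for the spec to decide. The extended profile accepts everything the basic one does, while the basic profile rejects anything without a direct JSON equivalent.

```typescript
// Hypothetical profiles and value kinds; illustrative only.
type Profile = "basic" | "extended";
type Kind =
  | "null" | "bool" | "int" | "float" | "string" | "array" | "object" // JSON-mappable
  | "noop" | "ndArray" | "extension";                                 // extended-only

const BASIC_KINDS: ReadonlySet<Kind> = new Set<Kind>([
  "null", "bool", "int", "float", "string", "array", "object",
]);

function accepts(profile: Profile, kind: Kind): boolean {
  // The extended profile is a superset: it accepts every basic kind and more.
  return profile === "extended" || BASIC_KINDS.has(kind);
}

// Throw on the first kind the chosen profile does not allow.
function checkDocument(profile: Profile, kinds: Kind[]): void {
  for (const k of kinds) {
    if (!accepts(profile, k)) {
      throw new Error(`"${k}" is not allowed in the ${profile} profile`);
    }
  }
}
```

A basic document passes under both profiles, e.g. `checkDocument("extended", ["object", "string", "int"])`, whereas `checkDocument("basic", ["extension"])` throws.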