Explore SIMD-accelerated parsing architecture for the standard json module #142915

@ErwinCell

Feature or enhancement

Proposal:

Motivation

The standard library json module is widely used and highly stable, but its core parsing architecture is still fundamentally scalar and recursive-descent based. On modern CPUs (especially x86_64 and increasingly ARM64), JSON parsing is often dominated by:

  • UTF-8 validation
  • structural character detection ({ } [ ] , :)
  • whitespace skipping

These stages are well known to be amenable to SIMD acceleration.
Projects such as simdjson demonstrate that a two-stage parsing pipeline (a SIMD-driven structural scan followed by semantic parsing) can deliver severalfold speedups while remaining fully compliant with RFC 8259.
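
To make the structural-detection and whitespace steps concrete, here is a minimal scalar sketch in Python of the kind of per-block bitmask a SIMD compare-and-movemask sequence would produce in one step. The names `classify_block`, `STRUCTURAL`, and `WHITESPACE` are illustrative assumptions, not existing CPython or simdjson APIs, and the sketch deliberately ignores whether a byte sits inside a string.

```python
# Scalar emulation of SIMD byte classification (illustrative only).
# A real SIMD backend would compare a whole 16/32/64-byte block at once;
# here we loop byte by byte and build the same kind of bitmask.

STRUCTURAL = frozenset(b"{}[],:")   # JSON structural characters
WHITESPACE = frozenset(b" \t\n\r")  # JSON whitespace per RFC 8259


def classify_block(block: bytes) -> tuple[int, int]:
    """Return (structural_mask, whitespace_mask) for one block.

    Bit i of each mask is set when byte i of the block belongs to
    that character class. String contents are NOT excluded here.
    """
    structural = 0
    whitespace = 0
    for i, byte in enumerate(block):
        if byte in STRUCTURAL:
            structural |= 1 << i
        elif byte in WHITESPACE:
            whitespace |= 1 << i
    return structural, whitespace


structural_mask, whitespace_mask = classify_block(b'{"a": 1}')
print(bin(structural_mask), bin(whitespace_mask))
```

Later stages can then iterate over the set bits of each mask instead of inspecting every byte, which is where most of the benefit of this representation would come from.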

Given the growing importance of JSON in performance-sensitive workloads (ML pipelines, telemetry, configuration at scale), I would like to propose a discussion on whether CPython’s json module could adopt a simdjson-inspired architecture, at least optionally.

Scope of the proposal

This issue is not a request to immediately replace the existing implementation. Instead, I would like to explore:

  1. Feasibility
    Whether a SIMD-based parsing backend could coexist with the current implementation.
    Whether this would fit CPython’s portability and maintenance constraints.
  2. Architecture
    A staged parsing model similar to simdjson:
    • Stage 1: SIMD structural scan (identify string boundaries, braces, commas, etc.)
    • Stage 2: scalar semantic parsing using the structural index
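
The staged model above can be sketched in pure Python as follows. This is a scalar reference model under stated assumptions, not a proposed implementation: `stage1_structural_index` and `stage2_parse` are hypothetical names, stage 1 is a plain loop standing in for the SIMD scan, stage 2 delegates scalars (strings, numbers, literals) to the existing `json.loads`, and the input is assumed to be well-formed JSON, so error handling is minimal.

```python
import json

STRUCTURAL = b"{}[],:"
QUOTE, BACKSLASH = 0x22, 0x5C


def stage1_structural_index(data: bytes) -> list[int]:
    """Stage 1: offsets of structural characters that lie outside strings."""
    index = []
    in_string = escaped = False
    for pos, byte in enumerate(data):
        if in_string:
            if escaped:
                escaped = False          # this byte was escaped, skip it
            elif byte == BACKSLASH:
                escaped = True
            elif byte == QUOTE:
                in_string = False        # closing quote
        elif byte == QUOTE:
            in_string = True             # opening quote
        elif byte in STRUCTURAL:
            index.append(pos)
    return index


def stage2_parse(data: bytes, index: list[int]):
    """Stage 2: build Python objects by walking the structural index."""

    def parse_value(start, i):
        # Returns (value, i) where index[i] is the structural character
        # that follows the value (or i == len(index) at end of input).
        if (i < len(index) and data[index[i]] in b"{["
                and not data[start:index[i]].strip()):
            if data[index[i]] == ord("{"):
                return parse_object(i)
            return parse_array(i)
        end = index[i] if i < len(index) else len(data)
        return json.loads(data[start:end]), i   # scalar between two marks

    def parse_array(i):
        result = []
        start, i = index[i] + 1, i + 1
        if data[index[i]] == ord("]") and not data[start:index[i]].strip():
            return result, i + 1                # empty array
        while True:
            value, i = parse_value(start, i)
            result.append(value)
            closer = data[index[i]]
            start, i = index[i] + 1, i + 1
            if closer == ord("]"):
                return result, i
            if closer != ord(","):
                raise ValueError("expected ',' or ']'")

    def parse_object(i):
        result = {}
        start, i = index[i] + 1, i + 1
        if data[index[i]] == ord("}") and not data[start:index[i]].strip():
            return result, i + 1                # empty object
        while True:
            if data[index[i]] != ord(":"):
                raise ValueError("expected ':' after object key")
            key = json.loads(data[start:index[i]])
            value, i = parse_value(index[i] + 1, i + 1)
            result[key] = value
            closer = data[index[i]]
            start, i = index[i] + 1, i + 1
            if closer == ord("}"):
                return result, i
            if closer != ord(","):
                raise ValueError("expected ',' or '}'")

    value, _ = parse_value(0, 0)
    return value


doc = b'{"a": [1, 2, {"b": "x,y:{z}"}], "c": null, "d": []}'
parsed = stage2_parse(doc, stage1_structural_index(doc))
print(parsed == json.loads(doc))
```

The point of the split is that stage 2 never rescans bytes to find where a value ends: it only jumps between offsets recorded by stage 1, which is the part a SIMD backend would accelerate.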

Integration options

  • Optional backend selected at build time or runtime
  • Fallback to the existing implementation when SIMD is unavailable

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Metadata

Labels: stdlib (Standard Library Python modules in the Lib/ directory), type-feature (A feature request or enhancement)

Project status: Done