Skip to content

RFC: Automagical left recursion elimination #5

@bugaevc

Description

@bugaevc

Sometimes, it would be very convenient to implement a parser using left recursion, like in this example:

number = digit.at_least(1).map(''.join).map(int)

@generate
def expr():
    lhs = yield expr
    op = yield string('+') | string('-')
    rhs = yield number
     if op == '+':
        return lhs + rhs
    else:
        return lhs - rhs

expr |= number

Note that this cannot be easily rewritten to use number +/- expr, because we need to parse 1 - 2 - 3 as (1 - 2) - 3 and not 1 - (2 - 3).

This currently doesn't work because calling expr inside of expr causes infinite recursion. The proper way to implement this is to unwrap the recursion manually, i.e. to use an explicit loop that consumes and adds/subtracts a sequence of numbers. However, this is much less convenient than implementing 'the mental model' directly.

This RFC proposes for parsy to be able to do such a transformation automatically.

A proof-of-concept (barely readable) implementation is here, and here's an explanation in pseudocode (that skips over many small details):

recursion_detected = False
parser_is_running = False
saved_result = None

def invoke_parser():
    global parser_is_running, recursion_detected, saved_result

    if not parser_is_running:
        # we are the first invocation of this parser at this index
        parser_is_running = True
        res = normal_logic()
    else:
        # we are the second left-recursive invocation
        recursion_detected = True
        return saved_result or fail('recursion detected')

    # we are the first invocation again
    if not recursion_detected or res.is_fail():
        parser_is_running = False  # and other cleanup
        return res

    while True:
        saved_result = res
        res = normal_logic()
        if res.is_fail():
            parser_is_running = False  # and other cleanup
            return saved_result

This works as if the recursion magically failed at just the right depth, giving the alternative sub-parsers (number in the example above) a chance to try their logic.

Unresolved questions:

  • Should this be implemented for every parser, or should this be an explicit wrapper? If latter, do we call it parsy.recursion_catcher(), Parser.left_recursive() or what?
  • Does this work for cases that are more complex than a simple left recursion in one of the parsers?
  • How do we make it thread- (and multiple independent invocation-) safe?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions