
Speed up long chat buffer rendering #222

Open
leo-ar wants to merge 2 commits into dnouri:master from leo-ar:perf/batch-history-render

leo-ar commented Jun 7, 2026

First, thank you for building and maintaining the Emacs frontend; it has become
a very useful part of my workflow. This PR contributes back a performance
improvement for long-running chat buffers. Locally gathered profiling and
benchmark results are included below to make the change easier to evaluate.

I am also testing this branch in everyday Emacs/Pi usage. Thorough interactive
testing will take longer, but the early results look good enough that I wanted
to open the PR now so you can review the approach, tradeoffs, and implementation
while that real-world validation continues.

Summary

Speed up the Emacs chat renderer in two places where display-layer work is
repeated as buffers grow:

  1. Batch history replay post-processing. During
    pi-coding-agent--display-session-history, defer per-message
    fontification/table decoration and run one consolidated pass after all
    history has been inserted.

  2. Avoid streaming table scans when no markdown pipe table is possible. During
    assistant text streaming, only call
    pi-coding-agent--maybe-decorate-streaming-table after a newline if recently
    streamed text contained a | character.

This preserves live streaming table rendering for actual markdown tables while
avoiding tree-sitter table queries for ordinary prose/code deltas.
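A minimal Emacs Lisp sketch of the guard in item 2 follows. All names (my-stream-handle-delta, my-stream-handle-text-end, my-stream--table-candidate, my-stream--decorate-table) are hypothetical stand-ins, not the PR's actual code, and the "scan" is a call counter in place of a tree-sitter query. The sketch assumes the candidate flag stays set across deltas until the text_end backstop clears it, and it uses string-search (Emacs 28+).

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; HYPOTHETICAL names: an illustration of the guard, not the PR's code.
(defvar-local my-stream--table-candidate nil
  "Non-nil once streamed text has contained a pipe character.")

(defvar my-stream--scan-count 0
  "How many times the (stand-in) table scan has run.")

(defun my-stream--decorate-table ()
  "Stand-in for the expensive tree-sitter table scan; only counts calls."
  (cl-incf my-stream--scan-count))

(defun my-stream-handle-delta (delta)
  "Append DELTA, scanning for tables only when a pipe table is possible."
  (goto-char (point-max))
  (insert delta)
  ;; Remember that a pipe was seen; it may be completed by a later delta.
  (when (string-search "|" delta)
    (setq my-stream--table-candidate t))
  ;; Scan only at line boundaries, and only if a pipe has been seen.
  (when (and my-stream--table-candidate
             (string-search "\n" delta))
    (my-stream--decorate-table)))

(defun my-stream-handle-text-end ()
  "Run a final backstop scan if needed, then clear the candidate flag."
  (when my-stream--table-candidate
    (my-stream--decorate-table))
  (setq my-stream--table-candidate nil))
```

With this sketch, deltas such as "plain prose\n" never trigger a scan, while "| a | b |" followed by "\n" triggers one, and the text_end call runs one more backstop scan before resetting the flag. Because the check is a substring search on the delta rather than a buffer query, its cost does not grow with the chat buffer.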

Motivation / evidence

Pi standalone does not show this slowdown; the bottleneck is in the Emacs
display/render layer.

History replay benchmark

To measure history replay, I used a locally saved long-running real session and
fed its messages through the Emacs history-rendering path. The benchmarked
session had 924 display messages, rendered to 322,947 chat-buffer characters,
and included 406 tool-call renderings. The benchmark compared the pre-change
behavior, which ran fontification/table decoration during each replayed message,
with the new batched behavior, which runs one consolidated post-processing pass
after replay.

Results (about 11x faster with batching):

legacy-all:  924 messages, 322947 chars, 48.293723s
batched-all: 924 messages, 322947 chars,  4.354221s

An ELP profile of the legacy replay showed that the expensive work was display
post-processing (font-lock-ensure and table decoration), not insertion or tool
rendering:

pi-coding-agent--display-session-history       48.457s
pi-coding-agent--display-history-messages      48.301s
pi-coding-agent--render-history-text           40.877s
font-lock-ensure                               40.806s
pi-coding-agent--decorate-tables-in-region      7.519s
pi-coding-agent--display-user-message           6.248s
pi-coding-agent--render-history-tool            0.113s
pi-coding-agent--append-to-chat                 0.030s
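Since the profile is dominated by per-message post-processing, the batching idea in item 1 can be sketched as below. The names (my-history-replay, my-history--render-message, my-history--post-process, my-history--defer-post-processing) are hypothetical, and the post-processing stub only counts calls; a real version would presumably fontify and decorate the region, for example with font-lock-ensure over BEG and END.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; HYPOTHETICAL names: a sketch of deferred post-processing, not the PR's code.
(defvar my-history--defer-post-processing nil
  "When non-nil, per-message post-processing is skipped.")

(defvar my-history--post-process-calls 0
  "How many times the (stand-in) post-processing pass has run.")

(defun my-history--post-process (_beg _end)
  "Stand-in for fontification/table decoration of a region; counts calls.
A real pass might call (font-lock-ensure beg end) here."
  (cl-incf my-history--post-process-calls))

(defun my-history--render-message (text)
  "Insert TEXT, post-processing it unless post-processing is deferred."
  (let ((beg (point-max)))
    (goto-char beg)
    (insert text "\n")
    (unless my-history--defer-post-processing
      (my-history--post-process beg (point-max)))))

(defun my-history-replay (messages)
  "Insert all MESSAGES, then post-process the replayed region once."
  (let ((beg (point-max)))
    ;; Dynamic binding: the flag is restored even if replay signals an error.
    (let ((my-history--defer-post-processing t))
      (mapc #'my-history--render-message messages))
    (my-history--post-process beg (point-max))))
```

Binding the flag with let rather than setting it keeps the deferral scoped to replay, so messages rendered outside replay (for example during live streaming) still get per-message post-processing.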

Live streaming benchmark

To measure live growth, I preloaded the same real-session history into the Emacs
chat buffer, then simulated continuing the conversation with 200 ordinary
assistant text deltas containing prose/code-style markdown but no pipe tables.
This isolates the cost of the streaming table-detection path as the buffer grows
without depending on backend latency or model behavior.

Results (about 20x faster with the guard):

legacy-all:  924 history messages + 200 deltas, 2.072428s
guarded-all: 924 history messages + 200 deltas, 0.102220s

Related upstream issues/PRs

Tests

Added render unit coverage for:

  • history replay running the batched post-processing pass exactly once,
    including visible custom messages with table-like content;
  • skipping streaming table scans for newline-only non-pipe text;
  • preserving streaming table scan behavior once a pipe and newline arrive;
  • clearing the streaming table candidate after the text_end backstop scan.

Validation run locally:

make check
# OK; Ran 1004 tests, 1004 results as expected, 0 unexpected

make test-integration-fake
# OK; 15 fake tests passed, 15 real-lane variants skipped

I also ran:

make test-gui

It failed on pi-coding-agent-gui-test-table-resize-refreshes-hot-tail-only with:

(should (= line-before (pi-coding-agent-gui-test-top-line-number)))
:form (= 75 77)

I ran the same GUI test on a clean upstream/master worktree and it failed
identically in this environment, so I do not believe this branch introduced the
failure.
