Speed up long chat buffer rendering by leo-ar · Pull Request #222 · dnouri/pi-coding-agent

leo-ar · 2026-06-07T05:03:33Z

First, thank you for building and maintaining the Emacs frontend. It has become
a very useful part of my workflow. This PR is an attempt to contribute back a
performance improvement for long-running chat buffers, with locally gathered
profiling/benchmark evidence included below to make the behavior easier to
evaluate.

I am also testing this branch in everyday Emacs/Pi usage. Thorough interactive
testing will take longer, but the early results look good enough that I wanted
to open the PR now so you can review the approach, tradeoffs, and implementation
while that real-world validation continues.

Summary

Speed up the Emacs chat renderer in two places where display-layer work is
repeated as buffers grow:

1. Batch history replay post-processing. During
   pi-coding-agent--display-session-history, defer per-message
   fontification/table decoration and run one consolidated pass after all
   history has been inserted.
2. Avoid streaming table scans when no markdown pipe table is possible. During
   assistant text streaming, only call
   pi-coding-agent--maybe-decorate-streaming-table after a newline if recent
   streamed text contained |.

This keeps live streaming/rendering behavior for actual markdown tables, while
avoiding tree-sitter table queries for ordinary prose/code deltas.
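
For reviewers who want the idea of the streaming guard in isolation, here is a
minimal sketch. It is not the code in this branch: every my/ name is
hypothetical, the state is a global variable rather than per-buffer state, the
expensive tree-sitter query is replaced by a counter, and the text_end backstop
is left out.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical sketch; none of the my/ names exist in pi-coding-agent.

(defvar my/table-candidate nil
  "Non-nil when recently streamed text contained a pipe character.")

(defvar my/table-scan-count 0
  "Number of simulated table scans, kept only for illustration.")

(defun my/scan-for-tables ()
  "Stand-in for the expensive tree-sitter table query."
  (setq my/table-scan-count (1+ my/table-scan-count)))

(defun my/handle-text-delta (delta)
  "Process streamed DELTA, scanning for tables only when one is possible."
  ;; In Emacs regexps a bare | is a literal pipe (alternation is \\|).
  (when (string-match-p "|" delta)
    (setq my/table-candidate t))
  ;; A newline completes a line; scan only if a pipe was seen recently.
  (when (and my/table-candidate (string-match-p "\n" delta))
    (my/scan-for-tables)
    ;; Keep the flag only if a pipe follows the last newline, meaning it
    ;; belongs to a line that is still being streamed.
    (setq my/table-candidate
          (and (string-match-p "|[^\n]*\\'" delta) t))))
```

Feeding the deltas "Hello ", "world\n", "| a ", "| b |\n", "plain\n" through
my/handle-text-delta triggers exactly one scan, on the newline that completes
the pipe row; the prose lines before and after it never reach the scanner.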

Motivation / evidence

Pi standalone does not show this slowdown; the bottleneck is in the Emacs
display/render layer.

History replay benchmark

To measure history replay, I used a locally saved long-running real session and
fed its messages through the Emacs history-rendering path. The benchmarked
session had 924 display messages, rendered to 322,947 chat-buffer characters,
and included 406 tool-call renderings. The benchmark compared the pre-change
behavior, which ran fontification/table decoration during each replayed message,
with the new batched behavior, which runs one consolidated post-processing pass
after replay.
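
The difference between the two benchmarked modes can be illustrated with a
small sketch. It is a simplified stand-in, not the implementation in this
branch: the my/ names are hypothetical, and post-processing is reduced to a
counter plus a font-lock-ensure call over the given region.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical sketch; none of the my/ names exist in pi-coding-agent.

(defvar my/defer-post-processing nil
  "When non-nil, `my/insert-message' skips per-message post-processing.")

(defvar my/post-process-calls 0
  "Number of post-processing passes, kept only for illustration.")

(defun my/post-process-region (start end)
  "Stand-in for fontification and table decoration from START to END."
  (setq my/post-process-calls (1+ my/post-process-calls))
  (font-lock-ensure start end))

(defun my/insert-message (text)
  "Append TEXT to the current buffer, post-processing it unless deferred."
  (let ((start (point-max)))
    (goto-char start)
    (insert text "\n")
    (unless my/defer-post-processing
      (my/post-process-region start (point-max)))))

(defun my/replay-history (messages)
  "Insert all MESSAGES, then post-process the inserted region once."
  (let ((start (point-max))
        ;; Dynamic binding: the deferral applies only during this replay.
        (my/defer-post-processing t))
    (dolist (msg messages)
      (my/insert-message msg))
    (my/post-process-region start (point-max))))
```

Calling (my/replay-history '("first" "second" "third")) inside
with-temp-buffer runs one post-processing pass, whereas calling
my/insert-message three times directly runs three.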

Results:

legacy-all:  924 messages, 322947 chars, 48.293723s
batched-all: 924 messages, 322947 chars,  4.354221s

An ELP profile of the legacy replay showed the expensive work was display
post-processing, not insertion/tool rendering:

pi-coding-agent--display-session-history       48.457s
pi-coding-agent--display-history-messages      48.301s
pi-coding-agent--render-history-text           40.877s
font-lock-ensure                               40.806s
pi-coding-agent--decorate-tables-in-region      7.519s
pi-coding-agent--display-user-message           6.248s
pi-coding-agent--render-history-tool            0.113s
pi-coding-agent--append-to-chat                 0.030s

Live streaming benchmark

To measure live growth, I preloaded the same real-session history into the Emacs
chat buffer, then simulated continuing the conversation with 200 ordinary
assistant text deltas containing prose/code-style markdown but no pipe tables.
This isolates the cost of the streaming table-detection path as the buffer grows
without depending on backend latency or model behavior.

Results:

legacy-all:  924 history messages + 200 deltas, 2.072428s
guarded-all: 924 history messages + 200 deltas, 0.102220s
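
For a quick sense of scale, the timings reported in the two benchmarks above
can be turned into speedup factors. The helper name my/speedup is hypothetical
and exists only for this calculation.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my/speedup (before after)
  "Return how many times faster AFTER seconds is than BEFORE seconds."
  (/ (float before) after))

(format "%.1fx" (my/speedup 48.293723 4.354221)) ; history replay => "11.1x"
(format "%.1fx" (my/speedup 2.072428 0.102220))  ; live streaming => "20.3x"
```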

Related upstream issues/PRs

- Related to but distinct from #201 (Keep following chat panes filled while
  output streams). That issue concerns tail-following windows becoming visually
  underfilled while streaming; this PR targets repeated display
  post-processing/tree-sitter table scans.
- Adjacent to #216 (pi-coding-agent--treesit-table-regions: Query pattern is
  malformed) and #218 (Warn about incompatible Markdown grammars), because both
  involve markdown tree-sitter table queries. This PR does not replace grammar
  compatibility checks; it reduces unnecessary calls into the table-query path
  when no table is possible.
- Similar motivation to #199 (Keep generic tool previews responsive while they
  stream), but a different path. #199 handled generic tool preview streaming;
  this PR handles history replay and ordinary assistant text deltas.

Tests

Added render unit coverage for:

- history replay running batched post-processing exactly once, including for
  visible custom messages with table-like content;
- skipping streaming table scans for text with newlines but no pipe;
- preserving streaming table scan behavior once a pipe and newline arrive;
- clearing the streaming table candidate after the text_end backstop scan.

Validation run locally:

make check
# OK; Ran 1004 tests, 1004 results as expected, 0 unexpected

make test-integration-fake
# OK; 15 fake tests passed, 15 real-lane variants skipped

I also ran:

make test-gui

It failed on pi-coding-agent-gui-test-table-resize-refreshes-hot-tail-only with:

(should (= line-before (pi-coding-agent-gui-test-top-line-number)))
:form (= 75 77)

I checked the same single GUI test on a clean upstream/master worktree and it
failed identically in this environment, so I do not think this branch introduced
that failure.

leo-ar added 2 commits June 6, 2026 22:02

6141936  Batch history replay post-processing
0f6dff9  Avoid unnecessary streaming table scans

leo-ar wants to merge 2 commits into dnouri:master from
leo-ar:perf/batch-history-render
