Conversation

ServeurpersoCom (Collaborator) commented Dec 22, 2025

Closes #17079

Integrates the existing backend `return_progress` feature into the WebUI to show real-time token-processing statistics during both the prompt-preprocessing and generation phases.

Key Features

  • Unified statistics UI: Same Reading/Generation tab switcher used during streaming and after completion
  • Live ETA countdown: Real-time countdown updates every second during prompt processing
  • Auto tab switching: Automatically switches from Reading to Generation tab when prompt processing completes
  • Manual tab navigation: Users can switch between tabs at any time during generation
  • Preserved stats: Reading stats are preserved and viewable even after generation starts

Implementation

  • useProcessingState hook: Extended with getLiveProcessingStats() and getLiveGenerationStats() methods, plus the ETA countdown logic (see the sketch after this list)
  • ChatMessageStatistics component: Enhanced with isLive and isProcessingPrompt props for streaming mode
  • ChatMessageAssistant component: Uses the unified statistics component during the loading phase
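
For illustration, here is a minimal sketch of how such a hook can hold both phases and drive the tab switcher; every name and type below is an assumption for illustration, not the PR's actual code:

```ts
// Illustrative sketch of the processing-state shape (names are assumptions).
type StatsTab = 'reading' | 'generation';

interface ReadingStats { processed: number; total: number; tokensPerSecond: number }
interface GenerationStats { generated: number; tokensPerSecond: number }

interface ProcessingState {
  activeTab: StatsTab;
  userPinnedTab: boolean;       // set once the user switches tabs manually
  readingStats?: ReadingStats;  // preserved after prompt processing completes
  generationStats?: GenerationStats;
}

// Auto-switch from Reading to Generation when prompt processing completes,
// unless the user has already navigated manually.
function onPromptProcessingDone(state: ProcessingState): ProcessingState {
  return state.userPinnedTab ? state : { ...state, activeTab: 'generation' };
}
```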

Demo

demo.mp4

ngxson (Collaborator) commented Dec 22, 2025

Just a nit: I think showing percentage + ETA instead of elapsed time would be more useful:

Processing (123 / 456 tokens - 27% - ETA: 50s)
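
As a sketch, rendering that label could look like this, assuming the processed/total counts and an ETA value are already available (the names are hypothetical):

```ts
// Render the suggested label from a progress update's counts.
// `processed`, `total`, and `etaSeconds` are assumed inputs, not actual field names.
function processingLabel(processed: number, total: number, etaSeconds: number): string {
  const percent = total > 0 ? Math.round((processed / total) * 100) : 0;
  return `Processing (${processed} / ${total} tokens - ${percent}% - ETA: ${Math.round(etaSeconds)}s)`;
}

processingLabel(123, 456, 50); // "Processing (123 / 456 tokens - 27% - ETA: 50s)"
```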

ServeurpersoCom (Collaborator, Author) commented Dec 23, 2025

It can still be improved; I don't know whether people have prompts that take several minutes, but adding minutes might be a good idea! (We also calculate the tokens/s and could display it, but that would bloat the display, and we already have the final value.)
I also need to double-check on CPU to exercise the display and make sure I can't get NaN or similar, even with the first chunk. #18305
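
A minutes-aware ETA formatter along those lines could be as small as this sketch (a hypothetical helper, not part of the PR):

```ts
// Format an ETA as "50s" below one minute, and "2m 5s" above it.
function formatEta(totalSeconds: number): string {
  const s = Math.max(0, Math.round(totalSeconds));
  return s < 60 ? `${s}s` : `${Math.floor(s / 60)}m ${s % 60}s`;
}

formatEta(50);  // "50s"
formatEta(125); // "2m 5s"
```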

ServeurpersoCom (Collaborator, Author) commented Dec 23, 2025

I think we're good. The client-side "Processing..." message is no longer visible.

During the first batch:

(screenshot)

Next one:

(screenshot)

ngxson (Collaborator) left a review comment

Very nice feature!

(May need approval from @allozaur too)

ServeurpersoCom (Collaborator, Author) commented

I made a small observation: the progress chunk format is very close to the one expected during normal inference, which spoofs the stat bubbles displayed during inference (those shown with the Settings > "Keep stats visible after generation" option). It might be wise to filter at this stage when `delta.content` is null.
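
A minimal sketch of that filter, assuming OpenAI-style streamed chunks (the WebUI's actual chunk types may differ):

```ts
// Treat a streamed chunk as progress-only when its delta carries no content,
// so it never feeds the inference stat bubbles.
interface StreamedChunk {
  choices?: { delta?: { content?: string | null } }[];
}

function isProgressOnlyChunk(chunk: StreamedChunk): boolean {
  // `== null` matches both null and undefined.
  return chunk.choices?.[0]?.delta?.content == null;
}
```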

allozaur (Collaborator) left a review comment

Overall good stuff, but some changes are required to make it ready for merging. I will handle them on my end.

ServeurpersoCom force-pushed the webui/prompt-processing-progress branch from 2eeb45f to c56418e on December 29, 2025 at 11:21
ServeurpersoCom (Collaborator, Author) commented

I'm doing a quick re-test, and then we can merge it.

allozaur (Collaborator) commented

> I'm doing a quick re-test, and then we can merge it.

Wait! I haven't finished yet, I'm polishing it up.

ServeurpersoCom (Collaborator, Author) commented Dec 29, 2025

> > I'm doing a quick re-test, and then we can merge it.
>
> Wait! I haven't finished yet, I'm polishing it up.

No worries, I'll test it when you're finished, and I never merge myself :)

allozaur (Collaborator) commented

Alright, last changes pushed; I've also updated the PR description with a new demo video.

allozaur (Collaborator) left a review comment

Alright, now I think this is ready for merging.

allozaur requested a review from ggerganov on December 29, 2025 at 14:08
allozaur (Collaborator) commented

@ggerganov please take a look at this and let me know if you also think that it's production ready :)

ServeurpersoCom (Collaborator, Author) commented

https://github.com/user-attachments/assets/6bec9d79-5b63-4d37-9a88-11be4aa0deae
On my side, there is a weird double update like this:
50s 51s 50s 49s 50s 49s...

allozaur (Collaborator) commented Dec 29, 2025

> https://github.com/user-attachments/assets/6bec9d79-5b63-4d37-9a88-11be4aa0deae On my side, there is a weird double update like this: 50s 51s 50s 49s 50s 49s...

I see... maybe in this case I will remove the client-side countdown and leave just the default ETA value.

ngxson (Collaborator) commented Dec 29, 2025

Mathematically speaking, I designed the progress object so that the ETA can be inferred without keeping track of a timer on the client side.

The progress can be non-linear. For example, if you're doing something else (watching a video, opening a new tab, etc.) while the prompt is processing, it can slow the progress down enough to be noticeable.
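
In other words, each chunk carries enough information to recompute the ETA from scratch, so no client timer can drift out of sync with the server. A minimal sketch, assuming the progress object exposes cumulative processed/total counts and elapsed time (field names are assumptions, not the actual schema):

```ts
// Recompute the ETA purely from the latest progress chunk: the cumulative
// average rate (processed / elapsed) absorbs non-linear slowdowns, and no
// client-side timer state is needed.
function etaSeconds(processed: number, total: number, elapsedMs: number): number | undefined {
  if (processed <= 0 || elapsedMs <= 0) return undefined; // first chunk: no rate yet
  const tokensPerSecond = processed / (elapsedMs / 1000);
  return (total - processed) / tokensPerSecond;
}
```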

ServeurpersoCom (Collaborator, Author) commented Dec 29, 2025

Yes, the timer creates a race with server updates, causing the jumps. My original approach skipped the first chunk and recalculated from total elapsed time, avoiding both the initial error and the need for a client-side countdown. I'll let allozaur finish the refactoring; the update of the existing stats fields is superb! Now we also have the preprocessing tokens/s.

allozaur (Collaborator) commented

Well, I've applied 4807b0f to keep the client-side counter, as one more attempt at a smoother waiting experience. @ngxson @ServeurpersoCom, please check it and test on your ends.

If it ends up working well for you, let's keep it; but if you think it's better to just use the chunk-based calculations, then I will remove the browser time-counter logic.

ServeurpersoCom (Collaborator, Author) commented Dec 29, 2025

No more "ETA data race" on my side, the linearity adjusts from the beginning, and the last second is spot on -> we can merge

If we want to be meticulous, it's a good idea to retest the very first batch slowly on the processor (testing in progress)
I check if "percent = Math.round(0 / 0) = NaN" exist
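
For reference, `Math.round(0 / 0)` is indeed NaN; a guard in the spirit of that check might look like this sketch (a hypothetical helper):

```ts
// Guard the zero-total first-batch case, where 0 / 0 would yield NaN.
function safePercent(processed: number, total: number): number {
  return total > 0 ? Math.round((processed / total) * 100) : 0;
}

safePercent(0, 0); // 0, instead of NaN
```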

Still not perfect:
https://github.com/user-attachments/assets/15a1d51e-83c9-498f-ab36-12bce67e6da8

ServeurpersoCom (Collaborator, Author) commented

> chunk-based calculations

Yes, we need this!

ggerganov (Member) commented

The ETA is quite inaccurate:

Screen.Recording.2025-12-29.at.18.18.35.mov

I would suggest removing the ETA, since it is not very useful.

allozaur (Collaborator) commented

> The ETA is quite inaccurate:
>
> Screen.Recording.2025-12-29.at.18.18.35.mov
>
> I would suggest removing the ETA, since it is not very useful.

Can do! :D So we're simplifying this after all, and that's probably the best outcome.

ServeurpersoCom (Collaborator, Author) commented Dec 29, 2025

Now it's perfect: no more strange jitter, and one refresh per batch.

ServeurpersoCom merged commit c9a3b40 into ggml-org:master on Dec 29, 2025. 10 checks passed.
ServeurpersoCom (Collaborator, Author) commented

The merge was needed to continue on #18226, which affects many files.

ngxson (Collaborator) commented Dec 29, 2025

The formula seems to be off (that's why the ETA was incorrect). I'll push a fix.

ServeurpersoCom (Collaborator, Author) commented Dec 29, 2025

I could do another small frontend PR to test it and put it back. The percentage and tokens/s are relatively stable during preprocessing, so the ETA should converge to accurate values after a few chunks, once the average stabilizes.

Done with ngxson's frontend-only commit: batch refresh rate + linear ETA + better UI from allozaur.

ServeurpersoCom (Collaborator, Author) commented

530831841-7c575e2f-fd79-4d3b-b0da-092ac798d769.mp4

Now it works with what everyone contributed. Perfect!

thad0ctor pushed a commit to thad0ctor/llama.cpp that referenced this pull request Dec 30, 2025
* webui: display prompt preprocessing progress

* webui: add percentage/ETA and exclude cached tokens from progress

Address review feedback from ngxson

* webui: add minutes and first chunk (0%) case

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: address review feedback from allozaur

* chore: update webui build output

* webui: address review feedback from allozaur

* nit

* chore: update webui build output

* feat: Enhance chat processing state

* feat: Improve chat processing statistics UI

* chore: update webui build output

* feat: Add live generation statistics to processing state hook

* feat: Persist prompt processing stats in hook for better UX

* refactor: Enhance ChatMessageStatistics for live stream display

* feat: Implement enhanced live chat statistics into assistant message

* chore: update webui build output

* fix: Proper tab for each stage of prompt processing/generation

* chore: update webui build output

* fix: Improved ETA calculation & display logic

* chore: update webui build output

* feat: Simplify logic & remove ETA from prompt progress

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>