Conversation

ServeurpersoCom (Collaborator) commented Dec 22, 2025

Closes #17079

Integrates the existing backend `return_progress` feature into the WebUI to show real-time token-processing statistics during both the prompt-preprocessing and generation phases.

Key Features

  • Unified statistics UI: Same Reading/Generation tab switcher used during streaming and after completion
  • Live ETA countdown: Real-time countdown updates every second during prompt processing
  • Auto tab switching: Automatically switches from Reading to Generation tab when prompt processing completes
  • Manual tab navigation: Users can switch between tabs at any time during generation
  • Preserved stats: Reading stats are preserved and viewable even after generation starts

Implementation

  • useProcessingState hook: Extended with getLiveProcessingStats() and getLiveGenerationStats() methods, plus the ETA countdown logic (see the sketch after this list)
  • ChatMessageStatistics component: Enhanced with isLive and isProcessingPrompt props for streaming mode
  • ChatMessageAssistant component: Uses the unified statistics component during the loading phase
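
For illustration, here is a minimal sketch of how such a hook can hold both phases and drive the tab switcher; every name and type below is an assumption for illustration, not the PR's actual code:

```ts
// Illustrative sketch of the processing-state shape (names are assumptions).
type StatsTab = 'reading' | 'generation';

interface ReadingStats { processed: number; total: number; tokensPerSecond: number }
interface GenerationStats { generated: number; tokensPerSecond: number }

interface ProcessingState {
  activeTab: StatsTab;
  userPinnedTab: boolean;       // set once the user switches tabs manually
  readingStats?: ReadingStats;  // preserved after prompt processing completes
  generationStats?: GenerationStats;
}

// Auto-switch from Reading to Generation when prompt processing completes,
// unless the user has already navigated manually.
function onPromptProcessingDone(state: ProcessingState): ProcessingState {
  return state.userPinnedTab ? state : { ...state, activeTab: 'generation' };
}
```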

Demo

demo.mp4

ngxson (Collaborator) commented Dec 22, 2025

Just a nit: I think showing percentage + ETA instead of elapsed time would be more useful:

Processing (123 / 456 tokens - 27% - ETA: 50s)
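
As a sketch, rendering that label could look like this, assuming the processed/total counts and an ETA value are already available (the names are hypothetical):

```ts
// Render the suggested label from a progress update's counts.
// `processed`, `total`, and `etaSeconds` are assumed inputs, not actual field names.
function processingLabel(processed: number, total: number, etaSeconds: number): string {
  const percent = total > 0 ? Math.round((processed / total) * 100) : 0;
  return `Processing (${processed} / ${total} tokens - ${percent}% - ETA: ${Math.round(etaSeconds)}s)`;
}

processingLabel(123, 456, 50); // "Processing (123 / 456 tokens - 27% - ETA: 50s)"
```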

ServeurpersoCom (Collaborator, Author) commented Dec 23, 2025

It can still be improved; I don't know whether people have prompts that take several minutes, but adding minutes might be a good idea! (We also calculate the tokens/s and could display it, but that would bloat the display, and we already have the final value.)
I also need to double-check on CPU to exercise the display and make sure I can't get NaN or similar, even with the first chunk. #18305
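
A minutes-aware ETA formatter along those lines could be as small as this sketch (a hypothetical helper, not part of the PR):

```ts
// Format an ETA as "50s" below one minute, and "2m 5s" above it.
function formatEta(totalSeconds: number): string {
  const s = Math.max(0, Math.round(totalSeconds));
  return s < 60 ? `${s}s` : `${Math.floor(s / 60)}m ${s % 60}s`;
}

formatEta(50);  // "50s"
formatEta(125); // "2m 5s"
```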

ServeurpersoCom (Collaborator, Author) commented Dec 23, 2025

I think we're good. The client-side "Processing..." message is no longer visible.

During the first batch:

(screenshot)

Next one:

(screenshot)

ngxson (Collaborator) left a review comment

Very nice feature!

(May need approval from @allozaur too)

ServeurpersoCom (Collaborator, Author) commented

I made a small observation: the progress chunk format is very close to the one expected during normal inference, which spoofs the stat bubbles displayed during inference (those shown with the Settings > "Keep stats visible after generation" option). It might be wise to filter at this stage when `delta.content` is null.
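
A minimal sketch of that filter, assuming OpenAI-style streamed chunks (the WebUI's actual chunk types may differ):

```ts
// Treat a streamed chunk as progress-only when its delta carries no content,
// so it never feeds the inference stat bubbles.
interface StreamedChunk {
  choices?: { delta?: { content?: string | null } }[];
}

function isProgressOnlyChunk(chunk: StreamedChunk): boolean {
  // `== null` matches both null and undefined.
  return chunk.choices?.[0]?.delta?.content == null;
}
```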

allozaur (Collaborator) left a review comment

Overall good stuff, but some changes are required to make it ready for merging. I will handle them on my end.

ServeurpersoCom force-pushed the webui/prompt-processing-progress branch from 2eeb45f to c56418e on December 29, 2025 at 11:21
ServeurpersoCom (Collaborator, Author) commented

I'm doing a quick re-test, and then we can merge it.

allozaur (Collaborator) commented

> I'm doing a quick re-test, and then we can merge it.

Wait! I haven't finished yet, I'm polishing it up.

ServeurpersoCom (Collaborator, Author) commented Dec 29, 2025

> > I'm doing a quick re-test, and then we can merge it.
>
> Wait! I haven't finished yet, I'm polishing it up.

No worries, I'll test it when you're finished, and I never merge myself :)

allozaur (Collaborator) commented

Alright, last changes pushed; I've also updated the PR description with a new demo video.

allozaur (Collaborator) left a review comment

Alright, now I think this is ready for merging.

allozaur requested a review from ggerganov on December 29, 2025 at 14:08
allozaur (Collaborator) commented

@ggerganov please take a look at this and let me know if you also think that it's production ready :)

ServeurpersoCom (Collaborator, Author) commented

https://github.com/user-attachments/assets/6bec9d79-5b63-4d37-9a88-11be4aa0deae
On my side, there is a weird double update like this:
50s 51s 50s 49s 50s 49s...

allozaur (Collaborator) commented Dec 29, 2025

> https://github.com/user-attachments/assets/6bec9d79-5b63-4d37-9a88-11be4aa0deae On my side, there is a weird double update like this: 50s 51s 50s 49s 50s 49s...

I see... maybe in this case I will remove the client-side countdown and leave just the default ETA value.

ngxson (Collaborator) commented Dec 29, 2025

Mathematically speaking, I designed the progress object so that the ETA can be inferred without keeping track of a timer on the client side.

The progress can be non-linear. For example, if you're doing something else (watching a video, opening a new tab, etc.) while the prompt is processing, it can slow the progress down enough to be noticeable.
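
In other words, each chunk carries enough information to recompute the ETA from scratch, so no client timer can drift out of sync with the server. A minimal sketch, assuming the progress object exposes cumulative processed/total counts and elapsed time (field names are assumptions, not the actual schema):

```ts
// Recompute the ETA purely from the latest progress chunk: the cumulative
// average rate (processed / elapsed) absorbs non-linear slowdowns, and no
// client-side timer state is needed.
function etaSeconds(processed: number, total: number, elapsedMs: number): number | undefined {
  if (processed <= 0 || elapsedMs <= 0) return undefined; // first chunk: no rate yet
  const tokensPerSecond = processed / (elapsedMs / 1000);
  return (total - processed) / tokensPerSecond;
}
```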

ServeurpersoCom (Collaborator, Author) commented Dec 29, 2025

Yes, the timer creates a race with server updates, causing the jumps. My original approach skipped the first chunk and recalculated from total elapsed time, avoiding both the initial error and the need for a client-side countdown. I'll let allozaur finish the refactoring; the update of the existing stats fields is superb! Now we also have the preprocessing tokens/s.

allozaur (Collaborator) commented

Well, I've applied 4807b0f to keep the client-side counter, as one more attempt at a smoother waiting experience. @ngxson @ServeurpersoCom, please check it and test on your ends.

If it ends up working well for you, let's keep it; but if you think it's better to just use the chunk-based calculations, then I will remove the browser time-counter logic.

ServeurpersoCom (Collaborator, Author) commented Dec 29, 2025

No more "ETA data race" on my side, the linearity adjusts from the beginning, and the last second is spot on -> we can merge

If we want to be meticulous, it's a good idea to retest the very first batch slowly on the processor (testing in progress)
I check if "percent = Math.round(0 / 0) = NaN" exist
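
For reference, `Math.round(0 / 0)` is indeed NaN; a guard in the spirit of that check might look like this sketch (a hypothetical helper):

```ts
// Guard the zero-total first-batch case, where 0 / 0 would yield NaN.
function safePercent(processed: number, total: number): number {
  return total > 0 ? Math.round((processed / total) * 100) : 0;
}

safePercent(0, 0); // 0, instead of NaN
```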

Still not perfect:
https://github.com/user-attachments/assets/15a1d51e-83c9-498f-ab36-12bce67e6da8

ServeurpersoCom (Collaborator, Author) commented

> chunk-based calculations

Yes, we need this!

ggerganov (Member) commented

The ETA is quite inaccurate:

Screen.Recording.2025-12-29.at.18.18.35.mov

I would suggest removing the ETA, since it is not very useful.

allozaur (Collaborator) commented

> The ETA is quite inaccurate:
>
> Screen.Recording.2025-12-29.at.18.18.35.mov
>
> I would suggest removing the ETA, since it is not very useful.

Can do! :D So we're simplifying this after all, and that's probably the best outcome.

ServeurpersoCom (Collaborator, Author) commented Dec 29, 2025

Now it's perfect: no more strange jitter, and one refresh per batch.

ServeurpersoCom merged commit c9a3b40 into ggml-org:master on Dec 29, 2025. 10 checks passed.
ServeurpersoCom (Collaborator, Author) commented

The merge was needed to continue on #18226, which affects many files.

ngxson (Collaborator) commented Dec 29, 2025

The formula seems to be off (that's why the ETA was incorrect). I'll push a fix.

ServeurpersoCom (Collaborator, Author) commented Dec 29, 2025

I could do another small frontend PR to test it and put it back. The percentage and tokens/s are relatively stable during preprocessing, so the ETA should converge to accurate values after a few chunks, once the average stabilizes.

Done with ngxson's frontend-only commit: batch refresh rate + linear ETA + better UI from allozaur.

ServeurpersoCom (Collaborator, Author) commented

530831841-7c575e2f-fd79-4d3b-b0da-092ac798d769.mp4

Now it works with what everyone contributed. Perfect!

thad0ctor pushed a commit to thad0ctor/llama.cpp that referenced this pull request Dec 30, 2025
* webui: display prompt preprocessing progress

* webui: add percentage/ETA and exclude cached tokens from progress

Address review feedback from ngxson

* webui: add minutes and first chunk (0%) case

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: address review feedback from allozaur

* chore: update webui build output

* webui: address review feedback from allozaur

* nit

* chore: update webui build output

* feat: Enhance chat processing state

* feat: Improve chat processing statistics UI

* chore: update webui build output

* feat: Add live generation statistics to processing state hook

* feat: Persist prompt processing stats in hook for better UX

* refactor: Enhance ChatMessageStatistics for live stream display

* feat: Implement enhanced live chat statistics into assistant message

* chore: update webui build output

* fix: Proper tab for each stage of prompt processing/generation

* chore: update webui build output

* fix: Improved ETA calculation & display logic

* chore: update webui build output

* feat: Simplify logic & remove ETA from prompt progress

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>