Okay, that's on me. I found where the discrepancy was coming from.

I had never bothered enabling smart cache / checkpoints on KoboldCpp.

On llama-server they're enabled by default, and worse, checkpoints and the RAM cache are broken with Gemma 4, leaking VRAM and RAM by the gallon, hence the memory-usage discrepancy.

Once checkpoints and the cache are disabled on llama.cpp, VRAM and RAM usage are basically the same. Yay, mystery solved!
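For anyone hitting the same thing, a minimal sketch of how the disabling might look on the llama-server command line. The flag names below (`--swa-checkpoints`, `--cache-ram`) are taken from recent llama.cpp builds and may differ on yours; the model path is a placeholder. Check `llama-server --help` on your build before copying this.

```shell
# Hypothetical example: start llama-server with both features off.
# --swa-checkpoints 0  -> keep no SWA checkpoints (default is > 0)
# --cache-ram 0        -> cap the host-RAM prompt cache at 0 MiB
llama-server -m ./your-model.gguf \
  --swa-checkpoints 0 \
  --cache-ram 0
```

With both disabled, memory usage should be directly comparable to a KoboldCpp instance that also has its smart cache / checkpoints turned off.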

Answer selected by SerialKicked