[Feature Request] Add the ability to manually control caching rather than relying upon the smart cache during generation #1956

@matthock

Describe the Issue
I would like more direct control over KV cache slots during a KoboldCPP API generate call. Specifically, I would like the generate request to accept a field listing one or more cache slot IDs. The back end would then either use that single slot or apply smart cache only among the listed slots, rather than applying smart cache across the entire set of cache slots.
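To make the request concrete, here is a minimal sketch of what such a request body might look like. The `cache_slots` field name is my own placeholder and is not part of the current KoboldCPP API; `prompt` and `max_length` mirror existing generate fields. The helper only builds the JSON body and does not contact a server.

```python
import json
from typing import Optional, Sequence


def build_generate_payload(
    prompt: str,
    max_length: int = 200,
    cache_slots: Optional[Sequence[int]] = None,
) -> dict:
    """Build a JSON body for a generate call.

    `cache_slots` is a HYPOTHETICAL field proposed in this request.
    Omitting it keeps today's behavior (smart cache over all slots).
    """
    payload = {"prompt": prompt, "max_length": max_length}
    if cache_slots is not None:
        if len(cache_slots) == 0:
            raise ValueError("cache_slots must list at least one slot ID")
        # De-duplicate and sort so the body is stable for a given slot set.
        payload["cache_slots"] = sorted(set(cache_slots))
    return payload


# Regular chat may use smart cache among slots 0-2; summaries are pinned to slot 3.
chat_payload = build_generate_payload("Hello there.", cache_slots=[0, 1, 2])
summary_payload = build_generate_payload("Summarize: ...", max_length=80, cache_slots=[3])
print(json.dumps(summary_payload))
```

A single-element list would mean "use exactly this slot", while a longer list would mean "smart cache, but only among these".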

Additional Information:
My immediate use case is SillyTavern with the MessageSummarize extension. Unlike the standard Summarize extension, which runs only at long intervals, MessageSummarize produces a per-message summary by calling the back end after each response. With this setup, I am finding that smart cache almost never finds a valid cache when the conversation switches back to the regular chat. This change would need to be paired with a change to SillyTavern's KoboldCPP connection so that slots can optionally be specified as part of the connection profile. Two connection profiles with different subsets of slots could then let MessageSummarize either operate in one fixed slot or, depending on how the feature is implemented, avoid caching the response at all, since each summary call is necessarily short and bespoke, with no real overlap with other calls.
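As a rough illustration of why two profiles with disjoint slot subsets would help, here is a toy model of slot selection. It assumes smart cache picks the slot whose cached tokens share the longest prefix with the new prompt, which is a simplification and may not match how KoboldCPP actually scores slots. The profile names and the `pick_slot` helper are hypothetical.

```python
from typing import Dict, List, Sequence


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_slot(slots: Dict[int, List[int]], prompt: Sequence[int], allowed: Sequence[int]) -> int:
    """Pick the allowed slot with the longest shared prefix (lowest ID wins ties)."""
    return max(sorted(allowed), key=lambda s: shared_prefix_len(slots.get(s, []), prompt))


# Hypothetical connection profiles, each restricted to its own slots.
CHAT_SLOTS = [0, 1]
SUMMARY_SLOTS = [2]

# Toy cache contents: slot ID -> cached token IDs.
slots = {0: [1, 2, 3, 4], 1: [1, 2, 9], 2: []}

# A summary call can only land in slot 2, so the chat slots are left untouched.
summary_prompt = [7, 8]
summary_slot = pick_slot(slots, summary_prompt, SUMMARY_SLOTS)
slots[summary_slot] = list(summary_prompt)

# The next chat turn still finds its long prefix in slot 0.
chat_prompt = [1, 2, 3, 4, 5]
chat_slot = pick_slot(slots, chat_prompt, CHAT_SLOTS)
print(summary_slot, chat_slot)
```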

A potentially easier first step would be to add a no-cache flag to the generate call. With the flag set, KoboldCPP would store the current cache, perform the generation, then immediately reload what was swapped out. This would not cover some other uses I had in mind, such as using slots to represent individual characters in a multi-character chat so that each character's stream of thought and information stays independent of the others, but it would solve this particular issue.
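The store, generate, reload sequence could behave roughly like the toy model below. This is only a sketch of the intended semantics, not KoboldCPP code: `ToyBackend`, its single `active_cache`, and the `no_cache` parameter are all hypothetical, and "generation" is reduced to overwriting the cache with the prompt tokens.

```python
from typing import List, Sequence


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class ToyBackend:
    """Toy stand-in for a back end with one active KV cache."""

    def __init__(self) -> None:
        self.active_cache: List[int] = []

    def generate(self, prompt_tokens: Sequence[int], no_cache: bool = False) -> int:
        """Process a prompt and return how many cached tokens were reused."""
        # Step 1: store the current cache if the (hypothetical) flag is set.
        saved = list(self.active_cache) if no_cache else None
        try:
            # Step 2: generate; this overwrites the active cache.
            reused = shared_prefix_len(self.active_cache, prompt_tokens)
            self.active_cache = list(prompt_tokens)
            return reused
        finally:
            # Step 3: reload what was swapped out.
            if saved is not None:
                self.active_cache = saved


backend = ToyBackend()
backend.generate([1, 2, 3, 4])            # chat turn fills the cache
backend.generate([9, 9], no_cache=True)   # summary call leaves the cache as it was
reused_after_summary = backend.generate([1, 2, 3, 4, 5])
print(reused_after_summary)
```

Without the flag, the summary call in the middle would replace the cached chat context and the following chat turn would reuse nothing.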
