Drop unused Gemma4TextAttention weights when sharing KV Cache#45328
RyanMullins wants to merge 4 commits into huggingface:main
Conversation
Hi @Cyrilvallez / HuggingFace team, I talked to @RyanMullins about this PR, and the team and I acknowledge that this is a critical problem to fix. But since it involves larger architectural changes (i.e. how to handle qkv, new model checkpoints, etc.), we would like to ask that you hold off on merging this PR. We (on the Google side) want to coordinate and validate those code/checkpoint changes across all internal and external stakeholders to make sure we don't harm other customers using Gemma4 models while we sync this to other environments.
Hey @RyanMullins @lucianommartins! I'm not sure why, but this PR was mostly a full-on copy of my PR here #45312, which was just merged... Just seeing your message about holding off a bit @lucianommartins, very sorry for that, I did not see it before the merge. However, the architectural change is quite minimal (it only applies with `use_cache=False`, which is not very common), so it should not cause any issues. Let me know if that's still a problem, and we can revert.
I think @RyanMullins mostly wanted to update the weight conversion script here, and mistakenly brought my changes from the other PR |
@RyanMullins @lucianommartins #45336 completely drops the weights and silently skips loading them from the hub, so the weights in the hub repos do not have to be changed, while keeping the memory benefits of dropping them when the model is used.
Yeah I started with #45336 and then added the conditionals to drop the modules that aren't used when shared. There was a history difference between that PR and main yesterday, so I squashed it all and rebased on mine, hence the different histories here. Landing #45336 without coordinating with vLLM is fine. But dropping the modules needs to be coordinated. I'll rebase this morning and we can take it from there. |
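The "silently skip" behavior discussed above can be sketched as a key-filtering step at checkpoint load time. This is a minimal, hypothetical illustration: the pattern and function names below are assumptions, not the actual mechanism in transformers (which uses model-class ignore lists such as `_keys_to_ignore_on_load_unexpected`).

```python
import re

# Hypothetical ignore patterns for layers that share a KV cache; a real
# implementation would target the actual layer indices and module names.
IGNORE_ON_LOAD = [r"layers\.\d+\.self_attn\.(k_proj|v_proj|k_norm)\."]

def filter_checkpoint_keys(state_dict_keys):
    """Split checkpoint keys into ones the model still has and ones that
    can be skipped without emitting an 'unexpected weights' warning."""
    kept, skipped = [], []
    for key in state_dict_keys:
        if any(re.search(pattern, key) for pattern in IGNORE_ON_LOAD):
            skipped.append(key)
        else:
            kept.append(key)
    return kept, skipped

keys = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
    "model.layers.0.self_attn.v_proj.weight",
]
kept, skipped = filter_checkpoint_keys(keys)
# kept retains only the q_proj key; the k_proj/v_proj keys are skipped
```

This keeps the hub checkpoints untouched while the model simply ignores the weights it no longer instantiates.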
@Cyrilvallez sync'd this branch with |
#45336 also contains the required properties so that unexpected weights on the hub will not trigger huge warnings for users (without it, a huge loading report pops up saying that quite a lot of weights were not used), so I'm gonna merge that one instead!
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45328&sha=a187bc |
[For maintainers] Suggested jobs to run (before merge): `run-slow: gemma4`
What does this PR do?
Fixes #45242
Drops the `k_proj`, `k_norm`, and `v_proj` weights for `Gemma4TextAttention` modules from the checkpoint if the layer shares KV cache values.

These changes can also be adapted to Gemma 3n if that's desirable.
I have a weights update I can push to drop the unused matrices from the checkpoints, but I want to wait until this PR lands before we push those.
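The conditional-construction idea behind this PR can be sketched as follows. This is an illustrative sketch only: the class name, constructor arguments, and layer sizes are assumptions, not the actual `Gemma4TextAttention` API.

```python
import torch
from torch import nn

class SharedKVAttentionSketch(nn.Module):
    """Hypothetical sketch: skip allocating K/V weights on layers that
    read another layer's KV cache (not the real Gemma4TextAttention)."""

    def __init__(self, hidden_size: int, is_kv_shared_layer: bool):
        super().__init__()
        self.is_kv_shared_layer = is_kv_shared_layer
        self.q_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        if not is_kv_shared_layer:
            # Only layers that compute their own KV states need these;
            # on KV-sharing layers the weights would never be used, so
            # they are never created (and never loaded or stored).
            self.k_proj = nn.Linear(hidden_size, hidden_size, bias=False)
            self.v_proj = nn.Linear(hidden_size, hidden_size, bias=False)
            self.k_norm = nn.LayerNorm(hidden_size)

owner = SharedKVAttentionSketch(16, is_kv_shared_layer=False)
sharer = SharedKVAttentionSketch(16, is_kv_shared_layer=True)
# sharer has no k_proj/v_proj/k_norm submodules, so its state dict is
# smaller and the corresponding checkpoint tensors are unnecessary
```

The memory saving comes purely from never instantiating the unused submodules; the forward pass on a sharing layer would read K/V states from the shared cache instead.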
Code Agent Policy
The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.
PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.
This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read
CONTRIBUTING.md.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@zucchini-nlp @Cyrilvallez