
add snapKV implementation for transformers sdpa attention with flash_attn availability checking #32

Open
Clement25 wants to merge 1 commit into FasterDecoding:main from Clement25:main

Conversation

Clement25 commented Aug 9, 2025

This adds a SnapKV implementation for the transformers SDPA attention path, with a flash_attn availability check, for the case that flash_attn_2 is not available.

Currently this only adds hijack_llama; implementations for other models will be added later.
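A minimal sketch of how the availability check and patch selection could look, assuming a transformers version that still exposes LlamaSdpaAttention and LlamaFlashAttention2; the hijack_llama signature and the forward functions it receives are hypothetical illustrations, not taken from the actual commit:

```python
# Sketch only: choose which Llama attention class to monkey-patch based on
# whether flash_attn 2 is installed. Class names assume a transformers
# version that still defines separate SDPA / FlashAttention-2 classes.
from transformers.utils import is_flash_attn_2_available
import transformers.models.llama.modeling_llama as llama_mod


def hijack_llama(sdpa_forward, flash_forward):
    """Replace the Llama attention forward with a SnapKV-aware one.

    sdpa_forward / flash_forward stand in for the patched forwards defined
    elsewhere in the PR (hypothetical parameters here).
    """
    if is_flash_attn_2_available():
        # flash_attn is importable: patch the FlashAttention-2 class.
        llama_mod.LlamaFlashAttention2.forward = flash_forward
    else:
        # flash_attn missing: fall back to the PyTorch SDPA attention class.
        llama_mod.LlamaSdpaAttention.forward = sdpa_forward
```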

