Fix dangling pointer in TextTokenGenerator non-kv-cache path #18725
kirklandsign wants to merge 1 commit into main
Conversation
Summary: In the non-kv-cache branch of TextTokenGenerator::generate(), push_back() on token_data can trigger vector reallocation, but the tensor created via from_blob still points to the old data address. resize_tensor_ptr only updates shape metadata, not the data pointer, resulting in a dangling pointer.

Fix: pre-allocate the vector with reserve() before creating the tensor, ensuring push_back never triggers reallocation during the generate loop.

Differential Revision: D99408541
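The fix relies on a standard guarantee: push_back() only invalidates pointers into a std::vector when size would exceed capacity, so reserving the full capacity up front keeps the data pointer stable for the whole loop. Below is a minimal standalone sketch of that pattern; pointer_stays_stable and max_tokens are hypothetical names, and the raw pointer stands in for the tensor created via from_blob (the real logic lives in TextTokenGenerator::generate()).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of the fix: reserve the vector's full capacity before handing its
// data pointer to a non-owning view (stand-in for from_blob), so later
// push_back calls can never reallocate and invalidate that pointer.
bool pointer_stays_stable(std::size_t max_tokens) {
  std::vector<std::uint64_t> token_data;
  token_data.reserve(max_tokens);  // the fix: no reallocation below
  token_data.push_back(0);         // seed token

  // from_blob-style wrap: the view keeps this raw pointer.
  const std::uint64_t* wrapped = token_data.data();

  // Generation loop: append tokens up to the reserved capacity.
  for (std::size_t i = 1; i < max_tokens; ++i) {
    token_data.push_back(i);
    if (token_data.data() != wrapped) {
      return false;  // would mean `wrapped` dangles, as in the original bug
    }
  }
  return true;
}
```

Without the reserve() call, the first push_back that exceeds capacity moves the buffer and leaves `wrapped` dangling, which is exactly the failure mode described in the summary.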
@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D99408541.

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18725
Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 9 Pending as of commit b5e15c8 with merge base 3d2c853.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
Fixes a memory-safety issue in the LLM token generation loop when KV-cache is disabled: from_blob() wraps token_data.data(), but subsequent push_back() could reallocate the vector and leave the tensor with a dangling data pointer.
Changes:
- Pre-reserve token_data capacity in the non-KV-cache path to prevent reallocation during generation.