and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0).
## [Unreleased]
## [0.3.25]
- feat: [Refactor Llama class to use new LlamaSampler chain API from _internals](https://github.com/JamePeng/llama-cpp-python/commit/1e6094a327f0fb9dc35d52f84d8ebabc1faa1e95)
This commit refactors the high-level Llama class to fully utilize the new C++ `llama_sampler` chain architecture via `LlamaSamplingContext`.
- Replaced manual sampling logic and the obsolete `_init_sampler` with `LlamaSamplingContext`.
- Updated `sample()` and `generate()` to support the full suite of modern sampling strategies (DRY, XTC, Adaptive-P, Infill, etc.).
- Added new sampling parameters to all generation methods (`create_completion`, `create_chat_completion`, `__call__`).
- Refactored `logits_processor` handling to use a `CustomSampler` adapter for better performance and C++ interop.
- Improved sampling state management (e.g., repetition penalties) by persisting `_sampling_ctx` during generation.
- Removed manual `logit_bias` processing in Python; now delegated to the underlying sampler chain.
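To illustrate the adapter idea behind the `CustomSampler` change (a minimal sketch with hypothetical names, not the project's actual API): a Python logits processor returns adjusted logits, while a sampler-chain callback is expected to mutate a logits buffer in place, so the adapter wraps one as the other.

```python
import numpy as np

def make_custom_sampler(logits_processor):
    """Wrap a Python logits processor as an in-place callback.

    Hypothetical sketch: a real adapter would register this callback
    with the C++ sampler chain; here we only model the calling shape.
    """
    def apply(input_ids, logits):
        # The processor returns adjusted logits; write them back into
        # the same buffer so the caller sees the changes.
        logits[:] = logits_processor(input_ids, logits)
    return apply

# Example processor: a simple logit bias applied in Python.
def bias_processor(input_ids, logits):
    out = logits.copy()
    out[3] += 5.0  # boost token id 3
    return out

sampler = make_custom_sampler(bias_processor)
logits = np.zeros(8, dtype=np.float32)
sampler([1, 2], logits)
print(logits[3])  # the bias is visible in the original buffer
```

The point of the adapter is that the bias lands in the caller's buffer rather than a fresh array, which is what makes a single C++-side pass over the chain possible.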
- feat: Separate the grammar sampler, improve the code stability of Sampler Chain processing, and fix some bugs.
- [Improve sampling and grammar lifecycle management, fix memory growth issues](https://github.com/JamePeng/llama-cpp-python/commit/5ef874cf7e5b08533c7782286eda777e44be9744)
- Validate grammar sampler initialization and inputs
- Replace the unbounded prev-token list with a bounded deque sized by the `n_prev` parameter of `LlamaSamplingParams`
- Reuse logits NumPy view to avoid repeated allocations
- Reuse single-token buffers for grammar rejection sampling
- Minor cleanups and consistency improvements in sampling flow
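The two memory fixes above can be sketched in a few lines (an illustrative sampling loop, not the library's internals; names and sizes are assumptions): a `deque(maxlen=n_prev)` keeps the penalty history bounded, and one preallocated NumPy buffer is reused for the logits view on every step instead of allocating a fresh array per token.

```python
from collections import deque
import numpy as np

N_PREV = 64       # stand-in for LlamaSamplingParams' n_prev
N_VOCAB = 32000   # assumed vocabulary size for the sketch

# Bounded history: the oldest tokens fall off automatically, so
# memory use stays constant however long generation runs.
prev_tokens = deque(maxlen=N_PREV)

# Preallocated buffer reused for the logits view each step,
# avoiding a new allocation per sampled token.
logits_buf = np.empty(N_VOCAB, dtype=np.float32)

def sample_step(raw_logits):
    np.copyto(logits_buf, raw_logits)   # reuse the buffer, don't reallocate
    token = int(np.argmax(logits_buf))  # stand-in for the real sampler chain
    prev_tokens.append(token)
    return token

for _ in range(200):
    sample_step(np.random.rand(N_VOCAB).astype(np.float32))

print(len(prev_tokens))  # capped at N_PREV regardless of steps taken
```

Before the fix, an unbounded history list grows linearly with generated tokens, which is the memory-growth symptom the commit addresses.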
- feat: [Fix sampling history alignment with llama.cpp](https://github.com/JamePeng/llama-cpp-python/commit/9f79b78cb89cef44397f8727adc55e288c74946c)
- test: update integration tests for new sampler architecture
- test: replace unstable grammar test with deterministic mechanism check
- fix: Optimize .gitignore and add macOS system files
- feat: Refactor the build-wheels-metal.yaml workflow
- feat: Update llama.cpp to [ggml-org/llama.cpp/commit/079feab9e3efee1d6d4ca370eac50f156e2dc6e8](https://github.com/ggml-org/llama.cpp/commit/079feab9e3efee1d6d4ca370eac50f156e2dc6e8)
- feat: Sync llama.cpp llama/mtmd API Binding 20260214
For more information, see the full comparison: https://github.com/JamePeng/llama-cpp-python/compare/4ab182382b87bbbba4fb05ff184b557414740103...dc5f7e5564dd68af9d62f7d450cda45313f80b5d
## [0.3.24]
- feat: [Refactor sampling infrastructure to use llama.cpp sampler chain API](https://github.com/JamePeng/llama-cpp-python/commit/1df39b422890db55cb9f6de43cb792a26921752e)