Commit b03224b

Bump version to 0.3.25
1 parent dc5f7e5

2 files changed

Lines changed: 38 additions & 1 deletion

File tree

CHANGELOG.md

Lines changed: 37 additions & 0 deletions
@@ -7,6 +7,43 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

## [0.3.25]
- feat: [Refactor Llama class to use new LlamaSampler chain API from _internals](https://github.com/JamePeng/llama-cpp-python/commit/1e6094a327f0fb9dc35d52f84d8ebabc1faa1e95)

  This commit refactors the high-level Llama class to fully utilize the new C++ `llama_sampler` chain architecture via `LlamaSamplingContext`.

  - Replaced manual sampling logic and the obsolete `_init_sampler` with `LlamaSamplingContext`.
  - Updated `sample()` and `generate()` to support the full suite of modern sampling strategies (DRY, XTC, Adaptive-P, Infill, etc.).
  - Added new sampling parameters to all generation methods (`create_completion`, `create_chat_completion`, `__call__`):
    - `dynatemp_range`, `dynatemp_exponent` (Dynamic Temperature)
    - `min_keep`
  - Refactored `logits_processor` handling to use a `CustomSampler` adapter for better performance and C++ interop.
  - Improved sampling state management (e.g., repetition penalties) by persisting `_sampling_ctx` during generation.
  - Removed manual `logit_bias` processing in Python; it is now delegated to the underlying sampler chain.
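The Dynamic Temperature parameters above (`dynatemp_range`, `dynatemp_exponent`) can be illustrated with a small sketch. This is a conceptual approximation of entropy-scaled temperature in the style of llama.cpp's dynatemp sampler, not the library's actual implementation; the function name and exact scaling formula are illustrative assumptions.

```python
import math

def dynatemp(logits, temp=0.8, dynatemp_range=0.5, dynatemp_exponent=1.0):
    """Conceptual sketch: scale the sampling temperature between
    temp - dynatemp_range and temp + dynatemp_range based on the
    normalized entropy of the current token distribution."""
    t_min = max(0.0, temp - dynatemp_range)
    t_max = temp + dynatemp_range
    # Softmax over the logits (stabilized by subtracting the max).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Shannon entropy, normalized by the maximum possible entropy.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    max_entropy = math.log(len(logits))
    norm = entropy / max_entropy if max_entropy > 0.0 else 0.0
    # Confident (low-entropy) distributions get a lower temperature;
    # flat (high-entropy) distributions get a higher one.
    return t_min + (t_max - t_min) * (norm ** dynatemp_exponent)
```

With a perfectly flat distribution the normalized entropy is 1, so the returned temperature is `temp + dynatemp_range`; a sharply peaked distribution pushes it toward `temp - dynatemp_range`.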
- feat: Separate the grammar sampler, improve the stability of sampler-chain processing, and fix several bugs.

  - [Improve sampling and grammar lifecycle management, fix memory growth issues](https://github.com/JamePeng/llama-cpp-python/commit/5ef874cf7e5b08533c7782286eda777e44be9744)
  - Validate grammar sampler initialization and inputs
  - Replace the unbounded prev-token list with a deque bounded by the `LlamaSamplingParams` `n_prev` parameter
  - Reuse the logits NumPy view to avoid repeated allocations
  - Reuse single-token buffers for grammar rejection sampling
  - Minor cleanups and consistency improvements in the sampling flow
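The bounded prev-token history described above can be sketched with a `collections.deque`. This is a minimal illustration of the idea only; the class and method names are hypothetical and do not mirror the library's internal code, though the `n_prev` parameter name comes from the changelog.

```python
from collections import deque

class SamplingHistory:
    """Sketch: keep only the last n_prev accepted token ids, so the
    repetition-penalty window cannot grow without bound."""

    def __init__(self, n_prev: int = 64):
        # deque(maxlen=...) silently evicts the oldest entry when full,
        # replacing an unbounded Python list.
        self.prev = deque(maxlen=n_prev)

    def accept(self, token_id: int) -> None:
        self.prev.append(token_id)

    def window(self) -> list:
        # Snapshot of the current penalty window, oldest first.
        return list(self.prev)
```

Because eviction happens on `append`, memory use stays constant regardless of how many tokens are generated.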
- feat: [Fix sampling history alignment with llama.cpp](https://github.com/JamePeng/llama-cpp-python/commit/9f79b78cb89cef44397f8727adc55e288c74946c)

- test: update integration tests for the new sampler architecture

- test: replace an unstable grammar test with a deterministic mechanism check

- fix: Optimize .gitignore and add macOS system files

- feat: Refactor the build-wheels-metal.yaml workflow

- feat: Update llama.cpp to [ggml-org/llama.cpp/commit/079feab9e3efee1d6d4ca370eac50f156e2dc6e8](https://github.com/ggml-org/llama.cpp/commit/079feab9e3efee1d6d4ca370eac50f156e2dc6e8)

- feat: Sync llama.cpp llama/mtmd API Binding 20260214

For more information, see: https://github.com/JamePeng/llama-cpp-python/compare/4ab182382b87bbbba4fb05ff184b557414740103...dc5f7e5564dd68af9d62f7d450cda45313f80b5d

## [0.3.24]

- feat: [Refactor sampling infrastructure to use llama.cpp sampler chain API](https://github.com/JamePeng/llama-cpp-python/commit/1df39b422890db55cb9f6de43cb792a26921752e)
  - LlamaContext: Remove obsolete manual sampling methods.

llama_cpp/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
from .llama_cpp import *
from .llama import *

-__version__ = "0.3.24"
+__version__ = "0.3.25"
