docs(advance): add Add a New Speculative Decoding Method guide #4589
SuperMarioYL wants to merge 2 commits into
Document the BaseSpecProposer + SPEC_PROPOSERS extension contract so that third parties can add a draft-token proposer without reverse engineering the engine. The existing spec_decoding.md teaches usage for the four shipped methods (eagle, eagle3, deepseek_mtp, qwen3_5_mtp) but does not explain the plug-in surface; users have asked for this in InternLM#1738 and InternLM#4530. Contents follow the same shape as docs/en/advance/pytorch_new_model.md: the registry / base-class / method-string triad, what BaseSpecProposer already implements, a minimal new proposer, the get_outputs contract, when to override build_model (with the in-tree Qwen3_5MTP and Eagle3 examples), and a 5-item shipping checklist. Add the page to docs/en/index.rst under the Advance section right next to spec_decoding.md.
Pull request overview
Adds a new documentation page that explains how to extend the PyTorch engine's speculative decoding pipeline with a new proposer, and wires it into the docs toctree. This addresses the docs gap referenced by issues #1738 and #4530.
Changes:
- Adds `docs/en/advance/spec_decoding_new_method.md` walking through the `SPEC_PROPOSERS` registry, `BaseSpecProposer` contract, `get_outputs` return tuple, when to override `build_model`, and a contributor checklist.
- Registers the new page in `docs/en/index.rst` next to `spec_decoding.md`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| docs/en/advance/spec_decoding_new_method.md | New guide describing the proposer plug-in contract, with examples mirrored from the in-tree deepseek_mtp, eagle3, and qwen3_5_mtp proposers. |
| docs/en/index.rst | Adds the new doc to the advance toctree. |
Verified against lmdeploy/pytorch/spec_decode/proposers/{base,deepseek_mtp,eagle3,qwen3_5_mtp}.py: registry name, build_specdecode_proposer signature, BaseSpecProposer API surface, the get_outputs 3-tuple, and the Eagle3/Qwen3_5MTP build_model overrides quoted in the doc all match the current code.
    @SPEC_PROPOSERS.register_module(name='qwen3_5_mtp')
    class Qwen3_5MTP(DeepseekMTP):

        def build_model(self, empty_init, target_model=None, build_model_ctx=None):
One may also need to make changes in `lmdeploy/pytorch/configurations` and add a model definition in `lmdeploy/pytorch/models`.
Thanks for the pointer @RunningLeon! Pushed 0f0fcd6 which adds a new "Wire up the draft model architecture" section right after the build_model discussion, covering both touch-points you flagged:
- `lmdeploy/pytorch/configurations/`: how to add an `AutoModelConfigBuilder` (auto-registered via `configurations/__init__.py` walking the package), with references to `deepseek_v2.py`, `qwen3_5.py`, and `llama.py` for the `deepseek_mtp` / `qwen3_5_mtp` / `eagle` patterns.
- `lmdeploy/pytorch/models/`: how to add the draft model class and register the architecture string in `module_map.py`, with `deepseek_mtp.py`, `qwen3_5_mtp.py`, `llama_eagle.py` / `llama_eagle3.py`, and `glm4moe_mtp.py` cited as templates.
The checklist at the bottom was also extended with the two new items. PTAL when you have a moment.
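For readers who have not opened `module_map.py`, the second touch-point can be pictured with the rough sketch below. It assumes the map behaves like a plain dictionary from an architecture string to a dotted class path, which this thread does not confirm. `MODULE_MAP_SKETCH`, `resolve_architecture`, and the architecture name `MyDraftModelForCausalLM` are hypothetical, and a standard-library class stands in for the draft model so the snippet runs without lmdeploy.

```python
import importlib
from typing import Dict

# Hypothetical map: architecture string -> dotted path of the model class.
# A standard-library class is used as the target so this runs anywhere.
MODULE_MAP_SKETCH: Dict[str, str] = {
    'MyDraftModelForCausalLM': 'collections.OrderedDict',
}


def resolve_architecture(arch: str, module_map: Dict[str, str]) -> type:
    """Import and return the class registered for an architecture string."""
    if arch not in module_map:
        raise KeyError(f'architecture {arch!r} is not registered')
    # Split "package.module.ClassName" into module path and class name.
    module_name, _, class_name = module_map[arch].rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


draft_cls = resolve_architecture('MyDraftModelForCausalLM', MODULE_MAP_SKETCH)
print(draft_cls.__name__)
```

The point of the sketch is only that an unregistered architecture string fails at lookup time, which is why the new section asks contributors to add the map entry together with the model class.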
… for new spec-decoding method
Address review feedback on PR InternLM#4589: a new speculative-decoding method typically also needs (1) an AutoModelConfigBuilder under lmdeploy/pytorch/configurations/ to recognise the draft hf_config and flip model_paradigm to 'ar_spec', and (2) a draft model class under lmdeploy/pytorch/models/ registered in module_map.py so the engine patcher can resolve the draft architecture string. Add a new section covering both touch-points with references to existing implementations (deepseek_mtp, qwen3_5_mtp, llama_eagle/eagle3, glm4moe_mtp), and extend the checklist accordingly.
Motivation
The PyTorch engine has a clean plug-in surface for speculative decoding (`BaseSpecProposer` + `SPEC_PROPOSERS` registry in `lmdeploy/pytorch/spec_decode/proposers/base.py`), and four shipped methods register against it: `eagle`, `eagle3`, `deepseek_mtp`, `qwen3_5_mtp`. The user-facing `docs/en/advance/spec_decoding.md` teaches usage of those four names but never explains how to add a fifth, so users have asked the question externally:
- #1738
- #4530
Both are open. A short extension-contract page closes the gap without locking the engine into anything new.
Modification
Add `docs/en/advance/spec_decoding_new_method.md` and a toctree entry for it in `docs/en/index.rst`, right next to `spec_decoding.md`.
The page mirrors the shape of the existing `docs/en/advance/pytorch_new_model.md` (which documents the model-patch extension contract):
- The registry / base-class / `method` string triad.
- The `build_specdecode_proposer` entry point and why `proposers/__init__.py` must import the new class.
- What `BaseSpecProposer` already provides so contributors don't re-implement weight loading, draft forward, decoding-input update, or fallbacks.
- A minimal `MyMethod(BaseSpecProposer)` skeleton with `@SPEC_PROPOSERS.register_module(name='my_method')`.
- The 3-tuple returned by `get_outputs` (draft token ids, `model_metas`, `target_hidden_states`).
- When to override `build_model`, illustrated with the two in-tree precedents (`Qwen3_5MTP` shares the target embeddings; `Eagle3` swaps embeddings conditionally and widens `get_target_hidden_size`).
No code changes. All snippets and references point to symbols that exist in `lmdeploy/pytorch/spec_decode/proposers/`.
BC-breaking
None — docs only.
Use cases
Anyone wanting to add a new draft-token proposer (e.g. the DFlash
method requested in #4530) can now read one page and know which class
to subclass, which method to implement, what to return, and where to
register.
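As a rough illustration of that flow (subclass, implement, return, register), the sketch below uses stand-in classes rather than lmdeploy imports. The `Registry` class, the `get_outputs` arguments, and `build_proposer_sketch` are assumptions made for the example. The real `BaseSpecProposer` and `build_specdecode_proposer` signatures live in `lmdeploy/pytorch/spec_decode/proposers/base.py` and may differ. Only the decorator form and the 3-tuple return order are taken from this description.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class Registry:
    """Stand-in name -> class registry (not lmdeploy's implementation)."""

    def __init__(self) -> None:
        self._modules: Dict[str, type] = {}

    def register_module(self, name: str) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            if name in self._modules:
                raise KeyError(f'{name!r} is already registered')
            self._modules[name] = cls
            return cls

        return decorator

    def get(self, name: str) -> type:
        return self._modules[name]


SPEC_PROPOSERS = Registry()


class BaseSpecProposer:
    """Stand-in base class; per the description, the real one also covers
    weight loading, draft forward, and decoding-input updates."""

    def get_outputs(self, logits: List[List[float]], hidden_states: Any):
        raise NotImplementedError


@SPEC_PROPOSERS.register_module(name='my_method')
class MyMethod(BaseSpecProposer):

    def get_outputs(
        self, logits: List[List[float]], hidden_states: Any
    ) -> Tuple[List[int], Optional[list], Any]:
        # Greedy draft: index of the largest logit in each row.
        draft_token_ids = [
            max(range(len(row)), key=row.__getitem__) for row in logits
        ]
        model_metas = None  # nothing extra to carry in this toy example
        target_hidden_states = hidden_states  # passed through unchanged
        return draft_token_ids, model_metas, target_hidden_states


def build_proposer_sketch(method: str) -> BaseSpecProposer:
    """Hypothetical builder: look the method string up and instantiate it."""
    return SPEC_PROPOSERS.get(method)()


proposer = build_proposer_sketch('my_method')
ids, metas, hidden = proposer.get_outputs([[0.1, 0.9], [0.7, 0.2]], 'h')
```

In this toy setup the method string `'my_method'` is the only link between the configuration and the class, which is why the registration decorator has to run (that is, the module has to be imported) before the builder is called.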
Checklist
- `pre-commit run --files docs/en/advance/spec_decoding_new_method.md docs/en/index.rst` passes (mdformat, codespell, trailing whitespace, end-of-file, copyright check).
- The new page refers back to `spec_decoding.md` and explicitly names the four shipped methods so that it does not drift from them.
Closes (partially) the docs side of #1738 and #4530.