
feat: lr scheduler #151

Open
Chamberlain0w0 wants to merge 11 commits into master from feat/lr_scheduler

Conversation


@Chamberlain0w0 Chamberlain0w0 commented May 8, 2026

In the 2025 training-camp project topic "Learning Rate Scheduler Implementation", @littleotherut completed a basic implementation of the learning-rate scheduler module (PR #113). This PR builds on that work, revising the interfaces and related conventions so that the module fits the actual needs of our project.

The design document is available here: https://gxtctab8no8.feishu.cn/wiki/Bd6Pw1BeeiQ7QfktiT8cbSiTnFb?from=from_copylink.

Core changes:

  1. Interface changes: adopt Megatron-style learning-rate scheduling parameters; the corresponding gflags were added in main.cc:
DEFINE_double(learning_rate, ..., "Peak learning rate.");
DEFINE_double(min_lr, 0.0, "Minimum learning rate.");
DEFINE_string(lr_decay_style, "constant", "LR decay style: none|constant|linear|cosine|inverse-square-root");
DEFINE_int64(lr_warmup_iters, 0, "Number of linear warmup iterations.");
DEFINE_double(lr_warmup_init, 0.0, "Initial learning rate at the start of warmup.");
DEFINE_int64(lr_decay_iters, 0, "Number of iterations to decay LR over (0 = num_iteration).");

Once these parameters are parsed, the corresponding TrainingLRSchedulerConfig struct is constructed and passed, together with the optimizer, to CreateLRScheduler(), which builds the matching learning-rate scheduler.
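For illustration, a minimal sketch of that wiring (the TrainingLRSchedulerConfig field names and the exact CreateLRScheduler() signature below are assumptions inferred from the flags above, not copied from the repo):

TrainingLRSchedulerConfig lr_config;
lr_config.learning_rate   = FLAGS_learning_rate;   // peak LR (field names assumed)
lr_config.min_lr          = FLAGS_min_lr;
lr_config.lr_decay_style  = FLAGS_lr_decay_style;  // e.g. "cosine"
lr_config.lr_warmup_iters = FLAGS_lr_warmup_iters;
lr_config.lr_warmup_init  = FLAGS_lr_warmup_init;
lr_config.lr_decay_iters  = FLAGS_lr_decay_iters;  // 0 = fall back to num_iteration

// The factory takes the config together with the optimizer it will drive.
auto scheduler = CreateLRScheduler(lr_config, optimizer);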

  2. Design changes
    a. Added an LRScheduler base class plus a derived class for each scheduling policy, aligned with the torch implementation and its usage pattern: in the training loop, scheduler.step() is called after optimizer.step() to apply the learning-rate update (see the sketch after this list).
    b. The LRScheduler has to interact with the Optimizer to read, update, and synchronize the learning rate, so the corresponding setters and getters were added to the Optimizer base class.
    c. LRScheduler::State() is still a fairly naive implementation; it will be revised further once the checkpoint (ckpt) mechanism is in place.
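A minimal sketch of the resulting training-loop order (the model/batch helper names and the num_iteration flag are placeholders for illustration; only the optimizer-then-scheduler ordering is the point):

for (int64_t iter = 0; iter < FLAGS_num_iteration; ++iter) {
  auto loss = model.Forward(batch);  // placeholder forward/backward
  loss.Backward();
  optimizer.Step();                  // update parameters first
  scheduler.Step();                  // then advance the LR schedule
}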

kinorw and others added 9 commits April 2, 2026 00:06
…r accessors, passthrough SetLearningRate/GetLearningRate, and add initial_learning_rate and its accessors
…base class, add factory method Create<T>() with two-phase init, and update all tests to use the Create<T>() factory method.

- Change Step() to virtual with default implementation
- Add pure virtual ComputeLR() for subclasses to implement.
- Adapt test helpers (IdentityScheduler, LinearDecayScheduler) to implement ComputeLR() instead of Step().
- All existing tests pass without behavioral changes.

BREAKING CHANGE: Subclasses must implement ComputeLR() instead of Step().
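To make the shape of that change concrete, a hedged sketch of the pattern this commit describes (the Optimizer stub and the constructor are assumptions added for self-containedness; SetLearningRate/GetLearningRate mirror the accessors from the first commit; note a later commit replaces ComputeLR() with GetClosedFormLR()):

#include <cstdint>

class Optimizer {  // minimal stub; the real class lives in the repo
 public:
  void SetLearningRate(double lr) { lr_ = lr; }
  double GetLearningRate() const { return lr_; }
 private:
  double lr_ = 0.0;
};

class LRScheduler {
 public:
  explicit LRScheduler(Optimizer* opt) : optimizer_(opt) {}
  virtual ~LRScheduler() = default;
  // Default Step(): advance the counter, then push the new LR to the optimizer.
  virtual void Step() {
    ++step_count_;
    optimizer_->SetLearningRate(ComputeLR());
  }
 protected:
  virtual double ComputeLR() = 0;  // each scheduling policy supplies its formula
  Optimizer* optimizer_;
  int64_t step_count_ = 0;
};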
…closed and chained form, adjust LinearLR and SequentialLR

- enhance LRScheduler with chained and closed form learning rate methods
- adapt methods (Step, InitialStep, GetClosedFormLR, GetChainedFormLR) to match PyTorch's design
- add tests for consistency
- refactor LinearLR: add end_factor, and rename LinearwarmupLR to LinearLR
- add InitialStep and UndoChildInitialSteps to SequentialLR

BREAKING CHANGE: Subclasses must implement GetClosedFormLR() instead of ComputeLR(). Use LinearLR instead of LinearwarmupLR.
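For intuition on the closed form, here is the torch.optim.lr_scheduler.LinearLR formula that a LinearLR with start_factor/end_factor aligns with, written as a standalone function (a sketch; the repo's actual member and method names may differ):

#include <cstdint>

// Closed-form LinearLR: the multiplicative factor ramps linearly from
// start_factor to end_factor over total_iters steps, then stays at end_factor.
double LinearLRClosedForm(double base_lr, int64_t step, int64_t total_iters,
                          double start_factor, double end_factor) {
  if (step >= total_iters) return base_lr * end_factor;
  double t = static_cast<double>(step) / static_cast<double>(total_iters);
  return base_lr * (start_factor + (end_factor - start_factor) * t);
}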
- Add LRSchedulerConfig struct with parameters for all basic schedulers (constant, linear, step)
- Add CreateLRScheduler() factory function
- Support automatic warmup wrapping via SequentialLR when warmup_steps > 0
- Adapt test files
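A hedged sketch of the warmup-wrapping decision this commit describes (CreateDecayScheduler, MakeLinearWarmup, MakeSequentialLR, and the field names are hypothetical helpers invented here for illustration; only the warmup_steps > 0 branching mirrors the commit):

std::unique_ptr<LRScheduler> CreateLRScheduler(const LRSchedulerConfig& cfg,
                                               Optimizer* opt) {
  auto decay = CreateDecayScheduler(cfg, opt);  // constant | linear | step
  if (cfg.warmup_steps <= 0) {
    return decay;                               // no warmup requested
  }
  // warmup_steps > 0: run a linear warmup first, then hand over to the
  // decay scheduler, composed via SequentialLR at the warmup boundary.
  auto warmup = MakeLinearWarmup(cfg, opt);
  return MakeSequentialLR(std::move(warmup), std::move(decay),
                          /*milestone=*/cfg.warmup_steps);
}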
…ogs, and integrate scheduler into training loop
…s, add validation tests for learning rate schedulers

- it is now only used for learning-rate recovery when using loadstate
@Chamberlain0w0 Chamberlain0w0 self-assigned this May 8, 2026
@JYMiracle305 JYMiracle305 self-requested a review May 8, 2026 07:32
