diff --git a/docs.json b/docs.json
index 68746c58a..73c0a1b48 100644
--- a/docs.json
+++ b/docs.json
@@ -126,6 +126,12 @@
"tutorials/basic/multiple-loras"
]
},
+ {
+ "group": "Training",
+ "pages": [
+ "tutorials/training/lora-training"
+ ]
+ },
{
"group": "ControlNet",
"pages": [
@@ -868,6 +874,12 @@
"zh-CN/tutorials/basic/multiple-loras"
]
},
+ {
+ "group": "训练",
+ "pages": [
+ "zh-CN/tutorials/training/lora-training"
+ ]
+ },
{
"group": "ControlNet",
"pages": [
diff --git a/tutorials/training/lora-training.mdx b/tutorials/training/lora-training.mdx
new file mode 100644
index 000000000..c3aab33bf
--- /dev/null
+++ b/tutorials/training/lora-training.mdx
@@ -0,0 +1,192 @@
+---
+title: "Native LoRA training"
+sidebarTitle: "LoRA Training"
+description: "Train LoRA models directly in ComfyUI using built-in training nodes"
+---
+
+ComfyUI includes native support for training LoRA (Low-Rank Adaptation) models without requiring external tools or custom nodes. This guide covers how to use the built-in training nodes to create your own LoRAs.
+
+<Warning>
+The training nodes are marked as **experimental**. Features and behavior may change in future releases.
+</Warning>
+
+## Overview
+
+The native LoRA training system consists of four nodes:
+
+| Node | Category | Purpose |
+|------|----------|---------|
+| **Train LoRA** | training | Trains a LoRA model from latents and conditioning |
+| **Load LoRA Model** | loaders | Applies trained LoRA weights to a model |
+| **Save LoRA Weights** | loaders | Exports LoRA weights to a safetensors file |
+| **Plot Loss Graph** | training | Visualizes training loss over time |
+
+## Requirements
+
+- A GPU with sufficient VRAM (training typically requires more memory than inference)
+- Latent images (encoded from your training dataset)
+- Text conditioning (captions for your training images)
+
+## Basic training workflow
+
+<Steps>
+<Step title="Prepare your dataset">
+Encode your training images to latents using a VAE Encode node. Create text conditioning for each image using CLIP Text Encode.
+
+<Tip>
+For best results, use high-quality images that represent the style or subject you want to train.
+</Tip>
+</Step>
+<Step title="Configure the Train LoRA node">
+Connect your model, latents, and conditioning to the Train LoRA node. Set the training parameters:
+
+- **batch_size**: Number of samples per training step (default: 1)
+- **steps**: Total training iterations (default: 16)
+- **learning_rate**: How quickly the model adapts (default: 0.0005)
+- **rank**: LoRA rank; higher values capture more detail but use more memory (default: 8)
+</Step>
+<Step title="Run the training">
+Execute the workflow. The node will output:
+
+- **lora**: The trained LoRA weights
+- **loss_map**: Training loss history
+- **steps**: Total steps completed
+</Step>
+<Step title="Save and use the LoRA">
+Connect the output to **Save LoRA Weights** to export your trained LoRA. Use **Load LoRA Model** to apply it during inference.
+</Step>
+</Steps>
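
The wiring above can be sketched as an API-format workflow and queued over ComfyUI's HTTP API. The `class_type` strings for the training nodes used here (`TrainLoraNode`, `SaveLoRA`) and their input names are illustrative assumptions; check the node definitions in your install before relying on them.

```python
import json

# API-format graph for the workflow above. Each input is either a literal
# value or a [source_node_id, output_index] link.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd1.5.safetensors"}},
    "2": {"class_type": "LoadImage", "inputs": {"image": "train_01.png"}},
    "3": {"class_type": "VAEEncode",
          "inputs": {"pixels": ["2", 0], "vae": ["1", 2]}},
    "4": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a photo of [subject]", "clip": ["1", 1]}},
    "5": {"class_type": "TrainLoraNode",  # assumed class name for Train LoRA
          "inputs": {"model": ["1", 0], "latents": ["3", 0],
                     "positive": ["4", 0], "batch_size": 1, "steps": 200,
                     "learning_rate": 0.0001, "rank": 16}},
    "6": {"class_type": "SaveLoRA",       # assumed class name for Save LoRA Weights
          "inputs": {"lora": ["5", 0], "prefix": "loras/my_subject"}},
}

# Sanity check: every [node_id, output_index] link points at a real node.
for node in workflow.values():
    for value in node["inputs"].values():
        if isinstance(value, list):
            assert value[0] in workflow

payload = json.dumps({"prompt": workflow})
# POST `payload` to a running server's /prompt endpoint to queue it,
# e.g. http://127.0.0.1:8188/prompt
```

Queuing is a `POST` of the payload to the `/prompt` endpoint of a running ComfyUI server.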
+
+## Train LoRA node
+
+The main training node that creates LoRA weights from your dataset.
+
+### Inputs
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `model` | MODEL | - | Base model to train the LoRA on |
+| `latents` | LATENT | - | Encoded training images |
+| `positive` | CONDITIONING | - | Text conditioning for training |
+| `batch_size` | INT | 1 | Samples per step (1-10000) |
+| `grad_accumulation_steps` | INT | 1 | Gradient accumulation steps (1-1024) |
+| `steps` | INT | 16 | Training iterations (1-100000) |
+| `learning_rate` | FLOAT | 0.0005 | Learning rate (0.0000001-1.0) |
+| `rank` | INT | 8 | LoRA rank (1-128) |
+| `optimizer` | COMBO | AdamW | Optimizer: AdamW, Adam, SGD, RMSprop |
+| `loss_function` | COMBO | MSE | Loss function: MSE, L1, Huber, SmoothL1 |
+| `seed` | INT | 0 | Random seed for reproducibility |
+| `training_dtype` | COMBO | bf16 | Training precision: bf16, fp32 |
+| `lora_dtype` | COMBO | bf16 | LoRA weight precision: bf16, fp32 |
+| `algorithm` | COMBO | lora | Training algorithm (lora, lokr, oft, etc.) |
+| `gradient_checkpointing` | BOOLEAN | true | Reduces VRAM usage during training |
+| `checkpoint_depth` | INT | 1 | Depth level for gradient checkpointing (1-5) |
+| `offloading` | BOOLEAN | false | Offload model to RAM (requires bypass mode) |
+| `existing_lora` | COMBO | [None] | Continue training from existing LoRA |
+| `bucket_mode` | BOOLEAN | false | Enable resolution bucketing for multi-resolution datasets |
+| `bypass_mode` | BOOLEAN | false | Apply adapters via hooks instead of weight modification |
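
To build intuition for the `loss_function` options, here is a small stdlib sketch (not the node's internal implementation) of how MSE, L1, and Huber score a single prediction error:

```python
def mse(err):
    # squared error: large mistakes dominate the loss
    return err * err

def l1(err):
    # absolute error: every unit of error counts the same
    return abs(err)

def huber(err, delta=1.0):
    # quadratic near zero, linear in the tails (robust to outliers)
    a = abs(err)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

for e in (0.1, 1.0, 3.0):
    print(f"error={e}: mse={mse(e):.3f} l1={l1(e):.3f} huber={huber(e):.3f}")
```

MSE punishes outlier samples hardest; L1 and Huber are gentler when a few training images produce large errors.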
+
+### Outputs
+
+| Output | Type | Description |
+|--------|------|-------------|
+| `lora` | LORA_MODEL | Trained LoRA weights |
+| `loss_map` | LOSS_MAP | Training loss history |
+| `steps` | INT | Total training steps completed |
+
+## Load LoRA Model node
+
+Applies trained LoRA weights to a diffusion model. Use this instead of the standard Load LoRA node when working with LoRA weights directly from the Train LoRA node.
+
+### Inputs
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `model` | MODEL | - | Base diffusion model |
+| `lora` | LORA_MODEL | - | Trained LoRA weights |
+| `strength_model` | FLOAT | 1.0 | LoRA strength (-100 to 100) |
+| `bypass` | BOOLEAN | false | Apply LoRA without modifying base weights |
+
+### Output
+
+| Output | Type | Description |
+|--------|------|-------------|
+| `model` | MODEL | Model with LoRA applied |
+
+## Save LoRA Weights node
+
+Exports trained LoRA weights to a safetensors file in your output folder.
+
+### Inputs
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `lora` | LORA_MODEL | - | Trained LoRA weights to save |
+| `prefix` | STRING | loras/ComfyUI_trained_lora | Output filename prefix |
+| `steps` | INT | (optional) | Training steps for filename |
+
+The saved file will be named `{prefix}_{steps}_steps_{counter}.safetensors` and placed in your `ComfyUI/output/loras/` folder.
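
As an illustration, the pattern can be reproduced in a few lines; the five-digit counter padding shown here is an assumption and may differ in your version:

```python
def lora_filename(prefix, steps, counter):
    # e.g. "loras/ComfyUI_trained_lora_200_steps_00001.safetensors"
    return f"{prefix}_{steps}_steps_{counter:05d}.safetensors"

print(lora_filename("loras/ComfyUI_trained_lora", 200, 1))
```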
+
+## Plot Loss Graph node
+
+Visualizes training progress by plotting loss values over training steps.
+
+### Inputs
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `loss` | LOSS_MAP | - | Loss history from Train LoRA |
+| `filename_prefix` | STRING | loss_graph | Output filename prefix |
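
If you want to inspect the loss history outside the graph node, a simple trailing average over the recorded loss values (assumed here to be an ordered list of floats) makes the trend easier to read than the raw, noisy per-step numbers:

```python
def moving_average(values, window=4):
    # trailing average; early points use however many samples exist so far
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

losses = [0.9, 0.7, 0.8, 0.5, 0.55, 0.4, 0.42, 0.3]
print(moving_average(losses))
```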
+
+## Training tips
+
+### VRAM optimization
+
+- Enable **gradient_checkpointing** to significantly reduce VRAM usage (enabled by default)
+- Use **bypass_mode** when working with quantized models (FP8)
+- Enable **offloading** to move the model to RAM during training (requires bypass_mode)
+- Lower the **batch_size** if you encounter out-of-memory errors
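
On the last point: lowering **batch_size** while raising **grad_accumulation_steps** keeps the effective batch size the same, because averaging gradients over equal-sized micro-batches matches the full-batch gradient. A sketch with a single linear weight `y = w * x` and squared-error loss:

```python
def grad(w, batch):
    # mean gradient of (w*x - y)^2 with respect to w
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
w = 0.5

full = grad(w, data)                              # one batch of 4
micro = [grad(w, data[i:i + 2]) for i in (0, 2)]  # two micro-batches of 2
accumulated = sum(micro) / len(micro)

print(full, accumulated)  # the two gradients agree
```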
+
+### Dataset preparation
+
+- Use consistent image dimensions when possible, or enable **bucket_mode** for multi-resolution training
+- Match the number of conditioning inputs to the number of latent images
+- Quality matters more than quantity: start with 10-20 high-quality images
+
+### Training parameters
+
+- **rank**: Start with 8-16 for most use cases. Higher ranks (32-64) capture more detail but may overfit
+- **steps**: Start with 100-500 steps and monitor the loss graph
+- **learning_rate**: The default 0.0005 works well for most cases; use lower values (e.g., 0.0001) for more stable training
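
To make the rank/memory trade-off concrete: standard LoRA trains two extra matrices per adapted weight, of shapes `d_out x r` and `r x d_in` (a sketch; exactly which layers the node adapts is not specified here):

```python
def lora_params(d_out, d_in, rank):
    # extra trainable parameters standard LoRA adds to one weight matrix
    return d_out * rank + rank * d_in

# e.g. a 768x768 projection at different ranks
for r in (8, 16, 64):
    print(r, lora_params(768, 768, r))
```

Doubling the rank doubles the adapter's parameter count, which is why higher ranks cost more memory and overfit more easily.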
+
+### Continuing training
+
+Select an existing LoRA from the **existing_lora** dropdown to continue training from a previously saved checkpoint. The total step count will accumulate.
+
+## Supported algorithms
+
+The **algorithm** parameter supports multiple weight adapter types:
+
+- **lora**: Standard Low-Rank Adaptation (recommended)
+- **lokr**: Low-rank adaptation with Kronecker product decomposition (LoKr)
+- **oft**: Orthogonal Fine-Tuning
+
+## Example: Single-subject LoRA
+
+A minimal workflow for training a LoRA on a specific subject:
+
+1. Load your training images with **Load Image**
+2. Encode images to latents with **VAE Encode**
+3. Create captions with **CLIP Text Encode** (e.g., "a photo of [subject]")
+4. Connect to **Train LoRA** with:
+ - steps: 200
+ - rank: 16
+ - learning_rate: 0.0001
+5. Save with **Save LoRA Weights**
+6. Test with **Load LoRA Model** connected to your inference workflow
+
+
+For training on multiple images with different captions, connect multiple conditioning inputs to match your latent batch size.
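
When building such a multi-image dataset, a quick sanity check that captions and images line up can save a failed run (a sketch; how conditioning is batched internally is not specified here):

```python
images = ["subj_01.png", "subj_02.png", "subj_03.png"]
captions = [
    "a photo of [subject]",
    "a close-up photo of [subject]",
    "a photo of [subject] outdoors",
]

# each latent needs a matching conditioning entry
assert len(captions) == len(images), "caption count must match image count"

pairs = list(zip(images, captions))
print(len(pairs), "training pairs")
```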
+
diff --git a/zh-CN/tutorials/training/lora-training.mdx b/zh-CN/tutorials/training/lora-training.mdx
new file mode 100644
index 000000000..66c04fd76
--- /dev/null
+++ b/zh-CN/tutorials/training/lora-training.mdx
@@ -0,0 +1,192 @@
+---
+title: "原生 LoRA 训练"
+sidebarTitle: "LoRA 训练"
+description: "使用内置训练节点直接在 ComfyUI 中训练 LoRA 模型"
+---
+
+ComfyUI 原生支持训练 LoRA(Low-Rank Adaptation)模型,无需外部工具或自定义节点。本指南介绍如何使用内置训练节点创建自己的 LoRA。
+
+<Warning>
+训练节点目前标记为**实验性功能**。功能和行为可能会在未来版本中发生变化。
+</Warning>
+
+## 概述
+
+原生 LoRA 训练系统包含四个节点:
+
+| 节点 | 类别 | 用途 |
+|------|------|------|
+| **Train LoRA** | training | 从潜空间图像和条件训练 LoRA 模型 |
+| **Load LoRA Model** | loaders | 将训练好的 LoRA 权重应用到模型 |
+| **Save LoRA Weights** | loaders | 将 LoRA 权重导出为 safetensors 文件 |
+| **Plot Loss Graph** | training | 可视化训练过程中的损失变化 |
+
+## 系统要求
+
+- 具有足够显存的 GPU(训练通常比推理需要更多内存)
+- 潜空间图像(从训练数据集编码而来)
+- 文本条件(训练图像的描述文字)
+
+## 基础训练流程
+
+<Steps>
+<Step title="准备数据集">
+使用 VAE Encode 节点将训练图像编码为潜空间表示。使用 CLIP Text Encode 为每张图像创建文本条件。
+
+<Tip>
+为获得最佳效果,请使用能代表您想要训练的风格或主题的高质量图像。
+</Tip>
+</Step>
+<Step title="配置 Train LoRA 节点">
+将模型、潜空间图像和条件连接到 Train LoRA 节点。设置训练参数:
+
+- **batch_size**:每个训练步骤的样本数(默认:1)
+- **steps**:总训练迭代次数(默认:16)
+- **learning_rate**:模型适应速度(默认:0.0005)
+- **rank**:LoRA 秩;更高的值可以捕获更多细节但使用更多内存(默认:8)
+</Step>
+<Step title="运行训练">
+执行工作流。节点将输出:
+
+- **lora**:训练好的 LoRA 权重
+- **loss_map**:训练损失历史
+- **steps**:完成的总步数
+</Step>
+<Step title="保存并使用 LoRA">
+将输出连接到 **Save LoRA Weights** 以导出训练好的 LoRA。使用 **Load LoRA Model** 在推理时应用它。
+</Step>
+</Steps>
+
+## Train LoRA 节点
+
+从数据集创建 LoRA 权重的主要训练节点。
+
+### 输入参数
+
+| 参数 | 类型 | 默认值 | 描述 |
+|------|------|--------|------|
+| `model` | MODEL | - | 用于训练 LoRA 的基础模型 |
+| `latents` | LATENT | - | 编码后的训练图像 |
+| `positive` | CONDITIONING | - | 训练用的文本条件 |
+| `batch_size` | INT | 1 | 每步样本数(1-10000) |
+| `grad_accumulation_steps` | INT | 1 | 梯度累积步数(1-1024) |
+| `steps` | INT | 16 | 训练迭代次数(1-100000) |
+| `learning_rate` | FLOAT | 0.0005 | 学习率(0.0000001-1.0) |
+| `rank` | INT | 8 | LoRA 秩(1-128) |
+| `optimizer` | COMBO | AdamW | 优化器:AdamW、Adam、SGD、RMSprop |
+| `loss_function` | COMBO | MSE | 损失函数:MSE、L1、Huber、SmoothL1 |
+| `seed` | INT | 0 | 随机种子,用于可复现性 |
+| `training_dtype` | COMBO | bf16 | 训练精度:bf16、fp32 |
+| `lora_dtype` | COMBO | bf16 | LoRA 权重精度:bf16、fp32 |
+| `algorithm` | COMBO | lora | 训练算法(lora、lokr、oft 等) |
+| `gradient_checkpointing` | BOOLEAN | true | 训练时减少显存使用 |
+| `checkpoint_depth` | INT | 1 | 梯度检查点深度级别(1-5) |
+| `offloading` | BOOLEAN | false | 将模型卸载到内存(需要 bypass 模式) |
+| `existing_lora` | COMBO | [None] | 从现有 LoRA 继续训练 |
+| `bucket_mode` | BOOLEAN | false | 启用分辨率分桶以支持多分辨率数据集 |
+| `bypass_mode` | BOOLEAN | false | 通过钩子应用适配器而非修改权重 |
+
+### 输出
+
+| 输出 | 类型 | 描述 |
+|------|------|------|
+| `lora` | LORA_MODEL | 训练好的 LoRA 权重 |
+| `loss_map` | LOSS_MAP | 训练损失历史 |
+| `steps` | INT | 完成的总训练步数 |
+
+## Load LoRA Model 节点
+
+将训练好的 LoRA 权重应用到扩散模型。当使用来自 Train LoRA 节点的 LoRA 权重时,请使用此节点而非标准的 Load LoRA 节点。
+
+### 输入参数
+
+| 参数 | 类型 | 默认值 | 描述 |
+|------|------|--------|------|
+| `model` | MODEL | - | 基础扩散模型 |
+| `lora` | LORA_MODEL | - | 训练好的 LoRA 权重 |
+| `strength_model` | FLOAT | 1.0 | LoRA 强度(-100 到 100) |
+| `bypass` | BOOLEAN | false | 不修改基础权重直接应用 LoRA |
+
+### 输出
+
+| 输出 | 类型 | 描述 |
+|------|------|------|
+| `model` | MODEL | 应用了 LoRA 的模型 |
+
+## Save LoRA Weights 节点
+
+将训练好的 LoRA 权重导出为 safetensors 文件到输出文件夹。
+
+### 输入参数
+
+| 参数 | 类型 | 默认值 | 描述 |
+|------|------|--------|------|
+| `lora` | LORA_MODEL | - | 要保存的训练好的 LoRA 权重 |
+| `prefix` | STRING | loras/ComfyUI_trained_lora | 输出文件名前缀 |
+| `steps` | INT | (可选) | 用于文件名的训练步数 |
+
+保存的文件将命名为 `{prefix}_{steps}_steps_{counter}.safetensors` 并放置在 `ComfyUI/output/loras/` 文件夹中。
+
+## Plot Loss Graph 节点
+
+通过绘制训练步骤中的损失值来可视化训练进度。
+
+### 输入参数
+
+| 参数 | 类型 | 默认值 | 描述 |
+|------|------|--------|------|
+| `loss` | LOSS_MAP | - | 来自 Train LoRA 的损失历史 |
+| `filename_prefix` | STRING | loss_graph | 输出文件名前缀 |
+
+## 训练技巧
+
+### 显存优化
+
+- 启用 **gradient_checkpointing** 可显著减少显存使用(默认已启用)
+- 使用量化模型(FP8)时使用 **bypass_mode**
+- 启用 **offloading** 在训练期间将模型移至内存(需要 bypass_mode)
+- 如果遇到内存不足错误,请降低 **batch_size**
+
+### 数据集准备
+
+- 尽可能使用一致的图像尺寸,或启用 **bucket_mode** 进行多分辨率训练
+- 确保条件输入数量与潜空间图像数量匹配
+- 质量比数量更重要——从 10-20 张高质量图像开始
+
+### 训练参数
+
+- **rank**:大多数情况从 8-16 开始。更高的秩(32-64)可捕获更多细节但可能过拟合
+- **steps**:从 100-500 步开始,监控损失图
+- **learning_rate**:默认值 0.0005 适用于大多数情况。更低的值(0.0001)可获得更稳定的训练
+
+### 继续训练
+
+从 **existing_lora** 下拉菜单中选择现有 LoRA 以从之前保存的检查点继续训练。总步数将累积。
+
+## 支持的算法
+
+**algorithm** 参数支持多种权重适配器类型:
+
+- **lora**:标准低秩适应(推荐)
+- **lokr**:带 Kronecker 积分解的低秩适应(LoKr)
+- **oft**:正交微调
+
+## 示例:单主题 LoRA
+
+训练特定主题 LoRA 的最小工作流:
+
+1. 使用 **Load Image** 加载训练图像
+2. 使用 **VAE Encode** 将图像编码为潜空间表示
+3. 使用 **CLIP Text Encode** 创建描述文字(例如 "a photo of [subject]")
+4. 连接到 **Train LoRA** 并设置:
+ - steps: 200
+ - rank: 16
+ - learning_rate: 0.0001
+5. 使用 **Save LoRA Weights** 保存
+6. 使用 **Load LoRA Model** 连接到推理工作流进行测试
+
+
+当使用不同描述训练多张图像时,请连接多个条件输入以匹配潜空间批次大小。
+