The CLI (Command Line Interface) provides terminal-based interaction with the program, enabling efficient and flexible execution of model training, inference, and evaluation tasks through parameterized configurations.
The WebUI (Web User Interface) offers a browser-based visual interface for model training, chatting, and deployment without coding or complex commands, making it well suited to non-technical users and rapid prototyping.
This document details the usage of CLI tools and WebUI in the ERNIE model toolkit, covering core functionalities:
- 📈 Model Fine-tuning: SFT/LoRA/DPO fine-tuning with built-in/custom datasets
- 🗣️ Chat Interaction: Load models for multi-turn conversation testing
- 📊 Performance Evaluation: Validate models on built-in/custom datasets
- 📁 Model Export: Convert trained models to deployable formats
Whether you're a developer seeking script-based customization or prefer graphical interfaces for quick experimentation, both approaches are supported.
Installation
Run in the erniekit root directory:
python -m pip install -e .
Verify installation:
erniekit help
Expected output:
------------------------------------------------------------
| Usage: |
| erniekit train -h: model finetuning |
| erniekit export -h: model export |
| erniekit split -h: model split |
| erniekit eval -h: model evaluation |
| erniekit server -h: model deployment |
| erniekit chat -h: launch a chat interface in CLI |
| erniekit webui -h: launch webui |
| erniekit version: show version info |
| erniekit help: show helping info |
------------------------------------------------------------
GPU Configuration
By default, the CLI and WebUI use all available GPUs. To restrict them to specific devices, set CUDA_VISIBLE_DEVICES (or the XPU/NPU equivalent shown below) before running the CLI or WebUI:
# Single GPU
export CUDA_VISIBLE_DEVICES=0
# Multi GPUs
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# Single XPU
export XPU_VISIBLE_DEVICES=0
# Multi XPUs
export XPU_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# Single NPU
export ASCEND_RT_VISIBLE_DEVICES=0
# Multi NPUs
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
- Note: In the Chat module, the number of GPUs exposed through CUDA_VISIBLE_DEVICES should equal tensor_parallel_degree in the config. Alternatively, you can unset CUDA_VISIBLE_DEVICES.
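
The rule in the note above can be checked before launching the Chat module. The following is a minimal sketch, not part of erniekit: the helper names `visible_device_count` and `check_chat_devices` are hypothetical, and it assumes an unset CUDA_VISIBLE_DEVICES means "all devices visible", as the note suggests.

```python
import os


def visible_device_count(env=None, var="CUDA_VISIBLE_DEVICES"):
    """Return how many devices `var` lists, or None if the variable is unset."""
    env = os.environ if env is None else env
    value = env.get(var)
    if value is None:
        return None  # unset: no restriction is applied
    return len([d for d in value.split(",") if d.strip()])


def check_chat_devices(tensor_parallel_degree, env=None):
    """True if the visible GPU count matches tensor_parallel_degree (or is unrestricted)."""
    count = visible_device_count(env)
    return count is None or count == tensor_parallel_degree


if __name__ == "__main__":
    # Hypothetical value; in practice read tensor_parallel_degree from your YAML config.
    print(check_chat_devices(tensor_parallel_degree=1))
```
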
Examples using the ERNIE-4.5-0.3B model:
# download model from huggingface
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
# Load model and start service
erniekit server examples/configs/ERNIE-4.5-0.3B/run_chat.yaml
# Launch CLI chat interface
erniekit chat examples/configs/ERNIE-4.5-0.3B/run_chat.yaml
- Note: The command-line chat for VL models supports text-only input.
# download model from huggingface
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
# Example 1: 8K seq length, SFT
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_8k.yaml
# Example 2: 32K seq length, SFT
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_32k.yaml
# Example 3: 8K seq length, SFT-LoRA
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_lora_8k.yaml
# Example 4: 32K seq length, SFT-LoRA
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_lora_32k.yaml
# download model from huggingface
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
# Example 1: 8K seq length, DPO
erniekit train examples/configs/ERNIE-4.5-0.3B/dpo/run_dpo_8k.yaml
# Example 2: 32K seq length, DPO
erniekit train examples/configs/ERNIE-4.5-0.3B/dpo/run_dpo_32k.yaml
# Example 3: 8K seq length, DPO-LoRA
erniekit train examples/configs/ERNIE-4.5-0.3B/dpo/run_dpo_lora_8k.yaml
# Example 4: 32K seq length, DPO-LoRA
erniekit train examples/configs/ERNIE-4.5-0.3B/dpo/run_dpo_lora_32k.yaml
erniekit eval examples/configs/ERNIE-4.5-0.3B/run_eval.yaml
erniekit export examples/configs/ERNIE-4.5-0.3B/run_export.yaml
NNODES={num_nodes} MASTER_ADDR={your_master_addr} MASTER_PORT={your_master_port} CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 erniekit train examples/configs/ERNIE-4.5-300B-A47B/sft/run_sft_lora_8k.yaml
Launch WebUI:
erniekit webui
# Specify port: GRADIO_SERVER_PORT=8080 erniekit webui
WebUI contains five modules: Basic Info, Training, Chat, Evaluation, and Export.
The default model name is Customization. Custom models can be loaded from a local path (relative or absolute).
For a multimodal model, select Customization_VL instead.
If the output directory is left empty, training auto-generates a path such as ./output/ERNIE-4.5-0.3B_SFT_LoRA_2025_06_29_12_03_36. Evaluation, chat, and export default to ./output.
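
The auto-generated path above appears to combine the model name, training stage, fine-tuning method, and a timestamp. The sketch below reproduces that pattern as inferred from the single example. The function `default_output_dir` and its parameters are hypothetical, and the naming for other modes (for example full-parameter training) is an assumption.

```python
from datetime import datetime


def default_output_dir(model_name, stage, fine_tuning, now=None, root="./output"):
    """Build a path shaped like ./output/<model>_<stage>_<method>_<YYYY_MM_DD_HH_MM_SS>."""
    now = now or datetime.now()
    stamp = now.strftime("%Y_%m_%d_%H_%M_%S")
    return f"{root}/{model_name}_{stage}_{fine_tuning}_{stamp}"


if __name__ == "__main__":
    print(default_output_dir("ERNIE-4.5-0.3B", "SFT", "LoRA",
                             now=datetime(2025, 6, 29, 12, 3, 36)))
    # ./output/ERNIE-4.5-0.3B_SFT_LoRA_2025_06_29_12_03_36
```
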
The GPU count is displayed as a read-only field.
| WebUI Param | Variable | Description |
|---|---|---|
| Fine-tuning | fine_tuning | LoRA or Full-parameter |
| Compute Type | compute_type | bf16, fp16, fp8 (NVIDIA H-series only), wint8, wint4/8 |
| AMP Master Grad | amp_master_grad | For AMP O2, uses fp32 weight gradients (default: keep unchanged) |
| Disable CKPT Quant | disable_ckpt_quant | Disables weight quantization |
| LoRA Rank | lora_rank | LoRA rank dimension |
| LoRA Alpha | lora_alpha | LoRA scaling factor |
| LoRA+ Scale | lora_plus_scale | LoRA B scale in LoRA+ |
| RSLoRA | rslora | Enable RSLoRA |
| WebUI Param | Variable | Description |
|---|---|---|
| Tensor Parallel | tensor_parallel_degree | Tensor parallelism degree |
| Pipeline Parallel | pipeline_parallel_degree | Pipeline parallelism degree |
| Sharding Parallel | sharding_parallel_degree | Sharding parallelism degree |
| Pipeline Config | pipeline_parallel_config | Recommended: "disable_partial_send_recv enable_clear_every_step_cache enable_delay_scale_loss enable_overlap_p2p_comm best_unbalanced_scheduler" |
| PP Seg Method | pp_seg_method | Pipeline layer segmentation |
| Sharding | sharding | Sharding stage: stage1 (optimizer), stage2 (gradients), stage3 (model) |
| Use SP Callback | use_sp_callback | Skips redundant gradient calculations |
| MoE Group | moe_group | MoE communication group ("mp" or "dummy") |
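
When choosing the parallelism degrees above, it can help to check that they divide the available device count. The sketch below assumes the common hybrid-parallel convention that the device count equals the product of the tensor, pipeline, sharding, and data parallel degrees. That convention and the helper name `data_parallel_degree` are assumptions, not something this document states.

```python
def data_parallel_degree(num_devices,
                         tensor_parallel_degree=1,
                         pipeline_parallel_degree=1,
                         sharding_parallel_degree=1):
    """Return the data-parallel degree left over after the other degrees are applied."""
    used = (tensor_parallel_degree
            * pipeline_parallel_degree
            * sharding_parallel_degree)
    if used <= 0 or num_devices % used != 0:
        raise ValueError(
            f"{num_devices} devices cannot be split evenly into groups of {used}"
        )
    return num_devices // used


if __name__ == "__main__":
    # 8 GPUs with tensor parallelism 4 leaves a data-parallel degree of 2.
    print(data_parallel_degree(8, tensor_parallel_degree=4))
```
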
Default SFT/DPO configurations for ERNIE-4.5-0.3B-Paddle are provided under "Switch SFT/DPO Presets".
After setting dataset paths/probabilities, click "Preview Dataset" for visualization. Click "Preview" to show configurations, "Start" to begin training, and "Stop" to interrupt.
| WebUI Param | Variable | Description |
|---|---|---|
| Max Sequence Length | max_seq_len | Token limit (adjust lower with larger GBS to avoid OOM) |
| Max Prompt Length | max_prompt_len | For DPO (max: max_seq_len-10) |
| Virtual Epoch Size | num_samples_each_epoch | Recommended default |
| Recompute | recompute | Gradient checkpointing to save memory |
| Training Epochs | num_train_epochs | Overridden by max_steps if both set |
| Max Steps | max_steps | Total training steps |
| Batch Size | batch_size | Micro batch size |
| Gradient Accumulation | gradient_accumulation_steps | Steps for gradient accumulation |
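
The relationships in the table above can be sketched as simple arithmetic. This is an illustration under stated assumptions: the global batch size (GBS) is taken to be micro batch size × gradient accumulation steps × number of data replicas, and a non-positive max_steps is treated as "unset". All function names are hypothetical.

```python
def global_batch_size(batch_size, gradient_accumulation_steps, data_replicas=1):
    """Samples consumed per optimizer step (assumed definition of GBS)."""
    return batch_size * gradient_accumulation_steps * data_replicas


def max_allowed_prompt_len(max_seq_len):
    """Upper bound on max_prompt_len for DPO, per the table: max_seq_len - 10."""
    return max_seq_len - 10


def total_training_steps(num_train_epochs, steps_per_epoch, max_steps=-1):
    """max_steps overrides num_train_epochs when both are set."""
    if max_steps > 0:
        return max_steps
    return num_train_epochs * steps_per_epoch


if __name__ == "__main__":
    print(global_batch_size(1, 8, data_replicas=8))      # 64
    print(max_allowed_prompt_len(8192))                  # 8182
    print(total_training_steps(3, 100, max_steps=250))   # 250
```
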
Choose built-in (demo/HuggingFace) or custom datasets (mixed by probability):
| WebUI Param | Variable | Description |
|---|---|---|
| Dataset Path | train_dataset_path | Training dataset path |
| Dataset Probability | train_dataset_prob | Sampling probability |
| Data Type | train_dataset_type | Supported: erniekit, alpaca |
- Note: Multimodal models can additionally be configured with text-only datasets, allowing for mixed training with both multimodal and text-only data. You can adjust the data ratio through a sliding window interface.
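
Mixing datasets by probability can be pictured as drawing the source of each training sample according to the configured weights. The sketch below uses Python's standard `random.choices`. It is a conceptual illustration, not erniekit's data loader, and the dataset file names are hypothetical.

```python
import random


def sample_sources(paths, probs, n, seed=0):
    """Pick the source dataset for each of n samples according to probs."""
    if len(paths) != len(probs):
        raise ValueError("paths and probs must have the same length")
    rng = random.Random(seed)  # seeded for reproducibility
    return rng.choices(paths, weights=probs, k=n)


if __name__ == "__main__":
    # Hypothetical dataset files mixed 80/20.
    picks = sample_sources(["sft_a.jsonl", "sft_b.jsonl"], [0.8, 0.2], n=10)
    print(picks)
```
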
Same options as training dataset:
| WebUI Param | Variable | Description |
|---|---|---|
| Dataset Path | eval_dataset_path | Evaluation dataset path |
| Dataset Probability | eval_dataset_prob | Sampling probability |
| Data Type | eval_dataset_type | Supported: erniekit, alpaca |
| WebUI Param | Variable | Description |
|---|---|---|
| Workers | dataloader_num_workers | Subprocess count (0 to disable) |
| Distributed | distributed_dataloader | Saves memory for large datasets |
| WebUI Param | Variable | Description |
|---|---|---|
| LR Scheduler | lr_scheduler_type | linear/cosine/polynomial/constant/constant_with_warmup |
| Learning Rate | learning_rate | Suggested: 3e-5 (SFT), 1e-6 (DPO), 3e-4 (SFT-LoRA), 1e-5 (DPO-LoRA) |
| Min LR | min_lr | For cosine scheduler only |
| Layerwise Decay | layerwise_lr_decay_bound | (0, 1], 1=no decay |
| Warmup Steps | warmup_steps | Typically 1-10% of max_steps |
| Optimizer | optim | Default: adamw |
| Offload Optim | offload_optim | Offload to CPU |
| Release Grads | release_grads | Reduces peak memory (recommended: True) |
| Loss Scaling | scale_loss | For float16 training |
| Weight Decay | weight_decay | AdamW parameter |
| Adam Epsilon | adam_epsilon | AdamW parameter |
| Adam Beta1 | adam_beta1 | AdamW parameter |
| Adam Beta2 | adam_beta2 | AdamW parameter |
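
To see how learning_rate, min_lr, and warmup_steps interact under the cosine scheduler, the sketch below implements a generic linear warmup followed by cosine decay to min_lr. It is a textbook formulation offered as an assumption. The toolkit's exact schedule may differ, and `lr_at_step` is a hypothetical name.

```python
import math


def lr_at_step(step, max_steps, warmup_steps, learning_rate, min_lr=0.0):
    """Linear warmup to learning_rate, then cosine decay down to min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        return learning_rate * step / warmup_steps
    decay_steps = max(1, max_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (learning_rate - min_lr) * cosine


if __name__ == "__main__":
    # Suggested SFT learning rate with warmup at 5% of max_steps.
    max_steps = 1000
    warmup = int(0.05 * max_steps)
    for step in (0, warmup, max_steps // 2, max_steps):
        print(step, lr_at_step(step, max_steps, warmup, 3e-5, min_lr=3e-6))
```
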
| WebUI Param | Variable | Description |
|---|---|---|
| Logging Steps | logging_steps | Log interval |
| Eval Steps | eval_steps | Evaluation interval |
| Eval Strategy | evaluation_strategy | "steps" enables periodic evaluation |
| Save Steps | save_steps | Checkpoint interval (when save_strategy=="steps") |
| Save Strategy | save_strategy | Checkpoint saving method |
| Save Limit | save_total_limit | Max checkpoints to keep |
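
The interaction between save_steps and save_total_limit can be sketched as follows. The example assumes that a checkpoint is written every save_steps steps and that the oldest checkpoints are removed once the limit is exceeded. That retention policy is an assumption modeled on common trainer behavior, and `kept_checkpoints` is a hypothetical helper.

```python
def kept_checkpoints(max_steps, save_steps, save_total_limit=None):
    """Return the steps whose checkpoints remain on disk after training."""
    saved = list(range(save_steps, max_steps + 1, save_steps))
    if save_total_limit is not None and save_total_limit > 0:
        saved = saved[-save_total_limit:]  # keep only the most recent ones
    return saved


if __name__ == "__main__":
    print(kept_checkpoints(max_steps=1000, save_steps=200, save_total_limit=3))
    # [600, 800, 1000]
```
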
Load models from Basic Info section. Click "Verify Model Loading" to check status, and "Unload" to release models.
- Note: Full-parameter checkpoints in output_dir take priority for deployment.
After successful loading:
- Enter prompts in the input box
- Set roles/system prompts
- 【VL model】 Select "Enable VL Thought Mode" to enable thinking mode
- 【VL model】 Add images or videos by drag and drop, by clicking to upload, or by entering a URL
- Click "Submit" to start chatting
- View history in "Chat History"
- "Clear" resets conversation
- "Stop" interrupts generation
| WebUI Param | Variable | Description |
|---|---|---|
| Max Length | max_model_len | Input+output token limit |
| Port | port | Service port |
| Max New Tokens | max_new_tokens | Generation limit |
| Top-p | top_p | Nucleus sampling (higher=more diverse) |
| Temperature | temperature | Controls randomness (higher=more creative) |
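
The effect of temperature and top_p can be illustrated with a small pure-Python sketch of nucleus sampling: logits are divided by the temperature, converted to probabilities, and the smallest set of top tokens whose cumulative probability reaches top_p is kept. This is the standard technique, shown for intuition only. It is not the serving backend's implementation, and `top_p_filter` is a hypothetical name.

```python
import math


def top_p_filter(logits, temperature=1.0, top_p=1.0):
    """Return {token_index: renormalized_probability} for the nucleus."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]

    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    print(top_p_filter(logits, temperature=1.0, top_p=0.9))  # keeps tokens 0 and 1
    print(top_p_filter(logits, temperature=2.0, top_p=1.0))  # flatter distribution
```

A lower top_p shrinks the candidate set, while a higher temperature flattens the distribution so that less likely tokens are sampled more often.
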
Select model in Basic Info (latest checkpoint in export dir used by default).
Choose evaluation dataset (built-in/custom). Click "Preview Eval Dataset" for visualization.
"Preview Command" shows configurations. "Start" begins evaluation, "Stop" interrupts.
| WebUI Param | Variable | Description |
|---|---|---|
| Dataset Path | eval_dataset_path | Evaluation dataset path |
| Dataset Probability | eval_dataset_prob | Sampling probability |
| Data Type | eval_dataset_type | Supported: erniekit, alpaca |
Two functions:
- LoRA weight merging
- Model weight splitting (safetensors format only)
LoRA Merging
Set the export directory to the training output directory. Click "Start Merge LoRA Weights" to merge the LoRA weights into the original model; the result is saved in export_dir/export.
Weight Splitting
For large safetensors files, click "Start Split Model" to split weights (saved in export_dir/split_export).
| WebUI Param | Variable | Description |
|---|---|---|
| Max Shard Size (GB) | max_shard_size | Split file size limit |
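
The role of max_shard_size can be pictured as a greedy grouping of tensors into files that stay under the size limit. The sketch below is a conceptual illustration, not the toolkit's splitting code. `plan_shards` is a hypothetical helper, the tensor names are made up, and treating 1 GB as 1024³ bytes is an assumption.

```python
def plan_shards(tensor_sizes, max_shard_size_gb):
    """Greedily group tensors (name -> size in bytes) into shards under the limit."""
    limit = int(max_shard_size_gb * 1024 ** 3)
    shards, current, current_bytes = [], [], 0
    for name, nbytes in tensor_sizes.items():
        # Start a new shard when adding this tensor would exceed the limit.
        if current and current_bytes + nbytes > limit:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += nbytes
    if current:
        shards.append(current)
    return shards


if __name__ == "__main__":
    gib = 1024 ** 3
    sizes = {"embed": 3 * gib, "layers.0": 3 * gib, "lm_head": 1 * gib}
    print(plan_shards(sizes, max_shard_size_gb=5))
    # [['embed'], ['layers.0', 'lm_head']]
```

A single tensor larger than the limit still gets its own shard in this sketch, since a tensor is not split across files here.
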




