Bidirectional sync of AI models and datasets between HuggingFace and ModelScope
HF-MS Sync is a GitHub Action that automatically synchronizes AI models and datasets between HuggingFace Hub and ModelScope. It provides a flexible, configuration-driven approach to keeping your model and dataset repositories in sync across both platforms.
- Bidirectional Sync - Sync from HuggingFace to ModelScope, ModelScope to HuggingFace, or both directions
- Model & Dataset Support - Sync any type of repository: models, datasets, and more
- Smart Change Detection - Only transfers files that have actually changed, saving bandwidth and time
- Conflict Resolution - Configurable strategies: newer-wins, platform priority, or skip
- Multiple Triggers - Scheduled (cron), manual (workflow_dispatch), and webhook (repository_dispatch)
- Dry Run Mode - Preview what would be synced without actually transferring files
- Large File Support - Git LFS compatible, handles multi-GB model weights efficiently
- State Persistence - Tracks sync history across runs via GitHub Actions artifacts
- GitHub Job Summary - Generates detailed Markdown reports in your workflow run summary
Create config/sync_config.yaml in your repository:
global:
  sync_direction: "hf_to_ms" # hf_to_ms | ms_to_hf | bidirectional
  conflict_strategy: "newer_wins" # newer_wins | hf_priority | ms_priority | skip
  max_file_size_gb: 50
  max_parallel_downloads: 4
models:
  - name: "qwen2.5-7b"
    hf_repo_id: "Qwen/Qwen2.5-7B-Instruct"
    ms_repo_id: "Qwen/Qwen2.5-7B-Instruct"
    direction: "hf_to_ms"
    include_patterns:
      - "*.safetensors"
      - "*.json"
      - "tokenizer*"
    exclude_patterns:
      - "*.msgpack"
    enabled: true
datasets:
  - name: "my-dataset"
    hf_repo_id: "my-org/my-dataset"
    ms_repo_id: "my-org/my-dataset"
    direction: "hf_to_ms"
    enabled: true

Add these secrets to your GitHub repository (Settings > Secrets and variables > Actions):
| Secret | Description |
|---|---|
| HF_TOKEN | HuggingFace API token with write access |
| MODELSCOPE_TOKEN | ModelScope API token with write access |
Create .github/workflows/sync.yml:
name: Sync HF <-> ModelScope
on:
  schedule:
    - cron: '0 */6 * * *' # Every 6 hours
  workflow_dispatch:
    inputs:
      sync_target:
        description: 'Specific item to sync (empty = all)'
        required: false
      direction:
        description: 'Override direction'
        type: choice
        options: [config, hf_to_ms, ms_to_hf, bidirectional]
      dry_run:
        description: 'Dry run'
        type: boolean
        default: false
jobs:
  sync:
    runs-on: ubuntu-latest
    timeout-minutes: 360
    steps:
      - uses: actions/checkout@v4
      - uses: kubeservice-stack/hf-sync-action@v1
        with:
          config: config/sync_config.yaml
          direction: ${{ inputs.direction || 'config' }}
          dry_run: ${{ inputs.dry_run || 'false' }}
          target: ${{ inputs.sync_target || '' }}
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
          MODELSCOPE_TOKEN: ${{ secrets.MODELSCOPE_TOKEN }}

- uses: kubeservice-stack/hf-sync-action@v1
  with:
    config: 'config/sync_config.yaml'
    direction: 'hf_to_ms'
    dry_run: 'false'
    target: ''
    log_level: 'INFO'
  env:
    HF_TOKEN: ${{ secrets.HF_TOKEN }}
    MODELSCOPE_TOKEN: ${{ secrets.MODELSCOPE_TOKEN }}

# Install dependencies
pip install -r requirements.txt
# Run sync
HF_TOKEN=hf_xxx MODELSCOPE_TOKEN=ms_xxx \
python -m src.sync_engine \
--config config/sync_config.yaml \
--state-dir .sync_state/
# Dry run
python -m src.sync_engine --config config/sync_config.yaml --dry-run true

Trigger sync from an external service:
curl -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  https://api.github.com/repos/{owner}/{repo}/dispatches \
  -d '{
    "event_type": "sync-trigger",
    "client_payload": {
      "target": "qwen2.5-7b",
      "direction": "hf_to_ms",
      "platform": "hf"
    }
  }'

| Setting | Type | Default | Description |
|---|---|---|---|
| sync_direction | string | bidirectional | Default sync direction (hf_to_ms, ms_to_hf, bidirectional) |
| conflict_strategy | string | newer_wins | How to resolve conflicts (newer_wins, hf_priority, ms_priority, skip) |
| max_file_size_gb | float | 50 | Skip files larger than this size |
| retry_attempts | int | 3 | Number of retry attempts for failed transfers |
| retry_delay_seconds | int | 30 | Delay between retries |
| max_parallel_downloads | int | 4 | Maximum concurrent downloads |
| max_parallel_uploads | int | 2 | Maximum concurrent uploads |
| delete_orphaned | bool | false | Delete files on target that don't exist on source |
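
As a rough illustration of how retry_attempts and retry_delay_seconds could interact, here is a minimal Python sketch. The helper name with_retries is hypothetical and not taken from the project, and the sketch assumes that retry_attempts counts retries after the first failed try; the action may count attempts differently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    retry_attempts: int = 3,
    retry_delay_seconds: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying after failures (hypothetical helper).

    Assumption: one initial try plus up to `retry_attempts` retries,
    with a fixed pause of `retry_delay_seconds` between tries.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= retry_attempts:
                raise  # retries exhausted: surface the last error
            attempt += 1
            sleep(retry_delay_seconds)
```

The sleep parameter is injectable only so the delay can be replaced in tests; a transfer that keeps failing still raises its last error once the retries are used up.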
| Setting | Type | Default | Description |
|---|---|---|---|
| name | string | required | Unique identifier for this sync item |
| hf_repo_id | string | required | HuggingFace repository ID (e.g., org/model-name) |
| ms_repo_id | string | required | ModelScope repository ID |
| direction | string | global | Override global direction for this item |
| include_patterns | list | ["*"] | Glob patterns for files to include |
| exclude_patterns | list | [] | Glob patterns for files to exclude |
| enabled | bool | true | Enable/disable this sync item |
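
To make the include/exclude semantics concrete, the following Python sketch filters a file list with glob patterns. It assumes that a file is synced when it matches at least one include pattern and no exclude pattern, and that patterns are matched against the full repository path with the standard-library fnmatch; is_selected is a hypothetical name, and the action's real matching rules may differ.

```python
from fnmatch import fnmatch


def is_selected(path, include_patterns=("*",), exclude_patterns=()):
    """Return True if `path` matches an include pattern and no exclude pattern.

    Hypothetical helper; defaults mirror the documented defaults
    (include everything, exclude nothing).
    """
    included = any(fnmatch(path, pattern) for pattern in include_patterns)
    excluded = any(fnmatch(path, pattern) for pattern in exclude_patterns)
    return included and not excluded


files = [
    "model-00001-of-00002.safetensors",
    "config.json",
    "tokenizer.model",
    "tokenizer_cache.msgpack",  # matches "tokenizer*" but is excluded
    "README.md",                # matches no include pattern
]
selected = [
    f for f in files
    if is_selected(f, ["*.safetensors", "*.json", "tokenizer*"], ["*.msgpack"])
]
print(selected)
```

In this sketch an exclude pattern always takes precedence over an include pattern, which is the usual convention for this kind of filter.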
| Strategy | Behavior |
|---|---|
| newer_wins | The file with the more recent modification timestamp wins |
| hf_priority | HuggingFace version always wins on conflict |
| ms_priority | ModelScope version always wins on conflict |
| skip | Skip conflicting files (keep both versions as-is) |
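
The four strategies can be read as a small decision function. The Python sketch below is an illustration under assumptions, not the project's code: resolve_conflict is a hypothetical name, the return values "hf" and "ms" simply label the winning platform, and treating equal timestamps as "skip" is a choice made for this example.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_conflict(
    strategy: str, hf_modified: datetime, ms_modified: datetime
) -> Optional[str]:
    """Return the winning platform ("hf" or "ms"), or None to skip the file."""
    if strategy == "hf_priority":
        return "hf"
    if strategy == "ms_priority":
        return "ms"
    if strategy == "skip":
        return None
    if strategy == "newer_wins":
        if hf_modified == ms_modified:
            return None  # assumption: identical timestamps leave both as-is
        return "hf" if hf_modified > ms_modified else "ms"
    raise ValueError(f"unknown conflict_strategy: {strategy!r}")


hf_time = datetime(2024, 5, 1, tzinfo=timezone.utc)
ms_time = datetime(2024, 5, 2, tzinfo=timezone.utc)
print(resolve_conflict("newer_wins", hf_time, ms_time))  # prints "ms"
```

Raising on an unknown strategy, rather than silently falling back to a default, makes a typo in conflict_strategy visible early.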
| Input | Required | Default | Description |
|---|---|---|---|
| config | No | config/sync_config.yaml | Path to configuration file |
| direction | No | config | Override sync direction |
| dry_run | No | false | Preview changes without transferring |
| target | No | (all) | Sync only a specific item by name |
| log_level | No | INFO | Logging verbosity |
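
Direction can be set in three places: the action's direction input, an item's direction, and the global sync_direction. The Python sketch below shows one plausible precedence order. It assumes that an action input other than config overrides everything, then the item setting applies, then the global default; the README does not state this order explicitly, and effective_direction is a hypothetical name.

```python
VALID_DIRECTIONS = {"hf_to_ms", "ms_to_hf", "bidirectional"}


def effective_direction(action_input, item_direction, global_direction):
    """Pick the direction for one sync item (assumed precedence order).

    1. Action input, unless it is empty or the sentinel "config".
    2. The item's own `direction`, if set.
    3. The global `sync_direction`.
    """
    if action_input and action_input != "config":
        chosen = action_input
    elif item_direction:
        chosen = item_direction
    else:
        chosen = global_direction
    if chosen not in VALID_DIRECTIONS:
        raise ValueError(f"invalid direction: {chosen!r}")
    return chosen


# The action input "config" defers to the item, which overrides the global default.
print(effective_direction("config", "hf_to_ms", "bidirectional"))  # prints "hf_to_ms"
```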
Triggers (cron / manual / webhook)
        │
        ▼
Sync Engine (Python)
  ├── Config Loader      ← YAML configuration
  ├── Change Detector    ← Compares file snapshots between platforms
  └── Transfer Executor  ← Downloads from source, uploads to target
        │
   ┌────┴────┐
   ▼         ▼
HF Adapter  MS Adapter   ← Platform-specific API wrappers
   │         │
   ▼         ▼
HuggingFace ModelScope   ← Remote platforms
- Snapshot - Fetches file lists and metadata from both platforms
- Detect - Compares snapshots to identify new, updated, and deleted files
- Transfer - Downloads changed files from source and uploads to target
- State - Persists sync state as GitHub Actions artifacts for next run
- Report - Generates a Markdown summary in the GitHub Actions run
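
The Snapshot and Detect steps above can be sketched in Python as a comparison of two path-to-fingerprint mappings. This is a simplified illustration, not the contents of src/change_detector.py: detect_changes is a hypothetical name, and the sketch assumes each snapshot maps a file path to a content hash, whereas the real detector may compare sizes, timestamps, or other metadata.

```python
def detect_changes(source, target, delete_orphaned=False):
    """Compare two snapshots (dicts of path -> content hash).

    Returns the paths that are new on the source, updated on the source,
    and, only when `delete_orphaned` is set, present on the target alone.
    """
    new = sorted(p for p in source if p not in target)
    updated = sorted(p for p in source if p in target and source[p] != target[p])
    deleted = sorted(p for p in target if p not in source) if delete_orphaned else []
    return {"new": new, "updated": updated, "deleted": deleted}


source = {"config.json": "a1", "model.safetensors": "b2", "tokenizer.json": "c3"}
target = {"config.json": "a1", "model.safetensors": "old", "old_weights.bin": "d4"}

changes = detect_changes(source, target)
print(changes)

# After a full mirror, the target equals the source and nothing is left to do,
# which is the property the idempotency check relies on.
print(detect_changes(source, dict(source), delete_orphaned=True))
```

Unchanged files such as config.json never appear in the result, which is what keeps repeated runs cheap.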
The examples/ directory contains ready-to-use configurations and workflows for common scenarios:
| Example | Scenario |
|---|---|
| 01-hf-to-ms-mirror.yaml | One-way HF -> MS mirror for popular models |
| 02-bidirectional-sync.yaml | Bidirectional sync with conflict resolution |
| 03-multi-model-batch.yaml | Batch sync many models at once |
| 04-dataset-sync.yaml | Dataset sync (parquet, jsonl, images) |
| 05-minimal-external.yaml | Minimal config for external projects |
| 06-selective-patterns.yaml | Selective file patterns (inference-ready only) |
| Example | Scenario |
|---|---|
| simple-mirror.yml | Scheduled one-way sync with manual trigger |
| bidirectional-with-notify.yml | Bidirectional + DingTalk/Feishu notifications |
| webhook-triggered.yml | Event-driven sync via webhook |
| matrix-sync.yml | Parallel matrix sync (one job per model) |
# 1. Copy config
cp examples/05-minimal-external.yaml config/sync_config.yaml
# Edit with your repo IDs
# 2. Copy workflow
cp examples/workflows/simple-mirror.yml .github/workflows/sync.yml
# 3. Add secrets HF_TOKEN and MODELSCOPE_TOKEN in GitHub Settings
# 4. Push
git add config/ .github/workflows/sync.yml
git commit -m "Add HF-MS sync"
git push

See examples/README.md for full details.
The project includes an end-to-end test workflow that performs a real sync with a tiny model and verifies file consistency.
- Go to Actions > E2E Test - Bidirectional Sync
- Click Run workflow
- Fill in:
  - HF Repo: sshleifer/tiny-gpt2 (default, ~60MB)
  - MS Repo: your test repo ID (e.g., your-org/e2e-test)
  - Direction: hf_to_ms, ms_to_hf, or bidirectional
  - Cleanup: true to delete the MS test repo after verification
| Stage | Check |
|---|---|
| Unit tests | All 48 unit tests pass |
| Sync execution | Real file transfer between HF and MS |
| File verification | File lists match on both platforms |
| Idempotency | Second sync detects zero changes |
| Cleanup | Optionally deletes test repo |
On pull requests, the E2E workflow also runs a lightweight validation (no secrets needed):
- Config parsing and validation
- Module import checks for all adapters
git clone https://github.com/kubeservice-stack/hf-sync-action.git
cd hf-sync-action
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

pytest tests/ -v

├── .github/workflows/ # CI/CD workflows (sync, e2e-test, webhook)
├── src/
│ ├── adapters/ # Platform adapters (HF, MS)
│ ├── config.py # Configuration loading
│ ├── change_detector.py # Change detection logic
│ ├── sync_engine.py # Main sync orchestrator
│ ├── models.py # Data models
│ ├── report.py # Report generation
│ └── utils.py # Utilities
├── config/ # Default sync configuration
├── examples/ # Ready-to-use configs and workflows
│ ├── workflows/ # Example GitHub Action workflows
│ └── *.yaml # Scenario-based config examples
├── tests/ # Test suite
│ ├── e2e/ # End-to-end test scripts
│ └── test_*.py # Unit tests
├── action.yml # GitHub Action definition
├── Dockerfile # Action runtime
└── pyproject.toml # Python project config
Q: How long does syncing take?
A: Depends on the number and size of files. Initial sync of a 7B model (~14GB) typically takes 20-40 minutes. Subsequent syncs only transfer changed files.
Q: Does it support private repositories?
A: Yes, as long as your API tokens have the necessary permissions for both platforms.
Q: What happens if a sync is interrupted?
A: The sync state is persisted after each successful run. On the next run, it will only transfer files that have changed since the last successful sync.
Q: Can I sync only specific file types?
A: Yes, use include_patterns and exclude_patterns in your configuration. For example, sync only *.safetensors files.
Q: What if both platforms have different versions of a file?
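A: The conflict_strategy setting determines the behavior. newer_wins keeps the file with the more recent modification timestamp, hf_priority and ms_priority always favor one platform, and skip leaves both versions as they are.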
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request