This PR contains the following updates:
accelerate: ==1.8.1 → ==1.12.0

Release Notes

huggingface/accelerate (accelerate)
v1.12.0: Deepspeed Ulysses/ALST (Compare Source)
Deepspeed Ulysses/ALST integration
Deepspeed Ulysses/ALST is an efficient way of training on long sequences by employing sequence parallelism and attention head parallelism. You can learn more about this technology in the paper https://arxiv.org/abs/2506.13996 or the DeepSpeed tutorial https://www.deepspeed.ai/tutorials/ulysses-alst-sequence-parallelism/.
To enable Deepspeed Ulysses, you first need to create a ParallelismConfig and set the sp-related args.
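A minimal sketch of what that setup might look like. The sp_size argument and the combination with a DeepSpeed plugin are assumptions based on the description above, not confirmed API; check the accelerate docs for the exact names.

```python
# Hedged sketch: enabling DeepSpeed Ulysses/ALST sequence parallelism.
# `sp_size` and the exact import path are assumptions; verify against the docs.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin
from accelerate.parallelism_config import ParallelismConfig

parallelism_config = ParallelismConfig(
    dp_shard_size=2,  # data-parallel degree
    sp_size=4,        # sequence-parallel (Ulysses) degree -- assumed argument name
)

accelerator = Accelerator(
    parallelism_config=parallelism_config,
    deepspeed_plugin=DeepSpeedPlugin(zero_stage=3),
)
```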
Then, you need to make sure to compute the correct loss as described in our docs:

```python
...
losses_per_rank = torch.distributed.nn.functional.all_gather(loss, group=sp_group)
good_tokens = (shift_labels != -100).view(-1).sum()
good_tokens_per_rank = torch.distributed.nn.functional.all_gather(good_tokens, group=sp_group)
total_loss = sum(
    losses_per_rank[rank] * good_tokens_per_rank[rank]
    for rank in range(sp_world_size)
    if good_tokens_per_rank[rank] > 0
)
total_good_tokens = sum(good_tokens_per_rank)
loss = total_loss / max(total_good_tokens, 1)
```

Thanks @S1ro1 for starting this work and @stas00 for finishing it. Also thanks to @kashif for adding docs and reviewing/testing this PR!
This feature will also be available in the HF Trainer thanks to this PR from @stas00: huggingface/transformers#41832
Minor changes
cpu_ram_efficient_loading by @SunMarc in #3816

New Contributors
Full Changelog: huggingface/accelerate@v1.11.0...v1.12.0
v1.11.0: TE MXFP8, FP16/BF16 with MPS, Python 3.10 (Compare Source)
TE MXFP8 support
We've added support for MXFP8 in our TransformerEngine integration. To use it, you need to set use_mxfp8_block_scaling in fp8_config. See the NVIDIA docs here: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html#MXFP8-and-block-scaling
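A rough sketch of how this might be wired up in code. The notes above only name use_mxfp8_block_scaling and fp8_config; routing the flag through a TERecipeKwargs handler is an assumption, so double-check against the FP8 docs.

```python
# Hedged sketch: MXFP8 block scaling with the TransformerEngine backend.
# `TERecipeKwargs` and `use_mxfp8_block_scaling` are assumed names; verify them.
from accelerate import Accelerator
from accelerate.utils import TERecipeKwargs

fp8_recipe = TERecipeKwargs(use_mxfp8_block_scaling=True)
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[fp8_recipe])
```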
FP16/BF16 Training for MPS devices

BF16 and FP16 support for MPS devices is finally here. You can now pass mixed_precision="fp16" or "bf16" when training on a Mac (fp16 requires torch 2.8 and bf16 requires torch 2.6).
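On an Apple-silicon machine this is just the usual mixed_precision flag; a minimal sketch (the torch version requirements are as stated above):

```python
import torch
from accelerate import Accelerator

# Minimal sketch: bf16 mixed precision on an MPS (Apple silicon) device.
# Per the notes above, fp16 needs torch 2.8+ and bf16 needs torch 2.6+.
accelerator = Accelerator(mixed_precision="bf16")

model = torch.nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

batch = torch.randn(8, 16).to(accelerator.device)
with accelerator.autocast():
    loss = model(batch).pow(2).mean()
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
```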
FSDP updates

The following PRs add support for ignored_params and no_sync() for FSDPv2, respectively.

Mixed precision can now be passed as a dtype string via the accelerate CLI flag or fsdp_config in the accelerate config file.
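A hedged Python-side sketch of the same setting. That mixed_precision_policy accepts a plain dtype string here is an assumption mirroring the CLI/config change described above; check the FSDP docs for the exact form.

```python
# Hedged sketch: passing FSDP mixed precision as a dtype string.
# The string form of `mixed_precision_policy` is assumed, not confirmed.
from accelerate import Accelerator
from accelerate.utils import FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2,
    mixed_precision_policy="bf16",
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```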
Nd-parallel updates

Some minor updates concerning nd-parallelism.
Bump to Python 3.10
We've dropped support for python 3.9 as it reached EOL in October.
Lots of minor fixes:
cpu and offloaded to meta by @Qubitium in #3796
within Accelerator.autocast() instead of __enter__() and __exit__() for more elegant style, by @EquationWalker in #3767
SWANLAB_MODE by @SunMarc in #3808

New Contributors
Full Changelog: huggingface/accelerate@v1.10.1...v1.11.0
v1.10.1: Patchfix (Compare Source)
Full Changelog: huggingface/accelerate@v1.10.0...v1.10.1
v1.10.0: N-D Parallelism (Compare Source)
N-D Parallelism
Training large models across multiple GPUs can be complex, especially when combining different parallelism strategies (e.g. TP, CP, DP). To simplify this process, we've collaborated with Axolotl to introduce an easy-to-use integration that allows you to apply any combination of parallelism strategies directly in your training script. Just pass a ParallelismConfig specifying the size of each parallelism type; it's that simple. Learn more about how it works in our latest blogpost.
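A minimal sketch of the idea; the sizes are illustrative and the import path/argument names should be checked against the docs.

```python
# Hedged sketch: combining sharded data, tensor, and context parallelism.
# dp_shard_size * tp_size * cp_size must match the number of launched processes.
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig

pc = ParallelismConfig(
    dp_shard_size=2,  # FSDP-style sharded data parallelism
    tp_size=2,        # tensor parallelism
    cp_size=2,        # context parallelism
)
accelerator = Accelerator(parallelism_config=pc)
```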
ParallelismConfig from PartialState by @SunMarc in #3720

FSDP improvements
We've fixed the ignored modules attribute. With this, it is now possible to train a PEFT model whose MoE layers contain q_proj and v_proj parameters. This is especially important for fine-tuning the gpt-oss model.

Minor improvements
New Contributors
Full Changelog: huggingface/accelerate@v1.9.0...v1.10.0
v1.9.0: Trackio support, Model loading speedup, Minor distributed improvements (Compare Source)
Trackio tracker support
We've added support for trackio, a lightweight, 💯 free experiment tracking Python library built on top of 🤗 Datasets and Spaces.
Main features are:
space_id.

To use it with accelerate, you need to set log_with and initialize the trackers. Thanks @pcuenca for the integration!
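A minimal sketch of that wiring, assuming "trackio" is the tracker name accepted by log_with; the rest is the standard accelerate tracking workflow.

```python
# Minimal sketch: logging to trackio through accelerate's tracker API.
# The "trackio" value for `log_with` is assumed from the notes above.
from accelerate import Accelerator

accelerator = Accelerator(log_with="trackio")
accelerator.init_trackers("my-project", config={"lr": 1e-3, "epochs": 3})

for step in range(10):
    accelerator.log({"train_loss": 1.0 / (step + 1)}, step=step)

accelerator.end_training()
```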
Model loading speedup when relying on set_module_tensor_to_device

Setting a tensor while clearing the cache is very slow, so we added a clear_device option to disable it. Another small optimization is using non_blocking everywhere and syncing just before returning control to the user. This makes loading slightly faster.

FSDP, Deepspeed, FP8 minor improvements
Accelerator() configuring by @pstjohn in #3677

🚨🚨🚨 Breaking changes 🚨🚨🚨
find_executable_batch_size() will no longer halve the batch size after every OOM. Instead, we will multiply the batch size by 0.9. This should help users not waste GPU capacity.
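For reference, the usual usage pattern is unchanged; only the shrink factor applied after an OOM differs. A minimal sketch:

```python
# Sketch: the decorated function is retried with a smaller batch size after an
# OOM. As of v1.9.0 the batch size shrinks by a factor of 0.9 instead of 0.5.
from accelerate import Accelerator
from accelerate.utils import find_executable_batch_size

accelerator = Accelerator()

@find_executable_batch_size(starting_batch_size=128)
def train(batch_size):
    # (re)build dataloaders and the optimizer here so the new batch size takes effect
    print(f"trying batch_size={batch_size}")

train()
```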
What's Changed

New Contributors
Full Changelog: huggingface/accelerate@v1.8.1...v1.9.0
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.