This PR contains the following updates:
accelerate: ==1.8.1 → ==1.12.0

Release Notes

huggingface/accelerate (accelerate)
v1.12.0: Deepspeed Ulysses/ALST (Compare Source)
Deepspeed Ulysses/ALST integration
Deepspeed Ulysses/ALST is an efficient way of training on long sequences by employing sequence parallelism and attention head parallelism. You can learn more about this technology in the paper https://arxiv.org/abs/2506.13996 or the DeepSpeed tutorial https://www.deepspeed.ai/tutorials/ulysses-alst-sequence-parallelism/.
To enable Deepspeed Ulysses, you first need to create a ParallelismConfig and set the sp-related args.
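A minimal sketch of what that setup might look like. The sp_size argument and the combination with a DeepSpeed plugin are assumptions based on the description above, not confirmed API; check the accelerate docs for the exact names.

```python
# Hedged sketch: enabling DeepSpeed Ulysses/ALST sequence parallelism.
# `sp_size` and the exact import path are assumptions; verify against the docs.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin
from accelerate.parallelism_config import ParallelismConfig

parallelism_config = ParallelismConfig(
    dp_shard_size=2,  # data-parallel degree
    sp_size=4,        # sequence-parallel (Ulysses) degree -- assumed argument name
)

accelerator = Accelerator(
    parallelism_config=parallelism_config,
    deepspeed_plugin=DeepSpeedPlugin(zero_stage=3),
)
```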
Then, you need to make sure to compute the correct loss as described in our docs:

```python
...
losses_per_rank = torch.distributed.nn.functional.all_gather(loss, group=sp_group)
good_tokens = (shift_labels != -100).view(-1).sum()
good_tokens_per_rank = torch.distributed.nn.functional.all_gather(good_tokens, group=sp_group)
total_loss = sum(
    losses_per_rank[rank] * good_tokens_per_rank[rank]
    for rank in range(sp_world_size)
    if good_tokens_per_rank[rank] > 0
)
total_good_tokens = sum(good_tokens_per_rank)
loss = total_loss / max(total_good_tokens, 1)
```

Thanks @S1ro1 for starting this work and @stas00 for finishing it. Also thanks to @kashif for adding docs and reviewing/testing this PR!
This feature will also be available in the HF Trainer thanks to this PR from @stas00: huggingface/transformers#41832
Minor changes
cpu_ram_efficient_loading by @SunMarc in #3816

New Contributors
Full Changelog: huggingface/accelerate@v1.11.0...v1.12.0
v1.11.0: TE MXFP8, FP16/BF16 with MPS, Python 3.10 (Compare Source)
TE MXFP8 support
We've added support for MXFP8 in our TransformerEngine integration. To use it, you need to set use_mxfp8_block_scaling in fp8_config. See the NVIDIA docs here: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html#MXFP8-and-block-scaling
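A rough sketch of how this might be wired up in code. The notes above only name use_mxfp8_block_scaling and fp8_config; routing the flag through a TERecipeKwargs handler is an assumption, so double-check against the FP8 docs.

```python
# Hedged sketch: MXFP8 block scaling with the TransformerEngine backend.
# `TERecipeKwargs` and `use_mxfp8_block_scaling` are assumed names; verify them.
from accelerate import Accelerator
from accelerate.utils import TERecipeKwargs

fp8_recipe = TERecipeKwargs(use_mxfp8_block_scaling=True)
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[fp8_recipe])
```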
FP16/BF16 Training for MPS devices

BF16 and FP16 support for MPS devices is finally here. You can now pass mixed_precision="fp16" or "bf16" when training on a Mac (fp16 requires torch 2.8 and bf16 requires torch 2.6).
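On an Apple-silicon machine this is just the usual mixed_precision flag; a minimal sketch (the torch version requirements are as stated above):

```python
import torch
from accelerate import Accelerator

# Minimal sketch: bf16 mixed precision on an MPS (Apple silicon) device.
# Per the notes above, fp16 needs torch 2.8+ and bf16 needs torch 2.6+.
accelerator = Accelerator(mixed_precision="bf16")

model = torch.nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

batch = torch.randn(8, 16).to(accelerator.device)
with accelerator.autocast():
    loss = model(batch).pow(2).mean()
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
```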
FSDP updates

The following PRs add support for ignored_params and no_sync() for FSDPv2, respectively.

Mixed precision can now be passed as a dtype string via the accelerate CLI flag or fsdp_config in the accelerate config file.
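A hedged Python-side sketch of the same setting. That mixed_precision_policy accepts a plain dtype string here is an assumption mirroring the CLI/config change described above; check the FSDP docs for the exact form.

```python
# Hedged sketch: passing FSDP mixed precision as a dtype string.
# The string form of `mixed_precision_policy` is assumed, not confirmed.
from accelerate import Accelerator
from accelerate.utils import FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2,
    mixed_precision_policy="bf16",
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```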
Nd-parallel updates

Some minor updates concerning nd-parallelism.
Bump to Python 3.10
We've dropped support for python 3.9 as it reached EOL in October.
Lots of minor fixes:
cpu and offloaded to meta by @Qubitium in #3796
within Accelerator.autocast() instead of __enter__() and __exit__() for more elegant style, by @EquationWalker in #3767
SWANLAB_MODE by @SunMarc in #3808

New Contributors
Full Changelog: huggingface/accelerate@v1.10.1...v1.11.0
v1.10.1: Patchfix (Compare Source)
Full Changelog: huggingface/accelerate@v1.10.0...v1.10.1
v1.10.0: N-D Parallelism (Compare Source)
N-D Parallelism
Training large models across multiple GPUs can be complex, especially when combining different parallelism strategies (e.g. TP, CP, DP). To simplify this process, we've collaborated with Axolotl to introduce an easy-to-use integration that allows you to apply any combination of parallelism strategies directly in your training script. Just pass a ParallelismConfig specifying the size of each parallelism type; it's that simple. Learn more about how it works in our latest blogpost.
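A minimal sketch of the idea; the sizes are illustrative and the import path/argument names should be checked against the docs.

```python
# Hedged sketch: combining sharded data, tensor, and context parallelism.
# dp_shard_size * tp_size * cp_size must match the number of launched processes.
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig

pc = ParallelismConfig(
    dp_shard_size=2,  # FSDP-style sharded data parallelism
    tp_size=2,        # tensor parallelism
    cp_size=2,        # context parallelism
)
accelerator = Accelerator(parallelism_config=pc)
```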
ParallelismConfig from PartialState by @SunMarc in #3720

FSDP improvements
We've fixed the ignored modules attribute. With this, it is now possible to train a PEFT model whose MoE layers contain q_proj and v_proj parameters. This is especially important for fine-tuning the gpt-oss model.

Minor improvements
New Contributors
Full Changelog: huggingface/accelerate@v1.9.0...v1.10.0
v1.9.0: Trackio support, Model loading speedup, Minor distributed improvements (Compare Source)
Trackio tracker support
We've added support for trackio, a lightweight, 💯 free experiment tracking Python library built on top of 🤗 Datasets and Spaces.
Main features are:
space_id.

To use it with accelerate, you need to set log_with and initialize the trackers. Thanks @pcuenca for the integration!
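A minimal sketch of that wiring, assuming "trackio" is the tracker name accepted by log_with; the rest is the standard accelerate tracking workflow.

```python
# Minimal sketch: logging to trackio through accelerate's tracker API.
# The "trackio" value for `log_with` is assumed from the notes above.
from accelerate import Accelerator

accelerator = Accelerator(log_with="trackio")
accelerator.init_trackers("my-project", config={"lr": 1e-3, "epochs": 3})

for step in range(10):
    accelerator.log({"train_loss": 1.0 / (step + 1)}, step=step)

accelerator.end_training()
```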
Model loading speedup when relying on set_module_tensor_to_device

Setting a tensor while clearing the cache is very slow, so we added a clear_device option to disable it. Another small optimization is using non_blocking everywhere and syncing just before returning control to the user. This makes loading slightly faster.

FSDP, Deepspeed, FP8 minor improvements
Accelerator() configuring by @pstjohn in #3677

🚨🚨🚨 Breaking changes 🚨🚨🚨
find_executable_batch_size() will no longer halve the batch size after every OOM. Instead, we will multiply the batch size by 0.9. This should help users not waste GPU capacity.
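For reference, the usual usage pattern is unchanged; only the shrink factor applied after an OOM differs. A minimal sketch:

```python
# Sketch: the decorated function is retried with a smaller batch size after an
# OOM. As of v1.9.0 the batch size shrinks by a factor of 0.9 instead of 0.5.
from accelerate import Accelerator
from accelerate.utils import find_executable_batch_size

accelerator = Accelerator()

@find_executable_batch_size(starting_batch_size=128)
def train(batch_size):
    # (re)build dataloaders and the optimizer here so the new batch size takes effect
    print(f"trying batch_size={batch_size}")

train()
```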
What's Changed

New Contributors
Full Changelog: huggingface/accelerate@v1.8.1...v1.9.0
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.