This issue tracks candidate aggregators that could be added to TorchJD. Feel free to suggest other methods, or to add more information about the methods in the table. If you're interested in implementing one, please say so, so that we can investigate the method in more detail (in particular, we need to verify that it can be implemented as an aggregator) and so that we don't work concurrently on the same thing.
| Name | Ref | Stateful | Existing implementations | Special Remarks |
|---|---|---|---|---|
| CR-MOGM (Correlation-Reduced Multi-Objective Gradient Manipulation) | On the Convergence of Stochastic Multi-Objective Gradient Manipulation and Beyond (NeurIPS 2022, 76 citations) | Yes (weights EMA) | None found | Could probably be implemented as a wrapper applying to any Weighting (see the weights-EMA sketch after the table). |
| MoCo (Multi-Objective Gradient Correction) | Mitigating gradient bias in multi-objective learning (ICML 2023 oral, 95 citations) | Yes (Jacobian EMA) | LibMTL | |
| MoDo (Multi-Objective gradient with DOuble Sampling) | Three-way trade-off in multi-objective learning: Optimization, generalization and conflict-avoidance (NeurIPS 2023, 33 citations) | Yes (weights EMA) | official, LibMTL | |
| SDMGrad (Stochastic Direction-oriented Multi-objective Gradient descent) | Direction-oriented Multi-objective Learning: Simple and Provable Stochastic Algorithms (NeurIPS 2023, 50 citations) | ? | official, LibMTL | I think their implementation differs substantially from the algorithm in the paper, and I'm not sure it can be implemented purely as an aggregator. They also propose SDMGrad-OS (I don't know how it differs). This needs more investigation before we start implementing it. |
| FairGrad | Fair Resource Allocation in Multi-Task Learning (ICML 2024, 60 citations) | No | official, LibMTL | We started working on this with @PierreQuinton a while ago. I think we could reuse our old implementation quite easily, or just adapt the official one. |
| M-ConFIG | ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks (ICLR 2025 spotlight, 57 citations) | Yes | official | ConFIG is already in TorchJD, but the momentum-based variant M-ConFIG is not. I think it can be implemented as a wrapper working with any aggregator (see the Jacobian-EMA sketch after the table). |
| DB-MTL (Dual Balancing Multi-Task Learning) | Dual-balancing for multi-task learning (Neural Networks n°195, 9 citations) | Yes (Jacobian EMA) | LibMTL (official) | The paper combines three ideas: applying a log to the losses (which doesn't require TorchJD), normalizing the gradients (which should be a Normalizer rather than part of the aggregator), and using an EMA of the gradients. The last idea is shared with several other methods, so we could implement it as a wrapper (see the Jacobian-EMA sketch after the table). So in the end I don't think there is anything to implement specifically for DB-MTL. |
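For the methods whose state is just an EMA of the weights (CR-MOGM, MoDo), the wrapper idea could look roughly like the sketch below. This is a minimal, hypothetical sketch, not based on TorchJD's actual `Weighting` interface: `WeightsEMAWrapper` and `beta` are names I made up, and `weighting` is assumed to be any callable mapping an m×n Jacobian to an m-dimensional weight vector.

```python
import torch


class WeightsEMAWrapper:
    """Hypothetical wrapper: smooth the weights produced by any weighting
    scheme with an exponential moving average before combining the Jacobian
    rows. Sketch only; not TorchJD's real Weighting API."""

    def __init__(self, weighting, beta: float = 0.9):
        self.weighting = weighting  # callable: (m, n) Jacobian -> (m,) weights
        self.beta = beta
        self._ema = None  # state: running average of the weights

    def __call__(self, jacobian: torch.Tensor) -> torch.Tensor:
        weights = self.weighting(jacobian).detach()
        if self._ema is None:
            self._ema = weights.clone()
        else:
            self._ema = self.beta * self._ema + (1 - self.beta) * weights
        # Combine the Jacobian rows with the smoothed weights.
        return self._ema @ jacobian
```

If the real methods re-normalize or project the smoothed weights before use, that step would go just before the final matrix product; the point is only that the EMA state lives in the wrapper, so any weighting scheme can be plugged in.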
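Similarly, the methods whose state is an EMA of the Jacobian (MoCo, DB-MTL, and, modulo details, M-ConFIG's per-gradient momentum) might share a wrapper that smooths the Jacobian before handing it to any aggregator. Again a hedged sketch with hypothetical names, assuming an aggregator is a callable from an m×n matrix to an n-dimensional vector, and that the Jacobian shape stays fixed across steps:

```python
import torch


class JacobianEMAWrapper:
    """Hypothetical wrapper: feed an exponential moving average of the
    Jacobian to any aggregator instead of the raw Jacobian. Sketch only."""

    def __init__(self, aggregator, beta: float = 0.9):
        self.aggregator = aggregator  # callable: (m, n) matrix -> (n,) vector
        self.beta = beta
        self._ema = None  # state: running average of the Jacobian

    def __call__(self, jacobian: torch.Tensor) -> torch.Tensor:
        if self._ema is None:
            self._ema = jacobian.detach().clone()
        else:
            self._ema = self.beta * self._ema + (1 - self.beta) * jacobian.detach()
        return self.aggregator(self._ema)
```

Whether this exact shape matches each paper (e.g. MoCo's bias-correction details) would need to be checked per method, but it suggests the EMA state can be factored out of the individual aggregators.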