Dear TorchJD maintainers,
First, thank you for this incredible work; I was able to integrate it easily into my research on multi-task learning for pre-training foundation models. I'm not sure whether this is an issue you have encountered before, or even a well-known one, but I seem to have run into the Jacobian analogue of the vanishing gradient problem. After several training steps of pre-training my foundation model with TorchJD, the Jacobian values of the tensors connected to the shared features turn into NaNs during backpropagation:
File ".local/lib/python3.10/site-packages/torchjd/autojac/mtl_backward.py", line 117, in mtl_backward
backward_transform(EmptyTensorDict())
File ".local/lib/python3.10/site-packages/torchjd/autojac/_transform/base.py", line 79, in __call__
return self.outer(intermediate)
File ".local/lib/python3.10/site-packages/torchjd/autojac/_transform/base.py", line 79, in __call__
return self.outer(intermediate)
File ".local/lib/python3.10/site-packages/torchjd/autojac/_transform/base.py", line 78, in __call__
intermediate = self.inner(input)
File ".local/lib/python3.10/site-packages/torchjd/autojac/_transform/aggregate.py", line 27, in __call__
return self.transform(input)
File ".local/lib/python3.10/site-packages/torchjd/autojac/_transform/base.py", line 79, in __call__
return self.outer(intermediate)
File ".local/lib/python3.10/site-packages/torchjd/autojac/_transform/base.py", line 78, in __call__
intermediate = self.inner(input)
File ".local/lib/python3.10/site-packages/torchjd/autojac/_transform/aggregate.py", line 49, in __call__
return self._aggregate_group(ordered_matrices, self.aggregator)
File ".local/lib/python3.10/site-packages/torchjd/autojac/_transform/aggregate.py", line 83, in _aggregate_group
united_gradient_vector = aggregator(united_jacobian_matrix)
File ".local/lib/python3.10/site-packages/torchjd/aggregation/bases.py", line 36, in __call__
return super().__call__(matrix)
File ".local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File ".local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File ".local/lib/python3.10/site-packages/torchjd/aggregation/bases.py", line 89, in forward
self._check_is_finite(matrix)
File ".local/lib/python3.10/site-packages/torchjd/aggregation/bases.py", line 23, in _check_is_finite
raise ValueError(
ValueError: Parameter `matrix` should be a tensor of finite elements (no nan, inf or -inf values). Found `matrix = tensor([[nan, nan, nan, ..., nan, nan, nan],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], device='cuda:0')`. Matrix shape: torch.Size([6, 106799]).
Note: I modified bases.py locally so that the error also reports the shape of the offending matrix when it is raised.
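For reference, the local change was roughly the following (a paraphrase, not the exact library code; the upstream check may be structured differently):

```python
# Rough paraphrase of my local edit to torchjd/aggregation/bases.py: the
# finiteness check now also reports the shape of the offending matrix.
def _check_is_finite(self, matrix):
    if not matrix.isfinite().all():
        raise ValueError(
            "Parameter `matrix` should be a tensor of finite elements (no nan, inf "
            f"or -inf values). Found `matrix = {matrix}`. Matrix shape: {matrix.shape}."
        )
```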
Have you encountered this issue before? I am not sure whether it is a bug on my end or a known problem in Jacobian descent.
During pre-training I currently use multiple contrastive losses that operate on embedding outputs at different layers of the model, i.e., intermediate losses between layers. These losses use, as the "shared features", the embeddings produced before the first loss, which is where I suspect my implementation might be wrong; a simplified sketch of the setup is below. Let me know if you have any other questions about my training setup that would help narrow down the issue, but I'm afraid I'm not allowed to share everything.
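To make the setup concrete, here is a simplified sketch of what I am doing. The module names, dimensions, and the contrastive loss are placeholders rather than my actual pre-training code, and I am passing the `losses`/`features`/`aggregator` keywords the way the documented `mtl_backward` example does, so it may not reproduce the failure exactly:

```python
import torch
import torch.nn.functional as F
from torch import nn

from torchjd import mtl_backward
from torchjd.aggregation import UPGrad

# Placeholder modules: `encoder` produces the shared embeddings, and the heads
# produce the deeper embeddings that a later contrastive loss acts on.
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
head_a = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
head_b = nn.Sequential(nn.Linear(64, 64), nn.ReLU())

def contrastive_loss(z1, z2, temperature=0.1):
    # Stand-in for my actual contrastive objective (InfoNCE-style).
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature
    targets = torch.arange(z1.shape[0])
    return F.cross_entropy(logits, targets)

params = [*encoder.parameters(), *head_a.parameters(), *head_b.parameters()]
optimizer = torch.optim.AdamW(params, lr=1e-4)
aggregator = UPGrad()

for _ in range(10):
    x = torch.randn(32, 128)
    x_aug = x + 0.01 * torch.randn_like(x)  # second "view" of the same batch

    # Embeddings produced before the first loss -- these are what I pass as
    # the shared features.
    shared = encoder(torch.cat([x, x_aug]))
    z1, z2 = shared.chunk(2)

    loss_early = contrastive_loss(z1, z2)                 # intermediate loss on the shared embeddings
    loss_late = contrastive_loss(head_a(z1), head_b(z2))  # loss on deeper embeddings

    optimizer.zero_grad()
    mtl_backward(losses=[loss_early, loss_late], features=shared, aggregator=aggregator)
    optimizer.step()
```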
Sincerely,
Matthew Chen