Bug report

`attention_mla.MLA` scales queries after projection:

```python
query = jnp.concatenate([q_nope, q_pe], axis=-1) * self.softmax_scale
```

However, `cudnn_jax_flash_attention` (the implementation used when `attention=cudnn_flash_jax`) also hardcodes the scale:

```python
scale=1.0 / math.sqrt(head_dim),
```

The query is therefore scaled twice, producing incorrect attention results that do not match `attention=dot_product` and the other implementations.
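A minimal standalone sketch of the mismatch (plain NumPy, not MaxText code; the `softmax` and `attention` helpers here are illustrative, with `attention` applying the scale inside the kernel the way `cudnn_jax_flash_attention` does):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, scale):
    # scale applied to the logits inside the kernel,
    # as the cudnn flash-attention path does
    logits = (q @ k.T) * scale
    return softmax(logits) @ v

rng = np.random.default_rng(0)
head_dim = 8
q = rng.normal(size=(4, head_dim))
k = rng.normal(size=(4, head_dim))
v = rng.normal(size=(4, head_dim))

softmax_scale = 1.0 / np.sqrt(head_dim)

# Reference: the scale is applied exactly once (attention=dot_product behavior).
ref = attention(q, k, v, scale=softmax_scale)

# Buggy path: the query is pre-scaled (as in MLA), and then the kernel
# scales the logits again with its hardcoded 1/sqrt(head_dim).
buggy = attention(q * softmax_scale, k, v, scale=1.0 / np.sqrt(head_dim))

print(np.allclose(ref, buggy))  # False: logits end up scaled by softmax_scale**2
```

One way to reconcile the two, assuming the MLA pre-scaling is kept, would be to pass `scale=1.0` to the cudnn kernel so the scale is applied only once.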
Logs/Output
No response
Environment Information
No response
Additional Context
No response
References:
- maxtext/src/MaxText/layers/attention_mla.py, line 804 (commit 3a17530): the query scaling in `attention_mla.MLA`
- maxtext/src/MaxText/layers/attention_op.py, line 1509 (commit 3a17530): the hardcoded scale in `cudnn_jax_flash_attention`