
Add 16bit xtensa depthwise conv kernel support#3481

Merged
veblush merged 3 commits into tensorflow:main from narrietal:Add_16x8_deptwise_conv_xtensa_opt_kernel
Mar 18, 2026
Conversation

@narrietal
Contributor

@narrietal narrietal commented Feb 16, 2026

This PR adds support for the optimized Xtensa depthwise convolution kernel when using 16-bit activations and 8-bit weights. Previously, this configuration would fall back to the reference implementation.

Changes:

  • Removed hardcoded if-else logic in the Prepare function that restricted inputs to int8 activations only
  • Removed TF_LITE_ENSURE_EQ assertion enforcing int8-only inputs
  • Renamed the existing int8 evaluation function for clarity
  • Added a new evaluation function to support int16 activations with int8 weights

bug=fixes #3484
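To illustrate the dispatch change described above, here is a minimal, hypothetical sketch of the type-based kernel selection this PR enables. The function and enum names are illustrative stand-ins, not the actual TFLite Micro symbols; the point is only that int16 activations now reach an optimized path instead of falling back to the reference implementation.

```cpp
#include <string>

// Illustrative activation types (stand-in for TfLiteType).
enum class ActivationType { kInt8, kInt16, kFloat32 };

// Hypothetical mirror of the Eval dispatch after this PR: previously,
// Prepare() asserted int8-only inputs; now both int8 and int16
// activations (with int8 weights) select an optimized Xtensa kernel.
std::string SelectDepthwiseEval(ActivationType input_type) {
  switch (input_type) {
    case ActivationType::kInt8:
      return "DepthwiseConvEvalHifiInt8";   // existing kernel, renamed for clarity
    case ActivationType::kInt16:
      return "DepthwiseConvEvalHifiInt16";  // new optimized path from this PR
    default:
      return "DepthwiseConvReferenceEval";  // other types fall back to reference
  }
}
```

The same idea applies in reverse to the removed `TF_LITE_ENSURE_EQ` check: rather than rejecting non-int8 inputs in Prepare, the kernel now routes each supported type to its own evaluation function.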

@narrietal narrietal requested a review from a team as a code owner February 16, 2026 15:39
@rameshkunasi rameshkunasi added the ci:full Triggers the comprehensive cross-platform test suite. label Mar 4, 2026
@rameshkunasi
Contributor

Hi @narrietal,

Thank you for this PR. Could you please resolve the failing test cases for the HiFi3z and Fusion F1 platforms and update the PR?

@narrietal narrietal force-pushed the Add_16x8_deptwise_conv_xtensa_opt_kernel branch from 914c0dd to dac7d79 Compare March 8, 2026 13:00
@narrietal narrietal temporarily deployed to integration-test with GitHub Actions, March 8, 2026 13:00
@narrietal
Contributor Author

@rameshkunasi I pushed a new commit which should solve the previous conflicts. Could you approve the execution of the automated test suite to verify it?

@narrietal narrietal temporarily deployed to integration-test with GitHub Actions, March 16, 2026 18:10
@narrietal
Contributor Author

@rameshkunasi I just pushed a commit with the formatted code. Could you approve the execution of the automated test?

Thanks.

@narrietal
Contributor Author

@rameshkunasi the CI/CD test pipeline went through 👍 It looks ready to be merged.

@rameshkunasi
Contributor

@unmeshna017 Can you please have a look into these changes?

@veblush veblush added this pull request to the merge queue Mar 18, 2026
Merged via the queue into tensorflow:main with commit f5302ed Mar 18, 2026
40 checks passed
@unmeshna017
Contributor

unmeshna017 commented Mar 23, 2026

Hi,

  1. In depthwise_conv_hifi.cc, the fused activation variant of the Conv2D depthwise kernel (xa_nn_conv2d_depthwise_v2_per_chan_sym8sxsym16s) can be used.
  2. In xtensa_depthwise_conv.h, the REF eval function declaration for INT16 precision (DepthwiseConvReferenceEvalInt16) can be added.

Adding @vp-cad and @joshih-cad as watchers.

@rameshkunasi
Contributor

Hi @narrietal,

Can you please create a new PR with the suggested changes?

@narrietal
Contributor Author

Hi @unmeshna017 and @rameshkunasi,

  1. I believe the current implementation is correct. The function xa_nn_conv2d_depthwise_v2_per_chan_sym8sxsym16s is called within xa_nn_conv2d_depthwise_per_chan_sym8sxsym16s. Additionally, the code includes an if–else check on the data format to ensure the appropriate function is called for each case.

  2. The file xtensa_depthwise_conv.h corresponds to the HiFi kernels, not the reference implementations. For reference kernels, the appropriate header is tensorflow/lite/micro/kernels/depthwise_conv.h. I noticed that xtensa_depthwise_conv.h declares a DepthwiseConvReferenceEvalInt8 function, which may be the source of the confusion. However, there does not appear to be any implementation of this function, suggesting it may be leftover or legacy code.

Given this, I would suggest removing the DepthwiseConvReferenceEvalInt8 declaration from xtensa_depthwise_conv.h to avoid any future misunderstanding.
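A minimal mock of the NNLib layering described in point 1, assuming the behavior narrietal reports: the non-v2 entry point checks the input data format and, for the NHWC case, forwards to the v2 implementation internally. The function names echo the real `xa_nn_conv2d_depthwise_*` symbols, but the bodies here are illustrative stand-ins, not the vendor code.

```cpp
#include <string>

// Stand-in for the fused-activation "v2" kernel variant.
std::string xa_nn_conv2d_depthwise_v2_mock() {
  return "v2_fused";
}

// Stand-in for the non-v2 entry point: per the discussion, it contains an
// if-else on the input data format and delegates to v2 when the format
// matches the path the TFLM mapping layer actually uses.
std::string xa_nn_conv2d_depthwise_mock(int inp_data_format) {
  if (inp_data_format == 0) {
    // Format 0 (as passed by the mapping layer): handled by the v2 kernel.
    return xa_nn_conv2d_depthwise_v2_mock();
  }
  return "legacy_path";  // other layouts take the older implementation
}
```

Under this structure, calling the non-v2 API with data format 0 is functionally equivalent to calling the v2 API directly, which is the core of narrietal's argument.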

@unmeshna017
Contributor

Hi @narrietal, apologies for the delayed response.

  1. In the mapping layer, we always pass the input data format as '0', which internally calls the v2 API. Additionally, if the current API call is replaced with the v2 version, the subsequent activation function call won't be required, since "v2" is a fused-activation variant of the kernel.

  2. In xtensa_depthwise_conv.h, your observation is correct; it is okay if we don't add the REF eval function declaration for INT16 precision.
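To make the fused-activation point concrete, here is a toy sketch (with made-up data, not the actual kernel code) of why calling the v2 variant directly removes the separate activation pass: clamping accumulators to the activation range inside the kernel produces the same outputs as a second clamping pass afterwards, in one traversal instead of two.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstddef>
#include <vector>

// Non-fused flow: the conv kernel writes raw accumulators, then a separate
// activation pass clamps them to [act_min, act_max].
std::vector<int32_t> ConvThenActivation(std::vector<int32_t> acc,
                                        int32_t act_min, int32_t act_max) {
  for (auto& v : acc) v = std::min(std::max(v, act_min), act_max);
  return acc;
}

// Fused flow (what the v2 variant does conceptually): the kernel clamps
// while writing each output, so no follow-up activation call is needed.
std::vector<int32_t> FusedConv(const std::vector<int32_t>& acc,
                               int32_t act_min, int32_t act_max) {
  std::vector<int32_t> out(acc.size());
  for (size_t i = 0; i < acc.size(); ++i) {
    out[i] = std::min(std::max(acc[i], act_min), act_max);
  }
  return out;
}
```

Both flows yield identical results; the fused version simply saves a pass over the output tensor, which is the efficiency argument for preferring the v2 API.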


Labels

ci:full Triggers the comprehensive cross-platform test suite.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing depthwise conv 16 bit xtensa kernel

4 participants