
Add 16bit xtensa depthwise conv kernel support#3481

Merged
veblush merged 3 commits into tensorflow:main from narrietal:Add_16x8_deptwise_conv_xtensa_opt_kernel
Mar 18, 2026
Conversation

@narrietal
Contributor

@narrietal narrietal commented Feb 16, 2026

This PR adds support for the optimized Xtensa depthwise convolution kernel when using 16-bit activations and 8-bit weights. Previously, this configuration would fall back to the reference implementation.

Changes:

  • Removed hardcoded if-else logic in the Prepare function that restricted inputs to int8 activations only
  • Removed TF_LITE_ENSURE_EQ assertion enforcing int8-only inputs
  • Renamed the existing int8 evaluation function for clarity
  • Added a new evaluation function to support int16 activations with int8 weights

bug=fixes #3484
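To illustrate the dispatch change described above, here is a minimal, hypothetical sketch of the type-based kernel selection this PR enables. The function and enum names are illustrative stand-ins, not the actual TFLite Micro symbols; the point is only that int16 activations now reach an optimized path instead of falling back to the reference implementation.

```cpp
#include <string>

// Illustrative activation types (stand-in for TfLiteType).
enum class ActivationType { kInt8, kInt16, kFloat32 };

// Hypothetical mirror of the Eval dispatch after this PR: previously,
// Prepare() asserted int8-only inputs; now both int8 and int16
// activations (with int8 weights) select an optimized Xtensa kernel.
std::string SelectDepthwiseEval(ActivationType input_type) {
  switch (input_type) {
    case ActivationType::kInt8:
      return "DepthwiseConvEvalHifiInt8";   // existing kernel, renamed for clarity
    case ActivationType::kInt16:
      return "DepthwiseConvEvalHifiInt16";  // new optimized path from this PR
    default:
      return "DepthwiseConvReferenceEval";  // other types fall back to reference
  }
}
```

The same idea applies in reverse to the removed `TF_LITE_ENSURE_EQ` check: rather than rejecting non-int8 inputs in Prepare, the kernel now routes each supported type to its own evaluation function.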

@narrietal narrietal requested a review from a team as a code owner February 16, 2026 15:39
@rameshkunasi rameshkunasi added the ci:full Triggers the comprehensive cross-platform test suite. label Mar 4, 2026
@rameshkunasi
Contributor

Hi @narrietal,

Thank you for this PR. Could you please resolve the failing test cases for the HiFi3z and Fusion F1 platforms and update the PR?

@narrietal narrietal force-pushed the Add_16x8_deptwise_conv_xtensa_opt_kernel branch from 914c0dd to dac7d79 Compare March 8, 2026 13:00
@narrietal narrietal temporarily deployed to integration-test with GitHub Actions, March 8, 2026 13:00
@narrietal
Contributor Author

@rameshkunasi I pushed a new commit which should solve the previous conflicts. Could you approve the execution of the automated test suite to verify it?

@narrietal narrietal temporarily deployed to integration-test with GitHub Actions, March 16, 2026 18:10
@narrietal
Contributor Author

@rameshkunasi I just pushed a commit with the formatted code. Could you approve the execution of the automated test?

Thanks.

@narrietal
Contributor Author

@rameshkunasi the CI/CD test pipeline went through 👍 It looks ready to be merged.

@rameshkunasi
Contributor

@unmeshna017 Can you please have a look into these changes?

@veblush veblush added this pull request to the merge queue Mar 18, 2026
Merged via the queue into tensorflow:main with commit f5302ed Mar 18, 2026
40 checks passed
@unmeshna017
Contributor

unmeshna017 commented Mar 23, 2026

Hi,

  1. In depthwise_conv_hifi.cc, the fused activation variant of the Conv2D depthwise kernel (xa_nn_conv2d_depthwise_v2_per_chan_sym8sxsym16s) can be used.
  2. In xtensa_depthwise_conv.h, the REF eval function declaration for INT16 precision (DepthwiseConvReferenceEvalInt16) can be added.

Adding @vp-cad and @joshih-cad as watchers.

@rameshkunasi
Contributor

Hi @narrietal,

Can you please create a new PR with the suggested changes?

@narrietal
Contributor Author

Hi @unmeshna017 and @rameshkunasi,

  1. I believe the current implementation is correct. The function xa_nn_conv2d_depthwise_v2_per_chan_sym8sxsym16s is called within xa_nn_conv2d_depthwise_per_chan_sym8sxsym16s. Additionally, the code includes an if–else check on the data format to ensure the appropriate function is called for each case.

  2. The file xtensa_depthwise_conv.h corresponds to the HiFi kernels, not the reference implementations. For reference kernels, the appropriate header is tensorflow/lite/micro/kernels/depthwise_conv.h. I noticed that xtensa_depthwise_conv.h declares a DepthwiseConvReferenceEvalInt8 function, which may be the source of the confusion. However, there does not appear to be any implementation of this function, suggesting it may be leftover or legacy code.

Given this, I would suggest removing the DepthwiseConvReferenceEvalInt8 declaration from xtensa_depthwise_conv.h to avoid any future misunderstanding.
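A minimal mock of the NNLib layering described in point 1, assuming the behavior narrietal reports: the non-v2 entry point checks the input data format and, for the NHWC case, forwards to the v2 implementation internally. The function names echo the real `xa_nn_conv2d_depthwise_*` symbols, but the bodies here are illustrative stand-ins, not the vendor code.

```cpp
#include <string>

// Stand-in for the fused-activation "v2" kernel variant.
std::string xa_nn_conv2d_depthwise_v2_mock() {
  return "v2_fused";
}

// Stand-in for the non-v2 entry point: per the discussion, it contains an
// if-else on the input data format and delegates to v2 when the format
// matches the path the TFLM mapping layer actually uses.
std::string xa_nn_conv2d_depthwise_mock(int inp_data_format) {
  if (inp_data_format == 0) {
    // Format 0 (as passed by the mapping layer): handled by the v2 kernel.
    return xa_nn_conv2d_depthwise_v2_mock();
  }
  return "legacy_path";  // other layouts take the older implementation
}
```

Under this structure, calling the non-v2 API with data format 0 is functionally equivalent to calling the v2 API directly, which is the core of narrietal's argument.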

@unmeshna017
Contributor

Hi @narrietal, apologies for the delayed response.

  1. In the mapping layer, we always pass the input data format as '0', which internally calls the v2 API. Additionally, if the current API call is replaced with the v2 version, the subsequent activation function call won't be required, since "v2" is a fused-activation variant of the kernel.

  2. In xtensa_depthwise_conv.h, your observation is correct; it is okay if we don't add the REF eval function declaration for INT16 precision.
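To make the fused-activation point concrete, here is a toy sketch (with made-up data, not the actual kernel code) of why calling the v2 variant directly removes the separate activation pass: clamping accumulators to the activation range inside the kernel produces the same outputs as a second clamping pass afterwards, in one traversal instead of two.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstddef>
#include <vector>

// Non-fused flow: the conv kernel writes raw accumulators, then a separate
// activation pass clamps them to [act_min, act_max].
std::vector<int32_t> ConvThenActivation(std::vector<int32_t> acc,
                                        int32_t act_min, int32_t act_max) {
  for (auto& v : acc) v = std::min(std::max(v, act_min), act_max);
  return acc;
}

// Fused flow (what the v2 variant does conceptually): the kernel clamps
// while writing each output, so no follow-up activation call is needed.
std::vector<int32_t> FusedConv(const std::vector<int32_t>& acc,
                               int32_t act_min, int32_t act_max) {
  std::vector<int32_t> out(acc.size());
  for (size_t i = 0; i < acc.size(); ++i) {
    out[i] = std::min(std::max(acc[i], act_min), act_max);
  }
  return out;
}
```

Both flows yield identical results; the fused version simply saves a pass over the output tensor, which is the efficiency argument for preferring the v2 API.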


Labels

ci:full Triggers the comprehensive cross-platform test suite.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing depthwise conv 16 bit xtensa kernel

4 participants