crypto: skcipher - per-request multi-data-unit batching #965

Open
blktests-ci[bot] wants to merge 3 commits into linus-master_base from series/1111636=>linus-master
blktests-ci Bot commented Jun 15, 2026

Pull request for series with
subject: crypto: skcipher - per-request multi-data-unit batching
version: 4
url: https://patchwork.kernel.org/project/linux-block/list/?series=1111636

Leonid Ravich added 3 commits June 15, 2026 12:24
Add a data_unit_size field to struct skcipher_request that lets a
caller submit several data units (typically 512..4096-byte sectors)
sharing one starting IV in a single request.  Algorithms derive each
data unit's IV from the caller-supplied IV by treating it as a
128-bit little-endian counter and adding the data-unit index, which
matches the layout produced by dm-crypt's plain64 IV mode and by
typical inline-encryption hardware.
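
The IV derivation described above can be sketched in plain userspace C. This is only an illustration of the arithmetic, not code from the series: the helper name `mdu_derive_iv` and the `MDU_IVSIZE` macro are hypothetical (the series itself names the constant SKCIPHER_MDU_IVSIZE), and the data-unit index is assumed to fit in 64 bits.

```c
#include <stdint.h>

#define MDU_IVSIZE 16 /* hypothetical stand-in for SKCIPHER_MDU_IVSIZE */

/*
 * Hypothetical helper: treat base[] as a 128-bit little-endian integer,
 * add the data-unit index to it, and store the result in out[].
 * Carries propagate from byte 0 (least significant) upwards; a carry
 * out of byte 15 is discarded, so the counter wraps modulo 2^128.
 */
static void mdu_derive_iv(uint8_t out[MDU_IVSIZE],
                          const uint8_t base[MDU_IVSIZE],
                          uint64_t index)
{
    unsigned int carry = 0;

    for (int i = 0; i < MDU_IVSIZE; i++) {
        /* Only the low 8 bytes receive a byte of the 64-bit index. */
        unsigned int add = (i < 8) ? (uint8_t)(index >> (8 * i)) : 0;
        unsigned int sum = base[i] + add + carry;

        out[i] = (uint8_t)sum;
        carry = sum >> 8;
    }
}
```

With a base IV whose low 64 bits are all ones, adding index 1 clears those bytes and carries into byte 8, which is the case where a 128-bit counter differs from a 64-bit counter that wraps.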

This mirrors the data_unit_size concept already exposed by
struct blk_crypto_config for inline encryption.

The crypto API auto-splits a multi-data-unit request into per-DU
sub-requests when the underlying algorithm does not advertise
CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU (a type-specific cra_flags bit,
defined in crypto/internal/skcipher.h).  A consumer sets
data_unit_size and submits: a native driver handles all units in one
pass, otherwise the core splits transparently.  The split derives
per-DU IVs as a 128-bit LE counter, so this is correct only for
algorithms using that IV convention (e.g. XTS with plain64-style
IVs); callers are responsible for that match, as they already are for
the IV itself.
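
A minimal userspace sketch of the auto-split idea follows. It is not the series' implementation: the `iv128` struct, `mdu_split()`, and the `toy_xor()` "cipher" are all hypothetical names made up for illustration, the buffer is linear rather than a scatterlist, and the IV is held as two host-order 64-bit halves instead of 16 little-endian bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the two halves of a 128-bit LE IV, in host order. */
struct iv128 {
    uint64_t lo;
    uint64_t hi;
};

/* Hypothetical per-data-unit callback standing in for a single-DU cipher. */
typedef void (*unit_fn)(uint8_t *buf, size_t len, struct iv128 iv);

/*
 * Walk the buffer one data unit at a time and hand each unit to fn
 * with IV = base + unit index (128-bit addition with carry).
 * Assumes du_size is non-zero and cryptlen is a multiple of it.
 */
static void mdu_split(uint8_t *buf, size_t cryptlen, size_t du_size,
                      struct iv128 base, unit_fn fn)
{
    uint64_t idx = 0;

    for (size_t off = 0; off < cryptlen; off += du_size, idx++) {
        struct iv128 iv = base;

        iv.lo += idx;
        if (iv.lo < base.lo) /* low half wrapped: carry into high half */
            iv.hi++;
        fn(buf + off, du_size, iv);
    }
}

/* Toy stand-in for a cipher: XOR each byte with (iv.lo + offset). */
static void toy_xor(uint8_t *buf, size_t len, struct iv128 iv)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= (uint8_t)(iv.lo + i);
}
```

Running `mdu_split()` over four 512-byte units with `toy_xor` gives each unit a different keystream, because each unit sees a different derived IV.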

skcipher_request_set_tfm() resets the field to 0 so a request reused
from a pool or stack defaults to single-data-unit semantics; callers
that want batching set it explicitly via
skcipher_request_set_data_unit_size() after configuring the tfm.

crypto_skcipher_encrypt()/decrypt() call
crypto_skcipher_validate_multi_du() before any algorithm dispatch.
data_unit_size must be a power of two when non-zero (realistic sizes
are 512..4096, letting the per-DU loop and the cryptlen alignment
check use a mask instead of a divide) and cryptlen a positive
multiple of it; a malformed geometry is rejected with -EINVAL.  A
target that cannot do multi-DU - ivsize != SKCIPHER_MDU_IVSIZE (16),
an lskcipher, or an async algorithm without the native flag - is
rejected with -EOPNOTSUPP so a caller can fall back.  Async is
excluded because the splitter dispatches synchronously: an
-EINPROGRESS return would leave later units unsubmitted while the
driver still owned the request's scatterlists and IV.  The check
gates the native path too, so algorithms never see a malformed
multi-DU request.
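
The validation rules above can be approximated as follows. This is a sketch under assumptions, not the body of crypto_skcipher_validate_multi_du(): the `mdu_target` struct and `mdu_validate()` are hypothetical, and the order in which geometry and capability are checked is a guess, since the description does not state it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of the properties the validator inspects. */
struct mdu_target {
    unsigned int ivsize;
    bool is_lskcipher;
    bool is_async;
    bool native_multi_du; /* CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU set */
};

static int mdu_validate(const struct mdu_target *t,
                        unsigned int cryptlen,
                        unsigned int data_unit_size)
{
    /* Zero means ordinary single-data-unit semantics. */
    if (data_unit_size == 0)
        return 0;

    /* A power of two has exactly one bit set. */
    if (data_unit_size & (data_unit_size - 1))
        return -EINVAL;

    /* Positive multiple of the unit size, tested with a mask. */
    if (cryptlen == 0 || (cryptlen & (data_unit_size - 1)))
        return -EINVAL;

    /* Targets that cannot do multi-DU let the caller fall back. */
    if (t->ivsize != 16 || t->is_lskcipher ||
        (t->is_async && !t->native_multi_du))
        return -EOPNOTSUPP;

    return 0;
}
```

Keeping -EINVAL for malformed geometry and -EOPNOTSUPP for unsupported targets lets a caller distinguish its own bug from a condition it can handle by resubmitting per data unit.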

No in-tree algorithm sets CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU yet;
subsequent patches add the testmgr coverage and the dm-crypt
consumer.

Signed-off-by: Leonid Ravich <lravich@amazon.com>

Add a test that runs on every skcipher with ivsize == 16.  It
encrypts random plaintext two ways and compares:

  1. one batched request with skcipher_request_set_data_unit_size()
     set, over a deliberately fragmented scatterlist whose entries do
     not align to the data-unit size (so per-DU views cross SG entries
     and exercise the scatter_walk cursor), and
  2. an independent reference of N single-DU requests with IVs walked
     as a 128-bit LE counter, matching the convention documented in
     skcipher_request_set_data_unit_size().

The two must produce byte-identical ciphertext; this pins the IV
convention rather than only checking encrypt/decrypt symmetry.  The
batched ciphertext is then round-tripped back to plaintext, and the
caller IV is checked to be unchanged.  The test iterates over typical
data unit sizes (512, 1024, 2048, 4096).

Algorithms the validator rejects for multi-DU return -EOPNOTSUPP on
the first call and skip cleanly; a genuine mismatch returns -EBADMSG
so it cannot be confused with a skip.

Signed-off-by: Leonid Ravich <lravich@amazon.com>

Submit one skcipher request per bio with
skcipher_request_set_data_unit_size(req, cc->sector_size) instead of
issuing one request per sector.  This removes per-sector overhead in
the crypto API hot path: request allocation, callback dispatch,
completion handling, and SG setup.

The optimisation is enabled automatically at table load when all
of the following hold:

 - the cipher is a non-AEAD skcipher, synchronous, with tfms_count == 1;
 - the IV mode advertises sector_iv_le128, i.e. its per-sector IV
   advances as a 128-bit LE counter, matching the convention
   documented in skcipher_request_set_data_unit_size().  Only plain64
   sets it today (its 64-bit LE counter extends correctly); plain is
   excluded as its 32-bit counter wraps differently across a
   2^32-sector boundary;
 - ivsize is 16 (the core rejects other sizes with -EOPNOTSUPP);
 - the iv_gen_ops->post() hook is unset;
 - dm-integrity is not stacked (no integrity tag or integrity IV).
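
The conditions listed above amount to a single predicate evaluated at table load. The sketch below only restates them in C. The `mdu_table_params` struct and `mdu_eligible()` are hypothetical and do not correspond to real dm-crypt fields, apart from the names quoted from the description.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the table-load facts the list refers to. */
struct mdu_table_params {
    bool is_aead;            /* AEAD ciphers are excluded */
    bool is_async;           /* only synchronous ciphers qualify */
    unsigned int tfms_count; /* must be exactly 1 */
    bool sector_iv_le128;    /* IV mode advances as a 128-bit LE counter */
    unsigned int ivsize;     /* must be 16 */
    bool has_post_hook;      /* iv_gen_ops->post() must be unset */
    bool integrity_stacked;  /* dm-integrity must not be stacked */
};

/* True only when every listed condition holds. */
static bool mdu_eligible(const struct mdu_table_params *p)
{
    return !p->is_aead &&
           !p->is_async &&
           p->tfms_count == 1 &&
           p->sector_iv_le128 &&
           p->ivsize == 16 &&
           !p->has_post_hook &&
           !p->integrity_stacked;
}
```

Evaluating this once and caching the result in a flag keeps the per-bio hot path down to a single test.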

The cipher driver does not need to advertise anything: the crypto
API auto-splits multi-data-unit requests for drivers that cannot
handle them natively, so dm-crypt sees the same fast batched
submission contract regardless of the underlying driver.

A new CRYPT_MULTI_DATA_UNIT cipher_flag, set once at construction
time, gates the multi-data-unit dispatch.  The existing per-sector
path in crypt_convert_block_skcipher() is unchanged; the new
crypt_convert_block_skcipher_multi() is reached from a small
dispatch in crypt_convert() and shares the same backlog/-EBUSY/
-EINPROGRESS flow control with the per-sector path.

Heap-allocated scatterlists are stashed in dm_crypt_request and
freed in crypt_free_req_skcipher() to avoid races between the
synchronous-success free path and async-completion reuse from the
request pool.  On scatterlist allocation failure the helper returns
-EAGAIN, and the core returns -EOPNOTSUPP if a driver turns out
unable to do multi-DU; crypt_convert() handles both by clearing its
local multi_du flag and falling back to the per-sector path for the
rest of the current crypt_convert() invocation, ensuring forward progress
on the swap-out-to-dm-crypt path even under total memory exhaustion
(the per-sector path uses only cc->req_pool, a mempool with
reservoir set up at table-load time, and the inline
dmreq->sg_in[]/sg_out[] arrays, so it makes no allocation that could
fail).
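
The fallback rule can be sketched as a small dispatcher. This is an illustration only and not the crypt_convert() code: `convert_with_fallback()`, the `convert_fn` type, and the two stub functions are hypothetical, and the real path also handles backlog, -EBUSY and -EINPROGRESS, which are omitted here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature for one conversion attempt. */
typedef int (*convert_fn)(void *ctx);

/*
 * Try the multi-data-unit path while *multi_du is set.  On -EAGAIN
 * (scatterlist allocation failed) or -EOPNOTSUPP (driver cannot do
 * multi-DU), clear the caller's local flag so the rest of this
 * invocation stays on the per-sector path, then run that path.
 * Any other result from the multi-DU path is returned unchanged.
 */
static int convert_with_fallback(bool *multi_du, convert_fn multi,
                                 convert_fn per_sector, void *ctx)
{
    if (*multi_du) {
        int r = multi(ctx);

        if (r != -EAGAIN && r != -EOPNOTSUPP)
            return r;
        *multi_du = false;
    }
    return per_sector(ctx);
}

/* Stand-ins for demonstration: a failing batch path and a working one. */
static int stub_multi_nomem(void *ctx)
{
    (void)ctx;
    return -EAGAIN;
}

static int stub_per_sector_ok(void *ctx)
{
    (void)ctx;
    return 0;
}
```

Because the flag is local to the invocation, a transient allocation failure does not disable batching for later bios.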

Verified end-to-end with a byte-equivalence test: encrypted output
of plain64 dm-crypt with the multi-data-unit path matches output of
the single-data-unit path bit-for-bit over a 256 MB device, with
xts-aes-aesni driving the auto-split path.

Signed-off-by: Leonid Ravich <lravich@amazon.com>

blktests-ci Bot commented Jun 15, 2026


Upstream branch: 062871f
series: https://patchwork.kernel.org/project/linux-block/list/?series=1111636
version: 4
