From b6c22b4fb11e820e2aa763232be64a5eee58f54a Mon Sep 17 00:00:00 2001
From: manu-sj <manu.joseph@logicalclocks.com>
Date: Sun, 17 May 2026 02:18:46 +0200
Subject: [PATCH 1/4] FSTORE-1938: Document chained Transformation Functions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add Chained Transformation Functions section to the main TF user guide.
- Add chained examples to the model-dependent and on-demand transformation
  pages.
- New Transformation Functions — Performance Tuning page covering n_processes
  semantics, auto-vectorize behavior, warmup_online_workers, the thread
  safety scope, and a benchmark-derived latency reference.
- Migration guide entry covering the stricter input type validation, the
  deprecated transformation_functions= alias, and the auto-vectorize edge
  case opt-out.
- mkdocs.yml: register the new performance tuning page in the nav so it is
  reachable from the site.
---
 .../on_demand_transformations.md              | 36 ++++++++++
 .../model-dependent-transformations.md        | 36 ++++++++++
 .../fs/transformation_functions.md            | 43 ++++++++++++
 .../transformation_functions_performance.md   | 70 +++++++++++++++++++
 docs/user_guides/migration/40_migration.md    | 19 +++++
 mkdocs.yml                                    |  1 +
 6 files changed, 205 insertions(+)
 create mode 100644 docs/user_guides/fs/transformation_functions_performance.md

diff --git a/docs/user_guides/fs/feature_group/on_demand_transformations.md b/docs/user_guides/fs/feature_group/on_demand_transformations.md
index 9eadc8c45c..4a18d8685d 100644
--- a/docs/user_guides/fs/feature_group/on_demand_transformations.md
+++ b/docs/user_guides/fs/feature_group/on_demand_transformations.md
@@ -270,3 +270,39 @@ On-demand transformation functions can also be accessed and executed as normal f
             "on_demand_feature1"
         ](feature_vector["transaction_time"], datetime.now())
         ```
+
+## Chaining On-Demand Transformations
+
+On-demand transformations attached to the same feature group can be chained — one ODT's output column can serve as another ODT's input.
+The execution order is resolved automatically; the DAG is visible from the feature group overview page in the Hopsworks UI.
+
+An ODT's output column becomes a regular feature in the feature group, which a downstream feature view can consume and pass into a model-dependent transformation.
+This is the implicit cross-DAG path between ODT and MDT chains; neither side needs any extra configuration.
+
+!!! example "ODT that consumes an upstream ODT's output"
+    === "Python"
+
+        ```python
+        from hopsworks.hsfs.hopsworks_udf import udf
+
+
+        @udf(int)
+        def add_one(col):
+            return col + 1
+
+
+        @udf(int)
+        def double(col):
+            return col * 2
+
+
+        fg = fs.create_feature_group(
+            name="chained_odt_fg",
+            version=1,
+            primary_key=["id"],
+            transformation_functions=[
+                add_one("raw").alias("raw_plus_one"),
+                double("raw_plus_one").alias("raw_plus_one_doubled"),
+            ],
+        )
+        ```
diff --git a/docs/user_guides/fs/feature_view/model-dependent-transformations.md b/docs/user_guides/fs/feature_view/model-dependent-transformations.md
index bed6b1137a..7eaf691b7d 100644
--- a/docs/user_guides/fs/feature_view/model-dependent-transformations.md
+++ b/docs/user_guides/fs/feature_view/model-dependent-transformations.md
@@ -175,3 +175,39 @@ To achieve this, set the `transform` parameter to False.
         # Fetching untransformed batch data.
         untransformed_batch_data = feature_view.get_batch_data(transform=False)
         ```
+
+## Chaining Model-Dependent Transformations
+
+A model-dependent transformation can consume another MDT's output as its input.
+The DAG is resolved automatically at execution time, so producers always run before consumers.
+
+!!! example "Chaining two increments and a sum"
+    === "Python"
+
+        ```python
+        from hopsworks.hsfs.hopsworks_udf import udf
+
+
+        @udf(int)
+        def add_one(col):
+            return col + 1
+
+
+        @udf(int)
+        def add(a, b):
+            return a + b
+
+
+        fv = fs.create_feature_view(
+            name="chained_mdt_fv",
+            query=fg.select_all(),
+            transformation_functions=[
+                add_one("data1").alias("data1_plus_one"),
+                add_one("data2").alias("data2_plus_one"),
+                add("data1_plus_one", "data2_plus_one").alias("sum_plus_two"),
+            ],
+            version=1,
+        )
+        ```
+
+See [Transformation Functions — Performance Tuning][transformation-functions-performance-tuning] for `n_processes` semantics on chained DAGs.
diff --git a/docs/user_guides/fs/transformation_functions.md b/docs/user_guides/fs/transformation_functions.md
index 4e2487f3dd..d96168f938 100644
--- a/docs/user_guides/fs/transformation_functions.md
+++ b/docs/user_guides/fs/transformation_functions.md
@@ -345,3 +345,46 @@ If only the `name` is provided, then the version will default to 1.
 ## Using transformation functions
 
 Transformation functions can be used by attaching it to a feature view to [create model-dependent transformations](./feature_view/model-dependent-transformations.md) or attached to feature groups to  [create on-demand transformations](./feature_group/on_demand_transformations.md)
+
+## Chained Transformation Functions
+
+Transformation functions can be chained — the output column of one transformation function can serve as the input to another.
+Hopsworks resolves the execution order automatically using a topological sort of the resulting DAG, so dependencies always run before their consumers.
+Chaining works for both on-demand transformations attached to a feature group and model-dependent transformations attached to a feature view.
+
+!!! example "Chained MDTs on a feature view"
+    === "Python"
+
+        ```python
+        from hopsworks.hsfs.hopsworks_udf import udf
+
+
+        @udf(int)
+        def add_one(col):
+            return col + 1
+
+
+        @udf(int)
+        def add(a, b):
+            return a + b
+
+
+        fv = fs.create_feature_view(
+            name="chained_mdts_fv",
+            query=fg.select_all(),
+            transformation_functions=[
+                add_one("data1").alias("data1_plus_one"),
+                add_one("data2").alias("data2_plus_one"),
+                add("data1_plus_one", "data2_plus_one").alias("sum_plus_two"),
+            ],
+            version=1,
+        )
+        ```
+
+The DAG is visible from the Hopsworks UI on both the feature view and feature group overview pages under "Transformation execution DAG."
+The same DAG drives offline training data generation and online feature vector retrieval, so chains apply uniformly across both the offline and online paths.
+
+Cross-DAG chaining is implicit: an on-demand transformation's output column becomes a feature in its feature group, which a feature view can consume and feed into a model-dependent transformation.
+No additional setup is required.
+
+For tuning parallelism, see [Transformation Functions — Performance Tuning][transformation-functions-performance-tuning].
diff --git a/docs/user_guides/fs/transformation_functions_performance.md b/docs/user_guides/fs/transformation_functions_performance.md
new file mode 100644
index 0000000000..400a6dd535
--- /dev/null
+++ b/docs/user_guides/fs/transformation_functions_performance.md
@@ -0,0 +1,70 @@
+# Transformation Functions — Performance Tuning
+
+This page covers how to tune transformation function execution for offline and online workloads, when to set `n_processes`, and the latency trade-offs of the different paths.
+
+## When parallelism helps
+
+Transformation function execution is sequential by default.
+A worker pool is spawned only when the workload justifies the overhead — small DAGs and small inputs run faster sequentially because pool spawn and shared-memory setup cost more than the work itself.
+
+Hopsworks applies these defaults when `n_processes` is not provided:
+
+- `dict` or `list[dict]` input (single online vector, online batch): sequential.
+- DataFrame input with fewer than 10 000 rows or fewer than two TFs in the chain: sequential.
+- DataFrame input large enough or chain wide enough: parallel, capped at the DAG's maximum width.
+- Spark DataFrames: ignored; the DAG is pushed down to Spark.
+
+Callers can always force a specific value by passing `n_processes` explicitly.
+
+## Online single vector with `n_processes`
+
+`get_feature_vector(entry)` and `get_feature_vectors(entries)` accept `n_processes`.
+The interpretation depends on the call shape and the declared UDF execution mode:
+
+- `get_feature_vector(entry)` with `n_processes >= 2`: DAG-node parallelism. Independent transformations within the chain run in parallel. If the DAG is purely linear (no fan-out), the engine logs a DEBUG message and falls through to the sequential path — there is nothing to parallelize.
+- `get_feature_vectors(entries)` with all Pandas-mode UDFs: the engine auto-vectorizes — `list[dict]` rows are converted to a small DataFrame and routed through the dataframe path so each UDF receives a Series rather than scalar values. Set `HSFS_DISABLE_PANDAS_AUTO_VECTORIZE=1` to opt out.
+- `get_feature_vectors(entries)` with any Python-mode UDF in the chain: rows are chunked across `n_processes` workers, each worker running the chain sequentially on its slice.
+
+## Warming the worker pool
+
+The first call that requires the pool pays process spawn and engine-init latency (tens to hundreds of milliseconds depending on platform and imported modules).
+Pre-spawn the pool at deployment startup to avoid that latency on the first user request:
+
+```python
+from hopsworks.hsfs.core.transformation_function_engine import (
+    TransformationFunctionEngine,
+)
+
+
+TransformationFunctionEngine.warmup_online_workers(n_processes=4)
+```
+
+After a worker failure or `BrokenProcessPool` the engine resets the pool so the next call rebuilds a fresh one — call `warmup_online_workers` again if you need to amortize the spawn cost across requests.
+
+## Thread safety
+
+The Hopsworks Python client is not currently thread-safe.
+Do not share `FeatureView` or `FeatureGroup` instances across threads when calling `get_feature_vector(s)` with `n_processes >= 2`.
+Use one client per thread, or serialize access.
+This is a known limitation tracked separately.
+
+## Latency reference
+
+The chained-TF online latency benchmark in the `loadtest` repository records p50/p95/p99 across UDF styles and call shapes.
+Absolute latency depends strongly on host CPU and UDF cost, so this guide does not publish fixed numbers — running the benchmark against your own deployment gives a meaningful baseline.
+
+Run the benchmark with:
+
+```bash
+pytest -m e2e_performance \
+  tests/performance/feature_store/python_driver/test_online_batch_chaining_benchmark.py
+```
+
+The benchmark sweeps four UDF styles (vectorized Pandas, fake Pandas with an internal loop, scalar Python, and CPU-heavy Python), two batch sizes (1 and 100), and two `n_processes` values (1 and 2).
+It writes `online_batch_chaining_benchmark.csv` with p50/p95/p99 for each cell.
+Consult that CSV when tuning `n_processes` for your deployment.
+
+Two patterns hold across deployments and are worth knowing up front:
+
+- For vectorized Pandas UDFs at small batch sizes, sequential is at least as fast as parallel — the framework auto-vectorize wins, pool overhead loses.
+- For CPU-heavy Python UDFs in a chain with parallelism (diamond DAGs, multi-output, batch sizes above ~10), `n_processes >= 2` typically delivers a meaningful p99 reduction.
diff --git a/docs/user_guides/migration/40_migration.md b/docs/user_guides/migration/40_migration.md
index 58268d31f6..4a7d01c556 100644
--- a/docs/user_guides/migration/40_migration.md
+++ b/docs/user_guides/migration/40_migration.md
@@ -110,3 +110,22 @@ The following is how transformation functions were used in previous versions of
     ```
 
 Note that the number of lines of code required has been significantly reduced using the “@hopsworks.udf” python decorator.
+
+## FSTORE-1938 — Chained Transformation Functions
+
+This release adds support for chained transformation functions: a transformation function's output column can serve as the input to another transformation function in the same feature group or feature view.
+The DAG is resolved automatically by topological sort at execution time.
+
+The following behaviors changed and may affect existing pipelines:
+
+- Stricter validation of transformation function input types on feature view create or update.
+  Pre-FSTORE-1938 versions stored an empty string when an input feature's type could not be resolved.
+  This release validates types strictly on create and update and raises `TRANSFORMATION_FUNCTION_INPUT_TYPE_UNRESOLVABLE` if a typoed or missing feature reference produces an empty type.
+  Read paths continue to tolerate empty types so existing detail pages do not break on upgrade.
+- The internal `transformation_functions=` keyword argument on `TransformationFunctionEngine.apply_transformation_functions` is deprecated in favor of `execution_graph=`.
+  The old name still works and emits a `DeprecationWarning`; passing both at once raises `FeatureStoreException`.
+- Rare edge case for auto-vectorization on `get_feature_vectors`: a Pandas-declared UDF that branches on `len(series)` may behave differently when the framework converts a single-row batch into a 1-element Series.
+  Set `HSFS_DISABLE_PANDAS_AUTO_VECTORIZE=1` to keep the previous per-scalar behavior for affected UDFs.
+
+No minimum backend version is required for non-chained usage — existing feature views without chains continue to work unchanged after upgrade.
+Creating a chained feature view requires both the backend and the SDK to be on this release.
diff --git a/mkdocs.yml b/mkdocs.yml
index f8dc7bf80f..fee553fd8c 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -113,6 +113,7 @@ nav:
               - Feature Logging: user_guides/fs/feature_view/feature_logging.md
           - Vector Similarity Search: user_guides/fs/vector_similarity_search.md
           - Transformation Functions: user_guides/fs/transformation_functions.md
+          - Transformation Functions — Performance Tuning: user_guides/fs/transformation_functions_performance.md
           - Compute Engines: user_guides/fs/compute_engines.md
           - Client Integrations:
               - user_guides/integrations/index.md

From 3fbfd43e0dd07a20615bf4314ecab6c02e66a2bc Mon Sep 17 00:00:00 2001
From: manu-sj <manu.joseph@logicalclocks.com>
Date: Mon, 18 May 2026 22:27:40 +0200
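Subject: [PATCH 2/4] FSTORE-1938: Fixes and improvements to chained TF docs

- Import udf from hopsworks in the chained examples.
- Drop the dash from the performance page name in the page title,
  cross-references, and mkdocs.yml nav.
- Performance page: replace the online-only n_processes section with
  "Where n_processes takes effect"; remove the worker-pool warmup and
  thread safety sections.
- Migration guide: stricter input type validation also covers feature
  groups; remove the deprecated transformation_functions= alias entry.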
---
 .../on_demand_transformations.md              | 11 +++-
 .../model-dependent-transformations.md        | 11 +++-
 .../fs/transformation_functions.md            |  7 ++-
 .../transformation_functions_performance.md   | 58 ++++++++-----------
 docs/user_guides/migration/40_migration.md    |  6 +-
 mkdocs.yml                                    |  2 +-
 6 files changed, 51 insertions(+), 44 deletions(-)

diff --git a/docs/user_guides/fs/feature_group/on_demand_transformations.md b/docs/user_guides/fs/feature_group/on_demand_transformations.md
index 4a18d8685d..28558dd8d8 100644
--- a/docs/user_guides/fs/feature_group/on_demand_transformations.md
+++ b/docs/user_guides/fs/feature_group/on_demand_transformations.md
@@ -43,6 +43,7 @@ If no feature names are provided, the transformation function will default to us
             event_time="event_time",
             transformation_functions=[transaction_age, stripped_strings],
         )
+
         ```
 
 ### Specifying input features
@@ -64,6 +65,7 @@ The features to be used by the on-demand transformation function can be specifie
                 age_transaction("transaction_time", "current_time")
             ],
         )
+
         ```
 
 ## Usage
@@ -103,6 +105,7 @@ These on-demand features are equivalent to regular features, and [model-dependen
                 min_max_scaler("on_demand_feature3"),
             ],
         )
+
         ```
 
 ### Computing on-demand features
@@ -134,6 +137,7 @@ The on-demand features in the feature vector can be computed using real-time dat
                 "current_time": datetime.now(),
             },
         )
+
         ```
 
 #### Retrieving feature vectors
@@ -168,6 +172,7 @@ The `request_parameter` in this case, can be a list of dictionaries that specifi
                 "current_time": datetime.now(),
             },
         )
+
         ```
 
 #### Retrieving feature vector without on-demand features
@@ -185,6 +190,7 @@ To achieve this, set the  parameters `transform` and `on_demand_features` to `Fa
         untransformed_feature_vectors = feature_view.get_feature_vectors(
             entry=[{"id": 1}, {"id": 2}], transform=False, on_demand_features=False
         )
+
         ```
 
 #### Compute all on-demand features
@@ -247,6 +253,7 @@ The `request_parameter` in this case, can be a list of dictionaries that specifi
 
         # Applying model dependent transformations
         encoded_feature_vector = fv.transform(feature_vectors_with_on_demand_features)
+
         ```
 
 #### Compute one on-demand feature
@@ -269,6 +276,7 @@ On-demand transformation functions can also be accessed and executed as normal f
         feature_vector["on_demand_feature1"] = fv.on_demand_transformations[
             "on_demand_feature1"
         ](feature_vector["transaction_time"], datetime.now())
+
         ```
 
 ## Chaining On-Demand Transformations
@@ -283,7 +291,7 @@ This is the implicit cross-DAG path between ODT and MDT chains: nothing extra to
     === "Python"
 
         ```python
-        from hopsworks.hsfs.hopsworks_udf import udf
+        from hopsworks import udf
 
 
         @udf(int)
@@ -305,4 +313,5 @@ This is the implicit cross-DAG path between ODT and MDT chains: nothing extra to
                 double("raw_plus_one").alias("raw_plus_one_doubled"),
             ],
         )
+
         ```
diff --git a/docs/user_guides/fs/feature_view/model-dependent-transformations.md b/docs/user_guides/fs/feature_view/model-dependent-transformations.md
index 7eaf691b7d..77f8237664 100644
--- a/docs/user_guides/fs/feature_view/model-dependent-transformations.md
+++ b/docs/user_guides/fs/feature_view/model-dependent-transformations.md
@@ -54,6 +54,7 @@ Additionally, Hopsworks also allows users to specify custom names for transforme
             labels=["fraud_label"],
             transformation_functions=[add_two, add_one_multiple],
         )
+
         ```
 
 ### Specifying input features
@@ -74,6 +75,7 @@ The features to be used by a model-dependent transformation function can be spec
                 add_one_multiple("feature_5", "feature_6", "feature_7"),
             ],
         )
+
         ```
 
 ### Using built-in transformations
@@ -101,6 +103,7 @@ The only difference is that they can either be retrieved from the Hopsworks or i
                 standard_scaler("age_at_transaction"),
             ],
         )
+
         ```
 
 To attach built-in transformation functions from the `hopsworks` module they can be directly imported into the code from `hopsworks.builtin_transformations`.
@@ -127,6 +130,7 @@ To attach built-in transformation functions from the `hopsworks` module they can
                 standard_scaler("age_at_transaction"),
             ],
         )
+
         ```
 
 ## Using Model Dependent Transformations
@@ -151,6 +155,7 @@ Model-dependent transformation functions can also be manually applied to a featu
 
         # Apply Model Dependent transformations
         encoded_feature_vector = fv.transform(feature_vector)
+
         ```
 
 ### Retrieving untransformed feature vector and batch inference data
@@ -174,6 +179,7 @@ To achieve this, set the `transform` parameter to False.
 
         # Fetching untransformed batch data.
         untransformed_batch_data = feature_view.get_batch_data(transform=False)
+
         ```
 
 ## Chaining Model-Dependent Transformations
@@ -185,7 +191,7 @@ The DAG is resolved automatically at execution time, so producers always run bef
     === "Python"
 
         ```python
-        from hopsworks.hsfs.hopsworks_udf import udf
+        from hopsworks import udf
 
 
         @udf(int)
@@ -208,6 +214,7 @@ The DAG is resolved automatically at execution time, so producers always run bef
             ],
             version=1,
         )
+
         ```
 
-See [Transformation Functions — Performance Tuning][transformation-functions-performance-tuning] for `n_processes` semantics on chained DAGs.
+See [Transformation Functions Performance Tuning][transformation-functions-performance-tuning] for `n_processes` semantics on chained DAGs.
diff --git a/docs/user_guides/fs/transformation_functions.md b/docs/user_guides/fs/transformation_functions.md
index d96168f938..471cf41bf2 100644
--- a/docs/user_guides/fs/transformation_functions.md
+++ b/docs/user_guides/fs/transformation_functions.md
@@ -319,6 +319,7 @@ The save function will throw an error if another transformation function with th
             transformation_function=add_one, version=1
         )
         plus_one_meta.save()
+
         ```
 
 ## Retrieval from the Feature Store
@@ -340,6 +341,7 @@ If only the `name` is provided, then the version will default to 1.
 
         # get transformation function by name and version.
         plus_one_fn = fs.get_transformation_function(name="plus_one", version=2)
+
         ```
 
 ## Using transformation functions
@@ -356,7 +358,7 @@ Chaining works for both on-demand transformations attached to a feature group an
     === "Python"
 
         ```python
-        from hopsworks.hsfs.hopsworks_udf import udf
+        from hopsworks import udf
 
 
         @udf(int)
@@ -379,6 +381,7 @@ Chaining works for both on-demand transformations attached to a feature group an
             ],
             version=1,
         )
+
         ```
 
 The DAG is visible from the Hopsworks UI on both the feature view and feature group overview pages under "Transformation execution DAG."
@@ -387,4 +390,4 @@ The same DAG drives offline training data generation and online feature vector r
 Cross-DAG chaining is implicit: an on-demand transformation's output column becomes a feature in its feature group, which a feature view can consume and feed into a model-dependent transformation.
 No additional setup is required.
 
-For tuning parallelism, see [Transformation Functions — Performance Tuning][transformation-functions-performance-tuning].
+For tuning parallelism, see [Transformation Functions Performance Tuning][transformation-functions-performance-tuning].
diff --git a/docs/user_guides/fs/transformation_functions_performance.md b/docs/user_guides/fs/transformation_functions_performance.md
index 400a6dd535..5f1537dcb5 100644
--- a/docs/user_guides/fs/transformation_functions_performance.md
+++ b/docs/user_guides/fs/transformation_functions_performance.md
@@ -1,4 +1,4 @@
-# Transformation Functions — Performance Tuning
+# Transformation Functions Performance Tuning
 
 This page covers how to tune transformation function execution for offline and online workloads, when to set `n_processes`, and the latency trade-offs of the different paths.
 
@@ -9,48 +9,34 @@ A worker pool is spawned only when the workload justifies the overhead — small
 
 Hopsworks applies these defaults when `n_processes` is not provided:
 
-- `dict` or `list[dict]` input (single online vector, online batch): sequential.
-- DataFrame input with fewer than 10 000 rows or fewer than two TFs in the chain: sequential.
+- DataFrame input with fewer than 10000 rows or fewer than two TFs in the chain: sequential.
 - DataFrame input large enough or chain wide enough: parallel, capped at the DAG's maximum width.
-- Spark DataFrames: ignored; the DAG is pushed down to Spark.
+- `dict` or `list[dict]` input (online vector and batch paths): sequential by default.
+  Per-row UDF work is usually cheaper than process-pool overhead, so the engine keeps the safe default and lets the caller opt in.
+- Spark DataFrames: `n_processes` is ignored; the DAG is pushed down to Spark.
 
 Callers can always force a specific value by passing `n_processes` explicitly.
 
-## Online single vector with `n_processes`
+## Where `n_processes` takes effect
 
-`get_feature_vector(entry)` and `get_feature_vectors(entries)` accept `n_processes`.
-The interpretation depends on the call shape and the declared UDF execution mode:
+`n_processes` is honored on every Python-engine entry point — DataFrames, single dicts, and lists of dicts — but the parallelism axis differs by input shape:
 
-- `get_feature_vector(entry)` with `n_processes >= 2`: DAG-node parallelism. Independent transformations within the chain run in parallel. If the DAG is purely linear (no fan-out), the engine logs a DEBUG message and falls through to the sequential path — there is nothing to parallelize.
-- `get_feature_vectors(entries)` with all Pandas-mode UDFs: the engine auto-vectorizes — `list[dict]` rows are converted to a small DataFrame and routed through the dataframe path so each UDF receives a Series rather than scalar values. Set `HSFS_DISABLE_PANDAS_AUTO_VECTORIZE=1` to opt out.
-- `get_feature_vectors(entries)` with any Python-mode UDF in the chain: rows are chunked across `n_processes` workers, each worker running the chain sequentially on its slice.
+- **DataFrame input.**
+  Independent TFs in the DAG run concurrently in the process pool; dependent TFs are submitted as soon as their predecessors complete.
+  This is the offline path used by `training_data`, `get_batch_data`, and `execute_mdts(dataframe)`.
+- **Single dict (`get_feature_vector(entry)`).**
+  When the chain has independent branches (`max_parallelism >= 2`), the engine submits those branches concurrently for the single row.
+  A strictly linear chain has nothing to parallelize and the call falls through to the sequential path.
+- **List of dicts (`get_feature_vectors(entries)`).**
+  Rows are chunked across workers; each worker runs the chain sequentially on its slice.
+  This is the most useful parallelism axis for batched online calls with CPU-heavy chained UDFs.
 
-## Warming the worker pool
-
-The first call that requires the pool pays process spawn and engine-init latency (tens to hundreds of milliseconds depending on platform and imported modules).
-Pre-spawn the pool at deployment startup to avoid that latency on the first user request:
-
-```python
-from hopsworks.hsfs.core.transformation_function_engine import (
-    TransformationFunctionEngine,
-)
-
-
-TransformationFunctionEngine.warmup_online_workers(n_processes=4)
-```
-
-After a worker failure or `BrokenProcessPool` the engine resets the pool so the next call rebuilds a fresh one — call `warmup_online_workers` again if you need to amortize the spawn cost across requests.
-
-## Thread safety
-
-The Hopsworks Python client is not currently thread-safe.
-Do not share `FeatureView` or `FeatureGroup` instances across threads when calling `get_feature_vector(s)` with `n_processes >= 2`.
-Use one client per thread, or serialize access.
-This is a known limitation tracked separately.
+For `get_feature_vectors(entries)` with all-Pandas-mode UDFs, the engine auto-vectorizes the batch into a DataFrame so the DataFrame path applies instead.
+Set `HSFS_DISABLE_PANDAS_AUTO_VECTORIZE=1` to opt out and force the per-row dict path.
 
 ## Latency reference
 
-The chained-TF online latency benchmark in the `loadtest` repository records p50/p95/p99 across UDF styles and call shapes.
+The chained-TF online latency benchmark in the `loadtest` repository records p50, p95, and p99 across UDF styles and call shapes.
 Absolute latency depends strongly on host CPU and UDF cost, so this guide does not publish fixed numbers — running the benchmark against your own deployment gives a meaningful baseline.
 
 Run the benchmark with:
@@ -61,10 +47,12 @@ pytest -m e2e_performance \
 ```
 
 The benchmark sweeps four UDF styles (vectorized Pandas, fake Pandas with an internal loop, scalar Python, and CPU-heavy Python), two batch sizes (1 and 100), and two `n_processes` values (1 and 2).
-It writes `online_batch_chaining_benchmark.csv` with p50/p95/p99 for each cell.
+It writes `online_batch_chaining_benchmark.csv` with p50, p95, and p99 for each cell.
 Consult that CSV when tuning `n_processes` for your deployment.
 
 Two patterns hold across deployments and are worth knowing up front:
 
-- For vectorized Pandas UDFs at small batch sizes, sequential is at least as fast as parallel — the framework auto-vectorize wins, pool overhead loses.
+- For vectorized Pandas UDFs at small batch sizes, sequential is at least as fast as parallel.
+  The framework's auto-vectorization wins; the pool overhead loses.
 - For CPU-heavy Python UDFs in a chain with parallelism (diamond DAGs, multi-output, batch sizes above ~10), `n_processes >= 2` typically delivers a meaningful p99 reduction.
+  For a chained single-vector call with two or more independent branches, setting `n_processes=2` overlaps those branches and reduces tail latency on CPU-heavy UDFs; if the branches dominate the chain runtime, the speedup approaches the number of branches that run concurrently (at most `n_processes`).
diff --git a/docs/user_guides/migration/40_migration.md b/docs/user_guides/migration/40_migration.md
index 4a7d01c556..3ce44403b3 100644
--- a/docs/user_guides/migration/40_migration.md
+++ b/docs/user_guides/migration/40_migration.md
@@ -81,6 +81,7 @@ The following is how transformation functions were used in previous versions of
         },
         labels=["target"],
     )
+
     ```
 
 === "4.0"
@@ -107,6 +108,7 @@ The following is how transformation functions were used in previous versions of
         ],
         labels=["target"],
     )
+
     ```
 
 Note that the number of lines of code required has been significantly reduced using the “@hopsworks.udf” python decorator.
@@ -118,12 +120,10 @@ The DAG is resolved automatically by topological sort at execution time.
 
 The following behaviors changed and may affect existing pipelines:
 
-- Stricter validation of transformation function input types on feature view create or update.
+- Stricter validation of transformation function input types on feature view and feature group create or update.
   Pre-FSTORE-1938 versions stored an empty string when an input feature's type could not be resolved.
   This release validates types strictly on create and update and raises `TRANSFORMATION_FUNCTION_INPUT_TYPE_UNRESOLVABLE` if a typoed or missing feature reference produces an empty type.
   Read paths continue to tolerate empty types so existing detail pages do not break on upgrade.
-- The internal `transformation_functions=` keyword argument on `TransformationFunctionEngine.apply_transformation_functions` is deprecated in favor of `execution_graph=`.
-  The old name still works and emits a `DeprecationWarning`; passing both at once raises `FeatureStoreException`.
 - Rare edge case for auto-vectorization on `get_feature_vectors`: a Pandas-declared UDF that branches on `len(series)` may behave differently when the framework converts a single-row batch into a 1-element Series.
   Set `HSFS_DISABLE_PANDAS_AUTO_VECTORIZE=1` to keep the previous per-scalar behavior for affected UDFs.
 
diff --git a/mkdocs.yml b/mkdocs.yml
index fee553fd8c..408a57de7e 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -113,7 +113,7 @@ nav:
               - Feature Logging: user_guides/fs/feature_view/feature_logging.md
           - Vector Similarity Search: user_guides/fs/vector_similarity_search.md
           - Transformation Functions: user_guides/fs/transformation_functions.md
-          - Transformation Functions — Performance Tuning: user_guides/fs/transformation_functions_performance.md
+          - Transformation Functions Performance Tuning: user_guides/fs/transformation_functions_performance.md
           - Compute Engines: user_guides/fs/compute_engines.md
           - Client Integrations:
               - user_guides/integrations/index.md

From a984112d25f67b81d9c39f1a347a10da1b6966ea Mon Sep 17 00:00:00 2001
From: manu-sj <manu.joseph@logicalclocks.com>
Date: Thu, 21 May 2026 15:14:40 +0200
Subject: [PATCH 3/4] [FSTORE-1938] snakeoil: strip trailing blanks inside
 Python code blocks

`hopsworks-docs snakeoil` strips trailing blank lines inside fenced
Python code blocks after running the formatter, then checks that the
docs tree is clean. Drops the stray blank line before the closing
fence in 21 code blocks across the four docs files touched by
FSTORE-1938 so the check passes. No content change.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../fs/feature_group/on_demand_transformations.md        | 9 ---------
 .../fs/feature_view/model-dependent-transformations.md   | 7 -------
 docs/user_guides/fs/transformation_functions.md          | 3 ---
 docs/user_guides/migration/40_migration.md               | 2 --
 4 files changed, 21 deletions(-)

diff --git a/docs/user_guides/fs/feature_group/on_demand_transformations.md b/docs/user_guides/fs/feature_group/on_demand_transformations.md
index 28558dd8d8..9e883e86af 100644
--- a/docs/user_guides/fs/feature_group/on_demand_transformations.md
+++ b/docs/user_guides/fs/feature_group/on_demand_transformations.md
@@ -43,7 +43,6 @@ If no feature names are provided, the transformation function will default to us
             event_time="event_time",
             transformation_functions=[transaction_age, stripped_strings],
         )
-
         ```
 
 ### Specifying input features
@@ -65,7 +64,6 @@ The features to be used by the on-demand transformation function can be specifie
                 age_transaction("transaction_time", "current_time")
             ],
         )
-
         ```
 
 ## Usage
@@ -105,7 +103,6 @@ These on-demand features are equivalent to regular features, and [model-dependen
                 min_max_scaler("on_demand_feature3"),
             ],
         )
-
         ```
 
 ### Computing on-demand features
@@ -137,7 +134,6 @@ The on-demand features in the feature vector can be computed using real-time dat
                 "current_time": datetime.now(),
             },
         )
-
         ```
 
 #### Retrieving feature vectors
@@ -172,7 +168,6 @@ The `request_parameter` in this case, can be a list of dictionaries that specifi
                 "current_time": datetime.now(),
             },
         )
-
         ```
 
 #### Retrieving feature vector without on-demand features
@@ -190,7 +185,6 @@ To achieve this, set the  parameters `transform` and `on_demand_features` to `Fa
         untransformed_feature_vectors = feature_view.get_feature_vectors(
             entry=[{"id": 1}, {"id": 2}], transform=False, on_demand_features=False
         )
-
         ```
 
 #### Compute all on-demand features
@@ -253,7 +247,6 @@ The `request_parameter` in this case, can be a list of dictionaries that specifi
 
         # Applying model dependent transformations
         encoded_feature_vector = fv.transform(feature_vectors_with_on_demand_features)
-
         ```
 
 #### Compute one on-demand feature
@@ -276,7 +269,6 @@ On-demand transformation functions can also be accessed and executed as normal f
         feature_vector["on_demand_feature1"] = fv.on_demand_transformations[
             "on_demand_feature1"
         ](feature_vector["transaction_time"], datetime.now())
-
         ```
 
 ## Chaining On-Demand Transformations
@@ -313,5 +305,4 @@ This is the implicit cross-DAG path between ODT and MDT chains: nothing extra to
                 double("raw_plus_one").alias("raw_plus_one_doubled"),
             ],
         )
-
         ```
diff --git a/docs/user_guides/fs/feature_view/model-dependent-transformations.md b/docs/user_guides/fs/feature_view/model-dependent-transformations.md
index 77f8237664..1e0211507c 100644
--- a/docs/user_guides/fs/feature_view/model-dependent-transformations.md
+++ b/docs/user_guides/fs/feature_view/model-dependent-transformations.md
@@ -54,7 +54,6 @@ Additionally, Hopsworks also allows users to specify custom names for transforme
             labels=["fraud_label"],
             transformation_functions=[add_two, add_one_multiple],
         )
-
         ```
 
 ### Specifying input features
@@ -75,7 +74,6 @@ The features to be used by a model-dependent transformation function can be spec
                 add_one_multiple("feature_5", "feature_6", "feature_7"),
             ],
         )
-
         ```
 
 ### Using built-in transformations
@@ -103,7 +101,6 @@ The only difference is that they can either be retrieved from the Hopsworks or i
                 standard_scaler("age_at_transaction"),
             ],
         )
-
         ```
 
 To attach built-in transformation functions from the `hopsworks` module they can be directly imported into the code from `hopsworks.builtin_transformations`.
@@ -130,7 +127,6 @@ To attach built-in transformation functions from the `hopsworks` module they can
                 standard_scaler("age_at_transaction"),
             ],
         )
-
         ```
 
 ## Using Model Dependent Transformations
@@ -155,7 +151,6 @@ Model-dependent transformation functions can also be manually applied to a featu
 
         # Apply Model Dependent transformations
         encoded_feature_vector = fv.transform(feature_vector)
-
         ```
 
 ### Retrieving untransformed feature vector and batch inference data
@@ -179,7 +174,6 @@ To achieve this, set the `transform` parameter to False.
 
         # Fetching untransformed batch data.
         untransformed_batch_data = feature_view.get_batch_data(transform=False)
-
         ```
 
 ## Chaining Model-Dependent Transformations
@@ -214,7 +208,6 @@ The DAG is resolved automatically at execution time, so producers always run bef
             ],
             version=1,
         )
-
         ```
 
 See [Transformation Functions Performance Tuning][transformation-functions-performance-tuning] for `n_processes` semantics on chained DAGs.
diff --git a/docs/user_guides/fs/transformation_functions.md b/docs/user_guides/fs/transformation_functions.md
index 471cf41bf2..25332f48d4 100644
--- a/docs/user_guides/fs/transformation_functions.md
+++ b/docs/user_guides/fs/transformation_functions.md
@@ -319,7 +319,6 @@ The save function will throw an error if another transformation function with th
             transformation_function=add_one, version=1
         )
         plus_one_meta.save()
-
         ```
 
 ## Retrieval from the Feature Store
@@ -341,7 +340,6 @@ If only the `name` is provided, then the version will default to 1.
 
         # get transformation function by name and version.
         plus_one_fn = fs.get_transformation_function(name="plus_one", version=2)
-
         ```
 
 ## Using transformation functions
@@ -381,7 +379,6 @@ Chaining works for both on-demand transformations attached to a feature group an
             ],
             version=1,
         )
-
         ```
 
 The DAG is visible from the Hopsworks UI on both the feature view and feature group overview pages under "Transformation execution DAG."
diff --git a/docs/user_guides/migration/40_migration.md b/docs/user_guides/migration/40_migration.md
index 3ce44403b3..23e7352b4e 100644
--- a/docs/user_guides/migration/40_migration.md
+++ b/docs/user_guides/migration/40_migration.md
@@ -81,7 +81,6 @@ The following is how transformation functions were used in previous versions of
         },
         labels=["target"],
     )
-
     ```
 
 === "4.0"
@@ -108,7 +107,6 @@ The following is how transformation functions were used in previous versions of
         ],
         labels=["target"],
     )
-
     ```
 
 Note that the number of lines of code required has been significantly reduced using the “@hopsworks.udf” python decorator.

From b7700500dfd4aa8a2600e9987c936d750f562eec Mon Sep 17 00:00:00 2001
From: manu-sj <manu.joseph@logicalclocks.com>
Date: Thu, 21 May 2026 15:59:43 +0200
Subject: [PATCH 4/4] [FSTORE-1938] strike auto-vectorize claims from docs

Remove references to HSFS_DISABLE_PANDAS_AUTO_VECTORIZE and to an
auto-vectorize code path on get_feature_vectors. Neither the env
var nor the auto-vectorize code path exist in the SDK on this
branch, so the docs were promising behavior that does not ship.

Migration page drops the bullet about the env var. Performance
tuning page drops the two-sentence paragraph about batch
auto-vectorization. The rest of both pages is accurate as written.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/user_guides/fs/transformation_functions_performance.md | 3 ---
 docs/user_guides/migration/40_migration.md                  | 2 --
 2 files changed, 5 deletions(-)

diff --git a/docs/user_guides/fs/transformation_functions_performance.md b/docs/user_guides/fs/transformation_functions_performance.md
index 5f1537dcb5..3f2c4d5e27 100644
--- a/docs/user_guides/fs/transformation_functions_performance.md
+++ b/docs/user_guides/fs/transformation_functions_performance.md
@@ -31,9 +31,6 @@ Callers can always force a specific value by passing `n_processes` explicitly.
   Rows are chunked across workers; each worker runs the chain sequentially on its slice.
   This is the most useful parallelism axis for batched online calls with CPU-heavy chained UDFs.
 
-For `get_feature_vectors(entries)` with all-Pandas-mode UDFs, the engine auto-vectorizes the batch into a DataFrame so the DataFrame path applies instead.
-Set `HSFS_DISABLE_PANDAS_AUTO_VECTORIZE=1` to opt out and force the per-row dict path.
-
 ## Latency reference
 
 The chained-TF online latency benchmark in the `loadtest` repository records p50, p95, and p99 across UDF styles and call shapes.
diff --git a/docs/user_guides/migration/40_migration.md b/docs/user_guides/migration/40_migration.md
index 23e7352b4e..193bebb8a8 100644
--- a/docs/user_guides/migration/40_migration.md
+++ b/docs/user_guides/migration/40_migration.md
@@ -122,8 +122,6 @@ The following behaviors changed and may affect existing pipelines:
   Pre-FSTORE-1938 versions stored an empty string when an input feature's type could not be resolved.
   This release validates types strictly on create and update and raises `TRANSFORMATION_FUNCTION_INPUT_TYPE_UNRESOLVABLE` if a typoed or missing feature reference produces an empty type.
   Read paths continue to tolerate empty types so existing detail pages do not break on upgrade.
-- Rare edge case for auto-vectorization on `get_feature_vectors`: a Pandas-declared UDF that branches on `len(series)` may behave differently when the framework converts a single-row batch into a 1-element Series.
-  Set `HSFS_DISABLE_PANDAS_AUTO_VECTORIZE=1` to keep the previous per-scalar behavior for affected UDFs.
 
 No minimum backend version is required for non-chained usage — existing feature views without chains continue to work unchanged after upgrade.
 Creating a chained feature view requires both backend and SDK on this release.