Payload limit configuration and validation #1288
Conversation
temporalio/worker/_activity.py
Outdated
elif isinstance(
    err,
    temporalio.exceptions.PayloadsSizeError,
):
    temporalio.activity.logger.warning(
        "Activity task failed: payloads size exceeded the error limit. Size: %d bytes, Limit: %d bytes",
        err.payloads_size,
        err.payloads_limit,
        extra={"__temporal_error_identifier": "ActivityFailure"},
    )
    await data_converter.encode_failure(
        temporalio.exceptions.ApplicationError(
            type="PayloadsTooLarge",
            message="Payloads size has exceeded the error limit.",
        ),
        completion.result.failed.failure,
    )
    # TODO: Add force_cause to activity Failure bridge proto?
    # TODO: Add WORKFLOW_TASK_FAILED_CAUSE_PAYLOADS_TOO_LARGE to API
    # completion.result.failed.force_cause = WorkflowTaskFailedCause.WORKFLOW_TASK_FAILED_CAUSE_PAYLOADS_TOO_LARGE
I think we should let the traditional error path run here. Assuming a readable error message on the error, the only reason I can see for not doing so is to have the ApplicationError.type be PayloadsTooLarge instead of PayloadsSizeError, but we can either alter the failure converter or make just that slight specialization in the catch here.
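A minimal sketch of that "slight specialization in the catch" option, assuming err, data_converter, and completion are in scope as in the diff above; the message wording and the PayloadsTooLarge type are illustrative, not a final implementation:

import temporalio.exceptions

# Hypothetical sketch: keep the normal failure path, but re-type the error first
# so the converted failure reports "PayloadsTooLarge" instead of "PayloadsSizeError".
if isinstance(err, temporalio.exceptions.PayloadsSizeError):
    err = temporalio.exceptions.ApplicationError(
        f"Payloads size of {err.payloads_size} bytes exceeded the "
        f"{err.payloads_limit} byte limit",
        type="PayloadsTooLarge",
    )
# ...then fall through to the existing failure encoding.
await data_converter.encode_failure(err, completion.result.failed.failure)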
- Are you suggesting that PayloadsSizeError should extend from ApplicationError?
- We want a specialized warning message that better describes what the issue is and gives better guidance in the worker output as to what went wrong and how to fix it than the standard "Completing activity as failed" message. My understanding is that log messages from sdk-core won't be surfaced in the worker unless a logger is configured via the telemetry configuration. And even then, I haven't seen activity failures get routed through there (see https://github.com/temporalio/sdk-python/pull/1288/changes#diff-3162a4b842d45d546da93b825218f7f863b6b481684d2b9570a38b04facb266bR8833), but maybe I'm doing something wrong with that.
On extending from ApplicationError, note that PayloadsSizeError will be thrown from Client invocations if payload limits are configured. This is in contrast to the description of ApplicationError, which states "Error raised during workflow/activity execution."
> Are you suggesting that PayloadsSizeError should extend from ApplicationError?

No, in these situations all non-Temporal-failure exceptions automatically convert to an application error with their unqualified class name as the error type.

> We want a specialized warning message that better describes what the issue is and gives better guidance in the worker output as to what went wrong and how to fix it than the standard "Completing activity as failed" message.

It sounds like everyone should get this message, not just this log statement. Therefore, such a message should be part of the error, not the log.
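For illustration, a standalone sketch of that automatic conversion (assumed behavior of the default failure converter; not code from this PR):

import temporalio.converter
from temporalio.api.failure.v1 import Failure

# A plain (non-Temporal-failure) exception raised by application code...
class PayloadsSizeError(Exception):
    pass

converter = temporalio.converter.default()
failure = Failure()
converter.failure_converter.to_failure(
    PayloadsSizeError("Payloads size of 2097152 bytes exceeded the 1048576 byte limit"),
    converter.payload_converter,
    failure,
)
# ...comes out as an application failure whose type is the unqualified class name,
# carrying whatever message the exception was constructed with.
print(failure.application_failure_info.type)  # PayloadsSizeError
print(failure.message)  # Payloads size of 2097152 bytes exceeded the 1048576 byte limit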
| temporalio.exceptions.CancelledError("Cancelled"), | ||
| completion.result.cancelled.failure, | ||
| ) | ||
| elif isinstance( |
Note, this doesn't catch cases where encoding the failure itself produces payloads that are too large; you may want to handle this error in the outer except. This can happen when application error details are too large, or when the stack trace is too large and is moved to encoded attributes (see temporalio/features#597).
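A rough sketch of handling that in the outer except (hypothetical; assumes the names from the diff above and that the slimmed-down fallback failure fits within the limit):

try:
    await data_converter.encode_failure(err, completion.result.failed.failure)
except temporalio.exceptions.PayloadsSizeError as size_err:
    # The encoded failure itself (large error details, or a stack trace moved to
    # encoded attributes) exceeded the limit, so fall back to a minimal failure.
    completion.result.failed.failure.Clear()
    await data_converter.encode_failure(
        temporalio.exceptions.ApplicationError(
            f"Failure payloads of {size_err.payloads_size} bytes exceeded the "
            f"{size_err.payloads_limit} byte limit",
            type="PayloadsTooLarge",
        ),
        completion.result.failed.failure,
    )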
)

if warning_limit and warning_limit > 0 and total_size > warning_limit:
    # TODO: Use a context aware logger to log extra information about workflow/activity/etc
still TODO right?
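For what it's worth, one possible shape for that deferred context-aware logging (a hypothetical helper; temporalio.activity.info()/in_activity() and temporalio.workflow.info()/in_workflow() are existing SDK calls, but none of this wiring is part of the PR):

import temporalio.activity
import temporalio.workflow

def _payload_warning_context() -> str:
    # Describe where the oversized payloads are coming from, if we can tell.
    if temporalio.activity.in_activity():
        info = temporalio.activity.info()
        return f"activity {info.activity_type} in workflow {info.workflow_id}"
    if temporalio.workflow.in_workflow():
        info = temporalio.workflow.info()
        return f"workflow {info.workflow_type} (run {info.run_id})"
    return "a client call"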
    payloads,
    self.payload_limits.payload_upload_error_limit,
    self.payload_limits.payload_upload_warning_limit,
    "Payloads size exceeded the warning limit.",
- Context: of what? upload/download? Workflow task? Which endpoint?
- Actionability: suggest sending to a stable docs URL. Perhaps https://docs.temporal.io/troubleshooting/blob-size-limit-error, or a stable shortened URL, and then we can improve that page over time.
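Something along these lines might address both points (hypothetical helper; the docs URL is the one suggested above, not a confirmed stable link):

def _payload_warning_message(context: str, total_size: int, warning_limit: int) -> str:
    # Example output: "Payloads for activity input exceeded the warning limit
    # (2097152 bytes > 1048576 bytes). See https://..."
    return (
        f"Payloads for {context} exceeded the warning limit "
        f"({total_size} bytes > {warning_limit} bytes). "
        "See https://docs.temporal.io/troubleshooting/blob-size-limit-error "
        "for ways to reduce payload size."
    )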
        size: Actual payloads size in bytes.
        limit: Payloads size limit in bytes.
    """
    super().__init__("Payloads size exceeded the error limit")
Similar points apply here.
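Applied to the constructor above, that might look like the following (illustrative wording only; the payloads_size/payloads_limit attribute names come from this PR, while the base class and link are assumptions):

import temporalio.exceptions

class PayloadsSizeError(temporalio.exceptions.TemporalError):
    """Raised when encoded payloads exceed the configured error limit."""

    def __init__(self, size: int, limit: int) -> None:
        super().__init__(
            f"Payloads size of {size} bytes exceeded the {limit} byte error limit. "
            "See https://docs.temporal.io/troubleshooting/blob-size-limit-error."
        )
        self.payloads_size = size
        self.payloads_limit = limit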
tests/worker/test_workflow.py
Outdated
| config["data_converter"] = dataclasses.replace( | ||
| temporalio.converter.default(), | ||
| payload_limits=PayloadLimitsConfig( | ||
| payload_upload_error_limit=error_limit, payload_upload_warning_limit=1024 | ||
| ), | ||
| ) |
This seems like an elaborate way to override a config. It's not going to be super common but I wonder if we can make it easier.
In any case, can we document the correct procedure on the class so that people who go-to-definition on it can figure out how to override it?
> This seems like an elaborate way to override a config. It's not going to be super common but I wonder if we can make it easier.

The way I have it in the test (other than all of the property settings being on one line) is the documented way for specifying overrides using the default data converter as the basis. See examples in https://docs.temporal.io/develop/python/converters-and-encryption. Maybe it looks better when it's on separate lines:
data_converter = dataclasses.replace(
    temporalio.converter.default(),
    payload_limits=PayloadLimitsConfig(
        payload_upload_error_limit=4 * 1024 * 1024,
        payload_upload_warning_limit=1024,
    ),
)

If you are talking about the names of the fields on the PayloadLimitsConfig, note that there are four of them:
- memo_upload_error_limit
- memo_upload_warning_limit
- payload_upload_error_limit
- payload_upload_warning_limit
The memo_ settings are a specialization specifically for memo fields. The server validates these by summing the payload size of each key and checking whether the total is over the memo size limit (which is configurable separately from the general payload limit).
The upload part is there to disambiguate in case we need some type of limit for downloads in the future.
I'm up for changing any of the names; just explaining the current rationale.
> In any case, can we document the correct procedure on the class so that people who go-to-definition on it can figure out how to override it?
What would the user go-to-definition on, specifically, in their pursuit to (a) find that they can override something and (b) know how to override it? Starting with a basic client configuration, it's not obvious where that starting point is. Would you GTD on load_client_connect_config, then GTD on the return type (if you can figure it out), and then end up looking at ClientConfig (which is hardly documented)? It seems to me that samples or developer docs are what show the way for "how do I override this specific aspect", not the SDK code itself.
That being said, I would totally agree with an effort to make GTD more useful in terms of discovering what is possible and how to use all of the options, but I don't know if making that effort here would bear fruit if the top level parts of the discoverability chain don't support it.
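One way to meet in the middle would be a usage example in the class docstring, so that GTD from the field name at least shows the override pattern. A sketch only, showing two of the four fields, with defaults taken from the PR description (512 KiB warning limit, no error limit); the frozen dataclass shape is an assumption:

import dataclasses
from typing import Optional

@dataclasses.dataclass(frozen=True)
class PayloadLimitsConfig:
    """Payload size limits applied when encoding payloads.

    To override, replace the field on the default data converter::

        data_converter = dataclasses.replace(
            temporalio.converter.default(),
            payload_limits=PayloadLimitsConfig(
                payload_upload_error_limit=4 * 1024 * 1024,
            ),
        )
    """

    payload_upload_warning_limit: int = 512 * 1024
    """Warn when uploaded payloads exceed this many bytes."""

    payload_upload_error_limit: Optional[int] = None
    """Fail when uploaded payloads exceed this many bytes; no error limit by default."""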
I think the top of the funnel here is when they get the error message or warning: they can either read the message or find the line from the call stack and then grep for the threshold variable name.
What was changed
Update the SDK to check the size of payload collections and issue warnings and errors when the size exceeds configured limits.
This is done by:
- Adding a PayloadLimitsConfig class for configuring the warning and error limits. The error limit is not defined by default, whereas the warning limit is set to 512 KiB.
- Adding a payload_limits property of type PayloadLimitsConfig to DataConverter.
- Adding methods to DataConverter for encoding and decoding payloads in all forms. The encoding methods call _validate_payload_limits before returning.
- Updating call sites to go through the DataConverter instead of using payload_codec directly.
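As a rough sketch of how these pieces could fit together (hypothetical; only the names PayloadLimitsConfig, payload_limits, _validate_payload_limits, and PayloadsSizeError come from the PR, and the size calculation here is a guess):

import logging
from typing import Optional, Sequence

import temporalio.api.common.v1
import temporalio.exceptions

logger = logging.getLogger(__name__)

def _validate_payload_limits(
    payloads: Sequence[temporalio.api.common.v1.Payload],
    error_limit: Optional[int],
    warning_limit: Optional[int],
    warning_message: str,
) -> None:
    # Total size is approximated as the summed serialized size of each payload.
    total_size = sum(p.ByteSize() for p in payloads)
    if error_limit and error_limit > 0 and total_size > error_limit:
        raise temporalio.exceptions.PayloadsSizeError(total_size, error_limit)
    if warning_limit and warning_limit > 0 and total_size > warning_limit:
        logger.warning(
            "%s Size: %d bytes, Limit: %d bytes",
            warning_message,
            total_size,
            warning_limit,
        )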
Examples
Log output when an activity attempts to return a result that exceeds the error limit:
Log output when a workflow attempts to provide an activity input that exceeds the error limit:
Note that the above example is missing the extra context that the activity result failure example has. This is due to the logging infrastructure available where these errors are raised and can be fixed separately with some log refactoring (see the deferred items).
Log output when a workflow attempts to provide activity input that is above the warning threshold but below the error limit:
Same note about the missing extra context.
Deferred
Work that has been deferred to later PRs (unless requested to pull back in):
- A new WorkflowTaskFailedCause to indicate the specific cause of the failure scenario. Pending "Add a new workflow failure cause for oversized payloads" (api#697), integration into sdk-core, and of sdk-core into sdk-python.
- A context-aware logger in _validate_payload_limits to get rich information about the execution context when issuing a warning.
- A context-aware logger in _WorkflowWorker::_handle_activation to get rich information about the execution context when issuing a warning upon exceeding the payload error limit.
Why?
Users need to know when payload sizes are approaching or have exceeded size limits. This will help prevent workflow outages and inform users to adjust their workflows to make use of alternate storage methods or to break down their payloads more granularly.
Checklist
Closes Warn if the SDK tried to send a payload above a specific size #1284
Closes SDK should fail workflow task if payloads size is known to be too large #1285
How was this tested: Unit tests
Any docs updates needed? Yes