Entire request payload is read into memory without size checks #5481

@Mattral

Description

Summary

Inference input requests are read fully into memory with no payload-size validation, which can cause memory pressure or out-of-memory crashes for large payloads.


Affected File

sagemaker-serve/src/sagemaker/serve/model_server/tensorflow_serving/inference.py

Problem Description

In input_handler:

read_data = data.read()

This reads the entire request body into memory with no size limits or safeguards.
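For reference, SageMaker TensorFlow Serving handlers use the signature input_handler(data, context). A minimal sketch of where the unbounded read sits (the surrounding body is illustrative; only the data.read() call is quoted from the file):

def input_handler(data, context):
    # Buffers the entire request body in memory, however large it is
    read_data = data.read()
    # ... deserialization and TF Serving request construction follow ...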

For large requests (e.g. large batches or binary payloads), this may:

  • Increase memory usage significantly
  • Lead to out-of-memory errors
  • Reduce system stability

Expected Behavior

Input payload sizes should be validated or limited to prevent excessive memory usage.


Suggested Fix

Add a simple size check or limit before deserialization, for example:

MAX_PAYLOAD_BYTES = 6 * 1024 * 1024  # illustrative cap; tune to the model's expected inputs
if len(read_data) > MAX_PAYLOAD_BYTES:
    raise ValueError("Request payload is too large")

This provides safer handling without changing inference behavior for normal requests.
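A stricter variant is to cap the read itself, so an oversized body is rejected without ever being fully buffered; a post-read len() check still pays the full allocation cost before raising. A minimal sketch, assuming data is a standard file-like stream that supports read(size); MAX_PAYLOAD_BYTES is an illustrative constant, not an existing name in the file:

MAX_PAYLOAD_BYTES = 6 * 1024 * 1024  # e.g. SageMaker's 6 MB real-time invocation limit

def input_handler(data, context):
    # Read at most one byte past the limit so oversized payloads are
    # detected without buffering the whole body.
    read_data = data.read(MAX_PAYLOAD_BYTES + 1)
    if len(read_data) > MAX_PAYLOAD_BYTES:
        raise ValueError("Request payload is too large")
    # ... existing deserialization of read_data continues unchanged ...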

