Issue: Entire request payload is read into memory without size checks
Summary
Inference input requests are fully read into memory without validation, which may cause memory pressure or crashes for large payloads.
Affected File
sagemaker-serve/src/sagemaker/serve/model_server/tensorflow_serving/inference.py
Problem Description
In input_handler:

read_data = data.read()

This reads the entire request body into memory with no size limit or safeguard; a bounded-read sketch follows the list below. For large requests (e.g., large batches or binary payloads), this may:
- Increase memory usage significantly
- Lead to out-of-memory errors
- Reduce system stability
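For illustration, here is a minimal sketch of a bounded read, assuming data is a standard file-like stream supporting read(size); MAX_PAYLOAD_BYTES is a hypothetical constant, not an existing setting:

MAX_PAYLOAD_BYTES = 6 * 1024 * 1024  # assumed cap, roughly SageMaker's 6 MB real-time request limit

# Read at most one byte past the cap, so an oversized request is
# detected without buffering the entire payload in memory.
read_data = data.read(MAX_PAYLOAD_BYTES + 1)
if len(read_data) > MAX_PAYLOAD_BYTES:
    raise ValueError("Request payload is too large")

Passing an explicit byte count to read() keeps peak memory bounded by the cap, whereas checking len() only after an unbounded read() fails only after the allocation has already happened.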
Expected Behavior
Input payload sizes should be validated or limited to prevent excessive memory usage.
Suggested Fix
Add a simple size check or limit before deserialization, for example:

if len(read_data) > MAX_PAYLOAD_BYTES:
    raise ValueError("Request payload is too large")

This provides safer handling without changing inference behavior for normal requests.
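Putting it together, a minimal sketch of how the check could sit in input_handler, assuming the conventional SageMaker TensorFlow Serving handler signature input_handler(data, context) with JSON passthrough; MAX_PAYLOAD_BYTES and the error messages are illustrative, not proposed API:

import json

MAX_PAYLOAD_BYTES = 6 * 1024 * 1024  # illustrative cap

def input_handler(data, context):
    # Bound the read so oversized requests fail fast instead of
    # exhausting memory; data is assumed to be a file-like stream.
    read_data = data.read(MAX_PAYLOAD_BYTES + 1)
    if len(read_data) > MAX_PAYLOAD_BYTES:
        raise ValueError("Request payload is too large")

    if context.request_content_type == "application/json":
        # Confirm the payload is well-formed JSON, then forward it
        # to TensorFlow Serving unchanged.
        json.loads(read_data)
        return read_data

    raise ValueError(f"Unsupported content type: {context.request_content_type}")

Because the check runs before any deserialization, well-formed requests under the cap behave exactly as before.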