Entire request payload is read into memory without size checks #5481

@Mattral

Description

Summary

Inference input requests are read fully into memory with no payload-size validation, which can cause memory pressure or out-of-memory crashes for large payloads.


Affected File

sagemaker-serve/src/sagemaker/serve/model_server/tensorflow_serving/inference.py

Problem Description

In input_handler:

read_data = data.read()

This reads the entire request body into memory with no size limits or safeguards.
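For reference, SageMaker TensorFlow Serving handlers use the signature input_handler(data, context). A minimal sketch of where the unbounded read sits (the surrounding body is illustrative; only the data.read() call is quoted from the file):

def input_handler(data, context):
    # Buffers the entire request body in memory, however large it is
    read_data = data.read()
    # ... deserialization and TF Serving request construction follow ...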

For large requests (e.g. large batches or binary payloads), this may:

  • Increase memory usage significantly
  • Lead to out-of-memory errors
  • Reduce system stability

Expected Behavior

Input payload sizes should be validated or limited to prevent excessive memory usage.


Suggested Fix

Add a simple size check or limit before deserialization, for example:

MAX_PAYLOAD_BYTES = 6 * 1024 * 1024  # illustrative cap; tune to the model's expected inputs
if len(read_data) > MAX_PAYLOAD_BYTES:
    raise ValueError("Request payload is too large")

This provides safer handling without changing inference behavior for normal requests.
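A stricter variant is to cap the read itself, so an oversized body is rejected without ever being fully buffered; a post-read len() check still pays the full allocation cost before raising. A minimal sketch, assuming data is a standard file-like stream that supports read(size); MAX_PAYLOAD_BYTES is an illustrative constant, not an existing name in the file:

MAX_PAYLOAD_BYTES = 6 * 1024 * 1024  # e.g. SageMaker's 6 MB real-time invocation limit

def input_handler(data, context):
    # Read at most one byte past the limit so oversized payloads are
    # detected without buffering the whole body.
    read_data = data.read(MAX_PAYLOAD_BYTES + 1)
    if len(read_data) > MAX_PAYLOAD_BYTES:
        raise ValueError("Request payload is too large")
    # ... existing deserialization of read_data continues unchanged ...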

