| 1 | +# SDK Engineering Investigation: Connection Drops / UND_ERR_SOCKET Handling |
| 2 | + |
| 3 | +**Context:** Customer (Berlitz) experienced intermittent Node.js build crashes when the Contentstack SDK (v3.17.1) fetched from the CDA during AWS CodeBuild. The process terminated with `TypeError: terminated` and `[cause]: SocketError: other side closed (code: UND_ERR_SOCKET)`. |
| 4 | + |
| 5 | +**Scope:** Investigate how the Contentstack SDK and Node 22’s undici fetch layer handle connection drops/socket closures, and whether the SDK can catch these errors to retry or return a formatted error and prevent process crash. |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## 1. Request Flow: SDK → Fetch → Undici |
| 10 | + |
| 11 | +| Layer | Component | Role | |
| 12 | +|-------|-----------|------| |
| 13 | +| App | Customer code (e.g. Astro build) | Calls `Stack.ContentType(...).Query().find()` or `.fetch()` | |
| 14 | +| SDK | `src/core/lib/request.js` → `fetchRetry()` | Builds URL/options, calls `fetch()`, handles response and retries | |
| 15 | +| Runtime | `src/runtime/node/http.js` | Re-exports global `fetch` (Node 18+ built-in) | |
| 16 | +| Node | Built-in `fetch` | Implemented by **undici** (bundled in Node 18+) | |
| 17 | +| Undici | Fetch / TLSSocket | Performs HTTP, surfaces errors via rejected promise and `error.cause` | |
| 18 | + |
| 19 | +- In **Node 22**, the global `fetch` is provided by Node’s bundled **undici**. The SDK does not import undici directly; it uses whatever `fetch` the Node runtime exposes (`src/runtime/node/http.js` → global `fetch`). |
| 20 | +- When the **remote server closes the TLS connection** (e.g. CDN/edge closes the socket), undici: |
| 21 | + - Emits the error internally (e.g. `Fetch.onAborted`, `Fetch.terminate`). |
| 22 | + - Rejects the **fetch promise** with a `TypeError('terminated', { cause: SocketError })`, where `cause.code === 'UND_ERR_SOCKET'`. |
| 23 | + - Alternatively, if the connection closes **after** the response object is returned but **during** body consumption, the promise returned by **`response.json()`** (or `response.text()` / body read) rejects with the same kind of error. |
| 24 | + |
| 25 | +So: |
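| 26 | +- **Fetch-level:** The `fetch(url, options)` promise rejects with `TypeError: terminated` and `error.cause.code === 'UND_ERR_SOCKET'` (or `UND_ERR_ABORTED`). When the failure happens before any response headers arrive, undici typically reports `TypeError: fetch failed` instead, with the socket error still attached as `error.cause`. |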
| 27 | +- **Body-read-level:** The `response.json()` promise rejects with the same when the socket is closed while reading the body. |
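
The two failure points above correspond to two separate awaits. The sketch below is a simulation only: `makeTerminatedError`, `locateFailure`, `failsAtFetch`, and `failsAtBody` are hypothetical names, and the error is built by hand to resemble the customer log rather than produced by undici.

```javascript
// Builds an error shaped like the one in the customer log (simulated).
function makeTerminatedError() {
  const cause = Object.assign(new Error('other side closed'), {
    name: 'SocketError',
    code: 'UND_ERR_SOCKET',
  });
  return new TypeError('terminated', { cause });
}

// Reports at which await the failure surfaced.
async function locateFailure(fetchImpl, url) {
  let response;
  try {
    response = await fetchImpl(url); // fetch-level rejection lands here
  } catch (error) {
    return { stage: 'fetch', code: error.cause?.code };
  }
  try {
    await response.json(); // body-read rejection lands here
  } catch (error) {
    return { stage: 'body-read', code: error.cause?.code };
  }
  return { stage: 'none', code: undefined };
}

// Hypothetical stand-ins for the two failure modes.
const failsAtFetch = () => Promise.reject(makeTerminatedError());
const failsAtBody = () =>
  Promise.resolve({
    ok: true,
    status: 200,
    json: () => Promise.reject(makeTerminatedError()),
  });

locateFailure(failsAtFetch, 'https://example.invalid/').then(console.log);
locateFailure(failsAtBody, 'https://example.invalid/').then(console.log);
```

A handler attached only to the `fetch()` promise never sees the second case, which is why the body read needs its own handling.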
| 28 | + |
| 29 | +--- |
| 30 | + |
| 31 | +## 2. Previous SDK Behavior (Gaps) |
| 32 | + |
| 33 | +### 2.1 Where the crash came from |
| 34 | + |
| 35 | +- **Unhandled rejection in the 200 branch** |
| 36 | + For `response.ok && response.status === 200`, the SDK did: |
| 37 | + - `const data = response.json();` |
| 38 | + - `data.then(json => { ... resolve(json); });` |
| 39 | + - **No `.catch()`** on that `data` promise. |
| 40 | +  If the remote closed the connection **during** body read, `response.json()` rejected with `TypeError: terminated`. Nothing handled that rejection, and because Node 15+ defaults to `--unhandled-rejections=throw`, an unhandled rejection **terminates the process**. |
| 41 | + |
| 42 | +- **Fetch-level rejection was caught but not retried** |
| 43 | + The outer `fetch(...).catch((error) => { reject(error); })` did catch fetch-level errors (e.g. connection closed before/during response). So the **fetch** rejection itself did not leave an unhandled rejection. However: |
| 44 | +  - Socket/abort errors were **not** retried; only HTTP status–based retries (e.g. 408, 429) ran, via `retryCondition`. |
| 45 | +  - A single `UND_ERR_SOCKET` therefore produced one rejected promise with no retry. If the **caller** did not handle that rejection (e.g. a missing `.catch()` on a parallel or fire-and-forget call), it could still crash the process. |
| 46 | + |
| 47 | +- **Non-200 branch** |
| 48 | +  The non-200 path had `.catch(() => reject({ status, statusText }))` on `data.then(...)`, so body-read failures were caught, but: |
| 49 | +  - The original error was discarded: the caller received only a generic `{ status, statusText }` object, and socket/abort errors were not retried. |
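
The 200-branch gap can be reproduced without a network. The sketch below is a simplified stand-in for the pre-fix pattern, not the SDK source: `droppedResponse` and `readBodyWithoutCatch` are hypothetical names, and the `unhandledRejection` listener is there only so the demonstration records the event instead of terminating.

```javascript
// Simulated response whose body read fails, as when the socket closes mid-body.
const droppedResponse = {
  ok: true,
  status: 200,
  json: () => Promise.reject(new TypeError('terminated')),
};

// Pre-fix pattern: the promise derived from .then() has no rejection handler.
function readBodyWithoutCatch(response) {
  return new Promise((resolve) => {
    const data = response.json();
    data.then((json) => resolve(json)); // a rejection of `data` is never observed
  });
}

// Record unhandled rejections instead of letting Node terminate the process.
const unhandled = [];
process.on('unhandledRejection', (reason) => unhandled.push(reason.message));

// The returned promise never settles, so the caller cannot catch anything.
readBodyWithoutCatch(droppedResponse);

setTimeout(() => console.log('unhandled rejections:', unhandled), 10);
```

Without the listener, the same rejection would end the process under Node's default settings, which matches the crash seen in the customer's build.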
| 50 | + |
| 51 | +### 2.2 Summary of previous behavior |
| 52 | + |
| 53 | +| Scenario | Handled? | Retried? | Result | |
| 54 | +|----------|----------|----------|--------| |
| 55 | +| Fetch rejects (e.g. socket closed before/during response) | Yes (outer .catch) | No | Reject once → crash if caller doesn’t handle | |
| 56 | +| `response.json()` rejects in 200 branch (socket closed during body) | **No** | No | **Unhandled rejection → process crash** | |
| 57 | +| `response.json()` rejects in non-200 branch | Yes | No | Reject with generic object | |
| 58 | + |
| 59 | +--- |
| 60 | + |
| 61 | +## 3. Current SDK Behavior (After Fix) |
| 62 | + |
| 63 | +The following behavior is implemented in **`src/core/lib/request.js`** and applies to every SDK version that includes this fix. |
| 64 | + |
| 65 | +### 3.1 Detecting socket/abort errors |
| 66 | + |
| 67 | +The SDK treats an error as a **socket/abort** error when: |
| 68 | + |
| 69 | +- `error.message === 'terminated'`, or |
| 70 | +- `error.cause && (error.cause.code === 'UND_ERR_SOCKET' || error.cause.code === 'UND_ERR_ABORTED')` |
| 71 | + |
| 72 | +This matches how Node 22 / undici surface connection drops and aborts. |
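
The two conditions above can be written as a single predicate. This is a minimal sketch of the rule as stated here; the function name `isSocketOrAbortError` is an assumption, not necessarily the name used in `request.js`.

```javascript
// Returns true when an error matches the socket/abort conditions listed above.
function isSocketOrAbortError(error) {
  if (!error) return false;
  if (error.message === 'terminated') return true;
  const code = error.cause && error.cause.code;
  return code === 'UND_ERR_SOCKET' || code === 'UND_ERR_ABORTED';
}

// Simulated errors for illustration (not produced by undici here).
const dropped = new TypeError('terminated', {
  cause: Object.assign(new Error('other side closed'), { code: 'UND_ERR_SOCKET' }),
});
const httpFailure = { status: 500, statusText: 'Internal Server Error' };

console.log(isSocketOrAbortError(dropped));
console.log(isSocketOrAbortError(httpFailure));
```

Checking `cause.code` in addition to the message means the predicate still matches if the top-level message differs while the underlying socket error is the same.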
| 73 | + |
| 74 | +### 3.2 Catching and handling |
| 75 | + |
| 76 | +- **200 branch** |
| 77 | + - `data.then(...).catch((err) => { ... })` is attached to the promise from `response.json()`. |
| 78 | + - If that promise rejects (e.g. UND_ERR_SOCKET during body read): |
| 79 | + - The error is **caught** (no unhandled rejection). |
| 80 | + - If it is a socket/abort error and `retryLimit > 0`, the SDK calls `onError(err)` and **retries** with the existing backoff. |
| 81 | + - Otherwise it **rejects** the Request promise with the same `err`, so the caller gets a proper rejection they can handle. |
| 82 | + |
| 83 | +- **Non-200 branch** |
| 84 | + - `.catch((err) => { ... })` is used on the `data.then(...)` chain. |
| 85 | + - Same logic: socket/abort → retry when `retryLimit > 0`, else reject with `err` (or `{ status, statusText }` if `err` is missing). |
| 86 | + |
| 87 | +- **Fetch-level** |
| 88 | + - The outer `fetch(...).catch((error) => { ... })` still catches when the **fetch** promise rejects (e.g. connection closed before or during response). |
| 89 | + - If the error is socket/abort and `retryLimit > 0`, the SDK calls `onError(error)` and **retries**. |
| 90 | + - Otherwise it **rejects** with the same `error`. |
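
The three branches above share one rule: catch the rejection, retry if it is a socket/abort error and retries remain, otherwise reject with the original error. The sketch below condenses that rule into an `async` function under stated assumptions. `requestWithRetry`, `isSocketOrAbortError`, `wait`, and `makeFlakyFetch` are hypothetical names; the real code uses promise chains and `onError()`, and this sketch leaves out the status-based retries handled by `retryCondition`.

```javascript
function isSocketOrAbortError(error) {
  if (!error) return false;
  if (error.message === 'terminated') return true;
  const code = error.cause && error.cause.code;
  return code === 'UND_ERR_SOCKET' || code === 'UND_ERR_ABORTED';
}

const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// One try/catch covers both the fetch call and the body read.
async function requestWithRetry(fetchImpl, url, { retryLimit = 5, retryDelay = 300 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      const response = await fetchImpl(url);
      if (response.ok && response.status === 200) {
        return await response.json();
      }
      // Simplified non-200 path: reject with a generic status object.
      throw { status: response.status, statusText: response.statusText };
    } catch (error) {
      const retriesLeft = retryLimit - attempt;
      if (!isSocketOrAbortError(error) || retriesLeft <= 0) throw error;
      await wait(retryDelay); // fixed delay; backoff options omitted
    }
  }
}

// Hypothetical fetch stand-in that drops the connection a set number of times.
function makeFlakyFetch(failures) {
  let calls = 0;
  const fetchImpl = () => {
    calls += 1;
    if (calls <= failures) {
      return Promise.reject(new TypeError('terminated', { cause: { code: 'UND_ERR_SOCKET' } }));
    }
    return Promise.resolve({ ok: true, status: 200, json: () => Promise.resolve({ entries: [] }) });
  };
  fetchImpl.callCount = () => calls;
  return fetchImpl;
}

const flaky = makeFlakyFetch(2);
requestWithRetry(flaky, 'https://example.invalid/', { retryLimit: 3, retryDelay: 5 })
  .then((json) => console.log('attempts:', flaky.callCount(), 'entries:', json.entries.length));
```

When the retries run out, the caller receives the original error, so `error.cause.code` stays available for inspection.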
| 91 | + |
| 92 | +### 3.3 Retry behavior |
| 93 | + |
| 94 | +- Retries use the existing **fetchOptions**: `retryLimit` (default 5), `retryDelay` (default 300 ms), and optional `retryDelayOptions` (e.g. `base` or `customBackoff`). |
| 95 | +- No change to the existing retry contract; socket/abort errors are now **eligible** for the same retry path as other retriable failures. |
| 96 | + |
| 97 | +--- |
| 98 | + |
| 99 | +## 4. Conclusion |
| 100 | + |
| 101 | +| Question | Answer | |
| 102 | +|----------|--------| |
| 103 | +| How does the SDK interact with undici? | Only indirectly, through the global `fetch` that the Node runtime provides. The SDK does not import or call undici itself. | |
| 104 | +| How does Node 22 / undici surface connection drops? | By rejecting the `fetch` promise or the `response.json()` (body) promise with `TypeError('terminated', { cause: SocketError })` and `cause.code === 'UND_ERR_SOCKET'` (or `UND_ERR_ABORTED`). | |
| 105 | +| Can the SDK catch Fetch.onAborted / UND_ERR_SOCKET? | **Yes.** Both the fetch-level rejection and the body-read (e.g. `response.json()`) rejection are caught in `request.js`. | |
| 106 | +| Does the SDK initiate a retry for these errors? | **Yes.** When the error is identified as socket/abort and `retryLimit > 0`, the SDK uses the existing `onError()` path and retries with the configured delay/backoff. | |
| 107 | +| Does the SDK return a formatted error instead of crashing? | **Yes.** If retries are exhausted or the error is not socket/abort, the SDK **rejects** the Request promise with the original, unmodified error object, so the caller can inspect `error.message`, `error.cause`, and `error.cause.code`. An unhandled rejection inside the SDK no longer crashes the process. | |
| 108 | + |
| 109 | +**Summary:** The SDK now catches connection drops and socket closures (`Fetch.onAborted` / `UND_ERR_SOCKET`) at both the fetch and body-read levels, retries them when possible using the existing retry mechanism, and otherwise rejects the returned promise with the underlying error. This prevents unhandled rejections inside the SDK from crashing the user’s Node.js build process while keeping errors identifiable (e.g. for logging or 422 handling). |
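
Because the SDK rejects with the underlying error, calling code still needs its own handling, especially for parallel requests. The sketch below shows one possible caller-side pattern; `describeRequestError` and `runAll` are hypothetical helpers, not part of the SDK, and the tasks are simulated.

```javascript
// Hypothetical helper: flattens a rejected request error into a loggable record.
function describeRequestError(error) {
  const cause = error && error.cause;
  return {
    message: error?.message ?? error?.statusText ?? String(error),
    code: cause?.code ?? null,
    connectionDropped: cause?.code === 'UND_ERR_SOCKET',
  };
}

// Runs tasks in parallel and never leaves a rejection unobserved.
async function runAll(tasks) {
  const results = await Promise.allSettled(tasks.map((task) => task()));
  return results.map((result) =>
    result.status === 'fulfilled'
      ? { ok: true, value: result.value }
      : { ok: false, error: describeRequestError(result.reason) }
  );
}

// Simulated tasks standing in for SDK calls.
const tasks = [
  () => Promise.resolve({ entries: [] }),
  () => Promise.reject(new TypeError('terminated', { cause: { code: 'UND_ERR_SOCKET' } })),
];

runAll(tasks).then((summary) => console.log(JSON.stringify(summary)));
```

`Promise.allSettled` observes every rejection, so one dropped connection among many parallel requests cannot become an unhandled rejection in the caller.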
| 110 | + |
| 111 | +--- |
| 112 | + |
| 113 | +## 5. References |
| 114 | + |
| 115 | +- **Request implementation:** `src/core/lib/request.js` (fetchRetry, 200/non-200 branches, outer fetch .catch). |
| 116 | +- **Node runtime:** `src/runtime/node/http.js` (re-exports global `fetch`). |
| 117 | +- **Customer error:** `TypeError: terminated` with `[cause]: SocketError: other side closed`, `code: 'UND_ERR_SOCKET'` (e.g. from Node `internal/deps/undici`). |
| 118 | +- **Node 22:** Uses bundled undici for `fetch`; socket errors are surfaced as above. |