Description
When multiple scale-up Lambda invocations run concurrently (common during burst workloads with 100+ workflow_job events), the GitHub App JWT generation produces byte-identical tokens. GitHub rejects these duplicates (likely replay protection), causing POST /app/installations/{id}/access_tokens to return HTTP 404. This triggers silent batch dropping (see related issue: silent batch drop) and permanently loses the affected jobs.
Root Cause
The @octokit/auth-app library (via universal-github-app-jwt) generates JWTs with only { iat, exp, iss } claims and no jti (JWT ID). The iat claim has one-second precision (Math.floor(Date.now() / 1000)), and RS256 signing is deterministic, so identical claims always yield an identical signature. When multiple Lambda invocations generate JWTs within the same second using the same App ID and private key, they therefore produce byte-identical tokens.
GitHub rejects duplicate JWTs, causing the POST /app/installations/{id}/access_tokens request to be treated as unauthenticated. An unauthenticated request to this endpoint returns HTTP 404 (GitHub won't confirm resource existence to unauthorized callers).
The 404 is transient — the same installation ID succeeds on subsequent requests seconds later.
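The byte-identical claim can be checked locally without touching GitHub. The following is a minimal sketch, not the library's actual implementation: it signs two RS256 JWTs by hand with the same claims and the same second-precision timestamp. The helper name signAppJwt, the App ID 12345, and the throwaway key pair are all assumptions made for illustration.

```javascript
const crypto = require('node:crypto');

// Throwaway key pair standing in for the GitHub App private key (assumption).
const { privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

const b64url = (value) => Buffer.from(JSON.stringify(value)).toString('base64url');

// Hypothetical helper: signs an RS256 JWT carrying only iat, exp, and iss.
function signAppJwt(appId, nowSeconds) {
  const header = b64url({ alg: 'RS256', typ: 'JWT' });
  const payload = b64url({ iat: nowSeconds, exp: nowSeconds + 600, iss: appId });
  const signature = crypto
    .createSign('RSA-SHA256')
    .update(`${header}.${payload}`)
    .sign(privateKey, 'base64url');
  return `${header}.${payload}.${signature}`;
}

// Two "invocations" that observe the same second produce the same token.
const now = Math.floor(Date.now() / 1000);
const tokenA = signAppJwt(12345, now);
const tokenB = signAppJwt(12345, now);
console.log(tokenA === tokenB); // true
```

Because RSASSA-PKCS1-v1_5 (the scheme behind RS256) has no random component, nothing distinguishes the two tokens unless a claim differs.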
Observed Error
{
  "level": "ERROR",
  "message": "Error processing batch (size: 4): Not Found, ignoring batch",
  "error": {
    "name": "HttpError",
    "status": 404,
    "request": {
      "method": "POST",
      "url": "https://api.<redacted>.ghe.com/app/installations/<redacted>/access_tokens"
    },
    "response": { "status": 404, "data": { "message": "Not Found" } }
  }
}
This repeats for multiple concurrent invocations in rapid succession while other invocations at the same time succeed with the same installation ID.
Impact
- Combined with the silent batch drop issue, this permanently loses SQS messages and their corresponding jobs.
- The failure rate correlates with burst size and number of runner type configurations. More configurations = more concurrent Lambdas = higher probability of same-second JWT generation.
- Small workloads work fine; large matrix workflows trigger this consistently.
Environment
- Module version: ~> 7.3
- GitHub: Enterprise Cloud with Data Residency (ghe.com)
- Multi-runner module with 12+ runner type configurations
Suggested Fixes
Fix 1: Add jti claim to JWT generation
Add a unique jti (JWT ID) claim to the JWT payload to prevent byte-identical tokens. This can be done via @octokit/auth-app's createJwt callback:
import { randomUUID } from 'node:crypto';
import { createAppAuth } from '@octokit/auth-app';

const auth = createAppAuth({
  appId,
  createJwt: async ({ appId, privateKey }) => {
    const now = Math.floor(Date.now() / 1000);
    // A random jti makes every token unique, even within the same second.
    const payload = { iat: now - 60, exp: now + 600, iss: appId, jti: randomUUID() };
    // ... sign with privateKey
  },
});
This eliminates the root cause of the transient 404s.
Alternatively, this could be fixed upstream in universal-github-app-jwt by always including a jti claim.
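As a rough illustration of the signing step elided above, the sketch below builds and signs an RS256 JWT with a jti claim using only node:crypto. It is an assumption-laden example rather than the library's code: signJwtWithJti is a hypothetical helper, the key pair is generated on the spot, and the callback signature and return shape that createJwt expects should be checked against the installed @octokit/auth-app version before wiring it in.

```javascript
const { createSign, generateKeyPairSync, randomUUID } = require('node:crypto');

const encode = (value) => Buffer.from(JSON.stringify(value)).toString('base64url');

// Hypothetical helper: one way the "sign with privateKey" step could look.
function signJwtWithJti(appId, privateKey, nowSeconds = Math.floor(Date.now() / 1000)) {
  const payload = {
    iat: nowSeconds - 60,
    exp: nowSeconds + 600,
    iss: appId,
    jti: randomUUID(), // unique per token, so same-second tokens differ
  };
  const signingInput = `${encode({ alg: 'RS256', typ: 'JWT' })}.${encode(payload)}`;
  const signature = createSign('RSA-SHA256').update(signingInput).sign(privateKey, 'base64url');
  return `${signingInput}.${signature}`;
}

// Demo with a throwaway key (assumption): same App ID, same second, different tokens.
const demoKey = generateKeyPairSync('rsa', { modulusLength: 2048 }).privateKey;
const sameSecond = Math.floor(Date.now() / 1000);
const first = signJwtWithJti(12345, demoKey, sameSecond);
const second = signJwtWithJti(12345, demoKey, sameSecond);
console.log(first !== second); // true
```

Only the jti differs between the two payloads, which is enough to change both the payload segment and the signature.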
Fix 2: Retry installation token API call with backoff
As a defense-in-depth measure, treat HTTP 404 on POST /app/installations/{id}/access_tokens as a transient error and retry the createGithubInstallationAuth() call:
const installationId = await getInstallationId(githubAppClient, enableOrgLevel, payload);
let ghAuth, retries = 0;
while (retries < 3) {
  try {
    ghAuth = await createGithubInstallationAuth(installationId, ghesApiUrl);
    break;
  } catch (e) {
    if (e.status === 404 && retries < 2) {
      retries++;
      await new Promise(r => setTimeout(r, 1000 * retries));
      continue;
    }
    throw e;
  }
}
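The same retry logic could also be factored into a small reusable wrapper, which makes it testable in isolation. This is only a sketch under assumptions: retryOn404 and makeFlakyRequest are hypothetical names, and the stand-in request merely simulates a call that returns 404 a fixed number of times before succeeding.

```javascript
// Hypothetical helper: retries `fn` when it throws an error whose status is 404.
async function retryOn404(fn, { maxAttempts = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (e) {
      // Anything other than a 404, or a 404 on the last attempt, is rethrown.
      if (e.status !== 404 || attempt >= maxAttempts) throw e;
      // Linear backoff: 1x, then 2x the base delay.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
    }
  }
}

// Stand-in for the token request: fails with 404 `failures` times, then succeeds.
function makeFlakyRequest(failures) {
  let calls = 0;
  return async () => {
    calls++;
    if (calls <= failures) throw Object.assign(new Error('Not Found'), { status: 404 });
    return { token: 'example-token', calls };
  };
}

retryOn404(makeFlakyRequest(2), { baseDelayMs: 10 }).then((auth) => {
  console.log(auth.calls, auth.token); // 3 example-token
});
```

With the defaults this mirrors the loop above: up to three attempts, waiting 1 s and then 2 s between them, and rethrowing any non-404 error immediately.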
Our Workaround
We have applied both fixes locally:
- Custom createJwt callback with jti claim via crypto.randomUUID()
- Retry with backoff on 404 in createGithubInstallationAuth()