28 changes: 20 additions & 8 deletions docs/mcp.md
@@ -1401,7 +1401,7 @@ Triggers a Provar Automation test run using the currently loaded properties file
| --------- | -------- | -------- | ------------------------------------------------------------------------ |
| `flags` | string[] | no | Raw CLI flags to forward (e.g. `["--project-path", "/path/to/project"]`) |

**Output** — `{ requestId, exitCode, stdout, stderr[, output_lines_suppressed][, steps][, details.warning] }`
**Output** — `{ requestId, exitCode, stdout, stderr[, output_lines_suppressed][, steps][, details.warning][, warnings] }`

The `stdout` field is filtered before returning: Java schema-validator lines (`com.networknt.schema.*`) and stale logger-lock `SEVERE` warnings are stripped. If any lines were suppressed, `output_lines_suppressed` contains the count.
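
As a rough illustration of the filtering described above, the sketch below drops noisy lines and counts how many were removed. The function name `filterOutputSketch` and both regular expressions are assumptions made for this example; the patterns the tool really uses are not shown in this change.

```typescript
// Hypothetical sketch: names and patterns are illustrative assumptions,
// not the project's actual filterTestRunOutput implementation.
const NOISE_PATTERNS: RegExp[] = [
  /com\.networknt\.schema\./, // Java schema-validator lines
  /SEVERE.*lock/i, // assumed shape of stale logger-lock warnings
];

interface FilteredOutputSketch {
  filtered: string;
  suppressed: number;
}

function filterOutputSketch(stdout: string): FilteredOutputSketch {
  const kept: string[] = [];
  let suppressed = 0;
  for (const line of stdout.split('\n')) {
    if (NOISE_PATTERNS.some((pattern) => pattern.test(line))) {
      suppressed += 1; // counted so the caller can report output_lines_suppressed
    } else {
      kept.push(line);
    }
  }
  return { filtered: kept.join('\n'), suppressed };
}

const demo = filterOutputSketch(
  [
    'Running tests',
    'WARN com.networknt.schema.JsonSchema - unknown keyword',
    'SEVERE: could not obtain lock on log file',
    'Done',
  ].join('\n')
);
console.log(demo.suppressed, JSON.stringify(demo.filtered));
```

A caller would attach `suppressed` to the response only when it is greater than zero, which matches the optional `output_lines_suppressed` field in the output shape.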

@@ -1410,20 +1410,32 @@ After each run, the tool scans the results directory for JUnit XML files and add
```json
"steps": [
{ "testItemId": "1", "title": "TC-Login-001-LoginAndVerify.testcase", "status": "pass" },
{ "testItemId": "2", "title": "TC-Login-002-ForgotPassword.testcase", "status": "fail", "errorMessage": "TimeoutException: page did not load", "error_category": "TIMEOUT", "retryable": true }
{ "testItemId": "2", "title": "TC-Login-002-ForgotPassword.testcase", "status": "fail", "errorMessage": "TimeoutException: page did not load",
"error_category": "TIMEOUT", "retryable": true }
]
```

Each entry represents one test case. `status` is `"pass"`, `"fail"`, or `"skip"`. If the results directory cannot be located or contains no JUnit XML, `details.warning` explains why and `steps` is absent.
Each entry represents one test case. status is "pass", "fail", or "skip". If the results directory cannot be located or contains no JUnit XML,
details.warning explains why and steps is absent.

Failed steps may include two optional classification fields:

- `error_category` — one of `INFRASTRUCTURE`, `ASSERTION`, `LOCATOR`, `TIMEOUT`, `OTHER`, set when the failure text matches a known pattern.
- `retryable` — `true` when `error_category` is `INFRASTRUCTURE` or `TIMEOUT` (transient causes), `false` for `ASSERTION`/`LOCATOR`/`OTHER`. Absent when no pattern matched.
- error_category — one of INFRASTRUCTURE, ASSERTION, LOCATOR, TIMEOUT, OTHER, set when the failure text matches a known pattern.
- retryable — true when error_category is INFRASTRUCTURE or TIMEOUT (transient causes), false for ASSERTION/LOCATOR/OTHER. Absent when no
pattern matched.
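
The classification fields above lend themselves to a simple retry policy. The following is a minimal sketch that mirrors the documented mapping; the names `StepSketch`, `retryableSketch`, and `selectRetryCandidates` are hypothetical and are not part of the tool's API.

```typescript
// Hypothetical types and helpers for illustration only.
type ErrorCategorySketch = 'INFRASTRUCTURE' | 'ASSERTION' | 'LOCATOR' | 'TIMEOUT' | 'OTHER';

interface StepSketch {
  title: string;
  status: 'pass' | 'fail' | 'skip';
  error_category?: ErrorCategorySketch;
  retryable?: boolean;
}

// Transient categories are retryable; no category means no verdict at all.
function retryableSketch(category: ErrorCategorySketch | undefined): boolean | undefined {
  if (category === undefined) return undefined;
  return category === 'INFRASTRUCTURE' || category === 'TIMEOUT';
}

// Pick the failed test cases that an agent could reasonably re-run.
function selectRetryCandidates(steps: StepSketch[]): string[] {
  return steps
    .filter((step) => step.status === 'fail' && step.retryable === true)
    .map((step) => step.title);
}

const sampleSteps: StepSketch[] = [
  { title: 'TC-Login-001-LoginAndVerify.testcase', status: 'pass' },
  {
    title: 'TC-Login-002-ForgotPassword.testcase',
    status: 'fail',
    error_category: 'TIMEOUT',
    retryable: retryableSketch('TIMEOUT'),
  },
  {
    title: 'TC-Login-003-BadLocator.testcase', // made-up entry for the example
    status: 'fail',
    error_category: 'LOCATOR',
    retryable: retryableSketch('LOCATOR'),
  },
];
console.log(selectRetryCandidates(sampleSteps));
```

Treating an absent `retryable` the same as `false` is a conservative choice in this sketch: a failure that matched no known pattern is not retried automatically.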

**Error codes:** `AUTOMATION_TESTRUN_FAILED`, `SF_NOT_FOUND`
**Zero-tests guard (RUN-001):** when the `sf` command exits 0, the results directory was located, and at least one JUnit XML file parsed successfully but contains zero executed test cases, the response includes a `warnings[]` array containing a [RUN-001](#warning-codes) message. This is almost always a typo such as `testCase` vs `testCases` (or some other unknown key) in `provardx-properties.json`, so the run silently selected nothing. The warning is additive and never flips `exitCode` or sets `isError`; the failure surface remains driven by the underlying `sf` exit code.

---
> **Why RUN-001 stays silent when no JUnit data is available:** if the results directory cannot be located, contains no XML files, or every XML file fails to parse, the tool genuinely has no data on which to assert "zero tests ran"; the absence of parsed results only means "we don't know what ran". In those cases the response carries `details.warning` (explaining why structured step data is missing) and RUN-001 is suppressed to avoid misdirecting the agent toward a typo when the real issue is a missing or unreadable results directory.
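
The gating conditions described above can be summarised as a single predicate. This is a sketch only: `JUnitSummarySketch` and `shouldEmitRun001` are hypothetical names, and the fields are modelled on the `JUnitIntrospection` type introduced in this change rather than copied from it.

```typescript
// Hypothetical summary of what the JUnit scan found.
interface JUnitSummarySketch {
  resultsPathResolved: boolean; // the results directory was located
  parsedAny: boolean; // at least one JUnit XML file parsed without throwing
  stepCount: number; // number of test cases extracted
}

// RUN-001 fires only when the run succeeded AND there is parseable data
// showing that zero test cases executed.
function shouldEmitRun001(exitCode: number, junit: JUnitSummarySketch): boolean {
  return exitCode === 0 && junit.resultsPathResolved && junit.parsedAny && junit.stepCount === 0;
}

const cases: Array<[string, number, JUnitSummarySketch]> = [
  ['parsed, zero steps', 0, { resultsPathResolved: true, parsedAny: true, stepCount: 0 }],
  ['nothing parsed', 0, { resultsPathResolved: true, parsedAny: false, stepCount: 0 }],
  ['dir not located', 0, { resultsPathResolved: false, parsedAny: false, stepCount: 0 }],
  ['tests ran', 0, { resultsPathResolved: true, parsedAny: true, stepCount: 3 }],
  ['sf failed', 1, { resultsPathResolved: true, parsedAny: true, stepCount: 0 }],
];
for (const [label, exitCode, junit] of cases) {
  console.log(label, shouldEmitRun001(exitCode, junit));
}
```

Only the first case produces the warning; the second and third are the "we don't know what ran" situations that are reported through `details.warning` instead.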

**Error codes:** `AUTOMATION_TESTRUN_FAILED`, `SF_NOT_FOUND`
**Warning codes:** `RUN-001` (zero tests executed despite success)
```

### `provar_automation_compile`

16 changes: 14 additions & 2 deletions src/mcp/tools/antTools.ts
@@ -1024,6 +1024,12 @@ export function isStepRetryable(category: JUnitErrorCategory | undefined): boole
export interface JUnitParseResult {
steps: JUnitStepResult[];
warning?: string;
/**
* True iff at least one JUnit XML file was located AND parsed without throwing.
* Distinguishes "we have data and the test selector matched zero cases" (legit RUN-001 signal)
* from "we have no data because nothing parsed" (insufficient info — must stay silent).
*/
parsedAny: boolean;
}

function extractFailureText(el: unknown): string | undefined {
@@ -1112,14 +1118,15 @@ function findXmlFiles(dir: string): string[] {
*/
export function parseJUnitResults(resultsDir: string): JUnitParseResult {
if (!fs.existsSync(resultsDir)) {
return { steps: [], warning: `Results directory not found: ${resultsDir}` };
return { steps: [], warning: `Results directory not found: ${resultsDir}`, parsedAny: false };
}

const xmlFiles = findXmlFiles(resultsDir);
if (xmlFiles.length === 0) {
return {
steps: [],
warning: 'No JUnit XML files found in results directory — structured step output unavailable.',
parsedAny: false,
};
}

@@ -1152,20 +1159,25 @@ export function parseJUnitResults(resultsDir: string): JUnitParseResult {
return {
steps: [],
warning: 'JUnit XML files found but could not be parsed — structured step output unavailable.',
parsedAny: false,
};
}
if (allSteps.length === 0) {
// We did parse at least one file; the file just had zero <testcase> entries (or none we could
// recognise as steps). This is the legitimate "selector matched nothing" signal that RUN-001
// is built to catch.
return {
steps: [],
warning: 'JUnit XML found but no test steps could be extracted — files may not be standard JUnit format.',
parsedAny: true,
};
}

const warning =
parseFailures > 0
? `${parseFailures} JUnit XML file(s) could not be parsed — step data may be incomplete.`
: undefined;
return { steps: allSteps, warning };
return { steps: allSteps, warning, parsedAny: true };
}

// ── Registration ──────────────────────────────────────────────────────────────
74 changes: 63 additions & 11 deletions src/mcp/tools/automationTools.ts
@@ -15,6 +15,7 @@ import { makeError, makeRequestId } from '../schemas/common.js';
import { log } from '../logging/logger.js';
import type { ServerConfig } from '../server.js';
import { assertPathAllowed, PathPolicyError } from '../security/pathPolicy.js';
import { WARNING_CODES, formatWarning } from '../utils/warningCodes.js';
import { parseJUnitResults } from './antTools.js';
import { runSfCommand } from './sfSpawn.js';
import { desc } from './descHelper.js';
@@ -226,6 +227,38 @@ function readResultsPathFromSfConfig(config: ServerConfig): string | null {

// ── Tool: provar_automation_testrun ───────────────────────────────────────────

/**
* JUnit introspection for the testrun response. Returns enough structure that
* downstream warning emitters (RUN-001 zero-tests, future JUNIT-001 expected-vs-
* actual mismatch) can read a single object instead of re-parsing.
*/
type JUnitIntrospection = {
steps: ReturnType<typeof parseJUnitResults>['steps'];
stepCount: number;
parseWarning: string | undefined;
resultsPathResolved: boolean;
/**
* True iff at least one JUnit XML file was located AND parsed without throwing.
* Gates RUN-001: a `stepCount === 0` only means "zero tests executed" when we know we
* actually have parseable data. With `parsedAny === false` the count is "we don't know",
* which must stay silent (details.warning already covers it).
*/
parsedAny: boolean;
};

function introspectJUnit(config: ServerConfig): JUnitIntrospection {
const resultsPath = readResultsPathFromSfConfig(config);
if (!resultsPath) {
return { steps: [], stepCount: 0, parseWarning: undefined, resultsPathResolved: false, parsedAny: false };
}
const { steps, warning, parsedAny } = parseJUnitResults(resultsPath);
return { steps, stepCount: steps.length, parseWarning: warning, resultsPathResolved: true, parsedAny };
}

const ZERO_TESTS_MESSAGE =
'Test run exited successfully but zero tests were executed. ' +
'Check the testCase / testCases (note spelling) field in provardx-properties.json.';

export function registerAutomationTestRun(server: McpServer, config: ServerConfig): void {
server.registerTool(
'provar_automation_testrun',
@@ -240,11 +273,12 @@ export function registerAutomationTestRun(server: McpServer, config: ServerConfi
'For grid/CI execution via Provar Quality Hub instead of running locally, use provar_qualityhub_testrun.',
'Output buffer: a 50 MB maxBuffer is set so ENOBUFS on verbose Provar runs is now rare.',
'If ENOBUFS still occurs (extremely verbose logging), run `sf provar automation test run --json` directly in the terminal and pipe or tail the output instead of retrying this tool.',
'Zero-tests guard: if the sf exit code is 0, the results directory was located, and at least one JUnit XML file parsed successfully but contains zero executed tests, a RUN-001 warning is added to `warnings[]` — usually a typo such as `testCase` vs `testCases` in provardx-properties.json. When no JUnit data is available (dir missing or all XML unparseable), `details.warning` is set instead and RUN-001 stays silent.',
'Typical local AI loop: config.load → compile → testrun → inspect results.',
'Each failed step in `steps[]` may include optional error_category (INFRASTRUCTURE|ASSERTION|LOCATOR|TIMEOUT|OTHER)',
'and retryable (boolean) fields when the failure text matches a known pattern — use these to drive automated retry policy.',
].join(' '),
'Run local Provar tests via sf CLI; requires config_load first.'
'Run local Provar tests via sf CLI; requires config_load first. Surfaces RUN-001 on zero-tests-executed.'
),
inputSchema: {
flags: z
@@ -276,11 +310,10 @@ export function registerAutomationTestRun(server: McpServer, config: ServerConfi
const result = runSfCommand(['provar', 'automation', 'test', 'run', ...flags], sf_path);
const { filtered, suppressed } = filterTestRunOutput(result.stdout);

// Attempt to enrich the response with structured step data from JUnit XML
const resultsPath = readResultsPathFromSfConfig(config);
const { steps, warning: junitWarning } = resultsPath
? parseJUnitResults(resultsPath)
: { steps: [], warning: undefined };
// Enrich the response with structured step data + warning hooks from JUnit XML.
// Single introspection call keeps the wiring extensible (e.g. future JUNIT-001
// expected-vs-actual mismatch can read stepCount from the same struct).
const junit = introspectJUnit(config);

if (result.exitCode !== 0) {
const { filtered: filteredErr, suppressed: suppressedErr } = filterTestRunOutput(
@@ -290,11 +323,11 @@ export function registerAutomationTestRun(server: McpServer, config: ServerConfi
...makeError('AUTOMATION_TESTRUN_FAILED', filteredErr, requestId),
...(suppressedErr > 0 ? { output_lines_suppressed: suppressedErr } : {}),
};
if (steps.length > 0) errBody['steps'] = steps;
if (!resultsPath || junitWarning) {
if (junit.steps.length > 0) errBody['steps'] = junit.steps;
if (!junit.resultsPathResolved || junit.parseWarning) {
errBody['details'] = {
warning:
junitWarning ??
junit.parseWarning ??
'Could not locate results directory — step-level output unavailable. Run provar_automation_config_load first.',
};
}
@@ -308,8 +341,27 @@ export function registerAutomationTestRun(server: McpServer, config: ServerConfi
stderr: result.stderr,
};
if (suppressed > 0) response['output_lines_suppressed'] = suppressed;
if (steps.length > 0) response['steps'] = steps;
if (junitWarning) response['details'] = { warning: junitWarning };
if (junit.steps.length > 0) response['steps'] = junit.steps;
if (junit.parseWarning) response['details'] = { warning: junit.parseWarning };

// RUN-001: sf reported success but zero tests actually executed.
// Almost always a typo in the testCase / testCases field of provardx-properties.json.
// Only fires when:
// 1. The results dir was located (resultsPathResolved), AND
// 2. At least one JUnit XML file was successfully parsed (parsedAny).
// Without (2) `stepCount === 0` just means "we don't have parseable data" — not
// "zero tests ran" — and the agent would be misdirected toward a typo when the
// real issue is a missing/unparseable results dir. That case is already surfaced
// via `details.warning` from the parse layer. With parsedAny === true and zero
// extracted steps, we know the selector genuinely matched nothing.
if (junit.resultsPathResolved && junit.parsedAny && junit.stepCount === 0) {
const warningStr = formatWarning(WARNING_CODES.RUN_001, ZERO_TESTS_MESSAGE);
// Append rather than overwrite so future warning emitters (e.g. JUNIT-001 mismatch
// in PDX-491) can coexist on the same response without stepping on each other.
const existing = response['warnings'] as string[] | undefined;
response['warnings'] = existing ? existing.concat(warningStr) : [warningStr];
}

return { content: [{ type: 'text' as const, text: JSON.stringify(response) }], structuredContent: response };
} catch (err) {
return handleSpawnError(err, requestId, 'provar_automation_testrun');
15 changes: 15 additions & 0 deletions test/unit/mcp/antTools.test.ts
@@ -800,12 +800,14 @@ describe('parseJUnitResults', () => {
const result = parseJUnitResults(path.join(junitTmpDir, 'nonexistent'));
assert.deepEqual(result.steps, []);
assert.ok(result.warning?.includes('not found'));
assert.equal(result.parsedAny, false, 'parsedAny must be false when dir is missing');
});

it('returns warning when directory contains no XML files', () => {
const result = parseJUnitResults(junitTmpDir);
assert.deepEqual(result.steps, []);
assert.ok(result.warning?.includes('No JUnit XML'));
assert.equal(result.parsedAny, false, 'parsedAny must be false when no XML files exist');
});

it('extracts steps from a bare <testsuite> JUnit file', () => {
@@ -818,6 +820,7 @@ describe('parseJUnitResults', () => {
assert.equal(result.steps[1].status, 'fail');
assert.ok(result.steps[1].errorMessage?.includes('Element not found'));
assert.equal(result.warning, undefined);
assert.equal(result.parsedAny, true, 'parsedAny must be true when at least one file parsed');
});

it('extracts steps from a <testsuites> wrapper JUnit file', () => {
@@ -836,6 +839,18 @@ describe('parseJUnitResults', () => {
const result = parseJUnitResults(junitTmpDir);
assert.deepEqual(result.steps, []);
assert.ok((result.warning?.length ?? 0) > 0);
// parsedAny must be TRUE here: the file was readable and parsed, it just has zero
// <testcase> entries. This is the legitimate RUN-001 signal — distinct from "we have
// no data at all".
assert.equal(result.parsedAny, true, 'parsedAny must be true when XML parsed but had no steps');
});

it('returns parsedAny=false when all XML files fail to parse', () => {
fs.writeFileSync(path.join(junitTmpDir, 'broken.xml'), '<this is < not valid xml');
const result = parseJUnitResults(junitTmpDir);
assert.deepEqual(result.steps, []);
assert.ok(result.warning?.includes('could not be parsed'));
assert.equal(result.parsedAny, false, 'parsedAny must be false when every XML file throws');
});

it('combines message attribute and CDATA body in failure text', () => {