Rename fields in CaseRecord for consistency with langfuse evaluators#38
Rename fields in CaseRecord for consistency with langfuse evaluators#38
CaseRecord for consistency with langfuse evaluators#38Conversation
There was a problem hiding this comment.
Pull request overview
Renames CaseRecord fields to align with langfuse evaluator conventions (input, expected_output, output) and updates agent + tests accordingly.
Changes:
- Renamed
CaseRecordfields (case→input,groundtruth→expected_output,analysis→output). - Updated AML investigation agent logic to read/write/analyze using the new field names.
- Updated AML investigation test cases to assert against the renamed fields.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| implementations/aml_investigation/agent.py | Updates analysis pipeline, resume logic, and metrics computation to use input/expected_output/output. |
| aieng-eval-agents/aieng/agent_evals/aml_investigation/data/cases.py | Renames the CaseRecord schema and updates case-building helpers accordingly. |
| aieng-eval-agents/tests/aieng/agent_evals/aml_investigation/data/test_cases.py | Updates tests to reference the renamed CaseRecord fields. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| input: CaseFile = Field(..., description="Metadata for the laundering case.") | ||
| expected_output: GroundTruth = Field(..., description="Ground truth information for the laundering case.") | ||
| output: AnalystOutput | None = Field( | ||
| default=None, | ||
| description="Optional analyst output for the laundering case. Typically populated after investigation.", | ||
| ) |
There was a problem hiding this comment.
Renaming these serialized fields will break parsing of any existing JSON/JSONL persisted with the old keys (case, groundtruth, analysis). Since the agent reads prior results via CaseRecord.model_validate_json(...) for resume behavior, legacy lines will fail validation and be skipped. Consider adding Pydantic v2 compatibility via validation_alias (e.g., AliasChoices('input', 'case'), AliasChoices('expected_output', 'groundtruth'), AliasChoices('output', 'analysis')) so existing artifacts remain readable while emitting the new field names.
| input_records = _load_records(input_path) | ||
| existing_results = {record.case.case_id: record for record in _load_records(output_path)} | ||
| to_run = [r for r in input_records if existing_results.get(r.case.case_id, r).analysis is None] | ||
| existing_results = {record.input.case_id: record for record in _load_records(output_path)} |
There was a problem hiding this comment.
With the CaseRecord field rename, _load_records(output_path) will skip any legacy JSONL rows (old schema) as invalid, which can cause existing_results to be incomplete and make the resume logic re-run cases unnecessarily (and potentially overwrite/duplicate outputs). After adding schema aliases (or a migration path), it would also help to make legacy parsing failures more explicit here (e.g., fail fast or log a clear warning/count) so users don’t silently lose resume behavior.
| existing_results = {record.input.case_id: record for record in _load_records(output_path)} | |
| # Load existing results from the output file, and detect any lines that could not be parsed | |
| existing_records = list(_load_records(output_path)) | |
| existing_results = {record.input.case_id: record for record in existing_records} | |
| # Compare parsed records against total lines to surface potential legacy/invalid rows | |
| if output_path.exists(): | |
| try: | |
| with output_path.open("r", encoding="utf-8") as f: | |
| total_lines = sum(1 for _ in f) | |
| except OSError: | |
| total_lines = None | |
| if total_lines is not None and total_lines > len(existing_records): | |
| logger.warning( | |
| "Detected %d/%d records in %s that could not be parsed. " | |
| "These may be legacy or invalid rows, and resume behavior may be affected.", | |
| total_lines - len(existing_records), | |
| total_lines, | |
| output_path, | |
| ) |
Summary
Rename fields in
CaseRecordfor consistency with langfuse evaluatorsClickup Ticket(s): N/A
Type of Change
Changes Made
casetoinput.groundtruthtoexpected_output.analysistooutput.Testing
uv run pytest tests/)uv run mypy <src_dir>)uv run ruff check src_dir/)Manual testing details:
N/A
Screenshots/Recordings
N/A
Related Issues
N/A
Deployment Notes
N/A
Checklist