port evaluation agent#223

Open
jtrayfield wants to merge 5 commits into main from jtray-175
Conversation

@jtrayfield
Collaborator

I ran it against two trajectory cases from earlier.

I had to fetch the characteristic form from Hugging Face.

@ShuxinLin
Collaborator

ShuxinLin commented Mar 20, 2026

Did you run the refactored MCP and get the trajectories? I saw `demonstration` in the trajectories, which is completely removed in the new implementation. Please don't add 0001.json and 0002.json; keep the repo clean. Also add instructions to INSTRUCTIONS.md if needed.

@jtrayfield @DhavalRepo18

@jtrayfield
Collaborator Author

Where are the instructions for running the MCP metaagent?

I removed 0001.json and 0002.json.

I added README.md.

@ShuxinLin

@DhavalRepo18
Collaborator

DhavalRepo18 commented Mar 23, 2026

@ShuxinLin I suggest we address this GitHub issue together with the Evaluation Agent:
#205

We should introduce a base Trajectory class and then extend it for specific workflows (e.g., plan-execute). Other agents can similarly derive their own trajectory implementations. The Evaluation Agent should rely on this base class for defining evaluations.
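For discussion, here is a minimal sketch of what such a base class could look like. Everything in it is an assumption made for illustration: the names `Trajectory`, `PlanExecuteTrajectory`, `steps`, `to_dict`, and `count_steps`, as well as the field layout, are hypothetical and do not reflect the repo's actual code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Trajectory(ABC):
    """Hypothetical base class: the minimum an evaluator would need."""

    task: str
    final_answer: str = ""

    @abstractmethod
    def steps(self) -> list[dict[str, Any]]:
        """Return the workflow's steps in a common, flat format."""

    def to_dict(self) -> dict[str, Any]:
        # Export format is an assumption, not the pipeline's real schema.
        return {
            "workflow": type(self).__name__,
            "task": self.task,
            "final_answer": self.final_answer,
            "steps": self.steps(),
        }


@dataclass
class PlanExecuteTrajectory(Trajectory):
    """Hypothetical plan-execute variant: one observation per planned action."""

    plan: list[str] = field(default_factory=list)
    results: list[str] = field(default_factory=list)

    def steps(self) -> list[dict[str, Any]]:
        return [
            {"index": i, "action": action, "observation": observation}
            for i, (action, observation) in enumerate(zip(self.plan, self.results))
        ]


def count_steps(trajectory: Trajectory) -> int:
    """Toy evaluation that depends only on the base-class interface."""
    return len(trajectory.steps())
```

Because `count_steps` only touches `steps()`, any other agent's trajectory subclass could be passed to it unchanged, which is the point of having the Evaluation Agent depend on the base class.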

Additionally, the instructions.md file is becoming quite large. Let’s consider breaking it into smaller, modular, independently manageable sections.

If possible, please add the base class as soon as you can and enable exporting trajectories from the plan-execute workflow so they can be consumed by the evaluation pipeline. It might also help to have a quick sync on your end to align on this.

Also, we should have one or two examples so the code can be run at all (without running the real plan/execute workflow). Even a dummy example would help. But I will leave the decision up to you.
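A dummy example along these lines might be as small as the sketch below. The trajectory fields, the file name, and the `smoke_evaluate` check are all hypothetical placeholders, not the evaluation pipeline's real schema or API. The file is written to a temporary directory, so nothing needs to be committed.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical dummy trajectory; field names are illustrative only.
DUMMY_TRAJECTORY = {
    "task": "dummy task",
    "steps": [
        {"index": 0, "action": "plan", "observation": "one-step plan"},
        {"index": 1, "action": "execute", "observation": "done"},
    ],
    "final_answer": "dummy answer",
}


def write_dummy(path: Path) -> Path:
    """Write the dummy trajectory as JSON and return the path."""
    path.write_text(json.dumps(DUMMY_TRAJECTORY, indent=2), encoding="utf-8")
    return path


def load_trajectory(path: Path) -> dict:
    """Load a trajectory JSON file into a plain dict."""
    return json.loads(path.read_text(encoding="utf-8"))


def smoke_evaluate(trajectory: dict) -> dict:
    """Stand-in for a real evaluation: only checks the basic shape."""
    steps = trajectory.get("steps", [])
    return {
        "num_steps": len(steps),
        "has_final_answer": bool(trajectory.get("final_answer")),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_dummy(Path(tmp) / "dummy_trajectory.json")
        print(smoke_evaluate(load_trajectory(path)))
```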

3 participants