Implement automated eval test suite for Angular Skills #17007
Add automated eval test suite for Angular Skills using the skill-eval framework:
- grid-basic-setup task (igniteui-angular-grids skill)
- component-combo-reactive-form task (igniteui-angular-components skill)
- theming-palette-generation task (igniteui-angular-theming skill)
- GitHub Actions workflow for CI integration
- Baseline results JSON for regression comparison
- README with local run and task authoring docs

Co-authored-by: zdrawku <11193764+zdrawku@users.noreply.github.com>
📊 Skill Eval Results
Thresholds
@copilot …
skill-eval is a reference repo, not an installable npm package. Replaced with a local run-eval.sh script that executes reference solutions and deterministic graders directly. Co-authored-by: kdinev <1472513+kdinev@users.noreply.github.com>
Removed …
@copilot The skill tests passed, so the automation looks good, but the github-actions bot report in the PR shows N/A for the pass rates.
…scores The workflow summary comment reads passRate and passAtK from per-task JSON files. The runner was only writing reward/status, causing N/A. Also skip baseline.json in the summary since it uses a different schema. Co-authored-by: kdinev <1472513+kdinev@users.noreply.github.com>
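A minimal sketch of the fix described above, assuming a per-trial record with `reward` and `status` fields and a per-task summary that carries `passRate` and `passAtK`. The type names, the `summarize` and `formatMetric` helpers, and the reading of `passAtK` as "at least one trial passed" are illustrative assumptions, not the PR's actual runner code.

```typescript
// Hypothetical per-trial record: the source only says the runner wrote reward/status.
interface TrialResult {
  reward: number; // assumed convention: 1 for a passing trial, 0 otherwise
  status: "passed" | "failed";
}

// Hypothetical per-task summary with the two fields the workflow comment reads.
interface TaskSummary {
  task: string;
  passRate: number; // fraction of trials that passed
  passAtK: number; // assumed meaning: 1 if at least one of the k trials passed, else 0
}

function summarize(task: string, trials: TrialResult[]): TaskSummary {
  if (trials.length === 0) {
    throw new Error(`no trials recorded for ${task}`);
  }
  const passed = trials.filter((t) => t.status === "passed").length;
  return {
    task,
    passRate: passed / trials.length,
    passAtK: passed > 0 ? 1 : 0,
  };
}

// Renders a metric for a summary table; a missing field is what produces "N/A".
function formatMetric(value: number | undefined): string {
  return value === undefined ? "N/A" : `${(value * 100).toFixed(0)}%`;
}
```

With a summary shaped like this, a report that looks up `passRate` on a file that only contains `reward` and `status` gets `undefined` and falls through to "N/A", which matches the symptom reported in the thread.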
Two issues caused the N/A values: (1) …
Removed installation of eval dependencies from workflow.
@copilot Verifications need to be performed against a Copilot agent. Use the Copilot CLI for this. Also add an option to verify against Gemini with the gemini-cli. The switch should be config based. I will provide the Gemini API key as a secret.
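One way a config-based switch like the one requested could look, as a sketch only. The `EvalConfig` shape, the `resolveAgent` helper, the `command` values, and the `apiKeyEnv` field are hypothetical and not taken from the PR's files. The only names drawn from the thread are the two agents and the `GEMINI_API_KEY` secret.

```typescript
type AgentName = "copilot" | "gemini";

// Hypothetical per-agent settings; real CLI flags are deliberately left out.
interface AgentSettings {
  command: string; // executable to launch (assumed binary name)
  apiKeyEnv?: string; // environment variable that must be set, if any
}

// Hypothetical config file shape: a default agent plus per-agent settings.
interface EvalConfig {
  agent: AgentName;
  agents: Record<AgentName, AgentSettings>;
}

const exampleConfig: EvalConfig = {
  agent: "copilot",
  agents: {
    copilot: { command: "copilot" },
    gemini: { command: "gemini", apiKeyEnv: "GEMINI_API_KEY" },
  },
};

function isAgentName(value: string): value is AgentName {
  return value === "copilot" || value === "gemini";
}

// Picks the agent from an optional override (e.g. a CLI flag), else from config,
// and fails early when a required API key is not present in the environment.
function resolveAgent(
  config: EvalConfig,
  override: string | undefined,
  env: Record<string, string | undefined>,
): { name: AgentName; settings: AgentSettings } {
  const name: string = override ?? config.agent;
  if (!isAgentName(name)) {
    throw new Error(`unknown agent: ${name}`);
  }
  const settings = config.agents[name];
  if (settings.apiKeyEnv && !env[settings.apiKeyEnv]) {
    throw new Error(`missing environment variable ${settings.apiKeyEnv}`);
  }
  return { name, settings };
}
```

Failing early on a missing key keeps a misconfigured CI secret from showing up later as an agent failure that looks like a skill regression.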
Co-authored-by: kdinev <1472513+kdinev@users.noreply.github.com>
Implemented in … The eval runner now supports agent-based verification via …

    # Run with GitHub Copilot CLI
    bash run-eval.sh --all --agent copilot
    # Run with Google Gemini CLI
    bash run-eval.sh grid-basic-setup --agent gemini
    # Multi-trial for statistical robustness
    bash run-eval.sh --all --agent copilot --trials 3

Agent configuration is in … The CI workflow now has a …

Warning: Firewall rules blocked me from connecting to one or more addresses. I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
@copilot Rework the action so that it no longer triggers based on an agent input. It should run against both copilot and gemini and report results for both.
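Reporting for both agents could be sketched as a small table renderer. The `AgentTaskResult` shape and the `renderReport` function are hypothetical. The sketch assumes each agent run produces one pass rate per task and that a missing combination is shown as "N/A".

```typescript
// Hypothetical flattened result: one entry per (agent, task) pair.
interface AgentTaskResult {
  agent: string;
  task: string;
  passRate: number; // fraction in [0, 1]
}

// Builds a Markdown table with one row per task and one column per agent.
function renderReport(results: AgentTaskResult[], agents: string[]): string {
  const tasks = results
    .map((r) => r.task)
    .filter((task, index, all) => all.indexOf(task) === index) // de-duplicate
    .sort();

  const header = `| Task | ${agents.join(" | ")} |`;
  const divider = `| --- |${agents.map(() => " --- |").join("")}`;

  const rows = tasks.map((task) => {
    const cells = agents.map((agent) => {
      const hit = results.find((r) => r.agent === agent && r.task === task);
      return hit ? `${Math.round(hit.passRate * 100)}%` : "N/A";
    });
    return `| ${task} | ${cells.join(" | ")} |`;
  });

  return [header, divider, ...rows].join("\n");
}
```

Keeping the agent list as an explicit argument fixes the column order and makes a skipped agent run visible as a column of "N/A" cells instead of silently dropping it.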
Co-authored-by: kdinev <1472513+kdinev@users.noreply.github.com>
Reworked in
@copilot Seems to me that there are instructions, but no actual test prompts for the agents. Review the whole implementation because there have been multiple iterations and changes. In the workflows, don't use the bash commands directly, instead call them through npm commands. Clean up everything that is obsolete and add missing items if you find such. I have provided the GEMINI_API_KEY secret, so the gemini workflow should run fine.
📊 Skill Eval Results
Grader Validation (reference solutions)
Agent Evaluation
Thresholds
Co-authored-by: kdinev <1472513+kdinev@users.noreply.github.com>
Fixed in
- evals/ directory structure with package.json and README.md
- grid-basic-setup task (igniteui-angular-grids skill)
- component-combo-reactive-form task (igniteui-angular-components skill)
- theming-palette-generation task (igniteui-angular-theming skill)
- … (skill-eval.yml)
- … skill-eval package dependency with self-contained local runner (run-eval.sh)
- … run-eval.sh (--agent copilot)
- … run-eval.sh (--agent gemini)
- eval-config.json for config-based agent/provider selection
- … (--trials N) for statistical robustness
- prompt.md agent prompts for each task (concise, actionable prompts sent to CLI agents)

Original prompt
This section details the original issue you should resolve.
<issue_title>Implement Automated Eval Test Suite for the Angular Skills</issue_title>
<issue_description>We have three Skills (
igniteui-angular-components,igniteui-angular-grids,igniteui-angular-theming) that teach coding agents how to correctly select, configure, and compose Ignite UI for Angular components. As these skills grow in complexity and more developers rely on them, silent regressions become a real risk rewording a step, reordering routing logic, or removing a "verify" clause can quietly degrade agent behavior with no signal until a user reports a wrong output.This work item establishes a structured eval process for these skills, directly inspired by Minko Gechev's Skill Eval framework, topic and extended with patterns from Anthropic's agent eval research and the Skills Best Practices guide.
Goals
… main.

Approach
Tooling: Adopt the skill-eval TypeScript framework as the eval runner. It supports Docker-isolated agent execution, deterministic shell graders, LLM rubric graders, multi-trial runs, and JSON result persistence: all the properties needed here.

Task Structure
Create an evals/ directory at the repo root. Each eval task is a self-contained directory:
Example:
Tasks to Implement (per Skill)
igniteui-angular-grids skill (highest priority: most complex routing)
- grid-basic-setup: <igx-grid> present in template; correct module imported. … [data] binding correctly?
- grid-tree-vs-flat: <igx-tree-grid> present; childDataKey configured
- grid-hierarchical-setup: <igx-hierarchical-grid> + <igx-row-island> present
- grid-remote-filtering: [filterMode]="'externalFilterMode'" set; remote service stub present. … onDataPreLoad/sortingExpressionsChange instead of local filtering?
- grid-pivot-config: <igx-pivot-grid> + IgxPivotConfiguration present. … rows, columns, values correctly vs a flat grid with groupBy?
- grid-state-persistence: IgxGridStateDirective present; serialize/restore calls present

igniteui-angular-components skill
- component-combo-reactive-form: <igx-combo> present; [formControlName] wired; module imported. … <select>) for multi-select?
- component-date-picker-validation: <igx-date-picker> present; minValue/maxValue i...
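The deterministic check listed for grid-basic-setup (`<igx-grid>` present in the template, correct module imported) could be sketched as a pure function over the generated files. This is an illustration only: the `gradeGridBasicSetup` and `reward` names, the binary reward convention, and the choice to accept `IgxGridModule`, `IgxGridComponent`, or `IGX_GRID_DIRECTIVES` imported from `igniteui-angular` as the "correct module" are assumptions. The issue describes shell graders, so a real grader would likely look different.

```typescript
interface GraderCheck {
  name: string;
  passed: boolean;
}

// Runs the two deterministic checks against the template and component source text.
function gradeGridBasicSetup(template: string, componentSource: string): GraderCheck[] {
  // Matches <igx-grid ...> or <igx-grid>, but not <igx-grid-toolbar> and similar tags.
  const hasGridTag = /<igx-grid[\s>]/.test(template);

  // Assumption: any of these symbols imported from "igniteui-angular" is acceptable.
  const importsGrid =
    /import\s*\{[^}]*\b(IgxGridModule|IgxGridComponent|IGX_GRID_DIRECTIVES)\b[^}]*\}\s*from\s*["']igniteui-angular["']/.test(
      componentSource,
    );

  return [
    { name: "igx-grid present in template", passed: hasGridTag },
    { name: "grid module imported", passed: importsGrid },
  ];
}

// Assumed convention: reward is 1 only when every check passes, otherwise 0.
function reward(checks: GraderCheck[]): number {
  return checks.every((c) => c.passed) ? 1 : 0;
}
```

Returning named checks as well as a single score makes it possible to report which condition failed, which helps when a reworded skill step causes a regression.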