[AIENG-23] Expose the programmatic goal setting as --goal-spec
#953
Conversation
I was working on a separate change but when I tried to commit that change, it failed with pre-commit, so I'm first fixing those errors.
Walkthrough
The pull request introduces several modifications across multiple files in the Launchable project. Key changes include adding a new `--goal-spec` option to the subset command.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant SubsetCommand
    participant LaunchableServer
    User->>SubsetCommand: Invoke with --goal-spec
    SubsetCommand->>SubsetCommand: Construct payload
    SubsetCommand->>LaunchableServer: POST request with goal spec
    LaunchableServer-->>SubsetCommand: Return subset results
    SubsetCommand-->>User: Display subset results
```
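The flow above could be sketched in Python roughly as follows. The endpoint path, payload field names, and goal type used here are assumptions for illustration only, not the CLI's actual wire format.

```python
import json
import urllib.request

def build_subset_request(base_url, goal_spec, test_paths):
    """Sketch of the subset flow: construct a payload carrying the goal
    spec, then wrap it in a POST request. All names are illustrative."""
    payload = {
        "testPaths": test_paths,
        # Hypothetical shape of the "goal" object the review mentions
        "goal": {"type": "subset-by-goal-spec", "goalSpec": goal_spec},
    }
    return urllib.request.Request(
        base_url + "/subset",  # hypothetical endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_subset_request("https://example.invalid/api",
                           "duration <= 10m", [["tests/a.py"]])
print(json.loads(req.data)["goal"]["goalSpec"])
```

Sending the request (e.g. via `urllib.request.urlopen`) would then return the subset results that the command displays.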
Actionable comments posted: 0
🔭 Outside diff range comments (2)
tests/data/playwright/record_test_result_with_json.json (1)
Line range hint 1-2409: Review test execution patterns and reliability issues
The test results reveal several concerns that should be addressed:
Inconsistent timeout settings:
- The timeout-example tests are using an extremely short timeout of 1ms which is causing artificial failures
- This is not a realistic timeout value for web navigation tests
Retry patterns show potential flakiness:
- Multiple tests are being retried up to 3 times
- The retry-example tests consistently fail first and succeed only on the final retry
- This indicates underlying stability issues that should be investigated
Consider the following improvements:
```diff
- test.setTimeout(1);
+ test.setTimeout(30000); // Use a realistic timeout for web navigation
```
Implement proper test stability:
- Add proper wait conditions instead of relying on retries
- Consider using test fixtures for setup/teardown
- Add logging to help debug intermittent failures
- Monitor and track flaky tests separately
Standardize test configuration:
- Define standard timeout values in a shared config
- Document retry policies and when they should be used
- Consider separating slow/unstable tests into different test suites
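One way to replace blind retries with the explicit wait conditions suggested above is a polling helper: a failure then points at a real timeout rather than scheduling luck. This generic Python sketch is illustrative and not part of the project.

```python
import time

def wait_until(condition, timeout=30.0, interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout`
    seconds elapse. Returns the truthy result, or raises TimeoutError."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")

# Demo: a condition that only becomes true on the third poll
attempts = {"count": 0}
def page_loaded():
    attempts["count"] += 1
    return attempts["count"] >= 3

wait_until(page_loaded, timeout=5, interval=0.01)
```

Unlike a retry loop around the whole test, the wait is scoped to the single slow step, so setup and assertions run exactly once.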
tests/data/playwright/report.json (1)
Line range hint 4-4: Test timeout configuration needs adjustment
The test timeout of 1ms is unrealistically short and causing consistent test failures. This appears to be intentional for demonstration purposes, but in a real test suite, this would need to be increased to a more practical value.
Consider increasing the timeout to a more realistic value:
```diff
- test.setTimeout(1);
+ test.setTimeout(30000); // 30 seconds is a common default timeout
```
🧹 Nitpick comments (3)
launchable/commands/subset.py (1)
194-194: Ensure consistent parameter ordering for readability
Adding goal_spec toward the end of the parameter list is fine, but consider rearranging the function signature to keep related parameters closer, e.g., near target/confidence/duration, for a more logical grouping.
tests/data/playwright/record_test_result_with_json.json (1)
Line range hint 1-10: Improve test result data structure
The JSON structure could be enhanced to better support test analytics and debugging:
Consider adding:
- Test environment information
- Test run metadata (CI build number, git commit, etc.)
- Aggregated statistics (total duration, pass/fail counts)
- Links to test artifacts (screenshots, videos, logs)
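An enriched record along those lines might look like the following. Every field name here is hypothetical, not part of any existing schema.

```python
import json

# Hypothetical enriched test-result record covering the suggestions
# above: environment info, run metadata, aggregate stats, artifacts.
enriched_result = {
    "environment": {"os": "linux", "node": "20.x", "browser": "chromium"},
    "runMetadata": {"ciBuildNumber": 123, "gitCommit": "abc1234"},
    "stats": {"totalDurationMs": 5400, "passed": 12, "failed": 1},
    "artifacts": {"screenshots": ["failure-1.png"], "video": None},
}

print(json.dumps(enriched_result, indent=2))
```

Keeping these fields at the top level of each record makes them easy to aggregate without parsing per-test entries.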
tests/data/playwright/report.json (1)
Line range hint 1-3618: Test reliability improvements needed
The test report shows several flaky tests that require multiple retries before passing. While the retry mechanism is handling these cases, it would be better to address the root causes of the flakiness.
Consider:
- Adding proper wait conditions before assertions
- Implementing more stable test selectors
- Adding logging to help debug intermittent failures
- Setting up test data in a more reliable way
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- launchable/commands/inspect/tests.py (1 hunks)
- launchable/commands/record/case_event.py (1 hunks)
- launchable/commands/subset.py (3 hunks)
- launchable/test_runners/flutter.py (0 hunks)
- launchable/test_runners/prove.py (1 hunks)
- launchable/utils/sax.py (1 hunks)
- tests/commands/test_subset.py (1 hunks)
- tests/data/cucumber/report/result.json (1 hunks)
- tests/data/flutter/record_test_result.json (1 hunks)
- tests/data/playwright/record_test_result.json (1 hunks)
- tests/data/playwright/record_test_result_with_json.json (1 hunks)
- tests/data/playwright/report.json (1 hunks)
- tests/data/playwright/report.xml (1 hunks)
💤 Files with no reviewable changes (1)
- launchable/test_runners/flutter.py
✅ Files skipped from review due to trivial changes (6)
- tests/data/playwright/report.xml
- launchable/commands/inspect/tests.py
- tests/data/cucumber/report/result.json
- tests/data/flutter/record_test_result.json
- launchable/commands/record/case_event.py
- launchable/test_runners/prove.py
🔇 Additional comments (6)
launchable/utils/sax.py (1)
7-7: Validate new dependency import for CLI error handling
Importing click here looks good for raising CLI-related exceptions in TagMatcher. This improves consistency for error reporting within the CLI context.
tests/commands/test_subset.py (1)
211-243: Good test coverage for the --goal-spec functionality
This test method thoroughly verifies how the new '--goal-spec' command-line argument is included in the payload. The usage of mocked responses and assertions on the JSON payload effectively ensures correctness.
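A minimal standalone sketch of that testing pattern, using click's CliRunner against a stand-in command: the command body, option wiring, and payload field names here are assumptions for illustration, not the real launchable implementation.

```python
import json
import click
from click.testing import CliRunner

captured = {}

@click.command()
@click.option("--goal-spec", "goal_spec", type=str,
              help="Programmatic goal specification (illustrative)")
def subset(goal_spec):
    """Stand-in for the real subset command: record the payload
    that would be POSTed so the test can assert on it."""
    payload = {"goal": {}}
    if goal_spec:
        payload["goal"] = {"type": "subset-by-goal-spec",
                           "goalSpec": goal_spec}
    captured["payload"] = payload
    click.echo(json.dumps(payload))

runner = CliRunner()
result = runner.invoke(subset, ["--goal-spec", "duration <= 10m"])
```

The real test presumably mocks the HTTP layer instead of capturing a dict, but the assertion shape (invoke the CLI, then inspect the JSON payload) is the same.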
launchable/commands/subset.py (2)
46-51: Appropriate use of click.option for --goal-spec
Defining the '--goal-spec' option with a clear help message, string type, and variable name is well-structured.
441-445: Seamless integration of goal_spec in the payload
This block correctly checks if goal_spec is set and updates the “goal” object accordingly. It aligns well with the new requirement, ensuring backward compatibility with existing subsetting logic.
tests/data/playwright/record_test_result.json (1)
1683-1683: No substantive changes detected
The final closing brace simply marks the end of the file. No action needed.
tests/data/playwright/report.json (1)
Line range hint 5-7: Verify retry mechanism effectiveness
The retry mechanism is working as designed but the test conditions make it consistently fail. The test expects "Playflight" in the title but checks against the actual Playwright website which has a different title.
Run this script to analyze the retry patterns:
✅ Verification successful
The previous command didn't yield results because we need to focus on the specific report file. Let's analyze the test failures and verify if there are any tests with "Playflight" in their titles to confirm the review comment's claim.
The error logs show that the test is failing because it's expecting "Playflight" in the title but getting "Fast and reliable end-to-end testing for modern web apps | Playwright". Let's examine the test file to understand the retry mechanism and verify if it's working as intended.
Let's try to locate the test file with a broader search pattern to understand the retry mechanism implementation.
Retry mechanism is functioning correctly but test expectations need adjustment
Based on the error logs from the test results, the retry mechanism is working as designed. The test is consistently failing because it's looking for "Playflight" in the page title but the actual Playwright website returns "Fast and reliable end-to-end testing for modern web apps | Playwright". The test logic (testInfo.retry > 2 ? "Playwright" : "Playflight") shows this is intentional behavior to demonstrate retry functionality.
- The test expects "Playflight" for the first 3 retries (retry ≤ 2)
- It expects "Playwright" after that (retry > 2)
- Error logs show consistent behavior matching this logic
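The retry-dependent expectation described above can be modeled as a one-line function (a Python sketch of the demo test's logic, not the actual TypeScript source):

```python
def expected_title_fragment(retry: int) -> str:
    """Model of the demo test's expectation: early attempts look for
    the misspelled "Playflight" (and fail), and only a late retry
    (retry > 2) looks for the real "Playwright" title."""
    return "Playwright" if retry > 2 else "Playflight"

# Attempts 0-2 are designed to fail; attempt 3 can pass
print([expected_title_fragment(r) for r in range(4)])
```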
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Analyze retry patterns in test results
rg -A 5 "retry.*status.*failed" | grep -E "retry|duration|status"
```
Length of output: 67
Script:
```shell
#!/bin/bash
# Let's search for test failures and retries in the report
rg -A 10 '"status":\s*"failed"' tests/data/playwright/report.json
# Also search for any test titles containing "Playflight"
rg '"title".*Playflight' tests/data/playwright/report.json
```
Length of output: 70599
Script:
```shell
#!/bin/bash
# Let's examine the test file to understand the retry logic
fd "retry.*\.spec\.ts" | xargs cat
```
Length of output: 36
Script:
```shell
#!/bin/bash
# Try finding the test file with a more general pattern
fd "\.spec\.ts$" | xargs cat
# Also search for retry-related test files in case they're in a different location
rg -l "retry.*test" --type ts
```
Length of output: 211
Konboi left a comment
LGTM
I believe the goal spec documentation will be published later.
Summary by CodeRabbit
- Added a new `--goal-spec` option for the `subset` command, allowing users to define programmatic goals for test subsetting.
- Improved error messages in the `ReportParser` class for clearer feedback on file-not-found and JSON decoding issues.
- Added error handling in the `TagMatcher` class for invalid tag specifications.
- Integrated the `--goal-spec` option into the JSON request payload.