Commit 4fd84b5

sjarmak and claude committed
fix: flink verifier path bug — flink-streaming-java → flink-runtime
WindowOperator.java lives in flink-runtime/ not flink-streaming-java/ in the sg-evals/flink--0cc95fcc mirror. The verifier was checking the wrong directory, causing legitimate agent work to score 0/6. Confirmed via Sourcegraph search and agent trajectory analysis (12 edits to the correct path).

Also adds configs/smoke_scaffold_4.json for the 4 DOE scaffold tasks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent a0d7471 commit 4fd84b5

4 files changed: +115 -6 lines changed

benchmarks/ccb_fix/flink-window-late-data-fix-001/CLAUDE.md

Lines changed: 2 additions & 2 deletions

@@ -5,8 +5,8 @@
 Fix silent late data dropping in Flink's merging window operator side output path.
 
 ## Key Reference Files
-- `flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperator.java` — main target
-- `flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/EvictingWindowOperator.java` — evicting variant
+- `flink-runtime/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperator.java` — main target
+- `flink-runtime/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/EvictingWindowOperator.java` — evicting variant
 - `flink-runtime/src/main/java/org/apache/flink/streaming/api/windowing/assigners/MergingWindowAssigner.java` — merge base
 - `flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperatorTest.java` — tests
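
The stale paths in this file went unnoticed until a verifier run scored 0/6. A small lint over the Markdown could surface such drift earlier. The following is only a sketch in Python, not part of this commit: the function name `missing_reference_paths` and the bullet pattern are hypothetical, and it assumes reference entries are written as a bullet whose first token is a backticked repo-relative path, as in the list above.

```python
import re
from pathlib import Path
from typing import List

# Assumed bullet shape: "- `relative/path/File.java` optional description".
BULLET_PATH = re.compile(r"^\s*-\s*`([^`]+)`")


def missing_reference_paths(markdown: str, workspace: Path) -> List[str]:
    """Return the backticked bullet paths that do not exist under `workspace`."""
    missing = []
    for line in markdown.splitlines():
        match = BULLET_PATH.match(line)
        if match and not (workspace / match.group(1)).exists():
            missing.append(match.group(1))
    return missing
```

Run against a checkout of the mirror, an empty result would mean every listed reference file resolves. Bullets that only mention a path mid-sentence are deliberately ignored by this pattern.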

benchmarks/ccb_fix/flink-window-late-data-fix-001/instruction.md

Lines changed: 3 additions & 3 deletions

@@ -20,7 +20,7 @@ When using event-time session windows with `allowedLateness(Time.seconds(0))` an
 2. **Fix the late element handling**:
 - Ensure late elements for merging windows check whether the element could be merged into an existing (non-expired) window BEFORE marking as late
 - OR ensure the side output path correctly emits the element via the registered OutputTag
-- The fix should be in `flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/`
+- The fix should be in `flink-runtime/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/`
 
 3. **Ensure correct OutputTag wiring**:
 - Verify that `sideOutput()` receives the correct `OutputTag<T>` for late data
@@ -32,8 +32,8 @@ When using event-time session windows with `allowedLateness(Time.seconds(0))` an
 - Verify: late element appears in the side output stream, not silently dropped
 
 ## Key Reference Files
-- `flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperator.java` — main window operator
-- `flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/EvictingWindowOperator.java` — evicting variant
+- `flink-runtime/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperator.java` — main window operator
+- `flink-runtime/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/EvictingWindowOperator.java` — evicting variant
 - `flink-runtime/src/main/java/org/apache/flink/streaming/api/windowing/assigners/MergingWindowAssigner.java` — merging window base
 - `flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/SingleOutputStreamOperator.java` — sideOutputLateData API
 - `flink-runtime/src/main/java/org/apache/flink/streaming/api/operators/AbstractStreamOperator.java` — base operator with output handling

benchmarks/ccb_fix/flink-window-late-data-fix-001/tests/test.sh

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ SCORE=0
 TOTAL=6
 WORKSPACE="${VERIFY_REPO:-/workspace}"
 
-WINDOW_OP_DIR="$WORKSPACE/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/windowing"
+WINDOW_OP_DIR="$WORKSPACE/flink-runtime/src/main/java/org/apache/flink/streaming/runtime/operators/windowing"
 WINDOW_TEST_DIR="$WORKSPACE/flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing"
 
 # Check 1: WindowOperator.processElement modified (late element handling)
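
The change above swaps one hard-coded module for another, so the verifier would break again if the mirror's layout moved. One possible hardening is to probe a short list of candidate modules and take the first that actually contains the target file. The sketch below illustrates that idea in Python rather than the verifier's shell. The helper name `find_window_op_dir` and the candidate ordering are assumptions, not part of this commit; the two module names and the package path are the ones in the diff.

```python
from pathlib import Path
from typing import Optional

# Package path shared by both layouts (taken from the diff above).
WINDOWING_PKG = Path(
    "src/main/java/org/apache/flink/streaming/runtime/operators/windowing"
)

# Candidate modules to probe; the ordering is an assumption.
CANDIDATE_MODULES = ("flink-runtime", "flink-streaming-java")


def find_window_op_dir(
    workspace: Path, marker: str = "WindowOperator.java"
) -> Optional[Path]:
    """Return the first candidate windowing directory that contains `marker`."""
    for module in CANDIDATE_MODULES:
        candidate = workspace / module / WINDOWING_PKG
        if (candidate / marker).is_file():
            return candidate
    return None
```

A `None` result lets the caller report a layout problem explicitly, which is easier to tell apart from a genuine 0/6 than a silent miss on a directory that does not exist.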

configs/smoke_scaffold_4.json

Lines changed: 109 additions & 0 deletions

@@ -0,0 +1,109 @@
+{
+  "metadata": {
+    "title": "Smoke test: 4 DOE rebalance scaffold tasks",
+    "description": "Targeted smoke for newly scaffolded tasks (3 ccb_feature + 1 ccb_fix). Run both baseline and sg_only modes.",
+    "generated_date": "2026-03-01",
+    "total_tasks": 4,
+    "note": "Use with validate_one_per_benchmark.sh --smoke-runtime (baseline) and --sg-only (MCP). Then run paired via run_selected_tasks.sh."
+  },
+  "methodology": {
+    "sdlc_suites": ["ccb_feature", "ccb_fix"]
+  },
+  "statistics": {
+    "total_tasks": 4,
+    "per_suite": {
+      "ccb_feature": 3,
+      "ccb_fix": 1
+    }
+  },
+  "tasks": [
+    {
+      "task_id": "postgres-copy-csv-header-feat-001",
+      "benchmark": "ccb_feature",
+      "sdlc_phase": "Implementation (feature)",
+      "language": "c",
+      "difficulty": "expert",
+      "category": "feature_implementation",
+      "repo": "postgres/postgres",
+      "mcp_benefit_score": 0.88,
+      "mcp_breakdown": {
+        "context_complexity": 0.95,
+        "cross_file_deps": 0.85,
+        "semantic_search_potential": 0.85,
+        "task_category_weight": 0.9
+      },
+      "selection_rationale": "DOE rebalance scaffold: C-language feature task for Neyman-optimal ccb_feature allocation (target 23)",
+      "task_dir": "ccb_feature/postgres-copy-csv-header-feat-001",
+      "context_length": 900000,
+      "context_length_source": "mcp_breakdown_proxy",
+      "files_count": 16,
+      "files_count_source": "mcp_breakdown_proxy"
+    },
+    {
+      "task_id": "servo-css-container-query-feat-001",
+      "benchmark": "ccb_feature",
+      "sdlc_phase": "Implementation (feature)",
+      "language": "rust",
+      "difficulty": "expert",
+      "category": "feature_implementation",
+      "repo": "servo/servo",
+      "mcp_benefit_score": 0.89,
+      "mcp_breakdown": {
+        "context_complexity": 0.95,
+        "cross_file_deps": 0.85,
+        "semantic_search_potential": 0.9,
+        "task_category_weight": 0.9
+      },
+      "selection_rationale": "DOE rebalance scaffold: Rust-language feature task for Neyman-optimal ccb_feature allocation (target 23)",
+      "task_dir": "ccb_feature/servo-css-container-query-feat-001",
+      "context_length": 950000,
+      "context_length_source": "mcp_breakdown_proxy",
+      "files_count": 16,
+      "files_count_source": "mcp_breakdown_proxy"
+    },
+    {
+      "task_id": "vscode-custom-fold-region-feat-001",
+      "benchmark": "ccb_feature",
+      "sdlc_phase": "Implementation (feature)",
+      "language": "typescript",
+      "difficulty": "hard",
+      "category": "feature_implementation",
+      "repo": "microsoft/vscode",
+      "mcp_benefit_score": 0.87,
+      "mcp_breakdown": {
+        "context_complexity": 0.9,
+        "cross_file_deps": 0.8,
+        "semantic_search_potential": 0.85,
+        "task_category_weight": 0.9
+      },
+      "selection_rationale": "DOE rebalance scaffold: TypeScript-language feature task for Neyman-optimal ccb_feature allocation (target 23)",
+      "task_dir": "ccb_feature/vscode-custom-fold-region-feat-001",
+      "context_length": 950000,
+      "context_length_source": "mcp_breakdown_proxy",
+      "files_count": 16,
+      "files_count_source": "mcp_breakdown_proxy"
+    },
+    {
+      "task_id": "flink-window-late-data-fix-001",
+      "benchmark": "ccb_fix",
+      "sdlc_phase": "Implementation (bug fix)",
+      "language": "java",
+      "difficulty": "hard",
+      "category": "bug_fix",
+      "repo": "apache/flink",
+      "mcp_benefit_score": 0.87,
+      "mcp_breakdown": {
+        "context_complexity": 0.9,
+        "cross_file_deps": 0.85,
+        "semantic_search_potential": 0.85,
+        "task_category_weight": 0.9
+      },
+      "selection_rationale": "DOE rebalance scaffold: Java-language fix task for Neyman-optimal ccb_fix allocation (target 26)",
+      "task_dir": "ccb_fix/flink-window-late-data-fix-001",
+      "context_length": 900000,
+      "context_length_source": "mcp_breakdown_proxy",
+      "files_count": 16,
+      "files_count_source": "mcp_breakdown_proxy"
+    }
+  ]
+}
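
The config states its task count in three places (`metadata.total_tasks`, `statistics.total_tasks`, and the `statistics.per_suite` tallies), all of which have to agree with the `tasks` array. A consistency check along the following lines could catch drift when tasks are added or removed. This is a minimal sketch, not something this commit adds; the function name `check_statistics` is hypothetical, while the field names are the ones in the file above.

```python
from collections import Counter
from typing import List


def check_statistics(config: dict) -> List[str]:
    """Return human-readable mismatches between declared counts and `tasks`."""
    problems = []
    tasks = config.get("tasks", [])
    actual_total = len(tasks)

    # Both metadata and statistics declare a total; compare each to the array.
    for section in ("metadata", "statistics"):
        declared = config.get(section, {}).get("total_tasks")
        if declared != actual_total:
            problems.append(
                f"{section}.total_tasks is {declared}, "
                f"but tasks has {actual_total} entries"
            )

    # Tally tasks per benchmark suite and compare with the declared tallies.
    counted = dict(Counter(task["benchmark"] for task in tasks))
    declared_suites = config.get("statistics", {}).get("per_suite", {})
    if counted != declared_suites:
        problems.append(
            f"statistics.per_suite is {declared_suites}, but tasks give {counted}"
        )
    return problems
```

Loading the file with `json.load` and passing the result in should return an empty list for the config as committed here, since it lists three `ccb_feature` tasks and one `ccb_fix` task against a declared total of 4.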
