Merge pull request #9 from AvdLee/learnings-daily-macos-cocoapods

AvdLee · web-flow · commit ec90d3df6bbe · 2026-03-25T11:32:55.000+01:00
Incorporate learnings from Daily macOS (ObjC + CocoaPods) test case
diff --git a/.github/scripts/sync-readme.js b/.github/scripts/sync-readme.js
@@ -15,43 +15,12 @@ const SKILL_DIRS = [
 const beginMarker = "<!-- BEGIN SKILL STRUCTURE -->";
 const endMarker = "<!-- END SKILL STRUCTURE -->";
 
-const describeReference = (fileName) => {
-  const descriptions = {
-    "benchmarking-workflow.md": "Benchmark contract, clean vs incremental rules, and artifact expectations",
-    "code-compilation-checks.md": "Swift compile hotspot checks and code-level heuristics",
-    "project-audit-checks.md": "Build setting, script phase, and dependency audit checklist",
-    "spm-analysis-checks.md": "Package graph, plugin overhead, and module variant review guide",
-    "orchestration-report-template.md": "Prioritization, approval, and verification report template",
-    "fix-patterns.md": "Concrete before/after patterns for each fix category",
-  };
-  return descriptions[fileName] || "Reference file";
-};
-
 const buildTree = () => {
-  const lines = [
-    "xcode-build-optimization-agent-skill/",
-    "  .claude-plugin/",
-    "    marketplace.json",
-    "    plugin.json",
-    "  references/",
-    "    benchmark-artifacts.md",
-    "    build-optimization-sources.md",
-    "    build-settings-best-practices.md",
-    "    recommendation-format.md",
-    "  schemas/",
-    "    build-benchmark.schema.json",
-    "  scripts/",
-    "    benchmark_builds.py",
-    "    diagnose_compilation.py",
-    "    generate_optimization_report.py",
-    "    render_recommendations.py",
-    "    summarize_build_timing.py",
-    "  skills/",
-  ];
+  const lines = ["skills/"];
 
   for (const skillDir of SKILL_DIRS) {
-    lines.push(`    ${skillDir}/`);
-    lines.push("      SKILL.md");
+    lines.push(`  ${skillDir}/`);
+    lines.push("    SKILL.md");
     const referencesDir = path.join(SKILLS_ROOT, skillDir, "references");
     if (!fs.existsSync(referencesDir)) {
       continue;
@@ -63,9 +32,9 @@ const buildTree = () => {
     if (references.length === 0) {
       continue;
     }
-    lines.push("      references/");
+    lines.push("    references/");
     for (const fileName of references) {
-      lines.push(`        ${fileName} - ${describeReference(fileName)}`);
+      lines.push(`      ${fileName}`);
     }
   }
 
diff --git a/OPTIMIZATION-CHECKS.md b/OPTIMIZATION-CHECKS.md
@@ -45,6 +45,14 @@ The `xcode-project-analyzer` audits project-level and target-level build setting
 - [Xcode Release Notes: Compilation Caching](https://developer.apple.com/documentation/xcode-release-notes/) (feature ID 149700201)
 - [Bitrise: Xcode Compilation Cache FAQ](https://docs.bitrise.io/en/bitrise-build-cache/build-cache-for-xcode/xcode-compilation-cache-faq.html)
 
+### CocoaPods Projects
+
+CocoaPods is in maintenance mode and no longer under active feature development. When a project uses CocoaPods, do not attempt CocoaPods-specific build optimizations (linkage mode changes, `COCOAPODS_PARALLEL_CODE_SIGN`, Podfile tweaks). These are unreliable and frequently regress build times.
+
+Recommend migrating to Swift Package Manager as the highest-impact long-term improvement. SPM advantages for build time: compilation caching works out of the box, better build parallelism from the dependency graph, no `pod install` xcconfig regeneration overhead, and native Xcode integration with full support for modern features like explicit modules.
+
+Focus analysis on first-party targets and build settings the project controls directly. Do not audit `Pods.xcodeproj` or the Podfile.
+
 ## Script Phase Analysis
 
 The `xcode-project-analyzer` inspects every Run Script phase in the project for missing metadata and unnecessary execution.
@@ -97,7 +105,7 @@ Even with no source edits, incremental builds incur fixed overhead. The agent me
 | `CopySwiftLibs` | Copies Swift standard libraries | Runs even when nothing changed |
 | `RegisterWithLaunchServices` | Registers the built app | Fast but always present |
 | `ProcessInfoPlistFile` | Re-processes Info.plist files | Scales with target count |
-| `ExtractAppIntentsMetadata` | Extracts App Intents metadata | Unnecessary overhead if the project does not use App Intents |
+| `ExtractAppIntentsMetadata` | Extracts App Intents metadata from all targets including third-party dependencies | Driven by Xcode, not by per-target project settings; unnecessary overhead if the project does not use App Intents but not cleanly suppressible from the repo (classify as `xcode-behavior`) |
 
 A zero-change build above 5 seconds on Apple Silicon typically indicates script phase overhead or excessive codesigning.
 
diff --git a/references/recommendation-format.md b/references/recommendation-format.md
@@ -8,13 +8,23 @@ Each recommendation should include:
 
 - `title`
 - `wait_time_impact` -- plain-language statement of expected wall-clock impact, e.g. "Expected to reduce your clean build by ~3s", "Reduces parallel compile work but unlikely to reduce build wait time", or "Impact on wait time is uncertain -- re-benchmark to confirm"
+- `actionability` -- classifies how fixable the issue is from the project (see values below)
 - `category`
 - `observed_evidence`
 - `estimated_impact`
 - `confidence`
 - `approval_required`
 - `benchmark_verification_status`
 
+### Actionability Values
+
+Every recommendation must include an `actionability` classification:
+
+- `repo-local` -- Fix lives entirely in project files, source code, or local configuration. The developer can apply it without side effects outside the repo.
+- `package-manager` -- Requires CocoaPods or SPM configuration changes that may have broad side effects (e.g., linkage mode, dependency restructuring). These should be benchmarked before and after.
+- `xcode-behavior` -- Observed cost is driven by Xcode internals and is not suppressible from the project. Report the finding for awareness but do not promise a fix.
+- `upstream` -- Requires changes in a third-party dependency or external tool. The developer cannot fix it locally.
+
 ## Suggested Optional Fields
 
 - `scope`
@@ -32,6 +42,7 @@ Each recommendation should include:
     {
       "title": "Guard a release-only symbol upload script",
       "wait_time_impact": "Expected to reduce your incremental build by approximately 6 seconds.",
+      "actionability": "repo-local",
       "category": "project",
       "observed_evidence": [
         "Incremental builds spend 6.3 seconds in a run script phase.",
@@ -54,11 +65,12 @@ When rendering for human review, preserve the same field order:
 
 1. title
 2. wait-time impact
-3. observed evidence
-4. estimated impact
-5. confidence
-6. approval required
-7. benchmark verification status
+3. actionability
+4. observed evidence
+5. estimated impact
+6. confidence
+7. approval required
+8. benchmark verification status
 
 That makes it easier for the developer to approve or reject specific items quickly.
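
Review note: a minimal sketch of how a consumer of the recommendation JSON could check the new `actionability` field against the four documented values. The helper name `validate_actionability` and the choice to return a list of error strings are assumptions for illustration; this PR does not add such a validator.

```python
from typing import Any, Dict, List

# The four values documented under "Actionability Values".
ACTIONABILITY_VALUES = {"repo-local", "package-manager", "xcode-behavior", "upstream"}


def validate_actionability(recommendation: Dict[str, Any]) -> List[str]:
    """Return a list of problems with the recommendation's actionability field.

    Hypothetical helper, not part of the repository's scripts.
    """
    errors: List[str] = []
    value = recommendation.get("actionability")
    if value is None:
        # The format doc says every recommendation must carry the field.
        errors.append("missing required field: actionability")
    elif value not in ACTIONABILITY_VALUES:
        errors.append(f"unknown actionability value: {value!r}")
    return errors
```

An empty list means the field is present and holds one of the documented values.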
 
diff --git a/scripts/generate_optimization_report.py b/scripts/generate_optimization_report.py
@@ -395,6 +395,7 @@ def _section_recommendations(recommendations: Optional[Dict[str, Any]]) -> str:
         lines.append(f"### {i}. {title}\n")
         for field, label in [
             ("wait_time_impact", "Wait-Time Impact"),
+            ("actionability", "Actionability"),
             ("category", "Category"),
             ("observed_evidence", "Evidence"),
             ("estimated_impact", "Impact"),
@@ -427,8 +428,10 @@ def _section_approval(recommendations: Optional[Dict[str, Any]]) -> str:
         wait_impact = item.get("wait_time_impact", "")
         impact = item.get("estimated_impact", "")
         risk = item.get("risk_level", "")
+        actionability = item.get("actionability", "")
         impact_str = wait_impact if wait_impact else impact
-        lines.append(f"- [ ] **{i}. {title}** -- Impact: {impact_str} | Risk: {risk}")
+        actionability_str = f" | Actionability: {actionability}" if actionability else ""
+        lines.append(f"- [ ] **{i}. {title}** -- Impact: {impact_str}{actionability_str} | Risk: {risk}")
     return "\n".join(lines)
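
Review note: the approval-line change above can be exercised in isolation with a small standalone function. This is a sketch that mirrors the hunk's logic; the function name `approval_line` and the way `title` is read from the item are assumptions, since the surrounding loop is not shown in the diff.

```python
from typing import Any, Dict


def approval_line(index: int, item: Dict[str, Any]) -> str:
    """Format one approval checklist entry, mirroring the hunk above.

    Hypothetical standalone version; the real code builds these lines
    inside _section_approval.
    """
    title = item.get("title", "")  # assumption: title lookup is not shown in the diff
    wait_impact = item.get("wait_time_impact", "")
    impact = item.get("estimated_impact", "")
    risk = item.get("risk_level", "")
    actionability = item.get("actionability", "")
    # Prefer the plain-language wait-time statement when present.
    impact_str = wait_impact if wait_impact else impact
    # Omit the segment entirely for older artifacts without the field.
    actionability_str = f" | Actionability: {actionability}" if actionability else ""
    return f"- [ ] **{index}. {title}** -- Impact: {impact_str}{actionability_str} | Risk: {risk}"
```

Because the segment is only emitted when the field is non-empty, recommendation files written before this change render exactly as they did previously.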
 
 
diff --git a/skills/xcode-build-fixer/SKILL.md b/skills/xcode-build-fixer/SKILL.md
@@ -100,6 +100,46 @@ Before applying version pin changes:
    ```
 6. Compare post-change medians to the baseline and report deltas.
 
+## Evaluating Regressions
+
+Not every slower number is a true regression. The fixer must evaluate the full picture before recommending a revert.
+
+### Compilation caching trade-off
+
+A change like `COMPILATION_CACHING = YES` may make a standard clean build slightly slower (cache population overhead) while making cached clean builds significantly faster. Since cached clean builds reflect the realistic developer workflow (branch switching, pulling changes, Clean Build Folder with a warm cache), a slower standard clean build paired with a faster cached clean build is a net improvement, not a regression. The same logic applies to any change where the first-time cost is higher but subsequent builds benefit.
+
+### Compare all build types
+
+Evaluate standard clean, cached clean (when available), and incremental builds together. A change is only a clear regression if it makes the metrics that matter to the developer's daily workflow worse. If only the cold-start clean build is slower but the developer rarely does truly cold builds, that may be acceptable.
+
+### Per-change vs. cumulative
+
+A single change that regresses on its own may still be worth keeping if it enables a larger cumulative improvement (e.g., enabling compilation caching makes one build type slower but unlocks cache benefits for everything else).
+
+### Best-practice settings are not revert candidates
+
+Some build settings are Apple's recommended modern defaults. These should be applied and kept regardless of whether the immediate benchmark shows a measurable improvement. They are not speculative optimizations -- they align the project with Apple's toolchain direction and will compound in value over future Xcode releases. The fixer must never recommend reverting these settings based on a single benchmark result.
+
+Best-practice settings that should always be kept once applied:
+
+- `COMPILATION_CACHING = YES` -- Apple is actively investing in this; the cache improves with each Xcode release and compounds across real workflows
+- `EAGER_LINKING = YES` (Debug) -- allows the linker to overlap with compilation
+- `SWIFT_USE_INTEGRATED_DRIVER = YES` -- eliminates inter-process scheduling overhead
+- `DEBUG_INFORMATION_FORMAT = dwarf` (Debug) -- avoids unnecessary dSYM generation
+- `SWIFT_COMPILATION_MODE = singlefile` (Debug) -- incremental recompilation
+- `ONLY_ACTIVE_ARCH = YES` (Debug) -- no reason to build all architectures locally
+
+When reporting on these settings, use language like: "Applied recommended build setting. No immediate benchmark improvement measured, but this aligns with Apple's recommended configuration and positions the project for future Xcode improvements."
+
+### When to recommend revert (speculative changes only)
+
+For changes that are not best-practice settings (e.g., source refactors, linkage experiments, script phase modifications, dependency restructuring):
+
+- If the cumulative pass shows wall-clock regression across all measured build types (standard clean, cached clean, and incremental are all slower), recommend reverting all speculative changes unless the developer explicitly asks to keep specific items for non-performance reasons.
+- For each individual speculative change: if it shows no median improvement and no cached/incremental benefit either, flag it with `Recommend revert` and the measured delta.
+- Distinguish between "outlier reduction only" (improved worst-case but not median) and "median improvement" (improved typical developer wait).
+- When a change trades off one build type for another (e.g., slower standard clean but faster cached clean), present both numbers clearly and let the developer decide. Frame it as: "Standard clean builds are X.Xs slower, but cached clean builds (the realistic daily workflow) are Y.Ys faster."
+
 ## Reporting
 
 Lead with the wall-clock result in plain language:
@@ -124,6 +164,45 @@ For changes valuable for non-benchmark reasons (deterministic package resolution
 
 Note: `COMPILATION_CACHING` has been measured at 5-14% faster clean builds across tested projects (87 to 1,991 Swift files). The benefit compounds in real developer workflows where the cache persists between builds -- branch switching, pulling changes, and CI with persistent DerivedData. The benchmark script auto-detects this setting and runs a cached clean phase for validation.
 
+## Execution Report
+
+After the optimization pass is complete, produce a structured execution report. This gives the developer a clear summary of what was attempted, what worked, and what the final state is.
+
+Structure:
+
+```markdown
+## Execution Report
+
+### Baseline
+- Clean build median: X.Xs
+- Cached clean build median: X.Xs (if applicable)
+- Incremental build median: X.Xs
+
+### Changes Applied
+
+| # | Change | Actionability | Measured Result | Status |
+|---|--------|---------------|-----------------|--------|
+| 1 | Description | repo-local | Clean: X.Xs→Y.Ys, Incr: X.Xs→Y.Ys | Kept / Reverted / Blocked |
+| 2 | ... | ... | ... | ... |
+
+### Final Cumulative Result
+- Clean build median: X.Xs (was Y.Ys) -- Z.Zs faster/slower
+- Cached clean build median: X.Xs (was Y.Ys) -- Z.Zs faster/slower
+- Incremental build median: X.Xs (was Y.Ys) -- Z.Zs faster/slower
+- **Net result:** Faster / Slower / Unchanged
+
+### Blocked or Non-Actionable Findings
+- Finding: reason it could not be addressed from the repo
+```
+
+Status values:
+
+- `Kept` -- Change improved or maintained build times and was kept.
+- `Kept (best practice)` -- Change is a recommended build setting; kept regardless of immediate benchmark result.
+- `Reverted` -- Change regressed build times and was reverted.
+- `Blocked` -- Change could not be applied due to project structure, Xcode behavior, or external constraints.
+- `No improvement` -- Change compiled but showed no measurable wall-time benefit. Include whether it was kept (for non-performance reasons) or reverted.
+
 ## Escalation
 
 If during implementation you discover issues outside this skill's scope:
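
Review note: the "Evaluating Regressions" rules added to this skill can be read as a small decision procedure. The sketch below is one possible reading, not code from the repository. The function name, the `deltas` shape (post-change median minus baseline median, in seconds), and the returned labels other than `Kept (best practice)` and `Recommend revert` are assumptions. It also ignores benchmark noise, which a real check would need to account for.

```python
from typing import Dict, Optional

# Setting names taken from the best-practice list in the skill text.
BEST_PRACTICE_SETTINGS = {
    "COMPILATION_CACHING",
    "EAGER_LINKING",
    "SWIFT_USE_INTEGRATED_DRIVER",
    "DEBUG_INFORMATION_FORMAT",
    "SWIFT_COMPILATION_MODE",
    "ONLY_ACTIVE_ARCH",
}


def evaluate_change(setting: Optional[str], deltas: Dict[str, float]) -> str:
    """Classify one change from its per-build-type median deltas.

    deltas maps a build type ("clean", "cached_clean", "incremental") to
    post-change median minus baseline median; negative means faster.
    Hypothetical helper for illustration only.
    """
    if setting in BEST_PRACTICE_SETTINGS:
        # Best-practice settings are never revert candidates.
        return "Kept (best practice)"
    if not deltas:
        raise ValueError("at least one measured build type is required")
    faster = any(d < 0 for d in deltas.values())
    slower = any(d > 0 for d in deltas.values())
    if not faster:
        # No median improvement in any build type.
        return "Recommend revert"
    if slower:
        # One build type improved at the expense of another.
        return "Trade-off: developer decides"
    return "Kept"
```

Passing `setting=None` marks a speculative change such as a source refactor or script phase modification.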
diff --git a/skills/xcode-build-orchestrator/SKILL.md b/skills/xcode-build-orchestrator/SKILL.md
@@ -27,6 +27,7 @@ Run this phase in agent mode because the agent needs to execute builds, run benc
 1. Collect the build target context: workspace or project, scheme, configuration, destination, and current pain point. When both `.xcworkspace` and `.xcodeproj` exist, prefer `.xcodeproj` unless the workspace contains sub-projects required for the build. Workspaces that reference external projects may fail if those projects are not checked out.
 2. Run `xcode-build-benchmark` to establish a baseline if no fresh benchmark exists. The benchmark script auto-detects `COMPILATION_CACHING = YES` and includes cached clean builds that measure the realistic developer experience (warm cache). If the build fails to compile, check `git log` for a recent buildable commit. When working in a worktree, cherry-picking a targeted build fix from a feature branch is acceptable to reach a buildable state. If SPM packages reference gitignored directories in their `exclude:` paths (e.g., `__Snapshots__`), create those directories before building -- worktrees do not contain gitignored content and `xcodebuild -resolvePackageDependencies` will crash otherwise.
 3. Verify the benchmark artifact has non-empty `timing_summary_categories`. If empty, the timing summary parser may have failed -- re-parse the raw logs or inspect them manually. If `COMPILATION_CACHING` is enabled, also verify the artifact includes `cached_clean` runs.
+   - **Benchmark confidence check**: For each build type (clean, cached clean, incremental), compare the min and max values. If the spread (max - min) exceeds 20% of the median, flag the benchmark as having high variance and recommend running additional repetitions (5+ runs) before drawing conclusions. High variance makes it difficult to distinguish real improvements from noise. After applying changes, only claim an improvement if the post-change median falls below the baseline's minimum.
 4. If incremental builds are the primary pain point and Xcode 16.4+ is available, recommend the developer enable **Task Backtraces** (Scheme Editor > Build tab > Build Debugging > "Task Backtraces"). This reveals why each task re-ran, which is critical for diagnosing unexpected replanning or input invalidation. Include any Task Backtrace evidence in the analysis.
 5. Determine whether compile tasks are likely blocking wall-clock progress or just consuming parallel CPU time. Compare the sum of all timing-summary category seconds against the wall-clock median: if the sum is 2x+ the median, most work is parallelized and compile hotspot fixes are unlikely to reduce wait time. If `SwiftCompile`, `CompileC`, `SwiftEmitModule`, or `Planning Swift module` dominate the timing summary **and** appear likely to be on the critical path, run `diagnose_compilation.py` to capture type-checking hotspots. If they are parallelized, still run diagnostics but label findings as "parallel efficiency improvements" rather than "build time improvements."
 6. Run the specialist analyses that fit the evidence by reading each skill's SKILL.md and applying its workflow:
@@ -104,7 +105,7 @@ Lead with the wall-clock result in plain language, e.g.: "Your clean build now t
 - absolute and percentage wall-clock deltas
 - what changed
 - what was intentionally left unchanged
-- confidence notes if noise prevents a strong conclusion
+- confidence notes if noise prevents a strong conclusion -- if benchmark variance is high (min-to-max spread exceeds 20% of median), say so explicitly rather than presenting noisy numbers as definitive improvements or regressions
 - if cumulative task metrics improved but wall-clock did not, say plainly: "Compiler workload decreased but build wait time did not improve. This is expected when Xcode runs these tasks in parallel with other equally long work."
 - a ready-to-paste community results row and a link to open a PR (see the report template)
 
diff --git a/skills/xcode-build-orchestrator/references/orchestration-report-template.md b/skills/xcode-build-orchestrator/references/orchestration-report-template.md
@@ -90,14 +90,33 @@ After implementing approved changes, re-benchmark with the same inputs:
 Compare the new wall-clock medians against the baseline. Report results as:
 "Your [clean/incremental] build now takes X.Xs (was Y.Ys) -- Z.Zs faster/slower."
 
-## Verification (post-approval)
+## Execution Report (post-approval)
 
+### Baseline
+- Clean build median: X.Xs
+- Cached clean build median: X.Xs (if applicable)
+- Incremental build median: X.Xs
+
+### Changes Applied
+
+| # | Change | Actionability | Measured Result | Status |
+|---|--------|---------------|-----------------|--------|
+| 1 | Description of change | repo-local | Clean: X.Xs→Y.Ys, Incr: X.Xs→Y.Ys | Kept / Reverted / Blocked |
+| 2 | ... | ... | ... | ... |
+
+Status values: `Kept`, `Kept (best practice)`, `Reverted`, `Blocked`, `No improvement`
+
+### Final Cumulative Result
 - Post-change clean build: X.Xs (was Y.Ys) -- Z.Zs faster/slower
 - Post-change cached clean build: X.Xs (was Y.Ys) -- Z.Zs faster/slower (when COMPILATION_CACHING enabled)
 - Post-change incremental build: X.Xs (was Y.Ys) -- Z.Zs faster/slower
+- **Net result:** Faster / Slower / Unchanged
 - If cumulative task metrics improved but wall-clock did not: "Compiler workload decreased but build wait time did not improve. This is expected when Xcode runs these tasks in parallel with other equally long work."
 - If standard clean builds are slower but cached clean builds are faster: "Standard clean builds show overhead from compilation cache population. Cached clean builds (the realistic developer workflow) are faster, confirming the net benefit."
 
+### Blocked or Non-Actionable Findings
+- Finding: reason it could not be addressed from the repo
+
 ## Remaining follow-up ideas
 - Item:
 - Why it was deferred:
diff --git a/skills/xcode-project-analyzer/SKILL.md b/skills/xcode-project-analyzer/SKILL.md
diff --git a/skills/xcode-project-analyzer/references/project-audit-checks.md b/skills/xcode-project-analyzer/references/project-audit-checks.md