Quality regression: generated visualizations are subpar and broken #58

@jerelvelarde

Problem

The quality of generated visualizations from the deployed agent has degraded. The same prompt ("sphere icosahedron morph") produces broken or low-quality outputs across multiple attempts.

Evidence

Two downloaded outputs from the same prompt are attached as .html files in .chalk/:

Attempt 1 — CSS 3D divs (sphere-icosahedron-morph (1).html)

  • Uses 20 <div> elements with CSS transform-style: preserve-3d and clip-path transitions
  • No WebGL / Three.js despite it being available via the import map
  • Hover interaction conflicts with the CSS spin animation (style.animation = 'none' vs CSS animation: spin 10s)
  • The morph is cosmetic: it just toggles between border-radius: 50% and clip-path: polygon() on flat divs
  • Faces positioned via manual JS math that approximates 3D but doesn't actually render proper geometry

Attempt 2 — Canvas 2D fake 3D (sphere-icosahedron-morph.html)

  • Uses canvas.getContext('2d') instead of WebGL
  • Sphere is just a radial gradient circle, not actual geometry
  • Icosahedron faces are projected triangles drawn with ctx.moveTo/lineTo — no real lighting
  • Uses the CSS color-mix() function, which has mixed browser support
  • The "morph" is interpolating between a gradient blob and wireframe triangles — visually unconvincing

What a good output would look like

  • Use Three.js (available in the import map at https://esm.sh/three)
  • Proper IcosahedronGeometry with MeshStandardMaterial or custom shaders
  • Real WebGL lighting and smooth vertex-level morphing between sphere and icosahedron
  • Orbit controls or smooth auto-rotation
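
To make "smooth vertex-level morphing" concrete, here is a minimal sketch of the blend math only, written without Three.js so it can run anywhere. The function name `morphPositions` and its parameters are hypothetical. It assumes `flat` holds xyz triples for points lying on the flat faces of a subdivided icosahedron centred at the origin, and it blends each point toward its projection on a sphere of the given radius. In a real widget the result would presumably be copied into the mesh's position attribute each frame, with lighting left to the renderer.

```javascript
// Hypothetical helper: blend flat icosahedron-face points toward a sphere.
// flat:   Float32Array of xyz triples on the flat faces (assumed input)
// radius: radius of the target sphere
// t:      morph amount, 0 = icosahedron, 1 = sphere (clamped to [0, 1])
function morphPositions(flat, radius, t) {
  const k = Math.min(1, Math.max(0, t));
  const out = new Float32Array(flat.length);
  for (let i = 0; i < flat.length; i += 3) {
    const x = flat[i];
    const y = flat[i + 1];
    const z = flat[i + 2];
    // Distance from the origin; fall back to 1 to avoid dividing by zero.
    const len = Math.hypot(x, y, z) || 1;
    const s = radius / len; // scale that pushes the point onto the sphere
    // Linear interpolation between the flat point and its sphere projection.
    out[i] = x + (x * s - x) * k;
    out[i + 1] = y + (y * s - y) * k;
    out[i + 2] = z + (z * s - z) * k;
  }
  return out;
}

// Usage: a point inside the unit sphere moves outward as t goes 0 -> 1.
const sample = new Float32Array([0.5, 0.5, 0]);
const halfway = morphPositions(sample, 1, 0.5);
console.log(Array.from(halfway));
```

Linear interpolation is the simplest choice here; an eased `t` or a shader-based blend may well look better, but either way the morph acts on real geometry rather than on CSS shapes or a 2D gradient.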

Possible causes

  1. Model regression — GPT-5.4 (gpt-5.4-2026-03-05) may be producing lower-quality code for complex 3D visualizations compared to earlier versions
  2. System prompt lacks quality guidance — The current prompt mentions widgetRenderer capabilities but doesn't guide the model toward using Three.js/WebGL for 3D content, or set quality expectations for interactive visualizations
  3. No few-shot examples — The agent has no reference for what "good" output looks like, so it falls back to simpler CSS/Canvas approaches

Suggested investigation

  • Compare output quality between GPT-5.4 and other models (e.g. Claude) for the same prompts
  • Add quality guidance to the system prompt (e.g. "For 3D visualizations, use Three.js from the import map")
  • Consider adding few-shot examples of high-quality widget HTML in the agent skills
  • Test a broader set of prompts to determine if regression is model-wide or specific to 3D content
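
For the comparison across models and prompts, a rough heuristic could label each generated .html file by the rendering approach it uses, so the results can be tallied rather than eyeballed. This is only a sketch: `classifyRenderer` and `tallyRenderers` are hypothetical names, the regexes are assumptions based on the two failure modes described above, and a string match says nothing about whether the output actually looks good.

```javascript
// Hypothetical heuristic: guess which rendering approach a generated widget uses.
function classifyRenderer(html) {
  // Three.js import (bare specifier or esm.sh URL) or use of a THREE namespace.
  const usesThree =
    /from\s+['"](?:three|https:\/\/esm\.sh\/three)[^'"]*['"]/.test(html) ||
    /\bTHREE\./.test(html);
  // Raw WebGL context without a library.
  const usesWebGL = /getContext\(\s*['"]webgl2?['"]/.test(html);
  // Canvas 2D "fake 3D", as in Attempt 2.
  const usesCanvas2d = /getContext\(\s*['"]2d['"]/.test(html);
  // CSS 3D divs, as in Attempt 1.
  const usesCss3d = /transform-style\s*:\s*preserve-3d/.test(html);

  if (usesThree || usesWebGL) return "webgl";
  if (usesCanvas2d) return "canvas2d";
  if (usesCss3d) return "css3d";
  return "unknown";
}

// Count labels over a batch of outputs (an array of HTML strings).
function tallyRenderers(outputs) {
  const counts = { webgl: 0, canvas2d: 0, css3d: 0, unknown: 0 };
  for (const html of outputs) {
    counts[classifyRenderer(html)] += 1;
  }
  return counts;
}

// Usage with two tiny stand-in snippets (not the real attachments).
console.log(
  tallyRenderers([
    "<style>.face { transform-style: preserve-3d; }</style>",
    "<script>const ctx = canvas.getContext('2d');</script>",
  ])
);
```

Running this over outputs from each model and prompt set would give a crude signal of whether the fallback to CSS/Canvas is specific to 3D prompts or more widespread.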

Environment

  • Model: gpt-5.4-2026-03-05 via langchain_openai.ChatOpenAI
  • Agent: LangGraph with CopilotKit middleware
  • Widget renderer: sandboxed iframe with import map (Three.js, GSAP, D3, Chart.js available)
