
feat(slides,drive): add createFromJson, drive primitives, and theme system #348

Open
n0012 wants to merge 3 commits into gemini-cli-extensions:main from n0012:feat/slides-create-from-json

@n0012 n0012 commented Apr 25, 2026

What this adds

Five new tools for building Google Slides presentations programmatically, plus three new Drive primitives for safe-by-default image staging.


slides.createFromJson — blueprint-to-slides in one call

Callers describe a deck as a JSON blueprint; the server translates it into a Slides API batchUpdate. No knowledge of raw API shape required.

Color aliases — named colors, never RGB:

| Alias | Value | Use |
| --- | --- | --- |
| text | #202124 near-black | body text |
| primary | #101828 dark | dark backgrounds |
| primary_text | #FFFFFF white | text on dark |
| blue | #1A73E8 Google Blue | accents, labels |
| red/yellow/green | Google brand colors | brand bar |
| surface | #F1F3F4 light gray | card backgrounds |
| text_muted | #757575 gray | secondary text |

Theme system — 12 named themes (google, exec, pitch, technical, workshop, dark, demo, hcls, customer, simple, google-dark, google-minimal) drive font family, accent color, and layout guidance in the tool description.

Speaker notes — include "speaker_notes" in each slide object and they're written automatically. Tool warns when notes are missing and requests a second pass.

Layer ordering — elements render shapes → images → text, then by layer value. Background shapes reliably appear behind text without manual sequencing.

Blueprint format:

{ "slides": [
    { "speaker_notes": "...", "elements": [...] },
    { "speaker_notes": "...", "elements": [...] }
  ]
}

Element schema: type (text | shape | image), position ({x,y,w,h} in points on 720×405 canvas), layer (z-index), content, url, and a style object with: size, bold, color, bg_color, no_border, align, vertical_align.
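A caller-side sketch of the blueprint shape described above. The interface names (Blueprint, SlideDef, SlideElement) and the fitsCanvas helper are hypothetical and not taken from the PR; field names follow the element schema as listed. The allowed values for align and vertical_align are not stated in the description, so they are typed as plain strings here.

```typescript
// Hypothetical caller-side types for the blueprint; names are illustrative.
type ElementType = "text" | "shape" | "image";

// Position in points on the 720x405 canvas.
interface Position { x: number; y: number; w: number; h: number }

interface ElementStyle {
  size?: number;
  bold?: boolean;
  color?: string;          // color alias such as "text" or "blue"
  bg_color?: string;       // color alias
  no_border?: boolean;
  align?: string;          // allowed values not specified in the description
  vertical_align?: string; // allowed values not specified in the description
}

interface SlideElement {
  type: ElementType;
  position: Position;
  layer?: number;
  content?: string;
  url?: string;
  style?: ElementStyle;
}

interface SlideDef { speaker_notes?: string; elements: SlideElement[] }
interface Blueprint { slides: SlideDef[] }

const CANVAS = { w: 720, h: 405 };

// Returns true when an element's box lies fully on the canvas.
function fitsCanvas(p: Position): boolean {
  return p.x >= 0 && p.y >= 0 && p.x + p.w <= CANVAS.w && p.y + p.h <= CANVAS.h;
}

const blueprint: Blueprint = {
  slides: [
    {
      speaker_notes: "Open with the headline number.",
      elements: [
        {
          type: "shape",
          position: { x: 0, y: 0, w: 720, h: 405 },
          layer: 0,
          style: { bg_color: "primary", no_border: true },
        },
        {
          type: "text",
          position: { x: 48, y: 150, w: 624, h: 80 },
          layer: 1,
          content: "Quarterly review",
          style: { size: 36, bold: true, color: "primary_text" },
        },
      ],
    },
  ],
};

// Serialized form, for a tool parameter that expects a JSON string.
const slideJson = JSON.stringify(blueprint);
```

Checking positions before sending is optional; it only catches elements that would fall off the canvas.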


Drive primitives — split for safe-by-default uploads

The Slides API's createImage endpoint requires a publicly accessible URL (per Google's docs) — OAuth tokens in URLs are not honored. To support image-heavy workflows without making the upload tool itself dangerous, the share lifecycle is split across three tools:

  • drive.uploadFile — uploads a local file to Drive. File is PRIVATE by default (no share granted). Returns id, name, webViewLink.
  • drive.addPublicAccess — explicit opt-in: grants anyone:reader on an existing file, returns the public imageUrl and the permission ID. Surfaces Workspace publishOutNotPermitted clearly so callers can fall back to another hosting path (GCS signed URLs, etc.).
  • drive.removePublicAccess — revokes every anyone:* permission on a file. Idempotent. File stays in Drive — only the public link is closed.

Typical use for embedding a local image in a slide:

1. id       = drive.uploadFile(localPath)            # private
2. imageUrl = drive.addPublicAccess(id)              # explicit opt-in
3. slides.createFromJson(..., url: imageUrl, ...)
4. drive.removePublicAccess(id)                       # close the window
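The four steps above could be wrapped so the public link is closed even when slide creation fails. This is a sketch under assumptions: the ToolClient interface and the argument names (localPath, fileId, presentationId) are hypothetical stand-ins for however a caller invokes the tools, and only the tool names and the id / imageUrl response fields come from the description.

```typescript
// Hypothetical minimal client: `call` stands in for however tools are invoked.
interface ToolClient {
  call(tool: string, args: Record<string, unknown>): Promise<Record<string, unknown>>;
}

async function embedLocalImage(
  client: ToolClient,
  presentationId: string,
  localPath: string,
): Promise<void> {
  // Step 1: private upload, no share granted.
  const uploaded = await client.call("drive.uploadFile", { localPath });
  const fileId = String(uploaded["id"]);
  try {
    // Step 2: explicit opt-in to a public link.
    const shared = await client.call("drive.addPublicAccess", { fileId });
    // Step 3: reference the public URL from an image element.
    const slideJson = JSON.stringify({
      elements: [
        { type: "image", url: String(shared["imageUrl"]), position: { x: 0, y: 0, w: 720, h: 405 } },
      ],
    });
    await client.call("slides.createFromJson", { presentationId, slideJson });
  } finally {
    // Step 4: runs even if sharing or slide creation throws.
    await client.call("drive.removePublicAccess", { fileId });
  }
}
```

Placing the revoke in finally relies on removePublicAccess being idempotent, as the description states, so calling it after a failed share is harmless.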

Supporting tools

  • slides.create — create a blank presentation, returns {presentationId, url}
  • slides.batchUpdate — raw Slides API request array passthrough
  • slides.getText / getMetadata / getImages / getSlideThumbnail — read tools
  • slides.getSpeakerNotes / updateSpeakerNotes — read and write speaker notes per slide

Design notes

  • All new tools registered via server.registerTool and gated through feature-config.ts. slides.write and drive.write groups carry the new tools with correct scope requirements.
  • Color alias system replaces raw RGB so output stays consistent with the active theme.
  • Speaker notes written inline from blueprint (no second pass required unless omitted).
  • Drive upload split into three primitives so the safe default is private; sharing is always explicit and reversible.

Validation

  • TypeScript build passes (npm run build)
  • Full round-trip exercised end-to-end on both personal and corporate Google accounts: presentation creation, multi-slide blueprints with shapes/text/images, color aliases, speaker notes, image upload + share + revoke.
  • On corporate Workspace domains where publishOutNotPermitted blocks addPublicAccess, the error is clearly surfaced so callers can route images through an alternative public host.

Contributor

gemini-code-assist Bot left a comment

Code Review

This pull request introduces new Google Slides tools for creating presentations, performing batch updates, and generating slides from JSON blueprints. The review feedback highlights the need for consistency by using the registerTool wrapper to respect feature flags and suggests enhancing input schemas to support structured objects. Additionally, the feedback addresses a potential crash in the createFromJson service method and recommends adjusting the slide insertion logic to append new slides to the end of a presentation.

slidesService.getSlideThumbnail,
);

server.registerTool(
Contributor

high

The slides.create tool is being registered using server.registerTool directly, which bypasses the registerTool wrapper defined on line 154. This wrapper is responsible for checking if the tool is enabled via feature flags (WORKSPACE_FEATURE_OVERRIDES). Using the wrapper ensures consistency and allows users to disable these tools if needed.

Suggested change
- server.registerTool(
+ registerTool(

);

server.registerTool(
'slides.batchUpdate',
Contributor

high

Similar to slides.create, this tool should be registered using the registerTool wrapper to respect feature flags.

  registerTool(

});

server.registerTool(
'slides.createFromJson',
Contributor

high

This tool should also use the registerTool wrapper to ensure it can be managed via feature flags.

  registerTool(

Comment on lines +482 to +486
requests: z
.string()
.describe(
'JSON string of an array of Slides API request objects (e.g., [{"createSlide":{}}, {"createShape":{...}}]). Will be parsed server-side.',
),
Contributor

medium

The requests field is restricted to a string, but the SlidesService.batchUpdate implementation (line 329) and most MCP clients support passing structured arrays directly. Allowing both a JSON string and an array of objects provides a better experience for AI agents.

Suggested change
- requests: z
-   .string()
-   .describe(
-     'JSON string of an array of Slides API request objects (e.g., [{"createSlide":{}}, {"createShape":{...}}]). Will be parsed server-side.',
-   ),
+ requests: z
+   .union([z.string(), z.array(z.any())])
+   .describe(
+     'An array of Slides API request objects or a JSON string of that array (e.g., [{"createSlide":{}}, {"createShape":{...}}]).',
+   ),
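If the schema accepts both forms, the handler needs one normalization step before calling the API. A minimal sketch, assuming a hypothetical helper named normalizeRequests (not part of the PR) and using no schema library:

```typescript
// Accepts either a JSON string or an already-parsed array of request objects.
function normalizeRequests(input: string | unknown[]): unknown[] {
  // JSON.parse throws a SyntaxError on malformed input, which the caller can surface.
  const parsed: unknown = typeof input === "string" ? JSON.parse(input) : input;
  if (!Array.isArray(parsed)) {
    throw new Error("requests must be an array of Slides API request objects");
  }
  return parsed;
}
```

Rejecting non-array JSON here keeps a stray object or string from reaching batchUpdate as a malformed request list.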

Comment on lines +602 to +606
slideJson: z
.string()
.describe(
'JSON string of the slide blueprint. Use {"slides":[{"elements":[...]},...]} for multiple slides or {"elements":[...]} for one slide. Will be parsed server-side.',
),
Contributor

medium

The slideElementSchema defined on lines 493-591 is currently unused. It should be applied to the slideJson input schema to provide the AI agent with structured validation and clear documentation of the expected blueprint format. Additionally, allowing both objects and strings makes the tool more robust.

        slideJson: z
          .union([
            z.object({
              slides: z.array(z.object({ elements: z.array(slideElementSchema) })),
            }),
            z.object({
              elements: z.array(slideElementSchema),
            }),
            z.string(),
          ])
          .describe(
            'The slide blueprint. Use {"slides":[{"elements":[...]}]} for multiple slides or {"elements":[...]} for one slide. Can be a JSON string or object.',
          ),

Comment on lines +679 to +681
const slideDefs = (slideJson as any).slides
? (slideJson as any).slides
: [{ elements: (slideJson as any).elements || [] }];
Contributor

medium

If the slides format is used but an individual slide object is missing the elements property (e.g., { "slides": [{}] }), slideDefs[i].elements will be undefined. This will cause a crash in buildSlideRequests when it attempts to spread or iterate over elements (line 421).

Suggested change
- const slideDefs = (slideJson as any).slides
-   ? (slideJson as any).slides
-   : [{ elements: (slideJson as any).elements || [] }];
+ const slideDefs = (slideJson as any).slides
+   ? (slideJson as any).slides.map((s: any) => ({ ...s, elements: s.elements || [] }))
+   : [{ elements: (slideJson as any).elements || [] }];
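The same defaulting can be written without the any casts. This is a sketch only: the RawSlide and RawBlueprint types and the toSlideDefs name are hypothetical, introduced for illustration.

```typescript
// Hypothetical loose input types: every field may be missing in LLM output.
interface RawSlide { elements?: unknown[]; speaker_notes?: string }
interface RawBlueprint { slides?: RawSlide[]; elements?: unknown[] }

// Normalizes both blueprint shapes so every slide has an `elements` array.
function toSlideDefs(bp: RawBlueprint): Array<RawSlide & { elements: unknown[] }> {
  const slides: RawSlide[] = bp.slides ?? [{ elements: bp.elements ?? [] }];
  return slides.map((s) => ({ ...s, elements: s.elements ?? [] }));
}
```

With this in place, a blueprint such as { "slides": [{}] } yields one slide with an empty element list instead of an undefined value further down.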

requests.push({
createSlide: {
objectId: slideId,
insertionIndex: i + 1,
Contributor

medium

Hardcoding insertionIndex: i + 1 causes new slides to always be inserted at the beginning of the presentation (after the first slide). For an 'add slides' tool, the expected behavior is usually to append slides to the end. Omitting insertionIndex entirely will cause the Slides API to append the new slides to the end of the presentation.

Suggested change
insertionIndex: i + 1,
slideLayoutReference: { predefinedLayout: 'BLANK' },

Author

n0012 commented Apr 25, 2026

To test, install with: gemini extensions install https://github.com/n0012/workspace

The new tool for batch-creating slides is slides.createFromJson.

Author

n0012 commented Apr 29, 2026

Bug fix bundled in this PR: docs.getText field mask error

While testing this PR I hit a systematic breakage in docs_getText (and writeText/replaceText) affecting all Google Docs — including single-tab docs with no suggestions. Root cause and fix:

Root cause: docs.documents.get with includeTabsContent: true and a wildcard field mask (tabs, tabs.documentTab.body, etc.) causes the API to validate the mask as including suggestion/comment sub-fields (suggestedInsertionIds, suggestedParagraphStyleChanges, etc.). The API then rejects with:

Field mask cannot retrieve comment-specific fields when include_comments is false.

A previous attempt added suggestionsViewMode: 'PREVIEW_WITHOUT_SUGGESTIONS' to suppress suggestion data, but this doesn't help — the field mask is validated before the view-mode filter is applied.

Fix (commit 02b3502): Replaced all three documents.get calls in DocsService with explicit field masks (DOCS_READ_FIELDS, DOCS_END_INDEX_FIELDS) that enumerate only the fields _readStructuralElement actually reads — no suggestion or comment sub-fields anywhere in the tree. Handles up to 3 levels of tab nesting (Google Docs maximum) and one level of table nesting.
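To illustrate the idea of an explicit mask, the sketch below builds a nested fields string for three levels of tabs with no wildcard subtrees. The field list is an assumption chosen for illustration and is not the PR's DOCS_READ_FIELDS; BODY_FIELDS and buildTabMask are hypothetical names, and table handling is omitted.

```typescript
// Illustrative only: this field list is an assumption, not the PR's DOCS_READ_FIELDS.
const BODY_FIELDS =
  "body(content(startIndex,endIndex,paragraph(elements(textRun(content)))))";

// Builds an explicit mask for tabs nested up to `depth` levels, with no wildcards.
function buildTabMask(depth: number): string {
  const own = `documentTab(${BODY_FIELDS})`;
  return depth <= 1 ? own : `${own},childTabs(${buildTabMask(depth - 1)})`;
}

// Three levels of tab nesting, matching the limit mentioned above.
const fields = `tabs(${buildTabMask(3)})`;
```

Because every leaf is named, nothing under the suggestion or comment sub-fields is ever requested.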

n0012 force-pushed the feat/slides-create-from-json branch from 02b3502 to abf1ecd May 9, 2026 19:10
n0012 changed the title from "feat(slides): add slides.createFromJson — agent-friendly blueprint-to-slides tool" to "feat(slides,drive): add createFromJson, insertImageSlide, uploadFile, and theme system" May 9, 2026
n0012 force-pushed the feat/slides-create-from-json branch from abf1ecd to 7379d66 May 9, 2026 19:17
Author

n0012 commented May 9, 2026

@allenhutchison — would appreciate a review when you get a chance! This adds slides.createFromJson, slides.insertImageSlide, and drive.uploadFile — tools we've been using in production with Claude Code + Gemini CLI for building AI-generated slide decks. CLA is green. Happy to address any feedback.

… and theme system

## slides.createFromJson
Agent-friendly blueprint-to-slides tool. Agents describe slides as JSON;
the server translates to Slides API batchUpdate in one round trip.

- Color alias system: named colors (blue, red, green, yellow, text, text_muted,
  primary, primary_text, background, surface, secondary) → Google brand RGB values.
  Agents never need to specify RGB directly.
- Theme system: 12 named themes (google, exec, pitch, technical, workshop, dark,
  demo, hcls, customer, simple, google-dark, google-minimal) drive font, accent
  color, and footer guidance.
- Speaker notes: include "speaker_notes" in each slide object → written automatically.
  Tool description warns when notes are missing and prompts a second pass.
- Layer ordering: shapes render before images before text, then by layer value.
  Background shapes reliably appear behind text without manual sequencing.
- Auto-deletes default blank slide "p" created by Google on new presentations.
- Sanitizes template placeholder URLs from LLM output (replaces with info icon).
- Addresses review feedback: uses server.registerTool, registered in feature-config,
  slide insertion appends to end by default.

## slides.insertImageSlide
Inserts a local image as a full-bleed slide. Handles the full lifecycle:
upload to Drive → OAuth-embedded URL (file stays private) → createImage via
batchUpdate → delete Drive file. No manual Drive sharing required.
Optional label chip rendered in top-right corner.

## drive.uploadFile
Uploads a local file to Drive. Returns fileId and an OAuth-embedded imageUrl
suitable for use in slides.createFromJson image elements. File stays private —
access token embedded in URL so Slides API can fetch without public sharing.

## slides.create / slides.batchUpdate / slides.get* / slides.updateSpeakerNotes
- slides.create: create a blank presentation
- slides.batchUpdate: raw Slides API request passthrough
- slides.getText / getMetadata / getImages / getSlideThumbnail: read tools
- slides.getSpeakerNotes / updateSpeakerNotes: read and write speaker notes

## feature-config.ts
- drive.uploadFile added to drive write group
- slides read group: getSpeakerNotes added
- slides write group: create, batchUpdate, createFromJson, updateSpeakerNotes,
  insertImageSlide all registered (defaultEnabled: false, requires opt-in)
n0012 force-pushed the feat/slides-create-from-json branch from 7379d66 to 119b16a May 10, 2026 03:19
n0012 added 2 commits May 19, 2026 05:47
drive.uploadFile previously granted anyone:reader on every upload as a
convenience for the Slides API workflow. This violates least-privilege:
files become public-link-readable by default, and there is no symmetric
"close the share" primitive.

Three changes:

- drive.uploadFile now uploads PRIVATE. No share is granted; response
  drops the imageUrl field (file is not fetchable without further action).

- drive.addPublicAccess (new): grants anyone:reader on an existing file
  and returns the public imageUrl. Explicit opt-in. Returns the
  permission ID for symmetric revocation. Surfaces the Workspace
  publishOutNotPermitted policy clearly so callers can fall back to
  GCS staging or another host.

- drive.removePublicAccess (new): revokes every anyone:* permission on
  a file. Idempotent (returns empty list if none exist). File stays in
  Drive — only the public link is closed.

Callers that used the old uploadFile-grants-share behavior should now
call uploadFile + addPublicAccess together, and pair every
addPublicAccess with removePublicAccess when the share is no longer
needed.

The bundled-lifecycle tool (slides.insertImageSlide) is redundant with the new Drive primitives.
The same outcome — local image → full-bleed slide — is now composable
from four small explicit calls:

  drive.uploadFile      (private upload)
  drive.addPublicAccess (explicit share)
  slides.createFromJson (image element)
  drive.removePublicAccess (close the share)

In addition, the old insertImageSlide implementation embedded an OAuth
access token in the Drive download URL, but the Slides API rejects
authenticated URLs for createImage (publicly accessible URL required,
per https://developers.google.com/workspace/slides/api/guides/add-image).
So the tool was effectively non-functional anyway.

Removing reduces surface area, eliminates a broken primitive, and keeps
the public API consistent: every share is explicit, every share is
reversible.
n0012 changed the title from "feat(slides,drive): add createFromJson, insertImageSlide, uploadFile, and theme system" to "feat(slides,drive): add createFromJson, drive primitives, and theme system" May 19, 2026

// Speaker notes tools — approach adapted from PR #235
// https://github.com/gemini-cli-extensions/workspace/pull/235 by @stefanoamorelli
server.registerTool(
Contributor

[blocking] These five new tools (this one plus slides.updateSpeakerNotes at 495, slides.create at 515, slides.batchUpdate at 527, and slides.createFromJson at 652) call server.registerTool(...) directly, but every other tool in this file goes through the wrapped registerTool(...) helper defined at lines 169–182. The wrapper is what honors enabledTools from feature-config — without it, slides.write will register regardless of defaultEnabled: false. Should be a s/server.registerTool/registerTool/ on all five.
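For readers unfamiliar with the pattern, a gating wrapper of the kind described could look roughly like this. It is not the repository's implementation: the ToolServer and Handler shapes and the makeRegisterTool name are hypothetical, and the real wrapper reads enabledTools from feature-config.

```typescript
// Hypothetical shapes; the real server and config types live in the repo.
type Handler = (args: unknown) => Promise<unknown>;
interface ToolServer {
  registerTool(name: string, config: object, handler: Handler): void;
}

// Returns a wrapper that registers a tool only when it is in the enabled set.
function makeRegisterTool(server: ToolServer, enabledTools: ReadonlySet<string>) {
  return (name: string, config: object, handler: Handler): boolean => {
    if (!enabledTools.has(name)) return false; // skipped: feature disabled
    server.registerTool(name, config, handler);
    return true;
  };
}
```

Calling server.registerTool directly skips the enabledTools check entirely, which is why a tool marked defaultEnabled: false would still be registered.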

}

// Delete the default blank slide ("p") that Google creates with new presentations
try {
Contributor

[blocking] Two concerns on this delete-'p' block: (1) createFromJson takes an arbitrary presentationId, so if a caller appends to an existing deck and 'p' happens to be a real slide, we silently delete it; (2) the catch {} at 869 swallows everything — auth revocation, quota, 5xx — with no log.

Could we gate the delete on an explicit isNewPresentation flag (or fetch the slide list first and only delete if 'p' exists), and at minimum log unexpected catches? Silent catches are something we try to avoid repo-wide.
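One way to combine both suggestions (an explicit flag plus a check that "p" exists) is sketched below. The SlidesOps interface, the removeDefaultSlide name, and the log message are hypothetical; a real version would call the Slides API through the existing service.

```typescript
// Hypothetical narrow interface over the Slides calls this step needs.
interface SlidesOps {
  listSlideIds(presentationId: string): Promise<string[]>;
  deleteObject(presentationId: string, objectId: string): Promise<void>;
}

// Deletes the default "p" slide only for presentations this call just created.
async function removeDefaultSlide(
  ops: SlidesOps,
  presentationId: string,
  isNewPresentation: boolean,
  log: (msg: string) => void = console.warn,
): Promise<boolean> {
  if (!isNewPresentation) return false; // never touch slides in an existing deck
  try {
    const ids = await ops.listSlideIds(presentationId);
    if (!ids.includes("p")) return false;
    await ops.deleteObject(presentationId, "p");
    return true;
  } catch (error) {
    // Log instead of swallowing, so auth, quota, and 5xx failures stay visible.
    log(`createFromJson: failed to delete default slide: ${String(error)}`);
    return false;
  }
}
```

The boolean return value lets the caller report whether the default slide was actually removed.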

],
};
} catch (error) {
return this.handleError('drive.addPublicAccess', error);
Contributor

[blocking] The PR description calls out publishOutNotPermitted as a feature surface, but here we fall through to the generic handleError that just stringifies error.message — the caller gets an opaque "insufficient permissions" string with no signal that it's org policy. Could we detect error.errors?.[0]?.reason === 'publishOutNotPermitted' (and cannotShareOutsideDomain) before the generic path and return a structured response with an actionable hint?
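A sketch of the detection step, assuming the error object exposes an errors array with a reason field as the comment describes. The ApiErrorLike type, the describeShareError name, and the hint wording are hypothetical.

```typescript
// Assumed error shape, based on the `error.errors?.[0]?.reason` access in the comment.
interface ApiErrorLike {
  message?: string;
  errors?: Array<{ reason?: string }>;
}

const POLICY_REASONS = new Set(["publishOutNotPermitted", "cannotShareOutsideDomain"]);

// Maps a sharing failure to a structured payload before the generic error path.
function describeShareError(error: ApiErrorLike) {
  const reason = error.errors?.[0]?.reason;
  if (reason !== undefined && POLICY_REASONS.has(reason)) {
    return {
      error: "public sharing blocked by organization policy",
      reason,
      hint: "Host the image elsewhere (for example a GCS signed URL) and pass that URL instead.",
    };
  }
  return { error: error.message ?? "unknown error", reason: reason ?? null, hint: null };
}
```

A caller can then branch on reason rather than parsing a message string.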

}

// Bold phrases
if (style.bold_phrases) {
Contributor

[blocking] content.indexOf("", n) returns n for every position, so an empty phrase runs content.length + 1 times and emits invalid updateTextStyle requests with startIndex === endIndex — Slides API rejects the whole batchUpdate. Same shape applies to the links loop further down. Either if (!phrase) continue; at the top of each loop, or add .min(1) to the schema.
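The guard can be shown in isolation. The helper below is a hypothetical sketch (findPhraseRanges is not a name from the PR) that returns half-open ranges for each occurrence of a phrase and returns nothing for an empty phrase.

```typescript
// Returns [start, end) ranges for every occurrence of `phrase` in `content`.
function findPhraseRanges(content: string, phrase: string): Array<[number, number]> {
  if (!phrase) return []; // an empty phrase would match at every index
  const ranges: Array<[number, number]> = [];
  let from = 0;
  while (true) {
    const start = content.indexOf(phrase, from);
    if (start === -1) break;
    ranges.push([start, start + phrase.length]);
    from = start + phrase.length; // continue after the match, so ranges never overlap
  }
  return ranges;
}
```

Because every returned range has end greater than start, no zero-length updateTextStyle request can be built from it.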

accent4?: RGB;
}

const THEMES: Record<string, Theme> = {
Contributor

[blocking] The PR description advertises 12 themes (exec, pitch, technical, workshop, dark, demo, hcls, customer, simple, google-dark, google-minimal) but I only see google defined here. THEMES['exec'] returns undefined and createFromJson hardcodes THEMES['google'] so the others are unreachable anyway. Either ship the missing themes or pare the description and tool docs back to what actually exists. Also are we sure we want to ship a "Google" theme in a public extension?

};
}
const removed: string[] = [];
for (const p of anyonePerms) {
Contributor

[important] If permission #3 of 5 fails mid-loop (race, revoked auth), we throw and removed never gets returned — caller can't tell which perms were actually deleted, which breaks the "idempotent" contract on retry. Could each delete be in its own try/catch, collecting {removed, failed} in the response?
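A sketch of the per-delete try/catch, with the Drive call abstracted behind a hypothetical deletePermission callback. The Permission type and the removeAll name are illustrative.

```typescript
interface Permission { id: string }

// Attempts every delete and reports which succeeded and which failed.
async function removeAll(
  perms: Permission[],
  deletePermission: (id: string) => Promise<void>,
): Promise<{ removed: string[]; failed: Array<{ id: string; error: string }> }> {
  const removed: string[] = [];
  const failed: Array<{ id: string; error: string }> = [];
  for (const p of perms) {
    try {
      await deletePermission(p.id);
      removed.push(p.id);
    } catch (error) {
      // Record the failure and keep going so later permissions are still revoked.
      failed.push({ id: p.id, error: error instanceof Error ? error.message : String(error) });
    }
  }
  return { removed, failed };
}
```

A retry only needs to act on the ids listed under failed.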

// Sanitize URLs that contain unresolved template placeholders (e.g. from LLM output)
let imageUrl = el.url ?? '';
if (imageUrl.includes('{') || imageUrl.includes('%7B')) {
imageUrl = 'https://img.icons8.com/m_rounded/512/4285F4/info.png';
Contributor

[important] Substituting an icons8 icon when the URL contains { is a reasonable fallback, but please add a warnings: [{slideIndex, elementIndex, issue: "unresolved url placeholder, substituted fallback"}] entry on the response so the caller knows it happened. Right now a malformed blueprint silently fills the deck with info icons.
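A sketch of the substitution with a warning recorded alongside it. The sanitizeImageUrl name and UrlWarning type are hypothetical; the fallback URL and placeholder checks are copied from the diff above, and the warning fields follow the shape proposed in the comment.

```typescript
interface UrlWarning { slideIndex: number; elementIndex: number; issue: string }

const FALLBACK_ICON = "https://img.icons8.com/m_rounded/512/4285F4/info.png";

// Returns a usable URL and records a warning whenever the fallback is substituted.
function sanitizeImageUrl(
  url: string | undefined,
  slideIndex: number,
  elementIndex: number,
  warnings: UrlWarning[],
): string {
  const imageUrl = url ?? "";
  if (imageUrl.includes("{") || imageUrl.includes("%7B")) {
    warnings.push({
      slideIndex,
      elementIndex,
      issue: "unresolved url placeholder, substituted fallback",
    });
    return FALLBACK_ICON;
  }
  return imageUrl;
}
```

The collected warnings array can be attached to the tool response so the caller sees each substitution.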

content: [
{
type: 'text' as const,
text: JSON.stringify({ error: errorMessage }),
Contributor

[important] Error returns from the new SlidesService methods (this create plus batchUpdate and createFromJson) don't set isError: true on the MCP response — DriveService.handleError does this correctly at line 47. MCP clients use that flag to distinguish failure from success, so without it these look like successful responses that happen to contain an error string.
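A minimal sketch of an error return that carries the flag. The ToolResult type is a local simplification of the MCP tool result shape and errorResult is a hypothetical helper name; the payload mirrors the { error: errorMessage } object in the diff above.

```typescript
// Local simplification of an MCP tool result.
interface ToolResult {
  content: Array<{ type: "text"; text: string }>;
  isError?: boolean;
}

// Builds a failure response that clients can distinguish from success.
function errorResult(error: unknown): ToolResult {
  const errorMessage = error instanceof Error ? error.message : String(error);
  return {
    isError: true,
    content: [{ type: "text", text: JSON.stringify({ error: errorMessage }) }],
  };
}
```

Routing every catch block in SlidesService through one helper like this would keep the flag from being forgotten in individual methods.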

return `${prefix}_${Date.now()}_${objCounter.value}`;
};

// Sort: shapes first, then images, then text; within each group by layer
Contributor

[nit] Comment says "shapes first, then images, then text", but the sort key layerVal * 10 + typeVal puts layer first with type as the tiebreaker — the opposite. The behavior matches the tool description's "lower layers render first" promise, so the code is right; just the inline comment is misleading.
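A small sketch makes the ordering concrete. The layerVal * 10 + typeVal key is from the comment above; the type values (shape 0, image 1, text 2) and the default layer of 0 are assumptions, and the Sortable type and function names are hypothetical.

```typescript
type Kind = "shape" | "image" | "text";
interface Sortable { type: Kind; layer?: number }

// Assumed tiebreak values; the real constants are in the PR's source.
const TYPE_ORDER: Record<Kind, number> = { shape: 0, image: 1, text: 2 };

// Layer is the primary key; element type only breaks ties within a layer.
function sortKey(el: Sortable): number {
  return (el.layer ?? 0) * 10 + TYPE_ORDER[el.type];
}

// Returns a sorted copy, leaving the input array untouched.
function renderOrder<T extends Sortable>(elements: T[]): T[] {
  return [...elements].sort((a, b) => sortKey(a) - sortKey(b));
}
```

With this key, a text element on layer 0 is emitted before a shape on layer 1, so an inline comment along the lines of "by layer, then shapes, images, text within a layer" would describe the behavior more accurately.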

}
}

// Bold until (legacy)
Contributor

[nit] // Bold until (legacy) — calling it "legacy" is confusing since it's brand new in this PR. Drop the tag, or explain why it's deprecated on arrival?

3 participants