
feat(ai): Image and Audio generation APIs (DALL-E 3 + OpenAI TTS)#109

Merged
bedus-creation merged 8 commits into main from feat/ai-image-audio on Jun 13, 2026

@bedus-creation

Summary

  • Image.of(prompt).generate() — text-to-image via OpenAI DALL-E 3; returns ImageResponse with raw PNG bytes
  • Image editing: .attachments([Files.Image.fromPath(...)]) switches to DALL-E 2 image editing
  • Image modifiers: .landscape() (1792×1024), .portrait() (1024×1792), .square(), .quality(), .model()
  • Audio.of(text).generate() — text-to-speech via OpenAI TTS; returns AudioResponse
  • Audio modifiers: .female() (nova), .male() (onyx), .voice("shimmer"), .speed(), .format(), .model()
  • Files.Image — factory with .fromStorage(), .fromPath(), .fromUrl() for image attachment sources
  • Storage integration: ImageResponse / AudioResponse expose async .store(), .storeAs(), .storePublicly(), .storePubliclyAs() backed by the Storage facade (with fallback to temp dir)
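
The modifier chain in the summary can be pictured with a minimal sketch. This is not the code in ai/image.py: `ImageRequestSketch` and its attributes are hypothetical names, only the landscape and portrait sizes come from the summary above, and the square and default size of 1024x1024 is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRequestSketch:
    """Hypothetical stand-in for the Image builder; not the PR's class."""

    prompt: str
    size: str = "1024x1024"  # assumed default; the PR does not state it
    attachments: list = field(default_factory=list)

    @classmethod
    def of(cls, prompt: str) -> "ImageRequestSketch":
        # Named constructor, mirroring the Image.of(prompt) entry point.
        return cls(prompt=prompt)

    def landscape(self) -> "ImageRequestSketch":
        self.size = "1792x1024"  # size quoted in the summary
        return self  # returning self is what allows chaining

    def portrait(self) -> "ImageRequestSketch":
        self.size = "1024x1792"  # size quoted in the summary
        return self

    def square(self) -> "ImageRequestSketch":
        self.size = "1024x1024"  # assumption, not stated in the PR
        return self


request = ImageRequestSketch.of("A donut on a counter").landscape()
print(request.size)
```

Each modifier mutates the builder and returns it, so calls can be chained in any order before the final generate step.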

API examples

from fastapi_startkit.ai import Image, Audio, Files

# Text to image
image = await Image.of("A donut on a counter").generate()

# Image edit with attachments + size modifier
image = await (
    Image.of("Make this impressionist")
    .attachments([
        Files.Image.fromStorage("photo.jpg"),
        Files.Image.fromPath("/tmp/photo.jpg"),
        Files.Image.fromUrl("https://example.com/photo.jpg"),
    ])
    .landscape()
    .generate()
)

# Store image
path = await image.store()
path = await image.storeAs("result.png")
path = await image.storePublicly()
path = await image.storePubliclyAs("result.png")

# Text to speech
audio = await Audio.of("Hello world").generate()
audio = await Audio.of("Hello world").female().generate()
audio = await Audio.of("Hello world").male().generate()
audio = await Audio.of("Hello world").voice("nova").generate()

# Store audio
path = await audio.store()
path = await audio.storeAs("greeting.mp3")
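
The summary describes the store methods as backed by the Storage facade with a fallback to a temp dir. A rough sketch of that fallback idea follows; `ResponseSketch`, `StorageLike`, the `storage` parameter, and the random file naming are all assumptions for illustration and may not match the PR's implementation.

```python
import asyncio
import tempfile
import uuid
from pathlib import Path
from typing import Optional, Protocol


class StorageLike(Protocol):
    """Hypothetical storage interface; the real Storage facade may differ."""

    async def put(self, name: str, data: bytes) -> str: ...


class ResponseSketch:
    """Hypothetical stand-in for ImageResponse / AudioResponse."""

    def __init__(self, data: bytes, extension: str = "png") -> None:
        self.data = data
        self.extension = extension

    async def store_as(self, name: str, storage: Optional[StorageLike] = None) -> str:
        if storage is not None:
            # Preferred path: delegate to the configured storage backend.
            return await storage.put(name, self.data)
        # Fallback: write into the system temp directory.
        # (A blocking write is acceptable for a sketch.)
        target = Path(tempfile.gettempdir()) / name
        target.write_bytes(self.data)
        return str(target)

    async def store(self, storage: Optional[StorageLike] = None) -> str:
        # Assumed naming scheme: random hex name plus the format extension.
        return await self.store_as(f"{uuid.uuid4().hex}.{self.extension}", storage)


path = asyncio.run(ResponseSketch(b"fake-bytes").store())
print(path)
```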

Files changed

| File | Change |
|---|---|
| ai/files.py | New: ImageAttachment, Files.Image factory |
| ai/image.py | New: Image builder, ImageResponse |
| ai/audio.py | New: Audio builder, AudioResponse |
| ai/__init__.py | Updated: exports Image, Audio, Files, ImageAttachment, ImageResponse, AudioResponse |
| tests/ai/test_image.py | New: 21 unit tests |
| tests/ai/test_audio.py | New: 24 unit tests |

Test plan

  • All 45 new unit tests pass (fully mocked, no real API calls)
  • Full AI test suite: 154 tests pass, 0 failures

🤖 Generated with Claude Code

Implements a Laravel-style fluent API for image generation (DALL-E 3),
image editing (DALL-E 2 with attachments), and text-to-speech (OpenAI TTS).

- `Image.of(prompt)` — text-to-image via DALL-E 3; `.landscape()`, `.portrait()`,
  `.square()`, `.quality()`, `.model()` modifiers; `.attachments([…])` switches
  to DALL-E 2 image editing
- `Audio.of(text)` — TTS via OpenAI; `.female()` / `.male()` voice shortcuts,
  `.voice()`, `.speed()`, `.format()`, `.model()` modifiers
- `Files.Image.fromStorage/fromPath/fromUrl` — image attachment factories
- `ImageResponse` / `AudioResponse` — async `.store()`, `.storeAs()`,
  `.storePublicly()`, `.storePubliclyAs()` backed by Storage facade
- 45 new unit tests (all mocked, no real API calls)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment on lines +164 to +174
def _generate_sync(self) -> AudioResponse:
    client = OpenAI(api_key=self._resolve_api_key())
    response = client.audio.speech.create(
        model=self._model,
        voice=self._voice,
        input=self._text,
        speed=self._speed,
        response_format=self._response_format,
    )
    data = response.read()
    return AudioResponse(data=data, fmt=self._response_format)

Can we use LangChain or another library so we can support multiple providers?

Comment on lines +9 to +23
class ImageAttachment:
    """Represents an image file to attach to an Image editing request.

    Instances are created via the :class:`Files.Image` factory, not directly::

        attachment = Files.Image.fromPath("/tmp/photo.jpg")
        attachment = Files.Image.fromStorage("photo.jpg")
        attachment = Files.Image.fromUrl("https://example.com/photo.jpg")
    """

    def __init__(
        self,
        data: bytes,
        name: str = "",
        media_type: str = "image/jpeg",

We already have a Document class in AI; can't we use that?

Comment on lines +190 to +200
def _create(self) -> ImageResponse:
    """Generate a new image from a text prompt."""
    client = OpenAI(api_key=self._resolve_api_key())
    params: dict = {
        "model": self._model,
        "prompt": self._prompt,
        "size": self._size,
        "n": self._n,
        "response_format": "b64_json",
    }
    if self._model == "dall-e-3":

Can we use another package so we can support multiple providers?

assert isinstance(audio, Audio)
assert audio._text == "Hello world"


Can we write class-based tests?

bedus-creation and others added 2 commits June 10, 2026 17:07
… class-based tests

Comment 1 — Document extended for binary image attachments:
- content field now accepts str | bytes
- from_path() auto-detects binary (UnicodeDecodeError fallback to rb mode)
- New async from_url() downloads bytes via httpx
- New async from_storage() reads binary via Storage facade (or direct path)
- New to_bytes() returns binary content regardless of how it was loaded
- files.py (ImageAttachment/Files) no longer exported; Document is the single type
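
The binary auto-detection and `to_bytes()` behaviour listed above could look roughly like the following. This is a sketch under assumptions, not the PR's Document class: `DocumentSketch` is a hypothetical name, UTF-8 is assumed as the text encoding, and `to_base64()` is included only because a later commit in this PR adds it.

```python
import base64
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass
class DocumentSketch:
    """Hypothetical stand-in for Document; content may be text or bytes."""

    content: Union[str, bytes]

    @classmethod
    def from_path(cls, path: str) -> "DocumentSketch":
        try:
            # Try text first (UTF-8 is an assumption of this sketch).
            return cls(Path(path).read_text(encoding="utf-8"))
        except UnicodeDecodeError:
            # Not valid text: fall back to reading raw bytes.
            return cls(Path(path).read_bytes())

    def to_bytes(self) -> bytes:
        # Always return bytes, regardless of how the content was loaded.
        if isinstance(self.content, bytes):
            return self.content
        return self.content.encode("utf-8")

    def to_base64(self) -> str:
        return base64.b64encode(self.to_bytes()).decode("ascii")


with tempfile.TemporaryDirectory() as tmp:
    binary_path = Path(tmp) / "photo.png"
    binary_path.write_bytes(b"\x89PNG\r\n\x1a\n")  # PNG signature, not valid UTF-8
    text_path = Path(tmp) / "note.txt"
    text_path.write_text("hi", encoding="utf-8")

    binary_doc = DocumentSketch.from_path(str(binary_path))
    text_doc = DocumentSketch.from_path(str(text_path))

print(type(binary_doc.content).__name__, type(text_doc.content).__name__)
```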

Comment 2 — Multi-provider support for Image:
- New ai/image_providers.py: ImageGenerationProvider ABC,
  OpenAIImageProvider (AsyncOpenAI), StabilityImageProvider (stub)
- Image.generate() is now truly async via provider abstraction
- Provider resolved from AIConfig.image_provider (AI_IMAGE_PROVIDER env var)

Comment 3 — Multi-provider support for Audio:
- New ai/audio_providers.py: AudioSynthesisProvider ABC,
  OpenAIAudioProvider (AsyncOpenAI), ElevenLabsAudioProvider (stub)
- Audio.generate() is now truly async via provider abstraction
- Provider resolved from AIConfig.audio_provider (AI_AUDIO_PROVIDER env var)
- AIConfig gains image_provider and audio_provider fields
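
As an illustration of the provider abstraction described in Comments 2 and 3, here is a minimal sketch. All class and function names below are hypothetical, the providers are fakes that make no network calls, and the default of "openai" when AI_AUDIO_PROVIDER is unset is an assumption; the PR resolves the provider through AIConfig rather than reading the environment directly.

```python
import asyncio
import os
from abc import ABC, abstractmethod
from typing import Dict, Optional, Type


class AudioProviderSketch(ABC):
    """Hypothetical provider interface; the PR's ABC may differ."""

    @abstractmethod
    async def synthesize(self, text: str, voice: str) -> bytes: ...


class FakeOpenAIProvider(AudioProviderSketch):
    async def synthesize(self, text: str, voice: str) -> bytes:
        # A real provider would call its SDK here; this fake just echoes.
        return f"openai:{voice}:{text}".encode()


class FakeElevenLabsProvider(AudioProviderSketch):
    async def synthesize(self, text: str, voice: str) -> bytes:
        raise NotImplementedError("stub provider")


PROVIDERS: Dict[str, Type[AudioProviderSketch]] = {
    "openai": FakeOpenAIProvider,
    "elevenlabs": FakeElevenLabsProvider,
}


def resolve_provider(name: Optional[str] = None) -> AudioProviderSketch:
    # Explicit name wins; otherwise fall back to the env var (assumed default: openai).
    key = (name or os.environ.get("AI_AUDIO_PROVIDER", "openai")).lower()
    try:
        return PROVIDERS[key]()
    except KeyError:
        raise ValueError(f"Unknown audio provider: {key!r}") from None


audio_bytes = asyncio.run(resolve_provider("openai").synthesize("Hello world", "nova"))
print(audio_bytes)
```

Keeping the builder dependent only on the abstract interface is what lets a new provider be added without touching `Audio` or `Image` themselves.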

Comment 4 — Class-based tests:
- test_image.py: TestDocumentImageAttachment, TestImageBuilder,
  TestImageGeneration, TestImageResult
- test_audio.py: TestAudioBuilder, TestAudioGeneration, TestAudioResult

All 156 AI tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Delete ai/files.py (ImageAttachment + Files namespace) — dead code,
  not imported anywhere; Document already covers all its concerns
- Add Document.to_base64() next to to_bytes() for callers that need
  base64-encoded image/audio payloads
- Add base64 import at module level
- Port to_base64 coverage into TestDocumentImageAttachment (2 new tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bedus-creation

refactor(ai): removed dead files.py, added Document.to_base64()

files.py (ImageAttachment + Files namespace) was dead code — not imported anywhere in the PR; image.py already used Document.to_bytes() directly.

Changes in this commit:

  • 🗑️ Deleted ai/files.py entirely
  • ➕ Added Document.to_base64() -> str next to to_bytes() (the one genuinely new method from files.py)
  • ✅ 2 new tests in TestDocumentImageAttachment covering bytes and text inputs
  • All 158 AI tests green, ruff clean

bedus-creation and others added 5 commits June 12, 2026 17:12
…; rename ABCs to Factory

- Add GoogleImageProvider (Imagen 3 via google-genai, aspect ratio mapping)
- Add GoogleAudioProvider (Gemini TTS, PCM16→WAV wrapping, voice alias map)
- Add ElevenLabsAudioProvider (full synthesis via elevenlabs SDK, voice ID map)
- Add ElevenLabsConfig dataclass with ELEVENLABS_API_KEY env var
- Add Document.to_base64() helper
- Rename ImageGenerationProvider → ImageFactory, AudioSynthesisProvider → AudioFactory
- Remove concrete providers from __init__.py exports (only ABCs exposed)
- Use `from fastapi_startkit import Config` (direct, not facade) for provider resolution
- Add tests for all new providers with SDK mocking via patch.dict(sys.modules)
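
The PCM16→WAV wrapping mentioned for GoogleAudioProvider can be sketched with the standard library `wave` module. The function name `pcm16_to_wav` and the defaults of 24000 Hz mono are assumptions for illustration; the PR does not state which sample rate or channel count its implementation uses.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM samples in a WAV container (hypothetical helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    # wave does not close a file object it did not open, so the buffer is still readable.
    return buffer.getvalue()


silence = b"\x00\x00" * 4  # four silent mono samples
wav_bytes = pcm16_to_wav(silence)
print(wav_bytes[:4], len(wav_bytes))
```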

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- OpenAIAudioProvider → OpenAIAudioFactory
- GoogleAudioProvider → GoogleAudioFactory
- ElevenLabsAudioProvider → ElevenLabsAudioFactory
- OpenAIImageProvider → OpenAIImageFactory
- GoogleImageProvider → GoogleImageFactory
- StabilityImageProvider → StabilityImageFactory

All 172 tests passing, ruff clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- audio_providers.py → audio_factory.py
- image_providers.py → image_factory.py
- Update all imports across src and tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
asyncio_mode = "auto" in pyproject.toml makes them unnecessary.
Also drops the now-unused `pytest` import from test_audio.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-level imports

- All test classes now inherit from IsolatedAsyncioTestCase
- Imports use fastapi_startkit.ai (not submodules) for Audio, AudioResponse, Image, ImageResponse, Document
- Plain assert statements (no self.assertEqual style)
- Replace pytest tmp_path/monkeypatch fixtures with tempfile + os.chdir directly
- Replace pytest.raises with self.assertRaisesRegex for exception tests
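
The test style listed above (IsolatedAsyncioTestCase, plain asserts, tempfile plus os.chdir, assertRaisesRegex) might look like this in practice. The function under test, `save_bytes`, and the test class name are hypothetical and exist only to make the sketch runnable; they are not taken from the PR's test files.

```python
import os
import tempfile
import unittest
from pathlib import Path


async def save_bytes(data: bytes, name: str) -> str:
    """Hypothetical function under test: writes into the current directory."""
    if not data:
        raise ValueError("no data to store")
    Path(name).write_bytes(data)
    return name


class TestSaveBytes(unittest.IsolatedAsyncioTestCase):
    def setUp(self) -> None:
        # Replaces pytest's tmp_path/monkeypatch: work inside a throwaway directory.
        original_cwd = os.getcwd()
        tmp = tempfile.TemporaryDirectory()
        os.chdir(tmp.name)
        # Cleanups run in reverse order: leave the directory first, then delete it.
        self.addCleanup(tmp.cleanup)
        self.addCleanup(os.chdir, original_cwd)

    async def test_writes_file(self) -> None:
        path = await save_bytes(b"abc", "out.bin")
        assert Path(path).read_bytes() == b"abc"  # plain assert, no assertEqual

    async def test_rejects_empty_payload(self) -> None:
        with self.assertRaisesRegex(ValueError, "no data"):
            await save_bytes(b"", "out.bin")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSaveBytes)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```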

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bedus-creation bedus-creation merged commit 95e34e4 into main Jun 13, 2026
2 of 3 checks passed