feat(ai): Image and Audio generation APIs (DALL-E 3 + OpenAI TTS) by bedus-creation · Pull Request #109 · fastapi-startkit/fastapi-startkit-framework

bedus-creation · 2026-06-10T23:32:54Z

Summary

Image.of(prompt).generate() — text-to-image via OpenAI DALL-E 3; returns ImageResponse with raw PNG bytes
Image editing — .attachments([Files.Image.fromPath(...)]) switches to DALL-E 2 image editing
Image modifiers — .landscape() (1792×1024), .portrait() (1024×1792), .square(), .quality(), .model()
Audio.of(text).generate() — text-to-speech via OpenAI TTS; returns AudioResponse
Audio modifiers — .female() (nova), .male() (onyx), .voice("shimmer"), .speed(), .format(), .model()
Files.Image — factory with .fromStorage(), .fromPath(), .fromUrl() for image attachment sources
Storage integration — ImageResponse / AudioResponse expose async .store(), .storeAs(), .storePublicly(), .storePubliclyAs() backed by the Storage facade (with fallback to temp dir)
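
To make the modifier list above concrete, here is a minimal sketch of how a fluent builder of this kind could be structured. `ImageSketch` and `request_params()` are hypothetical names, not the PR's classes. The landscape and portrait sizes come from the summary; the `1024x1024` default and square size are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ImageSketch:
    """Hypothetical stand-in for the PR's Image builder (not the real class)."""

    prompt: str
    model: str = "dall-e-3"
    size: str = "1024x1024"  # assumed default; the PR does not state one
    attached: list[bytes] = field(default_factory=list)

    @classmethod
    def of(cls, prompt: str) -> "ImageSketch":
        return cls(prompt=prompt)

    def landscape(self) -> "ImageSketch":
        self.size = "1792x1024"  # size quoted in the PR summary
        return self

    def portrait(self) -> "ImageSketch":
        self.size = "1024x1792"  # size quoted in the PR summary
        return self

    def square(self) -> "ImageSketch":
        self.size = "1024x1024"  # assumption: the summary gives no size
        return self

    def attachments(self, files: list[bytes]) -> "ImageSketch":
        # Per the summary, attaching images switches to DALL-E 2 editing.
        self.attached = list(files)
        self.model = "dall-e-2"
        return self

    def request_params(self) -> dict:
        # Hypothetical helper: the values a provider call would receive.
        return {"model": self.model, "prompt": self.prompt, "size": self.size}
```

Each modifier returns `self`, which is what allows chains such as `ImageSketch.of("A donut").landscape()`.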

API examples

from fastapi_startkit.ai import Image, Audio, Files

# Text to image
image = await Image.of("A donut on a counter").generate()

# Image edit with attachments + size modifier
image = await (
    Image.of("Make this impressionist")
    .attachments([
        Files.Image.fromStorage("photo.jpg"),
        Files.Image.fromPath("/tmp/photo.jpg"),
        Files.Image.fromUrl("https://example.com/photo.jpg"),
    ])
    .landscape()
    .generate()
)

# Store image
path = await image.store()
path = await image.storeAs("result.png")
path = await image.storePublicly()
path = await image.storePubliclyAs("result.png")

# Text to speech
audio = await Audio.of("Hello world").generate()
audio = await Audio.of("Hello world").female().generate()
audio = await Audio.of("Hello world").male().generate()
audio = await Audio.of("Hello world").voice("nova").generate()

# Store audio
path = await audio.store()
path = await audio.storeAs("greeting.mp3")
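
The store calls above return a path. As a rough sketch of the temp-directory fallback mentioned in the summary, a response object could write its bytes as shown below. `StoredResponseSketch`, `store_as`, and the `directory` parameter are hypothetical, and the real implementation goes through the Storage facade first, which is not modelled here.

```python
from __future__ import annotations

import asyncio
import tempfile
import uuid
from pathlib import Path


class StoredResponseSketch:
    """Hypothetical response object holding generated image or audio bytes."""

    def __init__(self, data: bytes, extension: str) -> None:
        self.data = data
        self.extension = extension

    async def store(self, directory: str | None = None) -> str:
        # Generate a random file name, then reuse the named variant.
        name = f"{uuid.uuid4().hex}.{self.extension}"
        return await self.store_as(name, directory)

    async def store_as(self, name: str, directory: str | None = None) -> str:
        # Fall back to the system temp dir when no directory is given.
        target = Path(directory or tempfile.gettempdir()) / name
        # Run the blocking write in a worker thread to keep the event loop free.
        await asyncio.to_thread(target.write_bytes, self.data)
        return str(target)
```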

Files changed

| File | Change |
| --- | --- |
| `ai/files.py` | New: `ImageAttachment`, `Files.Image` factory |
| `ai/image.py` | New: `Image` builder, `ImageResponse` |
| `ai/audio.py` | New: `Audio` builder, `AudioResponse` |
| `ai/__init__.py` | Updated: exports `Image`, `Audio`, `Files`, `ImageAttachment`, `ImageResponse`, `AudioResponse` |
| `tests/ai/test_image.py` | New: 21 unit tests |
| `tests/ai/test_audio.py` | New: 24 unit tests |

Test plan

All 45 new unit tests pass (fully mocked, no real API calls)
Full AI test suite: 154 tests pass, 0 failures

🤖 Generated with Claude Code

Implements a Laravel-style fluent API for image generation (DALL-E 3), image editing (DALL-E 2 with attachments), and text-to-speech (OpenAI TTS).

- `Image.of(prompt)`: text-to-image via DALL-E 3; `.landscape()`, `.portrait()`, `.square()`, `.quality()`, `.model()` modifiers; `.attachments([…])` switches to DALL-E 2 image editing
- `Audio.of(text)`: TTS via OpenAI; `.female()` / `.male()` voice shortcuts, `.voice()`, `.speed()`, `.format()`, `.model()` modifiers
- `Files.Image.fromStorage/fromPath/fromUrl`: image attachment factories
- `ImageResponse` / `AudioResponse`: async `.store()`, `.storeAs()`, `.storePublicly()`, `.storePubliclyAs()` backed by Storage facade
- 45 new unit tests (all mocked, no real API calls)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bedus-creation · 2026-06-10T23:46:44Z

+    def _generate_sync(self) -> AudioResponse:
+        client = OpenAI(api_key=self._resolve_api_key())
+        response = client.audio.speech.create(
+            model=self._model,
+            voice=self._voice,
+            input=self._text,
+            speed=self._speed,
+            response_format=self._response_format,
+        )
+        data = response.read()
+        return AudioResponse(data=data, fmt=self._response_format)


Can we use LangChain or another library so we can support multiple providers?

bedus-creation · 2026-06-10T23:47:10Z

+class ImageAttachment:
+    """Represents an image file to attach to an Image editing request.
+
+    Instances are created via the :class:`Files.Image` factory, not directly::
+
+        attachment = Files.Image.fromPath("/tmp/photo.jpg")
+        attachment = Files.Image.fromStorage("photo.jpg")
+        attachment = Files.Image.fromUrl("https://example.com/photo.jpg")
+    """
+
+    def __init__(
+        self,
+        data: bytes,
+        name: str = "",
+        media_type: str = "image/jpeg",


We already have a Document class in the AI module. Can't we use that?

bedus-creation · 2026-06-10T23:48:05Z

+    def _create(self) -> ImageResponse:
+        """Generate a new image from a text prompt."""
+        client = OpenAI(api_key=self._resolve_api_key())
+        params: dict = {
+            "model": self._model,
+            "prompt": self._prompt,
+            "size": self._size,
+            "n": self._n,
+            "response_format": "b64_json",
+        }
+        if self._model == "dall-e-3":


Can we use another package so we can support multiple providers?

bedus-creation · 2026-06-10T23:48:54Z

+    assert isinstance(audio, Audio)
+    assert audio._text == "Hello world"
+
+


Can we write class-based tests?

… class-based tests

Comment 1: Document extended for binary image attachments
- content field now accepts str | bytes
- from_path() auto-detects binary (UnicodeDecodeError fallback to rb mode)
- New async from_url() downloads bytes via httpx
- New async from_storage() reads binary via Storage facade (or direct path)
- New to_bytes() returns binary content regardless of how it was loaded
- files.py (ImageAttachment/Files) no longer exported; Document is the single type

Comment 2: Multi-provider support for Image
- New ai/image_providers.py: ImageGenerationProvider ABC, OpenAIImageProvider (AsyncOpenAI), StabilityImageProvider (stub)
- Image.generate() is now truly async via provider abstraction
- Provider resolved from AIConfig.image_provider (AI_IMAGE_PROVIDER env var)

Comment 3: Multi-provider support for Audio
- New ai/audio_providers.py: AudioSynthesisProvider ABC, OpenAIAudioProvider (AsyncOpenAI), ElevenLabsAudioProvider (stub)
- Audio.generate() is now truly async via provider abstraction
- Provider resolved from AIConfig.audio_provider (AI_AUDIO_PROVIDER env var)
- AIConfig gains image_provider and audio_provider fields

Comment 4: Class-based tests
- test_image.py: TestDocumentImageAttachment, TestImageBuilder, TestImageGeneration, TestImageResult
- test_audio.py: TestAudioBuilder, TestAudioGeneration, TestAudioResult

All 156 AI tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
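
The provider abstraction described in this commit can be sketched roughly as follows. This is only an illustration under assumptions: `ImageProviderSketch`, the two fake providers, the `PROVIDERS` registry, and `resolve_provider()` are hypothetical names, and the `"openai"` default is assumed. The real code resolves the provider through `AIConfig.image_provider` rather than reading the environment directly.

```python
from __future__ import annotations

import asyncio
import os
from abc import ABC, abstractmethod


class ImageProviderSketch(ABC):
    """Hypothetical counterpart of the ImageGenerationProvider ABC."""

    @abstractmethod
    async def generate(self, prompt: str, size: str) -> bytes:
        """Return raw image bytes for the prompt."""


class FakeOpenAIProvider(ImageProviderSketch):
    async def generate(self, prompt: str, size: str) -> bytes:
        # A real provider would await an SDK call here; this only echoes input.
        return f"openai:{size}:{prompt}".encode()


class FakeStabilityProvider(ImageProviderSketch):
    async def generate(self, prompt: str, size: str) -> bytes:
        raise NotImplementedError("stub provider")


# Hypothetical registry mapping config values to provider classes.
PROVIDERS = {"openai": FakeOpenAIProvider, "stability": FakeStabilityProvider}


def resolve_provider(name: str | None = None) -> ImageProviderSketch:
    # An explicit name wins; otherwise read the env var named in the commit.
    key = (name or os.environ.get("AI_IMAGE_PROVIDER", "openai")).lower()
    try:
        return PROVIDERS[key]()
    except KeyError:
        raise ValueError(f"Unknown image provider: {key!r}") from None
```

With this shape, the builder only depends on the abstract `generate()` coroutine, so adding a provider means registering one more class.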

- Delete ai/files.py (ImageAttachment + Files namespace): dead code, not imported anywhere; Document already covers all its concerns
- Add Document.to_base64() next to to_bytes() for callers that need base64-encoded image/audio payloads
- Add base64 import at module level
- Port to_base64 coverage into TestDocumentImageAttachment (2 new tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
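
As an illustration of the Document behaviour these commits describe (text-or-bytes content, binary fallback in `from_path()`, `to_bytes()`, `to_base64()`), a minimal sketch might look like this. `DocumentSketch` is a hypothetical stand-in, and the UTF-8 encoding is an assumption; the real class has more fields and the async `from_url()` / `from_storage()` loaders.

```python
from __future__ import annotations

import base64
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DocumentSketch:
    """Hypothetical stand-in for the framework's Document class."""

    content: str | bytes
    name: str = ""

    @classmethod
    def from_path(cls, path: str) -> "DocumentSketch":
        p = Path(path)
        try:
            # Try text first; binary files such as JPEGs fail to decode.
            return cls(content=p.read_text(encoding="utf-8"), name=p.name)
        except UnicodeDecodeError:
            # Fall back to raw bytes, mirroring the "rb mode" fallback.
            return cls(content=p.read_bytes(), name=p.name)

    def to_bytes(self) -> bytes:
        # Return bytes regardless of how the content was loaded.
        if isinstance(self.content, bytes):
            return self.content
        return self.content.encode("utf-8")

    def to_base64(self) -> str:
        return base64.b64encode(self.to_bytes()).decode("ascii")
```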

bedus-creation · 2026-06-13T00:00:38Z

refactor(ai): removed dead files.py, added Document.to_base64()

files.py (ImageAttachment + Files namespace) was dead code — not imported anywhere in the PR; image.py already used Document.to_bytes() directly.

Changes in this commit:

🗑️ Deleted ai/files.py entirely
➕ Added Document.to_base64() -> str next to to_bytes() (the one genuinely new method from files.py)
✅ 2 new tests in TestDocumentImageAttachment covering bytes and text inputs
All 158 AI tests green, ruff clean

…; rename ABCs to Factory

- Add GoogleImageProvider (Imagen 3 via google-genai, aspect ratio mapping)
- Add GoogleAudioProvider (Gemini TTS, PCM16→WAV wrapping, voice alias map)
- Add ElevenLabsAudioProvider (full synthesis via elevenlabs SDK, voice ID map)
- Add ElevenLabsConfig dataclass with ELEVENLABS_API_KEY env var
- Add Document.to_base64() helper
- Rename ImageGenerationProvider → ImageFactory, AudioSynthesisProvider → AudioFactory
- Remove concrete providers from __init__.py exports (only ABCs exposed)
- Use `from fastapi_startkit import Config` (direct, not facade) for provider resolution
- Add tests for all new providers with SDK mocking via patch.dict(sys.modules)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
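
The "PCM16→WAV wrapping" step can be illustrated with the standard-library `wave` module. This is a sketch, not the PR's code: `pcm16_to_wav` is a hypothetical helper, and the 24 kHz mono defaults are assumptions that should be checked against what the TTS endpoint actually returns.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw little-endian 16-bit PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples are 2 bytes wide
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave leaves the caller-supplied buffer open, so its contents remain readable.
    return buffer.getvalue()
```

Raw PCM carries no header, so players cannot infer the sample rate or width; the WAV header written here supplies that metadata without re-encoding the audio.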

- OpenAIAudioProvider → OpenAIAudioFactory
- GoogleAudioProvider → GoogleAudioFactory
- ElevenLabsAudioProvider → ElevenLabsAudioFactory
- OpenAIImageProvider → OpenAIImageFactory
- GoogleImageProvider → GoogleImageFactory
- StabilityImageProvider → StabilityImageFactory

All 172 tests passing, ruff clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- audio_providers.py → audio_factory.py
- image_providers.py → image_factory.py
- Update all imports across src and tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

asyncio_mode = "auto" in pyproject.toml makes the `@pytest.mark.asyncio` decorators unnecessary. Also drops the now-unused `pytest` import from test_audio.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…-level imports

- All test classes now inherit from IsolatedAsyncioTestCase
- Imports use fastapi_startkit.ai (not submodules) for Audio, AudioResponse, Image, ImageResponse, Document
- Plain assert statements (no self.assertEqual style)
- Replace pytest tmp_path/monkeypatch fixtures with tempfile + os.chdir directly
- Replace pytest.raises with self.assertRaisesRegex for exception tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
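
The test conventions listed in this commit can be sketched as below. `FakeAudio` is a hypothetical stand-in for the `Audio` builder, and the error message is invented for illustration; the real tests mock the provider SDKs instead.

```python
import unittest


class FakeAudio:
    """Hypothetical stand-in for the Audio builder under test."""

    def __init__(self, text: str) -> None:
        self._text = text

    async def generate(self) -> bytes:
        if not self._text:
            raise ValueError("text must not be empty")
        return self._text.encode("utf-8")


class TestAudioGenerationSketch(unittest.IsolatedAsyncioTestCase):
    async def test_generate_returns_bytes(self):
        audio = FakeAudio("Hello world")
        # Plain assert statements, as the commit describes.
        assert audio._text == "Hello world"
        assert await audio.generate() == b"Hello world"

    async def test_empty_text_is_rejected(self):
        # assertRaisesRegex replaces pytest.raises for exception tests.
        with self.assertRaisesRegex(ValueError, "must not be empty"):
            await FakeAudio("").generate()
```

`IsolatedAsyncioTestCase` runs each `async def test_*` method on its own event loop, so no asyncio marker or plugin configuration is needed for these classes.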

bedus-creation and others added 2 commits June 10, 2026 17:07

bedus-creation and others added 5 commits June 12, 2026 17:12

refactor(ai): rename provider files to factory naming convention

c01d152

refactor(tests): remove redundant @pytest.mark.asyncio decorators

c419b9b

bedus-creation merged commit 95e34e4 into main Jun 13, 2026
2 of 3 checks passed

bedus-creation merged 8 commits into main from feat/ai-image-audio
