diff --git a/README.md b/README.md index efb9097..d8de89d 100644 --- a/README.md +++ b/README.md @@ -98,7 +98,7 @@ Add this package to your `Package.swift`: ```swift dependencies: [ - .package(url: "https://github.com/mattt/AnyLanguageModel", from: "0.5.0") + .package(url: "https://github.com/mattt/AnyLanguageModel", from: "0.7.0") ] ``` @@ -126,7 +126,7 @@ To enable specific traits, specify them in your package's dependencies: dependencies: [ .package( url: "https://github.com/mattt/AnyLanguageModel.git", - from: "0.5.0", + from: "0.7.0", traits: ["CoreML", "MLX"] // Enable CoreML and MLX support ) ] @@ -287,26 +287,14 @@ see OWASP's ## Usage -### Apple Foundation Models - -Uses Apple's [system language model](https://developer.apple.com/documentation/FoundationModels) -(requires macOS 26 / iOS 26 / visionOS 26 or later). - -```swift -let model = SystemLanguageModel.default -let session = LanguageModelSession(model: model) +### Guided Generation -let response = try await session.respond { - Prompt("Explain quantum computing in one sentence") -} -``` - -> [!NOTE] -> Image inputs are not yet supported by Apple Foundation Models. - -`SystemLanguageModel` supports guided generation, +All on-device models — Apple Foundation Models, Core ML, MLX, and llama.cpp — +support guided generation, letting you request strongly typed outputs using `@Generable` and `@Guide` instead of parsing raw strings. +Cloud providers (OpenAI, Open Responses, Anthropic, and Gemini) +also support guided generation. For more details, see [Generating Swift data structures with guided generation](https://developer.apple.com/documentation/foundationmodels/generating-swift-data-structures-with-guided-generation). @@ -323,7 +311,7 @@ struct CatProfile { var profile: String } -let session = LanguageModelSession(model: .default) +let session = LanguageModelSession(model: model) let response = try await session.respond( to: "Generate a cute rescue cat", generating: CatProfile.self @@ -331,9 +319,117 @@ let response = try await session.respond( print(response.content) ``` +### Image Inputs + +Many providers support image inputs, +letting you include images alongside text prompts. +Pass images using the `images:` or `image:` parameter on `respond`: + +```swift +let response = try await session.respond( + to: "Describe what you see", + images: [ + .init(url: URL(string: "https://example.com/photo.jpg")!), + .init(url: URL(fileURLWithPath: "/path/to/local.png")) + ] +) +``` + +Image support varies by provider: + +| Provider | Image Inputs | +| ----------------------- | :-------------: | +| Apple Foundation Models | — | +| Core ML | — | +| MLX | model-dependent | +| llama.cpp | — | +| Ollama | model-dependent | +| OpenAI | yes | +| Open Responses | yes | +| Anthropic | yes | +| Google Gemini | yes | + +For MLX and Ollama, +use a vision-capable model +(for example, a VLM or `-vl` variant). + +### Tool Calling + +Tool calling is supported by all providers except llama.cpp. 
+Define tools using the `Tool` protocol and pass them when creating a session: + +```swift +struct WeatherTool: Tool { + let name = "getWeather" + let description = "Retrieve the latest weather information for a city" + + @Generable + struct Arguments { + @Guide(description: "The city to fetch the weather for") + var city: String + } + + func call(arguments: Arguments) async throws -> String { + "The weather in \(arguments.city) is sunny and 72°F / 23°C" + } +} + +let session = LanguageModelSession(model: model, tools: [WeatherTool()]) + +let response = try await session.respond { + Prompt("How's the weather in Cupertino?") +} +print(response.content) +``` + +To observe or control tool execution, assign a delegate on the session: + +```swift +actor ToolExecutionObserver: ToolExecutionDelegate { + func didGenerateToolCalls(_ toolCalls: [Transcript.ToolCall], in session: LanguageModelSession) async { + print("Generated tool calls: \(toolCalls)") + } + + func toolCallDecision( + for toolCall: Transcript.ToolCall, + in session: LanguageModelSession + ) async -> ToolExecutionDecision { + // Return .stop to halt after tool calls, or .provideOutput(...) to bypass execution. + // This is a good place to ask the user for confirmation (for example, in a modal dialog). + .execute + } + + func didExecuteToolCall( + _ toolCall: Transcript.ToolCall, + output: Transcript.ToolOutput, + in session: LanguageModelSession + ) async { + print("Executed tool call: \(toolCall)") + } +} + +session.toolExecutionDelegate = ToolExecutionObserver() +``` + +## Providers + +### Apple Foundation Models + +Uses Apple's [system language model](https://developer.apple.com/documentation/FoundationModels) +(requires macOS 26 / iOS 26 / visionOS 26 or later). + +```swift +let model = SystemLanguageModel.default +let session = LanguageModelSession(model: model) + +let response = try await session.respond { + Prompt("Explain quantum computing in one sentence") +} +``` + ### Core ML -Run [Core ML](https://developer.apple.com/documentation/coreml) models +Runs [Core ML](https://developer.apple.com/documentation/coreml) models (requires `CoreML` trait): ```swift @@ -355,12 +451,9 @@ Enable the trait in Package.swift: ) ``` -> [!NOTE] -> Image inputs are not currently supported with `CoreMLLanguageModel`. - ### MLX -Run [MLX](https://github.com/ml-explore/mlx-swift) models on Apple Silicon +Runs [MLX](https://github.com/ml-explore/mlx-swift) models on Apple Silicon (requires `MLX` trait): ```swift @@ -398,12 +491,9 @@ Enable the trait in Package.swift: ) ``` -> [!NOTE] -> MLX supports guided generation (structured output via `@Generable`). - ### llama.cpp (GGUF) -Run GGUF quantized models via [llama.cpp](https://github.com/ggml-org/llama.cpp) +Runs GGUF quantized models via [llama.cpp](https://github.com/ggml-org/llama.cpp) (requires `Llama` trait): ```swift @@ -451,9 +541,55 @@ let response = try await session.respond( ) ``` -> [!NOTE] -> Image inputs are not currently supported with `LlamaLanguageModel`. -> Guided generation (structured output via `@Generable`) is supported. 
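+Like the other on-device providers, `LlamaLanguageModel` works with
+guided generation.
+Here is a minimal sketch, assuming the GGUF model loaded above is still
+bound to `model` and reusing the `CatProfile` type from the
+Guided Generation example:
+
+```swift
+// Assumes `model` is the LlamaLanguageModel created above and that
+// `CatProfile` (@Generable) is defined as in the Guided Generation section.
+let session = LanguageModelSession(model: model)
+let response = try await session.respond(
+    to: "Generate a cute rescue cat",
+    generating: CatProfile.self
+)
+print(response.content)
+```
+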
+### Ollama
+
+Runs models locally via Ollama's
+[HTTP API](https://github.com/ollama/ollama/blob/main/docs/api.md):
+
+```swift
+// Default: connects to http://localhost:11434
+let model = OllamaLanguageModel(model: "qwen3") // `ollama pull qwen3:8b`
+
+// Custom endpoint
+let model = OllamaLanguageModel(
+    endpoint: URL(string: "http://remote-server:11434")!,
+    model: "llama3.2"
+)
+
+let session = LanguageModelSession(model: model)
+let response = try await session.respond {
+    Prompt("Tell me a joke")
+}
+```
+
+Image inputs require a vision-capable model
+(for example, a `-vl` variant).
+You can combine multiple images:
+
+```swift
+let model = OllamaLanguageModel(model: "qwen3-vl") // `ollama pull qwen3-vl:8b`
+let session = LanguageModelSession(model: model)
+let response = try await session.respond(
+    to: "Compare these posters and summarize their differences",
+    images: [
+        .init(url: URL(string: "https://example.com/poster1.jpg")!),
+        .init(url: URL(fileURLWithPath: "/path/to/poster2.jpg"))
+    ]
+)
+print(response.content)
+```
+
+Pass any model-specific parameters using custom generation options:
+
+```swift
+var options = GenerationOptions(temperature: 0.8)
+options[custom: OllamaLanguageModel.self] = [
+    "seed": .int(42),
+    "repeat_penalty": .double(1.2),
+    "num_ctx": .int(4096),
+    "stop": .array([.string("###")])
+]
+```

### OpenAI

@@ -512,28 +648,32 @@ options[custom: OpenAILanguageModel.self] = .init(

### Open Responses

-Connects to any API that conforms to the [Open Responses](https://www.openresponses.org) specification (e.g. OpenAI, OpenRouter, or other compatible providers). Base URL is required—use your provider’s endpoint:
+Connects to any API that conforms to the
+[Open Responses](https://www.openresponses.org) specification
+(e.g. OpenAI, OpenRouter, or other compatible providers).
+Base URL is required—use your provider’s endpoint:

```swift
-// Example: OpenAI
+// Example: OpenRouter (https://openrouter.ai/api/v1/)
let model = OpenResponsesLanguageModel(
-    baseURL: URL(string: "https://api.openai.com/v1/")!,
+    baseURL: URL(string: "https://openrouter.ai/api/v1/")!,
    apiKey: ProcessInfo.processInfo.environment["OPEN_RESPONSES_API_KEY"]!,
-    model: "gpt-4o-mini"
+    model: "openai/gpt-4o-mini"
)

-// Example: OpenRouter (https://openrouter.ai/api/v1/)
+// Example: OpenAI
let model = OpenResponsesLanguageModel(
-    baseURL: URL(string: "https://openrouter.ai/api/v1/")!,
+    baseURL: URL(string: "https://api.openai.com/v1/")!,
    apiKey: ProcessInfo.processInfo.environment["OPEN_RESPONSES_API_KEY"]!,
-    model: "openai/gpt-4o-mini"
+    model: "gpt-4o-mini"
)

let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "Say hello")
```

-Custom options support Open Responses–specific fields such as `tool_choice` (including `allowed_tools`) and `extraBody`:
+Custom options support Open Responses–specific fields,
+such as `tool_choice` (including `allowed_tools`) and `extraBody`:

```swift
var options = GenerationOptions(temperature: 0.8)
@@ -667,56 +807,6 @@ let response = try await session.respond(
)
```

> [!TIP]
> Gemini server tools are not available as client tools (`Tool`) for other models.
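+
+Whichever provider you use,
+you pass the configured `GenerationOptions` along with the prompt.
+Here is a minimal sketch, assuming the FoundationModels-style
+`respond(to:options:)` overload and the `options` value built above:
+
+```swift
+// Assumes `session` and `options` are set up as in the provider examples above;
+// the `options:` parameter follows the FoundationModels-style API.
+let response = try await session.respond(
+    to: "Tell me a joke",
+    options: options
+)
+print(response.content)
+```
+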
-### Ollama - -Run models locally via Ollama's -[HTTP API](https://github.com/ollama/ollama/blob/main/docs/api.md): - -```swift -// Default: connects to http://localhost:11434 -let model = OllamaLanguageModel(model: "qwen3") // `ollama pull qwen3:8b` - -// Custom endpoint -let model = OllamaLanguageModel( - endpoint: URL(string: "http://remote-server:11434")!, - model: "llama3.2" -) - -let session = LanguageModelSession(model: model) -let response = try await session.respond { - Prompt("Tell me a joke") -} -``` - -For local models, make sure you're using a vision‑capable model -(for example, a `-vl` variant). -You can combine multiple images: - -```swift -let model = OllamaLanguageModel(model: "qwen3-vl") // `ollama pull qwen3-vl:8b` -let session = LanguageModelSession(model: model) -let response = try await session.respond( - to: "Compare these posters and summarize their differences", - images: [ - .init(url: URL(string: "https://example.com/poster1.jpg")!), - .init(url: URL(fileURLWithPath: "/path/to/poster2.jpg")) - ] -) -print(response.content) -``` - -Pass any model-specific parameters using custom generation options: - -```swift -var options = GenerationOptions(temperature: 0.8) -options[custom: OllamaLanguageModel.self] = [ - "seed": .int(42), - "repeat_penalty": .double(1.2), - "num_ctx": .int(4096), - "stop": .array([.string("###")]) -] -``` - ## Testing Run the test suite to verify everything works correctly: