┌─────────────────────────────────────────────────────────────────┐
│ React UI (Frontend) │
│ - User types → "What is Rust?" │
│ - AIPanel.tsx sends the message via invoke() │
└────────────────────────────┬────────────────────────────────────┘
│
│ invoke("send_message")
↓
┌─────────────────────────────────────────────────────────────────┐
│ Tauri Bridge (src/store/aiStore.ts) │
│ - Calls invoke("grpc_ai_chat" or "ollama_chat") │
│ - Routes to Rust backend │
└────────────────────────────┬────────────────────────────────────┘
│
│ Tauri Command
↓
┌─────────────────────────────────────────────────────────────────┐
│ Rust Backend (src-tauri/src/ai/mod.rs) │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ OLD: Direct HTTP Calls │ │
│ │ - ollama_chat → ollama_complete, etc. │ │
│ │ - api_chat → OpenAI/Anthropic/Groq APIs │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ NEW: gRPC Route ⭐ │ │
│ │ - grpc_ai_chat() creates gRPC client │ │
│ │ - Sends ChatRequest to Python AI Service │ │
│ │ - Returns ChatResponse back to React │ │
│ └─────────────────────────────────────────────────────────┘ │
└────────────────────────────┬────────────────────────────────────┘
│
│ gRPC (port 50051)
│ localhost:50051
↓
┌─────────────────────────────────────────────────────────────────┐
│ Python AI Service (python-ai-service/server.py) │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ AI Decision Engine │ │
│ │ - Receives ChatRequest with model, prompt, provider │ │
│ │ - Analyzes which model/provider to use │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Provider Routing │ │
│ │ if provider == "ollama": │ │
│ │ → OllamaProvider (localhost:11434) │ │
│ │ elif provider == "openai": │ │
│ │ → OpenAIProvider (api.openai.com) │ │
│ │ elif provider == "anthropic": │ │
│ │ → AnthropicProvider (api.anthropic.com) │ │
│ │ elif provider == "groq": │ │
│ │ → GroqProvider (api.groq.com) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Returns ChatResponse with: │ │
│ │ - content: AI's response text │ │
│ │ - tokens_used: Token count │ │
│ │ - model: Model that was used │ │
│ └─────────────────────────────────────────────────────────┘ │
└────────────────────────────┬────────────────────────────────────┘
│
│ HTTP call to provider
↓
┌─────────────────────────────────────────────────────────────────┐
│ Ollama or Cloud APIs (selected by the decision engine) │
│ │
│ Local: │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Ollama (localhost:11434) │ │
│ │ - mistral:latest │ │
│ │ - deepseek-coder:1.3b │ │
│ │ - codellama:7b │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ Cloud: │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ OpenAI API (https://api.openai.com) │ │
│ │ - gpt-4, gpt-4-turbo, gpt-3.5-turbo │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Anthropic API (https://api.anthropic.com) │ │
│ │ - claude-3-opus, claude-3-sonnet │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Groq API (https://api.groq.com) │ │
│ │ - mixtral-8x7b, llama2-70b │ │
│ └──────────────────────────────────────────────────────────┘ │
└────────────────────────────┬────────────────────────────────────┘
│
│ AI Response
↓
┌─────────────────────────────────────────────────────────────────┐
│ Response Flows Back Through Stack │
│ │
│ API Response → Python gRPC → Rust Client → React UI │
└─────────────────────────────────────────────────────────────────┘
│
↓
✅ Chat Updates in UI

Old (direct HTTP calls):

React → Rust → (Ollama OR OpenAI OR Anthropic OR Groq)
         ↑
         Duplicated provider routing logic
         No centralized AI decision engine

New (gRPC route):

React → Rust → gRPC → Python (AI Decision Engine) → Ollama/APIs
         ↑
         Centralized model selection
         Consistent caching/streaming
         Easy to extend
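
Inside the Python service, that centralized routing is a small dispatch on `ChatRequest.provider`. A minimal sketch with stand-in provider classes (the real implementations live in python-ai-service):

```python
# Illustrative sketch of the provider dispatch described above.
# The provider classes here are stand-ins, not the real service code.

class OllamaProvider:
    def __init__(self, base_url="http://localhost:11434"):
        self.base_url = base_url

    def chat(self, model, prompt):
        # Real code POSTs to {base_url}/api/chat and returns the reply
        raise NotImplementedError

class OpenAIProvider:
    def __init__(self, api_key):
        self.api_key = api_key

    def chat(self, model, prompt):
        # Real code POSTs to https://api.openai.com/v1/chat/completions
        raise NotImplementedError

# AnthropicProvider and GroqProvider follow the same shape
PROVIDERS = {
    "ollama": lambda req: OllamaProvider(),
    "openai": lambda req: OpenAIProvider(api_key=req.api_key),
}

def route_chat(req):
    """Pick a provider from a ChatRequest-like object and run the chat."""
    factory = PROVIDERS.get(req.provider)
    if factory is None:
        raise ValueError(f"unknown provider: {req.provider}")
    return factory(req).chat(req.model, req.prompt)
```

Adding a new provider means adding one entry to the table, with no changes needed in Rust or React.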
To set up the Python service:

```bash
cd python-ai-service

# Install dependencies
pip install -r requirements.txt

# Generate protobuf code
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. ai_service.proto

# Verify generated files
ls -la ai_service_pb2.py ai_service_pb2_grpc.py
```

To run the full stack:

```bash
# Terminal 1: Start Ollama
ollama serve

# Terminal 2: Pull a model
ollama pull mistral

# Terminal 3: Start the AI service
cd python-ai-service
python server.py
```

Expected output:
```
======================================================================
NCode AI Service - gRPC Server
======================================================================
Version: 0.1.0
Starting AI Service gRPC server on port 50051
Service version: 0.1.0
```
```bash
# Terminal 4: Launch the app
npm run tauri:dev
```

Then, in the app:

- Open Settings → Add API Key (optional)
- Switch to AI Panel
- Type a message: "What's the capital of France?"
- Select model: mistral (or another Ollama model)
- Send message
Watch the Python service logs. You should see:
```
Chat request: model=mistral, provider=ollama
```
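
Under the hood, server.py boils down to a standard grpcio servicer. A minimal skeleton, illustrative only (the real service also wires in config, logging, and the provider routing shown earlier):

```python
from concurrent import futures

import grpc

import ai_service_pb2
import ai_service_pb2_grpc

class AIService(ai_service_pb2_grpc.AIServiceServicer):
    def Chat(self, request, context):
        print(f"Chat request: model={request.model}, provider={request.provider}")
        # The real implementation routes to the selected provider here
        return ai_service_pb2.ChatResponse(
            content="(stub reply)", tokens_used=0, model=request.model
        )

def serve(port=50051):
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=8))
    ai_service_pb2_grpc.add_AIServiceServicer_to_server(AIService(), server)
    server.add_insecure_port(f"127.0.0.1:{port}")
    print(f"Starting AI Service gRPC server on port {port}")
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()
```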
You can also exercise the service directly with a small Python client:

```python
import grpc

from ai_service_pb2 import ChatRequest
from ai_service_pb2_grpc import AIServiceStub

# Connect to the gRPC service (plaintext is fine for local development)
with grpc.insecure_channel('127.0.0.1:50051') as channel:
    stub = AIServiceStub(channel)

    # Create a chat request
    request = ChatRequest(
        model='mistral',
        prompt='Hello!',
        provider='ollama',
        temperature=0.7,
        max_tokens=200
    )

    # Send the request and print the reply
    response = stub.Chat(request)
    print(f"Response: {response.content}")
```
To run the Rust client tests:

```bash
cd src-tauri
cargo test --lib grpc_client::tests -- --nocapture
```
```bash
# From Rust
curl -X POST http://localhost:11235/api/health
```

Example `.env` for the Python service:

```env
# gRPC Server
GRPC_HOST=127.0.0.1
GRPC_PORT=50051
# Ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_TIMEOUT=30
# OpenAI
OPENAI_API_KEY=sk-...
OPENAI_TIMEOUT=30
# Anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_TIMEOUT=30
# Groq
GROQ_API_KEY=gsk_...
GROQ_TIMEOUT=30
# Features
ENABLE_CACHING=true
ENABLE_STREAMING=true
LOG_LEVEL=INFO
```

The Rust client defaults match:

```rust
const DEFAULT_HOST: &str = "127.0.0.1";
const DEFAULT_PORT: u16 = 50051;
const DEFAULT_TIMEOUT: Duration = Duration::from_secs(30);
```
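
On the Python side, the same settings can be read straight from the environment. A sketch using only the standard library; the actual config helper in python-ai-service may differ:

```python
import os

# Defaults mirror the Rust client constants above
GRPC_HOST = os.environ.get("GRPC_HOST", "127.0.0.1")
GRPC_PORT = int(os.environ.get("GRPC_PORT", "50051"))
OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_TIMEOUT = int(os.environ.get("OLLAMA_TIMEOUT", "30"))
ENABLE_CACHING = os.environ.get("ENABLE_CACHING", "true").lower() == "true"
LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")
```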
The protocol is defined in ai_service.proto:

```proto
message ChatRequest {
  string model = 1;             // "mistral", "gpt-4", etc.
  string prompt = 2;            // User message
  repeated Message history = 3; // Previous messages
  string provider = 4;          // "ollama", "openai", "anthropic", "groq"
  string api_key = 5;           // API key if needed
  float temperature = 6;        // 0.0-1.0
  int32 max_tokens = 7;         // Response length limit
}
message Message {
  string role = 1;    // "user", "assistant", "system"
  string content = 2; // Message text
}

message ChatResponse {
  string content = 1;    // AI response
  int32 tokens_used = 2; // Token count
  string model = 3;      // Model that was used
}
```
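
The history field is what carries multi-turn context. For example, a follow-up question that depends on an earlier exchange:

```python
from ai_service_pb2 import ChatRequest, Message

# Follow-up question that relies on the earlier turns in history
request = ChatRequest(
    model='mistral',
    prompt='And what is its population?',
    provider='ollama',
    temperature=0.7,
    max_tokens=200,
    history=[
        Message(role='user', content="What's the capital of France?"),
        Message(role='assistant', content='The capital of France is Paris.'),
    ],
)
```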
Problem: Rust can't connect to the Python gRPC service
Solution:
- Verify the Python service is running: `ps aux | grep server.py`
- Check that the port is listening: `lsof -i :50051`
- Verify `GRPC_HOST` and `GRPC_PORT` match in both Rust and Python
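
A scripted version of the same check, using grpc's channel-readiness helper:

```python
import grpc

# Fails fast if nothing is listening on the gRPC port
channel = grpc.insecure_channel("127.0.0.1:50051")
try:
    grpc.channel_ready_future(channel).result(timeout=5)
    print("gRPC server reachable")
except grpc.FutureTimeoutError:
    print("gRPC server NOT reachable on 127.0.0.1:50051")
```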
Problem: ai_service_pb2.py and ai_service_pb2_grpc.py don't exist
Solution:
```bash
cd python-ai-service
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. ai_service.proto
```

Problem: gRPC service can't reach Ollama
Solution:
- Start Ollama: `ollama serve`
- Check it's running: `curl http://localhost:11434/api/tags`
- Update `OLLAMA_BASE_URL` in `.env` if using a different port
Problem: OpenAI/Anthropic/Groq API key fails
Solution:
- Verify the key is correct
- Check the API key hasn't expired
- Review logs: `LOG_LEVEL=DEBUG python server.py`
- Test the API directly: `curl -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models`
| Metric | Value | Notes |
|---|---|---|
| Connection overhead | ~50ms | One-time per app launch |
| Chat request latency | 10-50ms | Network overhead |
| Token generation | 10-100ms per token | Depends on model |
| Streaming latency | ~5ms per chunk | Real-time feedback |
- ✅ Test gRPC flow with Ollama
- ✅ Add API key, test with OpenAI
- ✅ Test streaming responses
- ✅ Monitor token usage
- ✅ Implement caching in the Python service (see the sketch below)
- ✅ Add health checks and monitoring
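
For the caching item, one plausible shape is a small in-memory TTL cache keyed on the request fields. A sketch only; the real service may cache differently:

```python
import time

# Illustrative in-memory TTL cache keyed on (provider, model, prompt).
# The real service may key on history too, and should bound memory
# with an eviction policy.
_CACHE = {}
TTL_SECONDS = 300

def cached_chat(request, chat_fn):
    """Return a cached reply if fresh, otherwise call chat_fn and cache it."""
    key = (request.provider, request.model, request.prompt)
    hit = _CACHE.get(key)
    if hit is not None and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]  # fresh cached response
    reply = chat_fn(request)
    _CACHE[key] = (time.time(), reply)
    return reply
```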