9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,14 @@
## [Unreleased]

## [0.3.0] - 2025-11-06

- Added custom Langfuse client support for Tracer and PromptRepositories
- Tracer and PromptRepositories now accept optional `client:` parameter
- Langfuse adapters converted to instance-based for client injection
- Fixed ActiveSupport dependency issues (replaced `.blank?` and `.deep_stringify_keys`)
- Made `handle_response` public in PromptAdapters::Base
- Added comprehensive test coverage (49 new tests)

## [0.1.0] - 2024-11-26

- Initial release
133 changes: 133 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,133 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**llm_eval_ruby** is a Ruby gem that provides LLM evaluation functionality through two main features:
1. **Prompt Management**: Fetch and compile prompts using Liquid templating
2. **Tracing**: Track LLM calls with traces, spans, and generations

The gem supports two backend adapters:
- **Langfuse**: Cloud-based prompt and trace management via API
- **Local**: File-based storage for prompts and traces

## Development Commands

### Testing
```bash
bundle exec rspec # Run all tests
bundle exec rspec spec/path_spec.rb # Run specific test file
```

### Linting
```bash
bundle exec rubocop # Run RuboCop linter
bundle exec rubocop -a # Auto-correct offenses
```

### Build & Install
```bash
bundle exec rake build # Build the gem
bundle exec rake install # Install locally
bundle exec rake release # Build, tag, and push to RubyGems
```

### Default Task
```bash
bundle exec rake # Runs both spec and rubocop
```

## Architecture

### Core Components

**Configuration** (`lib/llm_eval_ruby/configuration.rb`)
- Global configuration via `LlmEvalRuby.configure`
- Attributes: `adapter` (`:langfuse` or `:local`), `langfuse_options`, `local_options`
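
A minimal setup sketch. The block-style `configure` pattern and the option keys inside `langfuse_options` are assumptions here, inferred from the `ApiClients::Langfuse.new(host:, username:, password:)` example in the README:

```ruby
# Assumed configure-with-block pattern; option keys mirror the README's
# ApiClients::Langfuse.new(host:, username:, password:) example.
LlmEvalRuby.configure do |config|
  config.adapter = :langfuse # or :local
  config.langfuse_options = {
    host: "https://cloud.langfuse.com",
    username: ENV["LANGFUSE_PUBLIC_KEY"],
    password: ENV["LANGFUSE_SECRET_KEY"]
  }
end
```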

**Adapter Pattern**
The gem uses an adapter pattern to support multiple backends:
- **Prompt Adapters**: `PromptAdapters::Base` → `PromptAdapters::Langfuse` / `PromptAdapters::Local`
- **Trace Adapters**: `TraceAdapters::Base` → `TraceAdapters::Langfuse` / `TraceAdapters::Local`

### Prompt Management

**Prompt Repositories** (`lib/llm_eval_ruby/prompt_repositories/`)
- `Text`: Single text prompts
- `Chat`: Multi-message chat prompts (system, user, assistant roles)
- Methods: `fetch(name:, version:)` and `fetch_and_compile(name:, variables:, version:)`
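
A usage sketch; the prompt name and variables are hypothetical. `fetch_and_compile` is exposed class-level by `PromptRepositories::Base`, while `fetch` is shown here on an explicit instance:

```ruby
# Class-level convenience: uses the globally configured adapter
compiled = LlmEvalRuby::PromptRepositories::Text.fetch_and_compile(
  name: "summarizer",             # hypothetical prompt name
  variables: { user_name: "Ada" }
)

# Instance form, selecting the adapter explicitly
repo = LlmEvalRuby::PromptRepositories::Text.new(adapter: :local)
prompt = repo.fetch(name: "summarizer")
```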

**Prompt Types** (`lib/llm_eval_ruby/prompt_types/`)
- `Base`: Abstract base class with `role` and `content`
- `System`, `User`, `Assistant`: Role-specific prompt types
- `Compiled`: Rendered prompt with Liquid variables substituted

**Liquid Templating**
All prompts support Liquid template syntax for variable interpolation. Variable hashes are stringified recursively before rendering (symbol keys become strings, including in nested hashes).
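
For illustration, plain Liquid behaves like this; string keys are required, which is why the gem stringifies them first:

```ruby
require "liquid"

template = Liquid::Template.parse("Hello, {{ user_name }}!")
template.render({ "user_name" => "Ada" }) # => "Hello, Ada!"
template.render({ user_name: "Ada" })     # symbol key is ignored => "Hello, !"
```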

### Tracing System

**Tracer** (`lib/llm_eval_ruby/tracer.rb`)
- Class methods: `trace(...)`, `span(...)`, `generation(...)`, `update_generation(...)`
- Each method instantiates a Tracer with the configured adapter and delegates to it
- Supports block syntax for automatic timing and result capture
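
A block-syntax sketch; the `name:`/`input:` keywords follow the README example, and everything else is hypothetical:

```ruby
LlmEvalRuby::Tracer.trace(name: "summarize_request", input: { query: "test" }) do
  call_llm # hypothetical method; the block's result and timing are captured
end
```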

**Trace Hierarchy**
- **Trace**: Top-level container (e.g., a user request)
- **Span**: A step within a trace (e.g., data preprocessing)
- **Generation**: An LLM API call within a trace or span

**Observable Module** (`lib/llm_eval_ruby/observable.rb`)
Include this module in classes to automatically trace methods via the `observe` decorator:
- `observe :method_name` → wraps as trace
- `observe :method_name, type: :span` → wraps as span
- `observe :method_name, type: :generation` → wraps as generation
- Requires an instance variable `@trace_id` to link traces
- Automatically deep copies and sanitizes inputs (truncates base64 images)
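
A sketch of the decorator in use. The class, its methods, and the placement of `observe` after each definition are assumptions (wrap-style decorators typically require the method to exist first):

```ruby
require "securerandom"

class SummarizerService
  include LlmEvalRuby::Observable

  def initialize
    @trace_id = SecureRandom.uuid # required to link observations to a trace
  end

  def preprocess(text)
    text.strip
  end
  observe :preprocess, type: :span

  def complete(prompt)
    # ... LLM API call ...
  end
  observe :complete, type: :generation
end
```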

### Langfuse Integration

**API Client** (`lib/llm_eval_ruby/api_clients/langfuse.rb`)
- HTTParty-based client for Langfuse API
- Endpoints: `fetch_prompt`, `get_prompts`, `create_trace`, `create_span`, `create_generation`, etc.
- All trace operations use the `/ingestion` endpoint with batched events
- Traces support upsert by ID (create or update based on ID presence)
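
Direct client usage, for illustration; the constructor keywords come from the README and `fetch_prompt` from this client, while the prompt name is hypothetical:

```ruby
client = LlmEvalRuby::ApiClients::Langfuse.new(
  host: "https://cloud.langfuse.com",
  username: ENV["LANGFUSE_PUBLIC_KEY"],
  password: ENV["LANGFUSE_SECRET_KEY"]
)

client.fetch_prompt(name: "my_prompt", version: nil) # nil fetches the latest (assumption)
```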

**Serializers** (`lib/serializers/`)
- `PromptSerializer`: Converts prompt objects for API
- `TraceSerializer`: Converts trace objects for API
- `GenerationSerializer`: Converts generation objects with usage metadata

### Local Adapter

**File Structure**
Prompts are stored in directories named after the prompt:
```
app/prompts/
├── my_chat_prompt/
│   ├── system.txt
│   ├── user.txt
│   └── assistant.txt (optional)
└── my_text_prompt/
    └── user.txt
```
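
A hypothetical local setup; the `path:` key inside `local_options` is an invented placeholder, not a documented option name:

```ruby
LlmEvalRuby.configure do |config|
  config.adapter = :local
  config.local_options = { path: "app/prompts" } # key name is hypothetical
end

messages = LlmEvalRuby::PromptRepositories::Chat.fetch_and_compile(
  name: "my_chat_prompt",
  variables: { user_name: "Ada" }
)
```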

## Key Implementation Notes

1. **Adapter Selection**: Determined at runtime based on `LlmEvalRuby.config.adapter`
2. **Custom Client Support**: Langfuse adapters support custom client injection via `client:` parameter
- `LlmEvalRuby::Tracer.new(adapter: :langfuse, client: custom_client)`
- `LlmEvalRuby::PromptRepositories::Text.new(adapter: :langfuse, client: custom_client)`
- If no client is provided, uses default from `langfuse_options` config
- Local adapter does not use clients
3. **Prompt Versioning**: Only supported by the Langfuse adapter; the local adapter ignores the version parameter
4. **Trace IDs**: Must be manually managed when using Observable pattern via `@trace_id`
5. **Deep Copy**: Observable module deep copies inputs to prevent mutation; handles Marshal-incompatible objects gracefully
6. **Base64 Sanitization**: Automatically truncates base64-encoded images in traced inputs to 30 characters
7. **Ruby Version**: Requires Ruby >= 3.3.0

## Dependencies

- `httparty` (~> 0.22.0): HTTP client for Langfuse API
- `liquid` (~> 5.5.0): Template rendering engine
2 changes: 1 addition & 1 deletion Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: .
specs:
-    llm_eval_ruby (0.2.8)
+    llm_eval_ruby (0.3.0)
httparty (~> 0.22.0)
liquid (~> 5.5.0)

33 changes: 33 additions & 0 deletions README.md
@@ -203,6 +203,39 @@ Please summarize the following text for {{ user_name }}:

### Advanced Usage

#### Using Custom Langfuse Clients

You can pass custom Langfuse client instances to use different credentials per request:

```ruby
# Create a custom client with different credentials
custom_client = LlmEvalRuby::ApiClients::Langfuse.new(
host: "https://custom-langfuse.com",
username: "custom_public_key",
password: "custom_secret_key"
)

# Use custom client with Tracer
tracer = LlmEvalRuby::Tracer.new(adapter: :langfuse, client: custom_client)
tracer.trace(name: "custom_trace", input: { query: "test" })

# Use custom client with Text repository
text_repo = LlmEvalRuby::PromptRepositories::Text.new(
adapter: :langfuse,
client: custom_client
)
prompt = text_repo.fetch(name: "my_prompt")

# Use custom client with Chat repository
chat_repo = LlmEvalRuby::PromptRepositories::Chat.new(
adapter: :langfuse,
client: custom_client
)
messages = chat_repo.fetch(name: "chat_prompt")
```

If no client is provided, a default client built from the `langfuse_options` configuration is used.

#### Updating Generations

```ruby
2 changes: 1 addition & 1 deletion lib/llm_eval_ruby/api_clients/langfuse.rb
@@ -29,7 +29,7 @@ def fetch_prompt(name:, version:)
# tag
# page
# limit
-      def get_prompts(query={})
+      def get_prompts(query = {})
response = self.class.get("/v2/prompts", { query: query })
response["data"]
end
11 changes: 8 additions & 3 deletions lib/llm_eval_ruby/prompt_adapters/base.rb
@@ -18,8 +18,6 @@ def compile(prompt:, variables:)
         LlmEvalRuby::PromptTypes::Compiled.new(adapter: self, role: prompt.role, content: compiled)
       end
 
-      private
-
       def handle_response(response)
         response.is_a?(Array) ? wrap_response(response) : wrap_response({ "role" => "system", "content" => response })
       end
@@ -43,7 +41,14 @@ def wrap_response(response)
 
       def render_template(template, variables)
         template = Liquid::Template.parse(template)
-        template.render(variables.deep_stringify_keys)
+        stringified_variables = stringify_keys(variables)
+        template.render(stringified_variables)
       end
+
+      def stringify_keys(hash)
+        hash.transform_keys(&:to_s).transform_values do |value|
+          value.is_a?(Hash) ? stringify_keys(value) : value
+        end
+      end
     end
   end
25 changes: 16 additions & 9 deletions lib/llm_eval_ruby/prompt_adapters/langfuse.rb
@@ -6,17 +6,24 @@
 module LlmEvalRuby
   module PromptAdapters
     class Langfuse < Base
-      class << self
-        def fetch_prompt(name:, version: nil)
-          response = client.fetch_prompt(name:, version:)
-          handle_response(response)
-        end
+      def initialize(client: nil)
+        super()
+        @client = client
+      end
 
+      def fetch_prompt(name:, version: nil)
+        response = client.fetch_prompt(name:, version:)
+        self.class.handle_response(response)
+      end
+
+      def compile(prompt:, variables:)
+        self.class.compile(prompt:, variables:)
+      end
 
-        private
+      private
 
-        def client
-          @client ||= ApiClients::Langfuse.new(**LlmEvalRuby.config.langfuse_options)
-        end
+      def client
+        @client ||= ApiClients::Langfuse.new(**LlmEvalRuby.config.langfuse_options)
+      end
     end
   end
18 changes: 9 additions & 9 deletions lib/llm_eval_ruby/prompt_repositories/base.rb
@@ -16,15 +16,15 @@ def self.fetch_and_compile(name:, variables:, version: nil)
         new(adapter: LlmEvalRuby.config.adapter).fetch_and_compile(name: name, variables: variables, version: version)
       end
 
-      def initialize(adapter:)
-        case adapter
-        when :langfuse
-          @adapter = PromptAdapters::Langfuse
-        when :local
-          @adapter = PromptAdapters::Local
-        else
-          raise "Unsupported adapter #{adapter}"
-        end
+      def initialize(adapter:, client: nil)
+        @adapter = case adapter
+                   when :langfuse
+                     PromptAdapters::Langfuse.new(client:)
+                   when :local
+                     PromptAdapters::Local
+                   else
+                     raise "Unsupported adapter #{adapter}"
+                   end
       end
 
       def fetch(name:, version: nil)