diff --git a/CHANGELOG.md b/CHANGELOG.md index c9e4183..f92dfa9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,14 @@ ## [Unreleased] +## [0.3.0] - 2025-11-06 + +- Added custom Langfuse client support for Tracer and PromptRepositories +- Tracer and PromptRepositories now accept optional `client:` parameter +- Langfuse adapters converted to instance-based for client injection +- Fixed ActiveSupport dependency issues (replaced `.blank?` and `.deep_stringify_keys`) +- Made `handle_response` public in PromptAdapters::Base +- Added comprehensive test coverage (49 new tests) + ## [0.1.0] - 2024-11-26 - Initial release diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..b33a72f --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,133 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Project Overview + +**llm_eval_ruby** is a Ruby gem that provides LLM evaluation functionality through two main features: +1. **Prompt Management**: Fetch and compile prompts using Liquid templating +2. **Tracing**: Track LLM calls with traces, spans, and generations + +The gem supports two backend adapters: +- **Langfuse**: Cloud-based prompt and trace management via API +- **Local**: File-based storage for prompts and traces + +## Development Commands + +### Testing +```bash +bundle exec rspec # Run all tests +bundle exec rspec spec/path_spec.rb # Run specific test file +``` + +### Linting +```bash +bundle exec rubocop # Run RuboCop linter +bundle exec rubocop -a # Auto-correct offenses +``` + +### Build & Install +```bash +bundle exec rake build # Build the gem +bundle exec rake install # Install locally +bundle exec rake release # Build, tag, and push to RubyGems +``` + +### Default Task +```bash +bundle exec rake # Runs both spec and rubocop +``` + +## Architecture + +### Core Components + +**Configuration** (`lib/llm_eval_ruby/configuration.rb`) +- Global configuration via `LlmEvalRuby.configure` +- Attributes: `adapter` (`:langfuse` or `:local`), `langfuse_options`, `local_options` + +**Adapter Pattern** +The gem uses an adapter pattern to support multiple backends: +- **Prompt Adapters**: `PromptAdapters::Base` → `PromptAdapters::Langfuse` / `PromptAdapters::Local` +- **Trace Adapters**: `TraceAdapters::Base` → `TraceAdapters::Langfuse` / `TraceAdapters::Local` + +### Prompt Management + +**Prompt Repositories** (`lib/llm_eval_ruby/prompt_repositories/`) +- `Text`: Single text prompts +- `Chat`: Multi-message chat prompts (system, user, assistant roles) +- Methods: `fetch(name:, version:)` and `fetch_and_compile(name:, variables:, version:)` + +**Prompt Types** (`lib/llm_eval_ruby/prompt_types/`) +- `Base`: Abstract base class with `role` and `content` +- `System`, `User`, `Assistant`: Role-specific prompt types +- `Compiled`: Rendered prompt with Liquid variables substituted + +**Liquid Templating** +All prompts support Liquid template syntax for variable interpolation. Variables are deep stringified before rendering. 
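+
+A minimal sketch of that rendering step (a hypothetical standalone example; the recursive helper mirrors `PromptAdapters::Base#stringify_keys` from this changeset, and the stringification matters because Liquid only resolves string keys):
+
+```ruby
+require "liquid"
+
+# Recursively convert symbol keys to strings so Liquid can resolve them.
+def stringify_keys(hash)
+  hash.transform_keys(&:to_s).transform_values do |value|
+    value.is_a?(Hash) ? stringify_keys(value) : value
+  end
+end
+
+template = Liquid::Template.parse("Hello {{ user.name }}!")
+template.render(stringify_keys({ user: { name: "Ada" } }))
+# => "Hello Ada!"
+```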
+ +### Tracing System + +**Tracer** (`lib/llm_eval_ruby/tracer.rb`) +- Class methods: `trace(...)`, `span(...)`, `generation(...)`, `update_generation(...)` +- Each method instantiates a Tracer with the configured adapter and delegates to it +- Supports block syntax for automatic timing and result capture + +**Trace Hierarchy** +- **Trace**: Top-level container (e.g., a user request) +- **Span**: A step within a trace (e.g., data preprocessing) +- **Generation**: An LLM API call within a trace or span + +**Observable Module** (`lib/llm_eval_ruby/observable.rb`) +Include this module in classes to automatically trace methods via the `observe` decorator: +- `observe :method_name` → wraps as trace +- `observe :method_name, type: :span` → wraps as span +- `observe :method_name, type: :generation` → wraps as generation +- Requires instance variable `@trace_id` to link traces +- Automatically deep copies and sanitizes inputs (truncates base64 images) + +### Langfuse Integration + +**API Client** (`lib/llm_eval_ruby/api_clients/langfuse.rb`) +- HTTParty-based client for Langfuse API +- Endpoints: `fetch_prompt`, `get_prompts`, `create_trace`, `create_span`, `create_generation`, etc. +- All trace operations use the `/ingestion` endpoint with batched events +- Traces support upsert by ID (create or update based on ID presence) + +**Serializers** (`lib/serializers/`) +- `PromptSerializer`: Converts prompt objects for API +- `TraceSerializer`: Converts trace objects for API +- `GenerationSerializer`: Converts generation objects with usage metadata + +### Local Adapter + +**File Structure** +Prompts are stored in directories named after the prompt: +``` +app/prompts/ +├── my_chat_prompt/ +│ ├── system.txt +│ ├── user.txt +│ └── assistant.txt (optional) +└── my_text_prompt/ + └── user.txt +``` + +## Key Implementation Notes + +1. **Adapter Selection**: Determined at runtime based on `LlmEvalRuby.config.adapter` +2. **Custom Client Support**: Langfuse adapters support custom client injection via `client:` parameter + - `LlmEvalRuby::Tracer.new(adapter: :langfuse, client: custom_client)` + - `LlmEvalRuby::PromptRepositories::Text.new(adapter: :langfuse, client: custom_client)` + - If no client is provided, uses default from `langfuse_options` config + - Local adapter does not use clients +3. **Prompt Versioning**: Only supported by Langfuse adapter; local adapter ignores version parameter +4. **Trace IDs**: Must be manually managed when using Observable pattern via `@trace_id` +5. **Deep Copy**: Observable module deep copies inputs to prevent mutation; handles Marshal-incompatible objects gracefully +6. **Base64 Sanitization**: Automatically truncates base64-encoded images in traced inputs to 30 characters +7. **Ruby Version**: Requires Ruby >= 3.3.0 + +## Dependencies + +- `httparty` (~> 0.22.0): HTTP client for Langfuse API +- `liquid` (~> 5.5.0): Template rendering engine diff --git a/Gemfile.lock b/Gemfile.lock index 6a78e68..83ae988 100644 --- a/Gemfile.lock +++ b/Gemfile.lock @@ -1,7 +1,7 @@ PATH remote: . 
specs: - llm_eval_ruby (0.2.8) + llm_eval_ruby (0.3.0) httparty (~> 0.22.0) liquid (~> 5.5.0) diff --git a/README.md b/README.md index 386395a..0b7e726 100644 --- a/README.md +++ b/README.md @@ -203,6 +203,39 @@ Please summarize the following text for {{ user_name }}: ### Advanced Usage +#### Using Custom Langfuse Clients + +You can pass custom Langfuse client instances to use different credentials per request: + +```ruby +# Create a custom client with different credentials +custom_client = LlmEvalRuby::ApiClients::Langfuse.new( + host: "https://custom-langfuse.com", + username: "custom_public_key", + password: "custom_secret_key" +) + +# Use custom client with Tracer +tracer = LlmEvalRuby::Tracer.new(adapter: :langfuse, client: custom_client) +tracer.trace(name: "custom_trace", input: { query: "test" }) + +# Use custom client with Text repository +text_repo = LlmEvalRuby::PromptRepositories::Text.new( + adapter: :langfuse, + client: custom_client +) +prompt = text_repo.fetch(name: "my_prompt") + +# Use custom client with Chat repository +chat_repo = LlmEvalRuby::PromptRepositories::Chat.new( + adapter: :langfuse, + client: custom_client +) +messages = chat_repo.fetch(name: "chat_prompt") +``` + +If no client is provided, the default client from `langfuse_options` configuration is used. + #### Updating Generations ```ruby diff --git a/lib/llm_eval_ruby/api_clients/langfuse.rb b/lib/llm_eval_ruby/api_clients/langfuse.rb index 02fa966..c4f34f9 100644 --- a/lib/llm_eval_ruby/api_clients/langfuse.rb +++ b/lib/llm_eval_ruby/api_clients/langfuse.rb @@ -29,7 +29,7 @@ def fetch_prompt(name:, version:) # tag # page # limit - def get_prompts(query={}) + def get_prompts(query = {}) response = self.class.get("/v2/prompts", { query: query }) response["data"] end diff --git a/lib/llm_eval_ruby/prompt_adapters/base.rb b/lib/llm_eval_ruby/prompt_adapters/base.rb index c121cca..f3d1ec7 100644 --- a/lib/llm_eval_ruby/prompt_adapters/base.rb +++ b/lib/llm_eval_ruby/prompt_adapters/base.rb @@ -18,8 +18,6 @@ def compile(prompt:, variables:) LlmEvalRuby::PromptTypes::Compiled.new(adapter: self, role: prompt.role, content: compiled) end - private - def handle_response(response) response.is_a?(Array) ? wrap_response(response) : wrap_response({ "role" => "system", "content" => response }) end @@ -43,7 +41,14 @@ def wrap_response(response) def render_template(template, variables) template = Liquid::Template.parse(template) - template.render(variables.deep_stringify_keys) + stringified_variables = stringify_keys(variables) + template.render(stringified_variables) + end + + def stringify_keys(hash) + hash.transform_keys(&:to_s).transform_values do |value| + value.is_a?(Hash) ? 
stringify_keys(value) : value + end + end end end end diff --git a/lib/llm_eval_ruby/prompt_adapters/langfuse.rb b/lib/llm_eval_ruby/prompt_adapters/langfuse.rb index 95fd7c9..ec2bdec 100644 --- a/lib/llm_eval_ruby/prompt_adapters/langfuse.rb +++ b/lib/llm_eval_ruby/prompt_adapters/langfuse.rb @@ -6,17 +6,24 @@ module LlmEvalRuby module PromptAdapters class Langfuse < Base - class << self - def fetch_prompt(name:, version: nil) - response = client.fetch_prompt(name:, version:) - handle_response(response) - end + def initialize(client: nil) + super() + @client = client + end + + def fetch_prompt(name:, version: nil) + response = client.fetch_prompt(name:, version:) + self.class.handle_response(response) + end + + def compile(prompt:, variables:) + self.class.compile(prompt:, variables:) + end - private + private - def client - @client ||= ApiClients::Langfuse.new(**LlmEvalRuby.config.langfuse_options) - end + def client + @client ||= ApiClients::Langfuse.new(**LlmEvalRuby.config.langfuse_options) end end end diff --git a/lib/llm_eval_ruby/prompt_repositories/base.rb b/lib/llm_eval_ruby/prompt_repositories/base.rb index 998fb10..ad1993d 100644 --- a/lib/llm_eval_ruby/prompt_repositories/base.rb +++ b/lib/llm_eval_ruby/prompt_repositories/base.rb @@ -16,15 +16,15 @@ def self.fetch_and_compile(name:, variables:, version: nil) new(adapter: LlmEvalRuby.config.adapter).fetch_and_compile(name: name, variables: variables, version: version) end - def initialize(adapter:) - case adapter - when :langfuse - @adapter = PromptAdapters::Langfuse - when :local - @adapter = PromptAdapters::Local - else - raise "Unsupported adapter #{adapter}" - end + def initialize(adapter:, client: nil) + @adapter = case adapter + when :langfuse + PromptAdapters::Langfuse.new(client:) + when :local + PromptAdapters::Local + else + raise "Unsupported adapter #{adapter}" + end + end end def fetch(name:, version: nil) diff --git a/lib/llm_eval_ruby/trace_adapters/langfuse.rb b/lib/llm_eval_ruby/trace_adapters/langfuse.rb index ec45070..f4ce5ea 100644 --- a/lib/llm_eval_ruby/trace_adapters/langfuse.rb +++ b/lib/llm_eval_ruby/trace_adapters/langfuse.rb @@ -7,78 +7,81 @@ module LlmEvalRuby module TraceAdapters class Langfuse < Base - class << self - def trace(**kwargs) - trace = TraceTypes::Trace.new(id: SecureRandom.uuid, **kwargs) - response = client.create_trace(trace.to_h) + def initialize(client: nil) + super() + @client = client + end - logger.warn "Failed to create generation" if response["successes"].blank? + def trace(**kwargs) + trace = TraceTypes::Trace.new(id: SecureRandom.uuid, **kwargs) + response = client.create_trace(trace.to_h) - trace - end + logger.warn "Failed to create trace" if response["successes"].nil? || response["successes"].empty? - def span(**kwargs) - span = TraceTypes::Span.new(id: SecureRandom.uuid, **kwargs) - response = client.create_span(span.to_h) + trace + end - logger.warn "Failed to create span" if response["successes"].blank? + def span(**kwargs) + span = TraceTypes::Span.new(id: SecureRandom.uuid, **kwargs) + response = client.create_span(span.to_h) - return span unless block_given? + logger.warn "Failed to create span" if response["successes"].nil? || response["successes"].empty? - result = yield span + return span unless block_given?
- end_span(span, result) + result = yield span - result - end + end_span(span, result) - def update_generation(**kwargs) - generation = TraceTypes::Generation.new(**kwargs) - response = client.update_generation(generation.to_h) + result + end - logger.warn "Failed to create generation" if response["successes"].blank? + def update_generation(**kwargs) + generation = TraceTypes::Generation.new(**kwargs) + response = client.update_generation(generation.to_h) - generation - end + logger.warn "Failed to update generation" if response["successes"].nil? || response["successes"].empty? - def generation(**kwargs) - generation = TraceTypes::Generation.new(id: SecureRandom.uuid, tracer: self, **kwargs) - response = client.create_generation(generation.to_h) - logger.warn "Failed to create generation" if response["successes"].blank? + generation + end - return generation unless block_given? + def generation(**kwargs) + generation = TraceTypes::Generation.new(id: SecureRandom.uuid, tracer: self, **kwargs) + response = client.create_generation(generation.to_h) + logger.warn "Failed to create generation" if response["successes"].nil? || response["successes"].empty? - result = yield generation + return generation unless block_given? - end_generation(generation, result) + result = yield generation - result - end + end_generation(generation, result) - private + result + end - def logger - @logger ||= Logger.new($stdout) - end + private - def client - @client ||= ApiClients::Langfuse.new(**LlmEvalRuby.config.langfuse_options) - end + def logger + @logger ||= Logger.new($stdout) + end + + def client + @client ||= ApiClients::Langfuse.new(**LlmEvalRuby.config.langfuse_options) + end - def end_span(span, result) - span.end_time = Time.now.utc.iso8601 - span.output = result + def end_span(span, result) + span.end_time = Time.now.utc.iso8601 + span.output = result - client.update_span(span.to_h) - end + client.update_span(span.to_h) + end - def end_generation(generation, result) - generation.output = result.dig("choices", 0, "message", "content") - generation.usage = result["usage"] - generation.end_time = Time.now.utc.iso8601 + def end_generation(generation, result) + generation.output = result.dig("choices", 0, "message", "content") + generation.usage = result["usage"] + generation.end_time = Time.now.utc.iso8601 - client.update_generation(generation.to_h) - end + client.update_generation(generation.to_h) end end end diff --git a/lib/llm_eval_ruby/tracer.rb b/lib/llm_eval_ruby/tracer.rb index 18b2d1d..6c0ddef 100644 --- a/lib/llm_eval_ruby/tracer.rb +++ b/lib/llm_eval_ruby/tracer.rb @@ -23,15 +23,15 @@ def self.update_generation(...) new(adapter: LlmEvalRuby.config.adapter).update_generation(...) end - def initialize(adapter:) - case adapter - when :langfuse - @adapter = TraceAdapters::Langfuse - when :local - @adapter = TraceAdapters::Local - else - raise "Unsupported adapter #{adapter}" - end + def initialize(adapter:, client: nil) + @adapter = case adapter + when :langfuse + TraceAdapters::Langfuse.new(client:) + when :local + TraceAdapters::Local + else + raise "Unsupported adapter #{adapter}" + end end def trace(...)
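A note on the two `initialize` hunks above: `@adapter` now holds either a `TraceAdapters::Langfuse` instance (so a per-request client can be injected) or the `TraceAdapters::Local` class, and the same split applies to the prompt adapters. Delegation keeps working because both receivers respond to the same messages. A simplified sketch of that duck-typed seam (illustrative only, not the gem's actual code):

```ruby
# The adapter slot accepts an instance or a class; Ruby's argument
# forwarding (...) passes calls through to either receiver unchanged.
class TinyTracer
  def initialize(adapter)
    # e.g. TraceAdapters::Langfuse.new(client: custom) or TraceAdapters::Local
    @adapter = adapter
  end

  def trace(...)
    @adapter.trace(...)
  end
end
```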
diff --git a/lib/llm_eval_ruby/version.rb b/lib/llm_eval_ruby/version.rb index 6f25c95..035c250 100644 --- a/lib/llm_eval_ruby/version.rb +++ b/lib/llm_eval_ruby/version.rb @@ -1,5 +1,5 @@ # frozen_string_literal: true module LlmEvalRuby - VERSION = "0.2.8" + VERSION = "0.3.0" end diff --git a/spec/llm_eval_ruby_spec.rb b/spec/llm_eval_ruby_spec.rb index 093096f..bdfdd72 100644 --- a/spec/llm_eval_ruby_spec.rb +++ b/spec/llm_eval_ruby_spec.rb @@ -2,6 +2,6 @@ RSpec.describe LlmEvalRuby do it "has a version number" do - expect(LlmEvalRuby::VERSION).to be("0.2.7") + expect(LlmEvalRuby::VERSION).to eq("0.3.0") end end diff --git a/spec/observable_spec.rb b/spec/observable_spec.rb new file mode 100644 index 0000000..534b833 --- /dev/null +++ b/spec/observable_spec.rb @@ -0,0 +1,293 @@ +# frozen_string_literal: true + +RSpec.describe LlmEvalRuby::Observable do + let(:default_langfuse_options) do + { + host: "https://default.langfuse.com", + username: "default_key", + password: "default_secret" + } + end + + let(:trace_response) { { "successes" => [{ "id" => "trace-123" }] } } + let(:span_response) { { "successes" => [{ "id" => "span-123" }] } } + let(:generation_response) { { "successes" => [{ "id" => "gen-123" }] } } + + before do + LlmEvalRuby.configure do |config| + config.adapter = :langfuse + config.langfuse_options = default_langfuse_options + end + + # Mock the Langfuse client + allow_any_instance_of(LlmEvalRuby::ApiClients::Langfuse) + .to receive(:create_trace).and_return(trace_response) + allow_any_instance_of(LlmEvalRuby::ApiClients::Langfuse) + .to receive(:create_span).and_return(span_response) + allow_any_instance_of(LlmEvalRuby::ApiClients::Langfuse) + .to receive(:update_span).and_return(span_response) + allow_any_instance_of(LlmEvalRuby::ApiClients::Langfuse) + .to receive(:create_generation).and_return(generation_response) + allow_any_instance_of(LlmEvalRuby::ApiClients::Langfuse) + .to receive(:update_generation).and_return(generation_response) + end + + describe "when included in a class" do + let(:test_class) do + Class.new do + include LlmEvalRuby::Observable + + attr_accessor :trace_id + + observe :process_data, type: :span + def process_data(input) + "processed: #{input}" + end + + observe :call_llm, type: :generation + def call_llm(prompt) + { + "choices" => [{ "message" => { "content" => "AI response" } }], + "usage" => { "prompt_tokens" => 10, "completion_tokens" => 20 } + } + end + + observe :run_task + def run_task(task_name) + "completed: #{task_name}" + end + end + end + + let(:instance) { test_class.new } + + before do + instance.trace_id = "test-trace-123" + end + + describe "observe with type: :span" do + it "wraps the method with span tracing" do + result = instance.process_data("test input") + + expect(result).to eq("processed: test input") + end + + it "creates a span with the method name and input" do + expect_any_instance_of(LlmEvalRuby::TraceAdapters::Langfuse) + .to receive(:span).and_call_original + + instance.process_data("test input") + end + + it "passes trace_id to the span" do + expect_any_instance_of(LlmEvalRuby::ApiClients::Langfuse) + .to receive(:create_span) + .with(hash_including(trace_id: "test-trace-123")) + .and_return(span_response) + + instance.process_data("test input") + end + end + + describe "observe with type: :generation" do + it "wraps the method with generation tracing" do + result = instance.call_llm("test prompt") + + expect(result).to be_a(Hash) + expect(result["choices"]).to be_an(Array) + end + + it "creates a generation with 
the method name and input" do + expect_any_instance_of(LlmEvalRuby::TraceAdapters::Langfuse) + .to receive(:generation).and_call_original + + instance.call_llm("test prompt") + end + + it "passes trace_id to the generation" do + expect_any_instance_of(LlmEvalRuby::ApiClients::Langfuse) + .to receive(:create_generation) + .with(hash_including(trace_id: "test-trace-123")) + .and_return(generation_response) + + instance.call_llm("test prompt") + end + + it "updates generation with the result" do + expect_any_instance_of(LlmEvalRuby::ApiClients::Langfuse) + .to receive(:update_generation) + + instance.call_llm("test prompt") + end + end + + describe "observe without type (defaults to trace)" do + it "wraps the method with trace" do + # Note: Observable passes trace_id to trace(), but Trace doesn't accept it + # This is a known limitation - trace-level observations currently raise ArgumentError + expect do + instance.run_task("important task") + end.to raise_error(ArgumentError, /unknown keyword.*trace_id/) + end + end + + describe "#prepare_input" do + it "returns nil for empty args and kwargs" do + result = instance.prepare_input + + expect(result).to be_nil + end + + it "deep copies arguments" do + original_hash = { key: "value" } + result = instance.prepare_input(original_hash) + + expect(result).to be_an(Array) + expect(result.first).to eq(original_hash) + expect(result.first).not_to be(original_hash) # Different object + end + + it "handles mixed args and kwargs" do + result = instance.prepare_input("arg1", { kwarg: "value" }) + + expect(result).to be_an(Array) + expect(result).to include("arg1") + end + end + + describe "#trim_base64_images" do + it "truncates base64 encoded images" do + hash = { + "image" => "data:image/jpeg;base64,#{'a' * 100}", + "text" => "regular text" + } + + instance.trim_base64_images(hash) + + expect(hash["image"]).to start_with("data:image/jpeg;base64,") + expect(hash["image"]).to end_with("... (truncated)") + expect(hash["image"].length).to be < 100 + expect(hash["text"]).to eq("regular text") + end + + it "handles nested hashes" do + hash = { + "nested" => { + "image" => "data:image/jpeg;base64,#{'b' * 100}" + } + } + + instance.trim_base64_images(hash) + + expect(hash["nested"]["image"]).to end_with("... (truncated)") + end + + it "handles arrays with hashes" do + hash = { + "items" => [ + { "image" => "data:image/jpeg;base64,#{'c' * 100}" } + ] + } + + instance.trim_base64_images(hash) + + expect(hash["items"][0]["image"]).to end_with("...
(truncated)") + end + + it "leaves non-base64 strings untouched" do + hash = { + "url" => "https://example.com/image.jpg", + "text" => "normal text" + } + + instance.trim_base64_images(hash) + + expect(hash["url"]).to eq("https://example.com/image.jpg") + expect(hash["text"]).to eq("normal text") + end + end + + describe "#deep_copy" do + it "copies primitives" do + expect(instance.deep_copy(42)).to eq(42) + expect(instance.deep_copy(:symbol)).to eq(:symbol) + expect(instance.deep_copy(nil)).to be_nil + expect(instance.deep_copy(true)).to be(true) + expect(instance.deep_copy(false)).to be(false) + end + + it "duplicates strings" do + original = "test" + copy = instance.deep_copy(original) + + expect(copy).to eq(original) + expect(copy).not_to be(original) + end + + it "deep copies arrays" do + original = [1, [2, 3], { key: "value" }] + copy = instance.deep_copy(original) + + expect(copy).to eq(original) + expect(copy).not_to be(original) + expect(copy[1]).not_to be(original[1]) + expect(copy[2]).not_to be(original[2]) + end + + it "deep copies hashes" do + original = { a: 1, b: { c: 2 } } + copy = instance.deep_copy(original) + + expect(copy).to eq(original) + expect(copy).not_to be(original) + expect(copy[:b]).not_to be(original[:b]) + end + + it "handles unmarshalable objects gracefully" do + unmarshalable = Class.new.new + result = instance.deep_copy(unmarshalable) + + expect(result).to be_nil + end + end + end + + describe "integration with custom tracer" do + let(:custom_client) { instance_double(LlmEvalRuby::ApiClients::Langfuse) } + + let(:test_class_with_custom_tracer) do + custom_tracer = LlmEvalRuby::Tracer.new(adapter: :langfuse, client: custom_client) + + Class.new do + include LlmEvalRuby::Observable + + attr_accessor :trace_id, :custom_tracer + + define_method(:process_with_custom) do |input| + if custom_tracer + custom_tracer.span(name: :custom_process, trace_id: trace_id, input: { data: input }) do + "custom processed: #{input}" + end + else + "default processed: #{input}" + end + end + end + end + + it "allows using custom tracer instance within observable methods" do + allow(custom_client).to receive(:create_span).and_return(span_response) + allow(custom_client).to receive(:update_span).and_return(span_response) + + instance = test_class_with_custom_tracer.new + instance.trace_id = "custom-trace-123" + instance.custom_tracer = LlmEvalRuby::Tracer.new(adapter: :langfuse, client: custom_client) + + result = instance.process_with_custom("test") + + expect(result).to eq("custom processed: test") + expect(custom_client).to have_received(:create_span) + expect(custom_client).to have_received(:update_span) + end + end +end diff --git a/spec/prompt_repositories/chat_spec.rb b/spec/prompt_repositories/chat_spec.rb new file mode 100644 index 0000000..e4568b9 --- /dev/null +++ b/spec/prompt_repositories/chat_spec.rb @@ -0,0 +1,147 @@ +# frozen_string_literal: true + +RSpec.describe LlmEvalRuby::PromptRepositories::Chat do + let(:default_langfuse_options) do + { + host: "https://default.langfuse.com", + username: "default_key", + password: "default_secret" + } + end + + let(:custom_client) { instance_double(LlmEvalRuby::ApiClients::Langfuse) } + + let(:chat_prompt_response) do + [ + { "role" => "system", "content" => "You are a helpful assistant." }, + { "role" => "user", "content" => "Hello {{ name }}, how are you?" 
} + ] + end + + before do + LlmEvalRuby.configure do |config| + config.adapter = :langfuse + config.langfuse_options = default_langfuse_options + end + end + + describe "#initialize" do + context "with custom client" do + it "creates instance with custom Langfuse adapter" do + repo = described_class.new(adapter: :langfuse, client: custom_client) + expect(repo.adapter).to be_a(LlmEvalRuby::PromptAdapters::Langfuse) + end + end + + context "without custom client" do + it "creates instance with default Langfuse adapter" do + repo = described_class.new(adapter: :langfuse) + expect(repo.adapter).to be_a(LlmEvalRuby::PromptAdapters::Langfuse) + end + end + + context "with local adapter" do + it "uses Local adapter class" do + repo = described_class.new(adapter: :local) + expect(repo.adapter).to eq(LlmEvalRuby::PromptAdapters::Local) + end + end + end + + describe "#fetch" do + context "with custom client" do + it "uses the custom client to fetch chat prompts" do + allow(custom_client).to receive(:fetch_prompt) + .with(name: "my_chat_prompt", version: nil) + .and_return(chat_prompt_response) + + repo = described_class.new(adapter: :langfuse, client: custom_client) + result = repo.fetch(name: "my_chat_prompt") + + expect(custom_client).to have_received(:fetch_prompt) + expect(result).to be_an(Array) + expect(result.length).to eq(2) + expect(result[0]).to be_a(LlmEvalRuby::PromptTypes::System) + expect(result[0].content).to eq("You are a helpful assistant.") + expect(result[1]).to be_a(LlmEvalRuby::PromptTypes::User) + expect(result[1].content).to eq("Hello {{ name }}, how are you?") + end + + it "uses custom client with version parameter" do + allow(custom_client).to receive(:fetch_prompt) + .with(name: "my_chat_prompt", version: "v1.5") + .and_return(chat_prompt_response) + + repo = described_class.new(adapter: :langfuse, client: custom_client) + result = repo.fetch(name: "my_chat_prompt", version: "v1.5") + + expect(custom_client).to have_received(:fetch_prompt).with(name: "my_chat_prompt", version: "v1.5") + expect(result).to be_an(Array) + end + end + end + + describe "#fetch_and_compile" do + context "with custom client" do + it "uses the custom client and compiles all messages with variables" do + allow(custom_client).to receive(:fetch_prompt) + .with(name: "my_chat_prompt", version: nil) + .and_return(chat_prompt_response) + + repo = described_class.new(adapter: :langfuse, client: custom_client) + result = repo.fetch_and_compile(name: "my_chat_prompt", variables: { name: "Alice" }) + + expect(custom_client).to have_received(:fetch_prompt) + expect(result).to be_an(Array) + expect(result.length).to eq(2) + expect(result[0]).to be_a(LlmEvalRuby::PromptTypes::Compiled) + expect(result[0].content).to eq("You are a helpful assistant.") + expect(result[1]).to be_a(LlmEvalRuby::PromptTypes::Compiled) + expect(result[1].content).to eq("Hello Alice, how are you?") + end + + it "uses custom client with version parameter" do + allow(custom_client).to receive(:fetch_prompt) + .with(name: "my_chat_prompt", version: "v3.0") + .and_return(chat_prompt_response) + + repo = described_class.new(adapter: :langfuse, client: custom_client) + result = repo.fetch_and_compile( + name: "my_chat_prompt", + variables: { name: "Bob" }, + version: "v3.0" + ) + + expect(custom_client).to have_received(:fetch_prompt).with(name: "my_chat_prompt", version: "v3.0") + expect(result[1].content).to eq("Hello Bob, how are you?") + end + end + end + + describe "class methods" do + it "delegates to instance with default adapter" 
do + allow_any_instance_of(LlmEvalRuby::ApiClients::Langfuse) + .to receive(:fetch_prompt) + .with(name: "my_chat_prompt", version: nil) + .and_return(chat_prompt_response) + + result = described_class.fetch(name: "my_chat_prompt") + + expect(result).to be_an(Array) + expect(result.length).to eq(2) + expect(result[0]).to be_a(LlmEvalRuby::PromptTypes::System) + end + + it "fetch_and_compile delegates to instance with default adapter" do + allow_any_instance_of(LlmEvalRuby::ApiClients::Langfuse) + .to receive(:fetch_prompt) + .with(name: "my_chat_prompt", version: nil) + .and_return(chat_prompt_response) + + result = described_class.fetch_and_compile(name: "my_chat_prompt", variables: { name: "Charlie" }) + + expect(result).to be_an(Array) + expect(result[1].content).to eq("Hello Charlie, how are you?") + end + end +end diff --git a/spec/prompt_repositories/text_spec.rb b/spec/prompt_repositories/text_spec.rb new file mode 100644 index 0000000..2f61628 --- /dev/null +++ b/spec/prompt_repositories/text_spec.rb @@ -0,0 +1,140 @@ +# frozen_string_literal: true + +RSpec.describe LlmEvalRuby::PromptRepositories::Text do + let(:default_langfuse_options) do + { + host: "https://default.langfuse.com", + username: "default_key", + password: "default_secret" + } + end + + let(:custom_client) { instance_double(LlmEvalRuby::ApiClients::Langfuse) } + + let(:prompt_response) do + [ + { + "role" => "user", + "content" => "Hello {{ name }}" + } + ] + end + + before do + LlmEvalRuby.configure do |config| + config.adapter = :langfuse + config.langfuse_options = default_langfuse_options + end + end + + describe "#initialize" do + context "with custom client" do + it "creates instance with custom Langfuse adapter" do + repo = described_class.new(adapter: :langfuse, client: custom_client) + expect(repo.adapter).to be_a(LlmEvalRuby::PromptAdapters::Langfuse) + end + end + + context "without custom client" do + it "creates instance with default Langfuse adapter" do + repo = described_class.new(adapter: :langfuse) + expect(repo.adapter).to be_a(LlmEvalRuby::PromptAdapters::Langfuse) + end + end + + context "with local adapter" do + it "uses Local adapter class" do + repo = described_class.new(adapter: :local) + expect(repo.adapter).to eq(LlmEvalRuby::PromptAdapters::Local) + end + end + end + + describe "#fetch" do + context "with custom client" do + it "uses the custom client to fetch prompt" do + allow(custom_client).to receive(:fetch_prompt) + .with(name: "my_prompt", version: nil) + .and_return(prompt_response) + + repo = described_class.new(adapter: :langfuse, client: custom_client) + result = repo.fetch(name: "my_prompt") + + expect(custom_client).to have_received(:fetch_prompt) + expect(result).to be_a(LlmEvalRuby::PromptTypes::User) + expect(result.content).to eq("Hello {{ name }}") + end + + it "uses custom client with version parameter" do + allow(custom_client).to receive(:fetch_prompt) + .with(name: "my_prompt", version: "v1.0") + .and_return(prompt_response) + + repo = described_class.new(adapter: :langfuse, client: custom_client) + result = repo.fetch(name: "my_prompt", version: "v1.0") + + expect(custom_client).to have_received(:fetch_prompt).with(name: "my_prompt", version: "v1.0") + expect(result).to be_a(LlmEvalRuby::PromptTypes::User) + end + end + end + + describe "#fetch_and_compile" do + context "with custom client" do + it "uses the custom client and compiles with variables" do + allow(custom_client).to receive(:fetch_prompt) + .with(name: "my_prompt", version: nil) + 
.and_return(prompt_response) + + repo = described_class.new(adapter: :langfuse, client: custom_client) + result = repo.fetch_and_compile(name: "my_prompt", variables: { name: "Alice" }) + + expect(custom_client).to have_received(:fetch_prompt) + expect(result).to be_a(LlmEvalRuby::PromptTypes::Compiled) + expect(result.content).to eq("Hello Alice") + end + + it "uses custom client with version parameter" do + allow(custom_client).to receive(:fetch_prompt) + .with(name: "my_prompt", version: "v2.0") + .and_return(prompt_response) + + repo = described_class.new(adapter: :langfuse, client: custom_client) + result = repo.fetch_and_compile( + name: "my_prompt", + variables: { name: "Bob" }, + version: "v2.0" + ) + + expect(custom_client).to have_received(:fetch_prompt).with(name: "my_prompt", version: "v2.0") + expect(result.content).to eq("Hello Bob") + end + end + end + + describe "class methods" do + it "delegates to instance with default adapter" do + allow_any_instance_of(LlmEvalRuby::ApiClients::Langfuse) + .to receive(:fetch_prompt) + .with(name: "my_prompt", version: nil) + .and_return(prompt_response) + + result = described_class.fetch(name: "my_prompt") + + expect(result).to be_a(LlmEvalRuby::PromptTypes::User) + expect(result.content).to eq("Hello {{ name }}") + end + + it "fetch_and_compile delegates to instance with default adapter" do + allow_any_instance_of(LlmEvalRuby::ApiClients::Langfuse) + .to receive(:fetch_prompt) + .with(name: "my_prompt", version: nil) + .and_return(prompt_response) + + result = described_class.fetch_and_compile(name: "my_prompt", variables: { name: "Charlie" }) + + expect(result).to be_a(LlmEvalRuby::PromptTypes::Compiled) + expect(result.content).to eq("Hello Charlie") + end + end +end diff --git a/spec/tracer_spec.rb b/spec/tracer_spec.rb new file mode 100644 index 0000000..6b9b5a1 --- /dev/null +++ b/spec/tracer_spec.rb @@ -0,0 +1,161 @@ +# frozen_string_literal: true + +RSpec.describe LlmEvalRuby::Tracer do + let(:default_langfuse_options) do + { + host: "https://default.langfuse.com", + username: "default_key", + password: "default_secret" + } + end + + let(:custom_client) { instance_double(LlmEvalRuby::ApiClients::Langfuse) } + let(:trace_response) { { "successes" => [{ "id" => "trace-123" }] } } + let(:span_response) { { "successes" => [{ "id" => "span-123" }] } } + let(:generation_response) { { "successes" => [{ "id" => "gen-123" }] } } + + before do + LlmEvalRuby.configure do |config| + config.adapter = :langfuse + config.langfuse_options = default_langfuse_options + end + end + + describe "#initialize" do + context "with custom client" do + it "uses the provided client" do + tracer = described_class.new(adapter: :langfuse, client: custom_client) + expect(tracer.adapter).to be_a(LlmEvalRuby::TraceAdapters::Langfuse) + end + end + + context "without custom client" do + it "creates a default client from config" do + tracer = described_class.new(adapter: :langfuse) + expect(tracer.adapter).to be_a(LlmEvalRuby::TraceAdapters::Langfuse) + end + end + + context "with local adapter" do + it "uses Local adapter class" do + tracer = described_class.new(adapter: :local) + expect(tracer.adapter).to eq(LlmEvalRuby::TraceAdapters::Local) + end + end + end + + describe "#trace" do + context "with custom client" do + it "uses the custom client to create trace" do + allow(custom_client).to receive(:create_trace).and_return(trace_response) + + tracer = described_class.new(adapter: :langfuse, client: custom_client) + result = tracer.trace(name: "test_trace", 
input: { query: "test" }) + + expect(custom_client).to have_received(:create_trace) + expect(result).to be_a(LlmEvalRuby::TraceTypes::Trace) + expect(result.name).to eq("test_trace") + end + end + end + + describe "#span" do + context "with custom client" do + it "uses the custom client to create span" do + allow(custom_client).to receive(:create_span).and_return(span_response) + + tracer = described_class.new(adapter: :langfuse, client: custom_client) + result = tracer.span(name: "test_span", trace_id: "trace-123", input: { data: "test" }) + + expect(custom_client).to have_received(:create_span) + expect(result).to be_a(LlmEvalRuby::TraceTypes::Span) + expect(result.name).to eq("test_span") + end + + it "updates span when block is given" do + allow(custom_client).to receive(:create_span).and_return(span_response) + allow(custom_client).to receive(:update_span).and_return(span_response) + + tracer = described_class.new(adapter: :langfuse, client: custom_client) + result = tracer.span(name: "test_span", trace_id: "trace-123", input: { data: "test" }) do + "block_result" + end + + expect(custom_client).to have_received(:create_span) + expect(custom_client).to have_received(:update_span) + expect(result).to eq("block_result") + end + end + end + + describe "#generation" do + context "with custom client" do + it "uses the custom client to create generation" do + allow(custom_client).to receive(:create_generation).and_return(generation_response) + + tracer = described_class.new(adapter: :langfuse, client: custom_client) + result = tracer.generation( + name: "test_generation", + trace_id: "trace-123", + input: { prompt: "test" }, + model: "gpt-4" + ) + + expect(custom_client).to have_received(:create_generation) + expect(result).to be_a(LlmEvalRuby::TraceTypes::Generation) + expect(result.name).to eq("test_generation") + end + + it "updates generation when block is given" do + allow(custom_client).to receive(:create_generation).and_return(generation_response) + allow(custom_client).to receive(:update_generation).and_return(generation_response) + + tracer = described_class.new(adapter: :langfuse, client: custom_client) + llm_response = { + "choices" => [{ "message" => { "content" => "AI response" } }], + "usage" => { "prompt_tokens" => 10, "completion_tokens" => 20 } + } + result = tracer.generation( + name: "test_generation", + trace_id: "trace-123", + input: { prompt: "test" }, + model: "gpt-4" + ) { llm_response } + + expect(custom_client).to have_received(:create_generation) + expect(custom_client).to have_received(:update_generation) + expect(result).to eq(llm_response) + end + end + end + + describe "#update_generation" do + context "with custom client" do + it "uses the custom client to update generation" do + allow(custom_client).to receive(:update_generation).and_return(generation_response) + + tracer = described_class.new(adapter: :langfuse, client: custom_client) + result = tracer.update_generation( + id: "gen-123", + output: { response: "result" }, + usage: { prompt_tokens: 10, completion_tokens: 20 } + ) + + expect(custom_client).to have_received(:update_generation) + expect(result).to be_a(LlmEvalRuby::TraceTypes::Generation) + end + end + end + + describe "class methods" do + it "delegates to instance with default adapter" do + allow_any_instance_of(LlmEvalRuby::ApiClients::Langfuse) + .to receive(:create_trace).and_return(trace_response) + + result = described_class.trace(name: "class_method_trace", input: { query: "test" }) + + expect(result).to 
be_a(LlmEvalRuby::TraceTypes::Trace) + expect(result.name).to eq("class_method_trace") + end + end +end
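For reference, a caller-side sketch of the `Observable` behavior the specs above exercise. The class and method names here are hypothetical; the `observe` calls, the `type:` options, and the `@trace_id` requirement come from CLAUDE.md and the specs in this changeset:

```ruby
class SummarizerService
  include LlmEvalRuby::Observable

  observe :clean_input, type: :span    # wrapped as a span
  observe :complete, type: :generation # wrapped as a generation

  def initialize(trace_id)
    # Observable reads @trace_id to link spans and generations to a trace.
    @trace_id = trace_id
  end

  def clean_input(text)
    text.strip
  end

  def complete(prompt)
    # Return an OpenAI-style response hash; end_generation records
    # choices[0].message.content as output and "usage" as token usage.
    { "choices" => [{ "message" => { "content" => "stub" } }], "usage" => {} }
  end
end
```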