This guide will get you up and running with the LetheAISharp LLMEngine in just a few minutes.
- Backend Server: You need a running LLM backend server. Popular options:
  - KoboldCpp (heavily recommended)
  - LM Studio
  - Text Generation WebUI
  You can also use the integrated backend, LLamaSharp; in that case, replace the URL field with the path to your GGUF model. For this demonstration, though, stick with KoboldCpp: it's a reliable backend.
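If you do want to try the integrated LLamaSharp backend, the call might look like the sketch below. Only the "path instead of URL" convention comes from this guide; the `BackendAPI.LLamaSharp` enum value and the model path are assumptions, so check the library's `BackendAPI` enum for the exact identifier.

```csharp
using LetheAISharp.LLM;

// Hypothetical sketch: a local GGUF path replaces the server URL.
// BackendAPI.LLamaSharp is an assumed enum value; check the library for the real name.
LLMEngine.Setup(@"C:\models\your-model.gguf", BackendAPI.LLamaSharp);
await LLMEngine.Connect();
```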
- Model: Any instruction-tuned model in GGUF format will do (Qwen3 14B, for instance). Load the model in your backend server (KoboldCpp).
  If your VRAM allows, put all layers on your GPU (GPU Layers = 255 offloads everything). Enable Flash Attention for faster responses, and set Context Size to something like 16K (it really depends on available VRAM; you may need 8K if the model doesn't load or gets too slow). Do NOT use Context Shift, as it tends to conflict with Lethe AI. For the other settings, check the KoboldCpp docs; the defaults should work just fine.
- API Access: Ensure the API is enabled and note the port number.
  Enable Multiuser and Websearch to "unlock" some of Lethe AI's advanced functions.
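The KoboldCpp settings above can also be passed on the command line. The sketch below assumes current KoboldCpp flag names; verify them against `koboldcpp --help` before relying on it.

```shell
# Sketch: launch KoboldCpp with the settings recommended above.
# Flag names are assumptions; verify with `koboldcpp --help`.
./koboldcpp.exe --model your-model.gguf --port 5001 \
  --gpulayers 255 --flashattention --contextsize 16384 \
  --noshift --multiuser 1 --websearch
```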
```csharp
using LetheAISharp.LLM;

// Connect to your backend (adjust URL/port as needed)
LLMEngine.Setup("http://localhost:5001", BackendAPI.KoboldAPI);
await LLMEngine.Connect();

// Verify connection
if (LLMEngine.Status == SystemStatus.Ready)
{
    Console.WriteLine($"✅ Connected to {LLMEngine.CurrentModel}");
}

// Non-streaming query
var builder = LLMEngine.GetPromptBuilder();
builder.AddMessage(AuthorRole.System, "You are a helpful assistant.");
builder.AddMessage(AuthorRole.User, "What is artificial intelligence?");
var query = builder.PromptToQuery(AuthorRole.Assistant);
var response = await LLMEngine.SimpleQuery(query);
Console.WriteLine(response);

// Streaming query with real-time output
LLMEngine.OnInferenceStreamed += (_, token) => Console.Write(token);
var streamBuilder = LLMEngine.GetPromptBuilder();
streamBuilder.AddMessage(AuthorRole.System, "You are a helpful assistant.");
streamBuilder.AddMessage(AuthorRole.User, "Write a haiku about programming.");
var streamQuery = streamBuilder.PromptToQuery(AuthorRole.Assistant);
await LLMEngine.SimpleQueryStreaming(streamQuery);
```

Here's a minimal working chat application:
```csharp
using LetheAISharp.LLM;
using LetheAISharp.Files;

class Program
{
    static async Task Main()
    {
        // Setup
        LLMEngine.Setup("http://localhost:5001", BackendAPI.KoboldAPI);
        await LLMEngine.Connect();
        if (LLMEngine.Status != SystemStatus.Ready)
        {
            Console.WriteLine("Failed to connect to LLM backend");
            return;
        }

        // Create personas (the user persona is needed to log messages below)
        var bot = new BasePersona
        {
            Name = "ChatBot",
            Bio = "A friendly AI assistant",
            IsUser = false
        };
        var user = new BasePersona
        {
            Name = "User",
            IsUser = true
        };
        LLMEngine.Bot = bot;

        // Technically you should also set LLMEngine.Instruct to the correct instruction
        // template for your model. The default is Alpaca, which will probably work anyway
        // but will give worse results. We're just skipping this part for demonstration
        // purposes.

        // Setup events
        LLMEngine.OnInferenceStreamed += (_, token) => Console.Write(token);
        LLMEngine.OnStatusChanged += (_, status) =>
        {
            if (status == SystemStatus.Busy) Console.Write("Bot: ");
        };
        LLMEngine.OnInferenceEnded += (_, response) =>
        {
            Console.WriteLine("\n");
            LLMEngine.History.LogMessage(AuthorRole.Assistant, response, user, bot);
        };

        // Chat loop
        Console.WriteLine("Chat started! Type 'quit' to exit.");
        while (true)
        {
            Console.Write("You: ");
            var input = Console.ReadLine();
            if (input is null || input == "quit") break;
            await LLMEngine.SendMessageToBot(AuthorRole.User, input);

            // Wait for response
            while (LLMEngine.Status == SystemStatus.Busy)
                await Task.Delay(50);
        }

        // End chat session and save log
        LLMEngine.Bot.EndChat();
    }
}
```

Connection Failed:
- Verify your backend server is running
- Check the URL and port number
- Ensure API is enabled in your backend
Empty Responses:
- Confirm a model is loaded in your backend
- Check if the model supports your prompt format
Slow Responses:
- Use streaming (`SimpleQueryStreaming` or full communication mode)
- Check your model size vs available resources
- Explore the complete documentation
- Try the examples
- Customize personas and conversation flow
- Add RAG and web search capabilities
```shell
# Download and run KoboldCpp with your model
./koboldcpp.exe --model your-model.gguf --port 5001 --api
```

- Load a model in LM Studio
- Go to "Local Server" tab
- Start server (usually port 1234)
- Use: `LLMEngine.Setup("http://localhost:1234", BackendAPI.OpenAI);`
- Launch with the `--api` flag
- Load a model
- Note the port (usually 5000)
- Use: `LLMEngine.Setup("http://localhost:5000", BackendAPI.OpenAI);`