This guide will get you up and running with the LetheAISharp LLMEngine in just a few minutes.
- Backend Server: You need a running LLM backend server. Popular options:
  - KoboldCpp (heavily recommended)
  - LM Studio
  - Text Generation WebUI
  You can also use the integrated backend, LLamaSharp; in that case, replace the URL field with the path to your GGUF model. For this demonstration, though, stick with KoboldCpp: it's a reliable backend.
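If you do want to try the integrated LLamaSharp backend, the call might look like the sketch below. Only the "path instead of URL" convention comes from this guide; the `BackendAPI.LLamaSharp` enum value and the model path are assumptions, so check the library's `BackendAPI` enum for the exact identifier.

```csharp
using LetheAISharp.LLM;

// Hypothetical sketch: a local GGUF path replaces the server URL.
// BackendAPI.LLamaSharp is an assumed enum value; check the library for the real name.
LLMEngine.Setup(@"C:\models\your-model.gguf", BackendAPI.LLamaSharp);
await LLMEngine.Connect();
```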
- Model: Any instruction-tuned model in GGUF format will do (Qwen3 14B, for instance). Load the model in your backend server (KoboldCpp).
  If your VRAM allows, put all layers on your GPU (GPU Layers = 255 offloads everything). Enable Flash Attention for faster responses, and set Context Size to something like 16K (it really depends on available VRAM; you may need 8K if the model doesn't load or gets too slow). Do NOT use Context Shift, as it tends to conflict with Lethe AI. For the other settings, check the KoboldCpp docs; the defaults should work just fine.
- API Access: Ensure the API is enabled and note the port number.
  Enable Multiuser and Websearch to "unlock" some of Lethe AI's advanced functions.
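The KoboldCpp settings above can also be passed on the command line. The sketch below assumes current KoboldCpp flag names; verify them against `koboldcpp --help` before relying on it.

```shell
# Sketch: launch KoboldCpp with the settings recommended above.
# Flag names are assumptions; verify with `koboldcpp --help`.
./koboldcpp.exe --model your-model.gguf --port 5001 \
  --gpulayers 255 --flashattention --contextsize 16384 \
  --noshift --multiuser 1 --websearch
```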
```csharp
using LetheAISharp.LLM;

// Connect to your backend (adjust URL/port as needed)
LLMEngine.Setup("http://localhost:5001", BackendAPI.KoboldAPI);
await LLMEngine.Connect();

// Verify connection
if (LLMEngine.Status == SystemStatus.Ready)
{
    Console.WriteLine($"✅ Connected to {LLMEngine.CurrentModel}");
}

// Non-streaming query
var builder = LLMEngine.GetPromptBuilder();
builder.AddMessage(AuthorRole.System, "You are a helpful assistant.");
builder.AddMessage(AuthorRole.User, "What is artificial intelligence?");
var query = builder.PromptToQuery(AuthorRole.Assistant);
var response = await LLMEngine.SimpleQuery(query);
Console.WriteLine(response);

// Streaming query with real-time output
LLMEngine.OnInferenceStreamed += (_, token) => Console.Write(token);
var streamBuilder = LLMEngine.GetPromptBuilder();
streamBuilder.AddMessage(AuthorRole.System, "You are a helpful assistant.");
streamBuilder.AddMessage(AuthorRole.User, "Write a haiku about programming.");
var streamQuery = streamBuilder.PromptToQuery(AuthorRole.Assistant);
await LLMEngine.SimpleQueryStreaming(streamQuery);
```

Here's a minimal working chat application:
```csharp
using LetheAISharp.LLM;
using LetheAISharp.Files;

class Program
{
    static async Task Main()
    {
        // Setup
        LLMEngine.Setup("http://localhost:5001", BackendAPI.KoboldAPI);
        await LLMEngine.Connect();
        if (LLMEngine.Status != SystemStatus.Ready)
        {
            Console.WriteLine("Failed to connect to LLM backend");
            return;
        }

        // Create personas (the user persona is needed to log messages below)
        var bot = new BasePersona
        {
            Name = "ChatBot",
            Bio = "A friendly AI assistant",
            IsUser = false
        };
        var user = new BasePersona
        {
            Name = "User",
            IsUser = true
        };
        LLMEngine.Bot = bot;

        // Technically you should also set LLMEngine.Instruct to the correct instruction
        // template for your model. The default is Alpaca, which will probably work anyway
        // but will give worse results. We're just skipping this part for demonstration
        // purposes.

        // Setup events
        LLMEngine.OnInferenceStreamed += (_, token) => Console.Write(token);
        LLMEngine.OnStatusChanged += (_, status) =>
        {
            if (status == SystemStatus.Busy) Console.Write("Bot: ");
        };
        LLMEngine.OnInferenceEnded += (_, response) =>
        {
            Console.WriteLine("\n");
            LLMEngine.History.LogMessage(AuthorRole.Assistant, response, user, bot);
        };

        // Chat loop
        Console.WriteLine("Chat started! Type 'quit' to exit.");
        while (true)
        {
            Console.Write("You: ");
            var input = Console.ReadLine();
            if (input is null || input == "quit") break;
            await LLMEngine.SendMessageToBot(AuthorRole.User, input);

            // Wait for response
            while (LLMEngine.Status == SystemStatus.Busy)
                await Task.Delay(50);
        }

        // End chat session and save log
        LLMEngine.Bot.EndChat();
    }
}
```

Connection Failed:
- Verify your backend server is running
- Check the URL and port number
- Ensure API is enabled in your backend
Empty Responses:
- Confirm a model is loaded in your backend
- Check if the model supports your prompt format
Slow Responses:
- Use streaming (`SimpleQueryStreaming` or full communication mode)
- Check your model size vs available resources
- Explore the complete documentation
- Try the examples
- Customize personas and conversation flow
- Add RAG and web search capabilities
```shell
# Download and run KoboldCpp with your model
./koboldcpp.exe --model your-model.gguf --port 5001 --api
```

- Load a model in LM Studio
- Go to "Local Server" tab
- Start server (usually port 1234)
- Use: `LLMEngine.Setup("http://localhost:1234", BackendAPI.OpenAI);`
- Launch with the `--api` flag
- Load a model
- Note the port (usually 5000)
- Use: `LLMEngine.Setup("http://localhost:5000", BackendAPI.OpenAI);`